reduced the size of index tables a bit, and some optional features.

This also marks the beginning of the 0.3 development branch. From now until the 0.3.0 release, the documentation will not be synchronized.

- The `no-optimized-legacy-encoding` Cargo feature can be used to cut most of the backward mapping tables at the expense of encoding performance (1/3 size vs. 5x to 20x slower).
- Very, very slightly `rustfmt`ed the library code. `Makefile` has been changed as a result.
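
The size/speed trade-off behind `no-optimized-legacy-encoding` can be pictured with a minimal sketch. This is not the crate's actual code: the four-entry tables and the names `FORWARD`, `BACKWARD`, `forward`, `backward_scan`, and `backward_table` are hypothetical, and the real lookup strategy used by the index crates may differ. The sketch assumes decoding uses a forward table (legacy code to Unicode) that is always present, while encoding either consults a dedicated backward table (fast, more data) or falls back to scanning the forward table (slow, no extra data).

```rust
// Hypothetical forward index: the array position is the legacy code and the
// value is the Unicode code point it maps to. Decoding always needs this.
static FORWARD: [u16; 4] = [0x00E9, 0x00C0, 0x00FC, 0x00D1];

// Hypothetical backward index: (code point, legacy code) pairs sorted by
// code point. This is the extra data a size-reducing build would drop.
static BACKWARD: [(u16, u8); 4] = [(0x00C0, 1), (0x00D1, 3), (0x00E9, 0), (0x00FC, 2)];

/// Decoding direction: a direct table lookup, the same in both variants.
fn forward(code: u8) -> Option<u16> {
    FORWARD.get(code as usize).copied()
}

/// Encoding without a backward table: linear scan of the forward table.
/// No extra data, but the cost grows with the table size.
fn backward_scan(ch: u16) -> Option<u8> {
    FORWARD.iter().position(|&c| c == ch).map(|i| i as u8)
}

/// Encoding with a backward table: binary search over the sorted pairs.
fn backward_table(ch: u16) -> Option<u8> {
    BACKWARD
        .binary_search_by_key(&ch, |&(c, _)| c)
        .ok()
        .map(|i| BACKWARD[i].1)
}
```

Both `backward_scan` and `backward_table` return the same answers for every input; only the amount of stored data and the lookup cost differ, which is consistent with the commit's note that encoding gets slower while the tables shrink.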
link2xt · Dec 27, 2015 · 726d6da
1 parent 1c0905d
commit 726d6da

Showing 46 changed files with 18,972 additions and 21,240 deletions.
diff --git a/AUTHORS.txt b/AUTHORS.txt
@@ -28,6 +28,7 @@ Peter Atashian <[email protected]>
 Pierre Baillet <[email protected]>
 Robert Straw <[email protected]>
 Simon Sapin <[email protected]>
+Simonas Kazlauskas <[email protected]>
 Son <[email protected]>
 Steve Klabnik <[email protected]>
 klutzy <[email protected]>

diff --git a/Cargo.toml b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "encoding"
-version = "0.2.32"
+version = "0.3.0-dev"
 authors = ["Kang Seonghoon <[email protected]>"]
 
 description = "Character encoding support for Rust"
@@ -14,6 +14,15 @@ license = "MIT"
 [lib]
 name = "encoding"
 
+[features]
+no-optimized-legacy-encoding = [
+	"encoding-index-singlebyte/no-optimized-legacy-encoding",
+	"encoding-index-korean/no-optimized-legacy-encoding",
+	"encoding-index-japanese/no-optimized-legacy-encoding",
+	"encoding-index-simpchinese/no-optimized-legacy-encoding",
+	"encoding-index-tradchinese/no-optimized-legacy-encoding",
+]
+
 [dependencies.encoding-types]
 version = "0.2"
 path = "src/types"
@@ -27,23 +36,23 @@ path = "src/types"
 # so we should use tilde requirements here.
 
 [dependencies.encoding-index-singlebyte]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
 path = "src/index/singlebyte"
 
 [dependencies.encoding-index-korean]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
 path = "src/index/korean"
 
 [dependencies.encoding-index-japanese]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
 path = "src/index/japanese"
 
 [dependencies.encoding-index-simpchinese]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
 path = "src/index/simpchinese"
 
 [dependencies.encoding-index-tradchinese]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
 path = "src/index/tradchinese"
 
 [dev-dependencies]

diff --git a/Makefile b/Makefile
@@ -24,17 +24,16 @@ readme: README.md
 
 README.md: src/lib.rs
 	# really, really sorry for this mess.
-	awk '/^# Encoding /{print "[Encoding][doc]",$$3}' $< > $@
-	awk '/^# Encoding /{print "[Encoding][doc]",$$3}' $< | sed 's/./=/g' >> $@
+	awk '/^\/\/! # Encoding /{print "[Encoding][doc]",$$4}' $< > $@
+	awk '/^\/\/! # Encoding /{print "[Encoding][doc]",$$4}' $< | sed 's/./=/g' >> $@
 	echo >> $@
 	echo '[![Encoding on Travis CI][travis-image]][travis]' >> $@
 	echo >> $@
 	echo '[travis-image]: https://travis-ci.org/lifthrasiir/rust-encoding.png' >> $@
 	echo '[travis]: https://travis-ci.org/lifthrasiir/rust-encoding' >> $@
-	awk '/^# Encoding /,/^## /' $< | tail -n +2 | head -n -2 >> $@
-	echo >> $@
-	echo '[Complete Documentation][doc]' >> $@
+	awk '/^\/\/! # Encoding /,/^\/\/! ## /' $< | cut -b 5- | grep -v '^#' >> $@
+	echo '[Complete Documentation][doc] (stable)' >> $@
 	echo >> $@
 	echo '[doc]: https://lifthrasiir.github.io/rust-encoding/' >> $@
 	echo >> $@
-	awk '/^## /,/^\*\/$$/' $< | grep -v '^# ' | head -n -2 >> $@
+	awk '/^\/\/! ## /,!/^\/\/!/' $< | cut -b 5- >> $@
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
-[Encoding][doc] 0.2.32
-======================
+[Encoding][doc] 0.3.0-dev
+=========================
 
 [![Encoding on Travis CI][travis-image]][travis]
 
@@ -10,7 +10,10 @@ Character encoding support for Rust. (also known as `rust-encoding`)
 It is based on [WHATWG Encoding Standard](http://encoding.spec.whatwg.org/),
 and also provides an advanced interface for error detection and recovery.
 
-[Complete Documentation][doc]
+*This documentation is for the development version (0.3).
+Please see the [stable documentation][doc] for 0.2.x versions.*
+
+[Complete Documentation][doc] (stable)
 
 [doc]: https://lifthrasiir.github.io/rust-encoding/
 
@@ -20,14 +23,7 @@ Put this in your `Cargo.toml`:
 
 ```toml
 [dependencies]
-encoding = "0.2"
-```
-
-Or in the case you are using Rust 1.0 beta, pin the exact version:
-
-```toml
-[dependencies]
-encoding = "=0.2.32"
+encoding = "0.3"
 ```
 
 Then put this in your crate root:
@@ -36,6 +32,22 @@ Then put this in your crate root:
 extern crate encoding;
 ```
 
+### Data Table
+
+By default, Encoding comes with ~480 KB of data table ("indices").
+This allows Encoding to encode and decode legacy encodings efficiently,
+but this might not be desirable for some applications.
+
+Encoding provides the `no-optimized-legacy-encoding` Cargo feature
+to reduce the size of encoding tables (to ~185 KB)
+at the expense of encoding performance (typically 5x to 20x slower).
+The decoding performance remains identical.
+**This feature is strongly intended for end users.
+Do not try to enable this feature from library crates, ever.**
+
+For finer-tuned optimization, see `src/index/gen_index.py` for
+custom table generation.
+
 ## Overview
 
 To encode a string:
@@ -160,7 +172,8 @@ There are two ways to get `Encoding`:
   You should use them when the encoding would not change or only handful of them are required.
   Combined with link-time optimization, any unused encoding would be discarded from the binary.
 * `encoding::label` has functions to dynamically get an encoding from given string ("label").
-  They will return a static reference to the encoding, which type is also known as `EncodingRef`.
+  They will return a static reference to the encoding,
+  which type is also known as `EncodingRef`.
   It is useful when a list of required encodings is not available in advance,
   but it will result in the larger binary and missed optimization opportunities.
 
@@ -214,3 +227,4 @@ Consequently one should be careful when picking a desired character encoding.
 The only standards reliable in this regard are WHATWG Encoding Standard and
 [vendor-provided mappings from the Unicode consortium](http://www.unicode.org/Public/MAPPINGS/).
 Whenever in doubt, look at the source code and specifications for detailed explanations.
+