Reduced the size of index tables a bit, and added some optional features.
This also marks the beginning of the 0.3 development branch.
From now until 0.3.0, the documentation won't be synchronized.

- The `no-optimized-legacy-encoding` Cargo feature can be used to
  cut most of the backward mapping tables at the expense of
  encoding performance (about 1/3 the size vs. 5--20x slower).

- Very, very slightly `rustfmt`ed the library code.
  `Makefile` has been changed as a result.
lifthrasiir committed Dec 27, 2015
1 parent 1c0905d commit 726d6da
Showing 46 changed files with 18,972 additions and 21,240 deletions.
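
The size/speed trade-off named in the commit message can be pictured with a toy index. The sketch below is an illustration only, not the crate's actual implementation: the table contents and the names `FORWARD`, `BACKWARD`, `decode`, `encode_by_scan` and `encode_by_table` are all hypothetical. It shows why dropping a dedicated backward (code point to byte) table leaves decoding unaffected while encoding falls back to a slower search of the forward table.

```rust
// Hypothetical forward index: byte 0x80 + i maps to FORWARD[i].
// These four entries are made up for illustration.
const FORWARD: [u16; 4] = [0x20AC, 0x0160, 0x0152, 0x017D];

// Hypothetical backward index: (code point, byte) pairs sorted by code point.
// This is the extra data a size-reduced build would leave out.
const BACKWARD: [(u16, u8); 4] = [
    (0x0152, 0x82),
    (0x0160, 0x81),
    (0x017D, 0x83),
    (0x20AC, 0x80),
];

/// Decoding is a direct array lookup and needs only the forward table.
fn decode(byte: u8) -> Option<u16> {
    let index = byte.checked_sub(0x80)? as usize;
    FORWARD.get(index).copied()
}

/// Encoding without a backward table: a linear scan of the forward table.
fn encode_by_scan(code_point: u16) -> Option<u8> {
    FORWARD
        .iter()
        .position(|&c| c == code_point)
        .map(|i| 0x80 + i as u8)
}

/// Encoding with a backward table: a binary search over the sorted pairs.
fn encode_by_table(code_point: u16) -> Option<u8> {
    BACKWARD
        .binary_search_by_key(&code_point, |&(c, _)| c)
        .ok()
        .map(|i| BACKWARD[i].1)
}

fn main() {
    // Both encoders agree; only their cost and table footprint differ.
    for &cp in FORWARD.iter() {
        assert_eq!(encode_by_scan(cp), encode_by_table(cp));
    }
    println!("{:?}", decode(0x81));
}
```

With only four entries the difference is negligible; the slowdown quoted in the commit message would come from much larger tables.
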
1 change: 1 addition & 0 deletions AUTHORS.txt
@@ -28,6 +28,7 @@ Peter Atashian <[email protected]>
Pierre Baillet <[email protected]>
Robert Straw <[email protected]>
Simon Sapin <[email protected]>
+Simonas Kazlauskas <[email protected]>
Son <[email protected]>
Steve Klabnik <[email protected]>
klutzy <[email protected]>
21 changes: 15 additions & 6 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "encoding"
-version = "0.2.32"
+version = "0.3.0-dev"
authors = ["Kang Seonghoon <[email protected]>"]

description = "Character encoding support for Rust"
@@ -14,6 +14,15 @@ license = "MIT"
[lib]
name = "encoding"

+[features]
+no-optimized-legacy-encoding = [
+"encoding-index-singlebyte/no-optimized-legacy-encoding",
+"encoding-index-korean/no-optimized-legacy-encoding",
+"encoding-index-japanese/no-optimized-legacy-encoding",
+"encoding-index-simpchinese/no-optimized-legacy-encoding",
+"encoding-index-tradchinese/no-optimized-legacy-encoding",
+]
+
[dependencies.encoding-types]
version = "0.2"
path = "src/types"
@@ -27,23 +36,23 @@ path = "src/types"
# so we should use tilde requirements here.

[dependencies.encoding-index-singlebyte]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
path = "src/index/singlebyte"

[dependencies.encoding-index-korean]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
path = "src/index/korean"

[dependencies.encoding-index-japanese]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
path = "src/index/japanese"

[dependencies.encoding-index-simpchinese]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
path = "src/index/simpchinese"

[dependencies.encoding-index-tradchinese]
-version = "~1.20141219.5"
+version = "~1.20141219.6"
path = "src/index/tradchinese"

[dev-dependencies]
11 changes: 5 additions & 6 deletions Makefile
@@ -24,17 +24,16 @@ readme: README.md

README.md: src/lib.rs
# really, really sorry for this mess.
-awk '/^# Encoding /{print "[Encoding][doc]",$$3}' $< > $@
-awk '/^# Encoding /{print "[Encoding][doc]",$$3}' $< | sed 's/./=/g' >> $@
+awk '/^\/\/! # Encoding /{print "[Encoding][doc]",$$4}' $< > $@
+awk '/^\/\/! # Encoding /{print "[Encoding][doc]",$$4}' $< | sed 's/./=/g' >> $@
echo >> $@
echo '[![Encoding on Travis CI][travis-image]][travis]' >> $@
echo >> $@
echo '[travis-image]: https://travis-ci.org/lifthrasiir/rust-encoding.png' >> $@
echo '[travis]: https://travis-ci.org/lifthrasiir/rust-encoding' >> $@
-awk '/^# Encoding /,/^## /' $< | tail -n +2 | head -n -2 >> $@
-echo >> $@
-echo '[Complete Documentation][doc]' >> $@
+awk '/^\/\/! # Encoding /,/^\/\/! ## /' $< | cut -b 5- | grep -v '^#' >> $@
+echo '[Complete Documentation][doc] (stable)' >> $@
echo >> $@
echo '[doc]: https://lifthrasiir.github.io/rust-encoding/' >> $@
echo >> $@
-awk '/^## /,/^\*\/$$/' $< | grep -v '^# ' | head -n -2 >> $@
+awk '/^\/\/! ## /,!/^\/\/!/' $< | cut -b 5- >> $@
38 changes: 26 additions & 12 deletions README.md
@@ -1,5 +1,5 @@
-[Encoding][doc] 0.2.32
-======================
+[Encoding][doc] 0.3.0-dev
+=========================

[![Encoding on Travis CI][travis-image]][travis]

@@ -10,7 +10,10 @@ Character encoding support for Rust. (also known as `rust-encoding`)
It is based on [WHATWG Encoding Standard](http://encoding.spec.whatwg.org/),
and also provides an advanced interface for error detection and recovery.

-[Complete Documentation][doc]
+*This documentation is for the development version (0.3).
+Please see the [stable documentation][doc] for 0.2.x versions.*
+
+[Complete Documentation][doc] (stable)

[doc]: https://lifthrasiir.github.io/rust-encoding/

@@ -20,14 +23,7 @@ Put this in your `Cargo.toml`:

```toml
[dependencies]
-encoding = "0.2"
-```
-
-Or in the case you are using Rust 1.0 beta, pin the exact version:
-
-```toml
-[dependencies]
-encoding = "=0.2.32"
+encoding = "0.3"
```

Then put this in your crate root:
@@ -36,6 +32,22 @@ Then put this in your crate root:
extern crate encoding;
```

+### Data Table
+
+By default, Encoding comes with ~480 KB of data tables ("indices").
+This allows Encoding to encode and decode legacy encodings efficiently,
+but this might not be desirable for some applications.
+
+Encoding provides the `no-optimized-legacy-encoding` Cargo feature
+to reduce the size of encoding tables (to ~185 KB)
+at the expense of encoding performance (typically 5x to 20x slower).
+The decoding performance remains identical.
+**This feature is intended strictly for end users.
+Do not try to enable this feature from library crates, ever.**
+
+For finer-tuned optimization, see `src/index/gen_index.py` for
+custom table generation.
+
## Overview

To encode a string:
@@ -160,7 +172,8 @@ There are two ways to get `Encoding`:
You should use them when the encoding would not change or only a handful of them are required.
Combined with link-time optimization, any unused encoding would be discarded from the binary.
* `encoding::label` has functions to dynamically get an encoding from given string ("label").
-They will return a static reference to the encoding, whose type is also known as `EncodingRef`.
+They will return a static reference to the encoding,
+whose type is also known as `EncodingRef`.
It is useful when a list of required encodings is not available in advance,
but it will result in a larger binary and missed optimization opportunities.

@@ -214,3 +227,4 @@ Consequently one should be careful when picking a desired character encoding.
The only standards reliable in this regard are WHATWG Encoding Standard and
[vendor-provided mappings from the Unicode consortium](http://www.unicode.org/Public/MAPPINGS/).
Whenever in doubt, look at the source code and specifications for detailed explanations.
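
The README hunk above describes two ways to get an `Encoding`: naming it statically, or looking it up from a label at run time and receiving a static reference. The sketch below models that distinction with standard-library Rust only. It is an assumption-laden illustration, not the crate's API: the trait, the two encoding types, the alias `EncodingRef` as defined here, and the function `lookup_label` are all hypothetical stand-ins.

```rust
/// Hypothetical stand-in for an encoding interface.
trait Encoding {
    fn name(&self) -> &'static str;
}

/// Hypothetical alias for a static reference to any encoding.
type EncodingRef = &'static (dyn Encoding + Send + Sync);

struct Utf8Encoding;
struct Latin1Encoding;

impl Encoding for Utf8Encoding {
    fn name(&self) -> &'static str {
        "utf-8"
    }
}

impl Encoding for Latin1Encoding {
    fn name(&self) -> &'static str {
        "iso-8859-1"
    }
}

// Static instances: code that names one of these directly pulls in only
// that encoding, so unused ones can be discarded at link time.
static UTF_8: Utf8Encoding = Utf8Encoding;
static LATIN_1: Latin1Encoding = Latin1Encoding;

/// Dynamic lookup: every encoding reachable from here must stay in the binary.
fn lookup_label(label: &str) -> Option<EncodingRef> {
    match label.trim().to_ascii_lowercase().as_str() {
        "utf-8" | "utf8" => Some(&UTF_8),
        "iso-8859-1" | "latin1" => Some(&LATIN_1),
        _ => None,
    }
}

fn main() {
    // Static access: the concrete type is known at compile time.
    println!("{}", UTF_8.name());

    // Dynamic access: the encoding is chosen from a run-time string.
    if let Some(encoding) = lookup_label("Latin1") {
        println!("{}", encoding.name());
    }
}
```

Because `lookup_label` references every supported encoding, a binary that calls it cannot drop any of them, which is the larger-binary cost the README mentions.
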
