Commit 55f0155

Add warning about unsupported encodings
1 parent 855bda8 commit 55f0155

2 files changed

+47
-2
lines changed

Cargo.toml

Lines changed: 46 additions & 1 deletion
@@ -54,9 +54,54 @@ async-tokio = ["tokio"]
5454
## UTF-16 will not work (therefore, `quick-xml` is not [standard compliant]).
5555
##
5656
## List of supported encodings includes all encodings supported by [`encoding_rs`]
57-
## crate, that satisfied the restriction above.
57+
## crate that satisfy the restriction above. Therefore, the following encodings are
58+
## **not supported**:
59+
## - [UTF-16BE]
60+
## - [UTF-16LE]
61+
## - [ISO-2022-JP]
62+
##
63+
## You should stop processing the document as soon as one of these encodings is detected,
64+
## because the generated events can be wrong and may not reflect the real document structure!
65+
##
66+
## Because these encodings are the ones that are not ASCII compatible, you can
67+
## check for that to detect them:
68+
##
69+
## ```
70+
## use quick_xml::events::Event;
71+
## use quick_xml::reader::Reader;
72+
##
73+
## # fn to_utf16le_with_bom(string: &str) -> Vec<u8> {
74+
## # let mut bytes = Vec::new();
75+
## # bytes.extend_from_slice(&[0xFF, 0xFE]); // UTF-16 LE BOM
76+
## # for ch in string.encode_utf16() {
77+
## # bytes.extend_from_slice(&ch.to_le_bytes());
78+
## # }
79+
## # bytes
80+
## # }
81+
## let xml = to_utf16le_with_bom(r#"<?xml encoding='UTF-16'><element/>"#);
82+
## let mut reader = Reader::from_reader(xml.as_ref());
83+
## reader.trim_text(true);
84+
##
85+
## let mut buf = Vec::new();
86+
## let mut unsupported = false;
87+
## loop {
88+
## if !reader.decoder().encoding().is_ascii_compatible() {
89+
## unsupported = true;
90+
## break;
91+
## }
92+
## buf.clear();
93+
## match reader.read_event_into(&mut buf).unwrap() {
94+
## Event::Eof => break,
95+
## _ => {}
96+
## }
97+
## }
98+
## assert_eq!(unsupported, true);
99+
## ```
58100
##
59101
## [standard compliant]: https://www.w3.org/TR/xml11/#charencoding
102+
## [UTF-16BE]: encoding_rs::UTF_16BE
103+
## [UTF-16LE]: encoding_rs::UTF_16LE
104+
## [ISO-2022-JP]: encoding_rs::ISO_2022_JP
60105
encoding = ["encoding_rs"]
61106

62107
## Enables support for recognizing all [HTML 5 entities](https://dev.w3.org/html5/html-author/charref)

src/encoding.rs

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ pub(crate) const UTF16_BE_BOM: &[u8] = &[0xFE, 0xFF];
2828
/// key is not defined or contains unknown encoding.
2929
///
3030
/// The library supports any UTF-8 compatible encodings that crate `encoding_rs`
31-
/// is supported. [*UTF-16 is not supported at the present*][utf16].
31+
/// supports. [*UTF-16 and ISO-2022-JP are not supported at present*][utf16].
3232
///
3333
/// If feature `encoding` is disabled, the decoder is always UTF-8 decoder:
3434
/// any XML declarations are ignored.
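
The doc example added in Cargo.toml detects an unsupported encoding through the reader's decoder. As a rough, std-only illustration of why BOM-prefixed UTF-16 input can be recognized before any events are trusted, the sketch below checks the first bytes of a buffer against the two UTF-16 byte order marks. This is not quick-xml's detection code: `starts_with_utf16_bom` is a hypothetical helper introduced here, and it says nothing about ISO-2022-JP, which has no BOM. The BE value matches `UTF16_BE_BOM` in src/encoding.rs, and the LE value is the one used by the doc example's `to_utf16le_with_bom` helper, reproduced here.

```rust
// Illustrative sketch only; not quick-xml's actual detection logic.
// UTF-16 BE BOM is FE FF (same bytes as `UTF16_BE_BOM` in src/encoding.rs);
// UTF-16 LE BOM is FF FE (as written by the doc example's helper).
const UTF16_BE_BOM: &[u8] = &[0xFE, 0xFF];
const UTF16_LE_BOM: &[u8] = &[0xFF, 0xFE];

/// Hypothetical helper: returns `true` if `bytes` begins with a UTF-16 BOM.
/// It cannot detect BOM-less UTF-16 or ISO-2022-JP input.
pub fn starts_with_utf16_bom(bytes: &[u8]) -> bool {
    bytes.starts_with(UTF16_BE_BOM) || bytes.starts_with(UTF16_LE_BOM)
}

/// Same helper as in the doc example: encodes `string` as UTF-16 LE,
/// prefixed with the LE byte order mark.
pub fn to_utf16le_with_bom(string: &str) -> Vec<u8> {
    let mut bytes = Vec::new();
    bytes.extend_from_slice(UTF16_LE_BOM);
    for unit in string.encode_utf16() {
        // Each UTF-16 code unit is written low byte first.
        bytes.extend_from_slice(&unit.to_le_bytes());
    }
    bytes
}
```

A caller could run such a check on the raw input before constructing a reader. The `is_ascii_compatible()` test from the doc example remains the more general check, since it also covers encodings announced only in the XML declaration.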