Use Buffers when decoding
Decoding a string instead of a Buffer is probably the most common mistake when working with legacy-encoded resources. Why? Let's see.
This is wrong:
var http = require('http'),
    iconv = require('iconv-lite');

http.get("http://website.com/", function(res) {
    var body = '';
    res.on('data', function(chunk) {
        body += chunk;
    });
    res.on('end', function() {
        var decodedBody = iconv.decode(body, 'win1252');
        console.log(decodedBody);
    });
});
Before being decoded with the iconv.decode function, the original resource was (unintentionally) already decoded in body += chunk via JavaScript type conversion. What really happens here is:
res.on('data', function(chunkBuffer) {
    body += chunkBuffer.toString('utf8');
});
The same conversion is done behind the scenes if you call res.setEncoding('utf8').
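The implicit conversion can be checked without any network code. The following is a minimal sketch using only Node's built-in Buffer; the byte values are an illustrative assumption (the windows-1252 bytes for "café"), not data from a real response.

```javascript
// Hypothetical chunk: "café" encoded as windows-1252 (0xE9 is "é").
const chunk = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

let body = '';
body += chunk; // string concatenation calls chunk.toString(), which defaults to UTF-8

console.log(typeof body);                     // the Buffer has become a string
console.log(body === chunk.toString('utf8')); // same result as the explicit call
```

After the `+=`, `body` no longer holds the bytes that arrived, only their UTF-8 interpretation.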
Not only does double decoding lead to wrong results, it is also nearly impossible to restore the original bytes because the UTF-8 conversion is lossy, so even iconv.decode(new Buffer(body, 'utf8'), 'win1252') will not help.
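To make the lossiness concrete, here is a small sketch using only Node's built-in Buffer. The input bytes are an assumed example (windows-1252 "café"). It uses `Buffer.from`, the current replacement for the `new Buffer(...)` constructor.

```javascript
// Assumed input: "café" in windows-1252. 0xE9 on its own is not valid UTF-8.
const original = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// The accidental first decode: the invalid byte is replaced by U+FFFD.
const asString = original.toString('utf8');

// Re-encoding the string cannot bring 0xE9 back; U+FFFD becomes EF BF BD.
const roundTripped = Buffer.from(asString, 'utf8');

console.log(JSON.stringify(asString));      // "caf\ufffd" (replacement character)
console.log(roundTripped);                  // <Buffer 63 61 66 ef bf bd>
console.log(roundTripped.equals(original)); // false
```

Every invalid byte collapses to the same replacement character, so the original byte values can no longer be told apart.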
Note: theoretically, if you first decode to strings using the 'binary' encoding and then feed those strings to decode, you get the correct results. This is bad practice because it is slower, it mixes concepts, and the 'binary' encoding is deprecated.
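For illustration only, this sketch shows why the 'binary' route preserves the data: each byte maps to the code point with the same value, so the conversion is reversible. It uses 'latin1', the name Node now uses for the same encoding ('binary' is its alias), and the input bytes are again an assumed example.

```javascript
// Assumed input: "café" in windows-1252.
const original = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// One byte becomes one code unit with the same numeric value.
const asBinaryString = original.toString('latin1');

// Converting back yields the original bytes unchanged.
const restored = Buffer.from(asBinaryString, 'latin1');

console.log(asBinaryString.length);     // 4, one code unit per byte
console.log(restored.equals(original)); // true
```

The bytes survive, but only at the cost of an extra conversion in each direction, which is why keeping the Buffers is the better option.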
Keep the original Buffers and provide them to iconv.decode. Use Buffer.concat() if needed.
In general, keep in mind that all JavaScript strings are already decoded and should not be decoded again.
http.get("http://website.com/", function(res) {
    var chunks = [];
    res.on('data', function(chunk) {
        chunks.push(chunk);
    });
    res.on('end', function() {
        var decodedBody = iconv.decode(Buffer.concat(chunks), 'win1252');
        console.log(decodedBody);
    });
});
// Or, with iconv-lite's streaming support and Node v0.10+, you can use the `collect` helper
http.get("http://website.com/", function(res) {
    res.pipe(iconv.decodeStream('win1252')).collect(function(err, decodedBody) {
        console.log(decodedBody);
    });
});
// Silences the warning iconv-lite prints when decode() is given a string instead of a Buffer
iconv.skipDecodeWarning = true;
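Collecting Buffers and decoding once also protects multi-byte characters that happen to be split across chunks. The sketch below simulates such a split with hand-made chunks. As an assumption for the sake of a dependency-free example, it uses Node's built-in UTF-8 decoding as a stand-in for iconv.decode; the chunk boundary is hypothetical.

```javascript
// "cé" in UTF-8 is 63 C3 A9. Assume the network split falls inside "é".
const chunks = [Buffer.from([0x63, 0xc3]), Buffer.from([0xa9])];

// Wrong: each chunk is converted to a string on its own.
let brokenBody = '';
for (const chunk of chunks) {
  brokenBody += chunk; // implicit chunk.toString('utf8') per chunk
}

// Right: keep the Buffers, join the bytes, then decode a single time.
const correctBody = Buffer.concat(chunks).toString('utf8');

console.log(JSON.stringify(brokenBody)); // both halves of "é" became U+FFFD
console.log(correctBody);                // cé
```

With a multi-byte legacy encoding, the same reasoning would apply to `iconv.decode(Buffer.concat(chunks), encoding)`.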