how to setup a compressUTF8() variant #155

nielsnl68 · 2021-07-06T07:17:17Z

Hi, could you give me some hints on how to change compressToUTF16() into a compressToUTF8()?
I use another lib to encrypt the string, but it can't handle UTF-16 at the moment.

pieroxy · 2021-08-05T11:40:23Z

Can you tell us why you need a utf8 implementation? It makes little to no sense at all.

You talk about encryption but utf (8 or 16) is about encoding. Can you tell us more about what you’re trying to do?

nielsnl68 · 2021-08-11T08:51:41Z

Hello @pieroxy,

Thanks for your reply. In my case I use our compression module to talk between the server and the clients over AJAX calls, and in compressed URLs that activate mailbox connections.
At the moment all communication between them is UTF-8 encoded. I tried your compressToUTF16() solution, but I found that the data was not sent correctly; or rather, the receiving side saw some end-of-file bytes and ditched the remainder of the data.

At the moment I use the base64 solution, but now the packages are 2x as big.

So I was hoping that we could convert the encrypted data into a UTF-8 string like you do with UTF-16.

Rycochet · 2021-08-11T12:03:33Z

Just to note - Javascript does not support UTF8 - it converts any text strings etc into UTF16, which means there is potentially data change when converting (one UTF8 character might be >1 UTF16 character etc).

Saying that, if you need to send as UTF8 then something like https://gist.github.com/joni/3760795/8f0c1a608b7f0c8b3978db68105c5b1d741d0446 might be a good starting point for how to convert - you'll still need to send it as raw binary data from the array. Decoding is another matter as it'll need converting to binary without allowing JS itself to convert to UTF16 (which will break it entirely).

Are you sure it won't be easier to put a UTF16 support library on the backend instead?
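
To make the UTF-16 point above concrete, here is a minimal sketch that compares a string's length in UTF-16 code units with its size once encoded as UTF-8. It assumes an environment with the standard `TextEncoder` global (modern browsers, recent Node.js); the helper name `measure` is made up for illustration and is not part of lz-string.

```javascript
// Compare how JavaScript counts a string (UTF-16 code units)
// with how many bytes the same text needs as UTF-8.
function measure(text) {
  return {
    utf16Units: text.length,                          // code units, not characters
    utf8Bytes: new TextEncoder().encode(text).length, // bytes on the wire as UTF-8
  };
}

// U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji)
const sample = measure("\u00e9\u20ac\u{1F600}");
console.log(sample);
```

The two counts differ because the emoji needs two UTF-16 code units (a surrogate pair) but four UTF-8 bytes, while the other two characters take one code unit each but two and three bytes in UTF-8.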

nielsnl68 · 2021-08-11T13:10:36Z

I did not know that JavaScript did not support UTF-8; this is the first time I have read about it.

That gist example does look perfect.

Rycochet · 2021-08-11T13:14:24Z

> I did not know that javascript did not support utf8, this is the first time i read about it.

Pretty good writeup including references - https://flaviocopes.com/javascript-unicode/

nielsnl68 · 2021-08-11T14:31:19Z

Okay, then it makes me wonder why the AJAX calls are breaking.
I will investigate that soon.

It does make sense now to change

Rycochet · 2021-08-11T14:38:57Z

Binary is always a safer transport mechanism than some encoding ;-)

nielsnl68 · 2021-08-19T16:22:44Z

Okay, I created my own fromUTF8() and toUTF8() functions, so they produce a readable string from a byte array (and back).

I understand how your _compress() function works, like: _compress(uncompressed, 8, function (a) { return f(a); });
But I have no clue how I should set the _decompress() function's second parameter.

Could you help with that?

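// 7-bit values that are remapped to character codes 0xE0 + index
// instead of the default offsets used in toUTF8().
// Note: values are at most 0x7F, so the last entry (0x93) is never matched.
const UTF8convert = [
    0x01, 0x06, 0x3D, 0x3E, 0x5A,
    0x60, 0x62, 0x64, 0x68, 0x69, 0x6B, 0x70, 0x73, 0x93
];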

function toUTF8(data) { // array of bytes
    const char = (value) => {
        let q = UTF8convert.indexOf(value);
        if (q >= 0) {
            return String.fromCharCode(q + 0xE0);
        } else if (value < 0x5d) {
            return String.fromCharCode(value + 0x21);
        } else {
            return String.fromCharCode(value + 0x44);
        }
    }

    var str = '',
        shift = 0,
        i;

    for (i = 0; i < data.length; i++) {
        var value = (data[i] << (i % 7)) + shift;
        shift = value >> 7;
        value = value & 0x7f;
        str += char(value);
        if (i % 7 === 6) {
            str += char(shift);
            shift = 0;
        }
    }
    str += char(shift);
    return str;
}

function fromUTF8(str) {
    var utf8 = [], shift = 0, x = 0;
    for (var i = 0; i < str.length; i++) {
        var charcode = str.charCodeAt(i);
        if (charcode >= 0xe0) {
            charcode = UTF8convert.at(charcode - 0xe0);
        } else if (charcode >= 0xA1) {
            charcode = charcode - (0x44);
        } else {
            charcode = charcode - (0x21);
        }
        if ((i % 8) > 0) {
            shift = (charcode << (8 - (i % 8))) & 0xff;
            utf8[utf8.length - 1] += shift;
        }
        if ((i == 0 || ((i % 8) != 7)) && (i < str.length - 1)) {
            charcode = charcode >> (x % 7);
            x++;
            utf8.push(charcode);
        }
    }
    return utf8;
}
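
On the question above about the second parameter of _decompress(): as far as the lz-string 1.4.x source shows, that parameter is a reset value equal to the highest bit of each input unit. The library passes 32 for base64 (6 bits), 16384 for UTF-16 (15 bits) and 32768 for the plain 16-bit format. Following that pattern, an 8-bit stream would use 128. The sketch below is an assumption built on that pattern, not code from the library: `resetValueFor` and `decompressFromBytes` are hypothetical names, and `lz` stands for whatever object exposes `_decompress` in your build.

```javascript
// Reset value for a reader that consumes `bitsPerChar` bits per input unit:
// a mask with only the highest of those bits set.
function resetValueFor(bitsPerChar) {
  return 1 << (bitsPerChar - 1);
}

// Hypothetical wrapper. `lz` is assumed to expose
// _decompress(length, resetValue, getNextValue), and `bytes` is assumed
// to hold the values emitted by _compress(uncompressed, 8, ...).
function decompressFromBytes(lz, bytes) {
  return lz._decompress(
    bytes.length,
    resetValueFor(8),         // 128 for 8 bits per unit
    (index) => bytes[index]   // return the raw value at that position
  );
}
```

With the toUTF8()/fromUTF8() pair above, `bytes` would be the array returned by fromUTF8(), so the third argument only needs to index into it.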

karnthis · 2024-01-22T05:36:14Z

Due to the listed situation with Javascript using UTF16 and not UTF8, I think the best solution if UTF8 is truly desired would be compressToUint8Array() and encoding the resulting output with some implementation-specific function.

This can probably be closed as a "won't implement"

Rycochet · 2024-01-22T09:16:34Z

Not sure - NodeJS is supposed to support it better, but we need compatibility between front-end and back-end - there is TextEncoder and TextDecoder to play with once everything else is updated properly - holding off closing until we've had a chance :-P
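
As a rough illustration of the TextEncoder / TextDecoder route mentioned above, the sketch below round-trips text through UTF-8 bytes and checks whether an arbitrary byte array is valid UTF-8 at all. It assumes the standard `TextEncoder` and `TextDecoder` globals, with the `fatal` option available (in Node.js this needs a build with ICU, which is the default). The function names are hypothetical.

```javascript
// Text -> UTF-8 bytes -> text. This direction is always lossless.
function utf8RoundTrip(text) {
  const bytes = new TextEncoder().encode(text);
  return new TextDecoder("utf-8").decode(bytes);
}

// Arbitrary bytes -> text is NOT always possible: with { fatal: true }
// the decoder throws on invalid sequences instead of silently
// substituting U+FFFD replacement characters.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(utf8RoundTrip("h\u00e9llo"));
console.log(isValidUtf8(new Uint8Array([0x68, 0x69])));
console.log(isValidUtf8(new Uint8Array([0xff, 0x80, 0x00])));
```

Output from compressToUint8Array() is not guaranteed to pass a check like `isValidUtf8`, which is why the bytes need either a binary transport or an intermediate encoding step rather than being decoded as UTF-8 directly.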
