how to setup a compressUTF8() variant #155

Open
nielsnl68 opened this issue Jul 6, 2021 · 10 comments

Comments

@nielsnl68

Hi, could you give me some hints on changing compressToUTF16() into a compressToUTF8()?
I use another library to encrypt the string, but it can't handle UTF-16 at the moment.

@pieroxy
Owner

pieroxy commented Aug 5, 2021

Can you tell us why you need a utf8 implementation? It makes little to no sense at all.

You talk about encryption, but UTF (8 or 16) is about encoding. Can you tell us more about what you’re trying to do?

@nielsnl68
Author

Hello @pieroxy,

Thanks for your reply. In my case I use your compression module for communication between the server and the clients, both in AJAX calls and in compressed URLs that activate mailbox connections.
At the moment all communication between them is based on UTF-8 encoding. I tried your compressToUTF16 solution, but the data was not sent correctly; more precisely, the receiving side saw some end-of-file bytes and dropped the remainder of the data.

At the moment I use the Base64 solution, but now the packets are twice as big.

So I was hoping we could convert the encrypted data into a UTF-8 string, like you do with UTF-16.

@Rycochet
Collaborator

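Just to note: JavaScript does not use UTF-8 for its strings. It converts any text into UTF-16 (a string is a sequence of 16-bit code units), which means the data can change when converting (one UTF-8 character may take several bytes, and a character outside the Basic Multilingual Plane becomes two UTF-16 code units).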
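To make the UTF-16 point concrete, here is a small sketch that uses only the standard `TextEncoder` (a global in modern browsers and Node.js). It compares a string's `.length`, which counts UTF-16 code units, with its number of code points and with the number of bytes the same text takes as UTF-8. The `sizes` helper is a hypothetical name for illustration.

```javascript
// Compare UTF-16 code units (String.prototype.length) with UTF-8 byte counts.
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

function sizes(text) {
  return {
    utf16Units: text.length,                // JS strings count UTF-16 code units
    codePoints: [...text].length,           // the string iterator walks code points
    utf8Bytes: encoder.encode(text).length, // bytes on a UTF-8 wire
  };
}

console.log(sizes('A'));         // 1 unit, 1 code point, 1 byte
console.log(sizes('\u00e9'));    // 1 unit, 1 code point, 2 bytes
console.log(sizes('\u20ac'));    // 1 unit, 1 code point, 3 bytes
console.log(sizes('\u{1F600}')); // 2 units (surrogate pair), 1 code point, 4 bytes
```

So neither `.length` nor a character-by-character walk with `charCodeAt` tells you how many bytes a string will occupy once it is sent as UTF-8.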

That said, if you need to send it as UTF-8, then something like https://gist.github.com/joni/3760795/8f0c1a608b7f0c8b3978db68105c5b1d741d0446 might be a good starting point for the conversion. You'll still need to send it as raw binary data from the array. Decoding is another matter, as the bytes need converting back without letting JS itself convert them to UTF-16 (which would break them entirely).
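For reference, this is roughly what such a conversion involves. The sketch below is not the gist's code. It is a hypothetical `toUtf8Bytes` helper that walks the string by code point (so surrogate pairs are joined first) and emits 1 to 4 bytes per code point, substituting U+FFFD for lone surrogates the way `TextEncoder` does. For the way back, the standard `TextDecoder` with `fatal: true` reads raw bytes as UTF-8 and throws on malformed input instead of guessing.

```javascript
// Hypothetical helper (not the gist's code): encode a JS string to UTF-8 by hand.
function toUtf8Bytes(str) {
  const out = [];
  for (const ch of str) { // for...of walks code points, joining surrogate pairs
    let cp = ch.codePointAt(0);
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd; // lone surrogate: use U+FFFD, like TextEncoder
    if (cp < 0x80) {
      out.push(cp);                                           // 1 byte:  0xxxxxxx
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));         // 2 bytes: 110xxxxx 10xxxxxx
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f),  // 3 bytes: 1110xxxx ...
               0x80 | (cp & 0x3f));
    } else {
      out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f), // 4 bytes: 11110xxx ...
               0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return new Uint8Array(out);
}

// Decoding: hand the raw bytes to TextDecoder; fatal: true throws on malformed UTF-8.
const bytes = toUtf8Bytes('compressed \u20ac \u{1F600}');
const text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
```

In practice `new TextEncoder().encode(str)` does the same job; the hand-written version only shows where the extra bytes come from.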

Are you sure it wouldn't be easier to put a UTF-16 support library on the backend instead?

@nielsnl68
Author

I did not know that JavaScript did not support UTF-8; this is the first time I've read about it.

That gist example does look perfect.

@Rycochet
Collaborator

> I did not know that JavaScript did not support UTF-8; this is the first time I've read about it.

A pretty good write-up, including references: https://flaviocopes.com/javascript-unicode/

@nielsnl68
Author

Okay, then it makes me wonder why the AJAX calls are breaking.
I will investigate that soon.

It does make sense now to change.

@Rycochet
Collaborator

Binary is always a safer transport mechanism than some encoding ;-)
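A small illustration of why binary is safer, using only the standard `TextDecoder`/`TextEncoder`: arbitrary bytes (such as compressed output) are not necessarily valid UTF-8, so pushing them through a text step replaces the invalid parts with U+FFFD and changes the data, while a plain byte copy leaves them intact. The `payload` bytes are made up for the example.

```javascript
// Made-up "compressed" bytes: not valid UTF-8 (0xff is never valid, 0xc3 is cut off).
const payload = new Uint8Array([0x48, 0x00, 0xff, 0xc3]);

// Text route: decode as UTF-8, then encode again, as a text-only channel would.
const asText = new TextDecoder('utf-8').decode(payload); // invalid sequences become U+FFFD
const viaText = new TextEncoder().encode(asText);         // each U+FFFD is re-encoded as EF BF BD

// Binary route: the bytes are passed along untouched.
const viaBinary = new Uint8Array(payload);

console.log(Array.from(viaText));   // longer than the input, and the original 0xff/0xc3 are gone
console.log(Array.from(viaBinary)); // identical to the input
```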

@nielsnl68
Author

nielsnl68 commented Aug 19, 2021

Okay, I created my own fromUTF8() and toUTF8() functions that turn a byte array into a readable string and back.

I understand how your _compress() function works, e.g. _compress(uncompressed, 8, function (a) { return f(a); });
But I have no clue how I should set the second parameter of the _decompress() function.

Could you help with that?

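// Special-cased 7-bit values: UTF8convert[q] is sent as character U+00E0 + q
// instead of its default character. (0x93 never occurs, since packed values
// are at most 0x7F.)
const UTF8convert = [
    0x01, 0x06, 0x3D, 0x3E, 0x5A,
    0x60, 0x62, 0x64, 0x68, 0x69, 0x6B, 0x70, 0x73, 0x93
];

// Packs an array of bytes into 7-bit values (every 7 bytes -> 8 characters)
// and maps each value to a character in U+0021..U+00ED.
function toUTF8(data) {
    const char = (value) => {
        const q = UTF8convert.indexOf(value);
        if (q >= 0) {
            return String.fromCharCode(q + 0xE0);     // special-cased values
        } else if (value < 0x5D) {
            return String.fromCharCode(value + 0x21); // 0x00..0x5C -> U+0021..U+007D
        } else {
            return String.fromCharCode(value + 0x44); // 0x5D..0x7F -> U+00A1..U+00C3
        }
    };

    let str = '';
    let shift = 0; // high bits carried over from the previous byte
    for (let i = 0; i < data.length; i++) {
        const value = (data[i] << (i % 7)) + shift;
        shift = value >> 7;
        str += char(value & 0x7F);
        if (i % 7 === 6) {
            // After 7 bytes the carry holds 7 full bits: emit it as the 8th character.
            str += char(shift);
            shift = 0;
        }
    }
    str += char(shift); // final carry, always emitted (fromUTF8 skips the last character)
    return str;
}

// Inverse of toUTF8(): maps characters back to 7-bit values and reassembles the bytes.
function fromUTF8(str) {
    const bytes = [];
    let x = 0; // number of bytes started so far
    for (let i = 0; i < str.length; i++) {
        let charcode = str.charCodeAt(i);
        if (charcode >= 0xE0) {
            charcode = UTF8convert[charcode - 0xE0];
        } else if (charcode >= 0xA1) {
            charcode -= 0x44;
        } else {
            charcode -= 0x21;
        }
        if (i % 8 > 0) {
            // The low (i % 8) bits of this character are the top bits of the previous byte.
            bytes[bytes.length - 1] += (charcode << (8 - (i % 8))) & 0xFF;
        }
        if (i % 8 !== 7 && i < str.length - 1) {
            // The remaining bits are the low bits of a new byte. The 8th character
            // of each group and the final carry character start no byte.
            bytes.push(charcode >> (x % 7));
            x++;
        }
    }
    return bytes;
}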
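On the `_decompress()` question: in the lz-string 1.4.x source, as far as I can tell, the signature is `_decompress(length, resetValue, getNextValue)`, and the built-in variants pass `resetValue` = 32 for Base64 (6 bits per character), 16384 for UTF-16 (15 bits) and 32768 for `compress()` (16 bits). That is the mask of the top bit of one character, `1 << (bitsPerChar - 1)`, so an 8-bit variant would presumably pass 128. These are internal, underscore-prefixed functions, so treat this as an assumption about the current source rather than a supported API. The self-contained sketch below mimics how `_compress` packs its bit stream most-significant-bit first into `bitsPerChar`-bit values and how `_decompress` reads it back by walking that mask down; `resetValueFor`, `packBits` and `unpackBits` are hypothetical names.

```javascript
// Top-bit mask for a character carrying `bitsPerChar` bits of payload.
const resetValueFor = (bitsPerChar) => 1 << (bitsPerChar - 1);

// Writer side: shift each bit in from the right, so the first bit written
// ends up in the most significant position of each bitsPerChar-bit value.
function packBits(bits, bitsPerChar) {
  const values = [];
  let val = 0;
  let pos = 0; // bits already written into `val`
  for (const bit of bits) {
    val = (val << 1) | (bit & 1);
    if (pos === bitsPerChar - 1) {
      values.push(val);
      val = 0;
      pos = 0;
    } else {
      pos++;
    }
  }
  if (pos > 0) values.push(val << (bitsPerChar - pos)); // zero-pad the last value
  return values;
}

// Reader side: walk a mask from resetValue down to 1, then move to the next value.
function unpackBits(values, bitsPerChar, count) {
  const resetValue = resetValueFor(bitsPerChar);
  const bits = [];
  let index = 0;
  let position = resetValue;
  for (let i = 0; i < count; i++) {
    bits.push(values[index] & position ? 1 : 0);
    position >>= 1;
    if (position === 0) {
      position = resetValue;
      index++;
    }
  }
  return bits;
}

console.log([6, 15, 16, 8].map(resetValueFor)); // [32, 16384, 32768, 128]
```

If the writer and reader disagree on `bitsPerChar`, the mask starts at the wrong bit and every subsequent value is misread, which is why the second parameter has to match the `bitsPerChar` given to `_compress`.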

@karnthis
Contributor

Given that JavaScript strings use UTF-16 and not UTF-8, I think the best solution, if UTF-8 is truly needed, would be to use compressToUint8Array() and encode the resulting bytes with some implementation-specific function.
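As an illustration of that last step, here are two hypothetical ways to turn the bytes from `compressToUint8Array()` into text, together with what they cost once sent as UTF-8 (the function names are assumptions, not part of lz-string). Base64, via the `btoa` global available in browsers and Node.js 16+, always produces 4 ASCII bytes per 3 input bytes, rounded up. Mapping each byte to the character with the same code (U+0000 to U+00FF) costs 1 UTF-8 byte for values below 0x80 and 2 bytes from 0x80 upwards, and it also emits control characters such as U+0000 that some channels may not pass through.

```javascript
// Hypothetical encoders for compressed bytes (e.g. the result of compressToUint8Array()).
function bytesToBinaryString(bytes) {
  let s = '';
  for (const b of bytes) s += String.fromCharCode(b); // byte b -> code unit b (U+0000..U+00FF)
  return s;
}

const bytesToBase64 = (bytes) => btoa(bytesToBinaryString(bytes)); // ASCII-only output

// Size of a string once it is sent as UTF-8.
const utf8Size = (str) => new TextEncoder().encode(str).length;

const sample = new Uint8Array([0x00, 0x7f, 0x80, 0xff]);
console.log(utf8Size(bytesToBase64(sample)));       // 8: 4 bytes -> 2 Base64 groups of 4
console.log(utf8Size(bytesToBinaryString(sample))); // 6: 1 + 1 + 2 + 2
```

Which one wins depends on the byte distribution of the compressed data; measuring `utf8Size` on real payloads is the safest way to choose.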

This can probably be closed as "won't implement".

@Rycochet
Collaborator

Not sure. Node.js is supposed to support it better, but we need compatibility between front-end and back-end. There are TextEncoder and TextDecoder to play with once everything else is updated properly, so I'm holding off on closing until we've had a chance :-P
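A quick sketch of what `TextEncoder`/`TextDecoder` (globals in current browsers and Node.js) would give here. If I read the lz-string source correctly, `compressToUTF16()` stores 15 bits per character plus an offset of 32, so it only emits code units from 0x20 to 0x801F. That range stays below the surrogate block 0xD800 to 0xDFFF, so such strings should survive a UTF-8 round trip, whereas a lone surrogate (which the raw 16-bit `compress()` output can contain) does not. The check below uses synthetic strings, not real lz-string output.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder('utf-8', { fatal: true }); // throw instead of substituting U+FFFD

const roundTrip = (str) => decoder.decode(encoder.encode(str));

// Every code unit in the assumed compressToUTF16 output range 0x20..0x801F.
let utf16Range = '';
for (let c = 0x20; c <= 0x801f; c++) utf16Range += String.fromCharCode(c);
console.log(roundTrip(utf16Range) === utf16Range); // true

// A lone high surrogate is replaced by U+FFFD during encoding, so the data changes.
console.log(roundTrip('\ud800') === '\ufffd'); // true
```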

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants