Non-ascii character support? #60

Open
timdau opened this issue Jan 17, 2022 · 6 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@timdau

timdau commented Jan 17, 2022

I have a bunch of files with non-ASCII characters, which I expect are causing this error:

Error: class: class NodeError stack: TypeError [ERR_INVALID_CHAR]: Invalid character in header content ["x-amz-copy-source"]
    at ClientRequest.setHeader (_http_outgoing.js:536:3)
    at new ClientRequest (_http_client.js:249:14)
    at Object.request (https.js:310:10)
    at features.constructor.handleRequest (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/http/node.js:45:23)
    at executeSend (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/event_listeners.js:370:29)
    at Request.SEND (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/event_listeners.js:384:9)
    at Request.callListeners (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/sequential_executor.js:102:18)
    at Request.emit (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/request.js:686:14)
    at Request.transition (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/request.js:688:12)
    at Request.callListeners (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
    at callNextListener (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/sequential_executor.js:96:12)
    at /home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/event_listeners.js:269:9
    at finish (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/config.js:396:7)
    at /home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/config.js:414:9
    at SharedIniFileCredentials.get (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/credentials.js:127:7)
    at getAsyncCredentials (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/config.js:408:24)
    at Config.getCredentials (/home/paul/.npm/_npx/2190/lib/node_modules/s3p/node_modules/aws-sdk/lib/config.js:428:9)
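
For context on the trace above: Node.js refuses to set an HTTP header whose value contains characters such as Cyrillic or CJK letters, and the S3 CopyObject documentation says the `x-amz-copy-source` value must be URL-encoded. The sketch below shows one way such a value could be encoded. It is an illustration only: `encodeCopySource` is a hypothetical helper, not something from S3P or the AWS SDK, and it assumes the error comes from an unencoded object key.

```javascript
// Hypothetical helper (not part of S3P or the AWS SDK): build a URL-encoded
// value suitable for the x-amz-copy-source header.
function encodeCopySource(bucket, key) {
  // Encode each path segment on its own so the "/" separators survive.
  const encodedKey = key
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `${encodeURIComponent(bucket)}/${encodedKey}`;
}

// Example with made-up bucket and key names.
const copySource = encodeCopySource("my-bucket", "docs/测试.txt");
console.log(copySource);

// The encoded value contains only ASCII characters.
console.log(/^[\x20-\x7e]*$/.test(copySource));
```

Encoding the header value would only address the copy request itself. It does not change how keys are listed or bisected.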

@shanebdavis
Member

Because of limitations in S3P's listBucket and how the algorithm works, this is... not trivial. However, could you provide some examples of the file names you'd like to be supported?

@shanebdavis shanebdavis added the question Further information is requested label Mar 3, 2022
@timdau
Author

timdau commented Mar 3, 2022

Hey Shane, I moved on to a different approach and I don't have access to the data anymore. The characters were Cyrillic. I guess you can close this as wontfix if you want.

@201341

201341 commented Mar 25, 2022

@shanebdavis Hi, if I want to extend the algorithm to Chinese, what do I need to do? For example, the filename is 测试.txt.

@shanebdavis
Member

Basically, the function getBisectKey in https://github.com/generalui/s3p/blob/master/source/S3Parallel/Lib/S3Keys.caf needs to be updated to support the Chinese character ranges in Unicode.

I'd like to make it optional, since it will likely decrease the performance of the algorithm. There are 74605 characters (roughly 2^16) to cover, compared to the current 95 (roughly 2^7).
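
To put the two counts above on the same scale, the sketch below computes how many bits one character position spans for each alphabet size. It only restates the arithmetic: about 6.57 bits for 95 characters versus about 16.19 bits for 74605. How that affects S3P's actual performance is not something this calculation can show.

```javascript
// Character counts quoted in the comment above.
const currentChars = 95;
const withChinese = 74605;

// Bits of key space covered by a single character position.
const bitsPerChar = (alphabetSize) => Math.log2(alphabetSize);

console.log(bitsPerChar(currentChars).toFixed(2)); // roughly 2^7
console.log(bitsPerChar(withChinese).toFixed(2));  // roughly 2^16

// How many times more candidate characters each position would have.
console.log(Math.round(withChinese / currentChars));
```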

I -think- this can be accomplished by adding, in Unicode order, the characters you want to support to the supportedKeyChars string.
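
As a rough illustration of "adding characters in Unicode order", the sketch below expands a few code-point ranges into one sorted string. Everything in it is an assumption made for the example: `buildKeyChars` is a hypothetical helper, the current 95 characters are assumed to be printable ASCII (U+0020 to U+007E), and the Cyrillic and CJK ranges are the standard Unicode blocks rather than anything S3P defines. It is untested against S3P itself.

```javascript
// Hypothetical helper, not part of S3P: expand inclusive code-point ranges
// into a single string sorted by code point, with duplicates removed.
function buildKeyChars(ranges) {
  const codePoints = new Set();
  for (const [first, last] of ranges) {
    for (let cp = first; cp <= last; cp++) codePoints.add(cp);
  }
  return [...codePoints]
    .sort((a, b) => a - b)
    .map((cp) => String.fromCodePoint(cp))
    .join("");
}

const PRINTABLE_ASCII = [0x20, 0x7e]; // 95 characters (assumed current set)
const CYRILLIC = [0x0400, 0x04ff];    // Unicode "Cyrillic" block
const CJK_UNIFIED = [0x4e00, 0x9fff]; // Unicode "CJK Unified Ideographs" block

const asciiOnly = buildKeyChars([PRINTABLE_ASCII]);
// Ranges can be passed in any order; the result is still sorted.
const extended = buildKeyChars([CJK_UNIFIED, PRINTABLE_ASCII, CYRILLIC]);

console.log(asciiOnly.length); // 95
console.log(extended.length);  // 95 + 256 + 20992 = 21343
console.log(extended.includes("测"), extended.includes("试")); // true true
```

Sorting by code point matters because S3 lists keys in UTF-8 byte order, which matches code-point order.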

@201341

201341 commented Mar 25, 2022

Thanks for your reply. I know a little about Unicode. I have another question: if we need to support other languages as well, such as Korean or Japanese, does that mean the supportedKeyChars string will become very long, and that the algorithm's performance may drop a lot, perhaps to the point of being no better than a single thread?

@shanebdavis
Member

We'd have to experiment to see just how this impacts performance. Probably a CLI option to add custom unicode character ranges would be the right place to start.
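
One possible shape for such an option's input is sketched below, purely as a starting point. The range syntax (`U+0400-U+04FF,4E00-9FFF`) and the `parseUnicodeRanges` name are invented for this example. S3P has no such option today, so treat this as a sketch of the parsing step only.

```javascript
// Hypothetical parser for a comma-separated list of hexadecimal code-point
// ranges, e.g. "U+0400-U+04FF,4E00-9FFF". The format is an assumption made
// for illustration; it is not an existing S3P option.
function parseUnicodeRanges(spec) {
  return spec.split(",").map((part) => {
    const match =
      /^\s*(?:U\+)?([0-9a-f]{1,6})\s*-\s*(?:U\+)?([0-9a-f]{1,6})\s*$/i.exec(part);
    if (!match) throw new Error(`Invalid range: "${part}"`);
    const first = parseInt(match[1], 16);
    const last = parseInt(match[2], 16);
    // Reject reversed ranges and values beyond the last Unicode code point.
    if (first > last || last > 0x10ffff) {
      throw new Error(`Invalid range: "${part}"`);
    }
    return [first, last];
  });
}

const ranges = parseUnicodeRanges("U+0400-U+04FF, 4e00-9fff");
console.log(ranges);

// Total number of characters the ranges would add.
const added = ranges.reduce((sum, [first, last]) => sum + (last - first + 1), 0);
console.log(added);
```

Reporting the total up front would make it easier to compare alphabet sizes when experimenting with the performance impact.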

@shanebdavis shanebdavis added the enhancement New feature or request label Dec 19, 2022
3 participants