fail to search for index in directory with directory name containing CJK characters #3008

Open
1 task done
wh-timme opened this issue Nov 13, 2024 · 5 comments
@wh-timme

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

Given a file at:
./public/hello/你好/index.html
a request to http://serving_domain/hello/你好 returns 404.

Code snippet

app.static('/', './public', index='index.html')

Expected Behavior

The request should resolve to ./public/hello/你好/index.html instead of returning a 404 page.

How do you run Sanic?

As a script (app.run or Sanic.serve)

Operating System

Linux

Sanic Version

24.6.0

Additional context

No response

@wh-timme wh-timme added the bug label Nov 13, 2024
@adk23333

Anyway, you shouldn't use Chinese paths.
Chinese directory names are not recommended; I don't know what problems they might cause.

@Tronic
Member

Tronic commented Nov 18, 2024

How are CJK pathnames handled nowadays? Unicode or some other weirdness? To my knowledge, Sanic and Python ought to handle Unicode conversions properly, if the browser supplies the path in UTF-8.

To triage, could you print request.url_bytes and request.url, and check whether you can open the file with Python's open(...)? If that works, how do you need to write the CJK characters in the path?
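A minimal sketch of that check, using only the standard library so it can run without Sanic. The helper `resolve_index` and the temporary directory layout are hypothetical, made up for illustration; it assumes the browser percent-encodes the path as UTF-8 and that the filesystem accepts Unicode names.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote, unquote


def resolve_index(root: Path, raw_path: bytes, index: str = "index.html") -> Path:
    """Hypothetical helper: decode a percent-encoded path as UTF-8
    and point at the index file inside that directory."""
    decoded = unquote(raw_path.decode("ascii"), encoding="utf-8", errors="strict")
    return root / decoded.lstrip("/") / index


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Recreate the layout from the report: <root>/hello/你好/index.html
    target = root / "hello" / "你好" / "index.html"
    target.parent.mkdir(parents=True)
    target.write_text("<h1>你好</h1>", encoding="utf-8")

    # The raw bytes a UTF-8 browser would be expected to send for /hello/你好
    raw = ("/hello/" + quote("你好")).encode("ascii")

    candidate = resolve_index(root, raw)
    found = candidate.is_file()
    body = candidate.read_text(encoding="utf-8") if found else ""
    print(raw, found)
```

If `found` is true here but the server still answers 404, the filesystem and Python's decoding are probably not the cause, and the raw bytes the server actually receives are the next thing to compare.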

@adk23333

How are CJK pathnames handled nowadays? Unicode or some other weirdness? To my knowledge, Sanic and Python ought to handle Unicode conversions properly, if the browser supplies the path in UTF-8.

They are usually URL-encoded (percent-encoded).
example website

@Tronic
Member

Tronic commented Nov 20, 2024

Yes, but that doesn't tell which encoding. UTF-8 and other encodings produce different %-codes for the same character, and I have no idea what you actually use in China.
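To illustrate the point, here is a small standard-library sketch that percent-encodes the directory name from this issue under two encodings. GBK is picked only as an example of a non-UTF-8 encoding.

```python
from urllib.parse import quote, unquote

name = "你好"

# Same characters, different bytes, therefore different %-codes.
utf8_form = quote(name, encoding="utf-8")
gbk_form = quote(name, encoding="gbk")
print(utf8_form)  # %E4%BD%A0%E5%A5%BD
print(gbk_form)   # %C4%E3%BA%C3

# Decoding GBK %-codes as if they were UTF-8 does not give the name back.
misread = unquote(gbk_form, encoding="utf-8", errors="replace")
print(misread == name)  # False
```

A server that assumes UTF-8 would therefore look for a different directory name if the client sent GBK codes.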

@adk23333

UTF-8 and other encodings produce different %-codes for the same character, and I have no idea what you actually use in China.

Most people use UTF-8 encoding, and some old websites may still use GBK.

For example:

https://fanyi.baidu.com/#en/zh/Yes%2C%20but%20that%20doesn't%20tell%20which%20encoding.%20UTF-8%20and%20other%20encodings%20produce%20different%20%25-codes%20for%20the%20same%20character%2C%20and%20I%20have%20no%20idea%20what%20you%20actually%20use%20in%20China.
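If both encodings can show up in practice, one possible approach is to try UTF-8 first and fall back to GBK. This is only a sketch: `decode_path` is a hypothetical helper, not something Sanic provides, and the fallback order is an assumption (byte sequences that are valid in both encodings would be read as UTF-8).

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path: str, encodings=("utf-8", "gbk")) -> str:
    """Hypothetical helper: percent-decode to bytes, then try each
    encoding in order and return the first one that decodes cleanly."""
    data = unquote_to_bytes(raw_path)
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode path with any of {encodings}")


from_utf8 = decode_path("/hello/%E4%BD%A0%E5%A5%BD")
from_gbk = decode_path("/hello/%C4%E3%BA%C3")
print(from_utf8 == from_gbk)  # True
```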
