interpret filenames as UTF-8 even without general purpose bit 11 to workaround Mac bug #84
Comments
The problem seems to be that the filenames are encoded in UTF8, but general purpose bit 11 is not set. The zipfile claims the filenames are encoded with CP437, and in that encoding, the filename you're seeing is the correct interpretation. The zip file is expecting zipfile readers (like yauzl) to interpret the filename as UTF8 without being instructed to do so. In other words, yauzl is behaving correctly, and the zipfile is malformed. Do you know what program created this zip file? |
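For anyone who wants to check this on their own archives, here is a minimal sketch of testing general purpose bit 11 (mask 0x0800). The name `hasUtf8Flag` is made up for illustration and is not part of yauzl; the value to pass in would be the entry's `generalPurposeBitFlag` field.

```typescript
// Bit 11 of the general purpose bit flag: "file names are UTF-8".
const UTF8_FLAG = 0x0800;

// Hypothetical helper (not a yauzl API): true when the entry declares
// that its file name and comment are encoded as UTF-8.
function hasUtf8Flag(generalPurposeBitFlag: number): boolean {
  return (generalPurposeBitFlag & UTF8_FLAG) !== 0;
}
```

For the zip file in this issue, this check would return false even though the names are UTF-8, which is exactly the mismatch described above.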
Is it Archive Utility on Mac? |
@imcuttle I have a need to handle similar not-so-standard .zip files in my application, and I wanted to share my heuristic solution. If you only need to deal with this file and similar files that are always UTF-8 (even if they don't indicate this), you can use the In my case, it is a bit more complicated, as I need to simultaneously handle zip files that are UTF-8 (with and without the proper bit being set), as well as files that are CP437 encoded. My solution is to use Specifically, I use the code in this gist to get some information on the name Buffers, followed by this logic:
const aggs = checkStringBufs(entries.map(entry => entry.fileName as Buffer));
let encoding: string;
if (aggs.allAsciiChar) {
  // utf8 is backwards compatible with ascii
  encoding = 'utf8';
} else if (aggs.all7Bit) {
  // Hmmm, no high bits but some control chars, probably cp437
  encoding = 'cp437';
} else if (aggs.validUtf8) {
  // Some high bits set, but seems to be UTF-8
  encoding = 'utf8';
} else {
  // Some high bits set, but not UTF-8!
  encoding = 'cp437';
}
This has been working well for the .zip files that I deal with. |
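The gist itself is not reproduced in this thread, so here is a rough sketch of what a function like `checkStringBufs` could look like. This is an assumption-laden stand-in, not the gist's code: the meaning of each field (for example, "ASCII char" meaning printable bytes 0x20 to 0x7e) is guessed from the comments in the logic above, and the `NameStats` type name is invented. It takes `Uint8Array`, which Node `Buffer` objects satisfy.

```typescript
// Assumed shape of the aggregate result; field names follow the snippet above.
interface NameStats {
  allAsciiChar: boolean; // every byte is printable ASCII (0x20..0x7e)
  all7Bit: boolean;      // no byte has the high bit set
  validUtf8: boolean;    // every name is well-formed UTF-8
}

// Hypothetical stand-in for the gist's checkStringBufs.
function checkStringBufs(bufs: Uint8Array[]): NameStats {
  // A fatal decoder throws on malformed input instead of emitting U+FFFD.
  const strict = new TextDecoder('utf-8', { fatal: true });
  const stats: NameStats = { allAsciiChar: true, all7Bit: true, validUtf8: true };
  for (const buf of bufs) {
    for (let i = 0; i < buf.length; i++) {
      const byte = buf[i];
      if (byte > 0x7f) stats.all7Bit = false;
      if (byte < 0x20 || byte > 0x7e) stats.allAsciiChar = false;
    }
    try {
      strict.decode(buf);
    } catch (_err) {
      stats.validUtf8 = false;
    }
  }
  return stats;
}
```

Aggregating over all names in the archive, rather than deciding per entry, keeps one archive from ending up with a mix of encodings.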
Yep, the zip file was created by the Mac system. It's puzzling that the zip file is malformed. |
You'd think that Apple would be better at writing software, but their Archive Utility really sucks at zip files. I've been working around bugs in that software for years. If this issue is as simple as "Archive Utility always forgets to set bit 11", then maybe yauzl should have better support for this situation. I'll think about this. |
I ran into this problem too. Has it been resolved? |
Sorry, I haven't been working on this project lately. I'll revisit this issue next week. |
I guess this isn't yauzl's fault. When I passed option |
I believe the |
I did some research into Info-ZIP's charset detection code, and in the absence of General Purpose Bit 11, Info-ZIP uses a different charset depending on the operating system. It will only use CP437 as required by the spec on some platforms, presumably DOS. However, on Linux and Mac, Info-ZIP will simply always use UTF-8 for decoding file paths, because UTF-8 is the "native" charset on those platforms, whatever that means. This suggests it's safe for yauzl to drop support for CP437 and just use UTF-8 in all situations as well. 🤔 |
The PR was rejected!! No clue how to better deal with this issue. Any progress? |
Any news on this issue? The problem still exists with OSX archives. |
Recently, I also ran into a similar issue: Overleaf has long used yauzl to handle zip archives. For cases like this, I believe the current solution options are:
Considering that it seems the author is not actively maintaining the project anymore, this historical issue can only be addressed by future developers. MacOS indeed has some non-standard behaviors when it comes to zip files, so currently, we have to follow the author's statement that this is not a problem with yauzl but rather an issue with non-standard zip files. |
Hello everyone. Sorry for the long silence. Whenever Mac Archive Utility has a bug I'm supposed to work around, it's really demotivating. I suggest that everyone who is interested in getting this issue addressed please file a bug report against Apple's Archive Utility to "Set General Purpose Bit 11 to indicate UTF-8 encoded file names". If you can't figure out how to file a bug report against Archive Utility, then you now understand some of my demotivation for working around their broken trashware.
If yauzl were to offer a workaround for this, it would mean that the authors of code calling into yauzl would need to know whether a zipfile was created by Apple or whether it was a conformant zipfile. Given that I, the human author of yauzl, while using a zip file analyzer tool that I made, still cannot reliably tell whether any given zip file was created by Archive Utility vs something else, I don't think it makes any sense for yauzl to offer a configuration option that requires this determination be made by programmers who are relying on yauzl to handle the quirks of the zip file format so they don't have to think about it.
The ideal support for working around this issue would be to have everything "just work" all the time, which means changing the default interpretation from CP437 to UTF-8, and then seeing how that breaks everyone's zip files. Presumably, this supports OSX better and DOS worse. Seems like a reasonable tradeoff, but I'm a stickler for following the spec (it's literally the number 1 design principle of yauzl), and allowing a megacorporate bully to shape de facto standards through incompetence feels really bad. Good feelings are literally the funding keeping this volunteer project going, and currently this issue, issue #84, has no available funding behind it.
Again, please file a bug report against Archive Utility if you can. And if you figure out how to do that, please do report back here.
It would be very encouraging to find out that Apple is willing to listen to reports of the damage they are causing in the software world. |
Please don't. It will break gazillions of zip filenames from Windows. Windows' built-in "Compressed zip folder" still uses the OEM code page (437 on en_US, but not even 437 on some other locales). Windows had no GUI to create a zip file whose filename encoding is UTF-8 until very recently. Windows 11 24H2 finally switched to libarchive and started using UTF-8 filenames for new zip archives. But if you add a file to an existing zip archive, the older zip folder code will run and the OEM code page will be used. Most Windows software (including libarchive) follows the older zip folder's behavior (i.e. assume the OEM code page if EFS is not set).
Although the latest zip spec says that the filename encoding is code page 437 if EFS is not set, many apps do not obey that, because the older (before 6.3.0) specs said nothing about the filename encoding and apps are required to be backward-compatible. If a user updates an app and filenames in existing zip archives are suddenly broken, the user will very likely blame the app developer. So unfortunately you can't assume that the filename encoding is code page 437 if EFS is not set, especially when the "version made by" field is less than 63. For clarity, I'm not defending Apple. Rather, Windows users (especially on CJK locales) suffer from broken filenames from Mac, because Archive Utility uses UTF-8 without setting EFS and Windows apps assume that the filename encoding is the OEM code page if EFS is not set. You're absolutely right that we should file a bug against Apple. |
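To illustrate the "version made by" point, here is a small sketch. In the zip format the low byte of that field carries the spec version times ten (so 63 means 6.3) and the high byte identifies the host system. The function name `nameEncodingHint` and its return labels are invented for this example and do not come from yauzl or any other library.

```typescript
const EFS_FLAG = 0x0800; // general purpose bit 11

type NameEncodingHint = 'utf8' | 'cp437-per-spec' | 'unspecified';

// Hypothetical helper: what the archive metadata alone lets us conclude
// about the filename encoding of one entry.
function nameEncodingHint(
  versionMadeBy: number,
  generalPurposeBitFlag: number
): NameEncodingHint {
  // EFS set: the writer explicitly declares UTF-8.
  if ((generalPurposeBitFlag & EFS_FLAG) !== 0) return 'utf8';
  // Low byte is the spec version times ten; the high byte (host OS) is ignored here.
  const specVersion = versionMadeBy & 0xff;
  // Only writers claiming spec 6.3 or later are covered by the CP437 rule.
  return specVersion >= 63 ? 'cp437-per-spec' : 'unspecified';
}
```

Even the 'cp437-per-spec' result is only what the writer claims to follow; as discussed above, real-world tools may still write OEM code page or UTF-8 bytes without setting EFS.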
File: 中文测试.zip
The zip file contains 中文测试.md. When I pass decodeStrings: true, the file name in the result comes out garbled. When I pass decodeStrings: false, an error is thrown: The "path" argument must be of type string
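The second error happens because, with decodeStrings: false, the file name is left as raw bytes rather than a string, and path functions reject that. A minimal sketch of decoding it yourself is below, assuming the names really are UTF-8 as in this archive. The helper name `fileNameToString` is hypothetical and not part of yauzl.

```typescript
// Hypothetical helper: convert a raw file name (bytes, as left undecoded
// by decodeStrings: false) into a string that path functions accept.
// Assumes the bytes are UTF-8; malformed sequences become U+FFFD.
function fileNameToString(fileName: Uint8Array | string): string {
  if (typeof fileName === 'string') return fileName;
  return new TextDecoder('utf-8').decode(fileName);
}
```

Since the raw name has not been checked by the library in this mode, it seems wise to reject absolute paths and `..` segments yourself before writing anything to disk.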