Patch: On-the-fly UTF-8 conversion #128
Comments
It is fun to see that fast-export is used. Have you noted issue #95? When it is complete, it will provide a way to filter file contents during conversion without having to modify fast-export. As you say, this is completely specific to your use-case, so I'll close this issue for now, perhaps it can serve as inspiration for a future |
#95 is now merged, the above mangling of file contents can now be done by an external program using |
@frej @tsdh I am using fast-export to convert some Mercurial repos to Git, but I am facing ASCII encoding errors during Java compilation. Before finding this thread, I tried the approach below, and it fixed my current failures: use the following in the javac Ant target. If that approach is not correct, can you please point me towards how I can solve the encoding problem using the --filter-contents option? |
@Utkarsh-nk, I'm sorry but I don't understand your question nor what your issue with fast-export is. If your issue is with how to use If you're asking about how to write a filter that detects an arbitrary encoding and converts it to UTF-8 or how to configure a Java build system to accept a particular encoding, this is not the right forum. BTW, as far as I can tell from the man page for gitattributes, the directive |
Over the last weekend, I migrated our huge Mercurial repository with 16 years of history (nearly 200,000 commits, about 20,000 Java files) to Git using the hg-fast-export.sh script.
My goal was to also convert our Java files from a wild mix of ISO-8859-15, Cp1252, UTF-8, or simply broken encodings to UTF-8. I tested a git filter-branch --tree-filter ... approach, but someone on Stack Overflow suggested doing the conversion directly in either fast-export or fast-import. That's what I did, and it worked like a charm. Conversion time increased from about 4-5 hours to 56 hours, though. Guessing the current encoding using chardet is a bit costly, and decoding/encoding is expensive, too.
The patch is against the hg-4.6-compat branch.
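To make the conversion step concrete, here is a minimal sketch of re-encoding a file's bytes as UTF-8. It is not the code from the patch: it swaps chardet's statistical guess for a fixed fallback order (strict UTF-8, then Cp1252, then ISO-8859-15) so that it needs only the standard library. The function name `to_utf8` and the fallback order are assumptions made for illustration.

```python
# Fallback order is an assumption for this sketch, not taken from the patch.
FALLBACK_ENCODINGS = ("cp1252", "iso-8859-15")


def to_utf8(data: bytes) -> bytes:
    """Return data re-encoded as UTF-8, guessing the source encoding."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8 (this includes pure ASCII)
    except UnicodeDecodeError:
        pass
    for encoding in FALLBACK_ENCODINGS:
        try:
            return data.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue  # byte undefined in this code page, try the next one
    # ISO-8859-15 maps every byte, so this line is only reached if the
    # fallback list above is changed.
    return data.decode("utf-8", errors="replace").encode("utf-8")
```

A fixed order cannot tell Cp1252 from ISO-8859-15 when a file is valid in both, which is the gap a detector such as chardet tries to close at the cost of the extra run time mentioned above.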
Of course, the patch is completely specific to my use-case (target encoding fixed, only do it on Java files, also change the encoding setting in Maven pom.xml files), but it might be the start for a new conversion option.
(Just to address the obvious question: why not update the encoding with a normal commit? Because (depending on how many non-ASCII characters you have in your code) it becomes almost impossible to cherry-pick/graft/merge commits from the era before the UTF-8 change-over without getting tons of merge conflicts.)
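The "Java files only" restriction could also be written as a standalone content filter of the kind the --filter-contents option mentioned in the comments is meant for. The sketch below is hypothetical: it assumes the filter receives the file path as its first argument, reads the file contents on stdin, and writes the replacement to stdout, which should be checked against the fast-export documentation. It also assumes that anything that is not valid UTF-8 is Cp1252. The name `filter_contents` is made up for this example.

```python
import sys


def filter_contents(path: str, data: bytes) -> bytes:
    """Re-encode Java sources as UTF-8; pass every other file through."""
    if not path.endswith(".java"):
        return data
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is Cp1252. Bytes that Cp1252 leaves
        # undefined are replaced with U+FFFD instead of aborting the export.
        return data.decode("cp1252", errors="replace").encode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed calling convention: argv[1] is the path of the exported file.
    converted = filter_contents(sys.argv[1], sys.stdin.buffer.read())
    sys.stdout.buffer.write(converted)
```

Reading and writing through `sys.stdin.buffer` and `sys.stdout.buffer` keeps the data as raw bytes, so binary files and files with other extensions pass through untouched.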