bibtex.py parser accents problems #250
Comments
Hi, I also use this library in one of my projects, and I agree that this is a bug. However, I do not agree with the solution. Your workaround is equivalent to a replace in your bibtex. I looked into the issue a bit, though not in depth. My understanding is that there are at least two problems:
[1] As far as I understand, it does not come from the strip_braces calls in add_val. It probably happens earlier, in parse_record(), in the block starting with the comment # for each line in record
OK, I think I got it. It was easier than I thought. Here is my diff. I was wrong in my previous post: the first point I mentioned seems to be supported already.

diff --git a/parserscrapers_plugins/bibtex.py b/parserscrapers_plugins/bibtex.py
index cfea621..aa9d669 100755
--- a/parserscrapers_plugins/bibtex.py
+++ b/parserscrapers_plugins/bibtex.py
@@ -244,11 +244,8 @@ class BibTexParser(object):
for k, v in self.unicode_to_latex.iteritems():
if v in val:
parts = val.split(str(v))
- for key,val in enumerate(parts):
- if key+1 < len(parts) and len(parts[key+1]) > 0:
- parts[key+1] = parts[key+1][0:]
val = k.join(parts)
- val = val.replace("{","").replace("}","")
+ val = val.replace("{","").replace("}","")
return val
def add_val(self, val):

Let me know if everything is fine on your side.
No, it does not work for me. I keep getting `Eric instead of Èric. I am not sure I made this clear, but for me the problem is with the browser display of the unicode json produced.
I see. Try this:
Thanks for the answer again! I think I did not make myself clear, so here are all the cases:
In LaTeX, {\'E}ric and \'{E}ric give the same output; I think the parser here does not treat them the same way.
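A minimal sketch of the point above, assuming the two spellings meant here are the LaTeX acute-accent forms {\'E}ric and \'{E}ric. The helper name latex_acute_to_unicode is hypothetical and is not part of bibtex.py; it handles only the acute accent and only shows that both spellings can be mapped to the same precomposed character.

```python
import re
import unicodedata

# Hypothetical helper, not part of bibtex.py. It covers only the acute
# accent (\') applied to a single ASCII letter.
_ACUTE = re.compile(
    r"""
      \{\\'(?P<inner>[A-Za-z])\}    # braces around the whole accent command
    | \\'\{(?P<outer>[A-Za-z])\}    # braces around the letter only
    | \\'(?P<bare>[A-Za-z])         # no braces at all
    """,
    re.VERBOSE,
)


def latex_acute_to_unicode(text):
    """Replace LaTeX acute-accent spellings with one Unicode character."""

    def repl(match):
        letter = match.group("inner") or match.group("outer") or match.group("bare")
        # Put the combining acute AFTER the base letter, then compose (NFC).
        return unicodedata.normalize("NFC", letter + "\u0301")

    return _ACUTE.sub(repl, text)


first = latex_acute_to_unicode(r"{\'E}ric")
second = latex_acute_to_unicode(r"\'{E}ric")
assert first == second == "\u00c9ric"
```

Both inputs end up as the same string, which is the behaviour LaTeX itself shows for these two spellings.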
Thanks for the details. I did a quick search online about the best encoding for accents. I do not belong to this project, so this is only my own opinion. Looking only at bibtex.py, a single dict would be enough, because we iterate over values, not keys. But since it is a library, it can be used elsewhere in the other direction (this is the case in my project, for instance). Also, a dict does not guarantee order. What do you think about this suggestion?
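One possible reading of the ordering concern above, as a sketch only: keep the accent table as an ordered list of pairs and derive one lookup per direction from it. The PAIRS sample data and both mapping names below are assumptions made for illustration; they are not the table used by bibtex.py.

```python
from collections import OrderedDict

# Hypothetical sample data: (unicode character, LaTeX spelling) pairs.
# Several LaTeX spellings may describe the same character.
PAIRS = [
    ("\u00c9", r"{\'E}"),
    ("\u00c9", r"\'{E}"),
    ("\u00e9", r"{\'e}"),
]

# LaTeX -> Unicode: every spelling is kept, in the order listed.
latex_to_unicode = OrderedDict((latex, uni) for uni, latex in PAIRS)

# Unicode -> LaTeX: the first spelling listed wins as the canonical one.
unicode_to_latex = OrderedDict()
for uni, latex in PAIRS:
    unicode_to_latex.setdefault(uni, latex)

assert list(latex_to_unicode) == [r"{\'E}", r"\'{E}", r"{\'e}"]
assert unicode_to_latex["\u00c9"] == r"{\'E}"
```

A dict keyed only by the Unicode character cannot hold both spellings of the same letter, so a caller who needs the LaTeX-to-Unicode direction would lose one of them. Deriving both directions from one ordered list keeps the iteration order explicit.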
While displaying the json generated by the bibtex.py parser, I got all the accents wrong (shifted by one position). For instance: `Eric instead of Èric (which corresponds to changing \u0301Eric to E\u0301ric).
I wrote a simple patch, but I am not sure it will work in all cases.
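To make the shift concrete, here is a small Python 3 sketch using only the standard unicodedata module. U+0301 is the combining acute accent, and a combining mark attaches to the character before it, so it only renders on the E when it follows the E. The variable names are illustrative.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"

# Mark emitted before the base letter, as in the reported output.
shifted = COMBINING_ACUTE + "Eric"
# Mark emitted after the base letter, as in the expected output.
expected = "E" + COMBINING_ACUTE + "ric"

# U+0301 is a combining character (non-zero canonical combining class).
assert unicodedata.combining(COMBINING_ACUTE) != 0

# NFC composes a base letter with the mark that follows it.
composed = unicodedata.normalize("NFC", expected)
assert composed == "\u00c9ric"

# A leading mark has no base letter before it, so NFC leaves it as is.
assert unicodedata.normalize("NFC", shifted) == shifted
```

Normalizing to NFC before serializing to json is one way to hand the browser a single precomposed character, but it does not repair a mark that was placed on the wrong side of its letter.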
In the string_subst(self, val) function change:
for
Am I missing something with my solution?