truncate_html does not respect Unicode #35
Comments
It's going to take me a little while to describe the regex, I'm afraid, but I'll take this as a bug report and try to fix it soon. If you get to it sooner, please submit a pull request! Thanks
OK, thanks!
I have the same problem using
Oops. It seems this has been solved on master already 😃 Thanks for the hard work.
Thanks for verifying @dmfrancisco :)
Sorry @hgmnz, I should have tested this better before commenting. My tests pass for Portuguese special characters, but I tested the original string provided by @adamflorin and it seems to fail. Example:
In short, it seems the master branch fixes the issue for alphabets with special characters but not for Unicode symbols.
ahhh, thanks. Reopening this then
Aha, truncate_html filters out all the Chinese Unicode words, so this bug still exists.
Looks like it works on master, but not in the released gem?
Is that the case? There don't seem to be any changes since 0.9.2 that would do that, but it could be accidental
@hgmnz Yes, http://gurudigger.com/products/tuicool I use truncate_html to implement "More" on this page.
This is broken in version 0.9.2 of the gem.
I confirm, broken in version 0.9.2 and works for me using the master branch. What about a new 0.9.3 gem? ;)
This is particularly painful in HTML use cases (e.g. truncating stuff from TinyMCE) where random spaces are dropped because the The second space is the 2-byte Unicode character for
Using 0.9.2
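For anyone trying to reproduce the dropped-space symptom, here is a minimal sketch. It assumes the character in question is a non-breaking space (U+00A0), which the comment above does not actually confirm, and it uses a made-up two-branch tokenizer pattern rather than the gem's real regex. It only shows that Ruby's `\s` is ASCII-only, while the POSIX bracket `[[:space:]]` is Unicode-aware.

```ruby
# Assumption: the "second space" is U+00A0 (non-breaking space).
# The patterns below are hypothetical, not taken from truncate_html.
nbsp = "\u00A0"
text = "foo#{nbsp}bar"

# \s only covers ASCII whitespace, so scan silently skips the NBSP.
ascii_ws = text.scan(/\w+|\s+/).join

# [[:space:]] also matches Unicode whitespace such as U+00A0.
unicode_ws = text.scan(/\w+|[[:space:]]+/).join

puts ascii_ws.inspect    # the NBSP is gone
puts unicode_ws.inspect  # the NBSP survives
puts nbsp.bytesize       # 2 bytes in UTF-8
```

If that assumption holds, it would explain why spaces coming out of a rich-text editor vanish while ordinary ASCII spaces do not.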
I found this library that does not drop Unicode characters. https://github.com/nono/HTML-Truncator Time for a beer!
This is still an issue — emoji disappears 😢 |
I confirm, version 0.9.3 removes Euro (€) and UK Pound Sterling (£) symbols.
Hi @hgmnz,
A client is running some content with Unicode characters (namely, an up arrow) through truncate_html and noticing that those characters are disappearing.
I've narrowed it down to the scan in TruncateHtml::HtmlString. However, that's a hell of a regex to read, so I was wondering if you wouldn't mind walking me through it.
You can paste this code into an .rb file and run it to see what I mean:
The result at the command line is
Thanks!
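As a rough illustration of how a scan-based tokenizer can lose characters, here is a minimal sketch. The two patterns are hypothetical and much simpler than the regex in TruncateHtml::HtmlString; they only demonstrate the general mechanism: `String#scan` returns the matched pieces and silently skips anything no alternative matches, so symbols like ↑ or € vanish unless some branch accounts for them.

```ruby
# Hypothetical tokenizer patterns, NOT the gem's actual regex.
# Branches: an HTML tag, a run of ASCII word characters, a run of whitespace.
ASCII_ONLY = /<\/?[^>]+>|\w+|\s+/

# Same branches plus a catch-all so every remaining character becomes a token.
CATCH_ALL = /<\/?[^>]+>|\w+|\s+|./

# scan returns only what matched; unmatched characters are skipped.
def tokens(html, pattern)
  html.scan(pattern)
end

input = "<p>Price ↑ 5€</p>"

lossy = tokens(input, ASCII_ONLY).join
safe  = tokens(input, CATCH_ALL).join

puts lossy  # => <p>Price  5</p>   (arrow and euro sign dropped)
puts safe   # => <p>Price ↑ 5€</p>
```

A quick sanity check for any tokenizer of this kind is that joining the tokens reproduces the input; if it does not, some character class is missing from the pattern.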