Issue in handling Persian hashtags #10

Open
ahangarha opened this issue Mar 1, 2020 · 1 comment

Comments

@ahangarha
Contributor

It seems the code cannot handle Persian hashtags that contain a ZWNJ (zero-width non-joiner, U+200C). For example, #نرم‌افزار_آزاد gets converted into افزار_آزاد: the part of the hashtag before the ZWNJ is lost.

I investigated and found that the issue is in the preprocessor package, which does not recognize ZWNJ as a valid character in a hashtag. I have reported the issue (s/preprocessor#30) and also sent a PR to that project (s/preprocessor#31).
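To illustrate the cause, here is a minimal sketch. The pattern `#\w+` is only an assumption used for illustration, not the actual regex in preprocessor, and `remove_hashtags_naive` is a hypothetical helper. It shows how a word-character match stops at the ZWNJ, so only the first half of the hashtag gets removed:

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, common inside Persian compound words

# Hypothetical pattern for illustration only; NOT the library's real regex.
# \w matches letters, digits and '_', but not the ZWNJ format character.
NAIVE_HASHTAG = re.compile(r"#\w+")


def remove_hashtags_naive(text: str) -> str:
    """Remove whatever the naive pattern considers a hashtag."""
    return NAIVE_HASHTAG.sub("", text)


# The hashtag from this issue, built explicitly so the ZWNJ is visible.
tag = "#نرم" + ZWNJ + "افزار_آزاد"

matched = NAIVE_HASHTAG.findall(tag)   # only the part before the ZWNJ
leftover = remove_hashtags_naive(tag)  # ZWNJ plus the second half remain
```

With a pattern like this, the match ends right before the ZWNJ, which is consistent with the truncated output reported above.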

Fixing that problem upstream is good, but it will hurt this project, because the hashtag will then be removed completely. I suggest keeping hashtags in the word cloud by removing the # character manually before passing the text to preprocessor.
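A rough sketch of what I have in mind, assuming the text is a plain string at that point. The name `strip_hash_marks` and the regex are hypothetical, not code from this project. It drops only the `#` at the start of a hashtag and leaves the tag text, including any ZWNJ, untouched:

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# A '#' that starts a token (beginning of text or after whitespace)
# and is directly followed by a non-space character.
HASH_MARK = re.compile(r"(?<!\S)#(?=\S)")


def strip_hash_marks(text: str) -> str:
    """Drop the leading '#' of each hashtag but keep the tag text itself."""
    return HASH_MARK.sub("", text)


tweet = "#نرم" + ZWNJ + "افزار_آزاد is great"
cleaned = strip_hash_marks(tweet)
# 'cleaned' can then be passed to the preprocessor; since no '#' is left,
# the hashtag option should have nothing to remove.
```

A `#` in the middle of a word (for example `C#`) or a lone `#` followed by a space is left alone, so only real hashtag markers are affected.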

If you agree, I will send a PR for it.

@behrouzbakhtiari
Owner

I agree, please send it.
