I was just playing around with split_sentence and noticed this:
In [16]: split_sentence("This is a test\nAnd here's another one", "en", 25)
Out[16]: ["This is a test And here's", 'another one']
In [17]: split_sentence("This is a test.And here's another one", "en", 25)
Out[17]: ['This is a test.', "And here's another one"]
Given that I use markdown bullet points a lot, I often have lines that end with no punctuation.
What do you think about automatically replacing each newline with a period when it does not already follow a punctuation mark?
Also, there's no env variable to set the text length for the splitter, right? I think lowering it would also reduce my VRAM needs. Any opinion on this?
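To make the newline suggestion concrete, here is a minimal sketch of a pre-processing step that could run before split_sentence. The function name `terminate_lines` and the punctuation set are my own assumptions, not part of the library; it adds a period to every line that is followed by a newline and does not already end with punctuation, then joins the lines with spaces.

```python
# Hypothetical helper, not part of the library: the name and the
# punctuation set below are assumptions made for illustration.
_PUNCT = ".!?;:,"


def terminate_lines(text: str) -> str:
    """Join lines with spaces, adding a period to each line that is
    followed by a newline and does not already end with punctuation."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    out = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        # The last line has no trailing newline to replace, so leave it alone.
        if not is_last and line[-1] not in _PUNCT:
            line += "."
        out.append(line)
    return " ".join(out)
```

The result could then be passed to split_sentence in place of the raw text, so that line breaks act as sentence boundaries.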
Maybe a simple fix would be to first pass the text through pysbd instead of split_sentence, and only pass sentences that are longer than some limit to split_sentence.
I discovered pysbd through another of your repos, so I'm also curious why you used it in some places but not here.
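The pysbd-first idea could be sketched roughly as below. This is only an illustration under stated assumptions: `split_with_fallback` is a hypothetical name, and the segmenter and the long-sentence splitter are passed in as callables so the sketch does not depend on how either library is wired up in this project.

```python
from typing import Callable, List


def split_with_fallback(
    text: str,
    segment: Callable[[str], List[str]],     # e.g. a pysbd segmenter's .segment
    split_long: Callable[[str], List[str]],  # e.g. a wrapper around split_sentence
    max_len: int,
) -> List[str]:
    """Segment text into sentences first, then re-split only the
    sentences that are longer than max_len characters."""
    chunks: List[str] = []
    for sentence in segment(text):
        sentence = sentence.strip()
        if not sentence:
            continue
        if len(sentence) <= max_len:
            # Short enough: keep the sentence boundary found by the segmenter.
            chunks.append(sentence)
        else:
            # Too long: fall back to the length-based splitter.
            chunks.extend(split_long(sentence))
    return chunks
```

With pysbd installed, `segment` could be `pysbd.Segmenter(language="en", clean=False).segment`, and `split_long` could be `lambda s: split_sentence(s, "en", max_len)`, following the call shape shown in the session above.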
I did have a version that used pysbd instead, but I found no major difference, except that split_sentence was perhaps better for some languages. So why include the extra dependency? Anyway, I'm probably going to restore it after I look more deeply into this problem.