I used the following function to improve the handling of line breaks in PDF text after it has been converted to Markdown. I hope it can be considered for the next revision, thanks!
    import re

    def remove_pdf_newlines(text):
        # Convert Windows-style newlines to Unix style
        text = text.replace('\r\n', '\n')
        # Merge lines that do not end with a period, question mark, or exclamation point.
        # '\n' is also excluded in the lookbehind so that the second newline of a
        # paragraph break is not turned into a space.
        text = re.sub(r'(?<![.!?\n])\n(?=[a-zA-Z])', ' ', text)
        # Preserve newlines between paragraphs
        text = re.sub(r'\n\s*\n', '\n\n', text)
        # Remove trailing whitespace characters from lines
        text = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)
        return text.strip()
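One edge case that may be worth checking with this approach is a blank line that contains only spaces, or a paragraph break followed directly by a letter: the newline just before the next paragraph can match the merge pattern and be replaced by a space. The following is only a sketch of one possible variant, not a tested replacement; the name `merge_soft_line_breaks` and the sample string are hypothetical. It strips trailing whitespace and normalises blank lines first, then merges wrapped lines.

```python
import re


def merge_soft_line_breaks(text):
    """Hypothetical variant: join hard-wrapped lines but keep paragraph breaks."""
    # Normalise Windows-style newlines to Unix style.
    text = text.replace('\r\n', '\n')
    # Strip trailing spaces/tabs first so whitespace-only lines become empty.
    text = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)
    # Collapse any run of blank lines into a single paragraph break.
    text = re.sub(r'\n{2,}', '\n\n', text)
    # Join a line break only when it does not follow sentence punctuation
    # or another newline, and the next line starts with a letter.
    text = re.sub(r'(?<![.!?\n])\n(?=[a-zA-Z])', ' ', text)
    return text.strip()


# Made-up sample: one wrapped sentence, a whitespace-only blank line,
# then a second wrapped paragraph.
sample = (
    "This sentence was wrapped\nby the PDF extractor.\r\n \r\n"
    "Second paragraph\nstarts here."
)
print(merge_soft_line_breaks(sample))
```

With this ordering, the two paragraphs in the sample stay separated by one blank line while the wrapped lines inside each paragraph are joined. Lines that end in a period, question mark, or exclamation point are still left unmerged, as in the original function.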