Create a Tool to Automatically Insert Semantic Breaks for Overflowing Lines #1

Open
mattt opened this issue Mar 1, 2019 · 10 comments


@mattt
Member

mattt commented Mar 1, 2019

Without tooling, a specification like this one can only be descriptive, like a style guide. As a future enhancement, it would be nice to build a tool that automatically inserts line breaks at semantic boundaries for lines that extend beyond a prescribed width (e.g. 80 columns). Like a cross between prettier and fold.

@waldyrious

Some years ago I created a very crude proof-of-concept to do precisely this. It can be tested here: https://waldyrious.github.io/semantic-linebreaker/, hosted directly from the code in https://github.com/waldyrious/semantic-linebreaker/.

It is very primitive, as the README (and the code) attest, and there are open issues to handle, but any help would be greatly appreciated and quickly merged.

@waldyrious

(Btw, I'd be happy to move the repo to this organization, if that's desired.)

@mattt
Member Author

mattt commented Mar 8, 2019

Thanks so much for sharing that, @waldyrious!

I think regular expressions offer a quick and clever approximation of the kind of line-breaking behavior we're looking for. However, I don't think it's feasible to consistently express semantic boundaries with them.
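To make the trade-off concrete, here is a minimal sketch of the regex approach in Python. It is not the code from semantic-linebreaker; the function name `break_line` and both patterns are assumptions chosen for illustration. It breaks overflowing lines after sentence-ending punctuation and falls back to clause punctuation when a sentence is still too long.

```python
import re

# Hypothetical patterns: whitespace that follows sentence or clause punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
CLAUSE_END = re.compile(r"(?<=[,;:])\s+")


def break_line(line: str, width: int = 80) -> list[str]:
    """Split an overflowing line at punctuation; leave short lines untouched."""
    if len(line) <= width:
        return [line]
    result = []
    for sentence in SENTENCE_END.split(line):
        if len(sentence) <= width:
            result.append(sentence)
        else:
            # Sentence still overflows: fall back to clause boundaries.
            result.extend(CLAUSE_END.split(sentence))
    return result
```

The limits show up quickly: an abbreviation such as "e.g." is treated as a sentence end, and a long clause without punctuation is never broken at all.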

My current thinking is that a complete solution would probably have to apply an algorithm like Knuth-Plass or Wadler ("prettier printer"), feeding in tokens from a linguistic syntax tree.
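As a rough illustration of that idea, the sketch below uses a small dynamic program that picks the set of breaks with the lowest total penalty while keeping every line within the width. It is a much-simplified stand-in in the spirit of Knuth-Plass, not the actual algorithm, and the punctuation-based `break_penalty` is a hypothetical placeholder for penalties that would really come from a linguistic syntax tree.

```python
import math


def break_penalty(word: str) -> int:
    """Hypothetical cost of breaking after `word` (lower is better)."""
    if word.endswith((".", "!", "?")):
        return 1
    if word.endswith((",", ";", ":")):
        return 5
    return 50


def semantic_wrap(text: str, width: int = 80) -> list[str]:
    """Choose breaks that minimize total penalty with each line <= width."""
    words = text.split()
    n = len(words)
    # best[i]: minimal cost to lay out words[i:]; nxt[i]: end of the line starting at i.
    best = [math.inf] * (n + 1)
    nxt = [n] * (n + 1)
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            # A single overlong word is allowed; anything else must fit.
            if length > width and j > i + 1:
                break
            cost = best[j] + (0 if j == n else break_penalty(words[j - 1]))
            if cost < best[i]:
                best[i], nxt[i] = cost, j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

Unlike a pure regex pass, this always produces lines that fit (apart from single overlong words), and it only breaks mid-clause when no cheaper boundary is available.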

@SilasK

SilasK commented Oct 9, 2023

Are there any updates on this?

@SilasK

SilasK commented Nov 16, 2023

Here are some options:

  • Readable: A promising tool, but I don't like the included formatter, as it distorts other elements in my Markdown.

  • Obsidian Sembr (archived): Another tool that offers semantic line break support for the Obsidian note-taking app.

@silopolis

That readable project indeed looks nice and promising!
Thanks for sharing 🙏

@waldyrious

> That readable project indeed looks nice and promising!

Agreed! For reference, the actual implementation is here.

@SilasK

SilasK commented Nov 24, 2023

Does someone know enough TypeScript to deactivate the rest of the formatter? See bobheadxi/readable#30.

@admk

admk commented Dec 4, 2023

This is an itch that hasn't been scratched for a long time!

Semantic line breaking is by nature an NLP problem, so I fine-tuned BERT models as token classifiers to predict line breaks (and, surprise 😮, indent levels) on my text, and it works reasonably well. I have thus created a tool that uses these models to insert breaks automatically in your text. CUDA (Linux / Windows) and MPS (Mac) acceleration are supported. Currently it works well for LaTeX and plain text; other markup languages are not tested.
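For readers unfamiliar with token classification, the sketch below shows only the last step: turning per-token predictions into broken, indented text. It does not use the sembr API or its real label scheme. The function `apply_break_labels` and its labels (`None` for "no break", an integer for "start a new line at this indent level") are assumptions made up for illustration, and the labels are written by hand rather than produced by a model.

```python
from typing import Optional


def apply_break_labels(
    tokens: list[str],
    labels: list[Optional[int]],
    indent: str = "    ",
) -> str:
    """Rebuild text from tokens and hypothetical per-token break labels.

    labels[i] is None when token i continues the current line, or an
    indent level (0, 1, ...) when token i starts a new line.
    """
    if len(tokens) != len(labels):
        raise ValueError("need exactly one label per token")
    lines: list[str] = []
    for token, label in zip(tokens, labels):
        if label is None and lines:
            lines[-1] += " " + token
        else:
            # The first token always opens a line, even if its label is None.
            lines.append(indent * (label or 0) + token)
    return "\n".join(lines)


tokens = ["Semantic", "breaks", "help,", "because", "diffs", "stay", "small."]
labels = [0, None, None, 1, None, None, None]  # hand-written, not model output
print(apply_break_labels(tokens, labels))
```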

The fine-tuned models can be found here on Hugging Face.

Suggestions for improvements and contributions to features, models, or datasets are all welcome! Feel free to explore and contribute to the project: https://github.com/admk/sembr.

@silopolis

That's playing in another league!
Very, very interesting project indeed... With the amount of lightweight markup content produced these days, support for Markdown, AsciiDoc and reStructuredText would surely be fantastic!

Awesome work 🤩
