Create a Tool to Automatically Insert Semantic Breaks for Overflowing Lines #1
Comments
Some years ago I created a very crude proof-of-concept to do precisely this. It can be tested here: https://waldyrious.github.io/semantic-linebreaker/, hosted directly from the code in https://github.com/waldyrious/semantic-linebreaker/. It is very primitive, as the README (and the code) attest, and there are issues to be handled, but any help would be greatly appreciated and quickly merged.
(Btw, I'd be happy to move the repo to this organization, if that's desired.)
Thanks so much for sharing that, @waldyrious! I think regular expressions offer a quick and clever approximation of the kind of line-breaking behavior we're looking for. However, I don't think it's feasible to consistently express semantic boundaries with them. My current thinking is that a complete solution would probably have to apply an algorithm like Knuth-Plass or Wadler ("prettier printer"), feeding in tokens from a linguistic syntax tree.
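To make the idea of cost-based breaking concrete, here is a minimal sketch of a least-cost line breaker. It is only loosely in the spirit of Knuth-Plass, not the algorithm itself, and it does not use a linguistic syntax tree: the `break_penalty` function and its penalty values are hypothetical stand-ins that guess boundary strength from trailing punctuation.

```python
import re


def break_penalty(word: str) -> int:
    """Hypothetical cost of breaking after `word` (lower is better)."""
    if re.search(r"[.!?][\"')\]]*$", word):
        return 0   # sentence end: the preferred place to break
    if re.search(r"[,;:][\"')\]]*$", word):
        return 2   # clause boundary: acceptable
    return 20      # mid-phrase: avoid unless nothing else fits


def semantic_wrap(text: str, width: int = 80) -> list[str]:
    """Choose the set of breaks with the lowest total cost for the paragraph."""
    words = text.split()
    n = len(words)
    # best[i]: minimal cost of laying out words[i:]
    # nxt[i]:  index one past the last word of the first line of that layout
    best = [float("inf")] * (n + 1)
    nxt = [n] * (n + 1)
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            if length > width and j > i + 1:
                break  # line too long (a single overlong word is tolerated)
            # Every line costs 1, so text that already fits is left alone;
            # the last line ends without a break and pays no penalty.
            cost = 1 + (0 if j == n else break_penalty(words[j - 1]))
            if cost + best[j] < best[i]:
                best[i], nxt[i] = cost + best[j], j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

A real tool would replace `break_penalty` with scores derived from a parser or a trained model; the dynamic programme that picks the cheapest combination of breaks would stay much the same.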
Are there any updates on this?
Here are some options:
That readable project indeed looks nice and promising!
Agreed! For reference, the actual implementation is here.
Does someone have enough TypeScript to deactivate the rest of the formatter? bobheadxi/readable#30
This is an itch that hasn't been scratched for a long time! Semantic line breaking is by nature an NLP problem, so I fine-tuned BERT models as token classifiers to predict line breaks (and, surprise 😮, indent levels) on my text, and it works reasonably well. I have thus created a tool that uses these models to insert breaks automatically in your text. CUDA (Linux / Windows) or MPS (Mac) acceleration is supported. Currently it works well for LaTeX and plain text; other markup languages are not tested. The fine-tuned models can be found here on Hugging Face. Suggestions for improvements and contributions to features, models, or datasets are all welcome! Feel free to explore and contribute to the project: https://github.com/admk/sembr.
That's playing in another league! Awesome work 🤩
Without tooling, a specification like this one can only be descriptive, like a style guide. As a future enhancement, it would be nice to build a tool that automatically inserts line breaks at semantic boundaries for lines that extend beyond a prescribed width (e.g. 80 columns). Like a cross between `prettier` and `fold`.
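As a rough illustration of the behavior described above, the following sketch leaves lines within the width untouched and greedily splits only overflowing lines at the strongest boundary that fits. The `BOUNDARIES` patterns and the `break_overflowing` name are assumptions made for this example, and punctuation is a crude proxy for semantic boundaries: an abbreviation such as "e.g." would be mistaken for a sentence end.

```python
import re

# Candidate break points, strongest first (hypothetical heuristics):
# after sentence-ending punctuation, after clause punctuation, at any space.
BOUNDARIES = [
    re.compile(r"(?<=[.!?])\s+"),
    re.compile(r"(?<=[,;:])\s+"),
    re.compile(r"\s+"),
]


def break_overflowing(text: str, width: int = 80) -> str:
    """Split only the lines longer than `width`, at the best boundary that fits."""
    out = []
    for line in text.splitlines():
        while len(line) > width:
            cut = None
            for pattern in BOUNDARIES:
                # Boundaries whose left-hand part fits within the width
                # and that leave some text for the next line.
                fits = [m for m in pattern.finditer(line)
                        if 0 < m.start() <= width and m.end() < len(line)]
                if fits:
                    cut = fits[-1]  # take the last one that fits
                    break
            if cut is None:
                break  # no usable boundary: leave the line overlong
            out.append(line[:cut.start()])
            line = line[cut.end():]
        out.append(line)
    return "\n".join(out)
```

Unlike `fold`, this never cuts inside a word, and unlike a full formatter it does not reflow lines that already fit, so existing manual breaks are preserved.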