feat: support automatic language code determination with guesslang #319

Open: lh0x00 wants to merge 3 commits into main


@lh0x00 lh0x00 commented Feb 10, 2025

Pull Request Description

Summary

This PR adds one feature to _CustomMarkdownify:

  1. Automatic Code Language Detection:
    • Uses guesslang to detect programming languages in code snippets.
    • Provides a fallback if guesslang is not installed.

Changes

  • Imported Guess from guesslang with a fallback.
  • Added code_language_callback to detect code languages.
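
The two changes above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: it assumes guesslang's `Guess().language_name()` call and markdownify's `code_language_callback` hook, which receives the `<pre>` element. `FakePre` is a hypothetical stand-in used only so the snippet runs without BeautifulSoup, and returning an empty string as the fallback is an assumption.

```python
# Optional dependency: fall back gracefully when guesslang is unavailable.
try:
    from guesslang import Guess

    _guesser = Guess()
except Exception:  # ImportError, or a failure while loading the model
    _guesser = None


def code_language_callback(el):
    """Return a fence label for a <pre> element, or "" when unknown."""
    if _guesser is None:
        return ""  # assumed fallback: emit an unlabeled code fence
    code = el.get_text()
    if not code.strip():
        return ""
    name = _guesser.language_name(code)  # e.g. "Python", or None
    return name.lower() if name else ""


class FakePre:
    """Hypothetical stand-in for the <pre> tag that markdownify passes in."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


label = code_language_callback(FakePre("def f(x):\n    return x + 1\n"))
print(repr(label))
```

With guesslang missing, the callback returns an empty string and the converter would emit a plain fence, so the dependency stays optional.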

How to Test

  1. Install guesslang via pip install guesslang.
  2. Convert a document with code snippets.
  3. Verify correct language detection.

Thank you for reviewing!

@afourney
Member

afourney commented Mar 8, 2025

Love this idea, but I'm thinking of using magika for type detection (See #1108). Since magika can do code language detection as well, would you consider updating this PR to use it instead (to avoid another dependency)?

You should be able to just do:

from magika import Magika
m = Magika()
res = m.identify_bytes(b".... contents of the code block ...")
if res.status == "ok" and res.prediction.output.group in ["text", "code"]:
    language = res.prediction.output.label

Or something like that.
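
As a rough sketch of how that suggestion might be wrapped for use on a single code block: the attribute names (`status`, `prediction.output.group`, `prediction.output.label`) are taken from the comment above and may differ between magika versions. The helper name `detect_language` and the empty-string fallback are assumptions made for illustration.

```python
# Optional dependency: keep working when magika is not installed.
try:
    from magika import Magika

    _magika = Magika()
except Exception:  # ImportError, or a failure while loading the model
    _magika = None


def detect_language(code: str) -> str:
    """Return a language label for a code snippet, or "" when unknown."""
    if _magika is None or not code.strip():
        return ""
    res = _magika.identify_bytes(code.encode("utf-8"))
    try:
        # Attribute names follow the comment above (an assumption here).
        output = res.prediction.output
        if res.status == "ok" and output.group in ("text", "code"):
            return str(output.label)
    except AttributeError:
        pass  # different magika version: treat as "unknown"
    return ""


print(repr(detect_language("print('hello')\n")))
```

Short snippets may be labeled as generic text rather than a specific language, so callers would likely still want an unlabeled fence as the default.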

@lh0x00 lh0x00 force-pushed the feature/GuessLanguageCode branch from ff902b1 to 65b3f4a Compare March 10, 2025 03:29
@lh0x00 lh0x00 changed the base branch from main to magika March 10, 2025 03:35
@lh0x00 lh0x00 changed the base branch from magika to main March 10, 2025 03:36
@lh0x00
Author

lh0x00 commented Mar 10, 2025

I rebased it onto your magika branch and updated it as requested. It should be merged after the magika branch. Thanks!
