Markdown LLM Corrector

This tool processes markdown files and corrects errors and inconsistencies using a large language model (LLM).

The tool is still being refined, but it already produces reasonably usable results, as the Samples section below demonstrates.

LLM Model

The current version uses the Llama 2 70B chat model through IBM watsonx. The tool is built with LangChain, whose abstraction over model providers should make it easy to target other LLMs in the future.
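To illustrate what targeting another LLM might look like, the following minimal sketch keeps the correction logic independent of the model backend. It is not the tool's LangChain code: `correct_markdown`, `PROMPT_TEMPLATE`, and `fake_llm` are hypothetical names, and the backend is simply any function from prompt to reply, so the sketch runs without LangChain or watsonx credentials.

```python
from typing import Callable

# Hypothetical backend type: any function that takes a prompt and returns the reply.
# In the real tool a LangChain LLM wrapper plays this role; a plain callable keeps
# the sketch runnable without LangChain or IBM Cloud credentials.
LLMCallable = Callable[[str], str]

# Hypothetical prompt template, not the one used by the tool.
PROMPT_TEMPLATE = (
    "Correct spelling, grammar and inconsistencies in the following markdown. "
    "Return only the corrected markdown.\n\n{text}"
)


def correct_markdown(text: str, llm: LLMCallable) -> str:
    """Send markdown to whichever backend is plugged in and return its cleaned reply."""
    prompt = PROMPT_TEMPLATE.format(text=text)
    return llm(prompt).strip()


def fake_llm(prompt: str) -> str:
    """Stand-in backend for testing: returns the markdown part with one typo fixed."""
    body = prompt.split("\n\n", 1)[1]
    return body.replace("teh", "the")
```

Swapping models then means passing a different callable; the correction logic itself does not change.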

Samples

The samples directory contains some original markdown files alongside their automated corrections.

Usage

Prerequisites

Ensure that a valid IBM Cloud API key and watsonx project ID are set as environment variables:

IBM_CLOUD_API_KEY: your IBM Cloud API key.
PROJECT_ID: your Watson Machine Learning project ID.
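A minimal Python sketch of a pre-flight check for these two variables is shown below. The helper name `missing_env_vars` is hypothetical; only the variable names come from this README.

```python
import os

# Variable names taken from the Prerequisites section.
REQUIRED_ENV_VARS = ("IBM_CLOUD_API_KEY", "PROJECT_ID")


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before a run and stopping if the list is non-empty gives a clearer error than a failed API call.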

Ensure that markdownlint is installed on the machine running this tool. Before the content is passed to the LLM, markdownlint is used to correct the markdown structure.
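The markdownlint step could be invoked from Python roughly as follows. This sketch assumes the Node-based markdownlint-cli, whose `--fix` flag rewrites fixable issues in place; the README does not say which markdownlint variant or options the tool actually uses, and the helper names are hypothetical.

```python
import shutil
import subprocess


def markdownlint_fix_command(path: str) -> list[str]:
    """Build the command that auto-fixes one file in place (markdownlint-cli's --fix)."""
    return ["markdownlint", "--fix", path]


def run_markdownlint_fix(path: str) -> int:
    """Run markdownlint on one file and return its exit code.

    A non-zero code means violations remain after fixing or the run failed.
    """
    if shutil.which("markdownlint") is None:
        raise FileNotFoundError("markdownlint is not installed or not on PATH")
    result = subprocess.run(markdownlint_fix_command(path), capture_output=True, text=True)
    return result.returncode
```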

Clone the Repo

Clone the markdown-llm-corrector repo locally. In a terminal, run:

git clone https://github.com/vburckhardt/markdown-llm-corrector.git

Install Python Dependencies

From the root of the cloned repository, run the following command:

pip install -r ./requirements.txt

Run the Tool

The tool can process a single markdown file, a directory of markdown files, or a GitHub repository.

Run with a Single Markdown File

python main.py --input_file path/to/your/markdown_file.md

Run with a Directory of Markdown Files

python main.py --input_dir path/to/directory/containing/markdown_files
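The two modes above differ mainly in how the list of files to process is built. A minimal sketch with the hypothetical helper `collect_markdown_files`, assuming `--input_dir` also searches subdirectories (the README does not say whether it does):

```python
from pathlib import Path


def collect_markdown_files(input_file=None, input_dir=None):
    """Return the markdown files to correct, sorted for a stable processing order."""
    if (input_file is None) == (input_dir is None):
        raise ValueError("Provide exactly one of input_file or input_dir")
    if input_file is not None:
        return [Path(input_file)]
    # rglob also walks subdirectories; whether the tool recurses is an assumption.
    return sorted(p for p in Path(input_dir).rglob("*.md") if p.is_file())
```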

GitHub Repository

You can specify a GitHub repository using the --repo_org and --repo_name flags:

--repo_org: Specify the GitHub organization of the repository.
--repo_name: Specify the name of the GitHub repository.

The tool then opens a pull request with the corrected markdown files.

python main.py --repo_org ORG_NAME --repo_name REPO_NAME

Options

--repo_org: Specify the GitHub organization of the repository.
--repo_name: Specify the GitHub repository name.
--input_file: Provide a path to a single input markdown file.
--input_dir: Provide a path to a directory containing markdown files.
--working_dir: Set a working directory for operations. Defaults to a randomly generated directory name.
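The options above could be declared with Python's argparse roughly as follows. This is a hypothetical sketch, not the tool's main.py: the rule that exactly one input source is given, the requirement that --repo_org and --repo_name appear together, and the `workdir-` random naming scheme are all assumptions.

```python
import argparse
import uuid


def parse_args(argv=None):
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="Correct markdown files with an LLM.")
    parser.add_argument("--repo_org", help="GitHub organization of the repository.")
    parser.add_argument("--repo_name", help="Name of the GitHub repository.")
    parser.add_argument("--input_file", help="Path to a single markdown file.")
    parser.add_argument("--input_dir", help="Path to a directory of markdown files.")
    parser.add_argument("--working_dir", help="Working directory for operations.")
    args = parser.parse_args(argv)

    if args.working_dir is None:
        # Assumption: a short random hex name; the real naming scheme is not documented.
        args.working_dir = "workdir-" + uuid.uuid4().hex[:8]

    # Assumption: the repository flags only make sense as a pair.
    if (args.repo_org is None) != (args.repo_name is None):
        parser.error("--repo_org and --repo_name must be given together")

    # Assumption: exactly one input source per run.
    sources = [args.input_file, args.input_dir, args.repo_org]
    if sum(s is not None for s in sources) != 1:
        parser.error("choose one of --input_file, --input_dir, or --repo_org/--repo_name")
    return args
```

`parser.error` prints the message and exits with status 2, which is argparse's standard behavior for usage errors.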

Description

This tool uses an LLM hosted on IBM watsonx to correct markdown files. It can correct a single file or a directory of files. When run against a GitHub repository, it can also open a pull request with the corrected files.

Contributing

Contributions are welcome! Please open a pull request with your proposed changes.

License

This project is licensed under the Apache License 2.0.
