Dr. Doc

Dr. Doc is currently a toy project, but a useful one: it uses large language models to improve documentation files by identifying and correcting grammar mistakes, formatting errors, and broken links. It currently relies on the Argo API, which gives Argonne researchers access to OpenAI models. Future updates will add support for other models, structured output to simplify prompts, and GitHub and GitLab actions for continuous integration.

Note that the version of gpt4o currently served by Argo is limited to 4096 output tokens, so the largest files you can process with Argo/gpt4o are about 15 KB.
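The 15 KB figure follows from a rough rule of thumb: if English text averages about 4 characters per token (a common heuristic, not an Argo-specific number), then 4096 tokens × 4 characters ≈ 16 KB. The sketch below applies that heuristic to check a file before processing it. The constant values come from the note above, but the helper names are hypothetical and not part of Dr. Doc.

```python
import math
from pathlib import Path

# Assumption: English prose averages roughly 4 characters per token.
CHARS_PER_TOKEN = 4
# Output-token limit of the gpt4o version served by Argo (see the note above).
MAX_OUTPUT_TOKENS = 4096


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens `text` will use."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def likely_fits(path: str, max_tokens: int = MAX_OUTPUT_TOKENS) -> bool:
    """Return True if a corrected copy of the file, assumed to be about as
    long as the original, should fit within the output-token limit."""
    text = Path(path).read_text(encoding="utf-8")
    return estimate_tokens(text) <= max_tokens
```

If the explanation of the changes comes back in the same response, it also uses output tokens. Leaving some margin below this estimate is therefore a reasonable precaution.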

Features

Fixes grammar, formatting, and link issues in documentation files.
Supports Markdown (.md), reStructuredText (.rst), and plain text (.txt) formats.
Provides a detailed explanation of the changes made to the documentation.
Optional Git integration to commit changes directly.

Requirements

Argo API credentials (ARGO_URL and ARGO_USER must be defined in the environment)
Python 3.8 or higher
requests>=2.25.0

Setup

Clone the repository and navigate to the project directory:
```
git clone <repository-url>
cd drdoc
```

Define the required environment variables for the Argo API:

```
export ARGO_URL=<your-argo-url>
export ARGO_USER=<your-argo-user>
```
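To confirm that both variables are visible to Python before a run, you can use a small check like the sketch below. It follows the precedence described under Command Line Options: an explicit --argo_url or --argo_user value is used first, and the environment variable is the fallback. The helper name is hypothetical and not part of Dr. Doc.

```python
import os
from typing import Mapping, Optional


def resolve_setting(cli_value: Optional[str], env_name: str,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the command-line value if given, else the environment variable.

    Raises RuntimeError naming the variable and the matching option
    (e.g. ARGO_URL / --argo_url) when neither is set.
    """
    if cli_value:
        return cli_value
    value = env.get(env_name)
    if not value:
        raise RuntimeError(
            f"{env_name} is not set; export it or pass --{env_name.lower()}"
        )
    return value


if __name__ == "__main__":
    # Fails loudly if the exports above were skipped.
    print(resolve_setting(None, "ARGO_URL"))
    print(resolve_setting(None, "ARGO_USER"))
```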

(Optional) Install the package:
```
pip install -e .
```

Usage

If you installed Dr. Doc with pip as described above, run it with the drdoc command (drdoc -h shows the help menu):

```
drdoc <doc_path> [options]
```

If you did not install it, run the Python script directly:

```
python <path_to_drdoc>/drdoc.py <doc_path> [options]
```

Command Line Options

doc_path: (Required) Path to the documentation file or directory containing files to process.
--argo_url: (Optional) Argo API endpoint URL (default: value of ARGO_URL environment variable).
--argo_user: (Optional) Argo API user (default: value of ARGO_USER environment variable).
--model: (Optional) Model to use (e.g., gpt4o, gpt35; default: gpt4o).
--temperature: (Optional) Sampling temperature for the model (default: 0.1).
--top_p: (Optional) Top-p sampling for the model (default: 0.9).
--max_tokens: (Optional) Max tokens for the prompt (default: 4096).
--max_completion_tokens: (Optional) Max tokens for the completion (default: 16000).
--inplace: (Optional) Modify the original file in place instead of creating a new one.
--commit: (Optional) Commit changes to Git with the explanation as the commit message.
--format: (Optional) Format of the documentation file (md, rst, or txt; default: md).
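For reference, these options map naturally onto Python's argparse. The parser below is reconstructed only from the list above and is not Dr. Doc's actual source. Option names and defaults come from the descriptions, while details such as `choices` for --format and the help strings are assumptions.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented drdoc options."""
    p = argparse.ArgumentParser(prog="drdoc")
    p.add_argument("doc_path", help="File or directory to process")
    # Environment variables supply the defaults, as documented.
    p.add_argument("--argo_url", default=os.environ.get("ARGO_URL"))
    p.add_argument("--argo_user", default=os.environ.get("ARGO_USER"))
    p.add_argument("--model", default="gpt4o", help="e.g. gpt4o or gpt35")
    p.add_argument("--temperature", type=float, default=0.1)
    p.add_argument("--top_p", type=float, default=0.9)
    p.add_argument("--max_tokens", type=int, default=4096)
    p.add_argument("--max_completion_tokens", type=int, default=16000)
    # Boolean switches: present means True.
    p.add_argument("--inplace", action="store_true")
    p.add_argument("--commit", action="store_true")
    p.add_argument("--format", choices=["md", "rst", "txt"], default="md")
    return p


if __name__ == "__main__":
    print(build_parser().parse_args(["doc/", "--format", "rst"]))
```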

Example Commands

Process a Markdown file:

```
drdoc doc/sample.md
```

This creates doc/sample_fixed.md and leaves doc/sample.md unchanged.
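The naming convention in this example can be expressed with pathlib. The sketch below assumes that `_fixed` is always inserted before the extension, as in doc/sample.md → doc/sample_fixed.md, and that --inplace writes back to the original path. The helper name is hypothetical.

```python
from pathlib import Path


def output_path(doc_path: str, inplace: bool = False) -> Path:
    """Return where the corrected file would be written.

    With inplace=True the original file is overwritten; otherwise
    "_fixed" is appended to the file stem and the extension is kept.
    """
    src = Path(doc_path)
    if inplace:
        return src
    return src.with_name(f"{src.stem}_fixed{src.suffix}")
```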

Process all reStructuredText documentation files (`*.rst` files) in the `doc` directory:

```
drdoc doc/ --format rst
```
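When doc_path is a directory, the files to process are selected by the --format extension. The sketch below shows one way to do this. It assumes a non-recursive search, because the README does not say whether subdirectories are included (`Path.rglob` would give a recursive walk). The helper name is hypothetical.

```python
from pathlib import Path
from typing import List


def collect_docs(doc_path: str, fmt: str = "md") -> List[Path]:
    """Return the files to process for a given path and --format value."""
    root = Path(doc_path)
    if root.is_file():
        # A single file is processed as-is, whatever its extension.
        return [root]
    # Directory: only top-level files matching the format, e.g. *.rst.
    return sorted(p for p in root.glob(f"*.{fmt}") if p.is_file())
```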

Process a file and modify it in place:

```
drdoc doc/sample.md --inplace
```

Process a file in place and commit the changes (run this from inside the Git repository):

```
cd <your_git_repo>
drdoc README.md --inplace --commit
```
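According to the option list, --commit uses the explanation of the changes as the commit message. The Git side of that step can be sketched with subprocess as shown below. This is an illustration only, not Dr. Doc's implementation, and the helper names are hypothetical. Passing the path to git commit limits the commit to that file, so any unrelated staged changes are not included.

```python
import subprocess
from typing import List


def git_commit_commands(path: str, message: str) -> List[List[str]]:
    """Build the git commands that stage `path` and commit only that file."""
    return [
        ["git", "add", "--", path],
        ["git", "commit", "-m", message, "--", path],
    ]


def commit_fix(path: str, message: str) -> None:
    """Run the commands; must be called from inside the Git repository."""
    for cmd in git_commit_commands(path, message):
        # check=True raises CalledProcessError if git reports a failure.
        subprocess.run(cmd, check=True)
```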

TODO

Add support for LangChain to use other models.
Optionally ask for confirmation for each change.
Enable using ALCF inference endpoints.
Add GitHub and GitLab actions to process documentation files for CI.
Improve the prompts and the user experience based on feedback.

Contributing

We welcome contributions to improve Dr. Doc! Please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
