diff --git a/README.md b/README.md
index 05cb2ee..048d239 100644
--- a/README.md
+++ b/README.md
@@ -11,12 +11,6 @@
 ## Overview
 Loki is our open-source solution designed to automate the process of verifying factuality. It provides a comprehensive pipeline for dissecting long texts into individual claims, assessing their worthiness for verification, generating queries for evidence search, crawling for evidence, and ultimately verifying the claims. This tool is especially useful for journalists, researchers, and anyone interested in the factuality of information. To stay updated, please subscribe to our newsletter at [our website](https://www.librai.tech/) or join us on [Discord](https://discord.gg/NRge6RS7)!
-## Components
-- **Decomposer:** Breaks down extensive texts into digestible, independent claims, setting the stage for detailed analysis.
-- **Checkworthy:** Assesses each claim's potential significance, filtering out vague or ambiguous statements to focus on those that truly matter. For example, vague claims like "MBZUAI has a vast campus" are considered unworthy because of the ambiguous nature of "vast."
-- **Query Generator:** Transforms check-worthy claims into precise queries, ready to navigate the vast expanse of the internet in search of truth.
-- **Evidence Crawler:** Ventures into the digital realm, retrieving relevant evidence that forms the foundation of informed verification.
-- **ClaimVerify:** Examines the gathered evidence, determining the veracity of each claim to uphold the integrity of information.
 ## Quick Start
@@ -49,42 +43,19 @@ You can choose to export essential api key to the environment
 ```bash
 export SERPER_API_KEY=... # this is required in evidence retrieval if serper being used
 export OPENAI_API_KEY=... # this is required in all tasks
-export ANTHROPIC_API_KEY=... # this is required only if you want to replace openai with anthropic
-export LOCAL_API_KEY=... # this is required only if you want to use local LLM
-export LOCAL_API_URL=... # this is required only if you want to use local LLM
 ```
-Alternatively, you can save the api information in a yaml file with the same key names as the environment variables and pass the path to the yaml file as an argument to the `check_response` method.
-
-See `demo_data\api_config.yaml` as an example of the api configuration file.
-- Example: Pass the path to the api configuration file
-```bash
-python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --api_config demo_data/api_config.yaml
-```
-
-### Test
+Alternatively, you configure API keys via a YAML file, see [user guide](docs/user_guide.md) for more details.
+A sample test case:

-To test the project, you can run the `factcheck.py` script:
-```bash
-# String
-python -m factcheck --modal string --input "MBZUAI is the first AI university in the world"
-# Text
-python -m factcheck --modal text --input demo_data/text.txt
-# Speech
-python -m factcheck --modal speech --input demo_data/speech.mp3
-# Image
-python -m factcheck --modal image --input demo_data/image.webp
-# Video
-python -m factcheck --modal video --input demo_data/video.m4v
-```
-
 ## Usage
-The main interface of the Fact-check Pipeline is located in `factcheck/core/FactCheck.py`, which contains the `check_response` method. This method integrates the complete pipeline, where each functionality is encapsulated in its class as described in the Features section.
+The main interface of Loki fact-checker located in `factcheck/__init__.py`, which contains the `check_response` method. This method integrates the complete fact verification pipeline, where each functionality is encapsulated in its class as described in the Features section.
+
+#### Used as a Library
-Example usage:
 ```python
 from factcheck import FactCheck
@@ -98,35 +69,29 @@ results = factcheck_instance.check_response(text)
 print(results)
 ```
-Web app usage:
+#### Used as a Web App
 ```bash
 python webapp.py --api_config demo_data/api_config.yaml
 ```
-

-

-
-
-## Customize Your Experience
-
-### Custom Models
-```bash
-python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --api_config demo_data/api_config.yaml --model claude-3-opus-20240229 --prompt claude_prompt
-```
-
-### Custom Evidence Retrieval
-```bash
-python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --api_config demo_data/test_api_config.yaml --retriever google
-```
+#### Multimodal Usage
-### Custom Prompts
 ```bash
-python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --api_config demo_data/test_api_config.yaml --prompt demo_data/sample_prompt.yaml
+# String
+python -m factcheck --modal string --input "MBZUAI is the first AI university in the world"
+# Text
+python -m factcheck --modal text --input demo_data/text.txt
+# Speech
+python -m factcheck --modal speech --input demo_data/speech.mp3
+# Image
+python -m factcheck --modal image --input demo_data/image.webp
+# Video
+python -m factcheck --modal video --input demo_data/video.m4v
 ```
-## Contributing to Loki
-Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. To get started, please refer to our [Contribution Guidelines](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/CONTRIBUTING.md).
+#### Customize Your Experience
+For advanced usage, please see our [user guide](docs/user_guide.md).
 ## Ready for More?
@@ -166,27 +131,29 @@ Your support enables us to:
 [TRY NOW!](https://aip.librai.tech/login)
-## Stay Connected and Informed
-
-
-Don’t miss out on the latest updates, feature releases, and community insights! We invite you to subscribe to our newsletter and become a part of our growing community.
-
-💌 Subscribe now at [our website](https://www.librai.tech/)!
-
+### Contributing to Loki project
-## License
-This project is licensed under the [MIT license](LICENSE.md) - see the LICENSE file for details.
+Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. To get started, please refer to our [Contribution Guidelines](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/CONTRIBUTING.md).
-## Acknowledgments
+### Acknowledgments
 - Special thanks to all contributors who have helped in shaping this project.
+
+### Stay Connected and Informed
+
+Don’t miss out on the latest updates, feature releases, and community insights! We invite you to subscribe to our newsletter and become a part of our growing community.
+
+💌 Subscribe now at [our website](https://www.librai.tech/)!
+
+
+
 ## Star History
-[![Star History Chart](https://api.star-history.com/svg?repos=Libr-AI/OpenFactVerification&type=Date)](https://star-history.com/#Libr-AI/OpenFactVerification&Date)
+> [![Star History Chart](https://api.star-history.com/svg?repos=Libr-AI/OpenFactVerification&type=Date)](https://star-history.com/#Libr-AI/OpenFactVerification&Date)
 ## Cite as
 ```
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
deleted file mode 100644
index 7c5c5a0..0000000
--- a/docs/CONTRIBUTING.md
+++ /dev/null
@@ -1,42 +0,0 @@
-# Contribute to Loki
-
-Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. This document outlines the process for contributing to the project.
-
-## How to Contribute
-
-We recommend a few best practices to make your contributions or reported errors easier to assist with.
-
-### For Pull Requests
-
-* PRs should be titled descriptively, and be opened with a brief description of the scope and intent of the new contribution.
-* New features should have appropriate documentation added alongside them.
-* Aim for code maintainability, and minimize code copying.
-
-### For Feature Requests
-
-* Provide a short paragraph's worth of description. What is the feature you are requesting? What is its motivation, and an example use case of it? How does this differ from what is currently supported?
-
-### For Bug Reports
-
-* Provide a short description of the bug.
-* Provide a reproducible example--what is the command you run with our library that results in this error? Have you tried any other steps to resolve it?
-* Provide a full error traceback of the error that occurs, if applicable. A one-line error message or small screenshot snippet is unhelpful without the surrounding context.
-* Note what version of the codebase you are using, and any specifics of your environment and setup that may be relevant.
-
-## Code Style
-
-Loki uses [black](https://github.com/psf/black) and [flake8](https://pypi.org/project/flake8/) to enforce code style, via [pre-commit](https://pre-commit.com/). Before submitting a pull request, please run the following commands to ensure your code is properly formatted:
-
-```bash
-pip install pre-commit
-pre-commit install
-```
-
-## How Can I Get Involved?
-
-There are a number of distinct ways to contribute to Loki:
-
-* Implement new features or fix bugs by submitting a pull request: If you want to use a new model or retriever, or if you have an idea for a new feature, we would love to see your contributions.
-* We have our [development plan](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_plan.md) that outlines the roadmap for the project. If you are interested in contributing to any of the tasks, please join our [Discord](https://discord.gg/NRge6RS7) and direct message to @Haonan Li.
-
-We hope you find this project interesting and would like to contribute to it. If you have any questions, please feel free to reach out to us on our [Discord](https://discord.gg/NRge6RS7).
diff --git a/docs/DEVELOPMENT_PLAN.md b/docs/DEVELOPMENT_PLAN.md
deleted file mode 100644
index 7ca0702..0000000
--- a/docs/DEVELOPMENT_PLAN.md
+++ /dev/null
@@ -1,29 +0,0 @@
-## Development Plan
-
-As Loki continues to evolve, our development plan focuses on broadening capabilities and enhancing flexibility to meet the diverse needs of our users. Here are the key areas we are working on:
-
-## 1. Support for Multiple Models
-- **Broader Model Compatibility:**
-  - Integration with leading AI models besides ChatGPT and Claude to diversify fact-checking capabilities, including Command R and Gemini.
-  - Implementation of self-hosted model options for enhanced privacy and control, e.g., FastChat, TGI, and vLLM.
-
-## 2. Model-specific Prompt Engineering
-- **Unit Testing for Prompts:**
-  - Develop robust unit tests for each step to ensure prompt reliability and accuracy across different scenarios.
-
-## 3. Expanded Search Engine Support
-- **Diverse Search Engines:**
-  - Incorporate a variety of search engines including Bing, scraperapi to broaden search capabilities.
-  - Integration with [Searxng](https://github.com/searxng/searxng), an open-source metasearch engine.
-  - Support for specialized indexes like LlamaIndex and Langchain, and the ability to search local documents.
-
-## 4. Deployment and Scalability
-- **Dockerization:**
-  - Packaging Loki into Docker containers to simplify deployment and scale-up operations, ensuring Loki can be easily set up and maintained across different environments.
-
-## 5. Multi-language Support
-- **Language Expansion:**
-  - Support for additional languages beyond English, including Chinese, Arabic, etc, to cater to a global user base.
-
-
-We are committed to these enhancements to make Loki not just more powerful, but also more adaptable to the needs of a global user base. Stay tuned as we roll out these exciting developments!
diff --git a/docs/README.md b/docs/README.md
index 7f2fd71..123b638 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -2,8 +2,47 @@
 Welcome to the OpenFactVerification (Loki) documentation! This repository contains the codebase for the Loki project, which is a fact-checking pipeline that leverages state-of-the-art language models to verify the veracity of textual claims. The pipeline is designed to be modular, allowing users to easily customize the evidence retrieval, language model, and prompt used in the fact-checking process.
-## Table of Contents
+## Related Documents
-* To learn about how to use the Loki pipeline, please refer to the [User Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/user_guide.md).
+* For users who want to try advanced features, please refer to the [User Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/user_guide.md).
-* To learn how to add a new language model support, new search engine support, or new prompt support, please refer to the [Development Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_guide.md).
+* For developers who want to contribute to the project, please go to the [How-to-contribute](#how-to-contribute) section, and also [Development Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_guide.md).
+
+
+## How to Contribute
+We welcome contributions and feedback from the community and recommend a few best practices to make your contributions or reported errors easier to assist with.
+
+### For Pull Requests
+
+* PRs should be titled descriptively, and be opened with a brief description of the scope and intent of the new contribution.
+* New features should have appropriate documentation added alongside them.
+* Aim for code maintainability, and minimize code copying.
+
+### For Feature Requests
+
+* Provide a short paragraph's worth of description. What is the feature you are requesting? What is its motivation, and an example use case of it? How does this differ from what is currently supported?
+
+### For Bug Reports
+
+* Provide a short description of the bug.
+* Provide a reproducible example--what is the command you run with our library that results in this error? Have you tried any other steps to resolve it?
+* Provide a full error traceback of the error that occurs, if applicable. A one-line error message or small screenshot snippet is unhelpful without the surrounding context.
+* Note what version of the codebase you are using, and any specifics of your environment and setup that may be relevant.
+
+## Code Style
+
+Loki uses [black](https://github.com/psf/black) and [flake8](https://pypi.org/project/flake8/) to enforce code style, via [pre-commit](https://pre-commit.com/). Before submitting a pull request, please run the following commands to ensure your code is properly formatted:
+
+```bash
+pip install pre-commit
+pre-commit install
+```
+
+## How Can I Get Involved?
+
+There are a number of distinct ways to contribute to Loki:
+
+* Implement new features or fix bugs by submitting a pull request: If you want to use a new model or retriever, or if you have an idea for a new feature, we would love to see your contributions.
+* We have our [development plan](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_plan.md) that outlines the roadmap for the project. If you are interested in contributing to any of the tasks, please join our [Discord](https://discord.gg/NRge6RS7) and direct message to @Haonan Li.
+
+We hope you find this project interesting and would like to contribute to it. If you have any questions, please feel free to reach out to us on our [Discord](https://discord.gg/NRge6RS7).
diff --git a/docs/development_guide.md b/docs/development_guide.md
index 5f5b3d6..4ac0d0c 100644
--- a/docs/development_guide.md
+++ b/docs/development_guide.md
@@ -1,8 +1,13 @@
-# Loki Development Guide
+# Development Guide
 This documentation page provides a guide for developers to want to contribute to the Loki project, for versions v0.0.2 and later.
-## Loki Framework Introduction
+- [Development Guide](#development-guide)
+  - [Framework Introduction](#framework-introduction)
+  - [Development Plan](#development-plan)
+
+
+## Framework Introduction
 Loki leverage state-of-the-art language models to verify the veracity of textual claims. The pipeline is designed to be modular in `factcheck/core/`, which include the following components:
@@ -16,8 +21,7 @@ To support each component's functionality, Loki relies on the following utils:
 - **Language Model:** Currently, 4 out of 5 components (including: Decomposer, Checkworthy, Query Generator, and ClaimVerify) use the language model (LLMs) to perform their tasks. The supported LLMs are defined in `factcheck/core/utils/llmclient/` and can be easily extended to support more LLMs.
 - **Prompt:** The prompt is a crucial part of the LLMs, and is usually optimized for each LLM to achieve the best performance. The prompt is defined in `factcheck/core/utils/prompt/` and can be easily extended to support more prompts.
-
-## New LLM Support
+### Support a New LLM Client
 A new LLM should be defined in `factcheck/core/utils/llmclient/` and should be a subclass of `BaseClient` from `factcheck/core/utils/llmclient/base.py`. The LLM should implement the `_call` method, which take a single string input and return a string output.
@@ -27,19 +31,49 @@ A new LLM should be defined in `factcheck/core/utils/llmclient/` and should be a
 We find that ChatGPT [json_mode](https://platform.openai.com/docs/guides/text-generation/json-mode) is a good choice for the LLM, as it can generate structured output. To support a new LLM, you may need to implement a post-processing to convert the output of the LLM to a structured format.
-## New Search Engine (Retriever) Support
+### Support a New Search Engine (Retriever)
 Evidence retriever should be defined in `factcheck/core/Retriever/` and should be a subclass of `EvidenceRetriever` from `factcheck/core/Retriever/base.py`. The retriever should implement the `retrieve_evidence` method.
-## New Language Support
+### Support a New Language
 To support a new language, you need to create a new file in `factcheck/utils/prompt/` with the name `_prompt_.py`. For example, to create a prompt suite for ChatGPT in Chinese, you can create a file named `chatgpt_prompt_zh.py`. The prompt file should contains a class which is a subclass of `BasePrompt` from `factcheck/core/utils/prompt/base.py`, and been registered in `factcheck/utils/prompt/__init__.py`.
-## Prompt Optimization
+### Prompt Optimization
 To optimize the prompt for a specific LLM, you can modify the prompt in `factcheck/utils/prompt/`. We will release a minimal test suite to evaluate the performance of the prompt in the future.
-## Others
+
+
+## Development Plan
+
+As Loki continues to evolve, our development plan focuses on broadening capabilities and enhancing flexibility to meet the diverse needs of our users. Here are the key areas we are working on:
+
+### 1. Support for Multiple Models
+- **Broader Model Compatibility:**
+  - Integration with leading AI models besides ChatGPT and Claude to diversify fact-checking capabilities, including Command R and Gemini.
+  - Implementation of self-hosted model options for enhanced privacy and control, e.g., FastChat, TGI, and vLLM.
+
+### 2. Model-specific Prompt Engineering
+- **Unit Testing for Prompts:**
+  - Develop robust unit tests for each step to ensure prompt reliability and accuracy across different scenarios.
+
+### 3. Expanded Search Engine Support
+- **Diverse Search Engines:**
+  - Incorporate a variety of search engines including Bing, scraperapi to broaden search capabilities.
+  - Integration with [Searxng](https://github.com/searxng/searxng), an open-source metasearch engine.
+  - Support for specialized indexes like LlamaIndex and Langchain, and the ability to search local documents.
+
+### 4. Deployment and Scalability
+- **Dockerization:**
+  - Packaging Loki into Docker containers to simplify deployment and scale-up operations, ensuring Loki can be easily set up and maintained across different environments.
+
+### 5. Multi-language Support
+- **Language Expansion:**
+  - Support for additional languages beyond English, including Chinese, Arabic, etc, to cater to a global user base.
+
+
+We are committed to these enhancements to make Loki not just more powerful, but also more adaptable to the needs of a global user base. Stay tuned as we roll out these exciting developments!
diff --git a/docs/user_guide.md b/docs/user_guide.md
index 7006336..6412265 100644
--- a/docs/user_guide.md
+++ b/docs/user_guide.md
@@ -118,13 +118,15 @@ text = "Your text here"
 results = factcheck_instance.check_response(text)
 print(results)
 ```
-
 ### Used as a Web App
 ```bash
 python webapp.py --api_config demo_data/api_config.yaml
 ```
+

+

+
 ## Advanced Features
 ### Multimodality