- Project Overview
- Features
- Installation
- Usage
- API Endpoints
- Configuration
- Database
- Security
- Deployment
- Contributing
- License
This project implements a foundational pipeline that integrates the Hugging Face Datasets library with a Neo4j graph database and a RESTful API. It provides a single path for retrieving and querying Knowledge Graph Question-Answer datasets, and it serves as a backbone for integrating further Hugging Face datasets in support of Artificial General Intelligence (AGI) research.
- Data retrieval from Hugging Face Datasets library
- Data transformation and schema mapping
- Integration with Neo4j graph database
- RESTful API for dataset access and querying
- Support for multiple datasets:
  - Tree of Knowledge
  - HotpotQA
  - TimeQA
- Security measures including API key authentication
- Deployment on AWS infrastructure
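The "data transformation and schema mapping" feature can be pictured with a minimal sketch. It assumes a HotpotQA-style record with `id`, `question`, `answer`, and `supporting_facts` fields; the function name `record_to_graph`, the node labels, and the relationship types are hypothetical and are not taken from this project's code.

```python
from typing import Any


def record_to_graph(record: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Map one HotpotQA-style record to plain node and relationship dicts.

    Labels and relationship types are illustrative assumptions only.
    """
    question_id = f"question:{record['id']}"
    answer_id = f"answer:{record['id']}"
    nodes = [
        {"id": question_id, "label": "Question", "text": record["question"]},
        {"id": answer_id, "label": "Answer", "text": record["answer"]},
    ]
    edges = [{"source": question_id, "type": "HAS_ANSWER", "target": answer_id}]

    # dict.fromkeys keeps the first occurrence of each title, in order,
    # so a title cited twice yields a single Document node.
    for title in dict.fromkeys(record["supporting_facts"]["title"]):
        doc_id = f"document:{title}"
        nodes.append({"id": doc_id, "label": "Document", "title": title})
        edges.append({"source": question_id, "type": "SUPPORTED_BY", "target": doc_id})
    return nodes, edges


sample = {
    "id": "q1",
    "question": "Which river flows through the capital of France?",
    "answer": "Seine",
    "supporting_facts": {"title": ["Paris", "Seine", "Paris"], "sent_id": [0, 0, 1]},
}
nodes, edges = record_to_graph(sample)
print(len(nodes), "nodes,", len(edges), "relationships")
```

Keeping the mapping as a pure function over plain dicts makes it easy to test without a running Neo4j instance; writing the result to the database would be a separate step.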
- Clone the repository:
    git clone https://github.com/singnet/HFDLSP.git
- Install the required dependencies:
    pip install -r requirements.txt
- Set up the environment variables:
  - Copy the .env.example file to .env
  - Fill in the necessary environment variables in the .env file
- Set up the Neo4j database (see Database section for details)
- Run database migrations:
    python manage.py migrate
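To start the development server, run the Django development server command (for a standard Django project this is python manage.py runserver). The API will be available at http://localhost:8000/.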
- /answer/: Get answers from the dataset
- /fetch_dataset/: Fetch and insert datasets into Neo4j
- /schema/: OpenAPI schema
- /swagger-ui/: Swagger UI for API documentation
For detailed API documentation, visit the Swagger UI at /swagger-ui/ when the server is running.
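As a rough illustration of calling the API from a client, the sketch below builds a request to /answer/ using only the Python standard library. The `question` query parameter and the `Api-Key <key>` header format are assumptions made for this example; the Swagger UI is the place to confirm the actual parameters and authentication scheme.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # development server address from this README


def build_answer_request(question: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the /answer/ endpoint."""
    # "question" is a hypothetical parameter name, not confirmed by the project.
    query = urllib.parse.urlencode({"question": question})
    return urllib.request.Request(
        f"{BASE_URL}/answer/?{query}",
        headers={
            # Assumed header format; the real scheme may differ.
            "Authorization": f"Api-Key {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_answer_request("Who wrote Hamlet?", "my-secret-key")
print(request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```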
The project uses environment variables for configuration. Key settings include:
- SECRET_KEY: Django secret key
- DEBUG: Debug mode (set to 0 for production)
- DJANGO_ALLOWED_HOSTS: Allowed hosts for Django
- NEO4J_DATABASE_URL: URL for the Neo4j database
- API_KEY: API key for authentication
Refer to settings.py for all available configuration options.
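The way these variables might be read can be sketched as follows. This is not the project's settings.py: the helper name `load_settings`, the integer parsing of DEBUG, and the space-separated format for DJANGO_ALLOWED_HOSTS are assumptions chosen for illustration.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the configuration keys listed above from an environment mapping."""
    return {
        # Required keys: a missing value raises KeyError at startup.
        "SECRET_KEY": env["SECRET_KEY"],
        "NEO4J_DATABASE_URL": env["NEO4J_DATABASE_URL"],
        "API_KEY": env["API_KEY"],
        # Assumption: DEBUG is "0" or "1"; it defaults to off.
        "DEBUG": bool(int(env.get("DEBUG", "0"))),
        # Assumption: hosts are separated by whitespace.
        "ALLOWED_HOSTS": env.get("DJANGO_ALLOWED_HOSTS", "").split(),
    }


example_env = {
    "SECRET_KEY": "change-me",
    "DEBUG": "0",
    "DJANGO_ALLOWED_HOSTS": "localhost 127.0.0.1",
    "NEO4J_DATABASE_URL": "bolt://neo4j:password@localhost:7687",
    "API_KEY": "my-secret-key",
}
settings = load_settings(example_env)
print(settings["DEBUG"], settings["ALLOWED_HOSTS"])
```

Failing fast on missing required keys surfaces a misconfigured .env file at startup instead of on the first request.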
This project uses Neo4j as its primary database. Ensure you have Neo4j installed and running. Update NEOMODEL_NEO4J_BOLT_URL in settings.py, or set the NEO4J_DATABASE_URL environment variable, to point to your Neo4j instance.
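Before starting the server, it can help to sanity-check the connection URL. The sketch below is a hypothetical helper (`check_neo4j_url` is not part of this project) that parses a URL with the standard library and checks it against the URI schemes accepted by Neo4j drivers, falling back to the default Bolt port 7687 when none is given.

```python
from urllib.parse import urlparse

# URI schemes accepted by Neo4j drivers.
NEO4J_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}
DEFAULT_BOLT_PORT = 7687


def check_neo4j_url(url: str) -> dict:
    """Validate a Neo4j connection URL and return its parts (without the password)."""
    parsed = urlparse(url)
    if parsed.scheme not in NEO4J_SCHEMES:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no host")
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port or DEFAULT_BOLT_PORT,
        "user": parsed.username,
    }


parts = check_neo4j_url("bolt://neo4j:password@localhost:7687")
print(parts)
```

The password is deliberately left out of the returned dict so the result can be logged safely.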
The API is secured using API key authentication. Ensure you set the API_KEY environment variable and include the key in the Authorization header when making requests to the API.
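On the server side, a check of this kind could look like the following sketch. It is not the project's implementation: the function name `is_authorized` and the `Api-Key <key>` header format are assumptions. The comparison uses `hmac.compare_digest` so that the time taken does not reveal how many leading characters of a guessed key were correct.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_key: str) -> bool:
    """Return True if the header carries the expected API key.

    Assumes the header looks like "Api-Key <key>"; the real format may differ.
    """
    if not authorization_header or not expected_key:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme != "Api-Key":
        return False
    # Constant-time comparison of the presented and expected keys.
    return hmac.compare_digest(presented.strip().encode(), expected_key.encode())


print(is_authorized("Api-Key my-secret-key", "my-secret-key"))
print(is_authorized("Api-Key wrong-key", "my-secret-key"))
```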
The project is currently deployed on AWS but is designed to run on any cloud provider. Key components of the AWS deployment include:
- EC2 instances for hosting the application
- Load balancing and auto-scaling configurations
- CI/CD pipeline for automated testing and deployment
Refer to the deployment documentation for detailed instructions on setting up the AWS infrastructure.
Contributions to this project are welcome. Please follow these steps:
- Fork the repository
- Create a new branch for your feature
- Commit your changes
- Push to the branch
- Create a new Pull Request
Apache-2.0 License: http://www.apache.org/licenses/