(Currently being developed)
This project builds an index of web content: it crawls web pages, extracts their text, then processes, indexes, and stores that text together with its metadata.
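As a rough illustration of the "extract text and attach metadata" step, here is a minimal sketch using only the Python standard library. The names `PageRecord` and `extract_record` and the metadata keys are hypothetical and chosen for this example; they are not taken from the project's code, which may work quite differently.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


@dataclass
class PageRecord:
    """Hypothetical record shape: page text plus a metadata dictionary."""
    url: str
    text: str
    metadata: dict = field(default_factory=dict)


def extract_record(url, html):
    """Turn raw HTML into a PageRecord carrying simple, illustrative metadata."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return PageRecord(
        url=url,
        text=text,
        metadata={"source_url": url, "char_count": len(text)},
    )
```

A record like this could then be passed on to whatever processing and indexing steps the project applies before storage.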
- Setup and Installation
  - Prerequisites
  - Production Setup
  - Development Setup
- Running the Project
- Configuration
Before you begin, ensure you have a running instance of Milvus.
The easiest way to start Milvus is with Docker. You can do this by modifying the docker-compose.yml file to include all necessary services.
To set up Milvus using Docker Compose, follow the Milvus guide "Install Milvus Standalone with Docker Compose".
That guide walks you through configuring and launching a standalone instance of Milvus.
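Before going further, it can help to confirm that something is listening on the Milvus port. The following is a minimal sketch that assumes Milvus runs on localhost with its default port 19530; the function name `is_reachable` is made up for this example. An open port only shows that a service accepts connections, not that Milvus is fully healthy.

```python
import socket


def is_reachable(host="localhost", port=19530, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    status = "reachable" if is_reachable() else "not reachable"
    print(f"Milvus on localhost:19530 is {status}")
```

If your Milvus instance uses a different host or port, pass them explicitly, for example `is_reachable("milvus", 19530)` when connecting to a Compose service that happens to be named `milvus`.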
Production setup (Docker):
- Clone the repository:
    git clone <repository-url>
    cd <repository-directory>
- Create a .env file: copy the contents of .env_example to a new file named .env and update the values as needed.
    cp .env_example .env
- Build and run the Docker containers:
    docker-compose up --build
Development setup (local Python environment):
- Clone the repository:
    git clone <repository-url>
    cd <repository-directory>
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install dependencies:
    pip install -r requirements.txt
- Create a .env file: copy the contents of .env_example to a new file named .env and update the values as needed.
    cp .env_example .env
- Run the application:
    python src/crawl_ai/main.py
(Currently being developed)
To run the project using Docker, use the following command:
docker-compose up
This will start the necessary services and run the application.
To run the project without Docker, ensure you have followed the development setup instructions and then run:
python src/crawl_ai/main.py
The project uses a combination of YAML and environment variables for configuration. The main configuration file is config.yaml. You can find an example configuration in config_example.yaml.
The .env file contains environment-specific variables. An example .env file is provided as .env_example.
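To illustrate how YAML settings and environment variables can be combined, here is a minimal, standard-library-only sketch. It rests on several assumptions: the lookup order (process environment first, then .env values, then YAML) is one common convention and not necessarily what this project does; the keys `MILVUS_HOST`, `MILVUS_PORT`, and `CRAWL_DEPTH` are hypothetical; and `yaml_config` stands in for a dictionary already parsed from config.yaml, for instance with PyYAML's `yaml.safe_load`.

```python
import os


def parse_env_file(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple matching quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_setting(name, yaml_config, env_file_values, environ=None):
    """Look up a setting: process environment, then .env values, then YAML."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    if name in env_file_values:
        return env_file_values[name]
    return yaml_config.get(name)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    env_values = parse_env_file('MILVUS_HOST=localhost\nMILVUS_PORT="19530"\n')
    yaml_config = {"MILVUS_HOST": "yaml-host", "CRAWL_DEPTH": 2}
    print(resolve_setting("MILVUS_HOST", yaml_config, env_values))
    print(resolve_setting("CRAWL_DEPTH", yaml_config, env_values))
```

Values read from the environment or from .env are always strings, so a real loader would typically convert them to the expected types after lookup.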