This project involves creating an AI agent that reads through a dataset (CSV or Google Sheets) and performs web searches to retrieve specific information for each entity in a chosen column. The agent leverages a large language model (LLM) to parse the web results and extract requested data, such as email addresses, company details, or other specified information. The project also includes a user-friendly dashboard where users can upload files, define search queries, and view or download the extracted results.
- Upload CSV files or connect to Google Sheets.
- Specify search queries with dynamic placeholders for entity values.
- Perform web searches and extract relevant information using LLMs.
- View extracted information in a structured format.
- Download the extracted results as a CSV.
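To make the workflow above concrete, the following is a minimal sketch of the per-entity loop the agent performs: read the dataset, fill the query template with each entity, search the web, and have an LLM extract the answer. The `web_search` and `extract_with_llm` helpers are hypothetical stand-ins for the search and LLM steps, not functions the project necessarily exposes under those names.

```python
# pipeline_sketch.py -- conceptual outline of the per-entity loop (illustrative only)
import pandas as pd

def web_search(query: str) -> str:
    """Hypothetical stand-in for the web-search step (e.g. a SerpAPI call)."""
    return f"search snippets for: {query}"

def extract_with_llm(question: str, context: str) -> str:
    """Hypothetical stand-in for the LLM extraction step."""
    return f"answer to '{question}' extracted from context"

def run_agent(csv_path: str, column: str, query_template: str) -> pd.DataFrame:
    """For each entity in `column`, search the web and extract the requested field."""
    df = pd.read_csv(csv_path)
    rows = []
    for entity in df[column]:
        question = query_template.format(company=entity)  # e.g. "Get the email address of {company}"
        snippets = web_search(question)
        rows.append({"entity": entity, "extracted": extract_with_llm(question, snippets)})
    return pd.DataFrame(rows)
```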
Before setting up the project, ensure you have the following installed:
- Python 3.x
- pip (Python package installer)
- A Google Cloud Project with Sheets API enabled and a service account key for authentication (for Google Sheets integration).
- Clone the repository:
  git clone https://github.com/yourusername/ai-agent-web-search.git
  cd ai-agent-web-search
- Create a virtual environment (optional but recommended):
  python3 -m venv venv
  source venv/bin/activate   # For macOS/Linux
  venv\Scripts\activate      # For Windows
- Install the required dependencies:
  pip install -r requirements.txt
- In the root directory of the project, create a `.env` file.
- Add the following variables to the `.env` file:
  SERPAPI_KEY=your_serpapi_key
  HUGGINGFACE_API_KEY=your_huggingface_api_key
- Replace `your_serpapi_key` with your SerpAPI key for performing web searches.
- Replace `your_huggingface_api_key` with your HuggingFace API key for natural language processing.
- Create a `config/` folder in the root directory of your project.
- Place your Google Service Account JSON file inside the `config/` folder. You can create a Google service account by following the instructions here.
- Add the path to the service account file to your `.env` file:
  GOOGLE_SERVICE_ACCOUNT_JSON=config/gcp_service_account.json
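Once these variables are in place, the application can read them at startup. Below is a minimal sketch of that loading step, assuming the `python-dotenv` package; the variable names match the `.env` file above, but the helper function itself is illustrative rather than the project's actual code.

```python
# settings_sketch.py -- illustrative .env loading, not the project's actual module
import os
from dotenv import load_dotenv  # pip install python-dotenv

def load_settings() -> dict:
    """Read API keys and the service-account path from the .env file."""
    load_dotenv()  # picks up .env from the project root
    settings = {
        "serpapi_key": os.getenv("SERPAPI_KEY"),
        "huggingface_api_key": os.getenv("HUGGINGFACE_API_KEY"),
        "google_service_account_json": os.getenv("GOOGLE_SERVICE_ACCOUNT_JSON"),
    }
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return settings
```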
Once you've installed the dependencies and set up the `.env` file and `config/` folder, you can run the application using Streamlit.
- Start the Streamlit app:
  streamlit run app.py
- The application will launch in your web browser, displaying the dashboard where you can:
- Upload CSV: Choose a CSV file with data.
- Connect Google Sheets: Provide the link to your Google Sheet.
- Select Primary Column: Choose the column from your dataset that contains the entities (e.g., company names).
- Define a Query: Enter a custom query, such as "Get the email address of {company}", where `{company}` will be replaced with each entity's name from the dataset (see the sketch after this list).
- Extract Information: Click "Run Search" to start the search process and display extracted information.
- Download Results: After the search completes, you can download the extracted results as a CSV file.
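For a concrete picture of how these dashboard steps could fit together, here is a minimal Streamlit sketch of the upload/select/query/download flow. The widget calls (`st.file_uploader`, `st.selectbox`, `st.text_input`, `st.download_button`) are standard Streamlit APIs, but `run_search` is a hypothetical stand-in for the project's search-and-extract logic, and the layout is an assumption rather than a copy of `app.py`.

```python
# dashboard_sketch.py -- illustrative Streamlit flow, not the project's app.py
import pandas as pd
import streamlit as st

def run_search(query: str) -> str:
    """Hypothetical placeholder for the web-search + LLM extraction step."""
    return f"result for: {query}"

uploaded = st.file_uploader("Upload CSV", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    column = st.selectbox("Select Primary Column", df.columns)
    template = st.text_input("Define a Query", "Get the email address of {company}")

    if st.button("Run Search"):
        # Fill the placeholder with each entity from the chosen column.
        results = pd.DataFrame(
            [
                {"entity": entity, "extracted": run_search(template.format(company=entity))}
                for entity in df[column]
            ]
        )
        st.dataframe(results)
        st.download_button(
            "Download Results",
            results.to_csv(index=False).encode("utf-8"),
            file_name="extracted_results.csv",
            mime="text/csv",
        )
```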
To connect Google Sheets:
- Ensure that your Google Sheet is shared with the link set to "Anyone with the link can view."
- Paste the link of your Google Sheet into the input field.
- The app will load data from the sheet, allowing you to select a column and query it for information.
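On the backend, loading a shared sheet might look roughly like the snippet below, assuming the `gspread` library and the service-account file configured earlier; the `load_sheet` helper is an illustrative assumption, not the project's exact implementation.

```python
# sheets_sketch.py -- illustrative Google Sheets loading, assuming gspread
import os
import gspread  # pip install gspread
import pandas as pd

def load_sheet(sheet_url: str) -> pd.DataFrame:
    """Open a Google Sheet by URL and return its first worksheet as a DataFrame."""
    client = gspread.service_account(filename=os.getenv("GOOGLE_SERVICE_ACCOUNT_JSON"))
    worksheet = client.open_by_url(sheet_url).sheet1
    return pd.DataFrame(worksheet.get_all_records())
```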
For the application to function properly, you need to configure the following API keys:
- SerpAPI Key: This key is used for performing web searches. You can get your key by signing up on SerpAPI.
- HuggingFace API Key: This key allows you to use the HuggingFace API for natural language processing. Obtain it from HuggingFace.
- Google Service Account Key: You will need a Google service account key to authenticate with the Google Sheets API. Follow the instructions here to create and download the service account JSON key.
Once you have the API keys, add them to your `.env` file as mentioned in the setup instructions.
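For context, a single search-and-extract step built on these keys might look roughly like the following. The SerpAPI endpoint (`https://serpapi.com/search`) and the HuggingFace Inference API are real services, but the choice of model, the prompt format, and the helper names below are assumptions made for illustration.

```python
# search_extract_sketch.py -- illustrative use of the API keys, not the project's pipeline
import os
import requests

def web_search(query: str) -> str:
    """Run a SerpAPI search and join the top organic-result snippets."""
    resp = requests.get(
        "https://serpapi.com/search",
        params={"q": query, "api_key": os.getenv("SERPAPI_KEY")},
        timeout=30,
    )
    resp.raise_for_status()
    organic = resp.json().get("organic_results", [])
    return "\n".join(r.get("snippet", "") for r in organic[:5])

def extract_with_llm(question: str, context: str) -> str:
    """Ask a hosted HuggingFace model to pull the answer out of the snippets.

    The model name below is an assumption; any instruction-following model on the
    Inference API could be substituted.
    """
    resp = requests.post(
        "https://api-inference.huggingface.co/models/google/flan-t5-large",
        headers={"Authorization": f"Bearer {os.getenv('HUGGINGFACE_API_KEY')}"},
        json={"inputs": f"{question}\n\nContext:\n{context}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()[0]["generated_text"]
```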
Watch the video below to see a demonstration of how the AI agent works, performing web searches and extracting specific information from datasets.
- Introduction to the Project: Overview of the AI agent and its key features.
- How the Agent Works: Walkthrough of how it processes CSV/Google Sheets data and performs web searches.
- Custom Query Handling: See how users can define custom queries to extract specific information.
- Results Extraction: Watch the process of collecting and viewing the extracted data.
- Title: AI Agent for Web Search and Information Extraction
- Description: In this tutorial, we explain the functionalities of the AI agent, how it interacts with datasets, and extracts information using web searches and large language models.
- Duration: 3:45 minutes
- Published on: [Date]
- Link: Watch the video here
ai-agent-web-search/
├── app.py # Main application file
├── requirements.txt # List of dependencies
├── .env # Environment variables file
├── config/ # Folder for configuration files
│ └── gcp_service_account.json # Google Service Account Key
└── README.md # Project documentation