The goal of this project is to make it easier to digest news articles and find biases within them. Our plan is to use a vector database to store articles scraped from the web and posted by users for analysis, then use a large language model (LLM) like ChatGPT to analyze the articles' biases.
Problem: There are many different reporters and organizations reporting news, and it is impossible for one person to research each one. It is also impossible for news reporting to be 100% objective since reporters have to choose what details to include in their stories and frame the story in a way that keeps readers interested. Given all this, how can a person catch the biases that inform each news report they consume?
Solution: Build a tool that will find all the articles/reports for a given story, analyze the differences between the articles, and use AI to detect the potential biases for each story. When a consumer reads a news article, they can use the tool analyze "between the lines" of the article and get a sense of how biased/objective that article may be.
This is a senior design project. Progress and Goals is a discussion of what we completed in Fall 2023, and what we plan to complete in Spring 2024.
People who read news articles, and wish for ease in handling the complexities of news intake.
- scrape all the news
- store in vector database
- analyze bias with LLM
- For this idea we'll need to scrape web data to find the publicly available articles on a specific topic to compare against the article the user wants to analyze.
- We can use a vector database to store article content. This will store the relationships/similarities between articles. We can update this database with articles users upload (that they want to analyze) and articles scraped from the web. Doing so will update existing relationships in the database. Out-of-date articles can also be updated.
- We will use a large language model (LLM, e.g. ChatGPT) to process an article and detect biases (and possibly find misinformation or opinionated judgements in the article). We may also keep track of media scores to deepen the model's understanding of what content is biased and in what direction.
- We will have API endpoints to interact with the database and return the bias analysis on a given article.
These are things we will consider but are out of the scope of our solution...for this semester, at least ;)
- prioritize a user-friendly interface
- include a search tool to find news articles from our site
- analyze sentiment of a text
- We will be accessing so much data. It's hard to parse it all correctly and find the best way to store it.
- We may not have enough information on a topic stored - this has the potential of spreading misinformation.
- how to prevent hallucinations?
- properly analyzing news sources with context
- how to know what's biased, and what ways it's biased?
- needs research
- dealing with vector databases
- Ground News
- AllSides
- The Dispatch - right-leaning
- News facts network - left-center leaning
Web scraping
LangChain and vector databases
Our documentation
https://newsdata.io/blog/news-api-python-client/
Note
If you're running a python environment, ensure you have it activated. Also be sure to have your NewsData.io API key stored in the environment variable NEWS_API_KEY
. To automate starting these, you can run source act.sh
if you have an existing virtual environment at the root named .venv
and your API key is stored in a file named .env
based on .env.example
. For more information on NewsData.io, see API Setup.
- Run
make dev_env
. - Run
make tests
. - Run
./local.sh
. - Run
make prod
. - Run the menu:
dev.sh
.
-
On macOS or Linux:
- Open your terminal.
- Run the following command to set an environment variable (replace
YOUR_API_KEY
with the actual API key):
export NEWS_API_KEY="YOUR_API_KEY"
-
On Windows:
- Press
Win + X
and select "System". - Click on "Advanced system settings" on the left.
- Click on "Environment Variables".
- Under "System variables", click "New" and enter
NEWS_API_KEY
as the variable name and your actual API key as the variable value.
- Press
-
It's important to ensure that the environment variable is set every time you run your Python script. You might want to add the export command to your shell's profile script (e.g., ~/.bash_profile or ~/.zshrc on macOS and Linux) to ensure the environment variable is set automatically whenever you open a new terminal window.
-
Future notes/plans
- For a more permanent and portable solution, you might want to consider using a configuration file or a more advanced secret management solution, especially in a production environment.
This section guides you through the process of installing MongoDB on MacOS using Homebrew and connecting to your MongoDB instance.
- MacOS with Homebrew installed.
- Terminal access.
-
Add MongoDB Repository to Homebrew: MongoDB provides a custom Homebrew tap. Adding this tap allows you to install MongoDB directly through Homebrew. Run the following command to add the MongoDB tap:
brew tap mongodb/brew
-
Install MongoDB Community Edition: Once you have tapped the MongoDB repository, you can install MongoDB Community Edition using the following command:
brew install mongodb-community
-
Start the MongoDB Service: After installation, you can start the MongoDB service. This will initiate the MongoDB server and make it ready for connections:
brew services start mongodb-community
-
Connecting to MongoDB:
- Using
mongosh
(Recommended): Newer MongoDB installations come withmongosh
, the MongoDB Shell, as the default CLI tool for interaction. You can connect to your local MongoDB instance by simply typing:mongosh
- Using
mongo
(Legacy): In older versions or if you havemongo
installed separately, you can connect using:mongo
- Using
To verify that MongoDB is running correctly, use the mongosh
or mongo
command to connect to your MongoDB instance. If you encounter any issues, ensure that MongoDB is correctly started and that there are no network or firewall configurations blocking the connection.
For detailed documentation and advanced configuration, refer to the official MongoDB documentation.