In this project I created a Python script that uses data scraping techniques to extract HTML content from Trybe's blog and store it in a MongoDB database.

Rafaqfg/web-scraping-project-Python

Web Scraping Project

Developed by

Description

  • In this project I created a Python script to scrape technology news from Trybe's blog.
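To illustrate the kind of extraction the script performs, here is a minimal sketch that pulls links out of a listing page. It is not the project's actual code: the project uses beautifulsoup4, while this sketch swaps in the standard library's `html.parser` so it runs without extra dependencies. The names `LinkCollector` and `extract_links`, the CSS classes, and the sample HTML are all hypothetical and do not reflect the real markup of Trybe's blog.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # An attribute without a value is reported as None, hence the `or ""`.
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if self.css_class in classes and href:
            self.links.append(href)


def extract_links(html, css_class):
    """Return the hrefs of all <a class="css_class"> tags, in page order."""
    parser = LinkCollector(css_class)
    parser.feed(html)
    return parser.links


# Hypothetical listing page: two news cards and a "next page" link.
SAMPLE_HTML = """
<article><a class="news-link" href="https://example.com/news/1">One</a></article>
<article><a class="news-link featured" href="https://example.com/news/2">Two</a></article>
<a class="next" href="https://example.com/page/2">Next</a>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML, "news-link"))
    print(extract_links(SAMPLE_HTML, "next"))
```

The same function covers both the news links and the next-page link by changing the class name, which keeps the pagination loop simple: scrape the links, follow the next-page link, repeat until there is none.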

Stack

Development: Python, Docker, pymongo, beautifulsoup4 and MongoDB.

How to run the application with Docker (you need to have docker-compose already installed on your machine)

Clone the repository

  git clone git@github.com:Rafaqfg/web-scraping-project-Python.git

Enter the project folder

  cd web-scraping-project-Python

Create and activate the virtual environment for the project

  python3 -m venv .venv && source .venv/bin/activate

Install the dependencies

  python3 -m pip install -r dev-requirements.txt

📌 Note: If you see a red error message during the installation, repeat the previous step until the error message is gone.

Start the Docker containers using the compose file (port 27017 must be available)

  docker-compose up -d

Run the menu.py file

  python3 tech_news/menu.py

Enjoy scraping xD


📌 Note: The scraped website is in Portuguese, so you need to write your searches in Portuguese.
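The steps above require port 27017 to be available before the containers start. One way to check this from Python is sketched below, assuming MongoDB would be published on the local machine; the helper name `is_port_free` is hypothetical and not part of the project.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means some process is already listening on the port.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    print("Port 27017 free:", is_port_free(27017))
```

If this reports that the port is taken, a local MongoDB instance or another container is probably already bound to it and should be stopped first.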

Steps of development

Description | Finished
Create the fetch function | ✔️
Create the scrape_novidades function | ✔️
Create the scrape_next_page_link function | ✔️
Create the scrape_noticia function | ✔️
Create the get_tech_news function to get the news! | ✔️
Create the search_by_title function | ✔️
Create the search_by_date function | ✔️
Create the search_by_tag function | ✔️
Create the search_by_category function | ✔️
Create the top_5_news function | ✔️
Create the top_5_categories function | ✔️
Create the menu function | ✔️
Implement the menu features | ✔️
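As a rough idea of what the search and ranking steps could look like, here is a minimal sketch. It is not the project's implementation: the document fields `title`, `url` and `category`, the helper `build_title_query`, the tie-breaking rule, and the return shapes are all assumptions. The `collection` argument is assumed to be anything with a pymongo-style `find(filter)` method, so the sketch itself does not need a running database.

```python
import re
from collections import Counter


def build_title_query(title):
    """Build a MongoDB filter matching titles that contain `title`, ignoring case."""
    # re.escape keeps characters such as "+" or "." from being read as regex syntax.
    return {"title": {"$regex": re.escape(title), "$options": "i"}}


def search_by_title(collection, title):
    """Return (title, url) pairs for the matching news documents.

    `collection` is assumed to expose a pymongo-style `find(filter)` method.
    """
    return [
        (doc["title"], doc["url"])
        for doc in collection.find(build_title_query(title))
    ]


def top_5_categories(documents):
    """Return up to five category names, most frequent first."""
    counts = Counter(doc["category"] for doc in documents)
    # Sort by descending count, then alphabetically to break ties (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:5]]
```

With a real pymongo collection, `search_by_title(db.news, "python")` would run the case-insensitive match on the server; the collection name `news` here is likewise hypothetical.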

Gif of the application
