Simplified Searching Engine

A search engine that crawls and scrapes web pages, indexes their data, and stores it in a database.

The program is written in Python. It uses regular expressions to parse HTML and multithreading to crawl faster. Data storage is handled by MongoDB. The project contains four files:
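The README says HTML is parsed with regular expressions. Below is a minimal sketch of what that parsing might look like. The patterns and the `parse_html` function are hypothetical and not taken from the project. Regexes handle simple, well-formed markup but break on unusual HTML, so a real parser such as Python's `html.parser` is the more robust choice.

```python
import re

# Hypothetical patterns: the project's actual regexes are not shown in this README.
LINK_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)
SCRIPT_STYLE_RE = re.compile(r'<(script|style)[^>]*>.*?</\1>', re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r'<[^>]+>')


def parse_html(html):
    """Extract the title, link targets and lowercase words from raw HTML."""
    title_match = TITLE_RE.search(html)
    title = title_match.group(1).strip() if title_match else ""
    links = LINK_RE.findall(html)
    # Remove <script>/<style> blocks first, then every remaining tag.
    text = TAG_RE.sub(" ", SCRIPT_STYLE_RE.sub(" ", html))
    words = re.findall(r"\w+", text.lower())
    return {"title": title, "links": links, "words": words}
```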

PersonnalParser.py:

- Contains the PersonnalParser class, which fetches HTML content, parses it, stores it, and starts a new PersonnalParser thread for each link found in the page.
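The one-thread-per-link design described for `PersonnalParser` can be sketched as follows. `CrawlerThread`, `crawl`, and the injected `fetch` and `store` callables are hypothetical names. The shared `visited` set guarded by a lock is an assumption about how duplicate URLs are avoided, and the `depth` counter stands in for the depth setting in `fill_database.py`.

```python
import re
import threading
from urllib.parse import urljoin

# Hypothetical link pattern used only for this sketch.
LINK_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


class CrawlerThread(threading.Thread):
    """Hypothetical stand-in for PersonnalParser: one thread per page."""

    def __init__(self, url, depth, fetch, store, visited, lock):
        super().__init__()
        self.url = url
        self.depth = depth        # remaining link levels to follow
        self.fetch = fetch        # assumed interface: url -> html
        self.store = store        # assumed interface: (url, html) -> None
        self.visited = visited    # set of URLs shared by all threads
        self.lock = lock          # guards `visited`

    def run(self):
        try:
            html = self.fetch(self.url)
        except Exception:
            return  # skip pages that cannot be fetched
        self.store(self.url, html)
        if self.depth <= 0:
            return
        children = []
        for href in LINK_RE.findall(html):
            link = urljoin(self.url, href)  # resolve relative links
            with self.lock:
                if link in self.visited:
                    continue
                self.visited.add(link)
            child = CrawlerThread(link, self.depth - 1, self.fetch,
                                  self.store, self.visited, self.lock)
            child.start()
            children.append(child)
        for child in children:
            child.join()


def crawl(start_url, max_depth, fetch, store):
    """Crawl from start_url, following links up to max_depth levels deep."""
    visited = {start_url}
    root = CrawlerThread(start_url, max_depth, fetch, store, visited, threading.Lock())
    root.start()
    root.join()
    return visited
```

Spawning a thread for every link can create a very large number of threads on link-heavy sites; a bounded pool such as `concurrent.futures.ThreadPoolExecutor` is a common alternative.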

DBManager.py:

- Contains the DBManager class, which handles the connection to the database and performs insert and find operations.
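A possible shape for `DBManager`, assuming the `pymongo` driver, is sketched below. The method names (`insert_page`, `find_pages`), the database and collection names, and the document layout (`url`, `title`, `words`) are all assumptions. The collection can be passed in, which keeps the class testable without a running MongoDB server.

```python
class DBManager:
    """Hypothetical sketch of a DBManager wrapping one MongoDB collection."""

    def __init__(self, collection=None, uri="mongodb://localhost:27017",
                 db_name="search_engine", collection_name="pages"):
        if collection is None:
            # Imported lazily so a stand-in collection can be used without pymongo.
            from pymongo import MongoClient
            collection = MongoClient(uri)[db_name][collection_name]
        self.pages = collection

    def insert_page(self, url, title, words):
        """Insert one crawled page unless its URL is already stored."""
        if self.pages.find_one({"url": url}) is not None:
            return False
        self.pages.insert_one({"url": url, "title": title, "words": words})
        return True

    def find_pages(self, word):
        """Return every stored page whose `words` array contains `word`."""
        # In MongoDB, an equality filter on an array field matches any element.
        return list(self.pages.find({"words": word}))
```

Because `find_one` followed by `insert_one` is not atomic across crawler threads, a unique index (`create_index("url", unique=True)`) is a safer way to prevent duplicate pages.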

fill_database.py:

- Contains the general settings, such as the start URL, proxy settings, and search depth. The first crawl thread is started here.
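The settings listed for `fill_database.py` might look like the sketch below. The values of `START_URL`, `MAX_DEPTH`, and `PROXIES`, and the helpers `make_opener` and `fetch`, are hypothetical. Only the standard library's `urllib.request` is used, with requests routed through the configured proxies; the first crawl thread would then start from `START_URL` with `MAX_DEPTH` as its depth budget.

```python
import urllib.request

# Hypothetical values: the README names these settings but not their contents.
START_URL = "https://example.com/"
MAX_DEPTH = 2
PROXIES = {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}


def make_opener(proxies=None):
    """Return a urllib opener that sends requests through `proxies`, if given.

    Without explicit proxies, urllib's default handlers apply, which read
    proxy settings from environment variables such as http_proxy.
    """
    handlers = []
    if proxies:
        handlers.append(urllib.request.ProxyHandler(proxies))
    return urllib.request.build_opener(*handlers)


def fetch(url, opener, timeout=10):
    """Download `url` and decode it as text, replacing undecodable bytes."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```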

main.py:

- Contains the code that reads the user's search query, retrieves the matching database content, and sorts the results by relevance.
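How relevance is computed is not documented here. One simple scheme, shown as a hypothetical sketch, scores each page by how often the query terms occur in its stored words, plus a bonus for terms that appear in the title. The page dictionary shape (`url`, `title`, `words`) and the `title_weight` parameter are assumptions.

```python
from collections import Counter


def rank(query, pages, title_weight=3):
    """Sort pages by a simple term-frequency score (hypothetical scheme).

    Each page is assumed to be a dict with "url", "title" and "words" keys.
    Returns (score, url) pairs; pages scoring zero are dropped.
    """
    terms = query.lower().split()
    scored = []
    for page in pages:
        counts = Counter(page["words"])
        title_words = set(page["title"].lower().split())
        score = sum(counts[t] + (title_weight if t in title_words else 0)
                    for t in terms)
        if score > 0:
            scored.append((score, page["url"]))
    # Highest score first; ties broken alphabetically by URL for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored
```

TF-IDF weighting, or a MongoDB text index queried with `$text` and sorted by `{"$meta": "textScore"}`, would give better relevance on large collections.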
