Search Engine in Python

A tiny search engine in Python, built by following the guide at https://www.alexmolas.com/2024/02/05/a-search-engine-in-80-lines.html


Provide a list of feed URLs to crawl, one per line.

cat > feeds.txt <<EOF
http://bair.berkeley.edu/blog/feed.xml
http://benanne.github.io/feed.xml
https://simonwillison.net/atom/entries/
https://blog.bytebytego.com/feed
https://eli.thegreenplace.net/feeds/all.atom.xml
EOF
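
Before crawling, it can help to check that feeds.txt contains only well-formed URLs. The following is a minimal sketch, not code from this repository: `load_feed_urls` is a hypothetical helper, and skipping blank or non-HTTP lines is an assumption about how such a file might reasonably be read.

```python
from pathlib import Path
from urllib.parse import urlparse


def load_feed_urls(path):
    """Return the http(s) URLs found in a feeds file, one per line.

    Hypothetical helper for illustration; crawler.py may read the file differently.
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        parsed = urlparse(candidate)
        # Keep only entries that look like absolute http(s) URLs.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.append(candidate)
    return urls
```

Calling `load_feed_urls("feeds.txt")` on the file created above would return the five feed URLs in the order they were written.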

Crawl the feeds listed in feeds.txt

python crawler.py --feed-path feeds.txt

This creates the file output.parquet, which stores the crawled content in Parquet format.

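Search the index. The command below starts a local web server at http://127.0.0.1:8000; open that address in a browser to run queries.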

python main.py --data-path output.parquet
INFO:     Started server process [27449]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:51026 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:51041 - "GET /results/gpt HTTP/1.1" 200 OK
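
The last log line suggests that results for a query are served under /results/<query>. As a minimal sketch under that assumption, the server could also be queried from a script. `results_url` and `fetch_results` are hypothetical names, and the route pattern is inferred only from the log above, not from the application code.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Address taken from the Uvicorn log line above.
BASE_URL = "http://127.0.0.1:8000"


def results_url(query, base_url=BASE_URL):
    """Build the results URL for a query (route pattern assumed from the log)."""
    # safe="" also percent-encodes "/" so the query stays a single path segment.
    return f"{base_url}/results/{quote(query, safe='')}"


def fetch_results(query, base_url=BASE_URL, timeout=10):
    """Fetch the results page as text; requires the server to be running."""
    with urlopen(results_url(query, base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `results_url("gpt")` gives http://127.0.0.1:8000/results/gpt, the same path that appears in the log.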

Screenshots