Intelgenie - Semantic blog search engine [WIP]

Motivation

To keep up with the fast moving ML/DS space, I wanted a registry of curated content that I found useful and might want to refer to in the future. The large number of blogposts and the variety of content covered makes it difficult to manually tag entries and organize them in a principled manner.

Solution

I built a web app that takes as input a curated list of sources (blogs, newsletter, websites) and queries their RSS feeds. The HTML pages retrieved are parsed to retrieve <article> tags and be formatted to the hierarchical Article and Paragraphs schemas.

The text objects are embedded using the Contextionary Weaviate module. This stores both the raw text and the vector representation. Objects can be queried using the Weaviate GraphQL API. I built a GUI for for simple "keyword" and "concept" (vector) searches.

The embedding defined by the Contextionary can be queried to retrieve nearest neighbors. Also, concepts can be added to the Contextionary by writing a text definiton. Existing concepts can be modified with new definition to tune the embedding for your use case.

Query example

For the keyword query "future", the sorted results are:

""Validating your data with Hamilton"
"Deployment for Free -- A Machine Learning Platform for Stitch Fix's Data Scientists"
"Introduction to streaming for data scientists"

For the semantic query "future", the sorted results are:

"Explosion in 2021: Our Year in Review"
"7 reasons not to join a startup and 1 reason to"
"Machine Learning Tools Landscape v2 (+84 new tools)"

For the query "future" for both keyword + semantic:

"7 reasons not to join a startup and 1 reason to"
"Enabling Creative Expression with Concept Activation Vectors"
"Towards Helpful Robots: Grounding Language in Robotic Affordances"

Contextionary definition example

Default nearest neighbor for "future":

hopes
bringing
potential

Default nearest neighbor for "mlops:

"ops"
"yongsonpa"
"pluginbenchmarkaddon"

After defining "mlops" using wikipedia passages:

developed
relies
demonstrate (That's a step in the right direction!)

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
docs		docs
notebooks		notebooks
schema		schema
src		src
weaviate		weaviate
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Intelgenie - Semantic blog search engine [WIP]

Motivation

Solution

Query example

Contextionary definition example

About

Releases

Packages

Languages

zilto/intelgenie

Folders and files

Latest commit

History

Repository files navigation

Intelgenie - Semantic blog search engine [WIP]

Motivation

Solution

Query example

Contextionary definition example

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages