Do you like this project? We love getting a star ⭐ and a shout-out 🗣️ from you in return! 🤗
AquilaDB is a vector database that stores feature vectors along with JSON document metadata. Do k-NN retrieval from anywhere, even from the darkest rifts of Aquila (in progress). It is dead simple to set up, language agnostic, and a drop-in addition to your machine learning applications. With its current feature set, AquilaDB is a ready-made solution for machine learning engineers and data scientists to build neural information retrieval applications out of the box with minimal dependencies (visit the wiki page for use case examples).
The AquilaDB 1.0 release is still a distant goal. Visit the contribute section below to see the detailed development plan and milestones.
We make sure that each release and the AquilaDB `master` branch are stable, with all planned features up to date. All new pull requests are made against the `develop` branch, so `develop` is the default, bleeding-edge branch with all the latest updates.
Github, Docker Hub, Documentation (dedicated Wiki page)
- If you are working on a data science project and need to store a huge amount of data and retrieve similar items based on a feature vector, this is a useful tool for you, with the extra benefits a real-world web application needs.
- Are you dealing with a lot of images and related metadata and want to find similar ones? You are in the right place.
- If you are looking for a document database, this is not the right place for you.
AquilaDB is not built from scratch. Thanks to the OSS community, it is based on a couple of cool open source projects out there. We took a couch and added some wheels and jetpacks to make it a super cool butt rest for data science engineers. While CouchDB provides networking and scalability benefits, FAISS and Annoy provide super-fast similarity search. Combined with our peer management service, AquilaDB offers a unique solution.
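To make the similarity-search layer concrete, here is a minimal brute-force sketch of what a k-NN query over stored feature vectors looks like. FAISS and Annoy do the same job far more efficiently with approximate index structures; the vectors and metadata below are made up purely for illustration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn_search(vectors, query, k=2):
    # Rank every stored vector against the query and keep the top k.
    # FAISS/Annoy avoid this O(n) linear scan with approximate indexes.
    ranked = sorted(range(len(vectors)), key=lambda i: -cosine(vectors[i], query))
    return ranked[:k]

# Toy "database": feature vectors paired with JSON-style metadata documents
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
metadata = [{"name": "a"}, {"name": "b"}, {"name": "c"}, {"name": "d"}]

ids = knn_search(vectors, [1.0, 0.05, 0.0], k=2)
nearest = [metadata[i] for i in ids]  # metadata of the two closest vectors
```

The point is the data model: every stored item is a vector plus a metadata document, and a query returns the metadata of the nearest vectors.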
You need Docker installed.
AquilaDB is quick to set up and run as a Docker container. All you need to do is either build it from source or pull it from Docker Hub.
- Clone this repository.
- Build the image:

```shell
docker build -t ammaorg/aquiladb:latest .
```

- Or pull the image:

```shell
docker pull ammaorg/aquiladb:latest
```

- Deploy:

```shell
docker run -d -i -p 50051:50051 -v "<local data persist directory>:/data" -t ammaorg/aquiladb:latest
```
We currently have multiple client libraries in progress to abstract the communication between a deployed AquilaDB and your applications.
AquilaDB exposes gRPC APIs to its clients, which means you can communicate with AquilaDB directly from your favourite language (see the API reference). The clients above use these APIs to abstract the communication details from the end user. If you are familiar with gRPC and would like to contribute a new client library in another language, please let us know. See the protocol buffers API reference and the example usage of the APIs in Node.js.
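As a rough illustration of what a client library does under the hood, a document sent to AquilaDB pairs a feature vector with a JSON metadata payload. The field names below are hypothetical stand-ins; the real message shape is defined in AquilaDB's protocol buffer schema (see the API reference):

```python
import base64
import json

def pack_document(vector, metadata):
    # Hypothetical wire shape for illustration only; the actual field
    # names and types come from AquilaDB's .proto definitions.
    return {
        "vector": list(vector),                 # the feature vector
        "b64data": base64.b64encode(            # JSON metadata, base64-encoded
            json.dumps(metadata).encode("utf-8")
        ).decode("ascii"),
    }

doc = pack_document([0.1, 0.2, 0.3], {"name": "example", "tag": "demo"})
```

A generated gRPC stub would then serialize a structure like this and send it to the server on port 50051.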
For benchmark results, visit https://aquiladb.xyz/docs/adb-benchmarks
This project is still under active development (pre-release). It can be used as a standalone database now. The peer manager is a work in progress, so networking capabilities are not yet available. With release v1.0 we will ship a pre-optimized version of AquilaDB.
We have prepared a document to help anyone interested in contributing get started with AquilaDB immediately.
Here is our high level release roadmap.
We have started meeting developers and giving short talks on AquilaDB. Here are the slides we use on those occasions: http://bit.ly/AquilaDB-slides
Video:
With the current AquilaDB release features, you can build neural information retrieval applications out of the box without any external dependencies. Here are some useful links to learn more about it and start building:
- These use case examples will give you an understanding of what is and isn't possible: https://github.com/a-mma/AquilaDB/wiki
- Microsoft has published a paper and a YouTube video on this topic to onboard anyone interested:
- Embeddings for Everything: Search in the Neural Network Era: https://www.youtube.com/watch?v=JGHVJXP9NHw
- Autoencoders are one class of deep learning models that help you build semantic vectors, the foundation of neural information retrieval. Here are some links to autoencoder-based IR:
- Note that the idea of information retrieval applies not only to text but to any kind of data. All you need to do is encode the source datatype into a dense vector with a deep neural network.
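As a toy stand-in for such an encoder (a real system would use a trained neural network, for example an autoencoder's bottleneck layer), here is a sketch that maps arbitrary text to a fixed-size dense vector by hashing tokens into buckets:

```python
import hashlib

def toy_encode(text, dim=8):
    # Stand-in for a learned encoder: buckets tokens into a fixed-size
    # dense vector via hashing. A real neural encoder would capture
    # semantics; this only illustrates the "anything -> dense vector" idea.
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]  # L2-normalized, ready for cosine k-NN

embedding = toy_encode("neural information retrieval")
```

Whatever encoder you choose, the output is a dense vector of fixed dimension, which is exactly what AquilaDB stores and searches over.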
Apache License 2.0 license file
created with ❤️ by a-mma.indic (a_മ്മ)