Welcome to the Java ONNX Embedding & Retrieval-Augmented Generation (RAG) Engine! This project showcases how to integrate modern AI models with legacy Java systems using ONNX. While most AI development today happens in Python, many enterprises still rely heavily on Java ecosystems. This solution bridges that gap, allowing seamless embedding generation and document retrieval using popular transformer models. This code/library was first developed for the InfiniteStack of SciCrop and is now open-sourced as a SciCrop Academy initiative.
- Language: Java 👷‍♂️
- AI Framework: DJL (Deep Java Library) with ONNX Runtime support
- Purpose: Generate text embeddings using transformer models (BERT, Sentence-BERT) and build a RAG system in Java
- Why Java? Enterprises have vast Java codebases. Integrating AI in Java reduces friction, leverages existing infrastructure, and avoids rewriting core services.
- ONNX Model Support: Load and run transformer models (e.g., BERT, Sentence-BERT) converted to ONNX format.
- Embedding Generation: Create vector representations of text for downstream NLP tasks.
- Semantic Search: Compare embeddings and rank documents by similarity using cosine similarity.
- Model Agnostic Design: Easily extend to other ONNX models like RoBERTa, DistilBERT, or custom fine-tuned models.
- Modular Code Structure:
  - BertEmbeddingEngine: Handles BERT-based models.
  - MpnetEmbeddingEngine: Supports Sentence-BERT models.
  - EmbeddingChecker: Ranks documents based on query similarity.
  - QueryEngine: Provides a CLI interface to run queries.
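The cosine similarity used for semantic search can be sketched in plain Java as follows. This is a minimal illustration and not the project's EmbeddingChecker code: the class name `CosineSimilarity`, the `float[]` representation, and the choice to return 0 for zero-length vectors are all assumptions made for the example.

```java
// Hypothetical helper for illustration; not part of this repository.
final class CosineSimilarity {
    private CosineSimilarity() {}

    /**
     * Returns the cosine of the angle between a and b:
     * dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero norm
     * (an assumption chosen here to avoid dividing by zero).
     */
    public static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            // Accumulate in double to limit rounding error on long vectors.
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Values near 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated vectors, and negative values indicate opposing directions.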
- Java 11 or higher
- Maven for dependency management
- ONNX Models (BERT, Sentence-BERT, etc.)
# Clone the project
$ git clone https://github.com/Scicrop/javaSentenceBertEmbedding
$ cd javaSentenceBertEmbedding
Ensure Maven is installed and run:
# Build the project and download dependencies
$ mvn clean install
- BERT Base (Uncased)
  - Download from Hugging Face: bert-base-uncased
  - Convert to ONNX:
    $ python3 -m optimum.exporters.onnx --model bert-base-uncased ./onnx_bert/
  - Place model.onnx, vocab.txt, and config.json in /opt/infinitestack/onnx_bert/
- Sentence-BERT (all-mpnet-base-v2)
  - Download from Hugging Face: all-mpnet-base-v2
  - Convert to ONNX:
    $ python3 -m optimum.exporters.onnx --model all-mpnet-base-v2 ./onnx_mpnet/
  - Place files in /opt/infinitestack/onnx_mpnet/
Place your text files in a directory (e.g., /tmp/sources/). Then run:
# For BERT embeddings
$ java -jar target/BertDataEmbedd.jar /tmp/sources /tmp/embeddings/ /opt/infinitestack/onnx_bert/
# For Sentence-BERT embeddings
$ java -jar target/MpnetDataEmbedd.jar /tmp/sources /tmp/embeddings/ /opt/infinitestack/onnx_mpnet/
Both commands save the generated embeddings as .json files in /tmp/embeddings/.
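A transformer model emits one vector per token, so producing a single embedding per text requires a pooling step. Sentence-BERT models such as all-mpnet-base-v2 are commonly paired with attention-masked mean pooling followed by L2 normalization. Whether the engines in this project pool exactly this way is not stated here, so the sketch below is an assumption, and the names `MeanPooling`, `meanPool`, and `l2Normalize` are hypothetical.

```java
// Hypothetical helper for illustration; not part of this repository.
final class MeanPooling {
    private MeanPooling() {}

    /**
     * Averages the token vectors whose attention-mask entry is non-zero.
     * tokenEmbeddings has shape [tokens][hiddenSize]; attentionMask has shape [tokens].
     */
    public static float[] meanPool(float[][] tokenEmbeddings, long[] attentionMask) {
        if (tokenEmbeddings.length != attentionMask.length) {
            throw new IllegalArgumentException("Mask length must match token count");
        }
        int hidden = tokenEmbeddings.length == 0 ? 0 : tokenEmbeddings[0].length;
        float[] pooled = new float[hidden];
        int counted = 0;
        for (int t = 0; t < tokenEmbeddings.length; t++) {
            if (attentionMask[t] == 0) {
                continue; // padding token: excluded from the average
            }
            for (int h = 0; h < hidden; h++) {
                pooled[h] += tokenEmbeddings[t][h];
            }
            counted++;
        }
        if (counted > 0) {
            for (int h = 0; h < hidden; h++) {
                pooled[h] /= counted;
            }
        }
        return pooled;
    }

    /** Returns a copy of v scaled to unit length; a zero vector is returned unchanged. */
    public static float[] l2Normalize(float[] v) {
        double sumSquares = 0.0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        double norm = Math.sqrt(sumSquares);
        float[] out = v.clone();
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }
}
```

Normalizing to unit length is optional when documents are compared by cosine similarity, but it keeps stored vectors on a common scale.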
Run a semantic search query against your embeddings:
$ java -jar target/QueryEngine.jar "how was the computer invented?" /tmp/embeddings/
This will output the most relevant document and its similarity score.
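Conceptually, the query step scores every stored document embedding against the query embedding and reports the best match. The sketch below shows that ranking on in-memory vectors, under stated assumptions: `DocumentRanker` and `Scored` are hypothetical names, and reading the .json files is left out because their layout is not described here. It targets Java 11, so it uses a small class rather than a record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of this repository.
final class DocumentRanker {
    private DocumentRanker() {}

    /** A document name paired with its similarity to the query. */
    public static final class Scored {
        public final String name;
        public final double score;

        Scored(String name, double score) {
            this.name = name;
            this.score = score;
        }
    }

    /** Returns all documents sorted by descending cosine similarity to the query. */
    public static List<Scored> rank(float[] query, Map<String, float[]> documents) {
        List<Scored> results = new ArrayList<>();
        for (Map.Entry<String, float[]> entry : documents.entrySet()) {
            results.add(new Scored(entry.getKey(), cosine(query, entry.getValue())));
        }
        results.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return results;
    }

    // Cosine similarity; yields 0 when either vector has zero norm.
    private static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }
}
```

The first element of the returned list corresponds to the most relevant document and its similarity score; taking the first k elements gives a top-k result for a RAG prompt.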
- We started with BERT to demonstrate basic embedding capabilities.
- Later, we migrated to Sentence-BERT (MPNet) for better semantic understanding and performance in RAG systems.
- Support for Additional Models: RoBERTa, DistilBERT, custom fine-tuned models.
- Vector Database Integration: Connect to Pinecone, FAISS, or Milvus for scalable retrieval.
- REST API: Expose embedding generation and query features as APIs.
- Spring Boot Integration: Embed this system in existing Java enterprise applications.
- Fine-Tuning Support: Integrate with ONNX Runtime Training for on-the-fly model updates.
In the world of AI-driven solutions, Python dominates. But many large enterprises have mission-critical applications written in Java. Rewriting these in Python is often not feasible due to cost, security, or compliance concerns.
By enabling RAG systems and semantic search in Java, this project:
- Brings state-of-the-art AI to legacy enterprise systems.
- Ensures robust performance using ONNX Runtime.
- Provides a pathway for future AI integrations without disrupting existing Java codebases.
Contributions are welcome! Feel free to open issues or submit PRs to enhance functionality, support more models, or improve performance.
This project is licensed under the Apache License.
Stay ahead in the AI revolution while leveraging your trusted Java infrastructure. Let's build intelligent, scalable, and enterprise-ready solutions together! 💡💪