This is the original repo code developed as part of Chroma DB's Semantic Chunking paper.

brandonstarxel/semantic-chunker

Latest commit: 999a08b by Brandon, Jun 27, 2024 (20 commits)

Semantic Chunking Project

Welcome to the Semantic Chunking Project, a research project on developing and evaluating methods for semantic chunking. Our goal is to improve the understanding and processing of natural language through better chunking techniques.

Project Structure

The project is organized into the following components:

Notebooks

  • Location: notebooks/lab_book.ipynb
  • Purpose: This Jupyter notebook contains detailed records of experiments, including data analysis, algorithm testing, and results evaluation.

Codebase

  • Package: chroma_research
  • Example Usage: See main.py for examples that use the package and show how to apply its methods to your own datasets.

Data

  • Location: /data
  • Details: All databases and datasets utilized in our research are stored here, ensuring easy access and reproducibility.

Documentation

  • Notes: notes.md
  • Contents: References, bibliographic details, and sources of inspiration that guided our research methodology.

Technical Specifications

  • Python Version: 3.11.8

Please ensure that your development environment matches the specified Python version to avoid compatibility issues.
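A quick way to confirm the interpreter before running anything is a small check like the one below. It is only a sketch: the helper name version_matches is hypothetical and not part of the project, and it assumes that matching the major and minor version (3.11) is enough, which this README does not state.

```python
import sys

# Version stated in this README is 3.11.8; this sketch compares only
# major.minor, on the assumption that patch releases are compatible.
REQUIRED = (3, 11)


def version_matches(current=None, required=REQUIRED):
    """Return True if the major.minor of `current` equals `required`.

    `current` defaults to the running interpreter's version.
    """
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if version_matches():
        print(f"Python {found}: matches the 3.11 series")
    else:
        print(f"Python {found}: this README specifies 3.11.8")
```

To require the exact patch release instead, compare the first three fields against (3, 11, 8).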

Getting Started

To begin using the Semantic Chunking Project, clone the repository and open the files described above. Installation instructions and additional documentation can be found within each component's files.
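After cloning, a short script can confirm that the files and folders named under Project Structure are present. This is a sketch under assumptions: the helper missing_paths is hypothetical, "/data" is read as the data directory at the repository root, and chroma_research and main.py are assumed to sit at the root as well.

```python
from pathlib import Path

# Paths named in this README, taken relative to the repository root.
# Assumption: "/data" means the `data` directory at the root, and
# `chroma_research` and `main.py` are also located at the root.
EXPECTED_PATHS = [
    "notebooks/lab_book.ipynb",
    "chroma_research",
    "main.py",
    "data",
    "notes.md",
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the root of the cloned repository.
    missing = missing_paths(".")
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All documented paths are present.")
```

Run it from the root of the clone; any path it reports as missing may simply be located elsewhere than assumed here.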


We encourage contributions and feedback on our project to continuously improve and push the boundaries of semantic chunking research. Feel free to fork the repository, suggest changes, or discuss your ideas with us.
