Semantic Chunking Project

Welcome to the Semantic Chunking Project, a research initiative focused on developing and evaluating methods for semantic chunking. Our goal is to improve how natural language is understood and processed through better chunking techniques.

Project Structure

The project is organized into several key components, each designed to facilitate efficient research and development:

Notebooks

Location: notebooks/lab_book.ipynb
Purpose: This Jupyter notebook contains detailed records of experiments, including data analysis, algorithm testing, and results evaluation.

Codebase

Package: chroma_research
Example Usage: See main.py for examples that use the package and show how to apply our methods to your own datasets.

Data

Location: data/ (at the repository root)
Details: All databases and datasets utilized in our research are stored here, ensuring easy access and reproducibility.

Documentation

Notes: notes.md
Contents: References, bibliographic details, and sources of inspiration that guided our research methodology.

Technical Specifications

Python Version: 3.11.8

Please ensure that your development environment matches the specified Python version to avoid compatibility issues.
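As a quick way to confirm the interpreter before running anything, a small check along the following lines may help. This is a minimal sketch and not part of the repository: the `check_python` helper and its three status strings are hypothetical names introduced here, and treating a same-minor, different-patch interpreter as a softer mismatch is an assumption rather than a stated project policy.

```python
import sys

# Version stated in this README's Technical Specifications.
REQUIRED = (3, 11, 8)


def check_python(current=None, required=REQUIRED):
    """Classify a (major, minor, patch) tuple against the required version.

    Returns "ok" on an exact match, "patch-mismatch" when only the patch
    level differs, and "incompatible" otherwise. Hypothetical helper.
    """
    # Fall back to the running interpreter when no version is passed in.
    current = tuple(current or sys.version_info[:3])
    if current == required:
        return "ok"
    if current[:2] == required[:2]:
        return "patch-mismatch"
    return "incompatible"


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    wanted = ".".join(map(str, REQUIRED))
    print(f"Python {found} (project specifies {wanted}): {check_python()}")
```

Running the file directly prints the detected version alongside the specified one, so a mismatch is visible before any dependencies are installed.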

Getting Started

To begin using the Semantic Chunking Project, clone the repository and open the files described above. Installation instructions and additional documentation can be found within each component's files.

We encourage contributions and feedback on our project to continuously improve and push the boundaries of semantic chunking research. Feel free to fork the repository, suggest changes, or discuss your ideas with us.

