feat(knowledge-base): implement course material knowledge base #7690

Draft: wants to merge 18 commits into base: master

@Jonaspng Jonaspng commented Dec 9, 2024

This PR lays the groundwork for Retrieval Augmented Generation (RAG) in Coursemology by introducing a knowledge base for course materials (currently only PDF and TXT files are supported).

Setup Note

Text chunking requires two things:

  1. An OpenAI API key in the env file
  2. The pgvector extension installed in PostgreSQL
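
A minimal sketch of how these two prerequisites could be checked before chunking starts. The method name `missing_prerequisites` and the variable name `OPENAI_API_KEY` are assumptions for illustration, not taken from this PR; `vector` is the name under which pgvector registers itself in PostgreSQL.

```ruby
# Hypothetical preflight check. OPENAI_API_KEY is an assumed variable name;
# the PR does not state which key the env file uses.
REQUIRED_ENV_KEY = 'OPENAI_API_KEY'
REQUIRED_EXTENSION = 'vector' # pgvector is created with `CREATE EXTENSION vector`

# env: a Hash-like object such as ENV
# installed_extensions: extension names, e.g. the result of
#   SELECT extname FROM pg_extension
def missing_prerequisites(env, installed_extensions)
  missing = []
  missing << "#{REQUIRED_ENV_KEY} is not set" if env[REQUIRED_ENV_KEY].to_s.strip.empty?
  missing << 'pgvector extension is not installed' unless installed_extensions.include?(REQUIRED_EXTENSION)
  missing
end
```

An empty result means both prerequisites are in place; otherwise each entry names what is missing.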

Todos:

  • Test cases
  • A component for RAG, so that it can be turned on and off in component settings
  • Frontend suggestion: do file type checking in the frontend instead of the backend
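
For reference, the file type check mentioned in the last item could look roughly like the sketch below. It assumes the check is based on the file extension only; `SUPPORTED_EXTENSIONS` and `chunkable_file?` are hypothetical names, and the actual check in this PR may inspect the file differently.

```ruby
# Hypothetical helper: only PDF and TXT files are currently supported.
SUPPORTED_EXTENSIONS = %w[.pdf .txt].freeze

# Returns true when the filename ends in a supported extension (case-insensitive).
def chunkable_file?(filename)
  SUPPORTED_EXTENSIONS.include?(File.extname(filename.to_s).downcase)
end
```

The same rule is simple enough to mirror in the frontend, with the backend check kept as a safeguard.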

- pgvector for psql to support vector storage and operations
- neighbor for easier db migrations involving vectors
- langchainrb and ruby-openai for LLM services
- pdf-reader for reading text from PDFs
- openai api key is not complete
- initialise LLM models that will be used in code
- LANGCHAIN_OPENAI model is used for normal RAG operations
- RAGAS (Retrieval Augmented Generation Assessment) model is used for evaluation of RAG
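
To illustrate what "vector storage and operations" means for retrieval, here is a plain-Ruby sketch of cosine-distance nearest-neighbour search over embeddings. In the PR this work is delegated to pgvector in the database; the method names and the hash shape of a chunk below are assumptions for illustration only, and embeddings are assumed to be non-zero vectors of equal length.

```ruby
# Cosine distance between two equal-length, non-zero vectors:
# 0.0 means same direction, 1.0 means orthogonal.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Returns the `limit` chunks closest to the query embedding, nearest first.
# Each chunk is assumed to be a Hash with an :embedding array.
def nearest_chunks(query_embedding, chunks, limit: 3)
  chunks.min_by(limit) { |chunk| cosine_distance(query_embedding, chunk[:embedding]) }
end
```

A database-side index avoids scanning every chunk as this sketch does, which is the main reason to keep the embeddings in pgvector rather than in application memory.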
@cysjonathan cysjonathan changed the title Jonas/course material knowledge base feat(knowledge-base): implement course material knowledge base Dec 10, 2024
@Jonaspng Jonaspng force-pushed the jonas/course-material-knowledge-base branch from d6e8da5 to 3035b4d Compare January 5, 2025 14:42
- add course_material_text_chunks which belongs to course_materials
- add course_material_text_chunkings which belongs to course_materials and trackable_jobs
- add workflow_state column to course_materials table.
- add text_chunk model to represent segments of material after chunking, including content and associated embeddings
- add text_chunking model to represent trackable text_chunking jobs
- trackable job that tracks text chunking of course material
- add create_text_chunks, which creates text chunks from a material
- add destroy_text_chunks, which destroys a material's text chunks
- modify update so that if the file contents are updated (i.e. a new file is uploaded), the text chunks of the previous file are destroyed
- modify destroy so that a material cannot be deleted while its text chunking job is still running
- only course owner or manager will be allowed to manage text chunks
- add workflow state to material model
- material has_many text_chunks and has_one text_chunking
- add chunking service that handles the chunking of text and file
- add llm service that wraps the LLM-backed operations: text embedding and image captioning
- update material view to include workflow state
- update folder and subfolder permission view to include canManageKnowledgeBase
- create_text_chunks handles creation of text_chunks
- destroy_text_chunks handles deletion of text_chunks
- switch that creates or destroys course material text chunks
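
The lifecycle described in the commits above can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: the class `MaterialSketch`, the state names (`:not_chunked`, `:chunking`, `:chunked`), and the fixed-size overlapping window are all hypothetical, and the real chunking service, models, and trackable job in this PR may work differently.

```ruby
# Illustrative stand-in for a course material; not the PR's actual model.
class MaterialSketch
  attr_reader :workflow_state, :text_chunks

  def initialize
    @workflow_state = :not_chunked # assumed state name
    @text_chunks = []
  end

  # Marks the chunking job as started.
  def start_chunking
    @workflow_state = :chunking
  end

  # Splits text into windows of `size` characters, each sharing
  # `overlap` characters with the previous one (sizes are illustrative).
  def create_text_chunks(text, size: 200, overlap: 50)
    raise ArgumentError, 'overlap must be smaller than size' unless overlap < size

    chunks = []
    start = 0
    while start < text.length
      chunks << text[start, size]
      break if start + size >= text.length # last window reached the end

      start += size - overlap
    end
    @text_chunks = chunks
    @workflow_state = :chunked
  end

  def destroy_text_chunks
    @text_chunks = []
    @workflow_state = :not_chunked
  end

  # A new file upload invalidates chunks built from the previous file.
  def replace_file
    destroy_text_chunks
  end

  # Deletion is blocked while a chunking job is still running.
  def destroyable?
    @workflow_state != :chunking
  end
end
```

Overlapping windows are a common way to keep a sentence that straddles a boundary retrievable from at least one chunk; whether this PR uses overlap is not stated.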
@Jonaspng Jonaspng force-pushed the jonas/course-material-knowledge-base branch from 3035b4d to c9985ff Compare January 5, 2025 14:56
Integrate new backend API changes by updating the UI components and related files:
- Updated `store.ts` to handle new state management for the backend changes
- Modified `operations.ts` to incorporate new API calls
- Adjusted types in `types.ts` to reflect backend schema changes
- Updated React components to work with the updated state and API logic
- add course_material_text_chunk_references table
- update course_material_text_chunks table
@Jonaspng Jonaspng force-pushed the jonas/course-material-knowledge-base branch 2 times, most recently from fba8f7f to 8895a12 Compare January 11, 2025 13:28
@Jonaspng Jonaspng force-pushed the jonas/course-material-knowledge-base branch from 8895a12 to c62a9a1 Compare January 11, 2025 14:08