Merge pull request #19 from Daethyra/2.0.0-st-refactorization
Update to 2.0.0 (Streamlit fork) | total refactorization + documentation overhaul
Daethyra authored Feb 12, 2024
2 parents c42e747 + e45c43d commit 0a3bcb2
Showing 5 changed files with 397 additions and 150 deletions.
84 changes: 84 additions & 0 deletions .github/workflows/codeql.yml
@@ -0,0 +1,84 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  #push:
    #branches: [ "main", "streamlit" ]
  pull_request:
    branches: [ "main", "streamlit" ]
  schedule:
    - cron: '15 23 * * 3'

jobs:
  analyze:
    name: Analyze
    # Runner size impacts CodeQL analysis time. To learn more, please see:
    #   - https://gh.io/recommended-hardware-resources-for-running-codeql
    #   - https://gh.io/supported-runners-and-hardware-resources
    #   - https://gh.io/using-larger-runners
    # Consider using larger runners for possible analysis time improvements.
    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
    timeout-minutes: ${{ (matrix.language == 'swift' && 120) || 360 }}
    permissions:
      # required for all workflows
      security-events: write

      # only required for workflows in private repositories
      actions: read
      contents: read

    strategy:
      fail-fast: false
      matrix:
        language: [ 'python' ]
        # CodeQL supports [ 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift' ]
        # Use only 'java-kotlin' to analyze code written in Java, Kotlin or both
        # Use only 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
    - name: Checkout repository
      uses: actions/checkout@v4

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v3
      with:
        languages: ${{ matrix.language }}
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.

        # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
        # queries: security-extended,security-and-quality


    # Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v3

    # ℹ️ Command-line programs to run using the OS shell.
    # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

    #   If the Autobuild fails above, remove it and uncomment the following three lines.
    #   modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.

    # - run: |
    #     echo "Run, Build Application using script"
    #     ./location_of_script_within_repo/buildscript.sh

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v3
      with:
        category: "/language:${{matrix.language}}"
73 changes: 68 additions & 5 deletions README.md
@@ -1,15 +1,78 @@
# FreeStream

## Description
With this project I aim to build a reliable chatbot for my professional friends in law and medicine. I aim to create a solution that is as privacy-friendly as possible. However, the project currently offers only GPT-3.5-Turbo, as it is the easiest and most consistent model to use in development testing. While OpenAI doesn't train models on data sent to the API, this solution isn't finished, since no fully privacy-friendly model is installed yet.
Providing AI solutions for everyday people

***TLDR***:
- Free access to generative AI models
- Unlimited file uploads (per user-session; deletes data on exiting the web page)
- Leverage state-of-the-art RAG techniques for accurate, helpful text generation without managing the underlying prompt engineering
- No AI model training on your data
- No sign-up or login required

## Table of Contents

- [Quickstart](#quickstart)
- [Installation](#installation)
- [Description](#description)
- [Vocabulary](#critical-vocabulary)
- [Current Functionality](#what-can-freestream-do-for-me-currently)
- [Future Functionality Plans](#future-functionality-plans)

## Quickstart

As of version 1.0.1, a test version is hosted on Streamlit Community Cloud [here](https://freestream.streamlit.app/ "Version 2.0.0").

### Installation

This project uses `poetry` for dependency management because that's what Streamlit Community Cloud uses to deploy the project.

To install the project's dependencies in a virtual environment using poetry, run:

```bash
poetry install
```

I wanted to build something that truly helps people, doesn't cost them any money, and doesn't require signing up for an account.
You can then start the development server with hot reloading by running:

```bash
poetry run streamlit run ./freestream/main.py
```

---

### Drawbacks:
## Description
The original inspiration for this project was to create a chatbot for friends in law and medicine, but I quickly realized the system should be flexible enough to serve in any domain.

#### -- **Critical Vocabulary** --

| **Vocab** | **Definition** |
| ---- | ---------- |
| RAG | Retrieval Augmented Generation |
| C-RAG | Corrective-Retrieval Augmented Generation |
| Self-RAG | Self-reflective Retrieval Augmented Generation |

### What can FreeStream do for me, currently?

Right now, FreeStream is essentially a chatbot powered by GPT-3.5-Turbo that requires you to upload one or more files before you interact with it. It applies a state-of-the-art prompt-engineering flow that helps ensure the best results are retrieved from your uploaded files.

#### Things worth noting:
- Currently only supports the GPT-3.5-Turbo model
- The implemented `qa_chain` forces answers to be based on the context, and the context only. This makes it difficult to interact with the chat history in a meaningful way
- The implemented RAG chain forces answers to be based on the context, and the context only. This makes it difficult to interact with the chat history in a nuanced, meaningful way.
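
To illustrate the context-only constraint noted above, here is a minimal sketch in plain Python. It is not the project's actual chain: `tokenize`, `retrieve`, and `build_prompt` are hypothetical names, and word overlap stands in for a real vector-store search. It shows why a follow-up about the conversation itself tends to fall flat: if nothing in the uploaded files matches the question, the prompt has no context to answer from.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Return up to k chunks that share at least one word with the question.

    Word overlap is a toy stand-in for a real similarity search (assumption).
    """
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(c)), i, c) for i, c in enumerate(chunks)]
    # Highest overlap first; ties keep the original chunk order.
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))
    return [chunk for score, _, chunk in ranked[:k] if score > 0]


def build_prompt(context: list[str], question: str) -> str:
    """Build a prompt that restricts the model to the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context) or "(no relevant context found)"
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


# Hypothetical file contents, split into chunks.
chunks = [
    "The lease term is twelve months.",
    "Rent is due on the first of each month.",
    "Pets are not allowed.",
]
question = "When is rent due?"
print(build_prompt(retrieve(chunks, question, k=1), question))
```

A question such as "What did I ask earlier?" shares no words with these chunks, so `retrieve` returns an empty list and the prompt carries no usable context.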

## Roadmapping Out Loud
The current focus is to overhaul the retrieval prompting by removing `ConversationalRetrievalChain`. It hard-codes the logical flow of prompting, retrieving, prompting again, and responding to the user, which makes it difficult to get nuanced answers. The fix is to implement LangGraph so that I control the "nodes" and "edges" myself. That gives me direct control over how the AI makes its decisions, which should make the generated responses considerably more helpful and pertinent to the query.
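
As a rough picture of what "nodes" and "edges" mean here, the sketch below hand-rolls a tiny graph runner in plain Python. It deliberately does not use LangGraph's API. Every name in it (`run_graph`, `retrieve`, `rewrite`, `generate`, `after_retrieve`, `fallback_question`) is hypothetical, and the rewrite step is a stub standing in for an LLM call. Nodes are functions that update a shared state, and edges are functions that pick the next node, so the flow is no longer fixed to one prompt-retrieve-respond sequence.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
END = "END"


def run_graph(
    nodes: Dict[str, Callable[[State], State]],
    edges: Dict[str, Callable[[State], str]],
    start: str,
    state: State,
    max_steps: int = 10,
) -> State:
    """Run nodes one at a time, letting each edge choose the next node."""
    current, steps = start, 0
    while current != END:
        if steps >= max_steps:
            raise RuntimeError("graph did not reach END within max_steps")
        state = nodes[current](state)
        current = edges[current](state)
        steps += 1
    return state


def retrieve(state: State) -> State:
    # Toy retrieval: keep documents sharing a word with the question.
    words = set(state["question"].lower().split())
    hits = [d for d in state["docs"] if words & set(d.lower().split())]
    return {**state, "hits": hits, "trace": state["trace"] + ["retrieve"]}


def rewrite(state: State) -> State:
    # Stub for an LLM-based query rewrite (assumption for illustration).
    return {
        **state,
        "question": state["fallback_question"],
        "rewritten": True,
        "trace": state["trace"] + ["rewrite"],
    }


def generate(state: State) -> State:
    # Stub for answer generation: echo the best hit, or admit ignorance.
    answer = state["hits"][0] if state["hits"] else "I don't know."
    return {**state, "answer": answer, "trace": state["trace"] + ["generate"]}


def after_retrieve(state: State) -> str:
    # Conditional edge: retry once with a rewritten question if nothing matched.
    if state["hits"] or state.get("rewritten"):
        return "generate"
    return "rewrite"


nodes = {"retrieve": retrieve, "rewrite": rewrite, "generate": generate}
edges = {
    "retrieve": after_retrieve,
    "rewrite": lambda s: "retrieve",
    "generate": lambda s: END,
}

result = run_graph(nodes, edges, "retrieve", {
    "question": "payment deadline",
    "fallback_question": "rent due",
    "docs": ["rent is due on the first of each month"],
    "trace": [],
})
print(result["trace"], result["answer"])
```

The conditional edge after `retrieve` is the part a hard-coded chain cannot express: the graph decides at run time whether to answer or to rewrite the query and try again.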

## Future Functionality Plans

- [x] Create a RAG chatbot
- [ ] Add Gemini-Pro to the model list
- [ ] Add AI decision making
- [ ] Implement Corrective-RAG
- [ ] Turn into a Multi-Page Application (MPA)
  - [ ] (Homepage) Add a welcome screen with a description and table of contents
  - [ ] (Page) Migrate RAG SPA code
  - [ ] (Page) Add habit tracking spreadsheet template with visualization feature
  - [ ] (Page) Add a "Task Transcriber" page that transcribes the user's speech into a task outlined with details and requirements, along with action steps to enact the plan

---
