Commit

uhhh
rmusser01 committed Dec 18, 2024
1 parent edf8558 commit 97df047
Showing 18 changed files with 579 additions and 625 deletions.
2 changes: 2 additions & 0 deletions App_Function_Libraries/Gradio_UI/Video_transcription_tab.py
@@ -956,6 +956,8 @@ def process_url_with_metadata(input_item, num_speakers, whisper_model, custom_pr
def toggle_confabulation_output(checkbox_value):
    return gr.update(visible=checkbox_value)



confab_checkbox.change(
    fn=toggle_confabulation_output,
    inputs=[confab_checkbox],
556 changes: 0 additions & 556 deletions App_Function_Libraries/RAG/RAG_Examples.md

This file was deleted.

342 changes: 342 additions & 0 deletions App_Function_Libraries/Web_Scraping/Search_Prompt.py

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion Docs/Design/Coding_Page.md
@@ -12,7 +12,7 @@ https://github.com/simonw/files-to-prompt
https://github.com/yamadashy/repomix/tree/main
https://github.com/chanhx/crabviz


https://github.com/charmandercha/ArchiDoc
https://pythontutor.com/c.html#mode=edit
https://pythontutor.com/articles/c-cpp-visualizer.html

2 changes: 1 addition & 1 deletion Docs/Design/Creative_Writing.md
@@ -7,7 +7,7 @@

### Link Dump:
https://github.com/p-e-w/arrows

https://huggingface.co/jukofyork/creative-writing-control-vectors-v3.0



5 changes: 5 additions & 0 deletions Docs/Design/DB_Design.md
@@ -7,6 +7,11 @@
- [Interesting/Relevant Later](#interesting-relevant-later)



SQLite
https://highperformancesqlite.com/watch/dot-commands
https://www.youtube.com/watch?v=XP-h304N06I

Migrating to sqlite-vec
https://www.youtube.com/live/xmdiwdom6Vk?t=1740s
https://alexgarcia.xyz/blog/2024/sqlite-vec-metadata-release/index.html
42 changes: 22 additions & 20 deletions Docs/Design/ETL_Pipeline.md
@@ -8,39 +8,41 @@ This page serves as documentation regarding the ETL pipelines within tldw and pr
## ETL Pipelines

### Data Sources
-
- **Audio**
- f
- faster_whisper
- pyaudio
- **Ebooks (epub)**
- f
- ebooklib
- **PDFs**
- Docling
- pymupdf4llm
- **Plain Text (`.md`, `.txt`)**
- f
- **Podcasts**
- f
- stdlib
- **PowerPoint Presentations** - need to add
- docling
- **Rich Text (`.rtf`, `.docx`)**
- f
- doc2txt
- pypandoc
- **RSS Feeds**:
- f
- **Videos**
- f
- **Websites**:
- f
- playwright
- bs4
- requests
- **XML Files**
- f'



- xml.etree.ElementTree
- **3rd-Party Services**
- Sharepoint
* https://llamahub.ai/l/readers/llama-index-readers-microsoft-sharepoint
*
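The Data Sources list pairs each input type with candidate libraries. A minimal sketch of how an ETL step might dispatch on file extension, using only the standard library for the plain-text and XML cases (`xml.etree.ElementTree`, as listed above). The function names and the extension-to-extractor registry are hypothetical, not existing tldw code; the other formats (PDF, EPUB, DOCX, ...) would plug their listed libraries into the same registry.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Dict


def extract_plain_text(path: Path) -> str:
    """Plain text (.md, .txt): read the file as-is."""
    return path.read_text(encoding="utf-8")


def extract_xml_text(path: Path) -> str:
    """XML: collect every text node, one non-empty stripped chunk per line."""
    root = ET.parse(path).getroot()
    chunks = (chunk.strip() for chunk in root.itertext())
    return "\n".join(chunk for chunk in chunks if chunk)


# Hypothetical registry: extension -> extractor. Other sources (PDF via
# pymupdf4llm, EPUB via ebooklib, ...) would be registered the same way.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {
    ".md": extract_plain_text,
    ".txt": extract_plain_text,
    ".xml": extract_xml_text,
}


def extract_text(path: Path) -> str:
    """Pick an extractor by (case-insensitive) file extension."""
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        raise ValueError(f"No extractor registered for {path.suffix!r}")
    return extractor(path)
```

Keeping the registry as a plain dict means adding a source type is one function plus one entry, and unsupported files fail loudly instead of being silently skipped.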

### Tools
https://github.com/ucbepic/docetl
https://ucbepic.github.io/docetl/concepts/optimization/

### Links
https://arxiv.org/html/2410.21169

### Link Dump:
https://arxiv.org/abs/2410.12189
https://ucbepic.github.io/docetl/concepts/optimization/
https://arxiv.org/abs/2410.21169
https://towardsdatascience.com/etl-pipelines-in-python-best-practices-and-techniques-0c148452cc68
https://arxiv.org/html/2410.21169v2
https://github.com/whyhow-ai/knowledge-table
https://github.com/yobix-ai/extractous
https://llamahub.ai/l/readers/llama-index-readers-microsoft-sharepoint
6 changes: 6 additions & 0 deletions Docs/Design/Prompts.md
@@ -0,0 +1,6 @@
# Prompts & Prompt Engineering

### Link Dump:
https://github.com/PySpur-Dev/PySpur
https://github.com/itsPreto/tangent
https://github.com/LouisShark/chatgpt_system_prompt
59 changes: 59 additions & 0 deletions Docs/Design/Researcher.md
@@ -10,6 +10,65 @@ https://arxiv.org/abs/2411.15114
https://journaliststudio.google.com/pinpoint/about/
https://blog.google/products/gemini/google-gemini-deep-research/
https://github.com/neuml/annotateai
https://pub.towardsai.net/learn-anything-with-ai-and-the-feynman-technique-00a33f6a02bc
https://help.openalex.org/hc/en-us/articles/24396686889751-About-us
https://www.ginkgonotes.com/
https://www.reddit.com/r/Anki/comments/17u01ge/spaced_repetition_algorithm_a_threeday_journey/
https://github.com/open-spaced-repetition/fsrs4anki/wiki/Spaced-Repetition-Algorithm:-A-Three%E2%80%90Day-Journey-from-Novice-to-Expert#day-3-the-latest-progress


### Ideas

Follow GPT Researcher's method at first: planner LLM -> query LLM -> analyzer LLM -> summarizer LLM.
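A minimal sketch of that four-stage chain, assuming each stage is simply a prompt-in/text-out callable. The `LLM`/`Search` types, the stage prompts, and `run_research` are hypothetical placeholders, not tldw or GPT Researcher code; real stages would call whichever models the researcher config selects, and `search` stands in for the web search step.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage interfaces: a model call and a search call.
LLM = Callable[[str], str]
Search = Callable[[str], List[str]]


@dataclass
class ResearchRun:
    topic: str
    subtopics: List[str] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)
    report: str = ""


def run_research(topic: str, planner: LLM, querier: LLM, analyzer: LLM,
                 summarizer: LLM, search: Search, max_subtopics: int = 4) -> ResearchRun:
    run = ResearchRun(topic)
    # 1. Planner: split the topic into subtopics, one per line.
    plan = planner(f"List research subtopics for: {topic}")
    run.subtopics = [s.strip() for s in plan.splitlines() if s.strip()][:max_subtopics]
    for sub in run.subtopics:
        # 2. Query LLM: turn each subtopic into a search query, then search.
        query = querier(f"Write one web search query for: {sub}")
        results = search(query)
        # 3. Analyzer: pull the relevant parts out of the raw results.
        run.findings.append(analyzer(f"Subtopic: {sub}\nResults:\n" + "\n".join(results)))
    # 4. Summarizer: merge the per-subtopic findings into a single report.
    run.report = summarizer("Summarize these findings:\n" + "\n\n".join(run.findings))
    return run
```

Passing the stages in as plain callables keeps the pipeline testable with stubs and lets each stage use a different model.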



Researcher config section
```
[researcher]
# Researcher settings
default_search_engine = google
# Options are: google, bing, yandex, baidu, searx, kagi, serper, tavily
default_search_type = web
# Options are: web, local, both
default_search_language = en
# Options are: FIXME
default_search_report_language = en
# Options are: FIXME
default_search_sort = relevance
# Options are: relevance, date
default_search_safe_search = moderate
# Options are: off, moderate, strict
default_search_planner = openai-o1-full
# Options are: FIXME
default_search_planner_max_tokens = 8192
default_search_analyzer = openai-o1-full
# Options are: FIXME
default_search_analyzer_max_tokens = 8192
default_search_summarization = openai-o1-full
# Options are: FIXME
default_search_summarization_max_tokens = 8192
search_max_results = 100
search_report_format = markdown
# Options are: markdown, html, pdf
search_max_iterations = 5
search_max_subtopics = 4
search_custom_user_agent = "CUSTOM_USER_AGENT_HERE"
```
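A sketch of reading that `[researcher]` section with Python's standard `configparser`, assuming it lives in an INI-style config file as written above. The `load_researcher_settings` helper and the allowed-value table are illustrative; only options whose allowed values are actually listed (not `FIXME`) are validated. Note that `configparser` keeps surrounding quotes, so the user-agent value is stripped of them explicitly.

```python
import configparser

# Allowed values copied from the "# Options are:" comments; FIXME entries are skipped.
ALLOWED = {
    "default_search_engine": {
        "google", "bing", "yandex", "baidu", "searx", "kagi", "serper", "tavily",
    },
    "default_search_type": {"web", "local", "both"},
    "default_search_sort": {"relevance", "date"},
    "default_search_safe_search": {"off", "moderate", "strict"},
    "search_report_format": {"markdown", "html", "pdf"},
}
INT_KEYS = ("search_max_results", "search_max_iterations", "search_max_subtopics")


def load_researcher_settings(text: str) -> dict:
    parser = configparser.ConfigParser()  # "#" lines are treated as comments
    parser.read_string(text)
    section = parser["researcher"]
    # Strip literal quotes such as "CUSTOM_USER_AGENT_HERE".
    settings = {key: value.strip('"') for key, value in section.items()}
    for key, allowed in ALLOWED.items():
        if key in settings and settings[key] not in allowed:
            raise ValueError(f"{key}={settings[key]!r}; expected one of {sorted(allowed)}")
    # Numeric limits are converted explicitly instead of staying strings.
    for key in INT_KEYS:
        if key in settings:
            settings[key] = section.getint(key)
    return settings
```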





### Researcher Workflow



### Researcher Prompts







62 changes: 42 additions & 20 deletions Docs/Design/Structured_Outputs.md
@@ -16,27 +16,49 @@ This page serves as documentation regarding the structured outputs within tldw a
- .toml file
-
2. Data Extraction
- https://github.com/yobix-ai/extractous
- Can use structured outputs for data extraction from unstructured text. Why isn't this discussed, or even mentioned, in any of the papers on RAG or write-ups of RAG implementations?
3. Data Generation
- Can use structured outputs for data generation from unstructured text.
- Could come in handy for RPGs/Text-based games reliant on world building/lore generation.
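Whichever tool enforces the schema during generation (Instructor, Outlines, XGrammar, etc. from the Tools list), the extraction side still ends up parsing model JSON into typed records and rejecting anything malformed. A stdlib-only sketch of that validation step; the `Character` schema (a lore-generation example) and `parse_characters` are hypothetical, and in practice Pydantic would usually replace the hand-written checks.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Character:
    """Hypothetical lore-generation schema: one record per extracted character."""
    name: str
    role: str
    age: int


def parse_characters(raw: str) -> List[Character]:
    """Parse a JSON array emitted by the model into Character records.

    Raises ValueError (json.JSONDecodeError is a subclass) on malformed output.
    """
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    expected = {f.name: f.type for f in fields(Character)}
    records = []
    for item in data:
        if not isinstance(item, dict) or set(item) != set(expected):
            raise ValueError(f"unexpected keys in {item!r}")
        for name, typ in expected.items():
            value = item[name]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, typ):
                raise ValueError(f"{name} should be {typ.__name__}, got {value!r}")
        records.append(Character(**item))
    return records
```

Failing on the first bad record keeps the contract strict; a retry loop could re-prompt the model with the error message.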


### Implementation
- Integration for file creation
- Look at using for ETL pipeline
- Support/integration for content creation pipelines for RPG campaigns, etc.


Process
https://python.plainenglish.io/generating-perfectly-structured-json-using-llms-all-the-time-13b7eb504240

Tools
https://python.useinstructor.com/
https://github.com/mlc-ai/xgrammar
https://github.com/guidance-ai/guidance
https://github.com/boundaryml/baml
https://docs.pydantic.dev/latest/
https://github.com/outlines-dev/outlines
https://github.com/Dan-wanna-M/formatron/tree/master
https://github.com/whyhow-ai/knowledge-table

Examples
https://github.com/dottxt-ai/demos/tree/main/lore-generator
https://github.com/dottxt-ai/demos/tree/main/logs
https://github.com/dottxt-ai/demos/tree/main/earnings-reports
https://github.com/dottxt-ai/demos/tree/main/its-a-smol-world
https://github.com/dottxt-ai/cursed/tree/main/scp

### Link Dump:
https://github.com/yobix-ai/extractous
https://llamahub.ai/l/readers/llama-index-readers-microsoft-sharepoint
https://blog.dottxt.co/say-what-you-mean.html
https://github.com/dottxt-ai/demos/tree/main/lore-generator
https://github.com/dottxt-ai/cursed/tree/main/scp
https://python.useinstructor.com/
https://github.com/mlc-ai/xgrammar
https://github.com/guidance-ai/guidance
https://blog.dottxt.co/coalescence.html
https://arxiv.org/html/2408.02442v1
https://www.boundaryml.com/blog/sota-function-calling
https://arxiv.org/abs/2408.02442
https://towardsdatascience.com/enforcing-json-outputs-in-commercial-llms-3db590b9b3c8
https://python.plainenglish.io/generating-perfectly-structured-json-using-llms-all-the-time-13b7eb504240
https://docs.pydantic.dev/latest/
https://github.com/outlines-dev/outlines
https://github.com/Dan-wanna-M/formatron/tree/master
https://blog.dottxt.co/selective-multiplication.html
https://blog.dottxt.co/say-what-you-mean.html

Reliability/quality of structured outputs:
https://dylancastillo.co/posts/say-what-you-mean-sometimes.html
https://blog.dottxt.co/say-what-you-mean.html

Papers
https://arxiv.org/html/2408.02442v1 - Structured outputs harm reasoning capabilities


Gemini
https://ai.google.dev/gemini-api/docs/structured-output?lang=python

### Link Dump:
15 changes: 15 additions & 0 deletions Docs/Design/TTS_STT.md
@@ -26,6 +26,21 @@ Flow:

### Link Dump:
https://github.com/albirrkarim/react-speech-highlight-demo
https://funaudiollm.github.io/cosyvoice2/
https://github.com/InternLM/InternLM-XComposer/tree/main/InternLM-XComposer-2.5-OmniLive



Gemini
https://ai.google.dev/gemini-api/docs#rest
https://ai.google.dev/gemini-api/docs/models/gemini-v2

ElevenLabs
https://github.com/elevenlabs/elevenlabs-examples/blob/main/examples/text-to-speech/python/text_to_speech_file.py
https://elevenlabs.io/docs/api-reference/text-to-speech
https://elevenlabs.io/docs/developer-guides/how-to-use-tts-with-streaming

Models
https://huggingface.co/NexaAIDev/Qwen2-Audio-7B-GGUF

5 changes: 5 additions & 0 deletions Docs/Design/UX.md
@@ -54,6 +54,11 @@ https://github.com/Vali-98/ChatterUI
https://github.com/Rugz007/liha
https://astro.new/latest/
https://github.com/lobehub/lobe-chat
https://www.scoutos.com/
https://github.com/FishiaT/yawullm
https://ilikeinterfaces.com/2015/03/09/map-ui-ghost-in-the-shell/
https://jdan.github.io/98.css
https://github.com/vercel/ai-chatbot
https://www.nngroup.com/videos/the-danger-of-defaults/
https://writings.stephenwolfram.com/2024/12/useful-to-the-point-of-being-revolutionary-introducing-wolfram-notebook-assistant/
https://en.wikipedia.org/wiki/Template:Google_payment_apps
5 changes: 4 additions & 1 deletion Docs/Design/VLMs.md
@@ -18,7 +18,7 @@ https://huggingface.co/Qwen/Qwen2-VL-2B
https://www.reddit.com/r/StableDiffusion/comments/1h7hunp/how_to_run_hunyuanvideo_on_a_single_24gb_vram_card/
https://arxiv.org/abs/2412.05185
https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

https://huggingface.co/Infinigence/Megrez-3B-Omni
https://huggingface.co/papers/2412.07626
https://huggingface.co/AI-Safeguard/Ivy-VL-llava
https://github.com/matatonic/openedai-vision
@@ -32,3 +32,6 @@ https://arxiv.org/abs/2409.17146



https://lyra-omni.github.io/
https://apollo-lmms.github.io/
https://huggingface.co/Apollo-LMMs
57 changes: 35 additions & 22 deletions Docs/Design/WebSearch.md
@@ -3,32 +3,16 @@
## Introduction
This page serves as documentation regarding the web search functionality within tldw and provides context/justification for the decisions made within the module.

## Web Search


### Link Dump:
https://github.com/appvoid/search
https://github.com/felladrin/MiniSearch
https://github.com/TheBlewish/Web-LLM-Assistant-Llamacpp-Ollama
https://github.com/pengfeng/ask.py
https://cookbook.openai.com/examples/third_party/web_search_with_google_api_bring_your_own_browser_tool
https://developers.google.com/custom-search/v1/overview
https://www.ignorance.ai/p/how-to-build-an-ai-search-engine-83b?publication_id=1407539
https://www.ignorance.ai/p/how-to-build-an-ai-search-engine
https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/document_loaders/brave_search.py
Could instantiate a browser, perform a search with X engine, and then parse the results.
https://github.com/YassKhazzan/openperplex_backend_os
https://github.com/InternLM/MindSearch
https://github.com/developersdigest/llm-answer-engine



### Search Engines
- **Google Search**
- [Google Search API]( FIXME )
-
- [Google Search API](https://developers.google.com/custom-search/v1/overview)
- Setup:
- Set up a `Programmable Search Engine`
- Get the `API Key`
- 100 search queries per day for free
- **Bing Search**
- [Bing Search API]( FIXME )
- [Bing Search API](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api)
-
- **Yandex Search**
- [Yandex Search API](https://yandex.com/dev/search/)
@@ -39,3 +23,32 @@ https://github.com/developersdigest/llm-answer-engine
- **Searx Search**
- [Searx Search API](https://searx.github.io/searx/)
-
- **Kagi**
- [Kagi Search API](https://help.kagi.com/kagi/api/search.html)
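For the Google option, the Programmable Search Engine setup yields two values: the API key and the search engine ID (`cx`). A sketch of building a Custom Search JSON API request URL with the standard library; it only constructs the URL (no network call), and `build_google_search_url` is a hypothetical helper. The `key`, `cx`, `q`, `num`, and `start` query parameters belong to that API, which returns at most 10 results per request, so larger result counts need paging via `start`.

```python
from urllib.parse import urlencode

GOOGLE_CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_google_search_url(query: str, api_key: str, cx: str,
                            num: int = 10, start: int = 1) -> str:
    """Build a Custom Search JSON API GET URL (results come back under "items")."""
    if not 1 <= num <= 10:
        raise ValueError("the API returns at most 10 results per request")
    params = {"key": api_key, "cx": cx, "q": query, "num": num, "start": start}
    # urlencode quotes values, e.g. spaces in the query become "+".
    return f"{GOOGLE_CSE_ENDPOINT}?{urlencode(params)}"
```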



### Implementation
- **Text Search Workflow**
1. The user enters a search query
2. The user selects a search engine (a default search engine can be set in the config file)
3. The user presses 'Search'
4. The search query is passed to the selected search engine
5. The selected search engine performs the search via an API call
6. The search results are returned from the search engine's API
7. Search engine results are then _MODIFIED_ (if necessary/enabled) to fit the user's preferences
- This could include re-ranking, summarization/analysis, or other modifications
8. The (modified) search results are displayed to the user in one of the following formats:
- titles of pages, each with a dropdown showing all info
- a list of links, each with a briefing/summary
- a single briefing/summary of all results
9. The user may then choose to save the resulting text to the DB
10. If saved, the results are stored as a plaintext entry, with metadata containing the search query, search engine, and any other relevant information
11. The saved entry is then searchable via the Media DB
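The tail of the workflow above (optional re-ranking of results, then saving them to the DB as a plaintext entry with the query and engine as metadata, then searching it) can be sketched as below. This assumes a SQLite store; the `media` table, its columns, the result dict shape, and the `rerank`/`save_search`/`search_saved` helpers are hypothetical stand-ins, not the actual tldw Media DB schema, and the keyword-overlap re-ranker is only a placeholder for whatever modification is enabled.

```python
import json
import sqlite3
from typing import Dict, List

Result = Dict[str, str]  # hypothetical shape: {"title", "url", "snippet"}


def rerank(query: str, results: List[Result]) -> List[Result]:
    """Placeholder re-ranker: order by how many query words appear in title+snippet."""
    words = set(query.lower().split())

    def score(r: Result) -> int:
        text = f"{r['title']} {r['snippet']}".lower()
        return sum(word in text for word in words)

    return sorted(results, key=score, reverse=True)  # stable: ties keep engine order


def save_search(conn: sqlite3.Connection, query: str, engine: str,
                results: List[Result]) -> int:
    """Store the results as one plaintext entry, with query/engine as JSON metadata."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media ("
        "id INTEGER PRIMARY KEY, content TEXT NOT NULL, metadata TEXT NOT NULL)"
    )
    content = "\n\n".join(f"{r['title']}\n{r['url']}\n{r['snippet']}" for r in results)
    metadata = json.dumps(
        {"search_query": query, "search_engine": engine, "result_count": len(results)}
    )
    cur = conn.execute("INSERT INTO media (content, metadata) VALUES (?, ?)", (content, metadata))
    conn.commit()
    return cur.lastrowid


def search_saved(conn: sqlite3.Connection, term: str) -> List[int]:
    """Naive substring search over saved entries (a real DB would use FTS)."""
    rows = conn.execute("SELECT id FROM media WHERE content LIKE ?", (f"%{term}%",))
    return [row[0] for row in rows]
```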


### Link Dump:
https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/document_loaders/brave_search.py
Could instantiate a browser, perform a search with X engine, and then parse the results.

3 changes: 2 additions & 1 deletion Docs/Handy_Dandy_Papers.md
@@ -32,10 +32,11 @@ https://arxiv.org/abs/2412.05265
https://arxiv.org/abs/2411.19865
https://arxiv.org/abs/2412.06769
- https://arxiv.org/abs/2412.01113

https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute

### Test-Time Compute
- https://github.com/GAIR-NLP/O1-Journey
- https://arxiv.org/abs/2408.03314


### Personalization
2 changes: 2 additions & 0 deletions Docs/Issues/Citations_and_Confabulations.md
@@ -8,6 +8,8 @@


https://arxiv.org/abs/2412.04235
https://arxiv.org/abs/2412.11536
https://github.com/sunnynexus/RetroLLM

RAG
https://www.lycee.ai/blog/rag-ragallucinations-and-how-to-fight-them