search parsing + building the chain & prompts
rmusser01 committed Dec 28, 2024
1 parent 2f48ffc commit cb63eb9
Showing 9 changed files with 1,401 additions and 678 deletions.
616 changes: 616 additions & 0 deletions App_Function_Libraries/Web_Scraping/Search_Prompt.md

Large diffs are not rendered by default.

342 changes: 0 additions & 342 deletions App_Function_Libraries/Web_Scraping/Search_Prompt.py

This file was deleted.

580 changes: 537 additions & 43 deletions App_Function_Libraries/Web_Scraping/WebSearch_APIs.py

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions Docs/Design/Education.md
@@ -9,3 +9,24 @@ https://arxiv.org/abs/2411.07407
https://arxiv.org/abs/2412.16429
https://huggingface.co/papers/2412.15443






one2manny
Today at 12:43 AM
A great way to make studying more efficient and convenient is to take a digital PDF textbook, split it into separate files for each chapter, and organize them individually.
I then create a dedicated notebook for each chapter, treating it as a focused single source.
From there, I convert each chapter into an audio format, like a podcast.
This approach makes it easy to study while commuting, relaxing in bed with your eyes closed, or at any time when reading isn’t practical.

I also recommend creating a study guide for each chapter, fully breaking down key concepts and definitions.
For more complex topics, the “explain like I’m 5” method works wonders—it simplifies challenging ideas into digestible explanations.

To take this further, incorporate a Personal Knowledge Management (PKM) system into your routine.
Apps like Obsidian are perfect for this, with their flexible folder structures and Markdown formatting.
I optimize my AI outputs for Markdown so I can copy, paste, and organize them into clean, structured notes.
This ensures your materials are not only well-organized but also easy to access and build on later.
A solid PKM system is invaluable for managing knowledge and staying on top of your studies!
331 changes: 40 additions & 291 deletions Docs/Design/Researcher.md

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions Docs/Design/TTS_STT.md
@@ -50,8 +50,14 @@ https://github.com/microsoft/SpeechT5
https://github.com/smellslikeml/dolla_llama


Coqui TTS
https://github.com/idiap/coqui-ai-TTS

Cartesia
https://docs.cartesia.ai/get-started/make-an-api-request

F5 TTS
https://github.com/SWivid/F5-TTS


Podcastfy
32 changes: 31 additions & 1 deletion Docs/Design/WebSearch.md
@@ -3,10 +3,40 @@
## Introduction
This page documents the web search functionality within tldw and provides context/justification for the design decisions made within the module.


Pipeline (a rough orchestration sketch in Python follows this list):
1. User posts question
- Gradio/UI/API
2. Question is analyzed
- The question is analyzed to identify its most likely purpose/goal, and sub-questions are generated to support it
- The user has the option of seeing/modifying the prompt used for analysis/sub-question creation
3. Search(es) are performed - user-toggled
- Searches are performed using the user's question and the generated sub-questions
4. Results are collected, stored, and analyzed
- Results are collected, stored in a temporary 'search_results' dict, and analyzed for relevance based on the initial snippet (? or full page?)
- The user has the option of seeing all results or only relevant results
- The user has the option to select which results are 'relevant'
- The user also has the option to select which 'relevant' results are used to answer the question
5. Relevant results are added to result dictionary
- Results determined to be relevant are stored in a 'relevant_results' dictionary, and the process is repeated until all results are analyzed or the limit is hit.
6. Once all results are collected, they are used to answer the user's question/sub-questions
- The relevant results are used to answer the user's question/sub-questions
- Each result is first abstract-summarized, FIXME
7. The final answer/'briefing' is then presented to the user
8. User has the option to save the results to the DB
9. User has the option to ask follow-up questions / see potential other questions
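
As a rough illustration of how these steps could be wired together, here is a minimal orchestration sketch. It is not the actual implementation in `WebSearch_APIs.py`; every callable parameter (`analyze`, `search`, `is_relevant`, `fetch_page`, `summarize`, `answer`) is a hypothetical stand-in for the real analysis, search, scraping, and summarization functions.

```python
# Hypothetical orchestration of the pipeline above; all helper callables are
# placeholders, not the real functions in WebSearch_APIs.py.
from typing import Callable, Dict, List


def run_search_pipeline(
    question: str,
    analyze: Callable[[str], List[str]],       # question -> sub-questions
    search: Callable[[str], List[dict]],       # query -> [{"url", "title", "snippet"}, ...]
    is_relevant: Callable[[str, dict], bool],  # (question, result) -> keep it?
    fetch_page: Callable[[str], str],          # url -> full page text
    summarize: Callable[[str], str],           # page text -> abstract summary
    answer: Callable[[str, List[str]], str],   # (question, summaries) -> final briefing
    result_limit: int = 10,
) -> dict:
    # 2. Analyze the question and generate supporting sub-questions
    sub_questions = analyze(question)

    # 3. Search on the original question plus each sub-question
    search_results: Dict[str, List[dict]] = {
        q: search(q) for q in [question, *sub_questions]
    }

    # 4-5. Keep only results judged relevant, up to the configured limit
    relevant_results: Dict[str, List[dict]] = {
        q: [r for r in results if is_relevant(question, r)][:result_limit]
        for q, results in search_results.items()
    }

    # 6. Fetch and abstract-summarize each relevant page, then build the briefing
    summaries = [
        summarize(fetch_page(r["url"]))
        for results in relevant_results.values()
        for r in results
    ]
    briefing = answer(question, summaries)

    # 7-9. Return everything so the UI can present, save, or follow up on it
    return {
        "sub_questions": sub_questions,
        "search_results": search_results,
        "relevant_results": relevant_results,
        "briefing": briefing,
    }
```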





----------------
### Setting the Stage
- The goal of this module is to provide a simple, easy-to-use interface for searching the web and retrieving results.
- All web searches are performed as simple HTTP requests, either to a search API or directly to the search engine's endpoint, with the results then scraped.
- Parsing results is TODO.
- Results are then reviewed for relevancy; if a result is relevant, the full page is fetched and analyzed.
- The results are stored in a dictionary, and the process is repeated until all results are analyzed or the limit is hit. (A sketch of the result structure follows this list.)
- Once all results are collected, they are operated on to create whatever final product is desired by the user.
- Other modules are responsible for anything else; this module just performs the search and delivers the results.
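
For illustration, the temporary result store might look roughly like this; the field names are assumptions made for the sketch, not the exact keys used in `WebSearch_APIs.py`:

```python
# Hypothetical shape of the temporary 'search_results' store; key names are
# illustrative only.
search_results = {
    "what are the health effects of microplastics?": [   # query or sub-question
        {
            "engine": "searx",                            # which backend returned the hit
            "url": "https://example.com/microplastics-review",
            "title": "A review of microplastics research",
            "snippet": "Recent studies suggest ...",      # used for the first relevance pass
            "relevant": None,                             # set True/False after analysis
            "full_page": None,                            # fetched only if marked relevant
        },
    ],
}
```
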
- **Main Function:**
134 changes: 134 additions & 0 deletions Server_API/API_README.md
@@ -0,0 +1,134 @@
# API Documentation

## Overview

The API uses FastAPI to provide a RESTful interface to the backend services. It is designed to be simple and easy to use, with a focus on providing a clean interface for the frontend to interact with. (A quick request example follows the URL list below.)

- **URLs**
- Main page: http://127.0.0.1:8000
- API Documentation page: http://127.0.0.1:8000/docs
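
As a quick sanity check, the root endpoint can be hit once the server is running locally; this assumes it was started with `uvicorn main:app --reload` from `Server_API/app/` and uses the third-party `requests` package:

```python
# Minimal smoke test against a locally running instance.
import requests

resp = requests.get("http://127.0.0.1:8000/")
print(resp.status_code)  # 200
print(resp.json())       # {"message": "Welcome to the tldw API"}
```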



## Endpoints



The key piece is the write path. We create:

- A global `asyncio.Queue` of "write tasks."
- A `WriteTask` class that holds the SQL, parameters, and an `asyncio.Future` to signal completion.
- A background worker (`writer_worker`) that pops tasks from the queue, executes them, and sets the result in the Future.
- Endpoints that push a `WriteTask` onto the queue, then await the Future before returning.

```python
# main.py
import asyncio
from typing import Any

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from database import get_db_connection

app = FastAPI()


# -----------------------------
# 1) A global queue + task class
# -----------------------------
class WriteTask:
    """Holds SQL, parameters, and a Future to let the enqueuing code wait for completion."""
    def __init__(self, sql: str, params: tuple[Any, ...]):
        self.sql = sql
        self.params = params
        self.future: asyncio.Future = asyncio.get_event_loop().create_future()


write_queue: asyncio.Queue[WriteTask] = asyncio.Queue()


# -----------------------------
# 2) The background worker
# -----------------------------
async def writer_worker():
    """Continuously processes write tasks from the queue, one at a time."""
    while True:
        task: WriteTask = await write_queue.get()
        try:
            # Perform the write
            with get_db_connection() as conn:
                conn.execute(task.sql, task.params)
                conn.commit()
            # If success, set the result of the Future
            task.future.set_result(True)
        except Exception as e:
            # If failure, set the exception so the caller can handle it
            task.future.set_exception(e)
        finally:
            write_queue.task_done()


# -----------------------------
# 3) Start the worker on startup
# -----------------------------
@app.on_event("startup")
async def startup_event():
    # Launch the writer worker as a background task
    asyncio.create_task(writer_worker())


# -----------------------------
# 4) Pydantic model for input
# -----------------------------
class ItemCreate(BaseModel):
    name: str


# -----------------------------
# 5) Write endpoint (POST)
# -----------------------------
@app.post("/items")
async def create_item(item: ItemCreate):
    """Queue a write to the database, then wait for its completion."""
    sql = "INSERT INTO items (name) VALUES (?)"
    params = (item.name,)
    # Create a WriteTask
    write_task = WriteTask(sql, params)
    # Put the task in the queue
    await write_queue.put(write_task)
    # Wait for the task to complete
    try:
        result = await write_task.future  # This will be True if successful
        return {"status": "success", "name": item.name}
    except Exception as exc:
        # If the DB write failed for some reason, raise a 500
        raise HTTPException(status_code=500, detail=str(exc))


# -----------------------------
# 6) Read endpoint (GET)
# -----------------------------
@app.get("/items")
def read_items():
    """Simple read operation that does not need the queue."""
    with get_db_connection() as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT id, name FROM items")
        rows = cursor.fetchall()
    return [{"id": row[0], "name": row[1]} for row in rows]
```

Explanation:

- `WriteTask` stores `(sql, params, future)`. The future is how we pass success/failure back to the original request.
- When a request hits `POST /items`, we:
  1. Construct a `WriteTask`.
  2. `put()` it on the `write_queue`.
  3. Immediately `await write_task.future`. We don't return until the DB operation is done.
- The `writer_worker` loop picks tasks in FIFO order and executes them one by one, guaranteeing no concurrency for writes (thus avoiding locks).
- On success, `task.future.set_result(True)` is called. On failure, `task.future.set_exception(e)` is called.
- The awaiting endpoint sees either a success (and returns HTTP 200) or an exception (and returns HTTP 500).

This pattern means each request is effectively serialized for writes, but the user still gets a definitive success/failure response in the same request/response cycle.
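
For completeness, here is a minimal sketch of how the queued-write endpoints above could be exercised in-process. This is hypothetical: it assumes the snippet is saved as `main.py`, that a `database` module providing `get_db_connection()` exists, and that the underlying database already has an `items` table; none of that is defined in this README.

```python
# Hypothetical smoke test for the queued-write pattern sketched above.
from fastapi.testclient import TestClient

from main import app  # the FastAPI app from the sketch above

# Using the client as a context manager runs the startup event,
# which launches writer_worker on the app's event loop.
with TestClient(app) as client:
    created = client.post("/items", json={"name": "example"})
    print(created.status_code, created.json())
    # -> 200 {'status': 'success', 'name': 'example'}

    listing = client.get("/items")
    print(listing.json())
    # -> e.g. [{'id': 1, 'name': 'example'}]
```
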
17 changes: 16 additions & 1 deletion Server_API/app/main.py
@@ -1,7 +1,22 @@
# main.py
# Description: This file contains the main FastAPI application, which serves as the primary API for the tldw application.
#
# Imports
#
# 3rd-party Libraries
from fastapi import FastAPI
#
# Local Imports
#
########################################################################################################################
#
# Functions:

app = FastAPI(title="TLDW API", version="1.0.0")
# Usage: uvicorn main:app --reload
app = FastAPI(title="tldw API", version="1.0.0")

@app.get("/")
async def root():
return {"message": "Welcome to the tldw API"}

