search parsing + building the chain & prompts
rmusser01 committed Dec 28, 2024
1 parent 2f48ffc commit cb63eb9
Showing 9 changed files with 1,401 additions and 678 deletions.
616 changes: 616 additions & 0 deletions App_Function_Libraries/Web_Scraping/Search_Prompt.md

Large diffs are not rendered by default.

342 changes: 0 additions & 342 deletions App_Function_Libraries/Web_Scraping/Search_Prompt.py

This file was deleted.

580 changes: 537 additions & 43 deletions App_Function_Libraries/Web_Scraping/WebSearch_APIs.py

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions Docs/Design/Education.md
@@ -9,3 +9,24 @@ https://arxiv.org/abs/2411.07407
https://arxiv.org/abs/2412.16429
https://huggingface.co/papers/2412.15443






one2manny
Today at 12:43 AM
A great way to make studying more efficient and convenient is to take a digital PDF textbook, split it into separate files for each chapter, and organize them individually.
I then create a dedicated notebook for each chapter, treating it as a focused single source.
From there, I convert each chapter into an audio format, like a podcast.
This approach makes it easy to study while commuting, relaxing in bed with your eyes closed, or at any time when reading isn’t practical.

I also recommend creating a study guide for each chapter, fully breaking down key concepts and definitions.
For more complex topics, the “explain like I’m 5” method works wonders—it simplifies challenging ideas into digestible explanations.

To take this further, incorporate a Personal Knowledge Management (PKM) system into your routine.
Apps like Obsidian are perfect for this, with their flexible folder structures and Markdown formatting.
I optimize my AI outputs for Markdown so I can copy, paste, and organize them into clean, structured notes.
This ensures your materials are not only well-organized but also easy to access and build on later.
A solid PKM system is invaluable for managing knowledge and staying on top of your studies!
331 changes: 40 additions & 291 deletions Docs/Design/Researcher.md

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions Docs/Design/TTS_STT.md
@@ -50,8 +50,14 @@ https://github.com/microsoft/SpeechT5
https://github.com/smellslikeml/dolla_llama


Coqui TTS
https://github.com/idiap/coqui-ai-TTS

Cartesia
https://docs.cartesia.ai/get-started/make-an-api-request

F5 TTS
https://github.com/SWivid/F5-TTS


Podcastfy
32 changes: 31 additions & 1 deletion Docs/Design/WebSearch.md
@@ -3,10 +3,40 @@
## Introduction
This page documents the web search functionality within tldw and provides context/justification for the design decisions made within the module.


Pipeline (a rough orchestration sketch in Python follows this list):
1. User posts question
- Gradio/UI/API
2. Question is analyzed
- The question is analyzed to identify its most likely purpose/goal, and sub-questions are generated to support it
- The user has the option of seeing/modifying the prompt used for analysis/sub-question creation
3. Search(es) are performed - user-toggled
- Searches are performed using the user's question and the generated sub-questions
4. Results are collected, stored, and analyzed
- Results are collected, stored in a temporary 'search_results' dict, and analyzed for relevance based on the initial snippet (? or full page?)
- The user has the option of seeing all results or only relevant results
- The user has the option to select which results are 'relevant'
- The user also has the option to select which 'relevant' results are used to answer the question
5. Relevant results are added to result dictionary
- Results determined to be relevant are stored in a 'relevant_results' dictionary, and the process is repeated until all results are analyzed or the limit is hit.
6. Once all results are collected, they are used to answer the user's question/sub-questions
- The relevant results are used to answer the user's question/sub-questions
- Each result is first abstract-summarized, FIXME
7. The final answer/'briefing' is then presented to the user
8. User has the option to save the results to the DB
9. User has the option to ask follow-up questions / see potential other questions
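
As a rough illustration of how these steps could be wired together, here is a minimal orchestration sketch. It is not the actual implementation in `WebSearch_APIs.py`; every callable parameter (`analyze`, `search`, `is_relevant`, `fetch_page`, `summarize`, `answer`) is a hypothetical stand-in for the real analysis, search, scraping, and summarization functions.

```python
# Hypothetical orchestration of the pipeline above; all helper callables are
# placeholders, not the real functions in WebSearch_APIs.py.
from typing import Callable, Dict, List


def run_search_pipeline(
    question: str,
    analyze: Callable[[str], List[str]],       # question -> sub-questions
    search: Callable[[str], List[dict]],       # query -> [{"url", "title", "snippet"}, ...]
    is_relevant: Callable[[str, dict], bool],  # (question, result) -> keep it?
    fetch_page: Callable[[str], str],          # url -> full page text
    summarize: Callable[[str], str],           # page text -> abstract summary
    answer: Callable[[str, List[str]], str],   # (question, summaries) -> final briefing
    result_limit: int = 10,
) -> dict:
    # 2. Analyze the question and generate supporting sub-questions
    sub_questions = analyze(question)

    # 3. Search on the original question plus each sub-question
    search_results: Dict[str, List[dict]] = {
        q: search(q) for q in [question, *sub_questions]
    }

    # 4-5. Keep only results judged relevant, up to the configured limit
    relevant_results: Dict[str, List[dict]] = {
        q: [r for r in results if is_relevant(question, r)][:result_limit]
        for q, results in search_results.items()
    }

    # 6. Fetch and abstract-summarize each relevant page, then build the briefing
    summaries = [
        summarize(fetch_page(r["url"]))
        for results in relevant_results.values()
        for r in results
    ]
    briefing = answer(question, summaries)

    # 7-9. Return everything so the UI can present, save, or follow up on it
    return {
        "sub_questions": sub_questions,
        "search_results": search_results,
        "relevant_results": relevant_results,
        "briefing": briefing,
    }
```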





----------------
### Setting the Stage
- The goal of this module is to provide a simple, easy-to-use interface for searching the web and retrieving results.
- All web searches are performed as simple HTTP requests, either to a search API or directly to the search engine's endpoint, with the results then scraped.
- Parsing results is TODO.
- Results are then reviewed for relevancy; if a result is relevant, the full page is fetched and analyzed.
- The results are stored in a dictionary, and the process is repeated until all results are analyzed or the limit is hit. (A sketch of the result structure follows this list.)
- Once all results are collected, they are operated on to create whatever final product is desired by the user.
- Other modules are responsible for anything else; this module just performs the search and delivers the results.
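
For illustration, the temporary result store might look roughly like this; the field names are assumptions made for the sketch, not the exact keys used in `WebSearch_APIs.py`:

```python
# Hypothetical shape of the temporary 'search_results' store; key names are
# illustrative only.
search_results = {
    "what are the health effects of microplastics?": [   # query or sub-question
        {
            "engine": "searx",                            # which backend returned the hit
            "url": "https://example.com/microplastics-review",
            "title": "A review of microplastics research",
            "snippet": "Recent studies suggest ...",      # used for the first relevance pass
            "relevant": None,                             # set True/False after analysis
            "full_page": None,                            # fetched only if marked relevant
        },
    ],
}
```
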
- **Main Function:**
134 changes: 134 additions & 0 deletions Server_API/API_README.md
@@ -0,0 +1,134 @@
# API Documentation

## Overview

The API uses FastAPI to provide a RESTful interface to the backend services. It is designed to be simple and easy to use, with a focus on providing a clean interface for the frontend to interact with. (A quick request example follows the URL list below.)

- **URLs**
- Main page: http://127.0.0.1:8000
- API Documentation page: http://127.0.0.1:8000/docs
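
As a quick sanity check, the root endpoint can be hit once the server is running locally; this assumes it was started with `uvicorn main:app --reload` from `Server_API/app/` and uses the third-party `requests` package:

```python
# Minimal smoke test against a locally running instance.
import requests

resp = requests.get("http://127.0.0.1:8000/")
print(resp.status_code)  # 200
print(resp.json())       # {"message": "Welcome to the tldw API"}
```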



## Endpoints



The key piece is the write path. We create:

- A global `asyncio.Queue` of "write tasks."
- A `WriteTask` class that holds the SQL, parameters, and an `asyncio.Future` to signal completion.
- A background worker (`writer_worker`) that pops tasks from the queue, executes them, and sets the result in the Future.
- Endpoints that push a `WriteTask` onto the queue, then await the Future before returning.

```python
# main.py
import asyncio
from typing import Any

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from database import get_db_connection

app = FastAPI()


# -----------------------------
# 1) A global queue + task class
# -----------------------------
class WriteTask:
    """Holds SQL, parameters, and a Future to let the enqueuing code wait for completion."""
    def __init__(self, sql: str, params: tuple[Any, ...]):
        self.sql = sql
        self.params = params
        self.future: asyncio.Future = asyncio.get_event_loop().create_future()


write_queue: asyncio.Queue[WriteTask] = asyncio.Queue()


# -----------------------------
# 2) The background worker
# -----------------------------
async def writer_worker():
    """Continuously processes write tasks from the queue, one at a time."""
    while True:
        task: WriteTask = await write_queue.get()
        try:
            # Perform the write
            with get_db_connection() as conn:
                conn.execute(task.sql, task.params)
                conn.commit()
            # If success, set the result of the Future
            task.future.set_result(True)
        except Exception as e:
            # If failure, set the exception so the caller can handle it
            task.future.set_exception(e)
        finally:
            write_queue.task_done()


# -----------------------------
# 3) Start the worker on startup
# -----------------------------
@app.on_event("startup")
async def startup_event():
    # Launch the writer worker as a background task
    asyncio.create_task(writer_worker())


# -----------------------------
# 4) Pydantic model for input
# -----------------------------
class ItemCreate(BaseModel):
    name: str


# -----------------------------
# 5) Write endpoint (POST)
# -----------------------------
@app.post("/items")
async def create_item(item: ItemCreate):
    """Queue a write to the database, then wait for its completion."""
    sql = "INSERT INTO items (name) VALUES (?)"
    params = (item.name,)
    # Create a WriteTask
    write_task = WriteTask(sql, params)
    # Put the task in the queue
    await write_queue.put(write_task)
    # Wait for the task to complete
    try:
        result = await write_task.future  # This will be True if successful
        return {"status": "success", "name": item.name}
    except Exception as exc:
        # If the DB write failed for some reason, raise a 500
        raise HTTPException(status_code=500, detail=str(exc))


# -----------------------------
# 6) Read endpoint (GET)
# -----------------------------
@app.get("/items")
def read_items():
    """Simple read operation that does not need the queue."""
    with get_db_connection() as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT id, name FROM items")
        rows = cursor.fetchall()
    return [{"id": row[0], "name": row[1]} for row in rows]
```

Explanation:

- `WriteTask` stores `(sql, params, future)`. The future is how we pass success/failure back to the original request.
- When a request hits `POST /items`, we:
  1. Construct a `WriteTask`.
  2. `put()` it on the `write_queue`.
  3. Immediately `await write_task.future`. We don't return until the DB operation is done.
- The `writer_worker` loop picks tasks in FIFO order and executes them one by one, guaranteeing no concurrency for writes (thus avoiding locks).
- On success, `task.future.set_result(True)` is called. On failure, `task.future.set_exception(e)` is called.
- The awaiting endpoint sees either a success (and returns HTTP 200) or an exception (and returns HTTP 500).

This pattern means each request is effectively serialized for writes, but the user still gets a definitive success/failure response in the same request/response cycle.
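
For completeness, here is a minimal sketch of how the queued-write endpoints above could be exercised in-process. This is hypothetical: it assumes the snippet is saved as `main.py`, that a `database` module providing `get_db_connection()` exists, and that the underlying database already has an `items` table; none of that is defined in this README.

```python
# Hypothetical smoke test for the queued-write pattern sketched above.
from fastapi.testclient import TestClient

from main import app  # the FastAPI app from the sketch above

# Using the client as a context manager runs the startup event,
# which launches writer_worker on the app's event loop.
with TestClient(app) as client:
    created = client.post("/items", json={"name": "example"})
    print(created.status_code, created.json())
    # -> 200 {'status': 'success', 'name': 'example'}

    listing = client.get("/items")
    print(listing.json())
    # -> e.g. [{'id': 1, 'name': 'example'}]
```
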
17 changes: 16 additions & 1 deletion Server_API/app/main.py
@@ -1,7 +1,22 @@
# main.py
# Description: This file contains the main FastAPI application, which serves as the primary API for the tldw application.
#
# Imports
#
# 3rd-party Libraries
from fastapi import FastAPI
#
# Local Imports
#
########################################################################################################################
#
# Functions:

app = FastAPI(title="TLDW API", version="1.0.0")
# Usage: uvicorn main:app --reload
app = FastAPI(title="tldw API", version="1.0.0")

@app.get("/")
async def root():
return {"message": "Welcome to the tldw API"}

