
Docsite explainer #124

Draft · wants to merge 2 commits into main
Conversation

@josephjclark (Collaborator) commented Nov 27, 2024

Short Description

Create an LLM-based service to explain passages from the OpenFn Documentation.

Some things that need to be done on the Apollo side before releasing to production:

  • Quality test: At the minimum, manually test more varied input passages and evaluate the output to assess whether the prompt, RAG and LLM settings are adequate and there are no errors for certain types of inputs. (Target e.g. long input paragraphs, short inputs, texts requiring general knowledge, documentation-specific knowledge). Ideally, have a more extensive second round of testing using an LLM-as-a-judge approach.
  • Quick style check: is the LLM response tone OK for the site?
  • Calculate worst case scenario and expected scenario costs (OpenAI, Zilliz, Anthropic). Ensure API usage limits will cover the expected usage.
  • Verify keys are passed correctly
  • Fix bug: some paragraph inputs give an error (e.g. "In this walkthrough, we will configure a Workflow to automatically sync user data from a web REST API, and import this data to a GoogleSheet." at Tutorials > Http)

Beyond the LLM/RAG pipeline, the front-end formatting and possible bot usage will also need checking.
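For the second round of quality testing mentioned above, an LLM-as-a-judge harness could look roughly like the sketch below. All names here are hypothetical; the real explainer lives in Apollo, and both callables would wrap actual model calls (e.g. Claude):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class JudgeResult:
    passage: str
    explanation: str
    score: int    # 1-5 rating returned by the judge model
    verdict: str  # "pass" if score >= threshold, else "fail"

def run_judge(
    passages: list[str],
    explain: Callable[[str], str],     # the explainer pipeline under test
    judge: Callable[[str, str], int],  # judge model: (passage, explanation) -> 1..5
    threshold: int = 4,
) -> list[JudgeResult]:
    """Score each explanation with a judge model and record pass/fail."""
    results = []
    for passage in passages:
        explanation = explain(passage)
        score = judge(passage, explanation)
        verdict = "pass" if score >= threshold else "fail"
        results.append(JudgeResult(passage, explanation, score, verdict))
    return results
```

Feeding in the varied passage types from the checklist (long paragraphs, short inputs, general-knowledge vs. docs-specific text) would then produce a pass/fail table per category rather than one-off spot checks.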

Implementation Details

This service sends an input text to a Claude model and asks for a clarification. It also searches a Zilliz embedding collection containing the full documentation (stored as OpenFn embeddings) and supplies the retrieved passages to the model as additional context.

This service leverages the existing Search service to vectorise and search the documentation.
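The retrieve-then-explain flow can be sketched as below. Function names and the prompt wording are illustrative, not the actual Apollo/Search service API; `search` stands in for the existing Search service over Zilliz, and `llm` for a Claude messages call:

```python
from typing import Callable

def build_prompt(passage: str, context_docs: list[str]) -> str:
    """Combine the user's passage with retrieved documentation context."""
    context = "\n\n".join(f"<doc>\n{d}\n</doc>" for d in context_docs)
    return (
        "Explain the following passage from the OpenFn documentation.\n\n"
        f"Relevant documentation:\n{context}\n\n"
        f"Passage:\n{passage}"
    )

def explain(
    passage: str,
    search: Callable[[str, int], list[str]],  # vector search over the docs embeddings
    llm: Callable[[str], str],                # LLM completion call
    top_k: int = 3,
) -> str:
    """Retrieve top_k related docs passages, then ask the model to explain."""
    docs = search(passage, top_k)
    return llm(build_prompt(passage, docs))
```

The separation into `search` and `llm` callables mirrors the description above: the existing Search service handles vectorisation and retrieval, and only the prompt assembly and model call are specific to the explainer.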

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@josephjclark
Collaborator Author

On top of the other work needed here, we need to pass a responsible AI wand over this (here or on the docs side?)

@josephjclark
Collaborator Author

@hanna-paasivirta Can I ask you, when you have time, to:

  1. Re-base this branch against main (it's a bit behind; I can help you with rebasing)
  2. Write a brief PR overview and list what needs to be done before we can release to production (excluding the build-all-the-things thing)
  3. Propose a solution to build an in-memory cached explanation of every line (I think we should develop this on a separate branch, just in case we decide not to deploy it)

@hanna-paasivirta
Contributor

Proposed solution to build an in-memory cached explanation of every line:

One option to consider as part of the cost evaluation and front-end design is to build an in-memory cached explanation of every possible input paragraph for each version of the Documentation. On the backend, this would need the following process to run once:

  1. Script to split the documentation into the correct paragraphs. Save and index.
  2. Feed the splits to the document explainer pipeline. (Could use the Anthropic Batches API to process at non-peak times to lower costs).
  3. Save the answers on the server. Index so they can be fetched with the corresponding input.

In this scenario, the front-end "explain" button would not trigger the document explainer pipeline, but would fetch the relevant saved answer from the server.
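The indexing in steps 1 and 3 could be sketched as follows. Identifiers here are hypothetical; the idea is to key each paragraph by the docs version plus a stable content hash, so the front-end can fetch the saved answer without invoking the live pipeline:

```python
import hashlib

def paragraph_key(text: str, docs_version: str) -> str:
    """Stable key: docs version + hash of the whitespace-normalised paragraph."""
    norm = " ".join(text.split())
    return f"{docs_version}:{hashlib.sha256(norm.encode()).hexdigest()[:16]}"

def build_cache(paragraphs, explain, docs_version):
    """One-off backend job: run the explainer per paragraph and index answers."""
    return {paragraph_key(p, docs_version): explain(p) for p in paragraphs}

def lookup(cache, text, docs_version):
    """What the front-end 'explain' button would call instead of the pipeline."""
    return cache.get(paragraph_key(text, docs_version))
```

Versioning the key means a new documentation release simply produces a fresh batch run (which, per step 2, could go through the Anthropic Batches API), while lookups against an older version miss cleanly rather than returning stale explanations.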

@josephjclark
Collaborator Author

Fantastic write up @hanna-paasivirta , thank you!
