
Docsite explainer #124

Draft · wants to merge 2 commits into main
Conversation

@josephjclark (Collaborator) commented Nov 27, 2024

Short Description

Create an LLM-based service to explain passages from the OpenFn Documentation.

Some things that need to be done on the Apollo side before releasing to production:

  • Quality test: At the minimum, manually test more varied input passages and evaluate the output to assess whether the prompt, RAG and LLM settings are adequate and there are no errors for certain types of inputs. (Target e.g. long input paragraphs, short inputs, texts requiring general knowledge, documentation-specific knowledge). Ideally, have a more extensive second round of testing using an LLM-as-a-judge approach.
  • Quick style check: is the LLM response tone OK for the site?
  • Calculate worst case scenario and expected scenario costs (OpenAI, Zilliz, Anthropic). Ensure API usage limits will cover the expected usage.
  • Verify keys are passed correctly
  • Fix bug: some paragraph inputs give an error (e.g. "In this walkthrough, we will configure a Workflow to automatically sync user data from a web REST API, and import this data to a GoogleSheet." at Tutorials > Http)

Beyond the LLM/RAG pipeline, the front-end formatting and possible bot usage will also need checking.
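For the second round of quality testing mentioned above, an LLM-as-a-judge harness could look roughly like the sketch below. All names here are hypothetical; the real explainer lives in Apollo, and both callables would wrap actual model calls (e.g. Claude):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class JudgeResult:
    passage: str
    explanation: str
    score: int    # 1-5 rating returned by the judge model
    verdict: str  # "pass" if score >= threshold, else "fail"

def run_judge(
    passages: list[str],
    explain: Callable[[str], str],     # the explainer pipeline under test
    judge: Callable[[str, str], int],  # judge model: (passage, explanation) -> 1..5
    threshold: int = 4,
) -> list[JudgeResult]:
    """Score each explanation with a judge model and record pass/fail."""
    results = []
    for passage in passages:
        explanation = explain(passage)
        score = judge(passage, explanation)
        verdict = "pass" if score >= threshold else "fail"
        results.append(JudgeResult(passage, explanation, score, verdict))
    return results
```

Feeding in the varied passage types from the checklist (long paragraphs, short inputs, general-knowledge vs. docs-specific text) would then produce a pass/fail table per category rather than one-off spot checks.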

Implementation Details

This service sends an input text to a Claude model and asks for a clarification. It also searches a Zilliz embedding collection containing the full documentation (stored as OpenFn embeddings) and supplies the retrieved passages to the model as additional context.

This service leverages the existing Search service to vectorise and search the documentation.
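The retrieve-then-explain flow can be sketched as below. Function names and the prompt wording are illustrative, not the actual Apollo/Search service API; `search` stands in for the existing Search service over Zilliz, and `llm` for a Claude messages call:

```python
from typing import Callable

def build_prompt(passage: str, context_docs: list[str]) -> str:
    """Combine the user's passage with retrieved documentation context."""
    context = "\n\n".join(f"<doc>\n{d}\n</doc>" for d in context_docs)
    return (
        "Explain the following passage from the OpenFn documentation.\n\n"
        f"Relevant documentation:\n{context}\n\n"
        f"Passage:\n{passage}"
    )

def explain(
    passage: str,
    search: Callable[[str, int], list[str]],  # vector search over the docs embeddings
    llm: Callable[[str], str],                # LLM completion call
    top_k: int = 3,
) -> str:
    """Retrieve top_k related docs passages, then ask the model to explain."""
    docs = search(passage, top_k)
    return llm(build_prompt(passage, docs))
```

The separation into `search` and `llm` callables mirrors the description above: the existing Search service handles vectorisation and retrieval, and only the prompt assembly and model call are specific to the explainer.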

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@josephjclark
Collaborator Author

On top of the other work needed here, we need to pass a responsible AI wand over this (here or on the docs side?)

@josephjclark
Collaborator Author

@hanna-paasivirta Can I ask you, when you have time, to:

  1. Re-base this branch against main (it's a bit behind; I can help you with rebasing)
  2. Write a brief PR overview and list what needs to be done before we can release to production (excluding the build-all-the-things thing)
  3. Propose a solution to build an in-memory cached explanation of every line (I think we should develop this on a separate branch, just in case we decide not to deploy it)

@hanna-paasivirta
Contributor

Proposed solution to build an in-memory cached explanation of every line:

One option to consider as part of the cost evaluation and front-end design is to build an in-memory cached explanation of every possible input paragraph for each version of the Documentation. On the backend, this would need the following process to run once:

  1. Script to split the documentation into the correct paragraphs. Save and index.
  2. Feed the splits to the document explainer pipeline. (Could use the Anthropic Batches API to process at non-peak times to lower costs).
  3. Save the answers on the server. Index so they can be fetched with the corresponding input.

In this scenario, the front-end "explain" button would not trigger the document explainer pipeline, but would fetch the relevant saved answer from the server.
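The indexing in steps 1 and 3 could be sketched as follows. Identifiers here are hypothetical; the idea is to key each paragraph by the docs version plus a stable content hash, so the front-end can fetch the saved answer without invoking the live pipeline:

```python
import hashlib

def paragraph_key(text: str, docs_version: str) -> str:
    """Stable key: docs version + hash of the whitespace-normalised paragraph."""
    norm = " ".join(text.split())
    return f"{docs_version}:{hashlib.sha256(norm.encode()).hexdigest()[:16]}"

def build_cache(paragraphs, explain, docs_version):
    """One-off backend job: run the explainer per paragraph and index answers."""
    return {paragraph_key(p, docs_version): explain(p) for p in paragraphs}

def lookup(cache, text, docs_version):
    """What the front-end 'explain' button would call instead of the pipeline."""
    return cache.get(paragraph_key(text, docs_version))
```

Versioning the key means a new documentation release simply produces a fresh batch run (which, per step 2, could go through the Anthropic Batches API), while lookups against an older version miss cleanly rather than returning stale explanations.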

@josephjclark
Collaborator Author

Fantastic write up @hanna-paasivirta , thank you!
