Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Same citation numbers #109

Merged
merged 4 commits into from
Oct 1, 2023
Merged

Same citation numbers #109

merged 4 commits into from
Oct 1, 2023

Conversation

mruwnik
Copy link
Collaborator

@mruwnik mruwnik commented Sep 30, 2023

#42 Reuse citation numbers

image

@@ -23,6 +23,9 @@ def search_authored(query: str):


def get_followups(query):
if not query.strip():
return []
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was causing errors when the server returned an error (because of the empty query) which was parsed as followups

</A>
);
export const CitationRef: React.FC<{citation: Citation}> = ({citation}) => {
const url = citation.url && citation.url !== ""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep wanting to add a linter, but worry that it'll mess up other PRs... :/

setCurrent({ phase: data.phase, ...result });
break;

case "citations":
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds a separate step for parsing citations

@@ -70,6 +77,32 @@ const Home: NextPage = () => {
}
};


const updateCitations = (allCitations: Citation[], current: CurrentSearch) => {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLM flow is:

  1. various loading stages
  2. citations are passed in
  3. more loading phases
  4. the main response starts coming in in small chunks
  5. more loading
  6. the followups are fetched
  7. done

The main issue here is that the citations are loaded first, and there usually are like 10 possible articles to be cited, but they're not used until they appear in the actual text. And most of them won't be used anyway.
This means that magic has to be done to work out which of the possible citations are actually cited, and for them to then get the appropriate numbers.

I dislike this function. Quite a bit. But couldn't come up with a cleaner way of doing it :/
At least most of the magic is focused here, with subsequent code assuming that each citation will have an index field with the canonical number to be displayed

citations: result?.citations || [],
base_count: result?.base_count || baseReferencesIndex,
citationsMap: findCitations(content, result?.citations || []),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLM returns citations like "bla bla bla [a] ble ble [b][c]". These then need to be mapped to the appropriate numbers. This citationMap is a mapping of {<letter>: <full citation object>}. It's possible to change from letters to numbers - it's all in the LLM prompt

web/src/components/citations.tsx Outdated Show resolved Hide resolved
web/src/components/citations.tsx Show resolved Hide resolved
web/src/components/citations.tsx Show resolved Hide resolved
web/src/pages/index.tsx Outdated Show resolved Hide resolved
Comment on lines +102 to +103
setCitations(allCitations)
setCurrent(current)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) as a first impression, it looks suspicious to me to set both of these at the same event, I would expect that all citations need to be calculated less often than whatever is "current" ... but I didn't look into it in details, I'm happy if this works as intended

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agreed. This whole function is really ugly :/
This function should only be called when the current text gets updated. Which is quite a few times per query, but not that much.
current is the results of the current query, which gets updated a lot - once it's final it gets added to the history (which are previous interactions) and current is set to null (or something like that).
allCitations is the list of all displayed citations.
This function needs to update both or none - if the newest text chunk doesn't contain citations (or only known ones), then neither should be updated, but if it contains a new citation, then the map of <letter>: <citation> in the current object needs to be updated with the new indexes, but that also implies that the list of all show citations also needs to be updated with the new citations.

@mruwnik mruwnik merged commit 40b518d into main Oct 1, 2023
1 check passed
@mruwnik mruwnik deleted the same-citation-numbers branch October 1, 2023 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants