Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract search #303

Closed
wants to merge 4 commits into from
Closed

Extract search #303

wants to merge 4 commits into from

Conversation

mruwnik
Copy link
Collaborator

@mruwnik mruwnik commented Aug 20, 2023

  • Refactor the search module so it's independent (no external dependancies)
  • Extract the search module as a separate library
  • Add an example server+page to showcase that the independent search works by itself
  • Change the fronted to use the search library
  • Add CORS for selected routes (including static resources)
  • Allow searching directly from the url

This can also be seen in action here - might need to press enter for it to start searching. This involves hacky malusage of HTML, so is very clunky - I couldn't be bothered to do it properly.

A deployed version of this can be found here

@mruwnik mruwnik force-pushed the extract-search branch 2 times, most recently from 9b8a6c4 to b8759d9 Compare August 20, 2023 20:22
@@ -107,12 +141,13 @@ export default function Search({onSiteAnswersRef, openQuestionTitles, onSelect}:
</div>
</AutoHeight>
</div>
{showMore && (
{showMore && searcher && (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the searcher gets created as soon as the component is loaded. It's not properly configured then, as the worker still has to start up, but it's around, so this will be displayed as soon as the component gets loaded

return json(results, {
headers: {
'Access-Control-Allow-Methods': 'GET, OPTIONS',
'Access-Control-Allow-Origin': ALLOW_ORIGINS,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CORS stuff, so this can be called from other sites. It would probably be nicer as some kind of wrapper or something, but this should be fine for now?

server.js Outdated
}

const fetchCorsAsset = (event) => {
const resp = handleAsset(event, build)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the createEventHandler has no way to modify the resulting Response, nor any way to set CORS headers, so I ended up having to pretty much replicate some of the code paths in it :/

server.js Outdated
import * as build from '@remix-run/dev/server-build'

addEventListener('fetch', createEventHandler({build, mode: process.env.NODE_ENV}))
const CORS_ASSETS = ['/tfWorker.js']
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably overkill. But if any other static resources need to be fetched with CORS, it should suffice to add them here

@@ -0,0 +1,62 @@
<!DOCTYPE html>
<html>
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just a sample page to show how (that?) things work. I deployed it here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

live questions don't work for me


const defaultSearchConfig = {
getAllQuestions: () => [] as Question[],
onResolveCallback: null,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used as a callback if a search succeeds and hasn't been canceled (e.g. by a new search being started)

.trim()

const defaultSearchConfig = {
getAllQuestions: () => [] as Question[],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is called for the baseline search

.then((response) => response.text())
.then((code) => {
const blob = new Blob([code])
return new Worker(URL.createObjectURL(blob))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CORS problems. The actual code can be fetched without any problems, but workers don't want to start if the js file is on a different domain. This is fine, though. I don't know what to think about how easy it is to get round these various js restrictions...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe using workers for shared modular code we expect other people to include on websites is a good idea 😢

this.currentSearch.resolve(searchResults)
this.currentSearch = null
if (this.searchConfig.onResolveCallback) {
this.searchConfig.onResolveCallback(query, searchResults)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only call the callback if the result was for the most recent search. Cases like the following are fine:

  1. search for "this is the first thing"
  2. search for "this is the second one" -> will "cancel" the first one, but the worker will still be processing it
  3. search for "this is the first thing" again -> will cancel the previous call, but if the worker returns the result from the first call, it will treat it as answering this one. Which is fine?

runLiveSearch = (query: string, resultsNum?: number) => {
const questions = this.searchConfig.getAllQuestions ? this.searchConfig.getAllQuestions() : []
if (query != this.currentSearch?.query) {
return // this search has been superceeded with a newer one, so just give up
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can happen if a search is started before the worker is ready - it then waits 100ms and tries again. If a new query is started in the mean time, the first query shouldn't keep trying

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is too big for me to review right now, should get to it around August 30...

I noticed the deployed version does not update the search query in URL when updating search box.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it update the utl? I thought that this was already too big :/

No hurry on the reviewing. The main thing is that search can be invoked from other sites, and that's doable by e.g. importing https://media.ahiru.pl/stampy/stampySearch.min.js. I wanted this to be available before the aisafety.com hackathon thingy, in case it would be needed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URL is not getting any shorter by not updating it:
Screenshot 2023-08-22 at 14 22 21

Feel free to completely remove the URL state if people don't need it. But if it's needed, it should be correct.

@@ -0,0 +1,62 @@
<!DOCTYPE html>
<html>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

live questions don't work for me

.then((response) => response.text())
.then((code) => {
const blob = new Blob([code])
return new Worker(URL.createObjectURL(blob))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe using workers for shared modular code we expect other people to include on websites is a good idea 😢

logSearch(value)
}
setPendingSearches(true)
searcher && searcher.searchLive(value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably mostly for TS reasons, but it shouldn't be a valid option to NOT perform the search at this step, if it "always works" it should throw an exception after future changes that make it suddenly "not work"

useEffect(() => {
setSearcher(
new Searcher({
getAllQuestions: () => onSiteAnswersRef.current,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a very useful exercise to find out how reusable is current architecture, and I'd be happy for this code to live on a branch or fork for experimental deployments.

However, I am not able to confirm this is the right step for the master branch and production deployment, where we should aim for more maintainability and simplified architecture. Adding more features on top of previous hacks does not sound sustainable to me.

For 3rd party users of this feature, creating new class instances to spin of workers sensitive to CORS and special API calls while having to provide your own getAllQuestions, then dealing with optional-results success callbacks without error handling and quite complex semantics ... that doesn't sound very simple to use / modular / ...

If there is someone else who wants to take responsibility for long term maintenance of this repo with occational firefighting and debugging if the site stops working, I'd be happy to hand over the "architecture oversight".

If not, I will rather focus on figuring out how to include stampy search on 3rd party sites using some simpler API, without this new PoC feature 🤔 Ideally something drop-in replacable with API from https://nlp.stampy.ai/ (or whatever is / will be the current url / name for that feature)

@Aprillion
Copy link
Collaborator

Aprillion commented Sep 3, 2023

We are doing #307 at the aisafety.com hackathon and we found a compromise between practicality, magic, and at least halfway hoping it's still maintainable in PRs related to that issue => let's keep this branch for inspiration of future improvements, but not merging to master so closing the PR.

Thank you very much for exploring this path too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants