-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve search functionality (goal is to improve how it returns results for lv-dof) #20
Comments
I think for our application it makes the most sense to insert NAICS API data into a DB. In the same vein, I think the API would probably draw some benefit from moving to a DB as well (as opposed to a flat JSON file). We could utilize an existing FTS program, and we'd probably gain some response time speed improvements while we're at it. Maybe this deserves a new thread, but what are the thoughts on adding tags to classification codes? I think the additional metadata would help in retrieving relevant codes for FTS's. |
What's an FTS? How are tags generated? I think the "index entries" and "illustrative examples" were an attempt by the NAICS writers to provide text hooks for finding codes by similar titles. |
FTS is full-text search. I’d hold off on a DB for now; the dataset IIRC is very small, and if loaded at launch time could happily hang out in memory without the overhead of an external database service. I’m not even sure this is worth testing for now; the simplicity benefit of running from flat local files is tremendous. Building a simple full text index is very easy, if even necessary here. |
That's a good point, actually. Using a DB for the API would probably be a little overkill. Mostly what I'd like to get is something a little fuzzier than exact match. I've been playing around with some of the FTS JS libraries that have this out of the box (e.g. lunr, fullproof, etc), but they've all been a little wonky. My initial thought would be that the addition of tags (maybe generated by Mechanical Turk?) would weight the searches in the right direction. Not sure how feasible that is, though. Open to ideas. |
I'd suggest weighting searches in this order: title, index entries, illustrative examples, description As an aside, the DOF front end is now storing all search inputs in the background for later analysis. Maybe something useful could come of that. |
@louh interesting. Should we persist search terms in the session? We could save them into the DB for later usage. |
@rclosner Yep, I'm trying to figure out how LocalStorage works now... if we want to save it, it'll be there for now on the user side. |
@louh 👍 |
LocalStorage or cookies are probably a better bet, since they will leave us with a read-only application for lower maintenance burden. Fuzzy matches are an interesting opportunity for Git/JSON-driven search terms. They can be initially generated automatically, committed to the project, and then further edited via manual updates and such. Things like synonyms and related terms are so human. |
Some things @rclosner and I discussed:
cc @migurski
The text was updated successfully, but these errors were encountered: