docs: extend troubleshooting for very large repositories #329
Conversation
Looks good to me, just left some minor suggestions.
```diff
@@ -39,3 +39,6 @@ next-env.d.ts

 # search index file generated on build
 /public/search.json
+
+# IDEs
+.idea
```
I see you also have great taste in IDEs 😊
You can use Code Search to test the query against a particular timestamp in a given repository.
Since Code Insights computes twelve data points over the given time range,
Unindexed search usually takes longer the further back you go in history. For older commits, more files differ from HEAD, so the searcher needs to perform more brute-force file searches.
Could we suggest a specific time to target, like a worst case? I'm not sure how far back Code Insights goes by default. We could even suggest `rev:at.time(...)` as a convenience.
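To make that suggestion concrete, a sketch of what such a query could look like (the date and dependency name are purely illustrative, and `rev:at.time(...)` availability depends on your Sourcegraph version):

```
my_library file:package.json rev:at.time(2021-01-30)
```

Running this in Code Search shows how long the query takes against the repository state at that timestamp, which approximates the worst-case data point of an insight.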
Thank you! I was hoping for some info like that, since I had a suspicion about older commits taking longer to search. I'll update some text above to work your suggestion in. Let me know if that sounds good :)
For example, if you want to track the version of an NPM dependency in your code base, searching for `my_library file:package.json` will compute much faster because there are fewer files to look at and fewer results to return.
We recommend making your query as precise as possible (even omitting results that may be relevant) until you reach a query that computes fast enough.
Tiny suggestions:
- Great that you mentioned file filters; maybe we could mention `lang` too
- We could also mention the importance of quotes `"..."` if your search string contains whitespace
- Maybe we shouldn't say "and even omit results that may be relevant", since we do really want these queries to be relevant :)
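As an illustration, the tips above could be combined into queries like these (the identifiers are hypothetical; the first narrows by file name, the second by language, the third quotes a search string containing whitespace):

```
my_library file:package.json
myFunction lang:go
"my search string" file:docs/
```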
Great ideas! I wasn't so sure about the "omit" part. Dropped that now. I've added some more tips, but added a disclaimer to the lang filter. From what I've seen in Language Stats Insights, this filter needs to load the file content and read it, and can therefore be a bit slower.
The search language filter is implemented differently, and tends to run very quickly. (If you're interested in the technical details, the search `lang` filter first consults the file name, and avoids loading and analyzing content in the vast majority of cases.)
For https://github.com/sourcegraph/sourcegraph/issues/62295
This PR updates the documentation with more tips for very large repositories.
There are difficulties with Code Insights where an insight may run for a while and then tell the user that there were incomplete data points. This likely comes from very large repositories that cannot compute results reasonably fast.
In addition to this documentation update, I'm working on giving users more information about which repositories lead to incomplete data points: https://github.com/sourcegraph/sourcegraph/issues/62578
@sourcegraph/search-platform I poked a bit at the search backend when gathering this info, and would like to get your input if it's accurate, and if there may be other improvements to make complex queries run faster on very large repos :)
@mike-r-mclaughlin Could you review if this new info would be helpful for customers? I'm planning to expose the repositories that caused incomplete datapoints with https://github.com/sourcegraph/sourcegraph/issues/62578. Then a customer can see which repository didn't compute, pick that one, optimize the query, and then run the big Code Insight again.