Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
docs: extend troubleshooting for very large repositories (#329)
For https://github.com/sourcegraph/sourcegraph/issues/62295

This PR updates the documentation with more tips for very large repositories. There are difficulties with Code Insights where it may run for a while, and then tell the user that there were incomplete data points. This probably came from very large repositories not being able to compute reasonably fast.

In addition to this documentation update I'm working on giving users more information about which repositories lead to incomplete datapoints: https://github.com/sourcegraph/sourcegraph/issues/62578

---

@sourcegraph/search-platform I poked a bit at the search backend when gathering this info, and would like to get your input if it's accurate, and if there may be other improvements to make complex queries run faster on very large repos :)

@mike-r-mclaughlin Could you review if this new info would be helpful for customers? I'm planning to expose the repositories that caused incomplete datapoints with https://github.com/sourcegraph/sourcegraph/issues/62578. Then a customer can see which repository didn't compute, pick that one, optimize the query, and then run the big Code Insight again.
1 parent c6ae66d, commit 1c1b9a8
Showing 2 changed files with 74 additions and 0 deletions.
|
@@ -39,3 +39,6 @@ next-env.d.ts | |
|
||
# search index file generated on build | ||
/public/search.json | ||
|
||
# IDEs | ||
.idea |
|
@@ -6,6 +6,10 @@ In all of these cases, if data is returned at all, it will be an undercount. | |
|
||
See the below situations for tips on avoiding and troubleshooting these errors. | ||
|
||
## Search Queries | ||
|
||
This guide applies to all types of Code Insights that use search queries. Language Stats Insights do not use search queries. | ||
|
||
## Timeout errors | ||
|
||
Searches that take a long time to complete can time out before the search ends, and before we can record the data value. | ||
|
@@ -24,6 +28,73 @@ You can read more about this case in our [limitations](/code_insights/explanatio | |
|
||
If the data is available, the error alert will inform you which times the search has timed out. | ||
|
||
## Strategies for very large repositories | ||
|
||
When dealing with large repositories, several strategies can help identify and manage search limitations effectively. | ||
|
||
The goal is to find a query that can execute successfully, and then ramp up complexity until we find the breaking point. | ||
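The ramp-up approach described above can be sketched as a small loop. This is an illustration only: `run_query` is a hypothetical callable standing in for however you execute a search (for example, running it by hand in Code Search), and the 60-second budget is an assumption that mirrors the default timeout mentioned in this guide.

```python
import time
from typing import Callable, Optional, Sequence


def find_breaking_point(
    queries: Sequence[str],
    run_query: Callable[[str], None],
    budget_seconds: float = 60.0,
) -> Optional[str]:
    """Run queries ordered from most precise to broadest.

    Returns the first query that raises TimeoutError or takes longer than
    the budget, or None if every query finished in time.
    """
    for query in queries:
        started = time.monotonic()
        try:
            # Hypothetical runner: executes the search and blocks until done.
            run_query(query)
        except TimeoutError:
            return query
        if time.monotonic() - started > budget_seconds:
            return query
    return None
```

Ordering the list from the narrowest query to the broadest means the last query before the returned one is the most complex version that still completed.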
|
||
### Use Code Search to test your query | ||
|
||
You can use Code Search to verify if the query is able to complete within the given timeout. | ||
|
||
Once a query has been executed, its results may be indexed, so you need to choose a different commit or date to get results comparable to Code Insights. | ||
|
||
#### Unlimited results | ||
|
||
Code Search limits the number of results by default. Code Insights, however, needs to count all results. | ||
|
||
Add `count:all` to your query to get comparable results. | ||
|
||
#### Picking the right commit | ||
|
||
When Code Insights runs a search query, it will do so for 12 commits spread out over the configured time range. This | ||
unindexed search can take longer the further back in time the search goes. | ||
|
||
To test that the slowest search will succeed in time, we recommend using the `rev:at.time(...)` filter (available from version 5.4.0) | |
with the time range that you selected. For example, if your Code Insight looks at the last 2 years, use `rev:at.time(2y)`. | ||
|
||
For example, the Code Insights query `file:.*\.md hello repo:^github\.com/my_org/my_repo count:all` should be written as | |
`context:global file:.*\.md hello repo:^github\.com/my_org/my_repo count:all rev:at.time(2y)` in Code Search. | ||
|
||
If your Sourcegraph instance is on a version older than 5.4.0, you can pick a commit SHA from, for example, 2 years ago. In that case, the query | |
`file:.*\.md hello repo:^github\.com/my_org/my_repo count:all` becomes `context:global file:.*\.md hello repo:^github\.com/my_org/my_repo@my_commit_sha count:all`. | ||
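The rewrite from a Code Insights query to a Code Search test query can be sketched as a small helper. This is a minimal sketch, not part of Sourcegraph: the function name `to_code_search_query` and its parameters are hypothetical, and it only covers the `rev:at.time(...)` variant for version 5.4.0 and later.

```python
def to_code_search_query(insight_query: str, time_range: str = "2y") -> str:
    """Rewrite a Code Insights query so it can be tested in Code Search.

    Adds `context:global`, `count:all`, and `rev:at.time(...)` unless the
    query already contains a filter of that kind.
    Limitation: the query is split on whitespace, so runs of spaces inside
    quoted terms are collapsed to a single space.
    """
    parts = insight_query.split()
    if not any(p.startswith("context:") for p in parts):
        parts.insert(0, "context:global")
    if not any(p.startswith("count:") for p in parts):
        parts.append("count:all")
    if not any(p.startswith("rev:") for p in parts):
        parts.append(f"rev:at.time({time_range})")
    return " ".join(parts)


print(to_code_search_query(
    r"file:.*\.md hello repo:^github\.com/my_org/my_repo count:all"
))
```

For the example query above, this prints the same Code Search query that the guide gives for a 2-year time range.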
|
||
#### Timeout | ||
|
||
If the query fails with a timeout before one minute has passed, add the parameter `timeout:1m`. | ||
|
||
### Precise queries compute faster | ||
|
||
If your query is not able to compute results in a reasonable time, you can try to reduce the number of results returned by the query. | ||
|
||
For example, if you want to track the version of an npm dependency in your code base, searching for `my_library file:package.json` will compute much faster than an unscoped search because there are fewer files to look at and fewer results to return. | ||
|
||
We recommend making your query as precise as possible and reducing the number of results until you reach a query that computes fast enough. | ||
|
||
Here are some tips: | ||
- Search only files with a particular extension. For example, use `file:.*\.md` to search for files with the `.md` extension. | ||
- Search for files in a certain language. For example, use `lang:javascript` to search all JavaScript files. Please note | |
that this can be slower than the file extension filter, but may be necessary for ambiguous file extensions (e.g. `.m` is used for Objective-C and MATLAB). | ||
- Put quotes around literal terms, unless you want to search for multiple keywords. For example, searching for `"hello world"` will only | |
yield results where both words appear together, separated by a space, but `hello world` will yield results where either word appears. | ||
|
||
To learn more about writing precise queries, see our [search query syntax](/code-search/queries). | ||
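The tips above can be combined when building a query. The helper below is a hypothetical sketch (the name `scope_query` and its parameters are not part of any Sourcegraph API); it only assembles the query string from the filters mentioned in the tips.

```python
from typing import Optional


def scope_query(
    terms: str,
    file_regex: Optional[str] = None,
    lang: Optional[str] = None,
    literal: bool = False,
) -> str:
    """Build a narrower search query from a pattern and optional filters."""
    if literal:
        # Quote the terms so they are matched as one literal phrase.
        if '"' in terms:
            raise ValueError("escaping embedded quotes is out of scope for this sketch")
        terms = f'"{terms}"'
    parts = [terms]
    if file_regex:
        parts.append(f"file:{file_regex}")
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


print(scope_query("my_library", file_regex="package.json"))
print(scope_query("hello world", file_regex=r".*\.md", literal=True))
```

Start with the most restrictive combination that still answers your question, and only drop filters if the results are too narrow.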
|
||
### Increase the timeout | ||
|
||
By default, search queries have a one minute timeout. This value is capped by the setting `search.limits.maxTimeoutSeconds`, which is also one minute by default. | ||
|
||
Your site admin can change this value to increase the maximum timeout. Once it is increased, you can use the search parameter `timeout:` to give the query more time to run. | ||
|
||
Example: | ||
|
||
``` | ||
timeout:10m | ||
``` | ||
|
||
Note that very long searches can have a negative impact on performance, so it's important to use this parameter with caution. | ||
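To illustrate how the cap interacts with the `timeout:` parameter, here is a simplified model of the behavior described above. It is an assumption-laden sketch, not Sourcegraph's implementation: the function names are hypothetical, and the parser only accepts a whole number followed by `s`, `m`, or `h`.

```python
import re

# Seconds per unit for the simplified duration format used in this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_timeout(value: str) -> int:
    """Convert a value such as '10m' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported timeout value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def effective_timeout_seconds(requested: str, max_timeout_seconds: int = 60) -> int:
    """Return the requested timeout, capped at the configured maximum.

    `max_timeout_seconds` stands in for `search.limits.maxTimeoutSeconds`.
    """
    return min(parse_timeout(requested), max_timeout_seconds)


print(effective_timeout_seconds("10m"))       # cap left at the default
print(effective_timeout_seconds("10m", 900))  # cap raised by the site admin
```

With the default cap, a `timeout:10m` request is still limited to one minute; it only takes full effect once the site admin raises the maximum to at least 600 seconds.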
|
||
## Other errors | ||
|
||
For other errors, please reach out to our support team through your usual channels or at [email protected]. |