docs: extend troubleshooting for very large repositories #329

Merged
merged 2 commits into from
May 21, 2024
3 changes: 3 additions & 0 deletions .gitignore
@@ -39,3 +39,6 @@ next-env.d.ts

# search index file generated on build
/public/search.json

# IDEs
.idea

I see you also have great taste in IDEs 😊

71 changes: 71 additions & 0 deletions docs/code_insights/references/incomplete_data_points.mdx
@@ -6,6 +6,10 @@ In all of these cases, if data is returned at all, it will be an undercount.

See the situations below for tips on avoiding and troubleshooting these errors.

## Search Queries

This guide applies to all types of Code Insights that use search queries. Language Stats Insights do not use search queries.

## Timeout errors

Searches that take a long time to complete can time out before the search ends, and therefore before we can record the data value.
@@ -24,6 +28,73 @@ You can read more about this case in our [limitations](/code_insights/explanatio

If the data is available, the error alert will inform you which times the search has timed out.

## Strategies for very large repositories

When dealing with large repositories, several strategies can help identify and manage search limitations effectively.

The goal is to find a query that can execute successfully, and then ramp up complexity until we find the breaking point.

### Use Code Search to test your query

You can use Code Search to verify whether the query completes within the given timeout.

Once a query has been executed, the results may be indexed, so you need to choose a different commit or date to get results comparable to Code Insights.

#### Unlimited results

Code Search limits the number of results by default. Code Insights, however, needs to count all the results.

Add `count:all` to your query to get comparable results.

#### Picking the right commit

When Code Insights runs a search query, it will do so for 12 commits spread out over the configured time range. This
unindexed search can take longer the further back in time the search goes.

To test that the slowest search will succeed in time, we recommend using the `rev:at.time(...)` filter (available from version 5.4.0)
with the time range that you selected. For example, if your Code Insight looks at the last 2 years, you should use `rev:at.time(2y)`.

For example, the Code Insights query `file:.*\.md hello repo:^github\.com/my_org/my_repo count:all` should be written as
`context:global file:.*\.md hello repo:^github\.com/my_org/my_repo count:all rev:at.time(2y)` in Code Search.

If your Sourcegraph instance is on a version older than 5.4.0, you can pick a commit SHA from, for example, 2 years ago. Here the query
`file:.*\.md hello repo:^github\.com/my_org/my_repo count:all` becomes `context:global file:.*\.md hello repo:^github\.com/my_org/my_repo@my_commit_sha count:all`.
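
The rewrite described above can be sketched as a small helper. This is only an illustration, not part of Sourcegraph: the function name `to_code_search_query` and its parameters are hypothetical, and it assumes the query contains at least one plain `repo:` filter when a commit SHA is used.

```python
import re


def to_code_search_query(insight_query, time_range=None, commit_sha=None):
    """Rewrite a Code Insights query so it can be tested in Code Search.

    Hypothetical helper: pass either time_range (e.g. "2y", for versions
    5.4.0 and later) or commit_sha (for older versions), but not both.
    """
    if (time_range is None) == (commit_sha is None):
        raise ValueError("pass exactly one of time_range or commit_sha")

    query = insight_query.strip()

    # Code Insights counts every result, so make sure count:all is present.
    if not re.search(r"(?<!\S)count:all(?!\S)", query):
        query += " count:all"

    if time_range is not None:
        # Search the repository as it was at the start of the time range.
        query += f" rev:at.time({time_range})"
    else:
        # Pin each repo: filter to the chosen commit.
        query, replaced = re.subn(
            r"(?<!\S)repo:\S+",
            lambda m: f"{m.group(0)}@{commit_sha}",
            query,
        )
        if replaced == 0:
            raise ValueError("query has no repo: filter to pin to a commit")

    return f"context:global {query}"
```

Calling it with the example query and `time_range="2y"` yields the `rev:at.time(2y)` form shown above, and calling it with `commit_sha="my_commit_sha"` yields the `@my_commit_sha` form.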

#### Timeout

If the query fails with a timeout before one minute has passed, add the parameter `timeout:1m`.

### Precise queries compute faster

If your query is not able to compute results in a reasonable time, you can try to reduce the number of results returned by the query.

For example, if you want to track the version of an npm dependency in your code base, searching for `my_library file:package.json` will compute much faster than searching for `my_library` alone, because there are fewer files to look at and fewer results to return.

We recommend making your query as precise as possible and reducing the number of results until you reach a query that computes fast enough.

Here are some tips:
- Search only files with a particular ending. For example, use `file:.*\.md` to search files with the `.md` ending.
- Search only files in a certain language. For example, use `lang:javascript` to search all JavaScript files. Please note
that this can be slower than the file ending filter, but may be necessary for ambiguous file endings (e.g. `.m` is used for Objective-C and Matlab).
- Put quotes around literal terms, unless you want to search for multiple keywords. For example, searching for `"hello world"` will only
yield results where both words are separated by whitespace, but `hello world` will yield results where either word appears.
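
The narrowing process can be sketched as follows. This is a minimal illustration under stated assumptions: both function names are hypothetical, and `completes_in_time` stands in for however you check a query (for example, by running it manually in Code Search and noting whether it finishes before the timeout).

```python
def narrowing_candidates(term, filters):
    """Yield queries from broadest to most precise, adding one filter at a time."""
    query = term
    yield query
    for search_filter in filters:
        query = f"{query} {search_filter}"
        yield query


def first_query_within_timeout(candidates, completes_in_time):
    """Return the first (broadest) candidate that completes in time, or None.

    completes_in_time is a hypothetical callable supplied by the caller.
    """
    for query in candidates:
        if completes_in_time(query):
            return query
    return None
```

Ordering the candidates from broadest to most precise means the helper returns the least restrictive query that still completes, which keeps the insight as close as possible to what you originally wanted to measure.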

To learn more about writing precise queries, see our [search query syntax](/code-search/queries).

### Increase the timeout

By default, search queries have a one-minute timeout. This value is capped by the setting `search.limits.maxTimeoutSeconds`, which also defaults to one minute.

Your site admin can change this value to increase the maximum timeout. Once it is increased, you can use the search parameter `timeout:` to give the query more time to run.

Example:

```
timeout:10m
```

Note that very long searches can have a negative impact on performance, so it's important to use this parameter with caution.
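
The interaction between the `timeout:` parameter and the cap can be illustrated with a short sketch. It assumes that a requested timeout above the cap is simply reduced to the cap; how Sourcegraph applies the limit internally is not specified here. The function names are hypothetical, and the parser only understands whole numbers followed by `s`, `m`, or `h`.

```python
import re

# Seconds per supported unit suffix (assumption: only s, m and h are handled).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_timeout(value):
    """Convert a value such as "10m" or "30s" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported timeout value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def effective_timeout_seconds(requested, max_timeout_seconds=60):
    """Return the timeout that applies once the configured cap is considered.

    max_timeout_seconds models search.limits.maxTimeoutSeconds, which
    defaults to one minute.
    """
    return min(parse_timeout(requested), max_timeout_seconds)
```

Under this assumption, `timeout:10m` has no effect beyond one minute until the site admin raises the cap to at least 600 seconds.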

## Other errors

For other errors, please reach out to our support team through your usual channels or at [email protected].