Incomplete data points

There are a few cases when a Code Insight may return incomplete data.

In all of these cases, if data is returned at all, it will be an undercount.

See the below situations for tips on avoiding and troubleshooting these errors.

Search Queries

This guide applies to all types of Code Insights that use search queries. Language Stats Insights do not use search queries.

Identify affected repositories

Through the Code Insights you can only see an aggregated warning for incomplete datapoints. If you're running the Code Insight with more than one repository, you may want to identify which repositories are causing incomplete datapoints.

To identify affected repositories you first need the series ID. Go to your insights dashboard, and click on the hamburger menu in the top right of your insight's widget. Click on "Get shareable link". It may look like the one below. The last part is the ID.

SHELL
https://test.sourcegraph.com/insights/aW5zaWdodF92aWV3OiIyZ1ZmQzMxMjEwRGhKVkxvc1YybnBqbEduYnUi

With the ID, you can now use the GraphQL API Console through your user settings page, or navigate to /api/console on your Sourcegraph instance.

Paste the GraphQL query below into the query field, and replace MY_SERIES_ID with the ID you copied from above.

SHELL
{ insightViews(id: "MY_SERIES_ID") { nodes { repositoryDefinition { ...on RepositorySearchScope { search } } dataSeries { status { incompleteDatapoints { time ...on TimeoutDatapointAlert { __typename repositories { name } } ...on GenericIncompleteDatapointAlert { __typename repositories { name } } } } } } } }

Running this query will give you a list of incomplete datapoints along with the repositories that are causing them.

You can then follow the other recommendations on this page to address issues for those repositories.

Timeout errors

For searches that take a long time to complete, it's possible for them to timeout before the search ends, and before we can record the data value.

To address this, try to minimize or avoid cases where your insight data series:

  1. Runs over extra large sets of repositories (scope your insight further to fewer repositories)
  2. Uses many boolean combinations (consider splitting into multiple data series)
  3. Runs a complex search over an especially large monorepo (consider optimizing your search query to be more specific or include more filters, like lang: or file:)

In addition:

  1. Timeout errors on points that have been backfilled before the creation date of the insight are more likely to occur on single, large repositories, because backfill points are calculated by running many searches, repository by repository, individually.
  2. Timeout errors on points that have been snapshot after the creation date of the insight are more likely to occur on insights running complex searches over large numbers of repositories, because snapshot points are calculated by running a single global search against the current index. You can read more about this case in our limitations.

If the data is available, the error alert will inform you which times the search has timed out.

Strategies for very large repositories

When dealing with large repositories, several strategies can help identify and manage search limitations effectively.

The goal is to find a query that can execute successfully, and then ramp up complexity until we find the breaking point.

Use Code Search to test your query

You can use Code Search to verify if the query is able to complete within the given timeout.

Once a query was executed the results may be indexed and you need to choose a different commit or date to achieve comparable results to Code Insights.

Unlimited results

Code Search limits the number of results by default. Code Insights however needs to count through all the results.

Add count:all to your query to get comparable results.

Picking the right commit

When Code Insights runs a search query, it will do so for 12 commits spread out over the configured time range. This unindexed search can take longer the further back in time the search goes.

To test that the slowest search will succeed in time, we recommend using the rev:at.time(...) (available from version 5.4.0) with the time range that you selected. E.g. if your Code Insight looks at the last 2 years you should use rev:at.time(2y).

For example the Code Insights query file:.*\.md hello repo:^github\.com/my_org/my_repo count:all should be written as context:global file:.*\.md hello repo:^github\.com/my_org/my_repo count:all rev:at.time(2y) in Code Search.

If your Sourcegraph instance is on a version older than 5.4.0, you can pick a commit sha from e.g. 2 years ago. Here the query file:.*\.md hello repo:^github\.com/my_org/my_repo count:all becomes context:global file:.*\.md hello repo:^github\.com/my_org/my_repo@my_commit_sha count:all.

Timeout

If the query fails with a timeout before one minute has passed, add the parameter timeout:1m.

Precise queries compute faster

If your query is not able to compute results in a reasonable time, you can try to reduce the number of results returned by the query.

For example, if you want to track the version of a NPM dependency in your code base, searching for my_library file:package.json will compute much faster because there are less files to look at and fewer results to return.

We recommend to make your query as precise as possible and reduce the number of results until you reach a query that is able to compute fast enough.

Here are some tips:

  • Search only files with a particular ending. E.g. use file:.*\.md to search for files with the .md ending.
  • Search for files with a certain language. E.g. use lang:javascript to search for all JavaScript files. Please note that this can be slower than the file ending filter, but may be necessary for ambiguous file endings (e.g. .m is used for Objective-C and Matlab).
  • Put quotes around literal terms, unless you want to search for multiple keywords. E.g. searching for "hello world" will only yield results where both words are connected by a whitespace, but hello world will yield results where either word appears.

To learn more about writing precise queries, see our search query syntax.

Increase the timeout

By default, search queries have a one minute timeout. This value is capped by the setting search.limits.maxTimeoutSeconds which by default is also one minute.

Your site admin can change this value to increase the maximum timeout. Once it is increased, you can use the search parameter timeout: to give the query more time to run.

Example:

SHELL
timeout:10m

Note that very long searches can have a negative impact on performance, so it's important to use this parameter with caution.

Other errors

For other errors, please reach out to our support team through your usual channels or at [email protected].