Code search turned code checker
I find code checkers like linters and lightweight static analyzers most valuable when they teach me better ways to code in a language or framework. For example, the Go staticcheck tool finds expensive string comparisons like:
and suggests instead:
These short-and-sweet replacements (even if not always semantics-preserving) are a great way to learn framework idioms or library functions, like strings.EqualFold
in Go.1 And as a codebase grows, small inefficiencies and inconsistencies compound. Code patterns creep in that affect readability and performanceβand it matters.
Running code checks quickly, easily, and comprehensively
Tools like staticcheck
and linters need setting up: a local clone or project
build on your machine, or a continuous integration (CI) or editor set up. When I
learn about patterns like strings.EqualFold
, I want to know where else they
might be lurking: in my code, in my team's code, or in open source projects. To
do that I really need a lightweight workflow, not something that needs cloning
repositories, CI or editor setup. Too much hassle. What I'm really after is a
nimble way to find patterns in a bunch of code over many repositories, quickly
and comprehensively. Something that feels a lot more like searching code than
running analyzers.
Naturally, a plaintext search with grep
can find snippets of EqualFold
calls. In practice though, this plaintext treatment can't offer the fidelity of
dedicated checkers that understand more about code structure and type
information. But I believe there's a midway. What about a lightweight wokflow
where that EqualFold
check is a simpler but comparably effective search query
that could completely eradicate all those inefficient comparisons in my
code, my organization's code, or even all of open source code?
Example: turning code checks into code search queries
Earlier this year Sourcegraph introduced structural search
to search over code syntax. Structural search uses comby to
implement a basic building block in traditional code checkers: it interprets
programs as concrete syntax trees, not just plaintext.
Using file filters and our new support for or
clauses,
it's possible to write configurable code checks as
self-contained search queries. Let's explore this idea!
Here's a search query inspired by
a check where strings.Index
comparisons can be replaced with strings.Contains
:
This query matches all .go
files, excluding file paths that contain test
,
vendor
, and Godeps
. It's sensible to exclude these file paths if we
want to actually propose changes to a project (more on that later). The patterns
strings.Index(..., ....)
match syntax of strings.Index
calls, and the ...
ellipses are special placeholders that match at least two arguments.2
The or
keyword separates individual patterns.
We can search over the top 100 Go repositories on GitHub (by stars) those by
adding repogroup:go-gh-100
to the query. Have a look at some of the results:
π Find ways to improve code in popular Go projects βSide note: our multiline query editor is in a proof-of-concept phase.
The query finds matches in some of the most popular Go projects in a couple of seconds. An exhaustive search shows that there are more than 10 matches at the time of writing. For this flavor of syntactic change, I have a good sense that these are real hits of code that we can fix up.
Turning more code checks into search queries
Because structural search only looks at syntax, it can't yet operate at the
level of a tool like staticcheck
, which knows more about static properties
like type information and variable scope to implement checks. At the same time,
staticcheck
isn't a search tool, it's an entire toolset that includes a suite
of pre-written, high-precision checks that's very effective in certain
workflows, like CI. The question is not necessarily whether a search tool can
achieve parity with a tool like staticcheck
. But given the overlap with
now-expressible search queries, I wanted to know how this search workflow stacks
up: how far can we push structural code search to find similarly actionable
code checks? I.e., checks that match real cases of code waiting to be improved,
minus the hassle.
Approach
So, taking inspiration from staticcheck
, I wanted to see how many of its
checks translate to search queries that I could have high confidence in (i.e.,
all patterns find legitimate issues; zero or very-near-zero false positives). I
chose to look at staticcheck
for its clear documentation, which made it easy
to find examples.3 I ran my search queries against staticcheck
's
own test files to check that they don't match unintended patterns (false
positives) and don't miss real patterns (false negatives). Each check may have
more than one syntactic variant, so I tried to implement patterns for as many
variants as I could find in tests. It's a neat exercise to develop patterns
against the reference tests and discover which variants to cover, all in a
self-contained search webapp. Here's an example where the query matches all the
true hits in the test file, annotated with // want strings.Contains ...
:
π Example query to match known patterns in test filesβ
Using the test files isn't a guarantee that I've implemented all the checks or that it's entirely precise, but it adds a lot more confidence than inferring patterns only from documentation.
Results
Without making any claims about completeness, I was able to implement
at least one variant for 20 out of 34 checks that I feel confident
about. I relied only on patterns in staticcheck
's test data to discover syntactic
variants for checks, so I don't know if I covered all the patterns that
staticcheck
implements in its code.
The majority of checks that I couldn't write required type information to implement correctly (13 of 34). Other checks I couldn't write required more complex syntax matching rules (8 of 34). This table roughly quantifies the expressive needs for implementing checks:
Description | # of checks |
---|---|
Total | 34 |
Works (at least one variant) | 20 |
Works (all variants) | 11 |
Needs type info | 13 |
Needs additional syntax matching | 8 |
You can see and run the final query here:
π Run the query for all code checks on popular Go projects β
Side note: patterns run in order and search stops after finding at least 30 matches. Feel free to delete some patterns to see others in action.
It was tempting to compare more directly by downloading the
100 Go repositories to disk, running staticcheck
, and comparing matches.
For staticcheck
to be effective, the project typically needs to be built first
(my experience was that running staticcheck
on individual files can be
hit-and-miss, and understandably so). I didn't like the idea of doing all that
work, so I punted. That's really the takeaway of these results: many useful checks can fit into
a lightweight search workflow that scales and generalizes to a lot of projects, without
ad-hoc setup for project builds and dedicated tools.5
Extending search queries to access static properties like type information is a also natural extension for writing better code checks, and an area of code intelligence work that we're exploring.
Code checks beyond dedicated tools
The direct comparison to staticcheck
is interesting, but the ease of
a search workflow means there are also different benefits over dedicated tools. For
example, in just the last couple of weeks I learned about a more elegant way to
write code for appending bytes in Go.
A pattern like this:
can instead be written:
This one isn't available in staticcheck
, but I could immediately implement the check and find hits using Sourcegraph:
π Find appends of three or more bytes β
There's also the notion of project-specific checks
that will never make it into a general tool like staticcheck
. For example, in
our Sourcegraph codebase we've mostly moved away from testing values with !=
and
instead use cmp.Diff
.
But I know there are still places where we do comparisons the old way. This
query highlights some of the remnants, and how how the inner block uses the
compared values:
π file:test if want != got {...} β
So code checks can also be specific and customizable to your organization that you won't find in off-the-shelf tools.
Takeaway
Code search can be so much more than finding your favorite provider called
ChocolateCake
identifierβ.
It can be a lightweight workflow for revealing short-and-sweet ways to make your
code better. Perhaps the most powerful idea is that a universal code search can
wholesale find and eradicate the code slips we're always bound to make.
Starting small, the solution could look like a search query that finds issues in active and popular Go projects at just the push of a button. Why not start there? I'm excited about the idea that we can run that query for everything at once:
π Run all the code checks on popular Go projects β
The great thing about code checks as queries is that it's easy to simply
delete patterns that we don't find as valuable. When I explored some
individual queries, it was also reassuring to discover that no patterns occur
in any of the Go repositories. For example,
S1035 checks that there are unneeded
http.CanonicalHeaderKey
calls on the first argument of certain functions:
There are no matches β for this pattern in the Go repositories, but there are some matches β in vendored files, when we we remove the not file:vendor
filter.
Proposing changes
The most exciting part of finding real hits with any code checking tool is that we discover an actionable way to improve the code! I've narrowed down the search query so that it (hopefully) finds only real and uncontroversial patterns to improve and contribute to active Go projects.
Part of an effective contribution means that we need to avoid matches in files that are tests, vendored, or external dependencies. While anyone can use the results to make contributions, it's important to be mindful of contributor guidelines, clearly communicate and motivate proposed changes in pull requests, and validate that the change passes a project's tests or CI checks. And, while the query does exclude common test and vendored files, it's best to check that matches occur in files that really are part of the project.
If you're interested in turning the results of the complete query into open source contributions, e-mail me at [email protected], I can help.
What's next for search queries
There are many languages and code checkers to explore, and this post only skims the surface. We want to make high-quality code checks available for your projects and languages. If you have a project or query in mind, feel free to open an issue on GitHub with your thoughts. We're still working on performance and usability enhancements so that you can search your code and not just popular projects, so stay tuned! In the meantime, you can find an all-in-one config for replacing the patterns on your local machine at comby-tools/go-patterns. If you want to make large scale, org-wide changes, talk to us about Batch Changes. And if you find code checking valuable and want to learn more about our work at Sourcegraph, reach us at [email protected]. Happy searching!
1 Another one of my favorite tools using the same principle is Kibit for Clojure.
2 See the structural search reference for more details about special syntax.
3 These are all the simple static check of the form S<number>
, excluding SA
and ST
checks.
4 There's plenty of opportunity to pick other tools or languages, I picked staticcheck
because its the nicely illustrative.
5 Also, it probably makes sense for these projects to run staticcheck
in their CI, and a quick search helps make it more obvious which ones do not π.
Acks: Thanks @lguychard, @thorstenball, @stefanhengl @beyang, @sqs for feedback on the content of this post.