I find code checkers like linters and lightweight static analyzers most valuable when they teach me better ways to code in a language or framework. For example, the Go staticcheck tool finds expensive string comparisons like:
```go
if strings.ToLower(string1) == strings.ToLower(string2) { ... }
```
and suggests instead:
```go
if strings.EqualFold(string1, string2) { ... }
```
These short-and-sweet replacements (even if not always semantics-preserving) are a great way to learn framework idioms or library functions, like strings.EqualFold in Go.1 And as a codebase grows, small inefficiencies and inconsistencies compound. Code patterns creep in that affect readability and performance—and it matters.
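To make the "not always semantics-preserving" caveat concrete, here is a minimal sketch that compares the two forms. The helper names `lowerEqual` and `foldEqual` are made up for illustration, and the long s input is just one case I'm aware of where lowercasing and Unicode case folding disagree.

```go
package main

import (
	"fmt"
	"strings"
)

// lowerEqual is a hypothetical helper mirroring the flagged pattern:
// it may build two lowercased copies before comparing them.
func lowerEqual(a, b string) bool {
	return strings.ToLower(a) == strings.ToLower(b)
}

// foldEqual is a hypothetical helper wrapping the suggested replacement,
// which compares under Unicode case folding without building new strings.
func foldEqual(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	// Typical input: both forms agree.
	fmt.Println(lowerEqual("Go", "GO"), foldEqual("Go", "GO"))

	// Edge case: U+017F (long s) is already lowercase, so ToLower leaves it
	// alone, while case folding treats it as a variant of "s".
	fmt.Println(lowerEqual("s", "\u017f"), foldEqual("s", "\u017f"))
}
```

For everyday ASCII input the two helpers behave the same, which is why the replacement is usually safe to apply; the second pair is the kind of corner case worth knowing about before applying it blindly.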
Running code checks quickly, easily, and comprehensively
Tools like staticcheck and linters need setting up: a local clone or project build on your machine, or a continuous integration (CI) or editor setup. When I learn about patterns like strings.EqualFold, I want to know where else they might be lurking: in my code, in my team's code, or in open source projects. To do that I really need a lightweight workflow, not something that needs cloning repositories or setting up CI or an editor. Too much hassle. What I'm really after is a nimble way to find patterns in a bunch of code over many repositories, quickly and comprehensively. Something that feels a lot more like searching code than running analyzers.
Naturally, a plaintext search with grep can find snippets of EqualFold calls. In practice though, this plaintext treatment can't offer the fidelity of dedicated checkers that understand more about code structure and type information. But I believe there's a middle ground. What about a lightweight workflow where that EqualFold check is a simpler but comparably effective search query that could completely eradicate all those inefficient comparisons in my code, my organization's code, or even all of open source code?
Example: turning code checks into code search queries
Earlier this year Sourcegraph introduced structural search to search over code syntax. Structural search uses comby to implement a basic building block in traditional code checkers: it interprets programs as concrete syntax trees, not just plaintext. Using file filters and our new support for or clauses, it's possible to write configurable code checks as self-contained search queries. Let's explore this idea!
Here's a search query inspired by a check where strings.Index comparisons can be replaced with strings.Contains:
```text
language:go not file:test not file:vendor not file:Godeps
strings.Index(..., ...) > -1
or
strings.Index(..., ...) >= 0
or
strings.Index(..., ...) != -1
```
This query matches all .go files, excluding file paths that contain test, vendor, and Godeps. It's sensible to exclude these file paths if we want to actually propose changes to a project (more on that later). The patterns strings.Index(..., ...) match the syntax of strings.Index calls, and the ... ellipses are special placeholders that match any argument syntactically.2 The or keyword separates individual patterns.
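For reference, here is a small sketch of the code the query is hunting for and the replacement it leads to. The function names `containsViaIndex` and `containsDirect` are hypothetical; only strings.Index and strings.Contains come from the Go standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// containsViaIndex uses one of the three comparison forms the query matches.
func containsViaIndex(s, substr string) bool {
	return strings.Index(s, substr) != -1
}

// containsDirect is the more readable form the check suggests instead.
func containsDirect(s, substr string) bool {
	return strings.Contains(s, substr)
}

func main() {
	// Compare both forms on a present, an absent, and an empty substring.
	for _, sub := range []string{"search", "grep", ""} {
		fmt.Println(containsViaIndex("code search", sub), containsDirect("code search", sub))
	}
}
```

The two forms should agree on every input, so this rewrite is a readability fix, not a behavior change.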
We can search over the top 100 Go repositories on GitHub (by stars) by adding repogroup:go-gh-100 to the query. Have a look at some of the results:
The query finds matches in some of the most popular Go projects in a couple of seconds. An exhaustive search shows that there are more than 10 matches at the time of writing. For this flavor of syntactic change, I have a good sense that these are real hits of code that we can fix up.
Turning more code checks into search queries
Because structural search only looks at syntax, it can't yet operate at the level of a tool like staticcheck, which knows more about static properties like type information and variable scope to implement checks. At the same time, staticcheck isn't a search tool; it's an entire toolset that includes a suite of pre-written, high-precision checks that are very effective in certain workflows, like CI. The question is not necessarily whether a search tool can achieve parity with a tool like staticcheck. But given the overlap with now-expressible search queries, I wanted to know how this search workflow stacks up: how far can we push structural code search to find similarly actionable code checks? I.e., checks that match real cases of code waiting to be improved, minus the hassle.
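To illustrate what "only looks at syntax" means, here is a rough sketch of the same three strings.Index patterns written as a standalone check on top of Go's standard go/parser and go/ast packages. This is not how structural search, comby, or staticcheck implement it, and names like `findIndexChecks` and `sample` are made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// isStringsIndexCall reports whether e looks like strings.Index(x, y).
// It compares names only, with no type information.
func isStringsIndexCall(e ast.Expr) bool {
	call, ok := e.(*ast.CallExpr)
	if !ok || len(call.Args) != 2 {
		return false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr)
	if !ok || sel.Sel.Name != "Index" {
		return false
	}
	pkg, ok := sel.X.(*ast.Ident)
	return ok && pkg.Name == "strings"
}

// isMinusOne reports whether e is the literal -1 (unary minus applied to 1).
func isMinusOne(e ast.Expr) bool {
	u, ok := e.(*ast.UnaryExpr)
	if !ok || u.Op != token.SUB {
		return false
	}
	lit, ok := u.X.(*ast.BasicLit)
	return ok && lit.Kind == token.INT && lit.Value == "1"
}

// isZero reports whether e is the literal 0.
func isZero(e ast.Expr) bool {
	lit, ok := e.(*ast.BasicLit)
	return ok && lit.Kind == token.INT && lit.Value == "0"
}

// findIndexChecks returns the line numbers of comparisons shaped like
// strings.Index(..., ...) > -1, >= 0, or != -1 in src.
func findIndexChecks(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		bin, ok := n.(*ast.BinaryExpr)
		if !ok || !isStringsIndexCall(bin.X) {
			return true
		}
		switch {
		case bin.Op == token.GTR && isMinusOne(bin.Y),
			bin.Op == token.NEQ && isMinusOne(bin.Y),
			bin.Op == token.GEQ && isZero(bin.Y):
			lines = append(lines, fset.Position(bin.Pos()).Line)
		}
		return true
	})
	return lines, nil
}

// sample is a hypothetical input file used only for this demonstration.
const sample = `package p

import "strings"

func f(s string) bool {
	if strings.Index(s, "a") != -1 {
		return true
	}
	if strings.Index(s, "b") == 0 { // a prefix test, not a containment check
		return false
	}
	return strings.Index(s, "c") >= 0
}
`

func main() {
	lines, err := findIndexChecks(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(lines)
}
```

Because the sketch compares identifiers by name and has no type information, it would also flag a call on a local variable that happens to be named strings. That is the kind of gap a type-aware tool closes, and it's also far more code than the three-line search query.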
Approach
So, taking inspiration from staticcheck, I wanted to see how many of its checks translate to search queries that I could have high confidence in (i.e., all patterns find legitimate issues; zero or very-near-zero false positives). I chose to look at staticcheck for its clear documentation, which made it easy to find examples.3 I ran my search queries against staticcheck's own test files to check that they don't match unintended patterns (false positives) and don't miss real patterns (false negatives). Each check may have more than one syntactic variant, so I tried to implement patterns for as many variants as I could find in tests. It's a neat exercise to develop patterns against the reference tests and discover which variants to cover, all in a self-contained search webapp. Here's an example where the query matches all the true hits in the test file, annotated with // want strings.Contains ...: