May 21, 2026

Matt Tanner

Code Complexity: Metrics, Examples, and How to Reduce It

A practical guide to code complexity: how cyclomatic and cognitive complexity are measured, the refactorings that actually reduce it, and how to track complexity patterns across a large codebase.

A piece of code that took ten minutes to read last month takes thirty today. A bug fix that should be ten lines becomes a hundred. A new engineer asks, "Why does this branch even exist?" and nobody on the team can answer. That's code complexity making itself felt, and in software development, it's not the same thing as algorithmic complexity. Big-O describes how an algorithm's resource use grows as input size increases; code complexity measures how hard a piece of code is to read, change, and trust. It predicts where the next defect lands and how long onboarding takes. This guide covers what code complexity is, the code complexity metrics that quantify it, the workflows that reduce code complexity, and how to track complexity-related patterns across a software system too large for any one engineer to hold in their head.

What Is Code Complexity?

Code complexity refers to a measurable property of source code that describes how difficult it is for a human to read, understand, modify, and test. Higher code complexity correlates with higher defect density, slower onboarding, and more painful refactoring. It is a quantitative measure of software quality distinct from computational complexity, which describes algorithmic runtime.

That distinction matters because Google's top results for the phrase mix the two. If you came here looking for Big-O or "code complexity O(1)," you're on the wrong page. The complexity we care about here lives in the human reader, not the CPU. The most common metrics count code paths through control flow structures, nesting depth, and the symbols a developer must hold in working memory.

Why Code Complexity Matters

High code complexity is a leading indicator. CodeScene's research on the Bumpy Road code smell puts it bluntly: "Given these cognitive limitations, it's no wonder that nesting complexity has a high correlation to defects." Their machine-learning work on their Code Health metric found something stronger and arguably more controversial: when they trained an ML model to predict defects, the cyclomatic complexity score was weighted very low, while nested complexity weighed in heavily.

The downstream effects compound. High complexity creates more execution paths, which raises the testing effort needed to cover them. It slows onboarding for new engineers. It resists refactoring and slows code review, because reviewers have to fit the function in their heads to spot a problem. It also tracks defect rates: high complexity drives bugs, and bugs drive emergency patches that add complexity without removing any.

The Main Code Complexity Metrics

No single software metric captures the entire phenomenon. Each code complexity metric measures a different aspect of "hard to read." The table below summarizes the trade-offs, and the subsections describe each one.

Metric	What it measures	Strengths	Weaknesses
Cyclomatic complexity	Linearly independent paths through a function	Easy to compute, links directly to test coverage	Ignores nesting and readability
Cognitive complexity	Mental effort required to follow the code	Penalizes nesting, matches developer intuition	Less standardized across tools
Halstead complexity measures	Operators and operands	Language-agnostic, gives a code volume estimate	Sensitive to coding style, less actionable
Lines of code	Sheer size	Trivial to compute, useful as a basic measure	A crude proxy; long ≠ complex
Maintainability index	Composite complexity score	Single number for dashboards	Hides which signal is firing
Coupling and cohesion	Structural dependencies	Surfaces structural complexity	Hard to measure consistently across languages

Cyclomatic complexity (McCabe)

Cyclomatic complexity, introduced in Thomas McCabe's 1976 paper "A Complexity Measure," quantifies the number of linearly independent paths through a program's source code. The definition is graph-theoretic: build a control flow graph from a program's source code, then compute M = E − N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. For a single function with one entry and one exit, P is always 1, so the formula collapses to M = E − N + 2.
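To make the formula concrete, here is a minimal sketch that applies M = E − N + 2P to a hand-built control flow graph. The graph (a single if/else that rejoins at the exit) and the function name are illustrative assumptions, not the output of any real analyzer.

```python
def cyclomatic_from_graph(edges, nodes, components=1):
    """Apply McCabe's M = E - N + 2P to a control flow graph."""
    return len(edges) - len(nodes) + 2 * components


# Hypothetical graph for: if cond: A else: B, then a shared exit.
nodes = {"entry", "then", "else", "exit"}
edges = {
    ("entry", "then"),
    ("entry", "else"),
    ("then", "exit"),
    ("else", "exit"),
}

print(cyclomatic_from_graph(edges, nodes))  # 4 edges - 4 nodes + 2 = 2
```

One decision point yields a score of 2, and a straight-line function with no branches yields 1.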

There's a friendlier way to read the same number. In many structured-language implementations, the number of linearly independent paths equals the number of decision points (if statements, while, for, case, catch, ternary, short-circuit operators) plus one, though exact increments vary by language and analyzer. Most static analyzers compute cyclomatic complexity scores by walking the AST and counting branches in the control flow graph.
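The decision-points-plus-one reading is easy to sketch with Python's standard ast module. This is a simplified approximation, not how any particular analyzer works: the set of node types counted below is an assumption, and it scores the whole source string as one unit rather than per function.

```python
import ast

# Node types treated as decision points in this sketch; real analyzers differ.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic(source: str) -> int:
    """Approximate cyclomatic complexity as decision points plus one."""
    count = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds one branch per extra operand.
            count += len(node.values) - 1
    return count


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in (2, 3):
        if n % d == 0 and n > d:
            return "composite"
    return "other"
'''

print(approx_cyclomatic(SAMPLE))  # 2 ifs + 1 for + 1 'and' + 1 = 5
```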

Figure: Cyclomatic complexity control flow graph diagram

McCabe's original recommendation was to limit cyclomatic complexity to 10 and split any module that exceeds it. NIST's Structured Testing methodology later confirmed that the threshold of 10 had "received substantial corroborating evidence," while noting that some circumstances justify going as high as 15 with a written explanation.

Cognitive complexity (Sonar)

Sonar's cognitive complexity metric measures the mental effort required to read a piece of code, and it targets the cases where cyclomatic complexity fails as a maintainability signal. As Sonar's white paper frames it, cyclomatic complexity "excels at" measuring testability, but its underlying mathematical model "is unsatisfactory at producing a value that measures" maintainability. The reason is that "methods with equal Cyclomatic Complexity do not necessarily present equal difficulty to the maintainer."

Cognitive complexity follows three rules. Ignore structures that let multiple statements be readably shorthanded into one. Increment by one for each break in the linear flow of the code. And increment again when flow-breaking control structures are nested. The third rule is what makes the metric track human intuition. A switch statement with twelve cases is annoying but easy to scan; complex code with three levels of nested structures is genuinely hard to hold in your head, which is exactly what cognitive load measures.
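The nesting rule can be illustrated with a deliberately reduced score: one point per if, for, or while, plus one extra point for each level of nesting it sits inside. This is a sketch of the idea only, not Sonar's algorithm. It treats an elif as a nested if and ignores else branches, boolean operators, and everything else a full implementation would handle.

```python
import ast

_FLOW_BREAKS = (ast.If, ast.For, ast.While)


def nesting_weighted_score(source: str) -> int:
    """Sum (1 + nesting depth) over every if/for/while in the source."""

    def visit(node, depth):
        total = 0
        for child in ast.iter_child_nodes(node):
            if isinstance(child, _FLOW_BREAKS):
                total += 1 + depth           # base increment plus nesting penalty
                total += visit(child, depth + 1)
            else:
                total += visit(child, depth)
        return total

    return visit(ast.parse(source), 0)


NESTED = '''
def f(a, b, c):
    if a:
        if b:
            if c:
                return 1
    return 0
'''

FLAT = '''
def f(a, b, c):
    if not a:
        return 0
    if not b:
        return 0
    if not c:
        return 0
    return 1
'''

print(nesting_weighted_score(NESTED))  # 1 + 2 + 3 = 6
print(nesting_weighted_score(FLAT))    # 1 + 1 + 1 = 3
```

Both samples have three decision points, so their cyclomatic complexity is the same, but the nested version scores twice as high here.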

Halstead complexity measures

Maurice Halstead's 1977 complexity measures derive from four counts in source code: distinct operators (η1), distinct operands (η2), total operators (N1), and total operands (N2). From those counts come vocabulary (η = η1 + η2), length (N = N1 + N2), code volume (V = N × log₂η), difficulty, effort, and an estimate of delivered bugs. Halstead complexity values are language-agnostic and computed statically, which makes them useful when comparing pieces of code across programming languages. They're also style-sensitive (introduce one temporary variable and the operand counts move), so most engineering teams treat Halstead complexity as a directional signal.
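As a worked example, the sketch below derives the main Halstead values from the four counts, using the standard definitions: vocabulary η = η1 + η2, length N = N1 + N2, volume V = N × log₂η, difficulty D = (η1 / 2) × (N2 / η2), and effort E = D × V. The input counts are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math


def halstead(distinct_operators, distinct_operands, total_operators, total_operands):
    """Derive Halstead measures from the four basic counts."""
    vocabulary = distinct_operators + distinct_operands          # eta
    length = total_operators + total_operands                    # N
    volume = length * math.log2(vocabulary)                      # V
    difficulty = (distinct_operators / 2) * (total_operands / distinct_operands)
    effort = difficulty * volume                                 # E
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
    }


# Hypothetical counts for a small function.
result = halstead(distinct_operators=4, distinct_operands=4,
                  total_operators=6, total_operands=10)
print(result["volume"])      # 16 * log2(8) = 48.0
print(result["difficulty"])  # (4 / 2) * (10 / 4) = 5.0
print(result["effort"])      # 5.0 * 48.0 = 240.0
```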

Lines of code, maintainability index, and structural complexity

Three more signals round out the picture. Lines of code are crude but easy to read; a 5,000-line file is hard to navigate regardless of branching, even though length alone does not make code complex. The maintainability index combines Halstead volume, cyclomatic complexity, and lines of code into a single score (commonly normalized to 0-100) that's convenient for dashboards but hides which input is moving when the index drops. Coupling and cohesion capture structural complexity in how different modules depend on each other (coupling) and how tightly responsibilities sit inside a module (cohesion). They don't show up inside any single function, but they appear the first time you try to extract a service or swap a dependency.
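To show how the composite is built, here is a rough sketch of one commonly cited normalized variant of the maintainability index. Tools differ in coefficients, rounding, and optional terms, so treat the constants below as an assumption rather than what your analyzer computes.

```python
import math


def maintainability_index(halstead_volume, cyclomatic, lines_of_code):
    """One common 0-100 variant; volume and line count must be positive."""
    raw = (
        171
        - 5.2 * math.log(halstead_volume)
        - 0.23 * cyclomatic
        - 16.2 * math.log(lines_of_code)
    )
    # Rescale to 0-100 and clamp negative results to zero.
    return max(0.0, raw * 100 / 171)


# Hypothetical inputs: same volume and size, different branching.
simple = maintainability_index(halstead_volume=1000, cyclomatic=5, lines_of_code=100)
branchy = maintainability_index(halstead_volume=1000, cyclomatic=30, lines_of_code=100)
print(simple > branchy)  # True: more branching lowers the score
```

Because three inputs fold into one number, a drop in the score can come from growth in size, branching, or volume, and the score alone does not say which.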

Code Complexity Examples (Good vs. Bad)

Two functions can have identical cyclomatic complexity scores but feel very different to read. The first code example below is a deliberately over-nested validator with overly complex logic; the second is the same logic flattened. Together, they show how nesting drives software complexity even when the path count doesn't change:

def validate_order_bad(order):
    if order is not None:
        if order.items:
            if order.user:
                if order.user.is_active:
                    for item in order.items:
                        if item.in_stock:
                            if item.price > 0:
                                continue
                            else:
                                return "bad_price"
                        else:
                            return "out_of_stock"
                    return "ok"
                else:
                    return "inactive_user"
            else:
                return "no_user"
        else:
            return "no_items"
    else:
        return "no_order"

Here's the same code example flattened with guard clauses:

def validate_order_good(order):
    if order is None:
        return "no_order"
    if not order.items:
        return "no_items"
    if not order.user:
        return "no_user"
    if not order.user.is_active:
        return "inactive_user"
    for item in order.items:
        if not item.in_stock:
            return "out_of_stock"
        if item.price <= 0:
            return "bad_price"
    return "ok"

Both functions have the same cyclomatic complexity: seven decision points (six if statements and one for loop) plus one, for a score of 8 under the count described earlier. The cognitive complexity is dramatically lower in the second, because the nested structures are gone. This is exactly the gap Sonar's metric was designed to expose. The first version takes minutes to skim; the flattened version takes seconds.

How to Measure Code Complexity (Tools)

Most engineering teams approach measuring code complexity in three places: their editor, their CI pipeline, and a codebase-wide dashboard.

Language-specific tools

Single-language teams usually start by measuring code complexity in the editor. Python developers reach for radon or flake8's mccabe plugin. JavaScript and TypeScript teams use ESLint's complexity rule. Java teams have Checkstyle and PMD. Polyglot teams often standardize on lizard, a tool that supports a long tail of programming languages with one CLI. Each computes cyclomatic complexity locally and surfaces it as a lint warning, which keeps the complexity score in front of developers as they're writing code and helps catch unnecessary complexity early in the development process.

Platform tools

SonarQube and CodeScene approach measuring code complexity at the project level. Both compute cyclomatic and cognitive complexity, surface hotspot reports, and trend the numbers over time within a single software project. Sonar's contribution to software metrics is cognitive complexity itself; CodeScene's is the behavioral-code-analysis angle that ties measuring code complexity to where engineers actually spend their time. Either complements the automated code review tools your CI already runs.

Codebase-wide complexity tracking

Dedicated analyzers are strong at calculating code complexity inside the repositories they analyze: cyclomatic complexity, cognitive complexity, Halstead complexity, and code health. There's a different layer of complexity management most teams have no tool for: measuring code complexity through search-driven views of custom patterns across many repositories and code hosts. That's the layer Sourcegraph Code Insights addresses, and the dedicated section below covers how.

How to Reduce Code Complexity

Reducing complexity in real codebases is a craft. In practice, four refactorings handle most cases of complex code.

Extract method/function decomposition

Pull a logical chunk out of a long, complex function and give it a name. CodeScene's research on the Bumpy Road smell calls this out directly: "the EXTRACT METHOD refactoring is the primary weapon of counter-attack." Naming is half the win. A well-named helper turns ten lines of arithmetic into one English noun phrase the reader can skip past, which improves code readability and supports code reuse in the long run.
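A small before-and-after sketch of the refactoring follows. The order-pricing functions, the item fields, and the 10% bulk discount are all hypothetical, chosen only to show a named helper replacing an inline chunk.

```python
def order_total_before(items, tax_rate):
    """Pricing rule and aggregation mixed into one loop."""
    total = 0.0
    for item in items:
        line = item["price"] * item["qty"]
        if item["qty"] >= 10:
            line *= 0.9  # bulk discount
        total += line
    return total * (1 + tax_rate)


def discounted_line_price(item):
    """Price for one line item, with 10% off at 10 or more units."""
    line = item["price"] * item["qty"]
    if item["qty"] >= 10:
        line *= 0.9
    return line


def order_total_after(items, tax_rate):
    """Same result; the pricing rule now has a name the reader can skip past."""
    subtotal = sum(discounted_line_price(item) for item in items)
    return subtotal * (1 + tax_rate)


items = [{"price": 2.0, "qty": 10}, {"price": 5.0, "qty": 1}]
print(order_total_before(items, 0.0))  # 18.0 + 5.0 = 23.0
print(order_total_after(items, 0.0))   # 23.0
```

The behavior is unchanged, but the branch now lives in a helper that can be read and tested on its own.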

Reduce nesting (early returns, guard clauses)

The validator example above is the canonical case. Nesting taxes working memory because every open brace is a frame the reader has to track. Cognitive psychology research cited by CodeScene puts working memory at around 3-4 items, depending on the type of information. Three levels of nested control structures is already at the ceiling for most readers. Inverting the conditional and returning early almost always costs less than the nesting it removes.

Replace conditionals with polymorphism/lookup tables

A 17-case switch on a string type is a polymorphism opportunity in disguise, as is a chain of if isinstance(x, T) calls. Replacing conditional statements with a dispatch table or a small class hierarchy moves the branching out of the function and into the type system: an object-oriented programming pattern that improves code structure and lowers cognitive load.
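Here is a minimal sketch of the lookup-table version of this refactoring. The shape dictionaries, field names, and function names are hypothetical; the point is that adding a new type means adding a table entry rather than another branch.

```python
import math


def area_branching(shape: dict) -> float:
    """Before: one branch per type inside the function."""
    kind = shape["type"]
    if kind == "circle":
        return math.pi * shape["r"] ** 2
    elif kind == "rect":
        return shape["w"] * shape["h"]
    elif kind == "triangle":
        return 0.5 * shape["base"] * shape["height"]
    raise ValueError(f"unknown shape type: {kind!r}")


# After: the branching lives in a table keyed by type.
AREA_BY_TYPE = {
    "circle": lambda s: math.pi * s["r"] ** 2,
    "rect": lambda s: s["w"] * s["h"],
    "triangle": lambda s: 0.5 * s["base"] * s["height"],
}


def area_dispatch(shape: dict) -> float:
    """Look up the handler for the shape's type and call it."""
    handler = AREA_BY_TYPE.get(shape["type"])
    if handler is None:
        raise ValueError(f"unknown shape type: {shape['type']!r}")
    return handler(shape)


print(area_dispatch({"type": "rect", "w": 2, "h": 3}))  # 6
```

The dispatch function keeps a single decision point no matter how many types the table holds.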

Apply single responsibility / split classes

When a class is doing two things, no amount of internal refactoring will simplify it past a certain floor. Split it. The methods that used to coordinate across responsibilities become focused, and their complexity drops. Complex code in legacy systems is the most rewarding place to apply all four techniques: reducing complexity there delivers the highest marginal benefit, chips away at technical debt, and leaves more maintainable code in its place.

Tracking Code Complexity Across a Large Codebase

We're not a replacement for language-specific complexity analyzers. Sourcegraph complements them by helping engineering teams find, monitor, and trend complexity-related patterns across many repositories: long-lived deprecated APIs, deeply nested complex code, large files, duplicated helper patterns, migration progress, and code smells expressible as search queries. Per-PR complexity is largely solved with lint rules and CI gates; the harder layer is the organization-wide view. If your software system spans hundreds or thousands of repositories, the question of which services are accumulating high code complexity fastest lives across all of them rather than inside any one software project.

This is what we built Sourcegraph Code Insights for: tracking anything you can express as a Sourcegraph search query, across thousands of repositories. That includes migrations, package use, version adoption, code smells, codebase size, and more. Write a search query for a complexity-related signal (long files, deeply nested patterns, risky API usage, deprecated abstractions), and Code Insights turns it into a trend chart across repositories, backfills historical data, and keeps the view updated as commits land. The Svelte 4 → Svelte 5 migration tracker on our homepage is the same pattern applied to a framework upgrade. Swap the Svelte query for a complexity-pattern query, and you get the same trend view, without the unnecessary complexity of spreadsheets.

Code Monitoring closes the loop. It supports one trigger: when new search results are detected for a particular search query. We run the query periodically over new commits and fire an action (email, Slack, or webhook) when new results match. Wire that to a query that flags new deeply nested constructs, and teams can be notified when that pattern appears in merged code, a useful complement to traditional metrics in CI.

When the right move is a refactor at scale, Batch Changes can apply a defined change across many repositories, create changesets for the target repositories, and let teams review, update, and merge those changes through their normal code-host workflow.

Code Complexity in the AI Era

In modern software development, AI-generated code makes complexity tracking more important, not less. Agent-written code goes wrong in two ways. The first is obvious: agents produce confidently-wrong code that compiles, passes the tests it wrote, and quietly adds three levels of nesting or a defensive try/except block around a function that already handled the error. The second is structural. Agents without enough context tend to add new helpers next to existing ones rather than extending the original abstraction. Over time, the software system accumulates excessive complexity from near-duplicates that look fine in isolation but increase complexity in aggregate.

The fix isn't to slow agents down. It's to give them richer context about the codebase they're working in. That's the bet behind our MCP server and our Amp coding agent: giving models more of the surrounding codebase, so they're more likely to find existing patterns, related abstractions, and similar implementations before writing new code. Pair that with codebase-wide tracking of complexity-related signals, and you get a feedback loop that catches AI-introduced complexity as a measurable trend, not just as individual PR-level warnings.

Our own CodeScaleBench research found that giving agents real code-intelligence tooling (rather than naive grep) materially improves cross-repo task performance. The same intuition applies to complexity: if you can't see it across the org, you can't reduce code complexity there either.

Conclusion

Understanding code complexity matters because it's measurable, predictive of defects, and reducible. Each metric captures a different aspect of the problem, the refactorings to reduce code complexity are well-understood, and the tooling to keep code quality in check is mature inside a single repository. The gap most engineering teams have is the codebase-wide view: which services are getting worse, which teams are getting better, and which software engineering initiatives actually moved the numbers. That's where we built Sourcegraph Code Insights to help. To see code complexity trends across your own codebase, schedule a demo.

Frequently Asked Questions

What is a good cyclomatic complexity?

McCabe's original recommendation is to limit cyclomatic complexity to 10 or below and split anything that exceeds it. NIST's Structured Testing methodology accepted 10 as the working threshold and allowed 15 in justified cases. Many engineering teams start with 10 as a review threshold, then tune the gate by language, codebase age, and risk tolerance.

How do I reduce code complexity?

Four refactorings cover most cases: extract method, reduce nesting with early returns, replace conditional statements with polymorphism or lookup tables, and split classes that violate single responsibility. Track complexity over time so you can tell whether your refactors are working at the codebase level, not just inside one function. Pair that with regular code reviews and automated code review tools to catch overly complex logic before it lands.

What's the difference between cyclomatic and cognitive complexity?

Cyclomatic counts paths; cognitive measures how hard the code is for a human to follow. The difference is nesting. Cyclomatic scores an if statement the same whether it sits at the top level or is buried three levels deep. Cognitive penalizes the deeper one because nested code taxes working memory and increases mental effort.

Does AI-generated code have higher or lower complexity?

It depends on the model and the prompting, but the failure mode to watch for is structural: agents working without full codebase context tend to add new helpers next to existing ones, accumulating near-duplicates that each look fine in isolation. Run complexity metrics in CI on AI-written code the same way you would for human-written code, and monitor trends across the codebase to maintain code quality.