March 16, 2025

Matt Tanner

Multi-repo search: How to search across multiple repositories

When your codebase is spread across hundreds of repositories and multiple code hosts, finding critical code for security remediation or reuse can take days. Read the definitive guide to multi-repo search to compare your options, understand the approaches, and see how Sourcegraph provides the universal code search platform built for enterprise scale.

Your codebase isn't in a single repository. It's spread across hundreds of repos, multiple code hosts, and several teams that each own different services. When a critical vulnerability hits a shared library, you need to find every repository that imports it. When you're onboarding to a new team, you need to understand how a function is called across the entire system. Multi-repo search is the ability to search across multiple repositories simultaneously, and for engineering teams managing code at this scale, it's the difference between minutes and days.

This guide covers what multi-repo search is, why teams need it, how the major approaches compare, and how to configure it for your organization.

What is multi-repo search?

Multi-repo search (also called multi-repository search) means running a single query that returns results from multiple repositories at once, rather than searching one repo at a time. Instead of opening GitHub, navigating to a repository, searching, going back, opening the next repo, and repeating, you type one query and see matches across your entire codebase.

That sounds simple, but the technical challenge is real. A search tool needs to index code across different code hosts (GitHub, GitLab, Bitbucket, Perforce), handle different authentication and access models, and fetch results fast enough that developers actually use it instead of falling back to grep on whatever they have cloned locally.

The distinction matters because most code hosts only search within their own platform. GitHub search covers your GitHub repositories. GitLab search covers your GitLab projects. If your organization uses both, or migrated from Bitbucket years ago and still has repos there, native search tools leave gaps. A dedicated multi-repo search tool like Sourcegraph Code Search indexes repositories across all your code hosts into a single searchable index, so one query covers everything.

There's also a difference between text-based multi-repo search and code-aware multi-repo search. Text-based search finds string matches. It tells you "this pattern appears in these files." Code-aware search goes further. With precise code navigation powered by indexing formats like SCIP (Source Code Intelligence Protocol), a search tool can navigate from an import statement in one repository to the function definition in another, track project references across repository boundaries, and distinguish between a function called process and a variable called process. Sourcegraph supports both: literal, regex, and structural search for pattern matching, plus SCIP-powered code navigation for precise cross-repository symbol resolution.

Why do teams need multi-repo search?

The need for multi-repo search shows up the moment your organization grows past a handful of repos. Here are the scenarios that make it critical.

Security vulnerability remediation. When a CVE drops for a dependency your teams use, the clock starts. You need to find every repository that imports the vulnerable package, check which version they're running, and ensure patches are applied everywhere. Without multi-repo search, this means asking team leads to check their repos manually, or writing scripts that clone every repository and run grep. With Sourcegraph, you can run a regex query like file:package.json "vulnerable-lib":\s*"[~^]?1\.2\.3" patterntype:regexp (with the dots escaped so they match only literal dots) and see every affected repository in seconds.
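To make the matching behavior of such a query concrete, here is a minimal sketch that applies the same regular expression to a few package.json snippets with Python's re module. The repository names and manifest contents are hypothetical, and the function name affected_repos is an illustration only, not part of any Sourcegraph API.

```python
import re

# Same pattern as the search query; the dots are escaped so they match
# only a literal "." and not any character.
QUERY = re.compile(r'"vulnerable-lib":\s*"[~^]?1\.2\.3"')

# Hypothetical package.json contents, keyed by repository name.
manifests = {
    "org/checkout": '{"dependencies": {"vulnerable-lib": "^1.2.3"}}',
    "org/billing": '{"dependencies": {"vulnerable-lib": "1.2.4"}}',
    "org/web": '{"devDependencies": {"vulnerable-lib":"~1.2.3"}}',
    "org/legacy": '{"dependencies": {"vulnerable-lib": "1.2.30"}}',
    "org/docs": '{"dependencies": {"other-lib": "1.2.3"}}',
}


def affected_repos(manifests_by_repo):
    """Return the sorted names of repos whose manifest matches the query."""
    return sorted(
        repo for repo, text in manifests_by_repo.items() if QUERY.search(text)
    )


print(affected_repos(manifests))  # ['org/checkout', 'org/web']
```

The closing quote in the pattern is what keeps 1.2.30 from matching, and the optional [~^] covers the two common npm range prefixes.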

Code reuse and discovery. Large organizations solve the same problem multiple times because engineers don't know a solution already exists in another team's repo. Multi-repo search turns your entire codebase into a searchable knowledge base. Instead of building a new authentication wrapper from scratch, you can search for existing implementations, learn how other teams approached the same issue, and reuse what works.

Developer onboarding. New engineers joining a team with 200+ repositories face a steep learning curve. Multi-repo search lets them ask questions of the codebase directly: "Where is the payment processing logic?" "Which repos define gRPC services?" "How do other teams handle rate limiting?" Instead of searching issues or asking five people on Slack, a new user can open Sourcegraph and find answers immediately. Deep Search takes this further by accepting natural language questions like "How do we handle rate limiting in our APIs?" and returning structured, code-grounded answers with links to the specific searches and files it used.

Impact analysis before changes. Changing a shared library's API signature? Renaming a widely-used function? You need to understand every caller before you merge. Multi-repo search shows you the blast radius of changes by finding every reference across every repository. Sourcegraph's precise code navigation goes beyond text matching here. It uses SCIP indexes to resolve actual symbol references, so you see real function calls and imports, not just string matches that happen to contain the function name.

Compliance and audit. Regulated industries need to verify that deprecated API calls are no longer in use, that specific encryption standards are applied correctly, or that API keys aren't committed to source control. Multi-repo search makes these audits a query instead of a project. Sourcegraph's Code Insights can track these metrics over time with dashboards that show migration progress, vulnerability remediation rates, and codebase health across all repositories.

Multi-repo search approaches compared

Not all multi-repo search approaches are equal. The right choice depends on your organization's size, code host configuration, and what you need beyond basic text matching.

CLI tools (grep, ripgrep, and custom scripts). The simplest approach: clone your repositories locally, then use grep -r or rg (ripgrep) to search across directories. Ripgrep is fast and respects .gitignore files by default, making it a solid choice for searching a handful of local repos. The limitation is scale. This approach requires every repo to be cloned to your machine, doesn't work across teams or machines, has no web UI, no result ranking, and no code-aware understanding. For a team with 10 repositories, it's fine. For an organization with 500+ repos across multiple code hosts, it's not practical.

Native code host search (GitHub, GitLab). GitHub search supports substring, regex, and symbol search across repositories you have access to. GitLab offers its own search capabilities, including an Exact Code Search feature (powered by Zoekt) that is currently in limited availability for Premium and Ultimate tiers, where enabled. Both platforms are improving, and both have the same fundamental constraint: they only search within their own platform. If your code lives on GitHub and GitLab (common after acquisitions or for teams using GitLab CI with GitHub repos), native search gives you two separate, limited views of your codebase.

Dedicated code search tools. Tools built specifically for multi-repo search solve the cross-platform problem by indexing code from multiple sources into a unified search index. Sourcegraph is the primary example here: it connects to GitHub, GitLab, Bitbucket, and Perforce, indexes all your repos regardless of where they're hosted, and provides a single search interface. The result is that one query covers your entire codebase, with advanced capabilities like structural search (syntax-aware pattern matching), regex, symbol search, commit and diff search, and precise code navigation across repository boundaries.

The comparison breaks down like this:

Capability	CLI (grep/rg)	GitHub Search	GitLab Search	Sourcegraph
Cross-platform search	No (local only)	GitHub only	GitLab only	All code hosts
Scale (1000+ repos)	Impractical	Yes	Varies by tier	Yes
Precise code navigation	No	Limited	No	Yes (SCIP)
Regex support	Yes	Yes	Yes	Yes
Structural search	No	No	No	Yes
Commit/diff search	Via git log	Yes	Yes	Yes (dedicated)
Web UI	No	Yes	Yes	Yes
Search contexts	No	No	No	Yes

Key features of an enterprise multi-repo search tool

When evaluating multi-repo search tools for your organization, these are the capabilities that separate a tool that works at enterprise scale from one that doesn't. Note that not every search tool implements all of these, so it's important to understand which details matter for your use case.

Universal code host support. The tool must connect to every code host your organization uses. Sourcegraph supports GitHub, GitLab, Bitbucket, and Perforce. This matters because enterprise codebases rarely live on a single platform, especially after mergers, acquisitions, or groups that chose different tools for different projects.

Structural search. Regular expressions match text patterns, but they don't account for code structure. Structural search (sometimes called syntax-aware search) lets you write queries that match code patterns rather than raw strings. For example, you can filter for a specific function call pattern and correctly ignore unrelated matches in comments or log statements. In Sourcegraph, structural search matches patterns regardless of whitespace, formatting, or variable naming, removing the false positives that happen with regex-only tools. For deeper semantic understanding of symbols (like distinguishing a function definition from a call), SCIP-backed code navigation provides that precision.
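The gap between text matching and syntax-aware matching can be illustrated in a few lines. The sketch below is not Sourcegraph's structural search, which has its own pattern syntax; it swaps in Python's standard ast module as a stand-in to show the general idea. The sample source and the name deprecated_function are hypothetical.

```python
import ast
import re

SOURCE = '''
# deprecated_function() was replaced in v2
message = "call deprecated_function() at your own risk"

def handler(payload):
    return deprecated_function(payload,
                               retries=3)
'''


def text_matches(source, name):
    """Line numbers where `name(` appears anywhere in the raw text."""
    pattern = re.compile(re.escape(name) + r"\(")
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


def call_sites(source, name):
    """Line numbers of actual calls to `name`, found by walking the syntax tree."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    ]


print(text_matches(SOURCE, "deprecated_function"))  # [2, 3, 6]
print(call_sites(SOURCE, "deprecated_function"))    # [6]
```

The regex reports the comment and the string literal as hits, while the syntax-tree walk reports only the real call, even though that call spans two lines.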

Code navigation across repositories. Searching finds where code appears. Code navigation tells you what it means. Sourcegraph's SCIP-powered code navigation provides go-to-definition and find-references that work across repository boundaries. Click a function imported from another package, and Sourcegraph takes you directly to the definition in whatever repository defines it, with version awareness so you see the correct implementation for the version your code depends on. This is implemented for many major languages, including Go, TypeScript, JavaScript, Python, Java, and C/C++, with the list growing as new SCIP indexers are added. You can also navigate to references from your IDE with the Sourcegraph extension, or right-click a symbol in the web UI to jump to its definition.

Search contexts. Large organizations don't always want to search everything by default. Search contexts in Sourcegraph let teams define named groups of repositories to search within. A frontend team can create a context that includes only their UI repositories. A platform team can create one that covers infrastructure repos. This keeps results relevant without losing the ability to search globally when needed.

Batch Changes for acting on search results. Finding a problem across 200 repositories is only half the job. Fixing it everywhere is the other half. Sourcegraph's Batch Changes lets you define a code transformation, run it across multiple repos, and track every resulting pull request through review and merge. This turns "find vulnerable package version" from a search result into an automated fix across your entire codebase.

Tracking and visibility with Code Insights. After running a search, teams often need to track progress over time. How many repositories still use the deprecated API? Is the Python 3 migration actually progressing? Sourcegraph Code Insights turns search queries into time-series dashboards, giving engineering leaders visibility into codebase-wide trends without relying on manual spreadsheets or surveys. You can generate these dashboards from any search query and share them with stakeholders.

How to set up multi-repo search for your organization

Setting up multi-repo search depends on the approach you choose. Here's what each setup involves and how to configure it correctly for your instance.

Level 1: Local CLI search. If you're working with fewer than 20 repositories that fit on your laptop, start here. Create a parent directory, clone your repos into it, and run ripgrep from that directory to search all of them at once, for example rg "import requests".
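If ripgrep isn't installed, the same idea can be sketched in plain Python: walk a parent directory of clones and report matches per repository. This is a rough stand-in under stated assumptions, not a replacement for ripgrep. The function name search_repos and the one-directory-per-repo layout are hypothetical, and unlike ripgrep this sketch does not honor .gitignore.

```python
import re
from pathlib import Path


def search_repos(parent_dir, pattern):
    """Yield (repo, relative_path, line_number, line) for each regex match.

    Assumes every immediate subdirectory of parent_dir is one cloned repo.
    """
    regex = re.compile(pattern)
    repos = sorted(p for p in Path(parent_dir).iterdir() if p.is_dir())
    for repo in repos:
        for path in sorted(repo.rglob("*")):
            relative = path.relative_to(repo)
            # Skip Git internals and anything that is not a regular file.
            if ".git" in relative.parts or not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
            for number, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    yield repo.name, relative.as_posix(), number, line.strip()
```

Iterating over search_repos("repos", r"^import requests") would yield one tuple per matching line, grouped by repository in alphabetical order.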

This works for individual developers, but it breaks down when you need team-wide search, repos you haven't cloned, or anything beyond text matching. You'll also need to authenticate with each code host separately to clone private repos, and you only search whatever branch each clone has checked out, not other branches or tags.

Level 2: Native code host search. If all your code lives on a single platform, use what's built in. GitHub search supports queries like org:your-org language:python "import requests" to find all Python files importing the requests library across your organization. GitLab's group-level search does the same within a GitLab group. No setup is required, and the settings are minimal, but you're limited to what that platform provides.

Level 3: Dedicated code search with Sourcegraph. For organizations that need to search across multiple repos on different code hosts, need precise code navigation, or have more than a few hundred repositories, Sourcegraph is the purpose-built solution. Setup involves:

Deploy your Sourcegraph instance. Choose between Sourcegraph Cloud (hosted) or a self-hosted deployment on your infrastructure. Enterprise trials are available through the Get Started page.
Connect your code hosts. Add connections to GitHub, GitLab, Bitbucket, or Perforce through the admin interface. Sourcegraph uses OAuth or personal access tokens to authenticate and respects repository permissions from each code host. You'll configure the URL and access credentials for each account you want to connect.
Configure indexing. Sourcegraph automatically indexes your repositories for text search. For code-aware navigation (go-to-definition, find-references across repos), enable SCIP auto-indexing for your languages. Enterprise deployments support auto-indexing, which generates and uploads SCIP indexes automatically. Alternatively, add SCIP indexers to your CI/CD pipeline for manual control.
Set up search contexts. Create named repository groups so teams can scope their searches. For example, a frontend team context might include repo:^github\.com/org/ui- OR repo:^github\.com/org/design-system$ (the repo: filter takes a regular expression, not a glob).
Roll out to your team. Share the Sourcegraph URL with your engineering team. Open it in any browser and sign in. The web-based interface requires no local setup, no cloning, and no IDE plugins.
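Before creating a search context, it can help to check offline which repositories a set of repo: patterns would select. The sketch below assumes that repo: patterns behave like unanchored regular expressions matched against full repository names. It uses Python's re module, whose syntax agrees with that assumption for simple patterns like these. The repository names and the FRONTEND pattern list are hypothetical.

```python
import re


def repos_in_context(repo_patterns, repo_names):
    """Return the repository names matched by at least one pattern."""
    compiled = [re.compile(pattern) for pattern in repo_patterns]
    return [
        name
        for name in repo_names
        if any(regex.search(name) for regex in compiled)
    ]


# Hypothetical frontend context: every ui-* repo plus the design system.
FRONTEND = [r"^github\.com/org/ui-", r"^github\.com/org/design-system$"]

repos = [
    "github.com/org/ui-web",
    "github.com/org/ui-mobile",
    "github.com/org/design-system",
    "github.com/org/design-system-archive",
    "github.com/org/payments-api",
    "gitlab.example.com/org/ui-legacy",
]

print(repos_in_context(FRONTEND, repos))
# ['github.com/org/ui-web', 'github.com/org/ui-mobile', 'github.com/org/design-system']
```

Anchoring with ^ and $ is the design choice that matters here: without the trailing $, the archived design-system repo would be pulled into the context as well.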

Multi-repo search best practices

Once you have multi-repo search running, these practices help your team get the most out of it.

Use search contexts to reduce noise. Searching 2,000 repositories when you only care about 50 produces noisy results. Define contexts for each team, project, or service area. In Sourcegraph, contexts persist across sessions, so your team doesn't re-filter every time they run a new query.

Combine search types for precision. Start with a broad literal search to understand the landscape, then narrow with regex or structural search. For example, first search deprecated_function to see how widespread usage is, then use a structural search to find only the call sites (removing matches in comments, string literals, or variable names). Sourcegraph's search syntax supports patterntype:structural for syntax-aware matching.

Build search-driven dashboards. Don't just search once and move on. For ongoing topics like migration progress, security vulnerability remediation, or coding standard adoption, convert your searches into Code Insights dashboards. These track your query results over time, so you can show stakeholders that the number of repositories using vulnerable Log4j versions dropped from 47 to 3 over the past quarter.

Integrate search into your incident response. When a production incident points to a specific error message or log pattern, multi-repo search should be the first tool you reach for. Create bookmarked searches or saved queries for common incident patterns. For example, if you see an error in your request logs, a quick multi-repo search can help you find which commit introduced it. Sourcegraph's Code Monitors can even alert you automatically when new code matching a pattern (like a known vulnerability signature) appears in any repository.

Use Batch Changes to act on what you find. Searching is only valuable if you can act on the results. When a search reveals a problem across dozens of repositories, define a Batch Change to fix it everywhere. This closes the loop from "found it" to "fixed it" without manual pull requests in each repo.

Share search URLs with your team. Sourcegraph search queries are shareable URLs. When you find something relevant during an investigation, copy the search URL and paste it into Slack or your incident channel. Your teammates can click, open the same results instantly, and see live updates as code changes.
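A shareable link is just the query encoded into a URL, so it can also be generated from scripts or incident tooling. The sketch below assumes the instance serves its web search at /search and reads the query from a q parameter and the pattern type from patternType, modeled on URLs seen on the public sourcegraph.com instance. The host name and the helper search_url are hypothetical, so check a URL copied from your own instance before relying on this shape.

```python
from urllib.parse import urlencode


def search_url(base_url, query, pattern_type="regexp"):
    """Build a link that opens `query` in the web search UI.

    The /search path and the q / patternType parameter names are assumptions.
    """
    params = urlencode({"q": query, "patternType": pattern_type})
    return f"{base_url.rstrip('/')}/search?{params}"


link = search_url(
    "https://sourcegraph.example.com",  # hypothetical instance URL
    r'file:package.json "vulnerable-lib":\s*"[~^]?1\.2\.3"',
)
print(link)
```

urlencode percent-encodes the quotes, backslashes, and brackets in the query, so the link survives being pasted into chat tools unchanged.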

Conclusion

As codebases grow across more repositories and more code hosts, the ability to search across multiple repos from a single interface becomes a foundational capability for engineering teams. Multi-repo search is what makes security remediation take minutes instead of days, what lets new engineers understand a system without interrupting five people, and what turns code reuse from an aspiration into a daily practice.

CLI tools work for small-scale local searches. Native code host search works if everything lives on one platform. For organizations with hundreds or thousands of repositories across GitHub, GitLab, Bitbucket, and Perforce, Sourcegraph provides universal code search with precise cross-repository navigation and the ability to act on search results with Batch Changes.

Get started with Sourcegraph to search across all your repositories from a single interface, or explore how Code Search works to see what's possible when every repo is searchable.