Back up or migrate Sourcegraph data to a new instance

In some circumstances it may be necessary or advantageous to migrate from one Sourcegraph instance or deployment to another. This page describes how to execute such a migration.

Specific guides

Data stores

Sourcegraph's state is stored in several locations. Much of this data can be regenerated, but some cannot and must be backed up.

Configuration JSON

Most parts of Sourcegraph's configuration are managed in the web app via text files. These files are typically stored in the Postgres database (described below), but are translated into text for editing in the web UI.

These files are the most essential pieces of information required for a migration to work.

| Data | Can it be recreated without a backup? | Notes |
| --- | --- | --- |
| Site configuration | No | This file contains key configuration that defines how the product works. |
| Code host connection configuration(s) | No | Each connection to an external code host has its own short configuration file. |
| Global settings | No | Default settings can be set by administrators for all users by editing this file. |

Backing up this data is as simple as copy-pasting the text from the files described above on the old Sourcegraph instance into the new one.
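Once the text has been copied out of the old instance's web UI, keeping it in files makes the backup easier to verify. The following is a minimal sketch, not an official Sourcegraph tool: the function name `save_config_backup`, the file names, and the directory layout are all hypothetical. It stores each text verbatim (without parsing it, since the configuration may contain comments that a strict JSON parser would reject) and records a SHA-256 checksum per file.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def save_config_backup(configs: dict[str, str], backup_root: Path) -> Path:
    """Write each configuration text to a new timestamped directory.

    `configs` maps a hypothetical file name (e.g. "site-config.json") to the
    text copied from the old instance's web UI.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_root / f"sourcegraph-config-{stamp}"
    target.mkdir(parents=True, exist_ok=False)

    checksum_lines = []
    for name, text in sorted(configs.items()):
        data = text.encode("utf-8")
        # Store the text exactly as copied; no parsing or reformatting.
        (target / name).write_bytes(data)
        checksum_lines.append(f"{hashlib.sha256(data).hexdigest()}  {name}")

    # One line per file, in the layout that `sha256sum` produces.
    (target / "SHA256SUMS").write_text(
        "\n".join(checksum_lines) + "\n", encoding="utf-8"
    )
    return target
```

Calling `save_config_backup({"site-config.json": site_text}, Path("backups"))` returns the directory that was created; the stored text can then be pasted into the new instance's UI.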

Internal database (PostgreSQL)

Sourcegraph's internal database stores most of Sourcegraph's state. While most of this data can be recreated after a migration, some of it cannot.

This list is not guaranteed to be complete, but rather representative of the types of data stored here.

| Data | Can it be recreated without a backup? | Notes |
| --- | --- | --- |
| Repository metadata (e.g. clone URLs, whether it is a fork or archive, etc.) | Yes | |
| User accounts | Yes (if using SSO authentication), No (if using builtin authentication) | |
| Repository permissions | Yes | |
| Organizations | No | |
| User and org settings | No | Global settings can be backed up as described above, but user-level and org-level settings cannot. |
| Saved searches | No | |
| User-generated access tokens | No | |
| Batch Changes | No | |
| Code graph metadata | Yes (if manually regenerated) | This can be regenerated by re-running the indexing and upload process for affected repositories and revisions, but will not be regenerated by default. |
| User survey responses | No | |
| Usage statistics and event logs | No | Event logs allow admins to track and audit usage, but are not necessary for Sourcegraph to work. |

Data stored on disk

Git data, search indexes, precise code-intel data, Prometheus metrics, and some other large data sources are stored on disk.

This list is not guaranteed to be complete, but rather representative of the types of data stored here.

| Data | Can it be recreated without a backup? | Notes |
| --- | --- | --- |
| Repository (git) data | Yes | |
| Search indexes | Yes | |
| Code graph data | No | This can be regenerated by re-running the indexing and upload process for affected repositories and revisions, but will not be regenerated by default. |
| Prometheus metrics | No | |
| blobstore | Yes | This is where unprocessed uploads are stored. |

Ephemeral data (Redis)

Short-lived data, including session data and some usage statistics, is stored in Redis. All of it can be recreated without backups.

External data

Certain categories of data can be stored outside the Sourcegraph deployment. For example, configuration JSON files can be loaded from disk, and Sourcegraph can connect to external services (PostgreSQL, Redis, S3/GCS) instead of using PostgreSQL, Redis, and blobstore internally.

In these cases, no migration should be necessary; simply reuse the existing external data sources on the new Sourcegraph instance.

Migration and backup options

Option 1: Configuration only

The easiest option is to back up or migrate only the configuration JSON data. Copy the configuration files listed above from the old instance, then paste them into the new Sourcegraph instance's UI after startup.

Option 2: All Postgres data

This option provides a more complete backup, and ensures that almost all state will be restored. Repositories will have to be re-cloned and re-indexed, so some downtime will be required while these operations complete.

Follow the instructions in our Docker to Docker Compose migration guide to generate a dump of Sourcegraph's Postgres database. Contact us for specific recommendations for your deployment type.
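For orientation only, a database dump of this kind can be sketched with PostgreSQL's standard `pg_dump` tool. This is not the procedure from the migration guide, and the host, user, and database names used in the usage note (`localhost`, `sg`, `sg`) are assumptions that will differ between deployments. The helper names `build_pg_dump_command` and `run_pg_dump` are hypothetical. The password is expected to come from the usual PostgreSQL mechanisms such as the `PGPASSWORD` environment variable or a `~/.pgpass` file.

```python
import subprocess


def build_pg_dump_command(
    host: str, user: str, dbname: str, outfile: str, port: int = 5432
) -> list[str]:
    """Return the argument list for a custom-format pg_dump of one database."""
    return [
        "pg_dump",
        f"--host={host}",
        f"--port={port}",
        f"--username={user}",
        f"--dbname={dbname}",
        "--format=custom",  # archive format that pg_restore can read
        f"--file={outfile}",
    ]


def run_pg_dump(command: list[str]) -> None:
    """Run the dump; raises CalledProcessError if pg_dump exits non-zero."""
    subprocess.run(command, check=True)
```

For example, `run_pg_dump(build_pg_dump_command("localhost", "sg", "sg", "sourcegraph.dump"))` would write a dump file that `pg_restore` can load into the new instance's database. Building the command separately from running it makes it easy to log or review before execution.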

Option 3: All data

Backing up all persistent volumes is the most complete option. Instructions for doing this depend on the deployment method and the cloud host. Contact us to discuss this option further.
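As a rough illustration for the simplest case, the sketch below archives one volume directory into a compressed tarball. It assumes the volume is a plain directory on the host and that the service writing to it has been stopped, since copying a live database directory can produce an inconsistent backup. Kubernetes and cloud deployments would more likely rely on the provider's volume snapshot mechanism. The function name `archive_volume` and the paths in the usage note are hypothetical.

```python
import tarfile
from pathlib import Path


def archive_volume(volume_dir: Path, archive_path: Path) -> int:
    """Pack `volume_dir` into a gzip-compressed tar archive.

    Returns the number of entries (files and directories) in the archive.
    """
    if not volume_dir.is_dir():
        raise NotADirectoryError(str(volume_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the volume's own directory name.
        tar.add(volume_dir, arcname=volume_dir.name)
    # Re-open the archive to confirm it is readable and count its entries.
    with tarfile.open(archive_path, "r:gz") as tar:
        return len(tar.getnames())
```

A call such as `archive_volume(Path("/var/data/gitserver"), Path("/backups/gitserver.tar.gz"))` would be repeated once per volume.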

Persistent data backup in Kubernetes

Use the table below for reference when migrating your data from a Kubernetes cluster:

| Name | Re-creatable | Notes |
| --- | --- | --- |
| codeinsights-db | Yes | |
| codeintel-db | Yes | While the data is re-creatable, we suggest including the disk in your migration, as it often contains a lot of data that would take a while to regenerate. |
| indexed-search | Yes | |
| gitserver | Yes | |
| grafana | Yes | |
| blobstore | Yes | |
| pgsql | No | This is the main Sourcegraph database, where most of the data is stored. |
| prometheus | Yes | |
| redis-cache | Yes | |
| redis-store | Yes | |
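The table above can be turned into a small checklist when planning a migration. The sketch below simply encodes the table's values; the names `VOLUMES`, `RECOMMENDED`, and `plan_backup` are hypothetical, and the "recommended" group reflects only the note on codeintel-db.

```python
# Re-creatable status of each persistent volume, copied from the table above.
VOLUMES = {
    "codeinsights-db": True,
    "codeintel-db": True,
    "indexed-search": True,
    "gitserver": True,
    "grafana": True,
    "blobstore": True,
    "pgsql": False,
    "prometheus": True,
    "redis-cache": True,
    "redis-store": True,
}

# Re-creatable, but the table suggests migrating it anyway.
RECOMMENDED = {"codeintel-db"}


def plan_backup(volumes: dict[str, bool]) -> dict[str, list[str]]:
    """Group volume names by how important it is to carry them over."""
    required = sorted(name for name, ok in volumes.items() if not ok)
    recommended = sorted(
        name for name, ok in volumes.items() if ok and name in RECOMMENDED
    )
    optional = sorted(
        name for name, ok in volumes.items() if ok and name not in RECOMMENDED
    )
    return {"required": required, "recommended": recommended, "optional": optional}
```

With these values, `plan_backup(VOLUMES)["required"]` evaluates to `["pgsql"]`, which matches the table: only the main database cannot be recreated.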