Batch Changes design

Why is the Batch Changes feature designed the way it is?

Principles

Declarative API (not imperative). You declare your intent, such as "lint files in all repositories with a package.json file"
Define a batch change in a file (not some online API). The source of truth of a batch change's definition is a file that can be stored in version control, reviewed in code review, and re-applied by CI. This is in the same spirit as IaaC (infrastructure as code; e.g., storing your Terraform/Kubernetes/etc. files in Git). We prefer this approach over a (worse) alternative where you define a batch change in a UI with a bunch of text fields, checkboxes, buttons, etc., and need to write a custom API client to import/export the batch change definition.
Shareable and portable. You can share your batch specs, and it's easy for other people to use them. A batch spec expresses an intent that's high-level enough to (usually) not be specific to your own particular repositories. You declare and inject configuration and secrets to customize it instead of hard-coding those values.
Large-scale. You can run batch changes across 10,000s of repositories. It might take a while to compute and push everything, and the current implementation might cap out lower than that, but the fundamental design scales well.
Accommodates a variety of code hosts and review/merge processes. Specifically, we don't want Batch Changes to only work for GitHub pull requests. (See current support list.)

Comparison to other distributed systems

Kubernetes is a distributed system with an API that many people are familiar with. Batch Changes is also a distributed system. All APIs for distributed systems need to handle a similar set of concerns around robustness, consistency, etc. Here's a comparison showing how these concerns are handled for a Kubernetes Deployment and a batch change. In some cases, we've found Kubernetes to be a good source of inspiration for the Batch Changes API, but resembling Kubernetes is not an explicit goal.

	Kubernetes Deployment	Sourcegraph Batch Changes
What underlying thing does this API manage?	Pods running on many (possibly unreliable) nodes	Branches and changesets on many repositories that can be rate-limited and externally modified (and our authorization can change)
Spec YAML
How desired state is computed	Evaluate `replicas`, etc. to determine pod count and other template inputs Instantiate `template` once for each pod to produce PodSpecs	Evaluate `on`, `steps` to determine list of patches Instantiate `changesetTemplate` once for each patch to produce ChangesetSpecs
Desired state consists of...	DeploymentSpec file (the YAML above) List of PodSpecs (template instantiations)	batch spec file (the YAML above) List of ChangesetSpecs (template instantiations)
Where is the desired state computed?	The deployment controller (part of the Kubernetes cluster) consults the DeploymentSpec and continuously computes the desired state.	The Sourcegraph CLI (running on your local machine, not on the Sourcegraph server) consults the batch spec and computes the desired state when you invoke `src batch apply`. Difference vs. Kubernetes: A batch change's desired state is computed locally, not on the server. It requires executing arbitrary commands, which is not yet supported by the Sourcegraph server.
Reconciling desired state vs. actual state	The "deployment controller" reconciles the resulting PodSpecs against the current actual PodSpecs (and does smart things like rolling deploy).	The "Batch Changes controller" (i.e., our backend) reconciles the resulting ChangesetSpecs against the current actual changesets (and does smart things like gradual roll-out/publishing and auto-merging when checks pass).

These docs explain more about Kubernetes' design: