Model Configuration

Learn how to configure Cody via modelConfiguration on a Sourcegraph Enterprise instance.

modelConfiguration is the recommended way to configure chat and autocomplete models in Sourcegraph v5.6.0 and later.

The modelConfiguration field in the Site config section allows you to configure Cody to use different LLM models for chat and autocomplete, enabling greater flexibility in selecting the best model for your needs.

Using this configuration, you get an LLM selector in the Cody chat with an Enterprise instance that allows you to select the model you want to use.

The LLM models available for a Sourcegraph Enterprise instance offer a combination of Sourcegraph-provided models and any custom models providers that you explicitly add to your Sourcegraph instance site configuration.

`modelConfiguration`

The model configuration for Cody is managed through the "modelConfiguration" field in the Site config section. It includes the following fields:

Field	Description
`sourcegraph`	Configures access to Sourcegraph-provided models available through Cody Gateway.
`providerOverrides`	Configures access to models through your keys with most common LLM providers (BYOK) or hosted behind any openaicompatible endpoint
`modelOverrides`	Extends or modifies the list of models Cody recognizes and their configurations.
`selfHostedModels`	Adds models to Cody’s recognized models list with default configurations provided by Sourcegraph. Only available for certain models; general models can be configured in `modelOverrides`.
`defaultModels`	Specifies the models assigned to each Cody feature (chat, fast chat, autocomplete).

Getting started with `modelConfiguration`

The recommended and easiest way to set up model configuration is using Sourcegraph-provided models through the Cody Gateway.

For a minimal configuration example, see Configure Sourcegraph-supplied models.

Sourcegraph-provided models

Sourcegraph-provided models, accessible through the Sourcegraph Model Provider (Cody Gateway), are managed via your site configuration.

For most administrators, relying on these models alone ensures access to high-quality models without needing to manage specific configurations. See Supported LLMs for the full list of available models.

The use of these models is controlled through the "modelConfiguration.sourcegraph" field in the site config.

Sourcegraph-provided models through Cody Gateway have context window limitations. To access models with extended context windows (such as Claude Sonnet 4's 1M token context), configure BYOK (Bring Your Own Key) models using providerOverrides and suitable contextWindow configuration. See the examples page for BYOK configuration details.

Configure Sourcegraph-provided models

The minimal configuration for Sourcegraph-provided models is:

JSON
"cody.enabled": true,
"modelConfiguration": {
  "sourcegraph": {}
}

The above configuration sets up the following:

Sourcegraph-provided models are enabled (sourcegraph is not set to null)
Requests to LLM providers are routed through the Cody Gateway (no providerOverrides field specified)
Sourcegraph-defined default models are used for Cody features (no defaultModels field specified)

There are three main settings for configuring Sourcegraph-provided LLM models:

Field	Description
`endpoint`	(Optional) The URL for connecting to Cody Gateway. The default is set to the production instance.
`accessToken`	(Optional) The access token for connecting to Cody Gateway defaults to the current license key.
`modelFilters`	(Optional) Filters specifying which models to include from Cody Gateway.

Disable Sourcegraph-provided models

To disable all Sourcegraph-provided models and use only the models explicitly defined in your site configuration, set the "sourcegraph" field to null as shown in the example below.

JSON
"cody.enabled": true,
"modelConfiguration": {
  "sourcegraph": null, // ignore Sourcegraph-provided models
  "providerOverrides": {
    // define access to the LLM providers
  },
  "modelOverrides": {
    // define models available via providers defined in the providerOverrides
  },
  "defaultModels": {
    // set default models per Cody feature from the list of models defined in modelOverrides
  }
}

Default models

The "modelConfiguration" setting includes a "defaultModels" field, which allows you to specify the LLM model used for each Cody feature ("chat", "fastChat", and "codeCompletion"). The values for each feature should be modelRefs of either Sourcegraph-provided models or models configured in the modelOverrides section.

If no default is specified or the specified model is not found, the configuration will silently fall back to a suitable alternative.

Model Filters

The "modelFilters" section allows you to control which Cody Gateway models are available to users of your Sourcegraph Enterprise instance. The following table describes each field:

Field	Description
`statusFilter`	Filters models based on their release status, such as stable, beta, deprecated or experimental. By default, all models available on Cody Gateway are accessible.
`allow`	An array of `modelRef`s specifying which models to include. Supports wildcards.
`deny`	An array of `modelRef`s specifying which models to exclude. Supports wildcards.

The following examples demonstrate how to use each of these settings together:

JSON
"cody.enabled": true,
"modelConfiguration": {
  "sourcegraph": {
    "modelFilters": {
      // Only allow "beta" and "stable" models.
      // Not "experimental" or "deprecated".
      "statusFilter": ["beta", "stable"],
 
      // Allow any models provided by Anthropic, OpenAI, Google and Fireworks.
      "allow": [
        "anthropic::*", // Anthropic models
        "openai::*", // OpenAI models
        "google::*", // Google Gemini models
        "fireworks::*", // Open-source models hosted by Sourcegraph on Fireworks.ai (typically used for autocomplete and Mistral Chat models)
      ],
 
      // Example: Do not include any models with the Model ID containing "turbo",
      // or any models from a hypothetical provider "AcmeCo"
      "deny": [
        "*turbo*",
        "acmeco::*"
      ]
    }
  }
}

Provider Overrides

A provider is an organizational concept for grouping LLM models. Typically, a provider refers to the company that produced the model or the specific API/service used to access it, serving as a namespace.

By defining a provider override in your Sourcegraph site configuration, you can introduce a new namespace to organize models or customize the existing provider namespace supplied by Sourcegraph (e.g., for all "anthropic" models).

Provider overrides are configured via the "modelConfiguration.providerOverrides" field in the site configuration. This field is an array of items, each containing the following fields:

Field	Description
`id`	The namespace for models accessed via the provider.
`displayName`	A human-readable name for the provider.
`serverSideConfig`	Defines how to access the provider. See the section below for additional details.

Example configuration:

JSON
"cody.enabled": true,
"modelConfiguration": {
  "sourcegraph": {},
  "providerOverrides": [
      {
        "id": "openai",
        "displayName": "OpenAI (via BYOK)",
        "serverSideConfig": {
          "type": "openai",
          "accessToken": "token",
          "endpoint": "https://api.openai.com"
        }
      },
  ],
  "defaultModels": {
    "chat": "google::v1::gemini-1.5-pro",
    "fastChat": "anthropic::2023-06-01::claude-3-haiku",
    "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
  }
}

In the example above:

Sourcegraph-provided models are enabled (sourcegraph is not set to null)
The "openai" provider configuration is overridden to be accessed using your own key, with the provider API accessed directly via the specified endpoint URL. In contrast, models from the "anthropic" provider are accessed through the Cody Gateway

Refer to the examples page for additional examples.

Server-side config

The most important part of a provider's configuration is the "serverSideConfig" field, which defines how the LLM models should be invoked, i.e., which external service or API will handle the LLM requests.

Sourcegraph natively supports several types of LLM API providers. The current set of supported providers includes:

Provider type	Description
`"sourcegraph"`	Sourcegraph Model Provider (Cody Gateway), offering access to various models from multiple services
`"openaicompatible"`	Any OpenAI-compatible API implementation
`"awsBedrock"`	Amazon Bedrock
`"azureOpenAI"`	Microsoft Azure OpenAI
`"anthropic"`	Anthropic
`"fireworks"`	Fireworks AI
`"google"`	Google Gemini and Vertex
`"openai"`	OpenAI
`"huggingface-tgi"`	Hugging Face Text Generation Interface

The configuration for serverSideConfig varies by provider type.

Refer to the examples page for examples.

Model Overrides

With a provider defined (either a Sourcegraph-provided provider or a custom provider configured via the providerOverrides field), custom models can be specified for that provider by adding them to the "modelConfiguration.modelOverrides" section.

This field is an array of items, each with the following fields:

modelRef: Uniquely identifies the model within the provider namespace
- A string in the format ${providerId}::${apiVersionId}::${modelId}
- To associate a model with your provider, ${providerId} must match the provider’s ID
- ${modelId} can be any URL-safe string
- ${apiVersionId} specifies the API version, which helps detect compatibility issues between models and Sourcegraph instances. For example, "2023-06-01" can indicate that the model uses that version of the Anthropic API. If unsure, you may set this to "unknown" when defining custom models
displayName: An optional, user-friendly name for the model. If not set, clients should display the ModelID part of the modelRef instead (not the modelName)
modelName: A unique identifier the API provider uses to specify which model is being invoked. This is the identifier that the LLM provider recognizes to determine the model you are calling
capabilities: A list of capabilities that the model supports. Supported values: autocomplete, chat, vision, reasoning, edit, tools.
category: Specifies the model's category with the following options:
- "balanced": Typically the best default choice for most users. This category is suited for models like Sonnet 3.5 (as of October 2024)
- "speed": Ideal for low-parameter models that may not suit general-purpose chat but are beneficial for specialized tasks, such as query rewriting
- "accuracy": Reserved for models, like OpenAI o1, that use advanced reasoning techniques to improve response accuracy, though with slower latency
- "other": Used for older models without distinct advantages in reasoning or speed. Select this category if you are uncertain about which category to choose
- "deprecated": For models that are no longer supported by the provider and are filtered out on the client side (not available for use)
contextWindow: An object that defines the number of tokens (units of text) that can be sent to the LLM. This setting influences response time and request cost and may vary according to the limits set by each LLM model or provider. It includes two fields:
- maxInputTokens: Specifies the maximum number of tokens for the contextual data in the prompt (e.g., question, relevant snippets)
- maxOutputTokens: Specifies the maximum number of tokens allowed in the response
reasoningEffort: Specifies the effort on reasoning for reasoning models (having reasoning capability). Supported values: high, medium, low. How this value is treated depends on the specific provider. For example, for Anthropic models supporting thinking, low effort means that the minimum thinking.budget_tokens value (1024) will be used. For other reasoningEffort values, the contextWindow.maxOutputTokens / 2 value will be used. For OpenAI reasoning models, the reasoningEffort field value corresponds to the reasoning_effort request body value.
serverSideConfig: Additional configuration for the model. It can be one of the following:
- awsBedrockProvisionedThroughput: Specifies provisioned throughput settings for AWS Bedrock models with the following fields:
  - type: Must be "awsBedrockProvisionedThroughput"
  - arn: The ARN (Amazon Resource Name) for provisioned throughput to use when sending requests to AWS Bedrock. The ARN format for provisioned models is: arn:${Partition}:bedrock:${Region}:${Account}:provisioned-model/${ResourceId}.
- openaicompatible: Configuration specific to models provided by an OpenAI-compatible provider, with the following fields:
  - type: Must be "openaicompatible"
  - apiModel: The literal string value of the model field to be sent to the /chat/completions API. If set, Sourcegraph treats this as an opaque string and sends it directly to the API without inferring any additional information. By default, the configured model name is sent

Example configuration

JSON
"cody.enabled": true,
"modelConfiguration": {
  "sourcegraph": {},
  "providerOverrides": [
    {
      "id": "huggingface-codellama",
      "displayName": "huggingface",
      "serverSideConfig": {
        "type": "openaicompatible",
        // optional: disable the use of /completions for autocomplete requests, instead using
        // only /chat/completions. (available in Sourcegraph 6.4+ and 6.3.2692)
        //
        // "useLegacyCompletions": false,
        "endpoints": [
          {
            "url": "https://api-inference.huggingface.co/models/meta-llama/CodeLlama-7b-hf/v1/",
            "accessToken": "token"
 
            // optional: send custom headers (in which case accessToken above is not used)
            // (available in Sourcegraph 6.4+ and 6.3.2692)
            //
            // "headers": { "X-api-key": "foo", "My-Custom-Http-Header": "bar" },
          }
        ]
     }
    }
  ],
  "modelOverrides": [
    {
      "modelRef": "huggingface-codellama::v1::CodeLlama-7b-hf",
      "modelName": "CodeLlama-7b-hf",
      "displayName": "(HuggingFace) CodeLlama-7b-hf",
      "contextWindow": {
        "maxInputTokens": 8192,
        "maxOutputTokens": 4096
      },
      "serverSideConfig": {
        "type": "openaicompatible",
        "apiModel": "meta-llama/CodeLlama-7b-hf"
      },
      "clientSideConfig": {
        "openaicompatible": {}
      },
      "capabilities": ["autocomplete", "chat"],
      "category": "balanced",
      "status": "stable"
    }
  ],
  "defaultModels": {
    "chat": "google::v1::gemini-1.5-pro",
    "fastChat": "anthropic::2023-06-01::claude-3-haiku",
    "codeCompletion": "huggingface-codellama::v1::CodeLlama-7b-hf"
  }
}

In the example above:

Sourcegraph-provided models are enabled (sourcegraph is not set to null)
An additional provider, "huggingface-codellama", is configured to access Hugging Face’s OpenAI-compatible API directly
A custom model, "CodeLlama-7b-hf", is added using the "huggingface-codellama" provider
Default models are set up as follows:
- Sourcegraph-provided models are used for "chat" and "fastChat" (accessed via Cody Gateway)
- The newly configured model, "huggingface-codellama::v1::CodeLlama-7b-hf", is used for "codeCompletion" (connecting directly to Hugging Face’s OpenAI-compatible API)

Example configuration with Claude 3.7 Sonnet

JSON
{
  "modelRef": "anthropic::2024-10-22::claude-3-7-sonnet-latest",
  "displayName": "Claude 3.7 Sonnet",
  "modelName": "claude-3-7-sonnet-latest",
  "capabilities": [
    "chat",
    "reasoning"
  ],
  "category": "accuracy",
  "status": "stable",
  "tier": "pro",
  "contextWindow": {
    "maxInputTokens": 45000,
    "maxOutputTokens": 4000
  },
  "modelCost": {
    "unit": "mtok",
    "inputTokenPennies": 300,
    "outputTokenPennies": 1500
  },
  "reasoningEffort": "high"
},

In this modelOverrides config example:

The model is configured to use Claude 3.7 Sonnet with Cody Gateway
The model is configured to use the "chat" and "reasoning" capabilities
The reasoningEffort can be set to 3 different options in the Model Config. These options are high, medium and low
The default reasoningEffort is set to low
For Anthropic models supporting thinking, when the reasoning effort is low, 1024 tokens is used as the thinking budget. With medium and high the thinking budget is set to half of the maxOutputTokens value

Refer to the examples page for additional examples.

View configuration

To view the current model configuration, run the following command:

BASH
export INSTANCE_URL="https://sourcegraph.test:3443"  # Replace with your Sourcegraph instance URL
export ACCESS_TOKEN="your access token"
 
curl --location "${INSTANCE_URL}/.api/modelconfig/supported-models.json" \
--header "Authorization: token $ACCESS_TOKEN"

The response includes:

Configured providers and models—both Sourcegraph-provided (if enabled, with any applied filters) and any overrides
Default models for Cody features

Example response

JSON
"schemaVersion": "1.0",
"revision": "0.0.0+dev",
"providers": [
    {
        "id": "anthropic",
        "displayName": "Anthropic"
    },
    {
        "id": "fireworks",
        "displayName": "Fireworks"
    },
    {
        "id": "google",
        "displayName": "Google"
    },
    {
        "id": "openai",
        "displayName": "OpenAI"
    },
    {
        "id": "mistral",
        "displayName": "Mistral"
    }
],
"models": [
    {
        "modelRef": "anthropic::2024-10-22::claude-3-5-sonnet-latest",
        "displayName": "Claude 3.5 Sonnet (Latest)",
        "modelName": "claude-3-5-sonnet-latest",
        "capabilities": ["chat"],
        "category": "accuracy",
        "status": "stable",
        "contextWindow": {
            "maxInputTokens": 45000,
            "maxOutputTokens": 4000
        }
    },
    {
        "modelRef": "anthropic::2023-06-01::claude-3-haiku",
        "displayName": "Claude 3 Haiku",
        "modelName": "claude-3-haiku-20240307",
        "capabilities": ["chat"],
        "category": "speed",
        "status": "stable",
        "contextWindow": {
            "maxInputTokens": 7000,
            "maxOutputTokens": 4000
        }
    },
    {
        "modelRef": "fireworks::v1::starcoder",
        "displayName": "StarCoder",
        "modelName": "starcoder",
        "capabilities": ["autocomplete"],
        "category": "speed",
        "status": "stable",
        "contextWindow": {
            "maxInputTokens": 2048,
            "maxOutputTokens": 256
        }
    },
    {
        "modelRef": "fireworks::v1::deepseek-coder-v2-lite-base",
        "displayName": "DeepSeek V2 Lite Base",
        "modelName": "accounts/sourcegraph/models/deepseek-coder-v2-lite-base",
        "capabilities": ["autocomplete"],
        "category": "speed",
        "status": "stable",
        "contextWindow": {
            "maxInputTokens": 2048,
            "maxOutputTokens": 256
        }
    },
    {
        "modelRef": "google::v1::gemini-1.5-pro",
        "displayName": "Gemini 1.5 Pro",
        "modelName": "gemini-1.5-pro",
        "capabilities": ["chat"],
        "category": "balanced",
        "status": "stable",
        "contextWindow": {
            "maxInputTokens": 45000,
            "maxOutputTokens": 4000
        }
    },
    {
        "modelRef": "openai::2024-02-01::gpt-4o",
        "displayName": "GPT-4o",
        "modelName": "gpt-4o",
        "capabilities": ["chat"],
        "category": "accuracy",
        "status": "stable",
        "contextWindow": {
            "maxInputTokens": 45000,
            "maxOutputTokens": 4000
        }
    },
    {
        "modelRef": "openai::2024-02-01::cody-chat-preview-001",
        "displayName": "OpenAI o1-preview",
        "modelName": "cody-chat-preview-001",
        "capabilities": ["chat"],
        "category": "accuracy",
        "status": "waitlist",
        "contextWindow": {
            "maxInputTokens": 45000,
            "maxOutputTokens": 4000
        }
    },
],
"defaultModels": {
    "chat": "anthropic::2024-10-22::claude-3-5-sonnet-latest",
    "fastChat": "anthropic::2023-06-01::claude-3-haiku",
    "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
}

Reasoning models

Claude 3.7 and 4 support is available with Sourcegraph versions 6.4+ and 6.3.4167 when using Cody Gateway. This section is primarily relevant to Sourcegraph Enterprise customers using AWS Bedrock or Google Vertex.

Reasoning models can be added via modelOverrides in the site configuration by adding the reasoning capability to the capabilities list, and setting the reasoningEffort field on the model. Both must be set for the models' reasoning functionality to be used (otherwise the base model without reasoning / exteded thinking will be used.)

For example, this modelOverride would create a Claude Sonnet 4 with Thinking option in the Cody model selector menu, and when the user chats with Cody with that model selected, it would use Claude Sonnet 4's Extended Thinking support with a low reasoning effort for the users' chat:

JSON
{
    "modelRef": "bedrock::2024-10-22::claude-sonnet-4-thinking-latest",
    "displayName": "Claude Sonnet 4 with Thinking",
    "modelName": "claude-sonnet-4-20250514",
    "contextWindow": {
      "maxInputTokens": 93000,
      "maxOutputTokens": 64000,
      "maxUserInputTokens": 18000
    },
    "capabilities": [
      "chat",
      "reasoning"
    ],
    "reasoningEffort": "low",
    "category": "accuracy",
    "status": "stable"
}

Google Vertex AI Authentication Configuration

Configure Google Vertex AI authentication using any of the following methods in your providerOverrides:

JSON
"providerOverrides": [
  {
    "id": "google",
    "displayName": "Anthropic via Vertex AI",
    "serverSideConfig": {
      "type": "google",
      "endpoint": "https://us-east5-aiplatform.googleapis.com/v1/projects/your-project/locations/us-east5/publishers/anthropic/models"
 
      // Option 1: Application Default Credentials (recommended - no additional config)
 
      // Option 2: Explicit credentials file
      // "googleWorkloadIdentityConfig": {
      //   // Note: this file must be mounted in the Sourcegraph frontend container!
      //   "credentialsFile": "/var/secrets/google/credentials.json"
      // }
 
      // Option 3: Inline credentials JSON
      // "googleWorkloadIdentityConfig": {
      //   "credentialsJSON": "<YOUR BASE64-ENCODED CREDENTIALS JSON>"
      // }
 
      // Option 4: Legacy access token
      // "accessToken": "<YOUR BASE64-ENCODED SERVICE ACCOUNT JSON>"
    }
  }
]

In Sourcegraph versions prior to 6.7, only accessToken authentication was supported.

By default (with no additional configuration), Google Application Default Credentials (ADC) will be used to automatically discover credentials from:

GOOGLE_APPLICATION_CREDENTIALS environment variable
~/.config/gcloud/application_default_credentials.json
GCE/GKE metadata service
Workload Identity (when running in GKE)

If credentialsFile is specified, it must specify a filepath which exists in the Sourcegraph frontend container. See Creating service account keys for how to generate JSON credentials files.

If credentialsJSON is specified, it must be a base64-encoded string of the credentials.json file (also created via service account keys).

If accessToken is specified, it must be a base64-encoded service account JSON. This option disables ADC.

OAuth Scopes

Sourcegraph requests the https://www.googleapis.com/auth/cloud-platform OAuth scope, which is the recommended scope per Google's documentation for applications accessing Vertex AI services. See Google's Vertex AI authentication documentation and the complete OAuth 2.0 scopes reference for more details.