Cody FAQs
Find answers to the most common questions about Cody.
General
Does Cody train on my code?
For Enterprise customers, Sourcegraph will not train on your company’s data. For Free and Pro tier users, Sourcegraph will not train on your data without your permission.
Our third-party Language Model (LLM) providers do not train on your specific codebase. Cody operates by following a specific process to generate answers to your queries:
- User query: A user asks a question
- Code retrieval: Sourcegraph, our underlying code intelligence platform, performs a search and code intelligence operation to retrieve code snippets relevant to the user's question. During this process, strict permissions are enforced to ensure that only code that the user has read permission for is retrieved
- Prompt to Language Model: Sourcegraph sends a prompt, together with the retrieved code snippets, to a Language Model (LLM). The snippets give the LLM the context it needs to generate a meaningful response
- Response to user: The response generated by the LLM is then sent back to Cody and presented to the user
This process ensures that Cody can provide helpful answers to your questions while respecting data privacy and security by not training on or retaining your specific code.
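The four steps above can be pictured with a small, purely illustrative sketch. It is not Sourcegraph's implementation: the `Snippet` type, the function names, the keyword-matching retrieval, and the `llm` callable are all hypothetical stand-ins chosen to show where the permission check sits relative to prompt construction.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    repo: str
    path: str
    text: str


def retrieve_snippets(query, index, readable_repos):
    """Return snippets mentioning a query term, limited to repos the user can read."""
    terms = query.lower().split()
    return [
        s for s in index
        if s.repo in readable_repos              # permission check happens before retrieval
        and any(t in s.text.lower() for t in terms)
    ]


def build_prompt(query, snippets):
    """Combine the retrieved snippets and the user's question into one prompt string."""
    context = "\n\n".join(f"# {s.repo}/{s.path}\n{s.text}" for s in snippets)
    return f"Context:\n{context}\n\nQuestion: {query}"


def answer(query, index, readable_repos, llm):
    """Run the flow: retrieve, build the prompt, call the LLM, return its response."""
    snippets = retrieve_snippets(query, index, readable_repos)
    prompt = build_prompt(query, snippets)
    return llm(prompt)  # `llm` is any callable taking a prompt string (hypothetical)
```

In this sketch, a snippet from a repository the user cannot read never reaches the prompt, which mirrors the permission enforcement described in the code retrieval step.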
Does Cody work with self-hosted Sourcegraph?
Yes, Cody is compatible with self-hosted Sourcegraph instances. However, there are a few considerations:
- Cody operates by sending code snippets (up to 28 KB per request) to a third-party cloud service. By default, this service is Anthropic but can also be OpenAI
- For certain repositories, Cody may utilize embeddings, which involves sending repository data to another third-party service like OpenAI
- To use Cody effectively, your self-hosted Sourcegraph instance must have internet access for these interactions with external services
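To make the per-request size limit concrete, here is a minimal sketch of packing ranked snippets under a byte budget. It only illustrates the idea and does not describe how Cody actually selects or truncates snippets. The function name `pack_snippets`, the stop-at-first-overflow policy, and the reading of 28 KB as 28 × 1024 bytes are assumptions.

```python
MAX_CONTEXT_BYTES = 28 * 1024  # assumption: "28 KB" interpreted as 28 * 1024 bytes


def pack_snippets(snippets, budget=MAX_CONTEXT_BYTES):
    """Keep snippets in ranked order until the next one would exceed the byte budget."""
    packed, used = [], 0
    for text in snippets:
        size = len(text.encode("utf-8"))  # measure bytes, not characters
        if used + size > budget:
            break
        packed.append(text)
        used += size
    return packed
```

Measuring the UTF-8 encoded size rather than the character count matters when snippets contain non-ASCII text, because one character can occupy several bytes.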
Is Cody licensed for private code, and does it allow GPL-licensed code?
Cody PLG (VS Code, JetBrains, Neovim) performs no checks or exclusions for private or GPL-licensed code. Responses are subject to whatever data the underlying LLMs were trained on. However, Cody can be used with StarCoder for autocomplete, which is trained only on permissively licensed code.
Is there a public-facing Cody API?
Currently, there is no public-facing Cody API available.
Does Cody require Sourcegraph to function?
Yes, Cody relies on Sourcegraph for two essential functions:
- Sourcegraph retrieves the context relevant to user queries
- Sourcegraph acts as a proxy for the LLM provider to facilitate the interaction between Cody and the LLM
What programming languages does Cody support?
Cody supports a wide range of programming languages, including JavaScript, TypeScript, PHP, Python, Java, C/C++, C#, Ruby, Go, SQL, Swift, Objective-C, Perl, Rust, Kotlin, Scala, Groovy, R, MATLAB, Dart, Lua, Julia, COBOL, and shell scripting languages (like Bash, PowerShell).
Cody's response quality on a programming language depends on many factors, including the underlying LLM being used. We monitor accuracy metrics across all languages and regularly make improvements. Let us know if you're seeing poor quality on a particular programming language.
What happened to the Cody App?
We’ve deprecated the Cody App to streamline the experience for our Cody Free and Cody Pro users. Now, anyone with a Sourcegraph.com account can generate local embeddings for their personal projects within the VS Code extension without downloading and connecting the Cody App. Local embeddings are only supported for VS Code, but we’re working on adding the same functionality to JetBrains IDEs.
Embeddings
What are embeddings for?
Embeddings help Sourcegraph retrieve relevant code to feed the Large Language Model as context. Embeddings, often associated with vector search, complement other strategies in the code retrieval process.
While embeddings excel in semantic matching — determining "what is this code about" and "what does it do" — they may not capture syntax and other specific matching details as effectively. Sourcegraph's approach involves getting the best results from various sources to deliver the most accurate and comprehensive answers possible.
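As a rough illustration of the semantic matching described above, the sketch below ranks snippets by cosine similarity between embedding vectors. The three-dimensional vectors and file names are toy values invented for the example; real embeddings come from a model and have many more dimensions, and this is not a description of Sourcegraph's retrieval code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, snippet_vecs):
    """Return snippet names ordered from most to least similar to the query."""
    return sorted(
        snippet_vecs,
        key=lambda name: cosine_similarity(query_vec, snippet_vecs[name]),
        reverse=True,
    )


# Toy data: hypothetical embeddings for a query and two files.
query = [1.0, 0.0, 1.0]
snippets = {
    "render.py": [0.0, 1.0, 0.1],
    "auth.py": [0.9, 0.1, 0.8],
}
ranked = rank_by_similarity(query, snippets)  # "auth.py" ranks first
```

Because the score depends only on vector direction, two snippets can rank as similar without sharing any exact tokens, which is why embeddings complement rather than replace keyword and syntax-aware search.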
Why were embeddings removed once my instance was upgraded to v5.3?
Cody now leverages Sourcegraph Search as a primary context provider, which comes with the following benefits:
- More secure: No code being sent to a third-party embedding API
- Easier to manage: Less tech debt for embeddings setup and need for refreshes
- More repos: Sourcegraph Search scales to larger repos and to a greater number of repos. Users on Enterprise instances will now be able to select multiple repos as context sources from within the IDE
- Equal, or better, quality: Sourcegraph Search provides high-quality retrieval, as tested over the last ten years. When a customer sees degradation, we will be ready to respond quickly.
Embeddings are just one mechanism for retrieval. We leverage multiple retrieval mechanisms to give Cody the right context and will be constantly iterating to improve Cody's quality. The most important aspect is getting the files from the codebase, not the specific algorithm used to find those files.
Why are embeddings no longer supported on Cody Enterprise?
There are two driving factors:
- The need for a retrieval system that can scale across repos and to repos of greater size
- A system that is secure and requires low maintenance on the part of users
Leveraging Sourcegraph Search allowed us to deliver these enhancements. Early evidence suggests that this context fetching works as well as embeddings and is sometimes better.
Why are embeddings only supported for Cody Free and Cody Pro users in VS Code and not for JetBrains?
Only users on VS Code will have continued access to local embeddings, and only as a backup source to our local search index. The rationale is that we want a place to keep testing embeddings, so we can empirically measure their value and see whether there are areas where we should consider bringing them back. This aligns with our strategy of using PLG as a test bed for features before they come to enterprise customers.
Third-party dependencies
What is the default sourcegraph provider for completions and embeddings?
The default provider for completions and embeddings, specified as "provider": "sourcegraph", refers to the Sourcegraph Cody Gateway. The Cody Gateway facilitates access to completions and embeddings for Sourcegraph enterprise instances by leveraging third-party services such as Anthropic and OpenAI.
What third-party cloud services does Cody depend on?
Cody relies on one primary third-party dependency: Anthropic's Claude API. Cody can alternatively be configured to use the OpenAI API.
Additionally, Cody can optionally use OpenAI for generating embeddings, enhancing the quality of its context snippets, although this is not mandatory.
It's worth noting that these dependencies remain consistent when utilizing the default sourcegraph provider, Cody Gateway, which uses the same third-party providers.
What is the retention policy for Anthropic and OpenAI?
Please refer to these terms and conditions for details regarding the retention policy for data managed by Anthropic and OpenAI.
Can I use my own API keys?
Yes! You can use your own API keys (Enterprise users only).
Can I use Cody with my Cloud IDE?
Yes, Cody supports the following cloud development environments:
- vscode.dev and GitHub Codespaces (install from the VS Code extension marketplace)
- Any editor supporting the Open VSX Registry, including Gitpod, Coder, and code-server (install from the Open VSX Registry)