Improving Cody Autocomplete: Faster and Smarter
We just wrapped up our Faster Autocomplete project, and Cody is now faster and smarter for more of your coding tasks. The big change under the hood is that we switched our autocomplete model from StarCoder to DeepSeek-V2, leading to improvements across the board.
Cody is built on model interoperability and promises to always provide the best, latest-gen models for every given task to help speed up devs and reduce toil. This update delivers on that promise by bringing a cutting-edge model to autocomplete, speeding up completions while increasing quality and accuracy.
The big benefits are a reduction of up to 350ms in autocomplete latency along with a 58% increase in the number of accepted characters per user. Let’s dive into the numbers and see how these changes help Cody help you write code faster.
Speed Boost: 350ms Faster
Cody’s response time is down by as much as 350ms. Our P75 latency (the time within which 75% of completions appear) has dropped for every completion length. Single-line completions, which represent 87% of all suggestions, have gone from 900ms to 690ms, and the longest completions are a full 350ms faster.
- Single line: 900ms -> 690ms
- 2-5 lines: 1100ms -> 850ms
- 6-10 lines: 1450ms -> 1100ms
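As a quick aside, the P75 here is just the 75th percentile of completion latencies. Here's a minimal sketch of how such a number can be computed from raw samples using the nearest-rank method; this is illustrative, not our actual telemetry pipeline:

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical latency samples, in milliseconds.
const latenciesMs = [620, 700, 540, 910, 680, 730, 660, 800];
console.log(`P75 latency: ${percentile(latenciesMs, 75)}ms`); // 730ms
```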
The speed improvements come not only from switching to DeepSeek-V2 but also from two new client-side features: Hot Streak and Smart Throttle.
Implementing Hot Streak
Hot Streak is a client-side optimization technique implemented in Cody to enhance autocomplete efficiency and responsiveness. It works by caching subsequent completion lines, anticipating that a user might continue with a particular coding pattern.
Unlike traditional caching that might only store single-line completions, Hot Streak proactively caches multiple lines of potential completions. This allows for faster suggestion delivery of multi-line code snippets without the need to make additional server requests.
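To make the idea concrete, here's a minimal sketch of hot-streak-style caching. The cache shape and names are hypothetical, not Cody's actual implementation; the key point is that caching every suffix of a multi-line completion lets subsequent lines be served instantly as the user accepts them.

```typescript
// Illustrative sketch of hot-streak caching; names are hypothetical.
// When the model returns a multi-line completion, we cache not just the
// full completion but every "rest of the completion", keyed by the
// document prefix that will exist once the preceding lines are accepted.
class HotStreakCache {
  private cache = new Map<string, string>();

  get(docPrefix: string): string | undefined {
    return this.cache.get(docPrefix);
  }

  set(docPrefix: string, completion: string): void {
    this.cache.set(docPrefix, completion);

    // Hot streak: cache each suffix of the multi-line completion so that
    // accepting line N lets us serve lines N+1.. without a server request.
    const lines = completion.split('\n');
    let prefix = docPrefix;
    for (let i = 0; i < lines.length - 1; i++) {
      prefix += lines[i] + '\n';
      this.cache.set(prefix, lines.slice(i + 1).join('\n'));
    }
  }
}

// Usage: once the user accepts the first suggested line, the new document
// prefix is already a cache key, so the remaining lines appear instantly.
const cache = new HotStreakCache();
cache.set('function add(', 'a: number, b: number): number {\n  return a + b;\n}');
cache.get('function add(a: number, b: number): number {\n'); // '  return a + b;\n}'
```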
The implementation of Hot Streak contributed significantly to Cody's performance gains. Its effect wasn't quantified in isolation, but combined with Smart Throttle it led to:
- A reduction in P75 latency by approximately 110ms across all line-count groups.
- A 24% increase in the number of suggested characters.
- A 10% increase in the total number of accepted characters, indicating that Cody wrote more code for users.
- Approximately 35% increase in completions served from cache (in combination with Smart Throttle prefetching).
Implementing Smart Throttle
Smart Throttle is a client-side optimization technique implemented in Cody to improve autocomplete responsiveness. It works by intelligently reducing the debounce interval between user input and completion requests.
Unlike traditional fixed debounce timers, Smart Throttle dynamically adjusts its timing based on typing patterns and context. This allows for faster suggestion delivery without overwhelming the server with excessive requests.
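As a rough illustration, here's one way a dynamic debounce like this could be structured. The thresholds and the "mid-word vs. pause point" heuristic are hypothetical, not Cody's actual logic:

```typescript
// Illustrative sketch of a dynamic debounce; thresholds are hypothetical.
// Requests fired mid-word during a typing burst wait longer; requests at
// natural pause points (e.g. after "." or "(") go out almost immediately.
class SmartThrottle {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private lastKeystroke = 0;

  schedule(requestCompletion: () => void, endsWithWordChar: boolean): void {
    const now = Date.now();
    const inTypingBurst = now - this.lastKeystroke < 150;
    this.lastKeystroke = now;

    // Long debounce while the user is still typing a word; short one when
    // they pause or hit a syntactic boundary.
    const delayMs = endsWithWordChar && inTypingBurst ? 250 : 25;

    if (this.timer !== undefined) {
      clearTimeout(this.timer); // the newer keystroke supersedes the old request
    }
    this.timer = setTimeout(requestCompletion, delayMs);
  }
}
```

The effect is that a fixed worst-case debounce no longer taxes every keystroke: most completions start their round trip sooner, without flooding the server while the user is mid-word.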
Smart Throttle's impact also wasn't measured in isolation, but together with Hot Streak it helped cut client-side P75 latency by about 110ms across all line-count groups, increase the number of suggested characters by 24%, and boost the share of completions served from cache by roughly 35%.
Smarter Suggestions: 58% More Accepted Code
Not only is Cody faster, but it's also getting better at suggesting code you actually want to use. The DeepSeek-V2 model has significantly improved the quality of suggestions.
We’ve seen a 58% increase in the number of accepted characters per user. This means that the suggestions provided by Cody are more likely to be relevant and accepted. In practical terms, if you're writing a 1000-line program, Cody is now potentially writing 580 of those lines for you, up from about 367 before.
- Suggested characters per user: 1,600 -> 2,500 (a 56% increase)
- Accepted characters per user: 600 -> 950 (the 58% increase above)
Here’s our analytics tracking suggested characters and accepted characters: