Go at the DARPA Cyber Grand Challenge
Liveblog by Beyang Liu (@beyang)
Slides for this talk have been posted here.
Will Hawkins, graduate student at the University of Virginia. He's much more of a C/C++ programmer than a Go programmer. He takes Dave Cheney's pop quizzes about Go on Twitter very seriously.
What is the DARPA Cyber Grand Challenge
DARPA is a relative of the old ARPA government, which played a critical role in creating the Internet. The Cyber Grand Challenge pits autonomous systems against one another to find vulnerabilities in sofrware and defend against those vulnerabiltiies.
"Autonomous" means no human input at all. The competition ran for 96 hours and until the last 2 hours, no human input allowed.
We had to find vulnerabilities in other software and prevent attacks against their own software.
Here were the teams. TechX was our team.
Their goal was to finish "not 7th" (there were 7 teams in the competition). They did not finish 7th (or 6th).
Go fit into the competition in a small but very important way. Go helped us build our offense and defense.
In the Grand Challenge, the competition framework would run and churn on different competitor binaries. If I wanted to launch an attack, I have to send them a binary. And we had to receive inputs from the framework testing our functionality.
So we wanted to capture all the data over the network to:
- Derive rules for intrusion detection to block malicious traffic while permitting non-malicious regular traffic
- Use inputs from Competition Framework to drive our fuzzing system. I.e., take real inputs, run them on binaries, see where they crash. If they crash, that's a likely indicator there's a vulnerability in our code. We can use this knowledge to then feedback vulnerabilities to other people.
The capturer (written in Go) captures fields from the packages:
- Side - either client or server
- Message ID
into a datastore that can be queried by different components:
- The Fuzzer
- The Rule Generator
This ended up being a significant component of our defensive systems.
The Fuzzer used data captured from the network tap to generate inputs for our Fuzzers.
The Rule Generator Rules were deployed in concert with hardened binaries. Hard to tell which was the effective defense. However, there were two specific cases where replacement binaries were vulnerable but the IDS rules generated by the Rule Generator protected the binary from successful attack.
We were one of only two teams that earnred points on defense. Other defenses were too heavyweight, so they incurred a penalty for functional degradation deploying their defenses.
Design, Architecture and Implementation
We had to capture traffic, filter it, and store it for later use. A lot of options here. Out of these possible options, the bold indicate which ones we chose:
- Simple tcpdump
- Flat file
- SQL: MySQL
- bash pipeline:
tcpdump ‘port 1993’ | sed | awk | … > file
- Custom program
- bash pipeline:
Note that we chose to write the custom program in Go.
When the data comes off the wire, we're gonna do something capturing, some filtering, some storing:
This is a pipeline of processes, so Go seemed like a natural fit.
If we started out with something concurrent, we could easily make it parallel later
gopacket. It did exactly what I wanted it to do.
It has both a live and non-live mode (for reproducibility).
Here's what the code looks like:
gopacket again. There's two steps here:
- From a packet, get the Layer (call
- Use the Layer.
There's a lot of metadata with packates, so we want to parse that and store that in a struct. This code probably could be better, but it's doing something straightforward:
Use database/sql in combination with a Go MySQL ddriver.
Three steps for use:
- Connect to the database: Open()
- Prepare statements (optional): Prepare()
- Execute statements: Exec()
- Check/Retrieve Results:
- Check return value, or Next()
Implementation note: the Ping method. Necessary to call after Open to verify that connection actually established.
DARPA played practice games with us. They gave us actual traffic directed at binaries we were trying to exploit and defend.
tcpdump to capture the traffic from the simulation. Then used
tcpreplay. Then stored the information.
Nice feature of
tcpreplay is its
-t flag, which accelerates the traffic. They accelerated the playback to playback 2000k packages in 15s (roughly 7.43 Mbps)
In the very first test, they had a 100% packet capture -> store rate:
Only 844 packets made it into the database.
So that's pretty bad. What to do now?
He remembered the old trick of throwing a C program into gprof and it would magically tell you where your program was stopping. So he decided to do that.
Maybe the bottleneck is at the parser phase, so let's try parallelizing that.
But in actuality, it was the storage layer that was the bottleneck:
Note the drop-off in capture rate as we ramp up the number of feeders into the database.
If we let the capture run, I should see the graph fall back to zero as packets are dropped. But that's not what happens. It stays steady. So it appears there's some buffering going on.
Where is the buffering happen? Possible sources:
- Capturer itself
- Operating system
On Linux, rmem_default:
If you're writing any network tool, you don't have to be as clever as you think. The ultimate optimization turned out to be about buffering.
Profiling can be
- somewhere in between
Misleading, suggests perf bottleneck is capturing packets off the wire:
Helpful, shows all slowness is coming from slowness putting things in DB:
In between, suggests packets slow off the network and slow putting into the database:
What's the cause of the bogus profiling data? Think about the ordering of events.
When I start the capturing tool and it's waiting for data, the first thing it does is sit and wait and block to get data off the channel. With no data, it just checks the empty buffer. This takes up CPU time, so it shows 100% of time is spent "reading" packets off the wire:
While storage is happening
Ideally, the profiler would just show me a slice that looks like this:
The solution here was to go into
gopacket and set the buffer size to 1, so that it wouldn't only handle one packet at a time.
- The equivalent of printf debugging
- Remove components until performance improves
Fortunately, it didn't take very long to turn off the thing that was causing the slowdown.
We want to get stuff off the wire as fast as possible, we create a massive buffer. This helps deal with spurty traffic. Doesn't solve all problems (e.g., doesn't address DB bottleneck), but does solve part.
We also wanted to take advantage of parallelism of DB. We found the optimal number of MySQL connections and made the number of storage workers in our program match that.
Final step: run MySQL on temp filesystem. Why does this work for us?
- time-limited competition
- finite number of packets
- enormous amount of memory on the hosts in the competition
Turns out, we did end up filling up the temp filesystem. But because the system worked, we think we only ran out of temp filesystem space at the very end. The pipeline continued to work even though we ran out of disk space.
Q: Did you batch insertions into MySQL?
A: That's a very good suggestion. We didn't. We really just needed a key-value store. If MySQL didn't work, we would've tried NoSQL databases.
Q: What place did you get?
A: 2nd place
Q: What area would you improve on given another opportunity?
A: There's so much in the Go runtime that I didn't understand. Would've done things in a more idiomatic Go way. Got great feedback from Dave Cheney: "Why do you have semicolons in all the code snippets???" Before learning about cancellable contexts and wait groups, was using raw channels for a bunch of synchronization.
Q: Did you consider using pf_ring or netmap?
A: gopacket supports different backends. I didn't look into those, because when I realized that wasn't bottleneck, I didn't investigate further.