Go at the DARPA Cyber Grand Challenge

Liveblog by Beyang Liu (@beyang)

Slides for this talk have been posted here.

Will Hawkins, graduate student at the University of Virginia. He's much more of a C/C++ programmer than a Go programmer. He takes Dave Cheney's pop quizzes about Go on Twitter very seriously.

What is the DARPA Cyber Grand Challenge

DARPA is the successor to ARPA, the government agency that played a critical role in creating the Internet. The Cyber Grand Challenge pits autonomous systems against one another to find vulnerabilities in software and defend against those vulnerabilities.

"Autonomous" means no human input at all. The competition ran for 96 hours, and no human input was allowed until the last 2 hours.

We had to find vulnerabilities in other software and prevent attacks against our own software.

Here were the teams. TechX was our team.

[Slide image]

Our goal was to finish "not 7th" (there were 7 teams in the competition). We did not finish 7th (or 6th).

Go fit into the competition in a small but very important way. Go helped us build our offense and defense.

In the Grand Challenge, the competition framework would run and churn on different competitor binaries. If I wanted to launch an attack, I had to send them a binary. And we had to receive inputs from the framework testing our functionality.

[Slide image]

So we wanted to capture all the data over the network to:

Derive rules for intrusion detection to block malicious traffic while permitting non-malicious regular traffic
Use inputs from the Competition Framework to drive our fuzzing system. I.e., take real inputs, run them on binaries, see where they crash. If they crash, that's a likely indicator there's a vulnerability in our code. We can then use this knowledge to feed vulnerabilities back to other people.

The Goal

The capturer (written in Go) captures fields from the packets:

Timestamp
Cbid
Conversation
Side - either client or server
Message ID
Contents

into a datastore that can be queried by different components:

The Fuzzer
The Rule Generator

The results

This ended up being a significant component of our defensive systems.

The Fuzzer: data captured from the network tap was used to generate inputs for our fuzzers.

The Rule Generator: rules were deployed in concert with hardened binaries, so it is hard to tell which was the effective defense. However, there were two specific cases where replacement binaries were vulnerable but the IDS rules generated by the Rule Generator protected the binary from successful attack.

We were one of only two teams that earned points on defense. Other defenses were too heavyweight, so those teams incurred a penalty for functional degradation when deploying their defenses.

Design, Architecture and Implementation

We had to capture traffic, filter it, and store it for later use. There were a lot of options here (the ones we chose were shown in bold on the slide):

Capturing
- Simple tcpdump
- libpcap
Filtering
- BPF
- Custom
Storing
- Flat file
- Database
  - SQL: MySQL
  - NoSQL
Off-the-shelf? Custom?
- bash pipeline: tcpdump 'port 1993' | sed | awk | … > file
- Custom program
  - C
  - C++
  - Go

Note that we chose to write the custom program in Go.

When the data comes off the wire, we're gonna do some capturing, some filtering, some storing:

[Slide image]

This is a pipeline of processes, so Go seemed like a natural fit.

If we started out with something concurrent, we could easily make it parallel later.
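To make the "pipeline of processes" idea concrete, here is a minimal sketch of three stages connected by channels, using only the standard library. It is not the team's code: the stage functions, the string "packets" and the "cgc:" prefix are all made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// capture emits raw messages on a channel. It is a hypothetical stand-in
// for reading packets off the wire.
func capture(inputs []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, in := range inputs {
			out <- in
		}
	}()
	return out
}

// filter forwards only the messages that start with prefix.
func filter(in <-chan string, prefix string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for msg := range in {
			if strings.HasPrefix(msg, prefix) {
				out <- msg
			}
		}
	}()
	return out
}

// store drains the channel and returns what it received. It is a
// hypothetical stand-in for writing rows to a database.
func store(in <-chan string) []string {
	var stored []string
	for msg := range in {
		stored = append(stored, msg)
	}
	return stored
}

// runPipeline wires the stages together: capture -> filter -> store.
func runPipeline(inputs []string) []string {
	return store(filter(capture(inputs), "cgc:"))
}

func main() {
	fmt.Println(runPipeline([]string{"cgc:a", "noise", "cgc:b"})) // prints [cgc:a cgc:b]
}
```

Each stage runs in its own goroutine, so the design is concurrent from the start. Making a stage parallel later means starting several goroutines that read from the same input channel.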

Capturing

We used gopacket. It did exactly what I wanted it to do.

It has both a live and non-live mode (for reproducibility).

Here's what the code looks like:

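// Open the interface for live capture: 1024-byte snapshot length,
// non-promiscuous mode, block until packets arrive.
live_handle, err = pcap.OpenLive(*iface, 1024, false, pcap.BlockForever)
…
defer live_handle.Close()
…
live_source := gopacket.NewPacketSource(live_handle, live_handle.LinkType())
…
for !stopping {
	select {
		…
		case p, ok := <-live_source.Packets():
			if !ok {
				// The channel was closed: no more packets, so end the loop.
				// (break only leaves the select; stopping ends the for.)
				stopping = true
				break
			}
	}
}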

Filtering

We use gopacket again. There are two steps here:

From a packet, get the Layer (call Layer())
Use the Layer.

Code:

var ipv4_layer gopacket.Layer
var udp_layer gopacket.Layer
var udp *layers.UDP
var ip *layers.IPv4

// Step 1: get each layer from the packet; nil means the layer is absent.
if ipv4_layer = i.Layer(layers.LayerTypeIPv4); ipv4_layer == nil {
	…
}
if udp_layer = i.Layer(layers.LayerTypeUDP); udp_layer == nil {
	…
}
// Step 2: use the layers, via type assertions to the concrete types.
udp, _ = udp_layer.(*layers.UDP)
ip, _ = ipv4_layer.(*layers.IPv4)
if filter.DstPorts != nil && !cgc_utils.MatchAnyPort(udp.DstPort, filter.DstPorts) {
	…
}
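The cgc_utils.MatchAnyPort helper is not shown in the talk. A minimal sketch of what such a port check might look like follows. The names matchAnyPort and keepPacket are hypothetical, and it assumes the elided body of the last if drops the packet.

```go
package main

import "fmt"

// matchAnyPort reports whether port appears in ports.
// Hypothetical stand-in for cgc_utils.MatchAnyPort, which the talk does not show.
func matchAnyPort(port uint16, ports []uint16) bool {
	for _, p := range ports {
		if p == port {
			return true
		}
	}
	return false
}

// keepPacket mirrors the guard in the snippet above: a nil port list
// means "no port filter", otherwise the destination port must match.
// Assumption: packets failing the check are dropped.
func keepPacket(dstPort uint16, dstPorts []uint16) bool {
	return dstPorts == nil || matchAnyPort(dstPort, dstPorts)
}

func main() {
	fmt.Println(keepPacket(1993, []uint16{1993, 1994})) // true
	fmt.Println(keepPacket(53, []uint16{1993, 1994}))   // false
	fmt.Println(keepPacket(53, nil))                    // true: no filter configured
}
```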

Parsing

There's a lot of metadata with packets, so we want to parse that and store it in a struct. This code probably could be better, but it's doing something straightforward:

func ParseCgcPacket(packet []byte) (cgc_packet IdsPacket, err error) {
	packet_length := len(packet)
	packet_offset := 0

	if packet_offset+4 > packet_length {
		err = errors.New("Could not parse past first field.")
		return
	}
	csid := binary.LittleEndian.Uint32(packet[packet_offset:])
	cgc_packet.Csid = fmt.Sprintf("%x", csid)
	packet_offset += 4

	if packet_offset+4 > packet_length {
		err = errors.New("Could not parse past csid field.")
		return
	}
	cgc_packet.ConnectionID = binary.LittleEndian.Uint32(packet[packet_offset:])
	packet_offset += 4
	…
	return
}
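One way the repeated bounds checks could be tidied up is a small cursor type that remembers the first error. This is only a sketch of that idea, not the team's parser. The cursor and header types are hypothetical, and only the first two fields from the snippet above are handled.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShort = errors.New("packet too short")

// cursor walks a byte slice and remembers the first error it hits,
// so callers can check once at the end instead of after every field.
type cursor struct {
	buf []byte
	off int
	err error
}

// u32 reads a little-endian uint32 and advances the offset.
// After an error it returns 0 and does nothing.
func (c *cursor) u32() uint32 {
	if c.err != nil {
		return 0
	}
	if c.off+4 > len(c.buf) {
		c.err = errShort
		return 0
	}
	v := binary.LittleEndian.Uint32(c.buf[c.off:])
	c.off += 4
	return v
}

// header holds the two fields parsed in the talk's snippet.
type header struct {
	Csid         string
	ConnectionID uint32
}

// parseHeader reads the csid and connection ID from the front of packet.
func parseHeader(packet []byte) (header, error) {
	c := &cursor{buf: packet}
	csid := c.u32()
	conn := c.u32()
	if c.err != nil {
		return header{}, c.err
	}
	return header{Csid: fmt.Sprintf("%x", csid), ConnectionID: conn}, nil
}

func main() {
	pkt := []byte{0xef, 0xbe, 0xad, 0xde, 0x07, 0x00, 0x00, 0x00}
	h, err := parseHeader(pkt)
	fmt.Println(h.Csid, h.ConnectionID, err) // prints: deadbeef 7 <nil>
}
```

The trade-off is that the error no longer says which field was truncated, unlike the per-field messages in the original.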

Storing

Use database/sql in combination with a Go MySQL driver.

Three steps for use:

Connect to the database: Open()
Prepare statements (optional): Prepare()
Execute statements: Exec()
- Check/Retrieve Results:
- Check return value, or Next()

Code:

// Open the database handle over the local MySQL Unix socket.
db, err := sql.Open("mysql",
	user+":"+password+"@unix(/var/run/mysqld/mysqld.sock)/"+database)
if err != nil {
	return nil, err
}
// Verify that a connection can actually be established.
if err = db.Ping(); err != nil {
	return nil, err
}
// A raw (backtick) string lets the statement span several lines.
statement, statement_err = db.Prepare(`insert into pcap
	(cbid, conversation, side, message, contents)
	VALUES (?, ?, ?, ?, ?)`)

if _, err := statement.Exec(packet.Csid,
	packet.ConnectionID,
	fmt.Sprintf("%v", packet.ConnectionSide),
	packet.MessageID,
	packet.Message); err != nil {
	fmt.Printf("Exec() error: %v\n", err)
}

Implementation note: call the Ping method after Open to verify that a connection was actually established. Open may just validate its arguments without connecting to the database.

Testing

DARPA played practice games with us. They gave us actual traffic directed at binaries we were trying to exploit and defend.

I used tcpdump to capture the traffic from the simulation, replayed it with tcpreplay, and stored the information.

A nice feature of tcpreplay is its -t flag, which accelerates the traffic. I accelerated the playback to replay 200k packets in 15s (roughly 7.43 Mbps).

In the very first test, we had a 100% packet capture -> store rate:

200,000pkts/200,000pkts=100%

Just kidding.

844pkts/200,000pkts=0.42%

Only 844 packets made it into the database.

So that's pretty bad. What to do now?

Optimization

I remembered the old trick of throwing a C program into gprof, which would magically tell you where your program was spending its time. So I decided to do that.

[Slide image]

Maybe the bottleneck is at the parser phase, so let's try parallelizing that.

[Slide image]

But in actuality, it was the storage layer that was the bottleneck:

[Slide image]

Effects:

Note the drop-off in capture rate as we ramp up the number of feeders into the database.

[Slide image]

If we let the capture run, I should see the graph fall back to zero as packets are dropped. But that's not what happens. It stays steady. So it appears there's some buffering going on.

[Slide image]

Buffer(s)

Where is the buffering happening? Possible sources:

Capturer itself
gopacket/libpcap:

func (p *PacketSource) Packets() chan Packet {
	if p.c == nil {
		p.c = make(chan Packet, 1000) // Note this line
		go p.packetsToChannel()
	}
	return p.c
}

Operating system

On Linux, rmem_default:

$ cat /proc/sys/net/core/rmem_default
212992
On BSD, net.bpf.size/maxsize.

If you're writing any network tool, you don't have to be as clever as you think. The ultimate optimization turned out to be about buffering.

Profiling pitfalls

Profiling can be

misleading
helpful
somewhere in between

Misleading, suggests perf bottleneck is capturing packets off the wire:

$ go tool pprof -cum -top cap pcap.prof | head
210ms of 210ms total (  100%)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      140ms 66.67%  runtime.goexit
         0     0%     0%      120ms 57.14%  github.com/google/gopacket.(*PacketSource).NextPacket
         0     0%     0%      120ms 57.14%  github.com/google/gopacket.(*PacketSource).packetsToChannel
         0     0%     0%      110ms 52.38%  github.com/google/gopacket/pcap.(*Handle).ReadPacketData
         0     0%     0%      110ms 52.38%  github.com/google/gopacket/pcap.(*Handle).getNextBufPtrLocked
      10ms  4.76%  4.76%      110ms 52.38%  github.com/google/gopacket/pcap._Cfunc_pcap_next_ex
      80ms 38.10% 42.86%      100ms 47.62%  runtime.cgocall
         0     0% 42.86%       40ms 19.05%  runtime._System

Helpful, shows the slowness is coming from putting things in the DB:

$ go tool pprof -cum -top cap pcap.prof | head
80ms of 80ms total (  100%)
      flat  flat%   sum%        cum   cum%
         0     0%     0%       50ms 62.50%  runtime.goexit
         0     0%     0%       30ms 37.50%  runtime.gcDrain
         0     0%     0%       20ms 25.00%  runtime.gcBgMarkWorker
      20ms 25.00% 25.00%       20ms 25.00%  runtime.scanobject
      10ms 12.50% 37.50%       20ms 25.00%  runtime.systemstack
         0     0% 37.50%       10ms 12.50%  database/sql.(*Stmt).Exec
         0     0% 37.50%       10ms 12.50%  database/sql.resultFromStatement
         0     0% 37.50%       10ms 12.50%  github.com/go-sql-driver/mysql.(*buffer).fill

In between, suggests that both reading packets off the network and putting them into the database are slow:

$ go tool pprof -cum -top cap pcap.prof | head
730ms of 1120ms total (65.18%)
Showing top 80 nodes out of 150 (cum >= 20ms)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      810ms 72.32%  runtime.goexit
         0     0%     0%      360ms 32.14%  github.com/google/gopacket.(*PacketSource).NextPacket
         0     0%     0%      360ms 32.14%  github.com/google/gopacket.(*PacketSource).packetsToChannel
         0     0%     0%      300ms 26.79%  github.com/google/gopacket/pcap.(*Handle).ReadPacketData
         0     0%     0%      300ms 26.79%  github.com/google/gopacket/pcap.(*Handle).getNextBufPtrLocked
         0     0%     0%      300ms 26.79%  main.db_extract_output
         0     0%     0%      280ms 25.00%  github.com/google/gopacket/pcap._Cfunc_pcap_next_ex

What's the cause of the bogus profiling data? Think about the ordering of events.

When I start the capturing tool and it's waiting for data, the first thing it does is sit and wait and block to get data off the channel. With no data, it just checks the empty buffer. This takes up CPU time, so it shows 100% of time is spent "reading" packets off the wire:

[Slide image]

[Slide image]

While storage is happening

[Slide image]

Ideally, the profiler would just show me a slice that looks like this:

[Slide image]

The solution here was to go into gopacket and set the buffer size to 1, so that it would handle only one packet at a time.
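A different way to keep idle waiting out of a profile, not what was done in the talk, is to start and stop the CPU profile around just the window of interest. Here is a minimal sketch using the standard runtime/pprof package. The helper names and the busy loop are hypothetical, and the profile goes to an in-memory buffer rather than a file like pcap.prof.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"runtime/pprof"
)

// sink keeps the result of busy alive so the loop is not optimized away.
var sink int

// busy burns some CPU; a hypothetical stand-in for parse/store work.
func busy() int {
	sum := 0
	for i := 0; i < 50000000; i++ {
		sum += i % 7
	}
	return sum
}

// profileWindow records a CPU profile that covers only the call to work.
func profileWindow(w io.Writer, work func()) error {
	if err := pprof.StartCPUProfile(w); err != nil {
		return err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile data to w
	return nil
}

// profiledBytes runs work under the profiler and reports how many
// bytes of profile data were produced.
func profiledBytes(work func()) (int, error) {
	var buf bytes.Buffer
	if err := profileWindow(&buf, work); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	n, err := profiledBytes(func() { sink = busy() })
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of profile data\n", n)
}
```

In a real tool, w would be a file that go tool pprof can read afterwards.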

Manual profiling

The equivalent of printf debugging
Remove components until performance improves

Fortunately, it didn't take very long to turn off the thing that was causing the slowdown.

func log_cgc_packet(packet cgc_utils.IdsPacket, statement sql.Stmt) {
	return // I added this
	if _, err := statement.Exec(packet.Csid,
		packet.ConnectionID,
		fmt.Sprintf("%v", packet.ConnectionSide),
		packet.MessageID,
		packet.Message); err != nil {
		fmt.Printf("Exec() error: %v\n", err)
	}
}

We want to get stuff off the wire as fast as possible, so we create a massive buffer. This helps deal with bursty traffic. It doesn't solve all problems (e.g., it doesn't address the DB bottleneck), but it does solve part.
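To see why buffer size matters for a burst, here is a toy model: packets are offered to a buffered channel while nothing is consuming yet, and anything that does not fit is counted as dropped. This is a sketch under assumptions. The non-blocking hand-off is for illustration only (gopacket's own send into its channel blocks, and real drops happen further down the stack), and the function names are hypothetical.

```go
package main

import "fmt"

// offer tries to enqueue pkt without blocking. It reports false when
// the buffer is full, which this model counts as a dropped packet.
func offer(buf chan<- int, pkt int) bool {
	select {
	case buf <- pkt:
		return true
	default:
		return false
	}
}

// burst pushes n packets into a buffer of the given capacity while no
// consumer is running, and returns how many were accepted and dropped.
func burst(capacity, n int) (accepted, dropped int) {
	buf := make(chan int, capacity)
	for i := 0; i < n; i++ {
		if offer(buf, i) {
			accepted++
		} else {
			dropped++
		}
	}
	return accepted, dropped
}

func main() {
	for _, capacity := range []int{1000, 200000} {
		a, d := burst(capacity, 200000)
		fmt.Printf("capacity=%d accepted=%d dropped=%d\n", capacity, a, d)
	}
	// Output:
	// capacity=1000 accepted=1000 dropped=199000
	// capacity=200000 accepted=200000 dropped=0
}
```

A big buffer only buys time: if the consumer is slower than the producer on average, any finite buffer eventually fills.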

We also wanted to take advantage of parallelism of DB. We found the optimal number of MySQL connections and made the number of storage workers in our program match that.

Final step: run MySQL on temp filesystem. Why does this work for us?

time-limited competition
finite number of packets
enormous amount of memory on the hosts in the competition

Turns out, we did end up filling up the temp filesystem. But because the system worked, we think we only ran out of temp filesystem space at the very end. The pipeline continued to work even though we ran out of disk space.

Q&A

Q: Did you batch insertions into MySQL?

A: That's a very good suggestion. We didn't. We really just needed a key-value store. If MySQL didn't work, we would've tried NoSQL databases.
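For reference, batching as suggested in the question could look like building one multi-row INSERT for several packets. This was not done in the talk, so the following is purely a sketch. The row type and its field types are assumptions; only the table and column names come from the insert statement shown earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// row mirrors the five columns of the pcap table. The Go field types
// are assumptions; the talk does not show the schema.
type row struct {
	Cbid         string
	Conversation uint32
	Side         string
	Message      uint32
	Contents     []byte
}

// buildBatchInsert returns a single multi-row INSERT statement and the
// flattened argument list for it. It returns "" for an empty batch.
func buildBatchInsert(rows []row) (string, []interface{}) {
	if len(rows) == 0 {
		return "", nil
	}
	placeholders := make([]string, len(rows))
	args := make([]interface{}, 0, len(rows)*5)
	for i, r := range rows {
		placeholders[i] = "(?, ?, ?, ?, ?)"
		args = append(args, r.Cbid, r.Conversation, r.Side, r.Message, r.Contents)
	}
	query := "INSERT INTO pcap (cbid, conversation, side, message, contents) VALUES " +
		strings.Join(placeholders, ", ")
	return query, args
}

func main() {
	query, args := buildBatchInsert([]row{
		{Cbid: "deadbeef", Conversation: 1, Side: "client", Message: 1, Contents: []byte("hi")},
		{Cbid: "deadbeef", Conversation: 1, Side: "server", Message: 2, Contents: []byte("ok")},
	})
	fmt.Println(query)
	fmt.Println(len(args), "args")
}
```

The query and args would then go to db.Exec(query, args...), turning many round trips into one.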

Q: What place did you get?

A: 2nd place

Q: What area would you improve on given another opportunity?

A: There's so much in the Go runtime that I didn't understand. I would've done things in a more idiomatic Go way. I got great feedback from Dave Cheney: "Why do you have semicolons in all the code snippets???" Before learning about cancellable contexts and wait groups, I was using raw channels for a bunch of synchronization.

Q: Did you consider using pf_ring or netmap?

A: gopacket supports different backends. I didn't look into those, because once I realized that capture wasn't the bottleneck, I didn't investigate further.