Extracting Packets From Large Captures

By stretch | Monday, April 11, 2011 at 2:12 a.m. UTC

Sometimes we have to work with very large packet captures, which can be several gigabytes in size. These are cumbersome (if not impossible) to analyze in an application like Wireshark because the entire capture file must be loaded into memory at once. And manual analysis typically means we're only interested in a small portion of the capture file anyway. How can we break a huge capture into smaller, more manageable chunks?

One approach is to cut the file into slices, with each slice containing a constant number of packets or bytes, or covering a given length of time. Practically speaking, this is how huge traffic captures should be performed in the first place, using a ring buffer. But we can similarly chop up large capture files after the fact using editcap (part of the Wireshark family).
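
As a rough sketch of the ring buffer idea, dumpcap (also part of the Wireshark family) can rotate its output through a fixed set of files with the -b option. The interface name, file size, file count, and output name below are assumptions chosen for illustration, and the run helper is a hypothetical wrapper that only prints the command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: rotate capture output through a fixed set of files (a ring buffer).
# IFACE, the sizes and the output name are illustrative assumptions.
IFACE=eth0          # hypothetical capture interface
FILE_KB=100000      # start a new file after roughly 100 MB (value is in kB)
FILE_COUNT=10       # overwrite the oldest file once 10 files exist

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = execute it

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Capture on $IFACE into ring.pcap, rotating by size and file count.
ring_capture() {
    run dumpcap -i "$IFACE" -b "filesize:$FILE_KB" -b "files:$FILE_COUNT" -w ring.pcap
}

ring_capture
```

Running a real capture usually requires elevated privileges, which is why the sketch defaults to printing the command rather than executing it.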

Take for instance this capture file of over a quarter million packets weighing in at 256 MB:

$ capinfos lotsapackets.cap
File name:           lotsapackets.cap
File type:           Wireshark/tcpdump/... - libpcap
File encapsulation:  Ethernet
Number of packets:   260778
File size:           267802612 bytes
Data size:           263630140 bytes
Capture duration:    204 seconds
Start time:          Mon Apr  4 22:31:52 2011
End time:            Mon Apr  4 22:35:16 2011
Data byte rate:      1290166.92 bytes/sec
Data bit rate:       10321335.37 bits/sec
Average packet size: 1010.94 bytes
Average packet rate: 1276.21 packets/sec

256 MB certainly isn't an insurmountable file size, but it's large enough that we may want something more flexible. We can split it into a few files of, say, 50,000 packets each and load each one into Wireshark individually.

$ editcap -c 50000 lotsapackets.cap fewerpackets.cap
$ capinfos -c fewerpackets*.cap
File name:           fewerpackets_00000_20110404223152.cap
Number of packets:   50000

File name:           fewerpackets_00001_20110404223310.cap
Number of packets:   50000

File name:           fewerpackets_00002_20110404223340.cap
Number of packets:   50000

File name:           fewerpackets_00003_20110404223410.cap
Number of packets:   50000

File name:           fewerpackets_00004_20110404223440.cap
Number of packets:   50000

File name:           fewerpackets_00005_20110404223510.cap
Number of packets:   10778
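
editcap can also slice by time rather than by packet count. The sketch below uses its -i option (seconds per output file) and its -A/-B options (start and stop timestamps) to show both forms. The output file names and the one-minute window are assumptions picked to fall inside the capture shown above, and the run helper is a hypothetical wrapper that prints each command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = execute it

# Print the command in dry-run mode, otherwise run it.
# Note: the printed form is not re-quoted, so copy it with care.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Slice by time: start a new output file every N seconds of capture time.
split_by_seconds() {   # usage: split_by_seconds SECONDS INFILE OUTFILE
    run editcap -i "$1" "$2" "$3"
}

# Keep only packets inside a time window (format: YYYY-MM-DD hh:mm:ss).
extract_window() {     # usage: extract_window START STOP INFILE OUTFILE
    run editcap -A "$1" -B "$2" "$3" "$4"
}

split_by_seconds 60 lotsapackets.cap perminute.cap
extract_window "2011-04-04 22:33:00" "2011-04-04 22:34:00" lotsapackets.cap oneminute.cap
```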

That's a pretty good solution, but as mentioned earlier, we're probably only interested in specific traffic; for example, only traffic destined for a particular host, or a particular UDP or TCP port number. Searching each of these smaller files by hand would be a tedious waste of time.

Luckily, we can use tshark (another Wireshark tool) to extract interesting traffic from a capture file. We just need to define a display filter to match the traffic we want. For example, if we wanted to extract all DNS traffic from our large capture file, we could do this:

$ tshark -r lotsapackets.cap -R dns -w dns.cap
$ capinfos dns.cap
File name:           dns.cap
File type:           Wireshark/tcpdump/... - libpcap
File encapsulation:  Ethernet
Number of packets:   220
File size:           24664 bytes
Data size:           21120 bytes
Capture duration:    32 seconds
Start time:          Mon Apr  4 22:31:58 2011
End time:            Mon Apr  4 22:32:30 2011
Data byte rate:      656.35 bytes/sec
Data bit rate:       5250.76 bits/sec
Average packet size: 96.00 bytes
Average packet rate: 6.84 packets/sec

After just a few seconds, tshark copies every DNS packet from our original capture into the much smaller file dns.cap, which we can easily examine in Wireshark at our leisure.
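
A note for newer installations: to the best of our knowledge, tshark releases since Wireshark 1.10 expect display filters to be given with -Y, and treat -R as a read filter that must be combined with -2 (two-pass analysis). The sketch below shows both spellings of the DNS extraction under that assumption. The run helper is a hypothetical wrapper that prints each command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = execute it

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Single-pass extraction using a display filter (-Y).
extract() {            # usage: extract INFILE FILTER OUTFILE
    run tshark -r "$1" -Y "$2" -w "$3"
}

# Two-pass extraction: -R is a read filter and is combined with -2.
extract_two_pass() {   # usage: extract_two_pass INFILE FILTER OUTFILE
    run tshark -2 -r "$1" -R "$2" -w "$3"
}

extract lotsapackets.cap dns dns.cap
extract_two_pass lotsapackets.cap dns dns.cap
```

If the -R form in this article is rejected by your version of tshark, the -Y form is the one to try first.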

Filters can be mixed and matched just like in Wireshark. The example below matches all DNS traffic and all traffic sent to or from TCP port 80.

$ tshark -r lotsapackets.cap -R "dns or tcp.port==80" -w web.cap
$ capinfos web.cap
File name:           web.cap
File type:           Wireshark/tcpdump/... - libpcap
File encapsulation:  Ethernet
Number of packets:   1559
File size:           644994 bytes
Data size:           620026 bytes
Capture duration:    165 seconds
Start time:          Mon Apr  4 22:31:55 2011
End time:            Mon Apr  4 22:34:40 2011
Data byte rate:      3757.36 bytes/sec
Data bit rate:       30058.86 bits/sec
Average packet size: 397.71 bytes
Average packet rate: 9.45 packets/sec
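
When a filter grows beyond a couple of terms, it can help to assemble it in the shell before handing it to tshark. The following is a minimal sketch of that idea; ports_filter and to_host are hypothetical helper names, and port 443 and the address 192.0.2.10 are placeholders rather than values from the capture above.

```shell
#!/bin/sh
# Join port numbers into one display filter for the given protocol,
# e.g. "tcp.port==80 or tcp.port==443".
ports_filter() {   # usage: ports_filter PROTO PORT...
    proto=$1
    shift
    filter=""
    for port in "$@"; do
        # Prefix " or " only when the filter already has a term.
        filter="${filter:+$filter or }$proto.port==$port"
    done
    printf '%s\n' "$filter"
}

# Restrict a filter to packets destined for one IPv4 host.
to_host() {        # usage: to_host ADDRESS FILTER
    printf 'ip.dst==%s and (%s)\n' "$1" "$2"
}

web=$(ports_filter tcp 80 443)
echo "$web"
to_host 192.0.2.10 "dns or $web"
```

The resulting string can be passed to tshark in place of the quoted filter in the command above.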

The Wireshark display filter cheat sheet offers an idea of the variety of filters available. With a little practice you should be able to automatically extract just about any type of traffic.

About the Author

Jeremy Stretch is a network engineer living in the Raleigh-Durham, North Carolina area. He is known for his blog and cheat sheets here at Packet Life. You can reach him by email or follow him on Twitter.

Posted in Packet Analysis

Comments


Chris Bennett (cgb) (guest)
April 11, 2011 at 4:28 a.m. UTC

Hi Stretch,

Nice article - I saw a similar tool named 'streams' announced recently on honeynet.org. To quote their blog:
"If you ever needed to process large pcap files on a session level, you will love this tool"

See:
http://www.honeynet.org/node/633
and
http://src.carnivore.it/streams/about

for more info.

Chris


Cd-MaN (guest)
April 11, 2011 at 5:08 a.m. UTC

You can also use Perl to parse pcap files and extract out the packets which interest you: http://hype-free.blogspot.com/2010/03/parsing-pcap-files-with-perl.html


Erik H (guest)
April 11, 2011 at 3:33 p.m. UTC

There is a great command line tool for Windows called "SplitCap" that can split a pcap into multiple pcap files, one for each unique TCP/UDP session. SplitCap can also split a large PCAP file based on IP addresses, so that each IP host on the network gets its packets in a separate file (with the "-s host" switch)

SplitCap is really fast and outperforms all other pcap splitting tools I know.

SplitCap is available from SourceForge:
http://splitcap.sourceforge.net/


gnavarrette
April 13, 2011 at 12:07 a.m. UTC

Nice article Stretch.

I usually go the tshark route, since I'm usually dealing with large pcaps with VoIP traffic in it. Snag my signaling w/SDP with the first tshark run and print it so I can grab the UDP ports from the SDP, then run it a second time with the same signaling filter and add the UDP ports from my SDP. Works slick.


flett
April 14, 2011 at 4:00 p.m. UTC

Great article, thanks Stretch. Wish I'd seen it a few days earlier :)


anonymous (guest)
December 18, 2011 at 12:24 a.m. UTC

Just droppin a note - this was helpful :) Didn't know these utils came with tshark/wireshark. Handy.


Pranav (guest)
March 18, 2015 at 11:55 a.m. UTC

Hi. For me, Wireshark and tshark are sometimes very painful to work with. When I have a very large PCAP file and I'm interested in very particular things like MAC address, header, destination address, etc., I first have to filter the PCAP file and then go through each frame, which is a painful process. Often we can use a tool like PCAP2XML instead. It converts the PCAP file into XML or an SQLite database, and then we can work on it using an SQLite browser. The advantage is that we can run queries for exactly what we're interested in. Have a look:

http://hackoftheday.securitytube.net/2015/03/pcap2xmlsqlite-convert-80211-packets-to.html
