In this article, I want to go through how to use Zig’s @bitCast to extract specific fields from a network packet’s TCP header.
Context
TigerBeetle has a demo on @bitCast which shows how to fill a struct’s values by reinterpreting bytes that are already in memory, without any parsing, deserialisation or other processing. This works so long as the memory layout of those bytes matches the struct’s layout. Zig will also check at compile time that the size of the type we are casting to (e.g. the struct’s type) is the same as the size of the data we are passing in.
In the image above, we show that 4 consecutive bytes in memory could be interpreted in multiple ways, for example as a single u32, or as two u16 values (which could be a struct field each). This can be applied to bits as well, where a u8 such as 10101010 could be interpreted as eight u1 values, each representing a state of “on” or “off” for example.
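To make that idea concrete, here is a minimal sketch of the same reinterpretations in code. It assumes Zig 0.11 or newer (where @bitCast takes a single argument), and the TwoHalves and Flags types are hypothetical names made up for this illustration, not taken from the TigerBeetle demo. The multi-byte values printed depend on the host's byte order, so no output is shown for them.

```zig
const std = @import("std");

// Hypothetical illustration types, not part of the TigerBeetle demo.
const TwoHalves = packed struct {
    first: u16, // least significant 16 bits of the backing u32
    second: u16, // most significant 16 bits of the backing u32
};

const Flags = packed struct {
    b0: u1, // least significant bit
    b1: u1,
    b2: u1,
    b3: u1,
    b4: u1,
    b5: u1,
    b6: u1,
    b7: u1, // most significant bit
};

pub fn main() void {
    const bytes = [4]u8{ 0x01, 0x02, 0x03, 0x04 };

    // The same 4 bytes, viewed as one u32 or as two u16 fields.
    const as_u32: u32 = @bitCast(bytes);
    const as_halves: TwoHalves = @bitCast(bytes);

    // One byte, viewed as eight single-bit fields.
    const flags: Flags = @bitCast(@as(u8, 0b10101010));

    std.debug.print("u32: 0x{x}\n", .{as_u32});
    std.debug.print("u16s: 0x{x} 0x{x}\n", .{ as_halves.first, as_halves.second });
    std.debug.print("lowest bit: {d}, highest bit: {d}\n", .{ flags.b0, flags.b7 });
}
```

The single-byte case does not depend on byte order: the lowest bit of 10101010 is 0 and the highest is 1.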
In this article, we want to extract the source and destination ports from the TCP header of a packet that we captured.
Wireshark and tshark are tools that can provide this information, but if we want to use those values in another program we would need to run them as a subprocess and parse their output. This would slow the program down if we were parsing millions of pcaps.
We can however leverage the “pcap” library (libpcap) which those tools are built upon. Together with Zig’s @bitCast we can extract the fields that we want.
We will use a simplified capture with a single packet extracted from a sample source pcap:
$ wget https://wiki.wireshark.org/uploads/__moin_import__/attachments/SampleCaptures/fix.pcap
$ editcap -r fix.pcap onefix.pcap 12
Let’s look at how we can achieve this!
Updates
Zig is still under heavy development and as such we can expect breaking changes when building older code against newer versions. I recently tried to build this code again and ran into some issues which are documented in this section.
20231127
Newer versions of Zig changed the @bitCast function to take only a single argument. According to this commit, zig fmt automatically fixes this.
$ zig version
0.12.0-dev.1744+f29302f91
$ zig build-exe -lpcap -lc src/main.zig --cache-dir zig-cache
src/main.zig:17:25: error: expected 1 argument, found 2
const tcp_src_dst = @bitCast(tcp_header_partial, packet[34..38].*); // <--
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ zig fmt src/main.zig
$ cat src/main.zig | grep bitCast
const tcp_src_dst = @as(tcp_header_partial, @bitCast(packet[34..38].*)); // <--
That looks good, although we could remove the @as cast and explicitly declare the type of tcp_src_dst. We’ll keep the code zig fmt produced and try to build again:
$ zig build-exe -lpcap -lc src/main.zig --cache-dir zig-cache
src/main.zig:13:9: error: local variable is never mutated
var errbuf: [*]u8 = undefined;
^~~~~~
src/main.zig:13:9: note: consider using 'const'
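The compiler gives us a great hint, so let’s take its suggestion. Here is the final code:
const std = @import("std");
const c = @cImport({
    @cInclude("pcap/pcap.h");
});

const tcp_header_partial = packed struct {
    src_port: u16,
    dst_port: u16,
};

pub fn main() !void {
    var header = c.struct_pcap_pkthdr{ .ts = undefined, .caplen = undefined, .len = undefined };
    // NOTE: errbuf is an undefined pointer here. libpcap writes an error message
    // through it on failure, so a more robust program would pass a real buffer
    // of c.PCAP_ERRBUF_SIZE bytes instead.
    const errbuf: [*]u8 = undefined;
    const pcap = c.pcap_open_offline("onefix.pcap", errbuf);
    // Close the capture handle when main returns.
    defer c.pcap_close(pcap);
    const packet = c.pcap_next(pcap, &header);
    // Bytes 34..38 are the first four bytes of the TCP header
    // (14-byte Ethernet header + 20-byte IP header).
    const tcp_src_dst = @as(tcp_header_partial, @bitCast(packet[34..38].*)); // <--
    // The ports are big endian on the wire, so swap them on a little-endian host.
    std.debug.print("src: {d}, dst: {d}\n", .{ @byteSwap(tcp_src_dst.src_port), @byteSwap(tcp_src_dst.dst_port) });
}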
Looks like that fixes it and we are back to working code:
$ zig build-exe -lpcap -lc src/main.zig --cache-dir zig-cache; ./main
src: 11001, dst: 53867
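As mentioned above, the @as wrapper that zig fmt produces is optional. The following sketch, which assumes Zig 0.11 or newer, puts the two spellings side by side on a hard-coded byte array rather than a real packet. The Ports type and the raw bytes are stand-ins chosen for this illustration.

```zig
const std = @import("std");

// Stand-in for tcp_header_partial, used only in this illustration.
const Ports = packed struct {
    src_port: u16,
    dst_port: u16,
};

pub fn main() void {
    // Four bytes standing in for packet[34..38].*
    const raw = [4]u8{ 0x2a, 0xf9, 0xd2, 0x6b };

    // Form produced by `zig fmt`: the result type comes from @as.
    const a = @as(Ports, @bitCast(raw));

    // Alternative: the result type comes from the declaration.
    const b: Ports = @bitCast(raw);

    const same = a.src_port == b.src_port and a.dst_port == b.dst_port;
    std.debug.print("same result: {}\n", .{same});
}
```

Both forms give @bitCast the same result type, so which one to use is a matter of taste.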
Implementation
Pre-requisites
- Zig is installed
$ zig version
0.11.0-dev.2639+4df87b40f
- Tshark/Wireshark installed
$ tshark --version
TShark (Wireshark) 3.6.2 (Git v3.6.2 packaged as 3.6.2-2)
Part 1: Calling C from Zig
The first part of the puzzle is figuring out how to use the C library “pcap” from Zig.
The official Zig docs are a good starting point to understand how to interact with C.
We will be trying to use the function pcap_open_offline, which lets us open an existing pcap file for reading.
First, let’s create a directory and a main.zig file:
$ mkdir bitcast-pcap
$ cd bitcast-pcap
$ cat >main.zig <<EOF
const std = @import("std");
const c = @cImport({
    @cInclude("pcap/pcap.h");
});

pub fn main() !void {
    var errbuf: [*]u8 = undefined;
    var pcap = c.pcap_open_offline("onefix.pcap", errbuf);
    std.debug.print("{any}\n", .{pcap});
}
EOF
Next, let’s try to build the executable:
$ zig build-exe main.zig
main.zig:2:11: error: C import failed
const c = @cImport({
^~~~~~~~
main.zig:2:11: note: libc headers not available; compilation does not link against libc
The error tells us that by default, compilation does not link against libc. Maybe we need to tell build-exe what libraries to link against. Let’s look at the help:
$ zig build-exe --help | grep -A1 "Link Options"
Link Options:
-l[lib], --library [lib] Link against system library (only if actually used)
There is an option for linking against specified system libraries. In our case we want to use “pcap.h” so let’s specify the library to link against:
$ zig build-exe -lpcap main.zig
[...]
/home/nequo/.cache/zig/o/d644b3b2a9abdbbfef932e09d37e5788/cimport.h:1:10: error: 'pcap/pcap.h' file not found
#include <pcap/pcap.h>
The error we get indicates that we are missing the required C header files. Let’s install the package that provides them [1] and try again:
$ sudo apt install libpcap-dev
$ zig build-exe -lpcap main.zig
$ ./main
Segmentation fault at address 0x0
???:?:?: 0x0 in ??? (???)
/home/nequo/zig-linux-x86_64-0.11.0-dev.2639+4df87b40f/lib/std/start.zig:609:37: 0x208fbd in posixCallMainAndExit (main)
const result = root.main() catch |err| {
^
/home/nequo/zig-linux-x86_64-0.11.0-dev.2639+4df87b40f/lib/std/start.zig:368:5: 0x208a70 in _start (main)
@call(.never_inline, posixCallMainAndExit, .{});
^
Aborted
This time, the build worked but running the code produced a segmentation fault.
My intuition is that pcap.h requires some components of libc [2]. Let’s also link against libc:
$ zig build-exe -lpcap -lc main.zig
$ ./main
.home.nequo..cache.zig.o.c835a6c34d3e1a61bd792506e3063b28.cimport.struct_pcap@d104c0
Ok great! We now have a way of using a function from the C library “pcap” in our Zig code!
Part 2: Using libpcap
In Part 1, we used the function pcap_open_offline, which opens a pcap file and returns a pcap_t * if the operation was successful, or returns NULL and writes the error message to errbuf otherwise. How do we use that return value?
The “Programming with pcap” guide mentions several interesting functions.
Here are their function signatures extracted from the documentation:
const u_char *pcap_next(pcap_t *p, struct pcap_pkthdr *h);
int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user);
int pcap_dispatch(pcap_t *p, int cnt, pcap_handler callback, u_char *user);
For our simple example, let’s use pcap_next, which takes a pcap_t and a pcap_pkthdr struct and returns a pointer to the start of the packet’s data [3].
We are going to need a pointer to an empty pcap_pkthdr struct to pass in, but how do we use a struct declared in a C library? If we add --cache-dir zig-cache to our build-exe command, we will be able to see some extra build artifacts that include a cimport.zig file with libpcap’s functions and structs imported into Zig:
$ zig build-exe -lpcap -lc main.zig --cache-dir zig-cache
$ find zig-cache -name cimport.zig
zig-cache/o/c835a6c34d3e1a61bd792506e3063b28/cimport.zig
Looking at cimport.zig, we find the definition of our pcap_pkthdr struct:
pub const struct_pcap_pkthdr = extern struct {
    ts: struct_timeval,
    caplen: bpf_u_int32,
    len: bpf_u_int32,
};
Let’s use this to create an uninitialised struct in our code, then run pcap_next to fill it. Here is the full main.zig file:
const std = @import("std");
const c = @cImport({
    @cInclude("pcap/pcap.h");
});

pub fn main() !void {
    var header = c.struct_pcap_pkthdr{ .ts = undefined, .caplen = undefined, .len = undefined };
    var errbuf: [*]u8 = undefined;
    var pcap = c.pcap_open_offline("onefix.pcap", errbuf);
    var packet = c.pcap_next(pcap, &header);
    _ = packet;
    std.debug.print("{any}\n", .{header});
}
Let’s build and run our new program:
$ zig build-exe -lpcap -lc main.zig --cache-dir zig-cache
$ ./main
zig-cache.o.c835a6c34d3e1a61bd792506e3063b28.cimport.struct_pcap_pkthdr{ .ts = zig-cache.o.c835a6c34d3e1a61bd792506e3063b28.cimport.struct_timeval{ .tv_sec = 1448733590, .tv_usec = 539999 }, .caplen = 505, .len = 505 }
This prints out the values from the struct and we can see the length of the packet is 505 bytes. We can verify this with tshark or wireshark:
$ tshark -r onefix.pcap -T fields -e frame.len
505
At this point, packet is a pointer to the start of the packet’s bytes, which will be the start of the packet’s Ethernet header!
Part 3: The @bitCast trick
In this section, the goal will be to use the pointer we produced in “Part 2” to retrieve the source and destination ports from the packet’s TCP header. Let’s first get that information from tshark so that we know what values we are looking for:
$ tshark -r onefix.pcap -T fields -e tcp
Transmission Control Protocol, Src Port: 11001, Dst Port: 53867, Seq: 1, Ack: 1, Len: 439
Recall that in our code, packet is pointing to the start of the Ethernet header. To get to the TCP header, we need to offset our pointer by sizeof(ETHERNET_HEADER) + sizeof(IP_HEADER). A regular Ethernet header is 14 bytes long, and for this packet the IP header is 20 bytes long (as reported by tshark -r onefix.pcap -T fields -e ip.hdr_len). The TCP header therefore starts at byte offset 34, and it begins with 16 bits for the source port, so we take the 2 bytes at offsets 34 and 35 by slicing our packet pointer:
const std = @import("std");
const c = @cImport({
    @cInclude("pcap/pcap.h");
});

pub fn main() !void {
    var header = c.struct_pcap_pkthdr{ .ts = undefined, .caplen = undefined, .len = undefined };
    var errbuf: [*]u8 = undefined;
    var pcap = c.pcap_open_offline("onefix.pcap", errbuf);
    var packet = c.pcap_next(pcap, &header);
    const tcp_src_port = @bitCast(u16, packet[34..36].*); // <---
    std.debug.print("{d}\n", .{tcp_src_port});
}
We build and run the code:
$ zig build-exe -lpcap -lc main.zig --cache-dir zig-cache
$ ./main
63786
Oops, that does not look like 11001. I’ll save you the pain that I went through debugging this with a single word: “endianness”.
- 11001 in decimal is 2AF9 in hex.
- 63786 in decimal is F92A in hex.
On one hand, @bitCast uses the endianness of the host when casting. On the machine I am running this on, that is little endian.
On the other hand, our pcap’s bytes are arranged in Network Byte Order, which is big endian.
The bytes are ordered 2A F9 in memory, but the bitcast is reading them as little endian and producing F92A.
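The mismatch can be reproduced without a pcap file at all. The sketch below, assuming Zig 0.11 or newer, hard-codes the two source port bytes and compares the host-order reinterpretation with a value assembled by hand as big endian. The helper name bigEndianU16 is made up for this illustration.

```zig
const std = @import("std");

// Hypothetical helper: builds a u16 from two bytes in network (big endian)
// order, treating the first byte as the most significant one.
fn bigEndianU16(bytes: [2]u8) u16 {
    return (@as(u16, bytes[0]) << 8) | bytes[1];
}

pub fn main() void {
    // The two source port bytes exactly as they appear in the packet.
    const wire = [2]u8{ 0x2a, 0xf9 };

    // Reinterpreted in the host's byte order (0xf92a on a little-endian host).
    const native: u16 = @bitCast(wire);

    // Assembled by hand as big endian, independent of the host.
    const big = bigEndianU16(wire);

    std.debug.print("native: {d}, big endian: {d}\n", .{ native, big });
}
```

On a little-endian machine like mine this should print 63786 for the native value and 11001 for the big endian one, matching the numbers above.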
Let’s use @byteSwap to fix that and print the expected value:
const tcp_src_port = @bitCast(u16, packet[34..36].*);
std.debug.print("{d}\n", .{@byteSwap(tcp_src_port)}); // <--
We build and run the code again:
$ zig build-exe -lpcap -lc main.zig --cache-dir zig-cache
$ ./main
11001
This time we get what we expect! But we promised at the start of this article that we would be bitcasting to a struct so let’s see what that would look like. In our TCP header, the 16 bits after source port represent the destination port, so we just need the first 4 bytes of the header:
const std = @import("std");
const c = @cImport({
    @cInclude("pcap/pcap.h");
});

const tcp_header_partial = packed struct {
    src_port: u16,
    dst_port: u16,
};

pub fn main() !void {
    var header = c.struct_pcap_pkthdr{ .ts = undefined, .caplen = undefined, .len = undefined };
    var errbuf: [*]u8 = undefined;
    const pcap = c.pcap_open_offline("onefix.pcap", errbuf);
    defer c.pcap_close(pcap);
    const packet = c.pcap_next(pcap, &header);
    const tcp_src_dst = @bitCast(tcp_header_partial, packet[34..38].*); // <--
    std.debug.print("src: {d}, dst: {d}\n", .{ @byteSwap(tcp_src_dst.src_port), @byteSwap(tcp_src_dst.dst_port) });
}
Instead of casting into a u16 like in our previous example, we are casting into a packed struct containing two u16s. The “packed” keyword guarantees the struct’s memory layout. Try removing it and see what the compiler outputs. Notice that we also added defer c.pcap_close(pcap);. This closes the capture handle when the function returns.
Let’s run our new program:
$ zig build-exe -lpcap -lc main.zig --cache-dir zig-cache
$ ./main
src: 11001, dst: 53867
Those values match what tshark gave us - we did it!
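One caveat with the hard-coded offset of 34: it only holds when the IP header is exactly 20 bytes long. As a rough sketch of how the offset could be computed instead, the following reads the IPv4 header length from the low 4 bits of the first IP byte, which counts 32-bit words. It assumes an untagged Ethernet frame carrying IPv4 and runs on a synthetic buffer rather than a real packet. The tcpHeaderOffset helper is a hypothetical name, not something libpcap provides.

```zig
const std = @import("std");

const ethernet_header_len: usize = 14;

// Hypothetical helper: byte offset of the TCP header within an untagged
// Ethernet frame carrying IPv4, or null if the frame is too short.
fn tcpHeaderOffset(frame: []const u8) ?usize {
    if (frame.len <= ethernet_header_len) return null;
    // Low 4 bits of the first IPv4 byte: header length in 32-bit words.
    const ihl_words: usize = frame[ethernet_header_len] & 0x0f;
    const offset = ethernet_header_len + ihl_words * 4;
    if (offset > frame.len) return null;
    return offset;
}

pub fn main() void {
    // Synthetic frame: zeroed Ethernet header followed by an IPv4 first byte
    // of 0x45 (version 4, header length 5 words), padded to 64 bytes.
    var frame = [_]u8{0} ** 64;
    frame[ethernet_header_len] = 0x45;

    if (tcpHeaderOffset(&frame)) |offset| {
        std.debug.print("TCP header starts at byte {d}\n", .{offset});
    }
}
```

For a 20-byte IP header this gives the same offset of 34 that we used above.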
The end
We looked at how to use an external C library in Zig and learned about the basics of working with libpcap. We then used Zig’s @bitCast to parse the source and destination port numbers from a packet’s TCP header without needing to allocate extra memory or call any parsing/deserialising functions. Remember that you can play around with the pointer offset to retrieve other field values from the packet!
There might be better ways to @bitCast with different endianness, but @byteSwap was the only solution that I found which worked for me. If you know of a better way please do let me know!
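One candidate worth trying is std.mem.bigToNative from the standard library, which swaps bytes only when the host is little endian. The sketch below is an assumption-laden alternative rather than a tested replacement for the code above: it assumes Zig 0.11 or newer, works on a hard-coded byte array, and casts each u16 separately instead of casting to a packed struct. As far as I understand, packed struct fields are laid out from the least significant bit of the backing integer, so the struct cast used in this article also depends on a little-endian host for its field order. The Ports, readBigU16 and parsePorts names are made up for this illustration.

```zig
const std = @import("std");

// Plain struct for the result; hypothetical name for this illustration.
const Ports = struct {
    src_port: u16,
    dst_port: u16,
};

// Reinterpret two bytes as a host-order u16, then convert from big endian
// to the host's byte order (a no-op on big-endian hosts).
fn readBigU16(bytes: [2]u8) u16 {
    const raw: u16 = @bitCast(bytes);
    return std.mem.bigToNative(u16, raw);
}

// Takes the first four bytes of a TCP header as they appear on the wire.
fn parsePorts(header: [4]u8) Ports {
    return .{
        .src_port = readBigU16(header[0..2].*),
        .dst_port = readBigU16(header[2..4].*),
    };
}

pub fn main() void {
    // The same four bytes that sit at packet[34..38] in onefix.pcap.
    const ports = parsePorts(.{ 0x2a, 0xf9, 0xd2, 0x6b });
    std.debug.print("src: {d}, dst: {d}\n", .{ ports.src_port, ports.dst_port });
}
```

This prints src: 11001, dst: 53867, the same values tshark reported.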
Taking this further
- Use pcap_loop instead of pcap_next to iterate over a pcap with multiple packets.
- Produce an output pcap after applying filtering based on the extracted struct fields.
- Performance tests for the above.
- Use Zig’s build system rather than build-exe.
1. I am using Ubuntu in WSL for this example - package names will vary between distros and operating systems.
2. Looking at the codebase for pcap.h, we see that it includes stdio.h.
3. Pcap files have a defined file format described in pcap_savefile. There is a per-file header and a per-packet header. Here we extract the per-packet header off our single pcap.