Data Plane Development Kit
DPDK is a set of libraries and drivers for fast packet processing that runs mostly in Linux userland. Its libraries provide the so-called “Environment Abstraction Layer” (EAL), which hides the details of the environment and provides a standard programming interface. Common use cases are specialized solutions such as network function virtualization and advanced high-throughput network switching. DPDK uses a run-to-completion model for fast data plane performance and accesses devices via polling, which eliminates the latency of interrupt processing at the cost of higher CPU consumption. It was designed to run on any processor. The first supported CPU was Intel x86, and support has since been extended to IBM PPC64 and ARM64.
Ubuntu provides some additional infrastructure to ease DPDK's usability.
Prerequisites
This package is currently compiled for the lowest possible CPU requirements. It still requires the CPU to support at least SSE3 and every feature activated by -march=corei7 (in gcc).
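The requirement above can be checked against the flags the kernel reports in /proc/cpuinfo. The following is a minimal sketch, not an official check: the helper name missing_cpu_flags is hypothetical, and the flag list is an assumption based on the feature set gcc documents for -march=corei7 (SSE3 appears as "pni" in /proc/cpuinfo).

```shell
#!/bin/sh
# Report which of the checked CPU flags are missing from a flags string.
# Assumption: the flag list mirrors what gcc documents for -march=corei7;
# names follow /proc/cpuinfo, where SSE3 is listed as "pni".
missing_cpu_flags() {
    flags=" $1 "
    missing=""
    for f in pni ssse3 sse4_1 sse4_2 popcnt; do
        case "$flags" in
            *" $f "*) ;;                      # flag is present
            *) missing="$missing $f" ;;       # remember the missing flag
        esac
    done
    # Print the missing flags without the leading space (empty if none).
    echo "${missing# }"
}

# Use the flags of the first CPU listed in /proc/cpuinfo, if available.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    result=$(missing_cpu_flags "$cpu_flags")
    if [ -z "$result" ]; then
        echo "CPU provides all checked flags"
    else
        echo "CPU is missing: $result"
    fi
fi
```

The same check is useful inside a KVM guest, where the visible flags depend on the configured CPU model.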
The list of network cards that upstream DPDK supports can be found at supported NICs. However, many of those are disabled by default in the upstream project because they are not yet in a stable state. The subset of network cards that DPDK has enabled in the package as available in Ubuntu 16.04 is:
DPDK has “userspace” drivers for the cards, called PMDs (poll mode drivers).
The packages for these follow the pattern librte-pmd-<type>-<version>. For example, the package for an Intel e1000 in 18.11 is librte-pmd-e1000-18.11.
The more commonly used, tested and fully supported drivers are installed as dependencies of dpdk, but there are many more in universe that follow the same naming pattern.
Unassigning the default Kernel drivers
Cards have to be unassigned from their kernel driver and instead be assigned to uio_pci_generic or vfio-pci. uio_pci_generic is older and usually easier to get working, but it also has fewer features and less isolation.
The newer vfio-pci requires that you activate the following kernel parameters to enable the IOMMU:
iommu=pt intel_iommu=on
Or on AMD:
amd_iommu=pt
In addition, for vfio-pci you have to configure and assign the IOMMU groups accordingly. This is mostly determined by the firmware and the hardware layout; you can check the group assignment that the kernel probed in /sys/kernel/iommu_groups/.
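To inspect the probed groups, the directory can be walked with a short loop. This is a minimal sketch; the function name list_iommu_groups and its optional directory argument are hypothetical and exist only to make the example easy to try. It assumes the usual sysfs layout of <group>/devices/<pci-address>.

```shell
#!/bin/sh
# Print one "group <n>: <pci-address>" line per device found below the
# given IOMMU groups directory (defaults to the kernel's sysfs location).
list_iommu_groups() {
    base="${1:-/sys/kernel/iommu_groups}"
    for dev in "$base"/*/devices/*; do
        # Skip the unexpanded pattern when no groups exist.
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"
        echo "group ${group##*/}: ${dev##*/}"
    done
}

list_iommu_groups
```

If this prints nothing, the IOMMU is most likely not enabled. Devices that share a group cannot be isolated from each other.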
Note: virtio is special; DPDK can work on those devices directly, without vfio-pci or uio_pci_generic. However, to avoid issues caused by the kernel and DPDK both managing the device, you still have to unassign the kernel driver.
Manual configuration and status checks can be done via sysfs or with the tool dpdk-devbind.py (called dpdk_nic_bind.py in older releases):
dpdk-devbind.py --help
Usage:
dpdk-devbind.py [options] DEVICE1 DEVICE2 ....
where DEVICE1, DEVICE2 etc, are specified via PCI "domain:bus:slot.func" syntax
or "bus:slot.func" syntax. For devices bound to Linux kernel drivers, they may
also be referred to by Linux interface name e.g. eth0, eth1, em0, em1, etc.
Options:
--help, --usage:
Display usage information and quit
-s, --status:
Print the current status of all known network, crypto, event
and mempool devices.
For each device, it displays the PCI domain, bus, slot and function,
along with a text description of the device. Depending upon whether the
device is being used by a kernel driver, the igb_uio driver, or no
driver, other relevant information will be displayed:
* the Linux interface name e.g. if=eth0
* the driver being used e.g. drv=igb_uio
* any suitable drivers not currently using that device
e.g. unused=igb_uio
NOTE: if this flag is passed along with a bind/unbind option, the
status display will always occur after the other operations have taken
place.
--status-dev:
Print the status of given device group. Supported device groups are:
"net", "crypto", "event", "mempool" and "compress"
-b driver, --bind=driver:
Select the driver to use or "none" to unbind the device
-u, --unbind:
Unbind a device (Equivalent to "-b none")
--force:
By default, network devices which are used by Linux - as indicated by
having routes in the routing table - cannot be modified. Using the
--force flag overrides this behavior, allowing active links to be
forcibly unbound.
WARNING: This can lead to loss of network connection and should be used
with caution.
Examples:
---------
To display current device status:
dpdk-devbind.py --status
To display current network device status:
dpdk-devbind.py --status-dev net
To bind eth1 from the current driver and move to use igb_uio
dpdk-devbind.py --bind=igb_uio eth1
To unbind 0000:01:00.0 from using any driver
dpdk-devbind.py -u 0000:01:00.0
To bind 0000:02:00.0 and 0000:02:00.1 to the ixgbe kernel driver
dpdk-devbind.py -b ixgbe 02:00.0 02:00.1
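For the manual sysfs route mentioned above, the steps are roughly: unbind the device from its current driver, set a driver override, and ask the kernel to re-probe the device. The following is a minimal sketch under stated assumptions: the function name rebind_pci_device and the SYSFS_ROOT variable are hypothetical, the kernel must provide the driver_override attribute, and the target module (for example vfio-pci) must already be loaded.

```shell
#!/bin/sh
# Rebind a PCI device to another driver through sysfs.
# SYSFS_ROOT is a hypothetical knob of this sketch so that the function can
# be tried against a scratch directory; on a real system it stays /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

rebind_pci_device() {
    dev="$1"      # full PCI address, e.g. 0000:04:00.1
    driver="$2"   # target driver, e.g. vfio-pci
    dev_dir="$SYSFS_ROOT/bus/pci/devices/$dev"

    [ -d "$dev_dir" ] || { echo "no such device: $dev" >&2; return 1; }

    # Detach the device from its current driver, if it has one.
    if [ -e "$dev_dir/driver" ]; then
        echo "$dev" > "$dev_dir/driver/unbind"
    fi
    # Tell the kernel which driver may claim the device, then re-probe it.
    echo "$driver" > "$dev_dir/driver_override"
    echo "$dev" > "$SYSFS_ROOT/bus/pci/drivers_probe"
}

# Example call (needs root and a loaded target module), left commented out:
# rebind_pci_device 0000:04:00.1 vfio-pci
```

Unlike the tool, this sketch does not check whether the interface is in active use, so unbinding the card that carries your SSH session will cut the connection.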
DPDK Device configuration
The dpdk package provides init scripts that ease the configuration of device assignment and huge pages. They also make the configuration persistent across reboots.
The following is an example of the file /etc/dpdk/interfaces configuring two ports of a network card, one with uio_pci_generic and the other with vfio-pci.
# <bus> Currently only "pci" is supported
# <id> Device ID on the specified bus
# <driver> Driver to bind against (vfio-pci or uio_pci_generic)
#
# Be aware that the two DPDK compatible drivers uio_pci_generic and vfio-pci are
# part of linux-image-extra-<VERSION> package.
# This package is not always installed by default - for example in cloud-images.
# So please install it in case you run into missing module issues.
#
# <bus> <id> <driver>
pci 0000:04:00.0 uio_pci_generic
pci 0000:04:00.1 vfio-pci
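The format is simple enough to read back with a few lines of shell, which can help when reviewing what the init scripts will bind. This is only an illustrative sketch of the three-column layout shown above, not the parser the package uses; the function name list_dpdk_interfaces is hypothetical.

```shell
#!/bin/sh
# Read an interfaces-style file and print one line per configured device.
# Comment lines (starting with "#") and blank lines are skipped.
list_dpdk_interfaces() {
    while read -r bus id driver; do
        case "$bus" in
            ''|'#'*) continue ;;
        esac
        echo "bus=$bus id=$id driver=$driver"
    done < "$1"
}

conf=/etc/dpdk/interfaces
if [ -r "$conf" ]; then
    list_dpdk_interfaces "$conf"
fi
```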
Cards are identified by their PCI-ID. If you are unsure, you can use the tool dpdk-devbind.py to show the currently available devices and the drivers they are assigned to.
dpdk-devbind.py --status
Network devices using DPDK-compatible driver
============================================
0000:04:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=uio_pci_generic unused=ixgbe
Network devices using kernel driver
===================================
0000:02:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eth0 drv=tg3 unused=uio_pci_generic *Active*
0000:02:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eth1 drv=tg3 unused=uio_pci_generic
0000:02:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eth2 drv=tg3 unused=uio_pci_generic
0000:02:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eth3 drv=tg3 unused=uio_pci_generic
0000:04:00.1 'Ethernet Controller 10-Gigabit X540-AT2' if=eth5 drv=ixgbe unused=uio_pci_generic
Other network devices
=====================
<none>
DPDK HugePage configuration
DPDK makes heavy use of huge pages to eliminate pressure on the TLB. Therefore hugepages have to be configured in your system.
The dpdk package has a config file, /etc/dpdk/dpdk.conf, and scripts that try to ease hugepage configuration for DPDK. If you have more consumers of hugepages than just DPDK in your system, or very special requirements for how your hugepages are set up, you likely want to allocate and control them yourself. If not, this can be a great simplification to get DPDK configured for your needs.
Here is an example that configures 1024 hugepages of 2M each and four 1G pages:
NR_2M_PAGES=1024
NR_1G_PAGES=4
As shown this supports configuring 2M and the larger 1G hugepages (or a mix of both). It will make sure there are proper hugetlbfs mountpoints for DPDK to find both sizes no matter what your default huge page size is. The config file itself holds more details on certain corner cases and a few hints if you want to allocate hugepages manually via a kernel parameter.
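Before settling on values, it helps to work out how much memory a given setting takes away from the rest of the system. The following is a minimal sketch that sums the two settings shown above; the function name hugepage_mib is hypothetical, and the script only reads uncommented NR_2M_PAGES and NR_1G_PAGES lines.

```shell
#!/bin/sh
# Sum up the memory (in MiB) that the NR_2M_PAGES and NR_1G_PAGES settings
# of a dpdk.conf-style file would reserve.
hugepage_mib() {
    nr_2m=$(sed -n 's/^NR_2M_PAGES=\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    nr_1g=$(sed -n 's/^NR_1G_PAGES=\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    # Each 2M page counts as 2 MiB, each 1G page as 1024 MiB.
    echo $(( ${nr_2m:-0} * 2 + ${nr_1g:-0} * 1024 ))
}

conf=/etc/dpdk/dpdk.conf
if [ -r "$conf" ]; then
    echo "hugepages configured for DPDK: $(hugepage_mib "$conf") MiB"
fi
```

For the example values this is 1024 × 2 MiB + 4 × 1024 MiB = 6144 MiB, all of which is unavailable to normal allocations once reserved.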
It depends on your needs which size you want - 1G pages are certainly more effective regarding TLB pressure. But there were reports of them fragmenting inside the DPDK memory allocations. Also it can be harder to grab enough free space to set up a certain amount of 1G pages later in the life-cycle of a system.
Compile DPDK Applications
Currently there are not many consumers of the DPDK library that are stable and released. OpenVswitch-DPDK is an exception (see below), and more are appearing. In general, though, you may still want to compile an application against the library.
You will often find guides that tell you to fetch the DPDK sources, build them to your needs and eventually build your application based on DPDK by setting RTE_* values for the build system. Since Ubuntu provides an already compiled DPDK for you, you can skip all that.
DPDK provides a valid pkg-config file to simplify setting the proper variables and options.
sudo apt-get install dpdk-dev libdpdk-dev
gcc testdpdkprog.c $(pkg-config --libs --cflags libdpdk) -o testdpdkprog
An example of a complex (autoconfigure) user of the DPDK pkg-config file, including fallbacks to the older non-pkg-config style, can be seen in the OpenVswitch build system.
Depending on what you build, it might be useful to install all of DPDK's build dependencies before running make. On Ubuntu this can be done automatically with:
sudo apt-get build-dep dpdk
DPDK in KVM Guests
If you have no access to DPDK supported network cards you can still work with DPDK by using its support for virtio. To do so you have to create guests backed by hugepages (see above).
On top of that, at least SSE3 is required. The default CPU model that qemu/libvirt uses only goes up to SSE2, so you have to define a model that passes the proper feature flags (or use host-passthrough).
An example is the following snippet for your virsh XML (or the equivalent virsh interface you use).
<cpu mode='host-passthrough'/>
virtio also supports multiqueue nowadays, which DPDK in turn can exploit for better speed. To give a normal virtio NIC multiple queues, which can later be consumed by e.g. DPDK in the guest, add the following to your interface definition.
<driver name="vhost" queues="4"/>
Use DPDK
Since DPDK on its own is only a (massive) library, you will most likely want to continue with OpenVswitch-DPDK as an example of putting it to use.