QEMU is an emulator that can be used during the development of an NVMe driver. It offers NVMe 1.4 spec-compliant controller emulation. The neat part about using QEMU is that it emulates only the controller and not an actual device, allowing the driver writer to focus solely on writing a spec-compliant driver without initially worrying about the quirks that come with real NVMe hardware. On top of that, QEMU offers tracing capabilities, which makes debugging very easy during initial development. And, last but not least, no actual NVMe device is needed, and the host machine is not affected in any way during development. That is enough marketing as to why QEMU is excellent for NVMe driver development.

vmctl will be used to set up the QEMU development environment. It makes life a bit easier by automating the creation and management of QEMU VMs, and one of the primary use cases it specifically targets is NVMe development. However, it is not necessary to use this tool to create and manage QEMU; Vagrant with libvirt is a possible alternative.

If you already have a QEMU setup for Linux development, only a few setup commands are required. Feel free to skip the VMCTL section, as I cover that at the end of the article.

VMCTL:

The official GitHub page has an excellent README which should be good enough to get started. I will reiterate certain parts of the README in a different order and add a bit more context for readers who are entirely new to this topic.

Make sure to clone the official repo and add vmctl to your PATH via a symlink, as suggested in the official README (a rough sketch of that step follows). Before we use vmctl, a boot image also needs to be created.
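Purely as an illustration of the clone-and-symlink step, not the authoritative instructions: the repo URL, the location of the vmctl script inside the repo, and ~/.local/bin being on your PATH are all placeholders/assumptions here, so defer to the README for the exact steps.

>$ git clone <vmctl-repo-url> vmctl
>$ ln -s "$(pwd)/vmctl/vmctl" ~/.local/bin/vmctl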

Here is an Ansible role to automate the steps described in this article for readers who prefer IaC.

Ubuntu boot image

Download an Ubuntu cloud image from the official site and resize it as follows:

>$ qemu-img resize ubuntu-<ver>-server-cloudimg-amd64.img 8G

Create a new folder called vms to hold all the VM-related data, and copy the Ubuntu qcow2 image into a folder named img (inside vms) as base.qcow2.

>$ mkdir -p vms/img
>$ cp ubuntu-<ver>-server-cloudimg-amd64.img vms/img/base.qcow2

cloud-init

After creating an Ubuntu-based qcow2 image, the image can be configured using the cloud-init seed script provided by vmctl. Running this sets some defaults that will be useful when we boot the system. We will also set it up to accept SSH connections from our host by providing it our SSH public key.

>$ ./vmctl/contrib/generate-cloud-config-seed.sh ~/.ssh/<your-public-key-for-qemu>.pub
>$ mv seed.img vms/img/

Using vmctl to boot the image in QEMU

The official repo provides a set of example configuration files to boot Linux with NVMe storage. One thing to note is that even though the guest OS running in QEMU sees an NVMe drive, QEMU only emulates the NVMe controller; underneath, the drive is backed by the host's storage. For more details on how QEMU emulates the NVMe controller, do check out this video by the current maintainer of the QEMU NVMe subsystem.

Copy the relevant files to the config subfolder inside the vms folder.

>$ mkdir vms/config
>$ cp vmctl/examples/vm/nvme.conf vms/config
>$ cp vmctl/examples/vm/q35-base.conf vms/config
>$ cp vmctl/examples/vm/common.conf vms/config

Add a single line, QEMU_PARAMS+=("-s"), to the nvme.conf file as follows:

_setup_nvme() {
  # setup basevm
  _setup_q35_base

  QEMU_PARAMS+=("-s")

The reason to add the -s option is that it makes QEMU listen for a gdb connection on port 1234, which makes remote kernel debugging from the host machine straightforward.
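For example, assuming the custom kernel described later in this article is built with debug info, a remote debugging session from the host could look like this (nvme_probe is just an illustrative breakpoint):

>$ gdb <path-to-linux-dir>/vmlinux
(gdb) target remote :1234
(gdb) break nvme_probe
(gdb) continue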

First, the image needs to be configured with the seed.img that was created. Run the following to do that:

>vms$ vmctl -c nvme.conf run -c

Once the configuration is complete from the previous step, the image can be booted by running the following command:

>vms$ vmctl -c nvme.conf run

Now check whether the boot worked by SSHing into the VM from another terminal:

>$ ssh -p 2222 root@localhost

Inside the VM, confirm that an NVMe drive is attached by running lsblk:

[root@archlinux ~]# lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda     254:0    0   8G  0 disk
└─vda1  254:1    0   8G  0 part /
nvme0n1 259:0    0   1G  0 disk

As we can see, Linux detects the NVMe drive and creates a block device nvme0n1.

If you encounter any issues, ensure you are inside the vms folder. If you want to run the command from a different folder, set the VMCTL_VMROOT environment variable to point to the vms directory.
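For example, with /path/to/vms standing in for wherever you created the vms directory:

>$ export VMCTL_VMROOT=/path/to/vms
>$ vmctl -c nvme.conf run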

Using a custom kernel and tracing

We can use Linus's tree to get the latest mainline kernel, and QEMU has an option to boot a custom kernel directly. To compile a custom kernel, do the following:

>$ git clone https://github.com/torvalds/linux.git 
>$ cd linux
>$ make menuconfig
>$ make -j$(nproc)
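A few kernel config options are worth double-checking in menuconfig for this workflow; treat these as suggestions rather than a required list:

CONFIG_BLK_DEV_NVME=m   # build the NVMe PCI driver as a module (useful with the shared kernel dir described below)
CONFIG_DEBUG_INFO=y     # debug symbols for gdb (newer kernels select this via a DWARF option under "Debug information")
CONFIG_GDB_SCRIPTS=y    # optional gdb helper scripts for kernel debugging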

Grab a cup of coffee while the kernel builds.

Once the build is complete, run the vmctl tool, pointing it to the kernel build directory as follows:

>vms$ vmctl -c nvme.conf run -k <path-to-linux-dir>

It is also a good idea to enable pci_nvme tracing in QEMU to help with debugging.

The final command, which does everything, is as follows:

>vms$ vmctl -c nvme.conf run -t pci_nvme -k <path-to-linux-dir>

Note that vmctl uses a systemd service to mount the kernel directory from the host, so modules compiled on the host can be used inside the guest. This is a nice feature: it avoids building all modules into a single kernel binary, which significantly reduces kernel build time when modifying a module. Also, some test suites, such as blktests, require certain drivers to be dynamically loadable modules.
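As a rough sketch of that module workflow: assuming the NVMe driver is configured as a module (CONFIG_BLK_DEV_NVME=m) and that the shared kernel tree shows up at /path/to/shared/linux inside the guest (the actual mount point depends on the vmctl systemd unit), you could rebuild just the NVMe modules on the host:

>$ make -j$(nproc) M=drivers/nvme modules

and then load them inside the guest (additional dependencies may be needed depending on the kernel config):

insmod /path/to/shared/linux/drivers/nvme/host/nvme-core.ko
insmod /path/to/shared/linux/drivers/nvme/host/nvme.ko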

To view the QEMU trace, run the following in another terminal:

>vms$ vmctl -c nvme.conf log -f

tmux can be used to run these commands in different panes of the same window.

VFIO device passthrough

VFIO passthrough can be used to access a real NVMe device from the guest OS. The vfio-pci module passes the NVMe device through to the guest, and the guest OS's NVMe driver can then talk to the device directly.

Pre and post hooks in the vmctl config can be used to bind/unbind the device before starting QEMU and after exiting it. This article explains the setup needed to do vfio-pci passthrough.

The following pre hook detaches the host's NVMe driver from the device at PCI address 0000:01:00.0 and attaches it to the vfio-pci module before starting QEMU, and the post hook in nvme.conf restores the original state after exiting QEMU:

_pre() {
    # Pre hook to run before starting QEMU

    # unbind 0000:01:00.0 from the nvme kernel module
    echo '0000:01:00.0' > /sys/bus/pci/drivers/nvme/unbind

    # bind 0000:01:00.0 to the vfio-pci kernel module
    echo '0000:01:00.0' > /sys/bus/pci/drivers/vfio-pci/bind
}

_post() {
    # Post hook to run after exiting QEMU

    # unbind 0000:01:00.0 from the vfio-pci kernel module
    echo '0000:01:00.0' > /sys/bus/pci/drivers/vfio-pci/unbind

    # bind 0000:01:00.0 back to the nvme kernel module
    echo '0000:01:00.0' > /sys/bus/pci/drivers/nvme/bind
}
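One general caveat, not specific to vmctl: writing to the vfio-pci bind file only succeeds if vfio-pci already knows it should claim that device (the passthrough article linked above covers this setup). If the bind in the pre hook fails, the kernel's driver_override mechanism is a common alternative:

echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
echo '0000:01:00.0' > /sys/bus/pci/drivers_probe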

Without using VMCTL

As mentioned before, the vmctl tool makes managing QEMU for NVMe development a bit easier, but it is an optional tool.

If you already have a workflow with QEMU, then it can be easily extended.

To create a QEMU instance with an NVMe device, add the following lines¹ when running your QEMU command:

-drive file=nvm.img,if=none,id=nvm
-device nvme,serial=deadbeef,drive=nvm

This requires a raw image, nvm.img, which can easily be created as follows:

>$ qemu-img create nvm.img 1G

Apart from the NVMe-related additions to the QEMU command, make sure to use the -kernel option to point to the custom kernel and the -s option to enable gdb remote debugging.
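Putting it all together, a minimal invocation could look roughly like the following sketch; the machine type, memory size, image paths, and the root= kernel argument are assumptions that need to be adapted to your own setup:

>$ qemu-system-x86_64 \
    -machine q35,accel=kvm -m 4G \
    -kernel <path-to-linux-dir>/arch/x86/boot/bzImage \
    -append "root=/dev/vda1 console=ttyS0" \
    -drive file=base.qcow2,if=virtio,format=qcow2 \
    -drive file=nvm.img,if=none,id=nvm \
    -device nvme,serial=deadbeef,drive=nvm \
    -s -nographic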

Conclusion

This article showed how to set up a QEMU-based environment for NVMe driver development. I have also added an Ansible role to automate the vmctl setup; the role also builds and installs upstream QEMU for vmctl.

I hope you enjoyed the article. Happy Hacking!

1 Taken from the official QEMU documentation here

2 NVMe VFIO passthrough with QEMU link