Original: https://people.gnome.org/~markmc/qcow-image-format.html

The QCOW2 Image Format

by Mark McLoughlin

Note: this documentation is for qcow version 2. It is out of date and has been replaced by the official documentation in qemu.git, https://git.qemu.org/?p=qemu.git;a=blob;f=docs/interop/qcow2.txt and https://git.qemu.org/?p=qemu.git;a=blob;f=docs/qcow2-cache.txt, which describe 32 additional header bytes (at offsets 72-103) for qcow version 3, plus optional sections called “header extensions.” Neither is described in this version 2 documentation.
 

Command Examples

# Create 90 GB virtual machine image file:
qemu-img create -f qcow2 vm1.qcow2 90G
# Install Linux (SL 7 on KVM):
virt-install --network bridge:br0 --name vm1 --ram=1024 --vcpus=2 --disk path=vm1.qcow2,format=qcow2 --location=SL-7-x86_64-DVD.iso --extra-args="console=tty0 console=ttyS0,115200"
# Note: use qemu-img, not virt-install, to create the image file, lest you get an obsolete version

The QCOW image format is one of the disk image formats supported by the QEMU processor emulator. It is a representation of a fixed size block device in a file. Benefits it offers over using a raw dump representation include:

  1. Smaller file size, even on filesystems which don't support holes (i.e. sparse files)
  2. Copy-on-write support, where the image only represents changes made to an underlying disk image
  3. Snapshot support, where the image can contain multiple snapshots of the image's history
  4. Optional zlib based compression
  5. Optional AES encryption

The qemu-img command is the most common way of manipulating these images, e.g.

  $> qemu-img create -f qcow2 test.qcow2 4G
  Formating 'test.qcow2', fmt=qcow2, size=4194304 kB
  $> qemu-img convert test.qcow2 -O raw test.img

The Header

Each QCOW2 file begins with a header, in big endian format, as follows:

  typedef struct QCowHeader {
      uint32_t magic;
      uint32_t version;

      uint64_t backing_file_offset;
      uint32_t backing_file_size;

      uint32_t cluster_bits;
      uint64_t size; /* in bytes */
      uint32_t crypt_method;

      uint32_t l1_size;
      uint64_t l1_table_offset;

      uint64_t refcount_table_offset;
      uint32_t refcount_table_clusters;

      uint32_t nb_snapshots;
      uint64_t snapshots_offset;
  } QCowHeader;
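
As a rough illustration of reading these fields, the sketch below decodes the header from a byte buffer one field at a time, so that it does not depend on the host's byte order. It assumes the fields are stored back to back in the order declared above, which gives a 72 byte header. The names ParsedQCowHeader, qcow_be32, qcow_be64 and parse_qcow2_header are hypothetical, and the magic and version values are not validated here.

```c
#include <stddef.h>
#include <stdint.h>

#define QCOW2_V2_HEADER_LEN 72u  /* assumed: fields packed in declared order */

/* Host byte order copy of the header fields (hypothetical type name). */
typedef struct ParsedQCowHeader {
    uint32_t magic, version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size, cluster_bits;
    uint64_t size;
    uint32_t crypt_method, l1_size;
    uint64_t l1_table_offset, refcount_table_offset;
    uint32_t refcount_table_clusters, nb_snapshots;
    uint64_t snapshots_offset;
} ParsedQCowHeader;

/* Read a big endian 32 bit value from p. */
uint32_t qcow_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a big endian 64 bit value from p. */
uint64_t qcow_be64(const uint8_t *p)
{
    return ((uint64_t)qcow_be32(p) << 32) | qcow_be32(p + 4);
}

/* Returns 0 on success, -1 if buf is shorter than the header. */
int parse_qcow2_header(const uint8_t *buf, size_t len, ParsedQCowHeader *h)
{
    if (len < QCOW2_V2_HEADER_LEN)
        return -1;
    h->magic                   = qcow_be32(buf + 0);
    h->version                 = qcow_be32(buf + 4);
    h->backing_file_offset     = qcow_be64(buf + 8);
    h->backing_file_size       = qcow_be32(buf + 16);
    h->cluster_bits            = qcow_be32(buf + 20);
    h->size                    = qcow_be64(buf + 24);
    h->crypt_method            = qcow_be32(buf + 32);
    h->l1_size                 = qcow_be32(buf + 36);
    h->l1_table_offset         = qcow_be64(buf + 40);
    h->refcount_table_offset   = qcow_be64(buf + 48);
    h->refcount_table_clusters = qcow_be32(buf + 56);
    h->nb_snapshots            = qcow_be32(buf + 60);
    h->snapshots_offset        = qcow_be64(buf + 64);
    return 0;
}
```

A caller would read the first 72 bytes of the image file into a buffer and pass it to parse_qcow2_header before using any of the offsets.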

Typically the image file will be laid out as follows (the layout diagram from the original page is not reproduced here).

2-Level Lookups

With QCOW, the contents of the device are stored in clusters. Each cluster contains a number of 512 byte sectors.

In order to find the cluster for a given address within the device, you must traverse two levels of tables. The L1 table is an array of file offsets to L2 tables, and each L2 table is an array of file offsets to clusters.

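So, an address is split into three separate offsets according to the cluster_bits field. For example, if cluster_bits is 12, then l2_bits is 9 (an L2 table is a single cluster of 8 byte entries, so l2_bits = cluster_bits - 3) and the address is split up as follows: the top 43 bits index the L1 table, the next 9 bits index the L2 table, and the low 12 bits are the offset within the cluster.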

Note, the minimum size of the L1 table is a function of the size of the represented disk image:

  l1_size = round_up(disk_size / (cluster_size * l2_size), cluster_size)
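
As a worked example of the division in that formula, the sketch below counts how many L1 entries are needed to cover a disk of a given size. It assumes that l2_size means the number of 8 byte entries in an L2 table (one cluster's worth) and it rounds the quotient up to a whole entry; the trailing round-up to cluster_size in the formula above is not applied. The function name min_l1_entries is hypothetical.

```c
#include <stdint.h>

/* Number of L1 entries needed so that the L2 tables they point to
 * cover disk_size bytes (hypothetical helper). */
uint64_t min_l1_entries(uint64_t disk_size, unsigned cluster_bits)
{
    /* An L2 table is one cluster of 8 byte entries. */
    unsigned l2_bits = cluster_bits - 3;
    /* log2 of the number of bytes mapped by a single L2 table. */
    unsigned shift = cluster_bits + l2_bits;
    /* Ceiling division by 2^shift. */
    return (disk_size + ((uint64_t)1 << shift) - 1) >> shift;
}
```

For a 4 GB image with cluster_bits of 12, each L2 table maps 2 MB, so 2048 L1 entries are needed.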

In other words, in order to map a given disk address to an offset within the image:

  1. Obtain the L1 table address using the l1_table_offset header field
  2. Use the top (64 - l2_bits - cluster_bits) bits of the address to index the L1 table as an array of 64 bit entries
  3. Obtain the L2 table address using the offset in the L1 table
  4. Use the next l2_bits of the address to index the L2 table as an array of 64 bit entries
  5. Obtain the cluster address using the offset in the L2 table
  6. Use the remaining cluster_bits of the address as an offset within the cluster itself

If the offset found in either the L1 or L2 table is zero, that area of the disk is not allocated within the image.

Note also that the top two bits of each of the offsets found in the L1 and L2 tables are reserved for "copied" and "compressed" flags. More on that below.
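
The six steps above can be sketched as follows. This is only an illustration under several assumptions: the whole image file is already in memory, table entries are stored big endian like the header, l2_bits equals cluster_bits - 3, no bounds are checked against l1_size, and compressed clusters are not handled. The names map_address, qcow_load_be64 and the QCOW_OFLAG_* macros are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define QCOW_OFLAG_COPIED     (1ULL << 63)  /* most significant bit */
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)  /* second most significant bit */
#define QCOW_OFFSET_MASK      (~(QCOW_OFLAG_COPIED | QCOW_OFLAG_COMPRESSED))

/* Read a big endian 64 bit table entry. */
uint64_t qcow_load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Map a device address to an offset within the image.
 * Returns 0 if that area is not allocated in the image. */
uint64_t map_address(const uint8_t *image, uint64_t l1_table_offset,
                     unsigned cluster_bits, uint64_t addr)
{
    unsigned l2_bits = cluster_bits - 3;
    uint64_t l1_index   = addr >> (l2_bits + cluster_bits);
    uint64_t l2_index   = (addr >> cluster_bits) & (((uint64_t)1 << l2_bits) - 1);
    uint64_t in_cluster = addr & (((uint64_t)1 << cluster_bits) - 1);

    /* Steps 1-3: index the L1 table to find the L2 table. */
    uint64_t l2_table =
        qcow_load_be64(image + l1_table_offset + 8 * l1_index) & QCOW_OFFSET_MASK;
    if (l2_table == 0)
        return 0;

    /* Steps 4-5: index the L2 table to find the cluster. */
    uint64_t cluster =
        qcow_load_be64(image + l2_table + 8 * l2_index) & QCOW_OFFSET_MASK;
    if (cluster == 0)
        return 0;

    /* Step 6: add the offset within the cluster. */
    return cluster + in_cluster;
}
```

A real reader would seek and read the two table entries from the file rather than index a buffer, and would validate l1_index against the l1_size header field first.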

Reference Counting

Each cluster is reference counted, allowing clusters to be freed if, and only if, they are no longer used by any snapshots.

The 2 byte reference count for each cluster is kept in cluster sized blocks. A table, given by refcount_table_offset and occupying refcount_table_clusters clusters, gives the offset in the image of each of these refcount blocks.

In order to obtain the reference count of a given cluster, you split the cluster's index (its offset within the image divided by the cluster size) into a refcount table offset and refcount block offset. Since a refcount block is a single cluster of 2 byte entries, it holds cluster_size / 2 entries, so the lower cluster_bits - 1 bits are used as the block offset and the rest of the bits are used as the table offset.
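
A minimal sketch of that split is shown below. It assumes the value being split is the cluster's index, i.e. the cluster offset shifted right by cluster_bits, and that a refcount block holds cluster_size / 2 entries, so the block index is cluster_bits - 1 bits wide. The type RefcountIndex and the function refcount_index are hypothetical names.

```c
#include <stdint.h>

/* Position of a cluster's 2 byte refcount (hypothetical type). */
typedef struct RefcountIndex {
    uint64_t table_index;  /* which entry of the refcount table */
    uint64_t block_index;  /* which 2 byte entry within that refcount block */
} RefcountIndex;

RefcountIndex refcount_index(uint64_t cluster_offset, unsigned cluster_bits)
{
    uint64_t cluster_index = cluster_offset >> cluster_bits;
    /* One cluster of 2 byte entries holds 2^(cluster_bits - 1) refcounts. */
    unsigned block_bits = cluster_bits - 1;
    RefcountIndex r;
    r.block_index = cluster_index & (((uint64_t)1 << block_bits) - 1);
    r.table_index = cluster_index >> block_bits;
    return r;
}
```

With cluster_bits of 12, each refcount block covers 2048 clusters, so cluster number 5000 lands in table entry 2 at block entry 904.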

One optimization is that if any cluster pointed to by an L1 or L2 table entry has a refcount exactly equal to one, the most significant bit of the L1/L2 entry is set as a "copied" flag. This indicates that no snapshots are using this cluster and it can be immediately written to without having to make a copy for any snapshots referencing it.

Copy-on-Write Images

# create new c-o-w image from base ("backing") image:
qemu-img create -b base.qcow2 -f qcow2 new.qcow2

A QCOW image can be used to store the changes to another disk image, without actually affecting the contents of the original image. The image, known as a copy-on-write image, looks like a standalone image to the user but most of its data is obtained from the original image. Only the clusters which differ from the original image are stored in the copy-on-write image file itself.

The representation is very simple. The copy-on-write image contains the path to the original disk image, and the image header gives the location of the path string within the file.

When you want to read a cluster from the copy-on-write image, you first check to see if that area is allocated within the copy-on-write image. If not, you read the area from the original disk image.
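
The read path just described can be sketched as below. The first function copies the path string out of an in-memory image using the backing_file_offset and backing_file_size header fields; it assumes the string is not NUL terminated in the file. The second decides where a read should come from, given the result of the two-level lookup (0 meaning unallocated). The names read_backing_path, choose_read_source and ReadSource are hypothetical, and the READ_UNALLOCATED case for images without a backing file is an addition for completeness, not something this document specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    READ_FROM_IMAGE,    /* area is allocated in the copy-on-write image */
    READ_FROM_BACKING,  /* not allocated: read from the original image */
    READ_UNALLOCATED    /* not allocated and no backing image (assumed case) */
} ReadSource;

/* Copy the backing file path into out as a NUL terminated string.
 * Returns 0 on success, -1 if the range or the output buffer is too small. */
int read_backing_path(const uint8_t *image, size_t image_len,
                      uint64_t backing_file_offset, uint32_t backing_file_size,
                      char *out, size_t out_len)
{
    if (backing_file_offset > image_len ||
        backing_file_size > image_len - backing_file_offset)
        return -1;
    if ((size_t)backing_file_size + 1 > out_len)
        return -1;
    memcpy(out, image + backing_file_offset, backing_file_size);
    out[backing_file_size] = '\0';
    return 0;
}

/* mapped_offset is the result of the L1/L2 lookup, 0 if unallocated. */
ReadSource choose_read_source(uint64_t mapped_offset, int has_backing_file)
{
    if (mapped_offset != 0)
        return READ_FROM_IMAGE;
    return has_backing_file ? READ_FROM_BACKING : READ_UNALLOCATED;
}
```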

Snapshots

# create a snapshot:
qemu-img snapshot -c snapshotname filename.qcow2
# list snapshots:
qemu-img snapshot -l filename.qcow2
# apply (revert to) a snapshot:
qemu-img snapshot -a snapshotname filename.qcow2
# delete a snapshot:
qemu-img snapshot -d snapshotname filename.qcow2

Snapshots are a similar notion to the copy-on-write feature, except it is the original image that is writable, not the snapshots.

To explain further, a copy-on-write image could confusingly be called a "snapshot," since it does indeed represent a snapshot of the original image's state. You can make multiple "snapshots" of the original image by creating multiple copy-on-write images, each referring to the same original image. What's noteworthy here, though, is that the original image must be considered read-only and it is the copy-on-write snapshots which are writable.

Snapshots – "real snapshots" – are represented in the original image itself. Each snapshot is a read-only record of the image at a past instant. The original image remains writable and as modifications are made to it, a copy of the original data is made for any snapshots referring to it.

Each snapshot is described by a header:

  typedef struct QCowSnapshotHeader {
      /* header is 8 byte aligned */
      uint64_t l1_table_offset;

      uint32_t l1_size;
      uint16_t id_str_size;
      uint16_t name_size;

      uint32_t date_sec;
      uint32_t date_nsec;

      uint64_t vm_clock_nsec;

      uint32_t vm_state_size;
      uint32_t extra_data_size; /* for extension */
      /* extra data follows */
      /* id_str follows */
      /* name follows  */
  } QCowSnapshotHeader;

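Details are as follows: the fixed-size fields above are followed in the file by extra_data_size bytes of extra data, then the id_str_size byte ID string, then the name_size byte name.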
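
The on-disk length of one snapshot record can be estimated with the sketch below. It assumes the fixed fields are stored back to back in declared order (40 bytes), that the variable-length data follows in the order given by the struct comments, and that "8 byte aligned" means the next header starts at the next multiple of 8. The name snapshot_record_len is hypothetical.

```c
#include <stdint.h>

#define QCOW_SNAPSHOT_FIXED_LEN 40u  /* assumed: packed fixed-size fields */

/* Bytes from the start of one snapshot header to the start of the next. */
uint64_t snapshot_record_len(uint32_t extra_data_size,
                             uint16_t id_str_size, uint16_t name_size)
{
    uint64_t len = (uint64_t)QCOW_SNAPSHOT_FIXED_LEN +
                   extra_data_size + id_str_size + name_size;
    /* Round up to the next multiple of 8. */
    return (len + 7) & ~(uint64_t)7;
}
```

Walking the nb_snapshots headers would then start at snapshots_offset and advance by this length after each one.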

A snapshot is created by adding one of these headers, making a copy of the L1 table and incrementing the reference counts of all L2 tables and data clusters referenced by the L1 table. Later, if any L2 table or data clusters of the underlying image are to be modified – i.e. if the reference count of the cluster is greater than 1 and/or the "copied" flag is not set for that cluster – they will first be copied and then written to. That way, all snapshots remain unmodified.
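
The reference count bookkeeping can be illustrated with a toy in-memory model. In the sketch below, entries is one L1 or L2 table already converted to host byte order and refcounts is indexed by cluster number; both are simplifications, as are the hypothetical names snapshot_add_refs and must_copy_before_write. The model follows the rule from the Reference Counting section that the "copied" flag is set only while the refcount is exactly one.

```c
#include <stddef.h>
#include <stdint.h>

#define QCOW_OFLAG_COPIED (1ULL << 63)
#define QCOW_FLAGS_MASK   (3ULL << 62)  /* "copied" and "compressed" bits */

/* Taking a snapshot: every cluster referenced by the table gains a
 * reference, so none of them may keep its "copied" flag. */
void snapshot_add_refs(uint64_t *entries, size_t n,
                       uint16_t *refcounts, unsigned cluster_bits)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t offset = entries[i] & ~QCOW_FLAGS_MASK;
        if (offset == 0)
            continue;  /* unallocated entry */
        refcounts[offset >> cluster_bits]++;
        entries[i] &= ~QCOW_OFLAG_COPIED;
    }
}

/* For an allocated entry: nonzero if the cluster may be shared with a
 * snapshot and must be copied before it is written to. */
int must_copy_before_write(uint64_t entry)
{
    return (entry & QCOW_OFLAG_COPIED) == 0;
}
```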

Compression

The QCOW format supports compression by allowing each cluster to be independently compressed with zlib.

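This is represented in the cluster offset obtained from the L2 table: the second most significant bit is the "compressed" flag, and for a compressed cluster the entry encodes the size of the compressed data, in 512 byte sectors, alongside its address within the image. (The bit-level diagram from the original page is not reproduced here.)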
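
A hedged sketch of decoding such an L2 entry follows. The field widths are not given in this text; the sketch takes them from QEMU's qcow2 specification, where the size field occupies the cluster_bits - 8 bits directly below the "compressed" flag and the remaining low bits hold the address. The size field is returned as stored, without interpreting how the sectors are counted. The names L2Cluster and decode_l2_entry are hypothetical.

```c
#include <stdint.h>

#define QCOW_OFLAG_COMPRESSED (1ULL << 62)

typedef struct L2Cluster {
    int      compressed;  /* nonzero if the "compressed" flag is set */
    uint64_t offset;      /* address of the cluster data within the image */
    uint64_t size_field;  /* raw size field in 512 byte sectors, 0 if uncompressed */
} L2Cluster;

L2Cluster decode_l2_entry(uint64_t entry, unsigned cluster_bits)
{
    L2Cluster c;
    c.compressed = (entry & QCOW_OFLAG_COMPRESSED) != 0;
    if (c.compressed) {
        unsigned size_bits = cluster_bits - 8;  /* width assumed from the QEMU spec */
        unsigned shift = 62 - size_bits;        /* size field sits just below bit 62 */
        c.size_field = (entry >> shift) & (((uint64_t)1 << size_bits) - 1);
        c.offset = entry & (((uint64_t)1 << shift) - 1);
    } else {
        c.size_field = 0;
        c.offset = entry & ~(3ULL << 62);       /* strip the two flag bits */
    }
    return c;
}
```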

Encryption

The QCOW format also supports the encryption of clusters.

If the crypt_method header field is 1, then a 16 character password is used as the 128 bit AES key.

Each sector within each cluster is independently encrypted using AES Cipher Block Chaining mode, using the sector's offset (relative to the start of the device) in little-endian format as the first 64 bits of the 128 bit initialisation vector.
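
The key and initialisation vector described above can be built as in the sketch below; the AES-CBC operation itself is left to a crypto library. Two details are assumptions rather than statements from this text: passwords shorter than 16 characters are zero padded, and the second 64 bits of the initialisation vector are zero. The sketch also assumes the sector's offset is counted in 512 byte sectors. The names qcow_aes_key_from_password and qcow_sector_iv are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Use up to 16 password characters as the 128 bit AES key. */
void qcow_aes_key_from_password(const char *password, uint8_t key[16])
{
    size_t len = strlen(password);
    if (len > 16)
        len = 16;
    memset(key, 0, 16);          /* assumed: zero padding for short passwords */
    memcpy(key, password, len);
}

/* Build the 128 bit IV for one sector. */
void qcow_sector_iv(uint64_t sector_num, uint8_t iv[16])
{
    /* First 64 bits: sector number, little endian. */
    for (int i = 0; i < 8; i++)
        iv[i] = (uint8_t)(sector_num >> (8 * i));
    memset(iv + 8, 0, 8);        /* assumed: remaining 64 bits are zero */
}
```

Each 512 byte sector would then be encrypted or decrypted on its own with this key and its own IV.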

The QCOW Format

Version 2 of the QCOW format differs from the original version in the following ways:

  1. It supports the concepts of snapshots; version 1 only had the concept of copy-on-write image
  2. Clusters are reference counted in version 2; reference counting was added to support snapshots
  3. L2 tables always occupy a single cluster in version 2; previously their size was given by a l2_bits header field
  4. The size of compressed clusters is now given in sectors instead of bytes

A previous version of this document which described version 1 only is available here.

By Mark McLoughlin. Sep 11, 2008.
HTML5 formatting updates by Dave Burton. Dec 27, 2014.