vc4: T-format Textures


Warning: Improperly programming a GPU, or a display, may damage the devices.

This post describes the layout of the vc4 T-format textures.


The RPi vc4 GPU consumes textures that are in either T-format or LT-format. Quoting the VideoCoreIV-AG100-R reference guide:

The hardware automatically determines type of image (T-format or LT-format) by looking at LOD dimensions. The hardware assumes a level is in T-format unless either the width or height for the level is less than one T-format tile. In this case use the hardware assumes the level is stored in LT-format.
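
The quoted rule can be sketched as a small predicate. This is only an illustration of the sentence above, not the hardware's actual logic; the function name is made up here, and the default tile size of 32x32 pixels assumes the 32-bits-per-pixel case described later in this post.

```python
def level_format(width, height, tile_w=32, tile_h=32):
    """Return 'T' or 'LT' for one mipmap level, following the quoted rule.

    tile_w and tile_h are the T-format tile dimensions in pixels. The
    32x32 default is an assumption that holds for 32-bits-per-pixel images.
    """
    # A level smaller than one tile in either direction is treated as LT-format.
    if width < tile_w or height < tile_h:
        return "LT"
    return "T"
```

For instance, level_format(256, 256) gives 'T', while level_format(16, 256) gives 'LT'.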

Before a linear image is converted to the T-format, one must consider another assumption the GPU makes: an assumption about the layout of the linear image. The layout of the linear image affects the layout of the converted, T-format image, and consequently the final render.

The hardware assumes that the very first pixel, located right at the base-address of the buffer storing the linear image, belongs to the bottom-left corner of the image. That is, the GPU assumes that, given a T-format texture, its source, the linear-image buffer, has the pixel rows stored bottom to top as the addresses advance (go from lower values towards higher values) within the buffer.

# Linear Image Layout, as assumed by the GPU.

Base Address + 7 * sizeof(row) ----> +-------------+
Base Address + 6 * sizeof(row) ----> |    #####    |
Base Address + 5 * sizeof(row) ----> |   #     #   |
Base Address + 4 * sizeof(row) ----> |  #       #  |  <- right-side-up
Base Address + 3 * sizeof(row) ----> |  #########  |     image
Base Address + 2 * sizeof(row) ----> |  #       #  |
Base Address + 1 * sizeof(row) ----> |  #       #  |
Base Address + 0 * sizeof(row) ----> +-------------+------> X-Axis/Col-Axis
                                  (0, 0)

This assumption does not hold if the linear-image buffer stores the image from the top row to the bottom row as the addresses advance in the buffer.

For example, a PPM file stores its pixel rows top to bottom as the addresses advance in the file:

# PPM Layout.

Base Address + 0 * sizeof(row) ----> +-------------+-----> X-Axis/Col-Axis
Base Address + 1 * sizeof(row) ----> |    #####    |
Base Address + 2 * sizeof(row) ----> |   #     #   |
Base Address + 3 * sizeof(row) ----> |  #       #  |  <- right-side-up
Base Address + 4 * sizeof(row) ----> |  #########  |     image
Base Address + 5 * sizeof(row) ----> |  #       #  |
Base Address + 6 * sizeof(row) ----> |  #       #  |
Base Address + 7 * sizeof(row) ----> +-------------+

The conversion routine that is described here processes images assuming the following setup:

Base Address + 0 * sizeof(row) ----> +-------------+-----> X-Axis/Col-Axis
Base Address + 1 * sizeof(row) ----> |             |
Base Address + 2 * sizeof(row) ----> |             |
Base Address + 3 * sizeof(row) ----> |    image    |
Base Address + 4 * sizeof(row) ----> |             |
Base Address + 5 * sizeof(row) ----> |             |
Base Address + 6 * sizeof(row) ----> |             |
Base Address + 7 * sizeof(row) ----> +-------------+

Since the coordinate axes assumed by the routine are different from those assumed by the GPU, in certain situations fixups are needed, as described below.

Consider a 2D block of 4x4 pixels in a 64x64 linear texture image, as assumed by the GPU:

    19|       a  z  m  t
    18|       k  o  r  q   <- right-side-up block
    17|       l  n  f  b
    16|       p  w  g  i
    --+------------------------> X-Axis/Col-Axis
      |00 .. 12 13 14 15 .. 63

The reference guide linearizes this block as:

 ------ low-to-high-consecutive-addresses ----->
 . . . | p w g i l n f b k o r q a z m t | . . .

There are two possible in-memory layouts for the routine to consider.

If the buffer layout is consistent with the view the GPU assumes, then the routine views the image as,

      |00 .. 12 13 14 15 .. 63
    --+------------------------> X-Axis/Col-Axis
    16|       p  w  g  i
    17|       l  n  f  b
    18|       k  o  r  q
    19|       a  z  m  t

and linearizes it the same way as described in the reference guide:

 ------ low-to-high-consecutive-addresses ----->
 . . . | p w g i l n f b k o r q a z m t | . . .

But if the row-order in the buffer is the opposite of what the GPU assumes,

      |00 .. 12 13 14 15 .. 63
    --+------------------------> X-Axis/Col-Axis
    44|       a  z  m  t
    45|       k  o  r  q
    46|       l  n  f  b
    47|       p  w  g  i

then the routine linearizes the block as:

 ------ low-to-high-consecutive-addresses ----->
 . . . | a z m t k o r q l n f b p w g i | . . .
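
The two linearizations can be reproduced with a few lines of Python. This is a toy sketch using the letters from the diagrams above; the helper name is hypothetical, and each string stands for one row of the 4x4 block, listed in the order the rows appear in the buffer.

```python
def linearize_block(rows):
    """Concatenate the rows of a block in buffer order (lowest address first)."""
    return "".join(rows)

# Rows listed from the start of the buffer towards higher addresses.
gpu_order = ["pwgi", "lnfb", "korq", "azmt"]   # bottom row stored first
ppm_order = list(reversed(gpu_order))          # top row stored first

gpu_linear = linearize_block(gpu_order)
ppm_linear = linearize_block(ppm_order)
```

Here gpu_linear is the ordering given in the reference guide, and ppm_linear is the ordering the routine produces when the buffer stores the top row first.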

If, for a particular texture, this inconsistency arises, it can be solved in at least two ways:

  1. either flip the linear image along the X-axis (vertical flip) so that the start of the buffer now contains the bottom-row, and then convert to T-format,
  2. or, let the start of the buffer contain the top-row, and convert the image with the inconsistent linearization, but program the GPU’s texture-configuration-parameter-#0 to flip the texture’s Y-axis (see FLIPY in the reference guide) to inform it to fixup the linearization at run-time.
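
For reference, the first alternative amounts to reversing the order of the rows in the linear buffer. A minimal sketch, assuming a tightly packed buffer with no padding between rows (the function name and parameters are illustrative only):

```python
def flip_rows(buf, width, height, bytes_per_pixel=4):
    """Return a copy of a linear image with its row order reversed.

    Assumes the rows are tightly packed, i.e. the stride is exactly
    width * bytes_per_pixel.
    """
    stride = width * bytes_per_pixel
    if len(buf) != stride * height:
        raise ValueError("buffer size does not match the given dimensions")
    rows = [buf[y * stride:(y + 1) * stride] for y in range(height)]
    # The last row of the input becomes the first row of the output.
    return b"".join(reversed(rows))
```

Applying flip_rows twice returns the original buffer, which is a convenient sanity check.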

In the example below, the second alternative is chosen.


The T-format stores a 2D block of pixels in consecutive memory locations, with some swizzling. The common raster format usually has a group of pixels in a single row stored in consecutive memory locations, but two pixels which are adjacent in the Y-direction are usually several bytes/KBs/MBs/cache-lines apart. Since GPUs tend to process pixels in 2D blocks, it is beneficial for pixels that are close together in 2D to also fall within the same cache-line/page/DRAM-bank.

The T-format is a hierarchical format. At the top of the hierarchy is the tile. A tile consists of sub-tiles (s-tiles); a sub-tile consists of micro-tiles (u-tiles).

A u-tile has a fixed size of 64 bytes, a typical cache-line size. An s-tile is 1KB in size and contains a linearized block of 4x4 u-tiles. A tile is 4KB in size and contains a linearized block of 2x2 s-tiles.

For a typical case of a 32-bits-per-pixel image (rgba8888/bgra8888), assumed herewith, a u-tile linearizes a block of 4x4 pixels. Based on the dimensions described above, an s-tile therefore linearizes a block of 16x16 pixels, and a tile a block of 32x32 pixels.
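
The arithmetic behind these dimensions can be checked with a short sketch. The constant and function names are made up for illustration; the sketch only handles pixel sizes that give a square u-tile, which is the case for 32 bits per pixel.

```python
import math

U_TILE_BYTES = 64        # fixed u-tile size
U_TILES_PER_S_TILE = 4   # an s-tile is a block of 4x4 u-tiles
S_TILES_PER_TILE = 2     # a tile is a block of 2x2 s-tiles

def block_sides(bytes_per_pixel=4):
    """Return the side lengths, in pixels, of a u-tile, an s-tile and a tile."""
    pixels = U_TILE_BYTES // bytes_per_pixel
    u_side = math.isqrt(pixels)
    if u_side * u_side != pixels:
        raise ValueError("this sketch only handles square u-tiles")
    s_side = u_side * U_TILES_PER_S_TILE
    t_side = s_side * S_TILES_PER_TILE
    return u_side, s_side, t_side
```

With the default of 4 bytes per pixel, block_sides() returns (4, 16, 32), matching the 4x4, 16x16 and 32x32 blocks above.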

Consider the 256x256 LunarG image from the Vulkan-Tools repository. Note that the row-order of the raw pixels of this image is opposite to that assumed by the GPU.

The routine first copies the image, adding the 4th component, alpha, with the default value of 0xff. It does not flip the image, so the second alternative becomes necessary when the converted texture is supplied to the GPU.
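
The copy step might look like the sketch below. It is not the routine's actual code; it assumes the source pixels are tightly packed 3-byte values and keeps the source channel order unchanged, appending the alpha byte as the 4th component.

```python
def add_alpha(rgb, alpha=0xFF):
    """Expand tightly packed 3-byte pixels to 4-byte pixels with a constant alpha."""
    if len(rgb) % 3 != 0:
        raise ValueError("input length is not a multiple of 3 bytes")
    out = bytearray()
    for i in range(0, len(rgb), 3):
        out += rgb[i:i + 3]   # copy the three colour components as they are
        out.append(alpha)     # append the default alpha value
    return bytes(out)
```

The row order of the image is untouched by this step, so a top-to-bottom source stays top-to-bottom.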

It then divides the linear image into tiles.

     |0 1 2 3 4 5 6 7
    -+------------------------> X-Axis/Col-Axis
    0|t t t t t t t t
    1|t t t t t t t t
    2|t t t t t t t t
    3|t t t t t t t t
    4|t t t t t t t t
    5|t t t t t t t t
    6|t t t t t t t t
    7|t t t t t t t t

Each tile (t above) is self-contained; the s-tile linearizations within a tile, and the u-tile linearizations within those s-tiles, are all independent of similar linearizations within other tiles.

The linearization of tiles depends upon the row# (Y-axis component) of the tile. The rows and columns are counted up from 0, although the reference guide seems to count them up from 1.

If the row# is even, the tiles in that row are linearized in the col-order as shown below, where tyx is the tile at row# y and col# x:

 ------ low-to-high-consecutive-addresses ----->
 . . . | ty0 ty1 ty2 ty3 ty4 ty5 ty6 ty7 | . . .

If the row# is odd, the tiles in that row are linearized in the reverse col-order as:

 ------ low-to-high-consecutive-addresses ----->
 . . . | ty7 ty6 ty5 ty4 ty3 ty2 ty1 ty0 | . . .

Thus, the linearization of the tiles in the sample image is as shown below:

# This isn't a 2D block. Each tile tyx is 4KB.
# The addresses are consecutive, rising from left to right and then back to the
# left of the next 'row' and so on.

      | t00 t01 t02 t03 t04 t05 t06 t07 t17 t16 t15 t14 t13 t12 t11 t10 |
. . . | t20 t21 t22 t23 t24 t25 t26 t27 t37 t36 t35 t34 t33 t32 t31 t30 | . . .
      | t40 t41 t42 t43 t44 t45 t46 t47 t57 t56 t55 t54 t53 t52 t51 t50 |
      | t60 t61 t62 t63 t64 t65 t66 t67 t77 t76 t75 t74 t73 t72 t71 t70 |
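
The tile ordering can be generated with a short sketch. The function name is hypothetical; it returns (row, col) pairs in the order the tiles appear in memory, so the byte offset of a tile is its position in the list multiplied by 4KB.

```python
def tile_order(tiles_x, tiles_y):
    """List (row, col) of each tile in low-to-high address order."""
    order = []
    for y in range(tiles_y):
        if y % 2 == 0:
            cols = range(tiles_x)                # even row: left to right
        else:
            cols = range(tiles_x - 1, -1, -1)    # odd row: right to left
        order.extend((y, x) for x in cols)
    return order

# The 256x256 sample image is a grid of 8x8 tiles.
sample_order = tile_order(8, 8)
```

For the sample image, sample_order starts with (0, 0) through (0, 7), continues with (1, 7) down to (1, 0), and then moves on to (2, 0).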

The linearization of the s-tiles within a tile also depends upon the row# of the tile, as shown below, where syx is the s-tile at row y and col x within a tile:

     |0 1
    -+----> X-Axis/Col-Axis
    0|s s
    1|s s

If the tile row# is even, the s-tiles in that tile are linearized as:

 ------ low-to-high-consecutive-addresses ----->
         . . . | s00 s10 s11 s01 | . . .

If the tile row# is odd, the s-tiles in that tile are linearized as:

 ------ low-to-high-consecutive-addresses ----->
         . . . | s11 s01 s00 s10 | . . .
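
The two s-tile orderings can be captured as lookup lists. This is a sketch with made-up names; each entry is a (row, col) pair, and the position of the entry in the list is the position of that s-tile within its 4KB tile.

```python
# s-tile (row, col) pairs in low-to-high address order.
EVEN_ROW_S_TILES = [(0, 0), (1, 0), (1, 1), (0, 1)]
ODD_ROW_S_TILES = [(1, 1), (0, 1), (0, 0), (1, 0)]

def s_tile_index(tile_row, sy, sx):
    """Position (0..3) of s-tile (sy, sx) within a tile in the given tile row."""
    order = EVEN_ROW_S_TILES if tile_row % 2 == 0 else ODD_ROW_S_TILES
    return order.index((sy, sx))
```

Multiplying the returned index by 1KB gives the byte offset of the s-tile from the start of its tile.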

The linearization of the u-tiles within an s-tile is in the raster order, as shown below, where uyx is the u-tile at row y and col x within an s-tile.

     |0 1 2 3
    -+---------> X-Axis/Col-Axis
    0|u u u u
    1|u u u u
    2|u u u u
    3|u u u u

 ------ low-to-high-consecutive-addresses ----->
 . | u00 u01 u02 u03 u10 u11 u12 u13 u20 u21 u22 u23 u30 u31 u32 u33 | .
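
Putting the levels together, the byte offset of a single pixel in the T-format buffer can be sketched as follows. This is an illustration under stated assumptions, not the routine's actual code: it assumes 32 bits per pixel, a width that is a multiple of 32 pixels, and that y counts rows from the start of the linear buffer (the routine's view). All names are hypothetical.

```python
EVEN_ROW_S_TILES = [(0, 0), (1, 0), (1, 1), (0, 1)]
ODD_ROW_S_TILES = [(1, 1), (0, 1), (0, 0), (1, 0)]

def t_format_offset(x, y, width):
    """Byte offset of pixel (x, y) in a 32-bits-per-pixel T-format buffer."""
    TILE, S_TILE, U_TILE = 32, 16, 4   # block sides in pixels
    tiles_x = width // TILE

    # Tile level: odd tile rows are stored in reverse column order.
    tile_y, tile_x = y // TILE, x // TILE
    odd = tile_y % 2 == 1
    tile_col = tiles_x - 1 - tile_x if odd else tile_x
    tile_index = tile_y * tiles_x + tile_col

    # s-tile level: the order depends on the parity of the tile row.
    sy, sx = (y % TILE) // S_TILE, (x % TILE) // S_TILE
    order = ODD_ROW_S_TILES if odd else EVEN_ROW_S_TILES
    s_index = order.index((sy, sx))

    # u-tile level: raster order within the s-tile.
    uy, ux = (y % S_TILE) // U_TILE, (x % S_TILE) // U_TILE
    u_index = uy * 4 + ux

    # Pixel level: raster order within the u-tile, 4 bytes per pixel.
    p_index = (y % U_TILE) * 4 + (x % U_TILE)

    return tile_index * 4096 + s_index * 1024 + u_index * 64 + p_index * 4
```

A conversion loop would copy the 4 bytes of each pixel from offset (y * width + x) * 4 in the linear buffer to t_format_offset(x, y, width) in the destination buffer.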

T-format samples:

The 256x256 LunarG image, after conversion to T-format, is here. As the linear image is in a row-order opposite to that expected by the GPU, using this texture requires enabling the FLIPY bit in the texture-configuration-parameter-#0.

The same texture, created after flipping the linear image vertically, is here. Using this texture requires keeping the FLIPY bit off; it is already in the format expected by the GPU.

(The texture PNG images do not contain the alpha channel).