vc4: T-format Textures

Warning: Improperly programming a GPU, or a display, may damage the devices.

This post describes the layout of the vc4 T-format textures.

Preparation:

RPi vc4 GPU consumes textures that are either in T-format, or in LT-format. Quoting the VideoCoreIV-AG100-R reference guide:

The hardware automatically determines type of image (T-format or LT-format) by looking at LOD dimensions. The hardware assumes a level is in T-format unless either the width or height for the level is less than one T-format tile. In this case use the hardware assumes the level is stored in LT-format.

Before a linear image is converted to the T-format, one must consider another assumption the GPU makes: an assumption about the layout of the linear image. The layout of the linear image affects the layout of the converted, T-format image, and consequently the final render.

The hardware assumes that the very first pixel, located right at the base-address of the buffer storing the linear image, belongs to the bottom-left corner of the image. That is, the GPU assumes that, given a T-format texture, its source, the linear-image buffer, has the pixel rows stored bottom to top as the addresses advance (go from lower values towards higher values) within the buffer.

# Linear Image Layout, as assumed by the GPU.


                               Y-Axis/Row-Axis
                                     ^
                                     |
                                     |
                                     |
Base Address + 7 * sizeof(row) ----> +-------------+
Base Address + 6 * sizeof(row) ----> |    #####    |
Base Address + 4 * sizeof(row) ----> |   #     #   |
Base Address + 4 * sizeof(row) ----> |  #       #  |  <- right-side-up
Base Address + 3 * sizeof(row) ----> |  #########  |     image
Base Address + 2 * sizeof(row) ----> |  #       #  |
Base Address + 1 * sizeof(row) ----> |  #       #  |
Base Address + 0 * sizeof(row) ----> +-------------+------> X-Axis/Col-Axis
                                  (0, 0)

This condition may not be true, if the linear-image buffer stores the image from top-row to the bottom-row as the addresses advance in the buffer.

For e.g., a PPM file has its pixel rows stored top to bottom as the addresses advance in the file:

# PPM Layout.

Base Address + 0 * sizeof(row) ----> +-------------+-----> X-Axis/Col-Axis
Base Address + 1 * sizeof(row) ----> |    #####    |
Base Address + 2 * sizeof(row) ----> |   #     #   |
Base Address + 3 * sizeof(row) ----> |  #       #  |  <- right-side-up
Base Address + 4 * sizeof(row) ----> |  #########  |     image
Base Address + 5 * sizeof(row) ----> |  #       #  |
Base Address + 6 * sizeof(row) ----> |  #       #  |
Base Address + 7 * sizeof(row) ----> +-------------+
                                     |
                                     |
                                     |
                                     v
                               Y-Axis/Row-Axis

The conversion routine that is described here processes images assuming the following setup:

Base Address + 0 * sizeof(row) ----> +-------------+-----> X-Axis/Col-Axis
Base Address + 1 * sizeof(row) ----> |             |
Base Address + 2 * sizeof(row) ----> |             |
Base Address + 3 * sizeof(row) ----> |    image    |
Base Address + 4 * sizeof(row) ----> |             |
Base Address + 5 * sizeof(row) ----> |             |
Base Address + 6 * sizeof(row) ----> |             |
Base Address + 7 * sizeof(row) ----> +-------------+
                                     |
                                     |
                                     |
                                     v
                               Y-Axis/Row-Axis

Since the coordinate axes assumed by the routine are different from those assumed by the GPU, in certain situations fixups are needed, as described below.

Consider a 2D block of 4x4 pixels in a 64x64 linear texture image, as assumed by the GPU:

Y-Axis/Row-Axis
      ^
      |
    63|
    ..|
    19|       a  z  m  t
    18|       k  o  r  q   <- right-side-up block
    17|       l  n  f  b
    16|       p  w  g  i
    ..|
    00|
    --+------------------------> X-Axis/Col-Axis
      |00 .. 12 13 14 15 .. 63

The reference guide linearizes this block as:

 ------ low-to-high-consecutive-addresses ----->
       +---------------------------------+
 . . . | p w g i l n f b k o r q a z m t | . . .
       +---------------------------------+

There are two possible in-memory layouts for the routine to consider.

If the buffer layout is consistent with the view the GPU assumes, then the routine views the image as,

      |00 .. 12 13 14 15 .. 63
    --+------------------------> X-Axis/Col-Axis
    00|
    ..|
    16|       p  w  g  i
    17|       l  n  f  b
    18|       k  o  r  q
    19|       a  z  m  t
    ..|
    63|
      |
      v
Y-Axis/Row-Axis

and linearize it the same as described in the reference guide:

 ------ low-to-high-consecutive-addresses ----->
       +---------------------------------+
 . . . | p w g i l n f b k o r q a z m t | . . .
       +---------------------------------+

But, if the row-order in the buffer is the opposite of what GPU assumes,

      |00 .. 12 13 14 15 .. 63
    --+------------------------> X-Axis/Col-Axis
    00|
    ..|
    44|       a  z  m  t
    45|       k  o  r  q
    46|       l  n  f  b
    47|       p  w  g  i
    ..|
    63|
      |
      v
Y-Axis/Row-Axis

then the routine linearizes the block as:

 ------ low-to-high-consecutive-addresses ----->
       +---------------------------------+
 . . . | a z m t k o r q l n f b p w g i |
       +---------------------------------+

If, for a particular texture, this inconsistency arises, it can be solved in at least two ways:

either flip the linear image along the X-axis (vertical flip) so that the start of the buffer now contains the bottom-row, and then convert to T-format,
or, let the start of the buffer contain the top-row, and convert the image with the inconsistent linearization, but program the GPU’s texture-configuration-parameter-#0 to flip the texture’s Y-axis (see FLIPY in the reference guide) to inform it to fixup the linearization at run-time.

In the e.g. below, the 2. alternative is chosen.

T-format:

The T-format stores a 2D block of pixels in consecutive memory locations, with some swizzling. The common raster format usually has a group of pixels in a single row stored in consecutive memory locations, but two pixels which are adjacent in the Y-direction are usually several bytes/KBs/MBs/cache-lines apart. Since the GPUs tend to process the pixels in 2D, it is beneficial to have a block of pixels that are close together to be within the same cache-line/page/dram-bank.

The T-format is a hierarchical format. At the top of the hierarchy is the tile. A tile consists of sub-tiles (s-tiles); a sub-tile consists of micro-tiles (u-tiles).

A u-tile is of fixed size, 64 bytes - a typical cache-line size. An s-tile is of 1KB size, containing a linearized block of 4x4 u-tiles. A tile is of 4KB size, containing a linearized block of 2x2 s-tiles.

For a typical case of a 32-bits-per-pixel image (rgba8888/bgra8888), assumed herewith, a u-tile linearizes a block of 4x4 pixels. Based on the dimensions described above, an s-tile therefore linearizes a block of 16x16 pixels, and a tile a block of 32x32 pixels.

Consider the 256x256 LunarG from Vulkan-Tools repository. Note that this row-order of the raw pixels of this image is opposite to that assumed by the GPU.

The routine first copies the image, adding the 4^th component, alpha, with the default value of 0xff. It does not flip the image; so the 2. alternative will become necessary when the converted texture is supplied to the GPU.

It then divides the linear image into tiles.

     |0 1 2 3 4 5 6 7
    -+------------------------> X-Axis/Col-Axis
    0|t t t t t t t t
    1|t t t t t t t t
    2|t t t t t t t t
    3|t t t t t t t t
    4|t t t t t t t t
    5|t t t t t t t t
    6|t t t t t t t t
    7|t t t t t t t t
     |
     v
Y-Axis/Row-Axis

Each tile (t above) is self-contained; the s-tile linearizations within a tile, and the u-tile linearizations within those s-tiles, are all independent of similar linearizations within other tiles.

The linearization of tiles depends upon the row# (Y-axis component) of the tile. The rows and columns are counted up from 0, although the reference guide seems to count them up from 1.

If the row# is even, the tiles in that row are linearized in the col-order as shown below, where tyx is the tile at row# y and col# x:

 ------ low-to-high-consecutive-addresses ----->
       +---------------------------------+
 . . . | ty0 ty1 ty2 ty3 ty4 ty5 ty6 ty7 | . . .
       +---------------------------------+

If the row# is odd, the tiles in that row are linearized in the reverse col-order as:

 ------ low-to-high-consecutive-addresses ----->
       +---------------------------------+
 . . . | ty7 ty6 ty5 ty4 ty3 ty2 ty1 ty0 | . . .
       +---------------------------------+

Thus, the linearization of the tiles in the sample image is as shown below:

# This isn't a 2D block. Each tile tyx is 4KB.
# The addresses are consecutive, rising from left to right and then back to the
# left of the next 'row' and so on.

      +-------------------------------------------------------------+
      | t00 t01 t02 t03 t04 t05 t06 t17 t16 t15 t14 t13 t12 t11 t10 |
. . . | t20 t21 t22 t23 t24 t25 t26 t37 t36 t35 t34 t33 t32 t31 t30 | . . .
      | t40 t41 t42 t43 t44 t45 t46 t57 t56 t55 t54 t53 t52 t51 t50 |
      | t60 t61 t62 t63 t64 t65 t66 t77 t76 t75 t74 t73 t72 t71 t70 |
      +-------------------------------------------------------------+

The linearization of the s-tiles within a tile also depends upon the row# of the tile, as shown below, where syx is the s-tile at row y and col x with a tile:

     |0 1
    -+----> X-Axis/Col-Axis
    0|s s
    1|s s
     |
     v
Y-Axis/Row-Axis

If the tile row# is even, the s-tiles in that tile are linearized as:

 ------ low-to-high-consecutive-addresses ----->
               +-----------------+
         . . . | s00 s10 s11 s01 | . . .
               +-----------------+

If the tile row# is odd, the s-tiles in that tile are linearized as:

 ------ low-to-high-consecutive-addresses ----->
               +-----------------+
         . . . | s11 s01 s00 s10 | . . .
               +-----------------+

The linearization of the u-tiles within an s-tile is in the raster order, as shown below, where uyx is the u-tile at row y and col x within an s-tile.

     |0 1 2 3
    -+---------> X-Axis/Col-Axis
    0|u u u u
    1|u u u u
    2|u u u u
    3|u u u u
     |
     v
Y-Axis/Row-Axis

            ------ low-to-high-consecutive-addresses ----->
   +-----------------------------------------------------------------+
 . | u00 u01 u02 u03 u10 u11 u12 u13 u20 u21 u22 u23 u30 u31 u32 u33 | .
   +-----------------------------------------------------------------+

T-format samples:

The 256x256 LunarG image, after conversion to T-format, is here. As the linear image is in a row-order opposite to that expected by the GPU, using this texture requires enabling FLIPY bit in the texture-configuration-parameter#0.

The same texture, created after flipping the linear image vertically, is here. Using this texture requires keeping the FLIPY bit off; it is already in the format expected by the GPU.

(The texture PNG images do not contain the alpha channel).

vc4: T-format Textures

2023/06/27

Preparation:

T-format:

T-format samples: