Warning: Improperly programming a GPU, or a display, may damage the devices.
This post describes the layout of the vc4
T-format textures.
Preparation:
RPi vc4
GPU consumes textures that are either in T-format
, or in
LT-format
.
Quoting the VideoCoreIV-AG100-R
reference guide:
The hardware automatically determines type of image (T-format or LT-format) by looking at LOD dimensions. The hardware assumes a level is in T-format unless either the width or height for the level is less than one T-format tile. In this case use the hardware assumes the level is stored in LT-format.
Before a linear image is converted to the T-format, one must consider another assumption the GPU makes: an assumption about the layout of the linear image. The layout of the linear image affects the layout of the converted, T-format image, and consequently the final render.
The hardware assumes that the very first pixel, located right at the base-address of the buffer storing the linear image, belongs to the bottom-left corner of the image. That is, the GPU assumes that, given a T-format texture, its source, the linear-image buffer, has the pixel rows stored bottom to top as the addresses advance (go from lower values towards higher values) within the buffer.
# Linear Image Layout, as assumed by the GPU.
Y-Axis/Row-Axis
^
|
|
|
Base Address + 7 * sizeof(row) ----> +-------------+
Base Address + 6 * sizeof(row) ----> | ##### |
Base Address + 4 * sizeof(row) ----> | # # |
Base Address + 4 * sizeof(row) ----> | # # | <- right-side-up
Base Address + 3 * sizeof(row) ----> | ######### | image
Base Address + 2 * sizeof(row) ----> | # # |
Base Address + 1 * sizeof(row) ----> | # # |
Base Address + 0 * sizeof(row) ----> +-------------+------> X-Axis/Col-Axis
(0, 0)
This condition may not be true, if the linear-image buffer stores the image from top-row to the bottom-row as the addresses advance in the buffer.
For e.g., a PPM file has its pixel rows stored top to bottom as the addresses advance in the file:
# PPM Layout.
Base Address + 0 * sizeof(row) ----> +-------------+-----> X-Axis/Col-Axis
Base Address + 1 * sizeof(row) ----> | ##### |
Base Address + 2 * sizeof(row) ----> | # # |
Base Address + 3 * sizeof(row) ----> | # # | <- right-side-up
Base Address + 4 * sizeof(row) ----> | ######### | image
Base Address + 5 * sizeof(row) ----> | # # |
Base Address + 6 * sizeof(row) ----> | # # |
Base Address + 7 * sizeof(row) ----> +-------------+
|
|
|
v
Y-Axis/Row-Axis
The conversion routine that is described here processes images assuming the following setup:
Base Address + 0 * sizeof(row) ----> +-------------+-----> X-Axis/Col-Axis
Base Address + 1 * sizeof(row) ----> | |
Base Address + 2 * sizeof(row) ----> | |
Base Address + 3 * sizeof(row) ----> | image |
Base Address + 4 * sizeof(row) ----> | |
Base Address + 5 * sizeof(row) ----> | |
Base Address + 6 * sizeof(row) ----> | |
Base Address + 7 * sizeof(row) ----> +-------------+
|
|
|
v
Y-Axis/Row-Axis
Since the coordinate axes assumed by the routine are different from those assumed by the GPU, in certain situations fixups are needed, as described below.
Consider a 2D block of 4x4
pixels in a 64x64
linear texture image, as
assumed by the GPU:
Y-Axis/Row-Axis
^
|
63|
..|
19| a z m t
18| k o r q <- right-side-up block
17| l n f b
16| p w g i
..|
00|
--+------------------------> X-Axis/Col-Axis
|00 .. 12 13 14 15 .. 63
The reference guide linearizes this block as:
------ low-to-high-consecutive-addresses ----->
+---------------------------------+
. . . | p w g i l n f b k o r q a z m t | . . .
+---------------------------------+
There are two possible in-memory layouts for the routine to consider.
If the buffer layout is consistent with the view the GPU assumes, then the routine views the image as,
|00 .. 12 13 14 15 .. 63
--+------------------------> X-Axis/Col-Axis
00|
..|
16| p w g i
17| l n f b
18| k o r q
19| a z m t
..|
63|
|
v
Y-Axis/Row-Axis
and linearize it the same as described in the reference guide:
------ low-to-high-consecutive-addresses ----->
+---------------------------------+
. . . | p w g i l n f b k o r q a z m t | . . .
+---------------------------------+
But, if the row-order in the buffer is the opposite of what GPU assumes,
|00 .. 12 13 14 15 .. 63
--+------------------------> X-Axis/Col-Axis
00|
..|
44| a z m t
45| k o r q
46| l n f b
47| p w g i
..|
63|
|
v
Y-Axis/Row-Axis
then the routine linearizes the block as:
------ low-to-high-consecutive-addresses ----->
+---------------------------------+
. . . | a z m t k o r q l n f b p w g i |
+---------------------------------+
If, for a particular texture, this inconsistency arises, it can be solved in at least two ways:
- either flip the linear image along the X-axis (vertical flip) so that the start of the buffer now contains the bottom-row, and then convert to T-format,
- or, let the start of the buffer contain the top-row, and convert the image
with the inconsistent linearization, but program the GPU’s
texture-configuration-parameter-#0
to flip the texture’s Y-axis (seeFLIPY
in the reference guide) to inform it to fixup the linearization at run-time.
In the e.g. below, the 2. alternative is chosen.
T-format:
The T-format stores a 2D block of pixels in consecutive memory locations, with some swizzling. The common raster format usually has a group of pixels in a single row stored in consecutive memory locations, but two pixels which are adjacent in the Y-direction are usually several bytes/KBs/MBs/cache-lines apart. Since the GPUs tend to process the pixels in 2D, it is beneficial to have a block of pixels that are close together to be within the same cache-line/page/dram-bank.
The T-format is a hierarchical format. At the top of the hierarchy is the
tile
. A tile
consists of sub-tiles (s-tiles)
; a sub-tile
consists of
micro-tiles (u-tiles)
.
A u-tile
is of fixed size, 64 bytes - a typical cache-line size. An s-tile
is of 1KB size, containing a linearized block of 4x4
u-tiles
. A tile
is of
4KB size, containing a linearized block of 2x2
s-tiles
.
For a typical case of a 32-bits-per-pixel image (rgba8888/bgra8888
), assumed
herewith, a u-tile linearizes a block of 4x4
pixels.
Based on the dimensions described above, an s-tile therefore linearizes a
block of 16x16
pixels, and a tile a block of 32x32
pixels.
Consider the 256x256
LunarG
from Vulkan-Tools repository. Note that this row-order of the raw pixels of
this image is opposite to that assumed by the GPU.
The routine first copies the image, adding the 4th component,
alpha
, with the
default value of 0xff
. It does not flip the image; so the 2. alternative will
become necessary when the converted texture is supplied to the GPU.
It then divides the linear image into tiles.
|0 1 2 3 4 5 6 7
-+------------------------> X-Axis/Col-Axis
0|t t t t t t t t
1|t t t t t t t t
2|t t t t t t t t
3|t t t t t t t t
4|t t t t t t t t
5|t t t t t t t t
6|t t t t t t t t
7|t t t t t t t t
|
v
Y-Axis/Row-Axis
Each tile (t
above) is self-contained; the s-tile linearizations within a
tile, and the u-tile linearizations within those s-tiles, are all independent
of similar linearizations within other tiles.
The linearization of tiles depends upon the row# (Y-axis component) of the tile. The rows and columns are counted up from 0, although the reference guide seems to count them up from 1.
If the row# is even, the tiles in that row are linearized in the col-order as
shown below, where tyx
is the tile at row# y
and col# x
:
------ low-to-high-consecutive-addresses ----->
+---------------------------------+
. . . | ty0 ty1 ty2 ty3 ty4 ty5 ty6 ty7 | . . .
+---------------------------------+
If the row# is odd, the tiles in that row are linearized in the reverse col-order as:
------ low-to-high-consecutive-addresses ----->
+---------------------------------+
. . . | ty7 ty6 ty5 ty4 ty3 ty2 ty1 ty0 | . . .
+---------------------------------+
Thus, the linearization of the tiles in the sample image is as shown below:
# This isn't a 2D block. Each tile tyx is 4KB.
# The addresses are consecutive, rising from left to right and then back to the
# left of the next 'row' and so on.
+-------------------------------------------------------------+
| t00 t01 t02 t03 t04 t05 t06 t17 t16 t15 t14 t13 t12 t11 t10 |
. . . | t20 t21 t22 t23 t24 t25 t26 t37 t36 t35 t34 t33 t32 t31 t30 | . . .
| t40 t41 t42 t43 t44 t45 t46 t57 t56 t55 t54 t53 t52 t51 t50 |
| t60 t61 t62 t63 t64 t65 t66 t77 t76 t75 t74 t73 t72 t71 t70 |
+-------------------------------------------------------------+
The linearization of the s-tiles within a tile also depends upon the row#
of the tile, as shown below, where syx
is the s-tile at row y
and col x
with a tile:
|0 1
-+----> X-Axis/Col-Axis
0|s s
1|s s
|
v
Y-Axis/Row-Axis
If the tile row# is even, the s-tiles in that tile are linearized as:
------ low-to-high-consecutive-addresses ----->
+-----------------+
. . . | s00 s10 s11 s01 | . . .
+-----------------+
If the tile row# is odd, the s-tiles in that tile are linearized as:
------ low-to-high-consecutive-addresses ----->
+-----------------+
. . . | s11 s01 s00 s10 | . . .
+-----------------+
The linearization of the u-tiles within an s-tile is in the raster order, as
shown below, where uyx
is the u-tile at row y
and col x
within an s-tile.
|0 1 2 3
-+---------> X-Axis/Col-Axis
0|u u u u
1|u u u u
2|u u u u
3|u u u u
|
v
Y-Axis/Row-Axis
------ low-to-high-consecutive-addresses ----->
+-----------------------------------------------------------------+
. | u00 u01 u02 u03 u10 u11 u12 u13 u20 u21 u22 u23 u30 u31 u32 u33 | .
+-----------------------------------------------------------------+
T-format samples:
The 256x256
LunarG
image, after conversion to T-format, is
here.
As the linear image is in a row-order opposite to that expected by the GPU,
using this texture requires enabling FLIPY
bit in the
texture-configuration-parameter#0
.
The same texture, created after flipping the
linear image
vertically, is here.
Using this texture requires keeping the FLIPY
bit off; it is already in the
format expected by the GPU.
(The texture PNG images do not contain the alpha channel).