Get rid of your sneaky structs

Posted: March 26, 2011 in Optimization Pills
Tags: ,

In this pill, I’m going to deal with a little detail of C (and C++) structs. This becomes memory-evident if you are handling a lot amount of data. If you’re ill of bad struct alignment then sugar this pill and you’ll be on the safe side!

Let’s start.

[from Wikipedia] A struct is a “structured” record type that aggregates a fixed set of labelled objects, possibly of different types, into a single object. A struct declaration consists of a list of fields, each of which can have any type. The total storage required for a struct object is the sum of the storage requirements of all the fields, plus any internal padding.

What is this padding?

Consider what happens when small data members are interspersed with larger members, like that:

struct BadStruct {
   U32   mU1; // 32 bits
   F32   mF2; // 32 bits
   U8    mB3; // 8 bits
   I32   mI4; // 32 bits
   bool  mB5; // 8 bits
   char* mP6; // 32 bits

You might imagine that the compiler simply packs the data members as tightly as it can. However, this is not usually the case. Instead, the compiler will typically leave “holes” in the layout (some compilers can be requested not to leave these holes by using a preprocessor directive like #pragma pack, or via command-line options).

Why does the compiler leave “holes” ?

The reason lies in the fact that every data type has a natural alignment which must be respected in order to permit the CPU to read and write memory effectively. The alignment of a data object refers to whether its address memory is a multiple of its size (which is generally a power of two):

  • An object with one-byte alignment resides at any memory address;
  • An object with two-byte alignment resides only at even addresses;
  • An object with four-byte alignment risides only at addresses that are multiple of four;
  • A 16-byte aligned object resides only at addresses that are multiple of 16.

Alignment is important because many modern processors can actually only read and write properly aligned block of data. For example, suppose a 32-bit architecture, if a program requests that a 32-bit (four-byte) integer be read from the address 0x6A341178, the memory controller will load the data happily because the address is four-byte aligned (its least significant nibble is 0x8). However, if a request is made to load the same data from 0x6A341177, the memory controller has now to read two four-byte blocks: the one at 0x6A341174 and the one at 0x6A341178. Furthermore, it has to mask and shift the two parts of the 32-bit integer and logically OR them together into the destination register on the CPU:

But with some microprocessors you can be unlucky: if you request a read or write of unaligned data, you might just get garbage (the Playstation 2 is a notable example of this kind of intolerance for unaligned data).

A good rule is that a data type should be aligned to a boundary equal to the width of the data type in bytes (for example, 32-bit values generally have a four-byte alignment requirement, 16-bit values should be two-byte aligned and 8-bit values can be stored at any addresses (one-byte aligned).

Let’s turn to our BadStruct: what’s a better alignment? For example, this:

struct BetterStruct {
   U32   mU1; // 32 bits
   F32   mF2; // 32 bits
   I32   mI4; // 32 bits
   char* mP6; // 32 bits
   U8    mB3; // 8 bits
   bool  mB5; // 8 bits

In this case, the last two bytes are in the same block, but the size of the structure as a whole is 20 bytes, not 18 as we might expect, because it has been padded by two bytes at the end. This padding is added by the compiler to ensure proper alignment of the structure in an array context. That is, if an array of these structs is defined and the first element of the array is aligned, then the padding at the end guarantees the all subsequent elements will also aligned properly.

A good thing is adding this pad manually to make the wasted space visible and explicit, like this:

struct BetterStructWithExplicitPadding {
   U32   mU1; // 32 bits
   F32   mF2; // 32 bits
   I32   mI4; // 32 bits
   char* mP6; // 32 bits
   U8    mB3; // 8 bits
   bool  mB5; // 8 bits
   U8    _pad[2]; // explicit padding

To sum up, we can have (at least) three possibilities:

  • the compiler leaves “holes” to guarantee a correct alignment, saving time (and cache) but wasting memory;
  • the compiler does not leave “holes” (data remain unaligned), saving memory. Two sub-possibilities:
    • the memory controller reads properly the data you need but, in case, it loads more blocks, performing bitwise operations, and wasting clock cycles and cache space;
    • the memory controller does not read happily the data you want, returning garbage.

The moral of the story? Keep your data as aligned as you can, adding explicit padding if needed. If you do this, you’ll (hopefully) avoid holes in your layout, saving time, space and program consistency.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s