Monday, January 19, 2009

Data Structure Alignment

This one is specially for Geo :p.

Consider the following source:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        typedef struct {
            short foo1; /* 2 bytes */
            int foo2;   /* 4 bytes */
        } MaStructure;

        printf("VISUAL STUDIO 2005/2008\n");
        /* sizeof yields a size_t, so cast it to match the %u format */
        printf("INT    SIZE :%u\n", (unsigned)sizeof(int));
        printf("SHORT  SIZE :%u\n", (unsigned)sizeof(short));
        printf("STRUCT SIZE :%u\n", (unsigned)sizeof(MaStructure));
        system("pause"); /* Windows only: keeps the console window open */
        return 0;
    }



And here is the output obtained (Microsoft Visual C++ 2005 77915-009-0000007-41714)




SOO? WTF? The struct comes out bigger than the 6 bytes (2 + 4) its members add up to. Well, after a bit of research:


Although the compiler (or interpreter) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned.

Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment. For example, if members are sorted by ascending or descending alignment requirements a minimal amount of padding is required. The minimal amount of padding required is always less than the largest alignment in the structure. Computing the maximum amount of padding required is more complicated, but is always less than the sum of the alignment requirements for all members minus twice the sum of the alignment requirements for the least aligned half of the structure members.

[...]

If the type "short" is stored in two bytes of memory then each member of the data structure depicted above would be 2-byte aligned. Data1 would be at offset 0, Data2 at offset 2 and Data3 at offset 4. The size of this structure would be 6 bytes.

The type of each member of the structure usually has a default alignment, meaning that it will, unless otherwise requested by the programmer, be aligned on a pre-determined boundary. The following typical alignments are valid for compilers from Microsoft, Borland, and GNU when compiling for x86:

* A char (one byte) will be 1-byte aligned.
* A short (two bytes) will be 2-byte aligned.
* An int (four bytes) will be 4-byte aligned.
* A float (four bytes) will be 4-byte aligned.
* A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux.

Source: Wikipedia

So the mystery is solved, but why this padding??

The compiler aligns each member on a boundary suited to its type (4 bytes, i.e. 32 bits, for the int here on a 32-bit architecture), because the processor reads memory most efficiently through aligned accesses. The total size of a structure must therefore always be a multiple of the strictest alignment among its members (yuck, what an ugly sentence), so that every element of an array of structures stays aligned. The compiler does this for performance reasons.

You can change how padding is handled, where n is the maximum alignment in bytes applied to the members:

#pragma pack(n)

However, be very careful with this directive: packed members may end up misaligned, which costs extra processor cycles on x86 and, on architectures that require aligned accesses, can raise an "Alignment Fault" exception.

Knowing how a compiler handles padding can sometimes come in handy, for example when exploiting buffer-overflow vulnerabilities: it lets us be less approximate.

2 comments:

Geo said…

Thanks, mate!

I'll pay you tribute at the Chartreux.

TTC !

_Geo'_

sm0k said…

You little rascal