memory management - Emulating a variable size struct in C; alignment, performance issues


It is possible to put arrays of custom length anywhere in a struct in C, but in that case additional malloc calls are required. Some compilers allow having VLAs anywhere in a struct, but this is not standard compliant. So I decided to emulate VLAs within a struct in standard C.

I am in a situation where I have to get maximum performance. The C code will be automatically generated, so readability or style is not important in this case.

There will be structs with many custom-size array members in between static-size members. Below is a simple form of such structs.

    #include <stdlib.h>

    struct old_a {
        int n_refs;
        void **refs;
        int count;
    };

    /* Two allocations: one for the struct, one for the pointer array. */
    struct old_a *old_a_new(int n_refs, int count) {
        struct old_a *p_a = malloc(sizeof(struct old_a));
        p_a->n_refs = n_refs;
        p_a->refs = malloc(n_refs * sizeof(void *));
        p_a->count = count;
        return p_a;
    }

    #define old_a_delete(p_a) do {\
        free(p_a->refs);\
        free(p_a);\
    } while (0)

The additional malloc call for refs can be avoided as follows.

    /* Layout of the emulated struct: int n_refs | void *refs[n_refs] | int count */
    #define a_get_n_refs(p_a) *(int *)p_a
    #define a_set_n_refs(p_a, rval) *(int *)p_a = rval
    #define a_get_count(p_a) *(int *)((char *)p_a + sizeof(int) + a_get_n_refs(p_a) * sizeof(void *))
    #define a_set_count(p_a, rval) *(int *)((char *)p_a + sizeof(int) + a_get_n_refs(p_a) * sizeof(void *)) = rval
    #define a_get_refs(p_a, i) *(void **)((char *)p_a + sizeof(int) + i * sizeof(void *))
    #define a_set_refs(p_a, i, rval) *(void **)((char *)p_a + sizeof(int) + i * sizeof(void *)) = rval

    /* One allocation holds all three members. */
    static void *a_new(int n_refs, int count) {
        void *p_a = malloc(sizeof(int) + n_refs * sizeof(void *) + sizeof(int));
        a_set_n_refs(p_a, n_refs);
        a_set_count(p_a, count);
        return p_a;
    }

    #define a_delete(p_a) do {\
        free(p_a);\
    } while (0)

The emulated version seems to run 12~14% faster on my machine with 1 pointer array. I assume this is due to the halved number of calls to malloc and free, and the reduced number of dereferences. The test code is below.

    #include <stdio.h>
    #include <time.h>

    int main(int argc, char **argv) {
        const int n_as = atoi(argv[1]) * 10000;
        const int n_refs = n_as;
        const int count = 1;
        unsigned int old_sum = 0;
        unsigned int sum = 0;
        clock_t timer;

        /* Time the two-malloc version. */
        timer = clock();
        struct old_a **old_as = malloc(n_as * sizeof(struct old_a));
        for (int i = 0; i < n_as; ++i) {
            old_as[i] = old_a_new(n_refs, count);
            for (int j = 0; j < n_refs; ++j) {
                old_as[i]->refs[j] = (void *)j;
                old_sum += (int)old_as[i]->refs[j];
            }
            old_sum += old_as[i]->n_refs + old_as[i]->count;
            old_a_delete(old_as[i]);
        }
        free(old_as);
        timer = clock() - timer;
        printf("old_sum = %u; elapsed time = %.3f\n", old_sum, (double)timer / CLOCKS_PER_SEC);

        /* Time the emulated single-malloc version. */
        timer = clock();
        void **as = malloc(n_as * sizeof(void *));
        for (int i = 0; i < n_as; ++i) {
            as[i] = a_new(n_refs, count);
            for (int j = 0; j < n_refs; ++j) {
                a_set_refs(as[i], j, (void *)j);
                sum += (int)a_get_refs(as[i], j);
            }
            sum += a_get_n_refs(as[i]) + a_get_count(as[i]);
            a_delete(as[i]);
        }
        free(as);
        timer = clock() - timer;
        printf("sum = %u; elapsed time = %.2f\n", sum, (double)timer / CLOCKS_PER_SEC);
        return 0;
    }

Compiled with gcc test.c -otest -std=c99:

    >test 4
    old_sum = 3293684800; elapsed time = 7.04
    sum = 3293684800; elapsed time = 6.07

    >test 5
    old_sum = 885958608; elapsed time = 10.74
    sum = 885958608; elapsed time = 9.44

Please let me know if my code has undefined behaviors, implementation-defined behaviors, et cetera. It is meant to be 100% portable to machines with a sane (standard compliant) C compiler.

I am aware of memory alignment issues. The members of these emulated structs will only be int, double, and void *, so I think there will not be alignment problems, but I am not sure. Also, although the emulated struct appeared to run faster on my machine (Windows 7 64bit, MinGW/gcc), I do not know how it will run on other hardware or with other compilers. Other than checking for standard-guaranteed behavior, I really need help with hardware knowledge: which one is the more machine-friendly code (preferably in general)?

Unless a sizable proportion of the work of your program is going to be allocating and freeing these data structures, the difference you observed in allocation / deallocation speed is unlikely to make a significant difference in the program's overall execution time.

Furthermore, be aware that the two approaches are not equivalent. The latter does not produce a representation of a struct old_a, so any other code that uses the data structure produced must use the provided access macros (or equivalent) to do so.
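For the simplified case in the question, where only one array has a custom length, a C99 flexible array member is one standard way to keep a real struct type while still using a single allocation. It must be the last member, so it does not cover arrays placed between fixed-size members. A minimal sketch, in which struct fam_a, fam_a_new and fam_a_delete are hypothetical names that do not appear in the question:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variant of struct old_a: fixed-size members first, array last. */
struct fam_a {
    int n_refs;
    int count;
    void *refs[];   /* C99 flexible array member; must be the last member */
};

/* One malloc covers the header and n_refs pointers; returns NULL on failure. */
static struct fam_a *fam_a_new(int n_refs, int count)
{
    struct fam_a *p_a;

    assert(n_refs >= 0);
    p_a = malloc(sizeof *p_a + (size_t)n_refs * sizeof p_a->refs[0]);
    if (p_a == NULL)
        return NULL;
    p_a->n_refs = n_refs;
    p_a->count = count;
    return p_a;
}

/* A single free releases the header and the array together. */
static void fam_a_delete(struct fam_a *p_a)
{
    free(p_a);
}
```

Members are then accessed with ordinary syntax such as p_a->refs[i], and the compiler takes care of padding between count and refs.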

Moreover, the roll-your-own-struct approach has potential alignment issues. Depending on the implementation-dependent sizes and alignment requirements of the various types, it may cause the members of the pointer array inside your pseudo-struct to be misaligned. If it does, then either a speed penalty or possibly a program crash will result.
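One way to avoid the misalignment is to round each member's offset up to that member's alignment requirement instead of packing members back to back. The sketch below assumes a C11 compiler (for _Alignof) rather than the C99 mode used in the question, and the names align_up, a_refs_offset, a_count_offset and a_new_aligned are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round offset up to a multiple of align; align must be a power of two. */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Layout: int n_refs | padding | void *refs[n_refs] | padding | int count */
static size_t a_refs_offset(void)
{
    return align_up(sizeof(int), _Alignof(void *));
}

static size_t a_count_offset(size_t n_refs)
{
    return align_up(a_refs_offset() + n_refs * sizeof(void *), _Alignof(int));
}

/* Allocate the whole pseudo-struct with one malloc; returns NULL on failure. */
static void *a_new_aligned(int n_refs, int count)
{
    unsigned char *p;

    assert(n_refs >= 0);
    p = malloc(a_count_offset((size_t)n_refs) + sizeof(int));
    if (p == NULL)
        return NULL;
    *(int *)p = n_refs;
    *(int *)(p + a_count_offset((size_t)n_refs)) = count;
    return p;
}
```

Because malloc returns memory suitably aligned for any object type, every member placed at one of these offsets is correctly aligned. The cost is a few padding bytes on platforms where int is smaller than the alignment of void *.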

More generally, there are few safe assumptions about the sizes of type representations. It is unsafe to assume that the size of int is the same as the size of void *, or that either one is the same size as double.

