c++ - How to conditionally set compiler optimization for template headers


I found a question interesting and went on to attempt to answer it. The author wants to compile -one- source file (which relies on template libraries) with AVX optimizations, and the rest of the project without those.

So, to see what would happen, I've created a test project like this:

main.cpp

    #include <iostream>
    #include <string>
    #include "fn_normal.h"
    #include "fn_avx.h"

    int main(int argc, char* argv[])
    {
        int number = 10; // this would come from input, but let's keep it simple
        int result;

        if (std::string(argv[argc - 1]) == "--noavx")
            result = fnnormal(number);
        else
        {
            std::cout << "avx selected\n";
            result = fnavx(number);
        }

        std::cout << "double of " << number << " is " << result << std::endl;

        return 0;
    }

The files fn_normal.h and fn_avx.h contain the declarations of the functions fnnormal() and fnavx() respectively, which are defined as follows:

fn_normal.cpp

    #include "fn_normal.h"
    #include "double.h"

    int fnnormal(int num)
    {
        return rtdouble(num);
    }

fn_avx.cpp

    #include "fn_avx.h"
    #include "double.h"

    int fnavx(int num)
    {
        return rtdouble(num);
    }

And here's the template function definition:

double.h

    template<typename T>
    int rtdouble(T number)
    {
        // side effect: generates avx instructions
        const int N = 1000;
        float a[N], b[N];
        for (int n = 0; n < N; ++n)
        {
            a[n] = b[n] * b[n] * b[n];
        }
        return number * 2;
    }


Ultimately, I set the Enhanced Instruction Set to AVX for the file fn_avx.cpp under "Properties -> C/C++ -> Code Generation", leaving it Not Set for the other sources, which should default to SSE2.

I thought that by doing so, the compiler would instantiate the template once for each source that includes it (and avoid violating "the one-definition rule" by mangling the template function name or in some other way), and that calling the program with the --noavx parameter would make it run fine on CPUs without AVX support.
But the resulting program actually has only one machine-code version of the function, with AVX instructions, and will fail on older CPUs.

Disabling all other optimizations doesn't solve the issue. I tried No Enhanced Instructions - /arch:IA32 instead of Not Set as well.

As I'm just beginning to understand templates and such, could someone point me to the exact details of this behavior and how I can achieve my goal?

My compiler is MSVC 2013.


Additional info: the .obj files for both fn_normal.cpp and fn_avx.cpp are the same size in bytes. I've looked at the generated assembly listings and they are the same, with the important difference that the AVX-enabled source replaces the default SSE's movss/mulss with vmovss and vmulss, respectively. But stepping through the code in Visual Studio's Disassembly view (Ctrl+Alt+D) confirms that fnnormal() indeed makes use of the AVX specialized instructions.

The compiler will generate 2 objects (fn_avx.obj and fn_normal.obj), compiled with different instruction sets. As you said, outputting the disassembly for both verifies that this is being done correctly:

objdump -d fn_normal.obj:

    ...
    movss  -0x1f5c(%ebp,%eax,4),%xmm0
    mulss  -0x1f5c(%ebp,%ecx,4),%xmm0
    mov    -0x1f68(%ebp),%edx
    mulss  -0x1f5c(%ebp,%edx,4),%xmm0
    mov    -0x1f68(%ebp),%eax
    movss  %xmm0,-0xfb4(%ebp,%eax,4)
    ...

objdump -d fn_avx.obj:

    ...
    vmovss -0x1f5c(%ebp,%eax,4),%xmm0
    vmulss -0x1f5c(%ebp,%ecx,4),%xmm0,%xmm0
    mov    -0x1f68(%ebp),%edx
    vmulss -0x1f5c(%ebp,%edx,4),%xmm0,%xmm0
    mov    -0x1f68(%ebp),%eax
    vmovss %xmm0,-0xfb4(%ebp,%eax,4)
    ...

The two are strikingly similar, because by default MSVC 2013 will assume SSE2 availability. If you change the instruction set to IA32, you'll get non-vector instructions. So, this is not an issue with the compiler or the compilation unit.

The issue here is that rtdouble is defined in a header file as a non-specialized template (perfectly legal). The compiler assumes that this definition will be the same across multiple translation units, but, by compiling with different options, this assumption is being violated. It's no different from introducing a divergence with the preprocessor:

double.h:

    template<typename T>
    int rtdouble(T number)
    {
    #ifdef super_bad
        // side effect: generates avx instructions
        const int N = 1000;
        float a[N], b[N];
        for (int n = 0; n < N; ++n)
        {
            a[n] = b[n] * b[n] * b[n];
        }
        return number * 2;
    #else
        return 0;
    #endif
    }

fn_avx.cpp:

    #include "fn_avx.h"
    #define super_bad
    #include "double.h"

    int fnavx(int num)
    {
        return rtdouble(num);
    }

Then fnnormal will just return 0 (and you can verify this with the disassembly of the new fn_normal.obj). The linker happily chooses one definition, and does not warn you in either situation. The question then comes down to: should it? It would be extremely helpful in situations like this. However, it would also slow down linking, as it would need to do a comparison of all of the functions which exist in multiple compilation units (e.g. inline functions as well).

When I have faced a similar issue in my code, I chose a different function naming scheme for the optimized version vs. the non-optimized version. Using a template parameter to distinguish them would work just as well (as suggested in @celtschk's answer).
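As a rough sketch of the template-parameter idea, assuming a tag type per instruction-set flavor: the names normal_tag and avx_tag are made up for illustration and are not taken from the question or from @celtschk's answer. Because the tag is part of the specialization, the two instantiations get different mangled names, so the linker has nothing to merge. Everything is shown in one file here; in the real project the two wrappers would stay in fn_normal.cpp and fn_avx.cpp, each built with its own /arch setting.

```cpp
// Hypothetical tag types, one per instruction-set flavor (assumed names).
struct normal_tag {};
struct avx_tag {};

// Same body as rtdouble in double.h, plus a leading tag parameter.
// rtdouble<normal_tag, int> and rtdouble<avx_tag, int> are distinct
// specializations, so each translation unit emits its own symbol.
template<typename Tag, typename T>
int rtdouble(T number)
{
    const int N = 1000;
    float a[N], b[N] = {};   // b is zero-initialized in this sketch
    for (int n = 0; n < N; ++n)
    {
        a[n] = b[n] * b[n] * b[n];
    }
    (void)a;                 // silence the unused-variable warning
    return number * 2;
}

// Would live in fn_normal.cpp (compiled without /arch:AVX).
int fnnormal(int num)
{
    return rtdouble<normal_tag>(num);   // T is deduced from num
}

// Would live in fn_avx.cpp (compiled with /arch:AVX).
int fnavx(int num)
{
    return rtdouble<avx_tag>(num);
}
```

The callers in main.cpp stay unchanged; only the two wrapper sources name a tag. The price is that every template reachable from the AVX source needs the same treatment, otherwise any shared helper can still be merged by the linker.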

