What's the common strategy to optimize C++ arithmetic computation for arrays?


For example, I have 3 float arrays, a, b and c, and I want to add a and b element-wise into c. The naive way is like:

for (int i = 0; i < n; i++) { c[i] = a[i] + b[i]; }

As far as I know, OpenMP can parallelize this piece of code. In the OpenCV code, I see flags such as CV_SSE2 and CV_NEON that are related to optimization.

What's the common way to optimize this kind of code, if I want the code to be highly efficient?

There is no common strategy. You should first be sure that this loop is a bottleneck (which it might not be, if the size n of the arrays is small enough).
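To check whether the loop is worth optimizing at all, it can be timed first. The following is only a rough sketch using std::chrono; the names add_arrays and time_add_us are hypothetical, and a micro-benchmark like this is no replacement for profiling the real program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// The naive element-wise addition from the question.
void add_arrays(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: average wall-clock time of one add_arrays call,
// in microseconds, over `repeats` runs on arrays of size n.
double time_add_us(std::size_t n, int repeats) {
    assert(n > 0 && repeats > 0);
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
    volatile float sink = 0.0f;  // discourages the compiler from dropping the result

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        add_arrays(a.data(), b.data(), c.data(), n);
        sink = c[n / 2];
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;

    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

Calling time_add_us for a few realistic values of n, and comparing the result with the total run time of the program, gives a first idea of whether this loop matters.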

Some compilers are able to optimize that (at least in simple cases) using vector machine instructions. With GCC, try to compile with gcc -O3 -mtune=native (or other -mtune=... or -mfpu=... arguments, in particular if cross-compiling) and possibly -ffast-math
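As an illustration of a loop the compiler can usually vectorize by itself, here is a minimal sketch. The function name add_arrays_restrict and the RESTRICT macro are made up for this example; __restrict is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++, and it promises that the three arrays do not overlap.

```cpp
#include <cassert>
#include <cstddef>

// __restrict is a non-standard extension; fall back to nothing elsewhere.
#if defined(__GNUC__) || defined(__clang__) || defined(_MSC_VER)
#define RESTRICT __restrict
#else
#define RESTRICT
#endif

// Element-wise c = a + b. The caller must guarantee that a, b and c
// do not alias each other, otherwise the behavior is undefined.
void add_arrays_restrict(const float* RESTRICT a,
                         const float* RESTRICT b,
                         float* RESTRICT c,
                         std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && c != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];  // simple, branch-free body: a good candidate for SIMD
    }
}
```

With GCC, adding -fopt-info-vec to the command line should report whether the loop was vectorized, so there is no need to guess.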

You could consider OpenMP, OpenCL (with GPGPU), OpenACC, MPI, explicit threading e.g. pthreads or C++11 std::thread-s, etc... (and a clever mix of several approaches)
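To give an idea of the explicit threading option, here is a minimal sketch with C++11 std::thread that splits the index range into contiguous chunks. The function add_arrays_threaded and its num_threads parameter are hypothetical, chosen only for this example; for small arrays the cost of starting threads will probably exceed any gain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: computes c = a + b element-wise, with the index
// range divided into at most num_threads contiguous chunks.
void add_arrays_threaded(const std::vector<float>& a,
                         const std::vector<float>& b,
                         std::vector<float>& c,
                         unsigned num_threads) {
    assert(a.size() == b.size() && a.size() == c.size());
    assert(num_threads > 0);

    const std::size_t n = a.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no elements left for further threads
        // Each worker writes a disjoint slice of c, so no locking is needed.
        workers.emplace_back([&a, &b, &c, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                c[i] = a[i] + b[i];
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
}
```

On Linux with GCC this typically needs the -pthread option. A reasonable starting value for num_threads is std::thread::hardware_concurrency(), which may return 0 when the value is unknown, so it has to be checked.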

I would leave the optimization to the compiler, and only consider improving it if measurements show it is a bottleneck. You could spend months or years (or even specialize for your whole working life) of developer time to improve it.

You could also use a numerical computation library (e.g. LAPACK, GSL, etc...) or specialized software like Scilab, Octave, R, etc...

Read also http://floating-point-gui.de/

