What's the common strategy to optimize C++ arithmetic computation for arrays?
For example, I have 3 float arrays, a, b, and c, and I want to add a and b element-wise into c. The naive way is like this:
for (int i = 0; i < n; i++) { c[i] = a[i] + b[i]; }
As far as I know, OpenMP can parallelize this piece of code. In the OpenCV code, I see flags such as CV_SSE2 and CV_NEON that are related to optimization.
What's the common way to optimize these kinds of code, if I want the code to be highly efficient?
There is no common strategy. You should first be sure that this loop is a bottleneck (which it might not be, if the size n of the arrays is small enough).
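One way to check whether the loop matters at all is to time it for the array sizes you actually use. The following is only a minimal sketch: `add_naive` and `seconds_per_call` are hypothetical names made up for this illustration, and a real measurement would more likely come from a profiler or a benchmarking framework.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Naive element-wise addition, as in the question.
void add_naive(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: average wall-clock seconds per call of add_naive
// for arrays of size n, measured over `repeats` runs.
double seconds_per_call(std::size_t n, int repeats) {
    if (n == 0 || repeats <= 0) {
        return 0.0;  // nothing to measure
    }
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        add_naive(a.data(), b.data(), c.data(), n);
    }
    const auto stop = std::chrono::steady_clock::now();

    // Read one result through a volatile so the work is less likely to be
    // optimized away entirely; an aggressive optimizer may still collapse
    // the repeated calls, so treat the number as a rough indication only.
    volatile float sink = c[n - 1];
    (void)sink;

    return std::chrono::duration<double>(stop - start).count() / repeats;
}
```

If `seconds_per_call(n, 1000)` is negligible compared with the rest of the program for your typical n, the loop is probably not worth optimizing by hand.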
Some compilers are able to optimize that (at least in simple cases) using vector machine instructions. With GCC, try to compile with gcc -O3 -mtune=native (or other -mtune=... or -mfpu=... arguments, in particular if cross-compiling), and possibly -ffast-math.
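As a sketch of a loop shape that compilers can usually auto-vectorize, the version below tells the compiler that the three arrays do not overlap. `add_arrays` is a hypothetical name, and `__restrict` is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++. Whether vector instructions are really emitted depends on the compiler, the flags and the target CPU.

```cpp
#include <cstddef>

// Element-wise c[i] = a[i] + b[i].
// __restrict (a non-standard extension) promises that a, b and c do not
// alias each other, which removes one common obstacle to vectorization.
// The caller must honor that promise: passing overlapping arrays is
// undefined behavior.
void add_arrays(const float* __restrict a,
                const float* __restrict b,
                float* __restrict c,
                std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

With GCC, compiling this with something like g++ -O3 -mtune=native -fopt-info-vec should print a note for each loop that was vectorized, which is a cheap way to confirm what the compiler did before trying anything more elaborate.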
You could also consider OpenMP, OpenCL (with GPGPU), OpenACC, MPI, explicit threading e.g. pthreads or C++11 std::thread-s, etc... (and a clever mix of several approaches).
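Of these, OpenMP is probably the smallest change to the original loop. The sketch below assumes a compiler with OpenMP support enabled (e.g. g++ -O3 -fopenmp); `add_parallel` is a hypothetical name. Without the OpenMP flag the pragma is simply ignored and the loop runs serially, so the result is the same either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise c[i] = a[i] + b[i], with iterations split across threads
// when OpenMP is enabled at compile time.
void add_parallel(const std::vector<float>& a,
                  const std::vector<float>& b,
                  std::vector<float>& c) {
    assert(a.size() == c.size() && b.size() == c.size());

    const float* pa = a.data();
    const float* pb = b.data();
    float* pc = c.data();

    // A signed loop index is used because older OpenMP implementations
    // only accept signed integer loop variables.
    const long long n = static_cast<long long>(c.size());

    // Each iteration writes a distinct c[i], so there is no data race.
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        pc[i] = pa[i] + pb[i];
    }
}
```

For a loop this simple the work per element is tiny and memory bandwidth tends to be the limit, so threads may only pay off for large n; measuring both variants is the only reliable way to tell.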
I would leave the optimization to the compiler, and only consider improving it if you measure that it is a bottleneck. You could spend months or years (or even specialize your whole work life) of developer time to improve it ....
You could also use a numerical computation library (e.g. LAPACK, GSL, etc...) or specialized software such as Scilab, Octave, R, etc...