parallel processing - With OpenMP parallelized nested loops run slow -


i've got part of fortran program consisting of nested loops want parallelize openmp.

integer :: nstates , n, i, dima, dimb, dimc, a_row, b_row, b_col, c_row, row, col double complex, dimension(4,4):: mat double complex, dimension(:), allocatable :: vecin,vecout   nstates = 2 n = 24  allocate(vecin(nstates**n), vecout(nstates**n)) vecin = ...some data vecout = 0  mat = reshape([...some data...],[4,4])  dimb=nstates**2  !$omp parallel private(dima,dimc,row,col,a_row,b_row,c_row,b_col)  i=1,n-1     dima=nstates**(i-1)     dimc=nstates**(n-i-1)      a_row = 1, dima         b_row = 1,dimb             c_row = 1,dimc                 row = ((a_row-1)*dimb + b_row - 1)*dimc + c_row                 b_col = 1,dimb                     col = ((a_row-1)*dimb + b_col - 1)*dimc + c_row                     !$omp atomic                     vecout(row) = vecout(row) + vecin(col)*mat(b_row,b_col)                 end             end         end     end end !$omp end parallel  

the program runs , result correct, it's incredible slow. slower without openmp. don't know openmp. have done wrong use of private or omp atomic? grateful every advice how improve performance of code.

if arrays large , stack overflows automatic reduction, can implement reduction allocatable temporary arrays.

as francois jacq pointed out, have race condition caused dima , dimb should private.

double complex, dimension(:), allocatable :: tmp  !$omp parallel private(dima,dimb,row,col,a_row,b_row,c_row,b_col,tmp)  allocate(tmp(size(vecout))) tmp = 0  !$omp do i=1,n-1     dima=nstates**(i-1)     dimc=nstates**(n-i-1)      a_row = 1, dima         b_row = 1,dimb             c_row = 1,dimc                 row = ((a_row-1)*dimb + b_row - 1)*dimc + c_row                 b_col = 1,dimb                     col = ((a_row-1)*dimb + b_col - 1)*dimc + c_row                     tmp(row) = tmp(row) + vecin(col)*mat(b_row,b_col)                 end             end         end     end end !$omp end  !$omp critical vecout = vecout + tmp !$omp end critical !$omp end parallel 

Comments

Popular posts from this blog

javascript - AngularJS custom datepicker directive -

javascript - jQuery date picker - Disable dates after the selection from the first date picker -