parallel processing - Nested loops parallelized with OpenMP run slow
I've got a part of a Fortran program, consisting of nested loops, that I want to parallelize with OpenMP.
integer :: nstates, n, i, dima, dimb, dimc, a_row, b_row, b_col, c_row, row, col
double complex, dimension(4,4) :: mat
double complex, dimension(:), allocatable :: vecin, vecout

nstates = 2
n = 24
allocate(vecin(nstates**n), vecout(nstates**n))
vecin = ...some data
vecout = 0
mat = reshape([...some data...],[4,4])
dimb = nstates**2

!$omp parallel do private(dima,dimc,row,col,a_row,b_row,c_row,b_col)
do i = 1, n-1
   dima = nstates**(i-1)
   dimc = nstates**(n-i-1)
   do a_row = 1, dima
      do b_row = 1, dimb
         do c_row = 1, dimc
            row = ((a_row-1)*dimb + b_row - 1)*dimc + c_row
            do b_col = 1, dimb
               col = ((a_row-1)*dimb + b_col - 1)*dimc + c_row
               !$omp atomic
               vecout(row) = vecout(row) + vecin(col)*mat(b_row,b_col)
            end do
         end do
      end do
   end do
end do
!$omp end parallel do
The program runs and the result is correct, but it is incredibly slow, slower than without OpenMP. I don't know OpenMP well. Have I made a mistake in the use of private or omp atomic? I would be grateful for any advice on how to improve the performance of this code.
The !$omp atomic on every single update of vecout forces the threads to synchronize on each element, which easily makes the parallel version slower than the serial one; the usual fix is a reduction. If the arrays are so large that an automatic reduction (a reduction(+:vecout) clause) overflows the thread stacks, you can implement the reduction yourself with allocatable temporary arrays.
As Francois Jacq pointed out, dima and dimc are assigned inside the parallel loop, so they must be private to avoid a race condition; dimb is set before the parallel region and only read, so it can stay shared.
double complex, dimension(:), allocatable :: tmp

!$omp parallel private(dima,dimc,row,col,a_row,b_row,c_row,b_col,tmp)
allocate(tmp(size(vecout)))
tmp = 0
!$omp do
do i = 1, n-1
   dima = nstates**(i-1)
   dimc = nstates**(n-i-1)
   do a_row = 1, dima
      do b_row = 1, dimb
         do c_row = 1, dimc
            row = ((a_row-1)*dimb + b_row - 1)*dimc + c_row
            do b_col = 1, dimb
               col = ((a_row-1)*dimb + b_col - 1)*dimc + c_row
               ! no atomic needed: each thread accumulates into its own tmp
               tmp(row) = tmp(row) + vecin(col)*mat(b_row,b_col)
            end do
         end do
      end do
   end do
end do
!$omp end do
! combine the per-thread partial results, one thread at a time
!$omp critical
vecout = vecout + tmp
!$omp end critical
!$omp end parallel
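For comparison, the automatic reduction mentioned above would use a reduction clause on vecout instead of the explicit tmp array. This is only a sketch, assuming the same declarations as in the question and a compiler whose OpenMP implementation supports reductions on allocatable arrays; each thread then gets a private copy of vecout, typically on its stack, which is exactly what can overflow for nstates**n elements:

!$omp parallel do private(dima,dimc,row,col,a_row,b_row,c_row,b_col) &
!$omp reduction(+:vecout)
do i = 1, n-1
   dima = nstates**(i-1)
   dimc = nstates**(n-i-1)
   do a_row = 1, dima
      do b_row = 1, dimb
         do c_row = 1, dimc
            row = ((a_row-1)*dimb + b_row - 1)*dimc + c_row
            do b_col = 1, dimb
               col = ((a_row-1)*dimb + b_col - 1)*dimc + c_row
               ! each thread updates its own private copy of vecout;
               ! the copies are summed automatically at the end of the loop
               vecout(row) = vecout(row) + vecin(col)*mat(b_row,b_col)
            end do
         end do
      end do
   end do
end do
!$omp end parallel do

The manual version does the same thing, but places the per-thread copy on the heap with allocate and performs the final summation itself in a critical section, so it keeps working when vecout is too large for the thread stacks.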