Improving the execution time of matrix calculations in Python

April 15, 2010

I work with a large amount of data, so the execution time of this piece of code is very important. The results of each iteration are interdependent, so it's hard to run it in parallel. It would be awesome if there were a faster way to implement some parts of this code, like:

- finding the max element in the matrix and its indices
- changing the values in a row/column to the max of two rows/columns
- removing a specific row and column
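For reference, here is a minimal sketch of how these three operations are usually written with NumPy. The question says a NumPy attempt was slower but does not show which calls it used, so this is only one possible formulation and not a measured improvement; the sample matrix is made up for illustration.

```python
import numpy as np

# Hypothetical 4x4 lower triangular weights matrix (illustration only).
weights = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0],
    [0.8, 0.1, 0.0, 0.0],
    [0.2, 0.5, 0.4, 0.0],
])

# 1. Max element and its (row, column) indices in a single pass.
max_i, max_j = np.unravel_index(np.argmax(weights), weights.shape)
max_element = weights[max_i, max_j]

# 2. Element-wise max of two rows, stored in the row with the smaller index.
weights[max_j, :] = np.maximum(weights[max_j, :], weights[max_i, :])

# 3. Remove row max_i and column max_i. np.delete returns a new array,
#    so every call copies the remaining data.
smaller = np.delete(np.delete(weights, max_i, axis=0), max_i, axis=1)
```

Because `np.delete` copies the array on every call, deleting a row and a column in each iteration can easily cost more than the list version, which may be one reason a direct translation to NumPy ends up slower.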

Filling the weights matrix is already pretty fast.

The code does the following:

It contains a list of lists of words, word_list, with count elements in it. At the beginning, each word is in a separate list.
It contains a two-dimensional list (count x count) of float values, weights (a lower triangular matrix; the values for i <= j are zeros).
In each iteration it does the following:
- It finds the two words with the most similar value (the max element in the matrix and its indices).
- It merges their rows and columns, saving the larger of the two values in each cell.
- It merges the corresponding word lists in word_list. It saves both lists under the smaller index (max_j) and removes the one with the larger index (max_i).
It stops if the largest value is less than a given threshold.
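The steps above can be sketched with NumPy in a way that avoids deleting rows and columns: the matrix is mirrored into a symmetric one, and a merged cluster is retired by overwriting its row and column with -inf. This is an assumption-laden sketch, not the code from the question: the function name `merge_clusters` is hypothetical, and it assumes that weights are non-negative and that ties may be broken in any order.

```python
import numpy as np

def merge_clusters(weights, word_list, threshold):
    """Greedily merge the two most similar clusters until the best
    similarity drops below threshold (hypothetical helper)."""
    if not word_list:
        return []
    w = np.array(weights, dtype=float)
    w = np.maximum(w, w.T)            # mirror the lower triangle
    np.fill_diagonal(w, -np.inf)      # never merge a cluster with itself
    clusters = [list(words) for words in word_list]

    while True:
        i, j = np.unravel_index(np.argmax(w), w.shape)
        if w[i, j] < threshold:
            break
        lo, hi = int(min(i, j)), int(max(i, j))

        # Keep the larger of the two similarities in each cell.
        merged = np.maximum(w[lo], w[hi])
        w[lo, :] = merged
        w[:, lo] = merged
        w[lo, lo] = -np.inf

        # Retire the cluster with the larger index instead of deleting it.
        w[hi, :] = -np.inf
        w[:, hi] = -np.inf

        clusters[lo].extend(clusters[hi])
        clusters[hi] = None

    return [c for c in clusters if c is not None]


example = [[0.0, 0.0, 0.0],
           [0.9, 0.0, 0.0],
           [0.1, 0.2, 0.0]]
print(merge_clusters(example, [["a"], ["b"], ["c"]], threshold=0.5))
```

Each iteration still scans the whole matrix for the maximum, so the overall cost stays cubic in the number of words; the sketch only moves the per-iteration work into vectorized calls and removes the repeated copying.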

I might think of a different algorithm for this task, but I have no ideas right now, and it would be great if there were at least a small performance improvement.

I tried using NumPy, but it performed worse.

    weights = fill_matrix(count, n, word_list)
    while 1:
        # find the max element in the matrix and its indices
        max_element = 0
        for i in range(count):
            max_e = max(weights[i])
            if max_e > max_element:
                max_element = max_e
                max_i = i
                max_j = weights[i].index(max_e)

        if max_element < threshold:
            break

        # reset the value of the max element
        weights[max_i][max_j] = 0

        # here it is important that max_j is always less than max_i
        # (since it's a lower triangular matrix)
        for j in range(count):
            weights[max_j][j] = max(weights[max_i][j], weights[max_j][j])

        for i in range(count):
            weights[i][max_j] = max(weights[i][max_j], weights[i][max_i])

        # compare the symmetrical elements, set the ones above the diagonal to 0
        for i in range(count):
            for j in range(count):
                if i <= j:
                    if weights[i][j] > weights[j][i]:
                        weights[j][i] = weights[i][j]
                    weights[i][j] = 0

        # remove the max_i-th column
        for i in range(len(weights)):
            weights[i].pop(max_i)

        # remove the max_i-th row
        weights.pop(max_i)

        new_list = word_list[max_j]
        new_list += word_list[max_i]
        word_list[max_j] = new_list

        # remove the element that was merged into the cluster
        word_list.pop(max_i)
        count -= 1
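One small change that stays in pure Python: since only the cells with j < i can be non-zero, the max search could look at just the part of each row below the diagonal. The helper name `find_max_lower` is hypothetical, and whether this is faster in practice would need to be timed on real data.

```python
def find_max_lower(weights):
    """Return (max_element, max_i, max_j) over the cells with j < i.

    Returns (0, -1, -1) when no cell is greater than zero.
    """
    best, best_i, best_j = 0, -1, -1
    for i, row in enumerate(weights):
        if i == 0:
            continue                  # row 0 has nothing below the diagonal
        lower = row[:i]               # only the cells with j < i
        row_max = max(lower)
        if row_max > best:
            best, best_i, best_j = row_max, i, lower.index(row_max)
    return best, best_i, best_j


sample = [[0.0, 0.0, 0.0, 0.0],
          [0.3, 0.0, 0.0, 0.0],
          [0.8, 0.1, 0.0, 0.0],
          [0.2, 0.5, 0.4, 0.0]]
print(find_max_lower(sample))
```

This scans roughly half of the cells, at the price of one slice copy per row.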

It depends on how much work you want to put into it, but if you're concerned about speed you should look into Cython. The quick start tutorial gives a few examples, ranging from a 35% speedup to an amazing 150x speedup (with some added effort on your part).
