python - Multiple Sort Lists of Strings


I have a list of a person's "interests" that looks like this:

    [u'technology and computing', u'software', u'shareware and freeware']
    [u'art and entertainment', u'shows and events', u'festival']
    [u'art and entertainment', u'shows and events', u'circus']
    [u'technology and computing', u'computer certification']
    [u'news']
    [u'religion and spirituality', u'islam']

These taxonomies are the output of an NLP API, and I'm trying to run further analyses to draw higher-level conclusions about the sorts of things this person is interested in, based on things like how often item[0]=='art and entertainment' appears and, if so, what specific types of arts and entertainment they're interested in (e.g. if item[0]=='art and entertainment': return item[:-1]).
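For the counting part of this goal, a minimal sketch with collections.Counter may help, assuming Python 3 and the sample data above. Note one assumption: it treats item[1:] (everything below the top-level label) as the "specific types", which differs from the item[:-1] in the question; adjust the slice to whatever you actually need.

```python
from collections import Counter

# Sample data from the question (Python 3 strings, so no u'' prefix).
interests = [
    ['technology and computing', 'software', 'shareware and freeware'],
    ['art and entertainment', 'shows and events', 'festival'],
    ['art and entertainment', 'shows and events', 'circus'],
    ['technology and computing', 'computer certification'],
    ['news'],
    ['religion and spirituality', 'islam'],
]

# How often each top-level category appears.
top_level = Counter(item[0] for item in interests)

# Within one top-level category, count the more specific labels.
# ASSUMPTION: item[1:] drops the top-level label and keeps the subcategories.
art_subtypes = Counter(
    label
    for item in interests
    if item[0] == 'art and entertainment'
    for label in item[1:]
)

print(top_level.most_common())
print(art_subtypes.most_common())
```

No padding is needed here, because each row is only read as far as it goes.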

Anyway, I'd appreciate some guidance on the approach. My first thought was to calculate the max(len()) of the items in the list (in this case 5), and then pad each one:

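    max_len = max(len(item) for item in interests)
    for item in interests:
        # pad shorter items with 'null' so every item has max_len entries
        item.extend(['null'] * (max_len - len(item)))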

That way they would all have the same number of "columns", and I could convert them to named tuples and multi-sort on those. That seems like an annoying process, though. Is there a simpler (yet readable) way to handle this?
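If the goal is just sorting, padding may not be necessary at all. In Python 3, lists compare lexicographically element by element, and a list that is a prefix of a longer one sorts first, so rows of different lengths can be passed to sorted() as-is. A minimal sketch, assuming Python 3; the custom key (top-level category, then deeper rows first, then the leaf label) is only an illustrative example of a multi-key sort, not something from the question.

```python
# Rows of unequal length; sorted() handles them without padding.
interests = [
    ['technology and computing', 'software', 'shareware and freeware'],
    ['art and entertainment', 'shows and events', 'festival'],
    ['art and entertainment', 'shows and events', 'circus'],
    ['technology and computing', 'computer certification'],
    ['news'],
    ['religion and spirituality', 'islam'],
]

# Default ordering: compare column 0, then column 1, and so on.
# A shorter row matching a longer row's prefix sorts first.
by_path = sorted(interests)

# Illustrative multi-key sort: top-level category ascending,
# then deeper rows first (negated length), then the leaf label.
by_custom = sorted(interests, key=lambda row: (row[0], -len(row), row[-1]))

for row in by_path:
    print(row)
```

A key function returning a tuple plays the role the named tuple would have played, without building one.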

I've thought about using NLTK or something similar, but it seems like a big pain to set up, even if it would make the analysis easier once I did.

You can use itertools.izip_longest to zip the lists together, which gives you a list containing the columns of the main list, with missing elements replaced by None:

    >>> from itertools import izip_longest
    >>> a = [[u'technology and computing', u'software', u'shareware and freeware'],
    ...      [u'art and entertainment', u'shows and events', u'festival'],
    ...      [u'art and entertainment', u'shows and events', u'circus'],
    ...      [u'technology and computing', u'computer certification'],
    ...      [u'news'],
    ...      [u'religion and spirituality', u'islam']]
    >>> list(izip_longest(*a))
    [(u'technology and computing', u'art and entertainment', u'art and entertainment', u'technology and computing', u'news', u'religion and spirituality'),
     (u'software', u'shows and events', u'shows and events', u'computer certification', None, u'islam'),
     (u'shareware and freeware', u'festival', u'circus', None, None, None)]
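izip_longest is the Python 2 name; in Python 3 the same function is itertools.zip_longest. A minimal Python 3 sketch of the transposition above, also showing the fillvalue parameter (the 'null' placeholder echoes the question and is just an example value):

```python
from itertools import zip_longest

a = [
    ['technology and computing', 'software', 'shareware and freeware'],
    ['art and entertainment', 'shows and events', 'festival'],
    ['art and entertainment', 'shows and events', 'circus'],
    ['technology and computing', 'computer certification'],
    ['news'],
    ['religion and spirituality', 'islam'],
]

# Transpose rows into columns; short rows are padded with None by default.
columns = list(zip_longest(*a))

# Same transposition, padding with a placeholder of your choice instead.
columns_null = list(zip_longest(*a, fillvalue='null'))

# Transposing the padded columns back gives equal-length rows.
rows_padded = [list(row) for row in zip(*columns)]
```

Because every column now has the same length, the plain built-in zip is enough for the second transposition.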

Then you can do whatever operations you want on the columns!
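As one example of a column operation, here is a sketch (assuming Python 3) that counts the labels at each depth of the taxonomy, skipping the None padding that zip_longest inserts:

```python
from collections import Counter
from itertools import zip_longest

a = [
    ['technology and computing', 'software', 'shareware and freeware'],
    ['art and entertainment', 'shows and events', 'festival'],
    ['art and entertainment', 'shows and events', 'circus'],
    ['technology and computing', 'computer certification'],
    ['news'],
    ['religion and spirituality', 'islam'],
]

# One Counter per column (taxonomy depth), ignoring the None padding.
per_level = [
    Counter(label for label in column if label is not None)
    for column in zip_longest(*a)
]

for depth, counts in enumerate(per_level):
    print(depth, counts.most_common())
```

Index 0 holds the top-level categories and later indexes hold progressively more specific labels.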

But if you just want to add None to the incomplete lists, you can use itertools.repeat:

    >>> max_len = max(map(len, a))
    >>> from itertools import repeat
    >>> [i + list(repeat(None, max_len - len(i))) for i in a]
    [[u'technology and computing', u'software', u'shareware and freeware'],
     [u'art and entertainment', u'shows and events', u'festival'],
     [u'art and entertainment', u'shows and events', u'circus'],
     [u'technology and computing', u'computer certification', None],
     [u'news', None, None],
     [u'religion and spirituality', u'islam', None]]
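In Python 3 the same padding can also be written with list multiplication. One caveat if you plan to sort the padded rows: Python 3 will not compare None with a str, so sorted() raises TypeError whenever two rows tie up to a padded position. A minimal sketch, assuming Python 3, using a hypothetical sort_key that maps None to an empty string for comparison only:

```python
a = [
    ['technology and computing', 'software', 'shareware and freeware'],
    ['art and entertainment', 'shows and events', 'festival'],
    ['art and entertainment', 'shows and events', 'circus'],
    ['technology and computing', 'computer certification'],
    ['news'],
    ['religion and spirituality', 'islam'],
]

max_len = max(map(len, a))

# Pad each row on the right with None up to the longest row's length.
padded = [row + [None] * (max_len - len(row)) for row in a]

def sort_key(row):
    # Hypothetical helper: compare None as '' so None never meets a str.
    # '' sorts before any non-empty string, so padded rows come first
    # among rows sharing the same prefix.
    return [label if label is not None else '' for label in row]

ordered = sorted(padded, key=sort_key)
```

The padded rows themselves keep their None values; only the comparison uses the substituted empty strings.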
