csv - Using Unstack in Python -


I am trying to unstack a column in Python, but it isn't quite doing what I am expecting. My table (called df) looks similar to this:

    station_id   year     day1   day2
    210018       1916      4        7
                 1917      3        9
    256700       1916     nan       8
                 1917      6        9

I want to unstack the year so that all the days of every year are in one row per station. The 2 days of 1916 would come first, followed by the 2 days of 1917, for station 210018 and then for 256700.

An example would be this:

    station_id     1916         1917
    210018           4    7       3    9
    256700         nan    8       6    9

I am trying to use this code:

    df2 = df.unstack(level='year')
    df2.columns = df2.columns.swaplevel(0,1)
    df2 = df2.sort(axis=1)

I get an error that says AttributeError: 'Series' object has no attribute 'columns'.

Any help would be appreciated.

You need to make year an index level before you call unstack:

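    try:
        # Python 2
        from cStringIO import StringIO
    except ImportError:
        # Python 3
        from io import StringIO
    import pandas as pd

    text = '''\
    station_id   year     day1   day2
    210018       1916      4        7
    210018       1917      3        9
    256700       1916     nan       8
    256700       1917      6        9'''

    df = pd.read_table(StringIO(text), sep=r'\s+')
    # Both station_id and year must be index levels for unstack to find 'year'
    df = df.set_index(['station_id', 'year'])
    df2 = df.unstack(level='year')
    # Put year on the outer column level, then group the days under each year
    df2.columns = df2.columns.swaplevel(0, 1)
    # Originally df2.sort(axis=1); DataFrame.sort was removed in later pandas releases
    df2 = df2.sort_index(axis=1)
    print(df2)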

yields

    year        1916      1917
                day1 day2 day1 day2
    station_id
    210018         4    7    3    9
    256700       NaN    8    6    9

Whereas, if year is a column, and not an index level, then

    df = pd.read_table(StringIO(text), sep=r'\s+')
    df = df.set_index(['station_id'])    # year stays an ordinary column
    df2 = df.unstack(level='year')
    df2.columns = df2.columns.swaplevel(0, 1)
    df2 = df2.sort_index(axis=1)

leads to AttributeError: 'Series' object has no attribute 'columns'.
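To see where the Series comes from, here is a minimal sketch. It assumes a reduced frame with one row per station (so the index is unique) and calls unstack() with no level argument, which sidesteps how different pandas versions treat an unknown level name. When the index is not a MultiIndex, unstack returns a Series rather than a DataFrame, and a Series has no columns attribute:

```python
import pandas as pd

# Reduced, hypothetical version of the table: only the 1916 rows,
# so that station_id is unique in the index.
single = pd.DataFrame(
    {
        "station_id": [210018, 256700],
        "year": [1916, 1916],
        "day1": [4.0, float("nan")],
        "day2": [7.0, 8.0],
    }
).set_index("station_id")   # year is still an ordinary column

result = single.unstack()   # the index has a single level

print(type(result).__name__)
print(hasattr(result, "columns"))
```

So the failing line is not unstack itself but the next one, which assumes the result still has columns.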


The level='year' is ignored in df.unstack(level='year') when df does not have an index level named year (or even, say, blah), at least in the pandas version used here:

    In [102]: df
    Out[102]:
                year  day1  day2
    station_id
    210018      1916     4     7
    210018      1917     3     9
    256700      1916   NaN     8
    256700      1917     6     9

    In [103]: df.unstack(level='blah')
    Out[103]:
          station_id
    year  210018        1916
          210018        1917
          256700        1916
          256700        1917
    day1  210018           4
          210018           3
          256700         NaN
          256700           6
    day2  210018           7
          210018           9
          256700           8
          256700           9
    dtype: float64

This is the source of the surprising error.
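One way to avoid the surprise is to check the index level names before unstacking. The sketch below is only an illustration: unstack_named_level is a hypothetical helper, not part of pandas, and the frame is built directly rather than parsed from text.

```python
import pandas as pd


def unstack_named_level(df, level):
    """Unstack `level` into the columns, failing early if it is not an index level.

    Hypothetical helper for illustration; not a pandas function.
    """
    if level not in df.index.names:
        raise KeyError(
            "%r is not an index level (index levels: %r); "
            "call set_index first" % (level, list(df.index.names))
        )
    wide = df.unstack(level=level)
    # Move the unstacked level to the outer column level and group by it
    wide.columns = wide.columns.swaplevel(0, 1)
    return wide.sort_index(axis=1)


raw = pd.DataFrame(
    {
        "station_id": [210018, 210018, 256700, 256700],
        "year": [1916, 1917, 1916, 1917],
        "day1": [4.0, 3.0, float("nan"), 6.0],
        "day2": [7.0, 9.0, 8.0, 9.0],
    }
)

# year is an index level here, so the helper succeeds
wide = unstack_named_level(raw.set_index(["station_id", "year"]), "year")
print(wide)

# year is only a column here, so the helper raises a descriptive KeyError
try:
    unstack_named_level(raw.set_index("station_id"), "year")
except KeyError as exc:
    print("refused:", exc)
```

The check costs one line and turns a confusing failure further down into an error that names the real problem.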

