python - When using a DataFrame as the value for .fillna(), is an identical shape required?


According to the docs, I can use a DataFrame as the value parameter of .fillna():

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.dataframe.fillna.html

But does the DataFrame need to have an identical shape? If so, why does my first example below give me the desired output?

Using this df:

```
mukey   hzdept_r    hzdepb_r    sandtotal_r silttotal_r
425897      0         61
425897      61        152          5.3         44.7
425911      0         30           30.1        54.9
425911      30        74           17.7        49.8
425911      74        84
```
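pd.read_clipboard() needs the table to be on the clipboard. As a sketch for reproducing the example without it, the same sample can be built with the DataFrame constructor; treating the blank trailing cells in the first and last rows as NaN is an assumption, based on the filled output shown further down.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the blank cells are assumed to be NaN.
df = pd.DataFrame({
    "mukey": [425897, 425897, 425911, 425911, 425911],
    "hzdept_r": [0, 61, 0, 30, 74],
    "hzdepb_r": [61, 152, 30, 74, 84],
    "sandtotal_r": [np.nan, 5.3, 30.1, 17.7, np.nan],
    "silttotal_r": [np.nan, 44.7, 54.9, 49.8, np.nan],
})
print(df)
```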

I can run:

```python
df = pd.read_clipboard()
df1 = df.set_index('mukey')
df1.fillna(df.groupby('mukey').mean(), inplace=True)
```

and df1 ends up as the desired DataFrame:

```
        hzdept_r  hzdepb_r  sandtotal_r  silttotal_r
mukey
425897         0        61          5.3        44.70
425897        61       152          5.3        44.70
425911         0        30         30.1        54.90
425911        30        74         17.7        49.80
425911        74        84         23.9        52.35
```
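The fill values in that output are simply the per-mukey means of the non-missing cells, e.g. (30.1 + 17.7) / 2 = 23.9 and (54.9 + 49.8) / 2 = 52.35 for mukey 425911. A small sketch that checks this, with the sample rebuilt by hand (NaN for the blank cells is an assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mukey": [425897, 425897, 425911, 425911, 425911],
    "hzdept_r": [0, 61, 0, 30, 74],
    "hzdepb_r": [61, 152, 30, 74, 84],
    "sandtotal_r": [np.nan, 5.3, 30.1, 17.7, np.nan],
    "silttotal_r": [np.nan, 44.7, 54.9, 49.8, np.nan],
})

# Aggregated means: one row per mukey. NaN cells are skipped, so each
# group's mean comes only from its non-missing rows.
means = df.groupby("mukey").mean()
print(means[["sandtotal_r", "silttotal_r"]])
```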

However, when I try to run the same fillna code on a larger df (https://www.dropbox.com/s/a6j1dskdq2f76kb/www004.csv?dl=0), it breaks with an InvalidIndexError.

```python
df = pd.read_csv('www004.csv')
df1 = df.set_index('mukey')
df1.fillna(df.groupby('mukey').mean(), inplace=True)
```

Error:

```
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-126-a1038ea351c9> in <module>()
----> 1 df1.fillna(df.groupby('mukey').mean(),inplace=True)

/users/liamfoley/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in fillna(self, value, method, axis, inplace, limit, downcast)
   2410                                              downcast=downcast)
   2411             elif isinstance(value, DataFrame) and self.ndim == 2:
-> 2412                 new_data = self.where(self.notnull(), value)
   2413             else:
   2414                 raise ValueError("invalid fill value %s" % type(value))

/users/liamfoley/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in where(self, cond, other, inplace, axis, level, try_cast, raise_on_error)
   3306                         not all([other._get_axis(i).equals(ax)
   3307                                  for i, ax in enumerate(self.axes)])):
-> 3308                     raise InvalidIndexError
   3309
   3310             # slice me out of other

InvalidIndexError:
```
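The last frame of the traceback shows where() raising InvalidIndexError from a check that compares each axis of the fill frame with the matching axis of df1 using .equals(). The sketch below only illustrates that comparison on the small sample; it does not reproduce the error itself, and newer pandas versions may align the frames instead of raising. It shows that the aggregated means (one row per mukey) fail the comparison, while the same means reindexed onto df1's index pass it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mukey": [425897, 425897, 425911, 425911, 425911],
    "hzdept_r": [0, 61, 0, 30, 74],
    "hzdepb_r": [61, 152, 30, 74, 84],
    "sandtotal_r": [np.nan, 5.3, 30.1, 17.7, np.nan],
    "silttotal_r": [np.nan, 44.7, 54.9, 49.8, np.nan],
})
df1 = df.set_index("mukey")
means = df.groupby("mukey").mean()

# Aggregated index has 2 unique labels; df1's index has 5 repeated labels.
print(means.index.equals(df1.index))                      # False
# Reindexing repeats each group's row, giving exactly df1's labels.
print(means.reindex(df1.index).index.equals(df1.index))   # True
```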

I can get around the error by creating a means_df that has an identical shape:

```python
import pandas as pd

df = pd.read_csv('www004.csv').set_index('mukey')
means = df.groupby(level=0).mean()
means_df = pd.merge(pd.DataFrame(df.index), means,
                    left_on='mukey', right_index=True, how='left').set_index('mukey')
df1 = df.fillna(means_df)
```

That gives me the desired result:

```
df.ix[426184]

        hzdept_r  hzdepb_r  sandtotal_r  silttotal_r  claytotal_r  om_r
mukey
426184         0        18         30.1         54.9           15   3.5
426184        18        46         58.2         17.8           24   NaN
426184        46       152          NaN          NaN            5   NaN

df1.ix[426184]

        hzdept_r  hzdepb_r  sandtotal_r  silttotal_r  claytotal_r  om_r
mukey
426184         0        18        30.10        54.90           15   3.5
426184        18        46        58.20        17.80           24   3.5
426184        46       152        44.15        36.35            5   3.5
```
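A possibly shorter way to build the same identically shaped means_df, instead of the merge, is to reindex the per-group means onto df's own (duplicated) index, which repeats each group's row once per matching label. A minimal sketch, using a small hand-built frame as a hypothetical stand-in for www004.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for www004.csv, already indexed by mukey.
df = pd.DataFrame(
    {
        "hzdept_r": [0, 61, 0, 30, 74],
        "hzdepb_r": [61, 152, 30, 74, 84],
        "sandtotal_r": [np.nan, 5.3, 30.1, 17.7, np.nan],
        "silttotal_r": [np.nan, 44.7, 54.9, 49.8, np.nan],
    },
    index=pd.Index([425897, 425897, 425911, 425911, 425911], name="mukey"),
)

means = df.groupby(level=0).mean()   # one row per mukey (unique index)
means_df = means.reindex(df.index)   # one row per row of df, same labels
df1 = df.fillna(means_df)
print(df1)
```

This relies on means having a unique index, which the groupby aggregation guarantees; pandas refuses to reindex from an axis with duplicate labels.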

Related: pandas fill missing values in dataframe from another dataframe

fill in missing row values in pandas dataframe

In pandas, how can I patch a dataframe with missing values with values from another dataframe given a similar index?

A workaround is to use the transform (rather than the aggregating) groupby method:

```python
df1.fillna(df1.groupby(level=0).transform("mean"))
```
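On the small sample from the question, the transform version fills the same values as the desired output. A sketch with the sample rebuilt by hand (NaN for the blank cells is an assumption):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {
        "hzdept_r": [0, 61, 0, 30, 74],
        "hzdepb_r": [61, 152, 30, 74, 84],
        "sandtotal_r": [np.nan, 5.3, 30.1, 17.7, np.nan],
        "silttotal_r": [np.nan, 44.7, 54.9, 49.8, np.nan],
    },
    index=pd.Index([425897, 425897, 425911, 425911, 425911], name="mukey"),
)

# transform("mean") broadcasts each group's mean back to every row, so the
# fill frame has df1's exact index and shape (unlike .mean(), which
# returns one row per group).
filled = df1.fillna(df1.groupby(level=0).transform("mean"))
print(filled)
```

The design point is that transform produces, in one step, the row-for-row aligned frame that the merge workaround builds by hand.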

It's unclear to me whether the InvalidIndexError is a bug in pandas; I'd recommend posting an issue on GitHub (it may be a nice feature)!

