python - Strange results with groupby, transform, and NaNs


Edit (19 May 2015): I have verified that this has been fixed as of version 0.16.1, so it should not be a problem in up-to-date versions.

These ought to give the same results, right?

    df.groupby(level=0).transform('mean')
    df.groupby(level=0)['x'].transform(np.nanmean)
    df.groupby(level=0)['x'].transform('mean')

The first two work, but the third does not. Might this be a bug?

    df = pd.DataFrame({ 'x':[1,np.nan,3,4] }, index=[1,1,2,2])

    df
    Out[686]:
         x
    1    1
    1  NaN
    2    3
    2    4

    df.groupby(level=0).transform('mean')
    Out[687]:
         x
    1  1.0
    1  1.0
    2  3.5
    2  3.5

    df.groupby(level=0)['x'].transform(np.nanmean)
    Out[688]:
    1    1.0
    1    1.0
    2    3.5
    2    3.5
    Name: x, dtype: float64

That's good, but not this:

    df.groupby(level=0)['x'].transform('mean')
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-691-24761ee742fd> in <module>()
    ----> 1 df.groupby(level=0)['x'].transform('mean')

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\groupby.pyc in transform(self, func, *args, **kwargs)
       2411         # if string function
       2412         if isinstance(func, compat.string_types):
    -> 2413             return self._transform_fast(lambda : getattr(self, func)(*args, **kwargs))
       2414
       2415         # have cython function

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\groupby.pyc in _transform_fast(self, func)
       2457         values = np.repeat(values, com._ensure_platform_int(counts))
       2458
    -> 2459         return self._set_result_index_ordered(Series(values))
       2460
       2461     def filter(self, func, dropna=True, *args, **kwargs):

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\groupby.pyc in _set_result_index_ordered(self, result)
        495             result = result.sort_index()
        496
    --> 497         result.index = self.obj.index
        498         return result
        499

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\generic.pyc in __setattr__(self, name, value)
       1978         try:
       1979             object.__getattribute__(self, name)
    -> 1980             return object.__setattr__(self, name, value)
       1981         except AttributeError:
       1982             pass

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\lib.pyd in pandas.lib.AxisProperty.__set__ (pandas\lib.c:38795)()

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\series.pyc in _set_axis(self, axis, labels, fastpath)
        266         object.__setattr__(self, '_index', labels)
        267         if not fastpath:
    --> 268             self._data.set_axis(axis, labels)
        269
        270     def _set_subtyp(self, is_all_dates):

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\internals.pyc in set_axis(self, axis, new_labels)
       2209         if new_len != old_len:
       2210             raise ValueError('Length mismatch: Expected axis has %d elements, '
    -> 2211                              'new values have %d elements' % (old_len, new_len))
       2212
       2213         self.axes[axis] = new_labels

    ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements
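The "3 elements versus 4 elements" mismatch is consistent with the `np.repeat(values, ...counts)` line in the traceback repeating each group's mean by a count that leaves out the NaN row. That is only a reading of the traceback, not something the post confirms. The sketch below uses ordinary public pandas calls to show the arithmetic; the variable names are made up for illustration and this is not the pandas internals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])
grouped = df.groupby(level=0)['x']

means = grouped.mean().values            # one mean per group, NaN skipped
non_nan_counts = grouped.count().values  # rows per group, NaN excluded
group_sizes = grouped.size().values      # rows per group, NaN included

# Assumption: repeating by the non-NaN counts gives one value too few
# for the 4-row index, which matches the error message above.
too_short = np.repeat(means, non_nan_counts)

# Repeating by the full group sizes gives one value per original row.
full_length = np.repeat(means, group_sizes)

print(len(too_short), len(full_length), len(df))
```

Under this reading, any group that contains a NaN would shorten the repeated result, which would explain why the example fails only on the Series path shown in the traceback.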

I have verified that this has indeed been fixed in version 0.16.1. See the comments above by @dsm and @andyhayden.
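A quick way to check this on your own installation is to run the three calls from the question and compare them. This is a minimal sketch that assumes pandas 0.16.1 or later; the variable names are illustrative, and some recent pandas releases may emit a deprecation warning when a NumPy function such as `np.nanmean` is passed to `transform`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])

# The three calls from the question, reduced to the 'x' column.
frame_mean = df.groupby(level=0).transform('mean')['x']
series_nanmean = df.groupby(level=0)['x'].transform(np.nanmean)
series_mean = df.groupby(level=0)['x'].transform('mean')

# On a fixed version all three should hold the same values.
print(frame_mean.tolist())
print(series_nanmean.tolist())
print(series_mean.tolist())
```

If the third call still raises the length-mismatch `ValueError`, the installed pandas most likely predates the fix.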

