python - Strange results with groupby, transform, and NaNs
Edit (19-May-2015): I have verified that this has been fixed as of version 0.16.1, so it should not be a problem in up-to-date versions.
These ought to give the same results, right?
    df.groupby(level=0).transform('mean')
    df.groupby(level=0)['x'].transform(np.nanmean)
    df.groupby(level=0)['x'].transform('mean')

The first two work, but the third does not. Might this be a bug?
    df = pd.DataFrame({ 'x':[1,np.nan,3,4] }, index=[1,1,2,2],)
    df
    Out[686]:
         x
    1    1
    1  NaN
    2    3
    2    4

    df.groupby(level=0).transform('mean')
    Out[687]:
         x
    1  1.0
    1  1.0
    2  3.5
    2  3.5

    df.groupby(level=0)['x'].transform(np.nanmean)
    Out[688]:
    1    1.0
    1    1.0
    2    3.5
    2    3.5
    Name: x, dtype: float64

That's good, but not this:
    df.groupby(level=0)['x'].transform('mean')
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-691-24761ee742fd> in <module>()
    ----> 1 df.groupby(level=0)['x'].transform('mean')

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\groupby.pyc in transform(self, func, *args, **kwargs)
       2411         # if string function
       2412         if isinstance(func, compat.string_types):
    -> 2413             return self._transform_fast(lambda : getattr(self, func)(*args, **kwargs))
       2414
       2415         # have cython function

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\groupby.pyc in _transform_fast(self, func)
       2457         values = np.repeat(values, com._ensure_platform_int(counts))
       2458
    -> 2459         return self._set_result_index_ordered(Series(values))
       2460
       2461     def filter(self, func, dropna=True, *args, **kwargs):

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\groupby.pyc in _set_result_index_ordered(self, result)
        495             result = result.sort_index()
        496
    --> 497         result.index = self.obj.index
        498         return result
        499

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\generic.pyc in __setattr__(self, name, value)
       1978         try:
       1979             object.__getattribute__(self, name)
    -> 1980             return object.__setattr__(self, name, value)
       1981         except AttributeError:
       1982             pass

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\lib.pyd in pandas.lib.AxisProperty.__set__ (pandas\lib.c:38795)()

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\series.pyc in _set_axis(self, axis, labels, fastpath)
        266         object.__setattr__(self, '_index', labels)
        267         if not fastpath:
    --> 268             self._data.set_axis(axis, labels)
        269
        270     def _set_subtyp(self, is_all_dates):

    c:\users\eilerj\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\internals.pyc in set_axis(self, axis, new_labels)
       2209         if new_len != old_len:
       2210             raise ValueError('Length mismatch: Expected axis has %d elements, '
    -> 2211                              'new values have %d elements' % (old_len, new_len))
       2212
       2213         self.axes[axis] = new_labels

    ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements
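The "3 elements" versus "4 elements" in that message suggests where things may have gone wrong. The traceback shows `_transform_fast` building its result with `np.repeat(values, counts)`. If `counts` came from a per-group count that skips NaN (an assumption on my part; the line that defines `counts` is not shown in the traceback), the repeated array would come out one element short. A minimal sketch of that arithmetic, using only public pandas calls and my own variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])
grouped = df.groupby(level=0)["x"]

means = grouped.mean().values       # one mean per group
non_null = grouped.count().values   # count() skips NaN, so group 1 has 1
sizes = grouped.size().values       # size() includes NaN, so group 1 has 2

# Assumed failure mode: repeating by the non-null counts gives 3 values,
# which cannot be aligned with the 4-row original index.
too_short = np.repeat(means, non_null)

# Repeating by the group sizes gives one value per original row.
full = np.repeat(means, sizes)

print(len(too_short), len(full))
```

This is only an illustration of how a NaN in a group could produce a 3-versus-4 length mismatch, not a copy of the pandas internals.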
I have verified that this has indeed been fixed in version 0.16.1. See the comments above from @dsm and @andyhayden.
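For anyone who wants to check their own installation, here is a small sketch that rebuilds the frame from the question and asserts that all three forms agree. It assumes a pandas version that provides `pd.testing.assert_series_equal`; the variable names are mine. Note that some recent pandas versions may emit a warning when a NumPy function such as `np.nanmean` is passed to `transform`.

```python
import numpy as np
import pandas as pd

print(pd.__version__)

df = pd.DataFrame({"x": [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])
g = df.groupby(level=0)

# The three calls from the question, reduced to the 'x' column.
frame_mean = g.transform("mean")["x"]
nan_mean = g["x"].transform(np.nanmean)
series_mean = g["x"].transform("mean")   # the call that used to raise

# Each assertion raises AssertionError if the results differ.
pd.testing.assert_series_equal(frame_mean, series_mean)
pd.testing.assert_series_equal(nan_mean, series_mean)
```

If the third call still raises a length-mismatch `ValueError`, the installed pandas is probably older than 0.16.1, and the `np.nanmean` form from the question works as a stopgap.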