python - Pandas DataFrames: How to groupby on a groupby?
I'm trying to generalize the question asked here.
The mlb dataframe looks like this:
            player           position   salary  year
0        mike witt            pitcher  1400000  1988
1  george hendrick         outfielder   989333  1988
2      chili davis         outfielder   950000  1988
3    brian downing  designated hitter   900000  1988
4        bob boone            catcher   883000  1988
5        bob boone            catcher   883000  1989
6      frank smith            catcher   993000  1988
7      frank smith            pitcher  1300000  1989
Note that the same player may be listed multiple times for different years. I'm trying to find the player with the maximum total salary for each position. The output should look like:
            position           player   salary
0            pitcher        mike witt  1400000
1         outfielder  george hendrick   989333
2  designated hitter    brian downing   900000
3            catcher        bob boone  1766000
I think I need to group by position, then group by player, sum the salaries for each player, and then find the maximum. I'm having trouble doing this.
Once I have

positions = mlb.groupby("position")

I'm having trouble with the next step. I think a nested groupby on player is necessary, but I don't know how to proceed.
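A minimal sketch of that nested groupby, assuming the frame is named mlb and holds the rows from the table above: loop over the position groups, group each one again by player, sum the salaries, and keep the player with the largest total (Series.idxmax returns the label of the maximum value).

```python
import pandas as pd

# the example data from the question
mlb = pd.DataFrame({
    'player': ['mike witt', 'george hendrick', 'chili davis', 'brian downing',
               'bob boone', 'bob boone', 'frank smith', 'frank smith'],
    'position': ['pitcher', 'outfielder', 'outfielder', 'designated hitter',
                 'catcher', 'catcher', 'catcher', 'pitcher'],
    'salary': [1400000, 989333, 950000, 900000, 883000, 883000, 993000, 1300000],
    'year': [1988, 1988, 1988, 1988, 1988, 1989, 1988, 1989],
})

rows = []
# outer groupby: one sub-frame per position
for position, group in mlb.groupby('position'):
    # inner groupby: total salary per player within this position
    totals = group.groupby('player')['salary'].sum()
    # idxmax gives the player label with the largest total
    rows.append({'position': position,
                 'player': totals.idxmax(),
                 'salary': totals.max()})

result = pd.DataFrame(rows)
print(result)
```

The explicit loop mirrors the two-level grouping described in the question; a single groupby on both columns does the same work without a Python-level loop, which is usually preferable for larger frames.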
This is messy, but it gets the job done:
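import pandas as pd

df = pd.DataFrame({'player': ['mike witt', 'george hendrick', 'chili davis', 'brian downing', 'bob boone', 'bob boone'],
                   'position': ['pitcher', 'outfielder', 'outfielder', 'designated hitter', 'catcher', 'catcher'],
                   'salary': [1400000, 989333, 950000, 900000, 883000, 900000],
                   'year': [1988, 1988, 1988, 1988, 1988, 1988]})

# total salary for each (player, position) pair
gp = df.groupby(['player', 'position'])['salary'].sum().reset_index()

# sort by total salary, highest first, and keep the first row for each position
# (DataFrame.sort was removed from pandas; sort_values replaces it)
gp.sort_values('salary', ascending=False).drop_duplicates('position')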
or, to select the whole row with the highest total salary in each position:

gp.loc[gp.groupby('position')['salary'].idxmax()]
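A plain gp.groupby('position').max() is not a safe substitute here: max() works on each column separately, so the player name it returns need not come from the top-salary row. A small sketch with hypothetical data (the names adam and zack and their salaries are made up so that alphabetical order and salary order disagree) shows the difference:

```python
import pandas as pd

# hypothetical per-(player, position) totals, shaped like gp
gp = pd.DataFrame({
    'player': ['adam', 'zack'],
    'position': ['outfielder', 'outfielder'],
    'salary': [200, 100],
})

# max() is applied column by column: 'zack' is the alphabetically largest
# name and 200 is the largest salary, so the two values come from different rows
per_column = gp.groupby('position').max()

# idxmax() returns the index label of the top-salary row in each position,
# and .loc pulls back that entire row
whole_row = gp.loc[gp.groupby('position')['salary'].idxmax()]

print(per_column)
print(whole_row)
```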
As @dawg mentioned, this treats a player who has played multiple positions as different players, so salaries are totaled per position, as shown here:
            player           position   salary
0        bob boone            catcher  1783000
4        mike witt            pitcher  1400000
3  george hendrick         outfielder   989333
1    brian downing  designated hitter   900000
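The question's frank smith rows make this concrete: grouping by both player and position keeps his catcher and pitcher salaries separate, while grouping by player alone adds them together. A minimal sketch using just those two rows:

```python
import pandas as pd

# frank smith's two rows from the question's data
df = pd.DataFrame({
    'player': ['frank smith', 'frank smith'],
    'position': ['catcher', 'pitcher'],
    'salary': [993000, 1300000],
    'year': [1988, 1989],
})

# one total per (player, position) pair: positions are kept apart
per_position = df.groupby(['player', 'position'])['salary'].sum()

# one total per player across all positions
per_player = df.groupby('player')['salary'].sum()

print(per_position)
print(per_player)
```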