python - Pandas Dataframes: How to groupby on a groupby?

March 15, 2011

I'm trying to generalize a question asked here.

The mlb DataFrame looks like this:

       player           position           salary   year
    0  mike witt        pitcher            1400000  1988
    1  george hendrick  outfielder         989333   1988
    2  chili davis      outfielder         950000   1988
    3  brian downing    designated hitter  900000   1988
    4  bob boone        catcher            883000   1988
    5  bob boone        catcher            883000   1989
    6  frank smith      catcher            993000   1988
    7  frank smith      pitcher            1300000  1989

Note that the same player may be listed multiple times for different years. I'm trying to find the player with the maximum total salary for each position. The output should look like:

       position           player           salary
    0  pitcher            mike witt        1400000
    1  outfielder         george hendrick  989333
    2  designated hitter  brian downing    900000
    3  catcher            bob boone        1766000

I think I need to group by position, then group by player, sum the salaries for each player, and find the maximum. I'm having trouble doing this.

Once I have positions = mlb.groupby("position"), I'm having trouble with the next step. I think a nested groupby on player is necessary, but I don't know how to proceed.

This is messy, but it gets the job done.

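    import pandas as pd

    df = pd.DataFrame({'player': ['mike witt', 'george hendrick', 'chili davis', 'brian downing', 'bob boone', 'bob boone'],
                       'position': ['pitcher', 'outfielder', 'outfielder', 'designated hitter', 'catcher', 'catcher'],
                       'salary': [1400000, 989333, 950000, 900000, 883000, 900000],
                       'year': [1988, 1988, 1988, 1988, 1988, 1988]})

    # Total salary for each (player, position) pair
    gp = df.groupby(['player', 'position'])['salary'].sum().reset_index()
    # Sort by total salary (highest first) and keep the first row per position;
    # sort_values replaces the DataFrame.sort method removed from newer pandas
    gp.sort_values('salary', ascending=False).drop_duplicates('position')

    # Column-wise maximum per position (each column is maximized independently)
    gp.groupby('position').max()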

As @dawg mentioned, this treats a player who has multiple positions as different players, with salaries totalled per position, as shown here:

                player           position   salary
    0        bob boone            catcher  1783000
    4        mike witt            pitcher  1400000
    3  george hendrick         outfielder   989333
    1    brian downing  designated hitter   900000
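
The same "sum per player, then take the maximum per position" idea can also be sketched with `idxmax`, run here against the eight rows from the question's table rather than the shortened data above. This is only one possible approach, not the answer's original method. The names `totals`, `top_rows` and `result` are made up for illustration.

```python
import pandas as pd

# The eight rows from the question's mlb table
mlb = pd.DataFrame({
    'player': ['mike witt', 'george hendrick', 'chili davis', 'brian downing',
               'bob boone', 'bob boone', 'frank smith', 'frank smith'],
    'position': ['pitcher', 'outfielder', 'outfielder', 'designated hitter',
                 'catcher', 'catcher', 'catcher', 'pitcher'],
    'salary': [1400000, 989333, 950000, 900000, 883000, 883000, 993000, 1300000],
    'year': [1988, 1988, 1988, 1988, 1988, 1989, 1988, 1989],
})

# Step 1: total salary for each (position, player) pair
totals = mlb.groupby(['position', 'player'], as_index=False)['salary'].sum()

# Step 2: within each position, find the row label of the largest total
top_rows = totals.groupby('position')['salary'].idxmax()

# Step 3: pull those rows out, so player and salary stay paired
result = totals.loc[top_rows].reset_index(drop=True)

print(result)
```

Because the totals are computed per position, frank smith is counted once as a catcher and once as a pitcher, which is the behaviour @dawg pointed out. If two players tie within a position, `idxmax` keeps only the first one it finds.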
