How to Group by a MultiIndex in Pandas

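In Pandas, the groupby() method groups data. For a DataFrame with a MultiIndex, we can group by one or more index levels, passing the level names either directly or through the level parameter. The walkthrough below shows how to group by a MultiIndex.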

Import Pandas and create a DataFrame with a MultiIndex, as follows:

import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8], 'B': [10, 20, 30, 40, 50, 60, 70, 80]}, index=index)

print(df)

Output:

             A   B
first second      
bar   one    1  10
      two    2  20
baz   one    3  30
      two    4  40
foo   one    5  50
      two    6  60
qux   one    7  70
      two    8  80

Use the groupby() method to group by the MultiIndex. The following code groups by the 'first' level:

grouped = df.groupby('first')

print(grouped.sum())

Output:

       A    B
first        
bar    3   30
baz    7   70
foo   11  110
qux   15  150

As shown, grouping by 'first' splits the data into four groups, keyed by 'bar', 'baz', 'foo', and 'qux'. sum() then adds up each column within each group.
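The same grouping can also be written with the level parameter, using either the level's name or its position. The sketch below rebuilds the example DataFrame and compares the three spellings; the variable names by_name, by_level_name, and by_level_pos are introduced here for illustration only.

```python
import pandas as pd

# Rebuild the example DataFrame from above so this block runs on its own.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80]}, index=index)

# Three ways to group by the first index level.
by_name = df.groupby('first').sum()              # level name passed as the key
by_level_name = df.groupby(level='first').sum()  # level name via level=
by_level_pos = df.groupby(level=0).sum()         # level position via level=

# Check that all three produce the same result.
print(by_name.equals(by_level_name))
print(by_name.equals(by_level_pos))
```

Using level= makes it explicit that the key refers to an index level rather than a column, which can help if a column and an index level might share a name.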

Next is the code that groups by both index levels:

grouped = df.groupby(['first', 'second'])

print(grouped.sum())

Output:

              A   B
first second       
bar   one     1  10
      two     2  20
baz   one     3  30
      two     4  40
foo   one     5  50
      two     6  60
qux   one     7  70
      two     8  80

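As shown, the result looks identical to the original DataFrame. The sum is still computed, but every ('first', 'second') combination occurs exactly once in this data, so each group contains a single row and its sum equals that row.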
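One way to see why the two-level grouping looks unchanged is to count the rows in each group. The following is a minimal sketch using size() on the same example data; the variable name sizes is introduced here for illustration.

```python
import pandas as pd

# Rebuild the example DataFrame from above so this block runs on its own.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80]}, index=index)

# Count how many rows fall into each ('first', 'second') group.
sizes = df.groupby(['first', 'second']).size()
print(sizes)

# Every group holds a single row, so summing a group returns that row's values.
print((sizes == 1).all())
```

If the data contained repeated ('first', 'second') pairs, those groups would have more than one row and the summed result would differ from the original DataFrame.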

Finally, here is the code that groups by the second index level and aggregates the result:

grouped = df.groupby(level='second').sum()

print(grouped)

Output:

        A    B
second       
one    16  160
two    20  200

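As shown, grouping by the second level splits the data into two groups, 'one' and 'two', and each column is summed within each group. Note that because we grouped only by the second level, the result's index contains only the 'second' level; the 'first' level is dropped.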
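sum() is only one possible aggregation. As a rough sketch, agg() can apply several functions to each group at once; the choice of 'sum' and 'mean' and the variable name summary are assumptions made here for illustration, not part of the original walkthrough.

```python
import pandas as pd

# Rebuild the example DataFrame from above so this block runs on its own.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80]}, index=index)

# Group by the second index level and compute both the sum and the mean
# of every column within each group.
summary = df.groupby(level='second').agg(['sum', 'mean'])
print(summary)

# The result has two-level columns: (column name, aggregation name).
print(summary.loc['two', ('A', 'sum')])
```

The result is still indexed by 'second' only, while its columns become a two-level index of the form (column, function).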

That completes this walkthrough of grouping by a MultiIndex. Hopefully it is helpful.
