Data Structures in Pandas

Pandas is a Python library for data analysis and data processing. It provides two main data structures: Series and DataFrame.

Series

A Series is a one-dimensional, labeled array. It can hold values of different types (integers, floats, strings, and so on). Its general form is:

import pandas as pd

ser = pd.Series(data, index=index)

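Here, data can be a list, a tuple, or a dictionary, and index is an optional parameter that specifies custom labels. If no index is given, an integer index starting at 0 is generated by default.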
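As a minimal sketch of the optional index parameter described above, the snippet below builds a Series from a list with custom labels. The values and labels are made up for illustration and do not come from the original text.

```python
import pandas as pd

# Illustrative values and labels (assumed for this sketch)
ser_labeled = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# A value can be looked up by its label or by its position
by_label = ser_labeled['y']        # 20
by_position = ser_labeled.iloc[0]  # 10

print(ser_labeled)
```

Using .iloc for positional access keeps the lookup unambiguous when the labels are not integers.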

If data is a list or a tuple, for example:

ser = pd.Series([1, 2, 3, 4, 5])

then the resulting Series looks like this:

0    1
1    2
2    3
3    4
4    5
dtype: int64

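Here, 0 through 4 are the integer index labels generated by default, and 1 through 5 are the values we passed in. dtype is the data type of the values; int64 means 64-bit integers.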
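To illustrate how dtype follows the data, here is a small sketch with three made-up Series. When the values cannot share a single numeric type, pandas falls back to the generic object dtype.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2.0, 3.25])
mixed = pd.Series([1, 'two', 3.0])

print(ints.dtype)    # an integer dtype (int64 in the example above)
print(floats.dtype)  # float64
print(mixed.dtype)   # object
```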

If data is a dictionary, for example:

ser = pd.Series({'a': 1, 'b': 2, 'c': 3})

then the resulting Series looks like this:

a    1
b    2
c    3
dtype: int64

Here, the dictionary keys a, b, and c become the index labels, and 1, 2, and 3 are the corresponding values.
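When a dictionary is combined with an explicit index, the index selects and orders entries by key. The sketch below assumes a hypothetical label 'd' that is not in the dictionary, which shows up as a missing value (NaN).

```python
import pandas as pd

scores = {'a': 1, 'b': 2, 'c': 3}

# 'd' is not a key in the dictionary, so its value becomes NaN;
# 'b' is not listed in the index, so it is left out
ser_sel = pd.Series(scores, index=['c', 'a', 'd'])

print(ser_sel)
```

Because NaN is a floating-point value, the resulting Series is stored as float64 rather than int64.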

DataFrame

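A DataFrame is a two-dimensional, tabular data structure. Its columns can hold different types of data (integers, floats, strings, and so on), and it makes data manipulation and analysis straightforward. Its general form is: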

import pandas as pd

df = pd.DataFrame(data, index=index, columns=columns)

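Here, data can be a two-dimensional array, a dictionary, a Series, a list, a tuple, and so on; index gives the row labels and columns gives the column labels. Both are optional.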
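As a minimal sketch of the list case, the snippet below builds a DataFrame from a list of rows and names the rows and columns explicitly. The row labels 'r1' and 'r2' are hypothetical.

```python
import pandas as pd

# Each inner list is one row (illustrative data)
rows = [['Alice', 20], ['Bob', 25]]

df_rows = pd.DataFrame(rows, index=['r1', 'r2'], columns=['name', 'age'])

print(df_rows.shape)             # (2, 2)
print(df_rows.loc['r2', 'age'])  # 25
```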

If data is a dictionary, each key becomes a column label and each value supplies that column's data. For example:

data = {'name': ['Alice', 'Bob', 'Cathy', 'Daniel'],
        'age': [20, 25, 30, 35],
        'gender': ['F', 'M', 'F', 'M']}
df = pd.DataFrame(data)

then the resulting DataFrame looks like this:

     name  age gender
0   Alice   20      F
1     Bob   25      M
2   Cathy   30      F
3  Daniel   35      M

Here, the row index is generated by default, starting at 0.
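The default labels can be inspected directly. This sketch rebuilds the same DataFrame and prints its row index, column labels, and shape.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Cathy', 'Daniel'],
        'age': [20, 25, 30, 35],
        'gender': ['F', 'M', 'F', 'M']}
df = pd.DataFrame(data)

print(list(df.index))    # [0, 1, 2, 3]
print(list(df.columns))  # ['name', 'age', 'gender']
print(df.shape)          # (4, 3)
```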

To customize the row or column labels, use the index and columns parameters. For example:

data = {'name': ['Alice', 'Bob', 'Cathy', 'Daniel'],
        'age': [20, 25, 30, 35],
        'gender': ['F', 'M', 'F', 'M']}
df = pd.DataFrame(data, index=['one', 'two', 'three', 'four'], columns=['name', 'age', 'gender'])

then the resulting DataFrame looks like this:

         name  age gender
one     Alice   20      F
two       Bob   25      M
three   Cathy   30      F
four   Daniel   35      M

Here, the row index uses the custom labels, and the columns follow the order given in the columns list.
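One detail worth a sketch: when data is a dictionary, columns selects and orders keys rather than renaming them. In the example below, 'city' is a hypothetical column name that is not a key in the dictionary, so that column is filled with NaN, while 'gender' is dropped because it is not listed.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob'],
        'age': [20, 25],
        'gender': ['F', 'M']}

# 'city' is not a key in data (assumed name, for illustration only)
df_sub = pd.DataFrame(data, columns=['age', 'name', 'city'])

print(df_sub)
```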

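A DataFrame can also be created in other ways, such as reading a CSV file, an Excel file, or the result of a SQL query (pd.read_csv, pd.read_excel, pd.read_sql). When processing and analyzing data with Pandas, the DataFrame is one of the main tools we work with.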
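As a small sketch of the CSV route, the snippet below reads CSV text from an in-memory buffer so that it runs without a file on disk. The CSV content is made up for illustration; with a real file, a path would be passed to pd.read_csv instead.

```python
import io
import pandas as pd

# In-memory CSV text standing in for a file (illustrative data)
csv_text = "name,age,gender\nAlice,20,F\nBob,25,M\n"

df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)
```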
