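How to Convert Categorical String Data to Numbers in Python

Python offers several ways to convert categorical string data to numbers. Three common approaches are described below: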

1. LabelEncoder

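The LabelEncoder class from scikit-learn converts string labels to integers (scikit-learn documents it as intended for target labels rather than input features):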

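from sklearn.preprocessing import LabelEncoder

labelencoder = LabelEncoder()

# Create a list of strings
string_list = ['cat', 'dog', 'dog', 'cat', 'bird', 'fish']

# Convert the list of strings to an array of integers
integer_list = labelencoder.fit_transform(string_list)

print(integer_list)  # prints [1 2 2 1 0 3]

In the code above, we first import LabelEncoder from the sklearn.preprocessing module. We then create a list of six strings that contains four distinct values. Next, we call LabelEncoder's fit_transform() method to convert the strings to integers, and finally print the result. Note that LabelEncoder assigns codes according to the sorted order of the distinct labels, not their order of appearance, so bird is 0, cat is 1, dog is 2, and fish is 3.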
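Because the codes follow sorted order, it can help to inspect the fitted encoder and confirm which integer stands for which string. The following is a minimal sketch using the encoder's classes_ attribute and inverse_transform() method; the variable names (encoder, mapping, restored) are chosen here for illustration only.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(['cat', 'dog', 'dog', 'cat', 'bird', 'fish'])

# classes_ holds the distinct labels in sorted order;
# a label's position in classes_ is its integer code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}

# inverse_transform() maps integer codes back to the original strings.
restored = [str(label) for label in encoder.inverse_transform(codes)]

print(mapping)
print(restored)
```

Keeping the fitted encoder around is what makes the conversion reversible, which is useful when predictions need to be reported as the original strings.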

2. pandas factorize()

The pandas factorize() function maps each distinct string to a distinct integer. The resulting integers can then be added to the original data as a new column:

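import pandas as pd

# Create a dictionary holding the string values
data = {'animal': ['cat', 'dog', 'dog', 'cat', 'bird', 'fish']}

# Create a DataFrame
df = pd.DataFrame(data)

# Use factorize() to convert the strings to integers;
# element [0] of the returned tuple is the array of integer codes
df['animal_cat'] = pd.factorize(df['animal'])[0]

print(df)

In the code above, we first create a dictionary containing a list of strings and pass it to the pandas DataFrame constructor. We then use factorize() to convert the strings to integers and store them in a new column of the original DataFrame. factorize() numbers the values in order of first appearance, so here cat is 0, dog is 1, bird is 2, and fish is 3. Finally, we print the DataFrame, which now contains both the original strings and the converted integers.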
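factorize() actually returns two things: the integer codes and the distinct values they refer to. The sketch below unpacks both, and also illustrates the sort parameter and how missing values are coded by default; the variable names are illustrative only.

```python
import pandas as pd

animals = pd.Series(['cat', 'dog', 'dog', 'cat', 'bird', 'fish'])

# factorize() returns (codes, uniques); codes index into uniques.
codes, uniques = pd.factorize(animals)
print(codes)          # codes in order of first appearance
print(list(uniques))  # the distinct values, in the same order

# sort=True numbers the values by sorted order instead.
sorted_codes, sorted_uniques = pd.factorize(animals, sort=True)
print(sorted_codes)

# By default, missing values receive the sentinel code -1.
missing_codes, _ = pd.factorize(pd.Series(['cat', None, 'dog']))
print(missing_codes)
```

Holding on to uniques allows the codes to be translated back to strings later, for example with uniques[codes].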

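3. Custom mapping dictionary

If we need more control over which number each string maps to, we can define the mapping ourselves with a Python dictionary. For example: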

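string_list = ['cat', 'dog', 'dog', 'cat', 'bird', 'fish']

# Create a custom mapping dictionary
custom_mapping = {'cat': 0, 'dog': 1, 'bird': 2, 'fish': 3}

# Convert the list of strings to a list of integers
integer_list = [custom_mapping[word] for word in string_list]

print(integer_list)  # prints [0, 1, 1, 0, 2, 3]

In the code above, we first create a list of strings. We then create a custom mapping dictionary that assigns an integer to each string. Finally, we use a list comprehension to look up each string in the dictionary and build the list of integers. A string that is missing from the dictionary raises a KeyError.

These are three common ways to convert categorical string data to numbers. Which one fits best depends on the application and on the data: whether the codes need a particular order, whether the conversion must be reversible, and whether the data already lives in a DataFrame.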
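With a hand-written dictionary, a string that was not anticipated will raise a KeyError during the lookup. One possible way to handle this, sketched below, is to fall back to a reserved code via dict.get(). The helper encode() and the constant UNKNOWN_CODE are hypothetical names introduced for this example, and -1 is an arbitrary choice of sentinel.

```python
custom_mapping = {'cat': 0, 'dog': 1, 'bird': 2, 'fish': 3}

# Hypothetical sentinel for strings that are not in the mapping.
UNKNOWN_CODE = -1


def encode(values, mapping, default=UNKNOWN_CODE):
    """Map each string to its integer code, using default for unknown strings."""
    return [mapping.get(value, default) for value in values]


print(encode(['cat', 'horse', 'fish'], custom_mapping))
```

Whether an unknown value should map to a sentinel or raise an error depends on the application; failing loudly is often the safer choice when the set of categories is supposed to be fixed.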
