Learn String Filtering in Pandas Through 5 Examples

Post published: May 14, 2023
Post category: Python

Below is a complete walkthrough of string filtering in Pandas, built around five examples:

Contents

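Importing the library and data
Filtering strings with str.contains()
Filtering strings with str.startswith() and str.endswith()
Extracting strings with .str.extract()
Filtering strings with regular expressions
Splitting strings with .str.split()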

1. Importing the library and data

Before filtering strings with Pandas, import the library and read the data:

import pandas as pd

data = pd.read_csv('data.csv')

In your own code, replace 'data.csv' with the path of the data file you want to load.
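If data.csv is not at hand, the examples can be tried on a small in-memory table. This is only an illustration: the column names loan_type, purpose and rate come from this article, but the CSV text and all of its values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: the columns mirror the examples in this
# article, but every value here is made up for illustration.
csv_text = """loan_type,purpose,rate
个人贷款,用于房屋装修,3.5%
企业贷款,购买设备 200000 元,4.8%
企业经营贷款,补充流动资金，房屋租赁,3.9%
"""

# read_csv accepts any file-like object, so an in-memory buffer works like a file path.
# The full-width comma "，" in the last row is not the CSV delimiter, so it stays in the field.
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)  # (3, 3)
```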

2. Filtering strings with str.contains()

To find the rows that contain a particular substring, use the str.contains() method. The following selects the rows whose loan type contains "个人贷款" (personal loan):

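personal_loans = data[data['loan_type'].str.contains('个人贷款', na=False)]

Here, data['loan_type'] selects the loan_type column, and .str.contains('个人贷款') returns a boolean Series that is True for every value containing "个人贷款"; indexing data with it keeps the matching rows. na=False treats missing values as non-matches, because a mask containing NaN cannot be used to select rows. Note that the pattern is interpreted as a regular expression by default. The result is stored in personal_loans.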
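Two str.contains() parameters are worth knowing. The following minimal sketch, on hypothetical data, shows regex=False for matching text that contains regex metacharacters literally, and na=False for rows with a missing value.

```python
import pandas as pd

# Hypothetical sample data; None stands for a missing loan type.
data = pd.DataFrame({"loan_type": ["个人贷款(A)", "A类企业贷款", None, "个人消费贷款"]})

# str.contains interprets the pattern as a regular expression by default, so "(A)"
# would be a capture group matching any "A". regex=False searches for the literal text.
# na=False maps missing values to False, keeping the mask purely boolean.
literal_mask = data["loan_type"].str.contains("(A)", regex=False, na=False)
print(literal_mask.tolist())  # [True, False, False, False]

# Plain substring search: "个人" appears in rows 0 and 3.
personal = data[data["loan_type"].str.contains("个人", na=False)]
print(personal.index.tolist())  # [0, 3]
```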

3. Filtering strings with str.startswith() and str.endswith()

To find the rows whose values begin or end with a particular string, use str.startswith() and str.endswith(). The following uses str.startswith() to select loans whose type begins with "企业" (business):

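business_loans = data[data['loan_type'].str.startswith('企业', na=False)]

Here, data['loan_type'] selects the loan_type column, and .str.startswith('企业') keeps the rows whose value begins with "企业". Unlike str.contains(), it compares a literal prefix and does not accept regular expressions; na=False again treats missing values as non-matches. The result is stored in business_loans.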
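Because the mask returned by str.startswith() is an ordinary boolean Series, it can be inverted with ~ to keep the rows that do not match. A small sketch with made-up loan types:

```python
import pandas as pd

# Hypothetical loan types for illustration.
data = pd.DataFrame({"loan_type": ["企业贷款", "个人贷款", "企业信用贷款", "小微企业贷款"]})

# startswith checks only the literal prefix, so "小微企业贷款" does not count.
is_business = data["loan_type"].str.startswith("企业", na=False)

# ~ inverts the boolean mask to keep every row that does NOT start with "企业".
non_business = data[~is_business]
print(non_business["loan_type"].tolist())  # ['个人贷款', '小微企业贷款']
```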

The next example uses str.endswith() to select loans whose purpose ends with "房屋装修" (home renovation):

home_improvement_loans = data[data['purpose'].str.endswith('房屋装修', na=False)]

Here, data['purpose'] selects the purpose column, and .str.endswith('房屋装修') keeps the rows whose value ends with "房屋装修". The result is stored in home_improvement_loans.
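Masks built on different columns can be combined element-wise with & (and) and | (or); Python's own and/or keywords raise a ValueError on a Series. The sketch below, on hypothetical rows, selects personal loans whose purpose ends with "房屋装修":

```python
import pandas as pd

# Hypothetical data with loan_type and purpose columns.
data = pd.DataFrame({
    "loan_type": ["个人贷款", "企业贷款", "个人贷款"],
    "purpose": ["二手房房屋装修", "厂房房屋装修", "购车"],
})

# & combines the two boolean masks row by row.
mask = (
    data["loan_type"].str.startswith("个人", na=False)
    & data["purpose"].str.endswith("房屋装修", na=False)
)
personal_renovation = data[mask]
print(personal_renovation.index.tolist())  # [0]
```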

4. Extracting strings with .str.extract()

To pull a specific part out of each string, use .str.extract(). The following extracts the numeric amount from each loan purpose:

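loan_amounts = data['purpose'].str.extract(r'(\d+\.?\d*)', expand=False)

Here, data['purpose'] selects the purpose column, and (\d+\.?\d*) is a regular expression with one capture group that matches the first number in each string, such as 50000 or 3.5. The r prefix makes it a raw string, so Python does not try to interpret \d as an escape sequence. With expand=False and a single group, the result is a one-dimensional Series rather than a DataFrame. Rows without a number get NaN, and the extracted values are still strings. The result is stored in loan_amounts.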
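Since .str.extract() returns strings, the amounts usually need converting before they can be compared. A minimal sketch, assuming purposes written like the hypothetical values below, converts them with pd.to_numeric() and then filters on the result:

```python
import pandas as pd

# Hypothetical purposes; the last one contains no number.
data = pd.DataFrame({"purpose": ["房屋装修 50000 元", "购车 120000.5 元", "教育支出"]})

# Capture the first number in each string (NaN where nothing matches).
amount_text = data["purpose"].str.extract(r"(\d+\.?\d*)", expand=False)

# The captured values are strings; convert them to floats before comparing.
amounts = pd.to_numeric(amount_text)

# Keep rows whose amount exceeds 100000; NaN compares as False, so row 2 is dropped.
large = data[amounts > 100000]
print(large.index.tolist())  # [1]
```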

5. Filtering strings with regular expressions

For more complex filtering rules, use a regular expression. The following selects loans with an interest rate of 4% or less:

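low_rate_loans = data[data['rate'].str.contains(r'^(?:[0-3](?:\.\d+)?|4(?:\.0+)?)%?$', na=False)]

Here, data['rate'] selects the rate column, assumed to hold strings such as '3.5%' or '4'. The regular expression, anchored by ^ and $ so it must match the whole value, accepts either an integer part from 0 to 3 with any number of decimal places, or 4 with only zeros after the decimal point, followed by an optional %. Values such as '4.25%' or '10%' are therefore rejected. The groups are non-capturing (?:...), so str.contains() does not warn about match groups. Values with surrounding spaces or a leading zero such as '04%' do not match, so clean the column first if your data contains them. The result is stored in low_rate_loans.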
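When the rule is numeric, as here, converting the column to numbers is often simpler and less error-prone than encoding the limit in a regular expression. This sketch assumes rate values are strings with an optional trailing %, as in the hypothetical data below:

```python
import pandas as pd

# Hypothetical rate strings; the "%" sign is optional and one value is missing.
data = pd.DataFrame({"rate": ["3.5%", "4%", "4.25%", "10%", "0.8", None]})

# Strip the trailing "%" and parse the rest; unparsable or missing values become NaN.
rate_value = pd.to_numeric(data["rate"].str.rstrip("%"), errors="coerce")

# A plain numeric comparison expresses "4% or less"; NaN compares as False.
low_rate_loans = data[rate_value <= 4]
print(low_rate_loans["rate"].tolist())  # ['3.5%', '4%', '0.8']
```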

6. Splitting strings with .str.split()

To split a string into several parts and filter on one of them, use .str.split(). The following splits each loan purpose into parts and filters on the second part:

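purposes = data['purpose'].str.split('，', expand=True)
home_purposes = purposes[purposes[1].str.startswith('房屋', na=False)]

Here, data['purpose'] selects the purpose column, and .str.split('，', expand=True) splits each string on the full-width Chinese comma "，" and expands the pieces into separate columns numbered 0, 1, 2, and so on. purposes[1].str.startswith('房屋') then keeps the rows whose second piece begins with "房屋" (housing); rows without a second piece hold a missing value there, which na=False treats as a non-match. The result, stored in home_purposes, contains the split columns rather than the original ones; because it keeps the index of data, data.loc[home_purposes.index] returns the full original rows.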
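If the goal is to filter the original rows rather than the split columns, the second piece can be taken with .str[1] and used as a mask on data directly. A sketch on hypothetical purposes:

```python
import pandas as pd

# Hypothetical purposes separated by the full-width comma "，"; the last row has no second part.
data = pd.DataFrame({
    "loan_type": ["个人贷款", "企业贷款", "个人贷款"],
    "purpose": ["消费，房屋装修", "经营，设备采购", "购车"],
})

# Without expand, .str.split returns lists; .str[1] picks the second element
# (NaN when a row has fewer parts).
second_part = data["purpose"].str.split("，").str[1]

# The mask shares data's index, so it filters the original rows with all columns kept.
home_loans = data[second_part.str.startswith("房屋", na=False)]
print(home_loans["loan_type"].tolist())  # ['个人贷款']
```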

Hopefully these examples help you get comfortable with string filtering in Pandas.

Tags: pandas