A Detailed Guide to the pandas apply Function

Introduction

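apply() is one of the most versatile data-processing functions in pandas. It runs a function over the rows or columns of a dataset and returns the processed result. Using apply() often shortens code considerably, although it calls the function once per row or column and is usually slower than a built-in vectorized method when one exists.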

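pandas is an open-source Python data analysis library built on top of NumPy. It can be used to clean and organize data, for example to delete, replace, sort, and merge records. Most pandas operations center on the DataFrame (a two-dimensional labeled table) and the Series (a one-dimensional labeled array). The DataFrame.apply() function operates on the rows or columns of a DataFrame.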

Syntax

The syntax of apply is as follows:

DataFrame.apply(func, axis=0, args=(), **kwargs)

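Where:

func : the function to apply; it can be a regular function or a lambda expression.
axis : 0 or 1. With axis=0 (the default), func is applied to each column; with axis=1, func is applied to each row.
args : positional arguments passed to func after the row or column, given as a tuple.
kwargs : additional keyword arguments passed to func.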
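
The parameters above can be sketched in one short script. This is a minimal illustration, not part of the original examples: the helper scale_and_shift and its factor and offset parameters are hypothetical names chosen for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 1], "two": [2, 2], "three": [3, 3]})

def scale_and_shift(values, factor, offset=0):
    """Hypothetical helper: multiply a row or column by factor, then add offset."""
    return values * factor + offset

# axis=0 (default): func receives each column as a Series
col_totals = df.apply(sum)

# axis=1: func receives each row as a Series
row_totals = df.apply(sum, axis=1)

# args supplies extra positional arguments as a tuple;
# extra keyword arguments (here offset) are forwarded to func as well
scaled = df.apply(scale_and_shift, args=(10,), offset=1)

print(col_totals)
print(row_totals)
print(scaled)
```

Note that a keyword argument whose name matches one of apply's own parameters (such as axis) is consumed by apply itself rather than forwarded to func.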

Examples

Example 1

The following example sums each row of a DataFrame.

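import pandas as pd

# Two identical rows with three numeric columns
data = {'one': [1, 1], 'two': [2, 2], 'three': [3, 3]}
df = pd.DataFrame(data)

def sum_row(row):
    # row is a Series holding the values of one DataFrame row
    return sum(row)

# axis=1 calls sum_row once per row; the results become a new column
df['row_sum'] = df.apply(sum_row, axis=1)
print(df)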

Output:

   one  two  three  row_sum
0    1    2      3        6
1    1    2      3        6

In this example, we define a function sum_row that sums the values in one row of the DataFrame. We pass it to apply() with axis=1 so that it is called once per row, and assign the result to a new column, row_sum.
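
Since func may also be a lambda expression, the same row sum can be written without a named function. The sketch below is an illustrative variant of the sum_row example, and it also shows the built-in DataFrame.sum as an alternative that avoids calling a Python function for every row. The variable names via_lambda and via_builtin are chosen for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 1], "two": [2, 2], "three": [3, 3]})

# Same row-wise sum as sum_row, written as a lambda
via_lambda = df.apply(lambda row: row.sum(), axis=1)

# Built-in alternative: no per-row Python function call
via_builtin = df.sum(axis=1)

print(via_lambda)
print(via_builtin)
```

Both approaches give one total per row. For simple aggregations like this the built-in method is usually the better choice, while apply() remains useful when the per-row logic has no built-in equivalent.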

Example 2

The following example applies a base-10 logarithm transform to the columns of a DataFrame.

import pandas as pd
import math

data = {'one': [1, 2], 'two': [2, 4], 'three': [3, 6]}
df = pd.DataFrame(data)

def log10(val):
    # val is a single cell value; return its base-10 logarithm
    return math.log10(val)

# Series.apply calls log10 on each element of the selected column
df['one_log'] = df['one'].apply(log10)
df['two_log'] = df['two'].apply(log10)
df['three_log'] = df['three'].apply(log10)
print(df)

Output:

   one  two  three   one_log   two_log  three_log
0    1    2      3  0.000000  0.301030   0.477121
1    2    4      6  0.301030  0.602060   0.778151

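In this example, we define a function log10 that returns the base-10 logarithm of a single value. Each column selected from the DataFrame is a Series, and Series.apply() calls the function on every element of that column. The results are assigned to the new columns one_log, two_log, and three_log.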
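
The three near-identical assignments can also be collapsed into a single call. The following is a sketch of one possible alternative, assuming NumPy is available; it swaps math.log10 for np.log10, which works on whole arrays, and the names logs and result are chosen for the demonstration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [2, 4], "three": [3, 6]})

# np.log10 works on whole arrays, so it can transform every column
# in one apply() call instead of once per value
logs = df.apply(np.log10)

# add_suffix renames the columns to one_log, two_log, three_log;
# join places them next to the original columns
result = df.join(logs.add_suffix("_log"))
print(result)
```

This produces the same columns as the version above and scales to any number of columns without extra lines.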
