A Detailed Guide to Scikit-learn's datasets.fetch_lfw_people Function: Loading a Face Dataset

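sklearn.datasets.fetch_lfw_people is a commonly used dataset loader in Scikit-learn (sklearn). It loads the Labeled Faces in the Wild (LFW) people dataset, a collection of labeled photographs of well-known people gathered from the internet. The rest of this article describes what the function does and how to use it.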

Purpose

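The function downloads the face dataset on first use, loads it, and returns it in a container backed by NumPy arrays. The full dataset contains 13,233 images of 5,749 people, of whom 1,680 have two or more images. The original pictures are 250×250 pixels; with the default settings the loader crops and shrinks them and returns grayscale images of 62×47 pixels.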

Usage

First import fetch_lfw_people from the sklearn.datasets module, then call it to load the dataset.

from sklearn.datasets import fetch_lfw_people

faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

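min_faces_per_person sets the minimum number of pictures a person must have to be included; people with fewer pictures are dropped from the result. resize is the ratio by which each face picture is scaled (the default is 0.5). Smaller values produce smaller images, which lowers memory use and speeds up later processing. It does not shorten the download, because the full archive is fetched either way.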
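To make the effect of the two parameters concrete, the following sketch reproduces them on synthetic inputs, without downloading anything. It assumes the loader's documented default crop, slice_=(slice(70, 195), slice(78, 172)), and assumes each dimension is truncated to an integer after scaling. The helper names expected_image_shape and keep_frequent_people are hypothetical and are not part of scikit-learn.

```python
import numpy as np

# Assumption: the default crop applied by fetch_lfw_people (its slice_ parameter).
DEFAULT_SLICE = (slice(70, 195), slice(78, 172))


def expected_image_shape(resize=0.5, slice_=DEFAULT_SLICE):
    """Estimate the (height, width) of each returned face image."""
    height = slice_[0].stop - slice_[0].start  # 125 rows after cropping
    width = slice_[1].stop - slice_[1].start   # 94 columns after cropping
    return int(resize * height), int(resize * width)


def keep_frequent_people(labels, min_faces_per_person):
    """Return a boolean mask selecting images of people with enough pictures."""
    labels = np.asarray(labels)
    people, counts = np.unique(labels, return_counts=True)
    frequent = people[counts >= min_faces_per_person]
    return np.isin(labels, frequent)


print(expected_image_shape(resize=0.5))  # default ratio
print(expected_image_shape(resize=0.4))  # ratio used in this article

# Toy labels: person 0 has 3 pictures, person 1 has 1, person 2 has 2.
toy_labels = [0, 0, 0, 1, 2, 2]
mask = keep_frequent_people(toy_labels, min_faces_per_person=2)
print(mask)  # the single picture of person 1 is filtered out
```

With resize=0.4 each image is 50×37, that is 1,850 pixels, which is the number of columns you should expect in faces.data.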

After loading, the dataset is held in a Bunch object, here named faces. It has several attributes, the most important of which are data and target.

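data: a two-dimensional array in which each row is one image and each column is the intensity of one pixel.
target: the integer label identifying the person in each image, typically used as the class label in classification tasks.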
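The Bunch also exposes images (the same pixels kept as 2-D arrays) and target_names (the person's name for each label), both of which Example 2 relies on. The sketch below uses a small random array as a stand-in for the real dataset to show how these attributes relate to each other. The array sizes and person names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for faces.images: 6 grayscale images of 50 x 37 pixels (made-up data).
images = rng.random((6, 50, 37)).astype(np.float32)
# Stand-ins for faces.target and faces.target_names (hypothetical names).
target = np.array([0, 2, 1, 0, 2, 2])
target_names = np.array(["Person A", "Person B", "Person C"])

# data holds the same pixels as images, with each image flattened into one row.
data = images.reshape(len(images), -1)
print(data.shape)

# A row of data can be reshaped back into an image for display.
first_image = data[0].reshape(images.shape[1:])
print(np.array_equal(first_image, images[0]))

# Integer labels index into target_names to recover each person's name.
names = target_names[target]
print(names[:3])
```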

Below are two examples of using this function:

Example 1: Classification on the face recognition dataset

from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.svm import SVC
from sklearn.decomposition import PCA

# Load the dataset
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# Preprocessing: split into training and test sets, then reduce dimensionality with PCA
X_train, X_test, y_train, y_test = train_test_split(faces.data, faces.target, test_size=0.3, random_state=42)
pca = PCA(n_components=150, whiten=True, random_state=42)
pca.fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# Classify with an SVM
svc = SVC(kernel='rbf', class_weight='balanced')
svc.fit(X_train_pca, y_train)
y_pred = svc.predict(X_test_pca)

# Print the classification report
print(classification_report(y_test, y_pred, target_names=faces.target_names))

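The code above classifies the face dataset: it uses PCA to reduce the dimensionality of the data and then an SVM classifier to predict who appears in each image. The classification report lists four metrics per class (precision, recall, F1 score, and support), which can be used to judge how well the classifier performs.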
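As a small illustration of what the four columns mean, the following sketch computes them for a hand-made set of labels rather than for the LFW data. The label values are invented so that the numbers can be checked by hand.

```python
from sklearn.metrics import precision_recall_fscore_support

# Invented labels for three classes (not taken from the LFW dataset).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)

for label in range(3):
    print(
        f"class {label}: precision={precision[label]:.2f} "
        f"recall={recall[label]:.2f} f1={f1[label]:.2f} "
        f"support={support[label]}"
    )
```

Here one of the three true class 0 samples is predicted as class 1, so class 0 has a recall of 2/3 while its precision stays at 1.0, and class 1 has a precision of 2/3. Support is simply the number of true samples in each class.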

Example 2: Inspecting the dataset with image visualization

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people

# Load the dataset
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# Print the dataset description
print(faces.DESCR)

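# Show several images from the dataset in a grid of subplots
fig, axs = plt.subplots(3, 5)

for i, ax in enumerate(axs.flat):
    ax.imshow(faces.images[i], cmap='gray')
    ax.set(xticks=[], yticks=[], xlabel=faces.target_names[faces.target[i]])

# Display the figure (needed when running as a script)
plt.show()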

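The code above uses Matplotlib's subplots function to display several images from the dataset in a grid, and prints the dataset's description. The output shows photographs of the people in the dataset and gives an overview of the dataset and its contents. Each image is labeled underneath with the name of the person it shows.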
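Besides looking at the pictures, it can help to count how many images each person has, because the classes in this dataset are unevenly sized (which is why Example 1 passes class_weight='balanced'). The sketch below uses invented labels as a stand-in; with the real data you would pass faces.target and faces.target_names instead. images_per_person is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np


def images_per_person(target, target_names):
    """Map each person's name to the number of images carrying that label."""
    counts = np.bincount(target, minlength=len(target_names))
    return {str(name): int(count) for name, count in zip(target_names, counts)}


# Invented stand-ins for faces.target and faces.target_names.
toy_target = np.array([0, 0, 1, 2, 2, 2, 2])
toy_names = np.array(["Person A", "Person B", "Person C"])

for name, count in images_per_person(toy_target, toy_names).items():
    print(f"{name}: {count}")
```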
