Scikit-learn's datasets.load_wine Function Explained: Loading the Wine Dataset

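This post walks through Scikit-learn's sklearn.datasets.load_wine function in detail.

What the function does

sklearn.datasets.load_wine loads the Wine dataset and returns a Bunch object that holds the data and the labels, where:

  • the data attribute is a two-dimensional array containing the chemical measurements of the wines;
  • the target attribute is a one-dimensional array containing the class each wine belongs to (0, 1, or 2);
  • the DESCR attribute is a string containing a description of the dataset.

The Wine dataset contains 178 samples. Each sample has 13 chemical features and belongs to exactly one of 3 classes (59, 71, and 48 samples in classes 0, 1, and 2 respectively).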
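
As a quick check of the shapes and class sizes described above, here is a minimal sketch. The variable name class_counts is only illustrative.

```python
from collections import Counter

from sklearn.datasets import load_wine

wine = load_wine()

# data is a 2-D array: one row per sample, one column per feature
print(wine.data.shape)
# target is a 1-D array holding one integer class label per sample
print(wine.target.shape)

# Count how many samples fall into each class
class_counts = Counter(wine.target.tolist())
print(sorted(class_counts.items()))
```

The classes are not perfectly balanced, which is worth keeping in mind when splitting the data later.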

Usage

Step 1: Import the function and load the dataset

First import load_wine from the sklearn.datasets module, then call load_wine() to load the Wine dataset:

from sklearn.datasets import load_wine

wine = load_wine()

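Step 2: Inspect the dataset

Use print(wine.DESCR) to view the full description of the dataset:

print(wine.DESCR)

The output looks like this (the exact wording and layout vary between scikit-learn versions):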

.. _wine_dataset:

Wine recognition dataset
--------------------------

**Data Set Characteristics:**

    :Number of Instances: 178
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
        - Alcohol
        - Malic acid
        - Ash
        - Alcalinity of ash
        - Magnesium
        - Total phenols
        - Flavanoids
        - Nonflavanoid phenols
        - Proanthocyanins
        - Color intensity
        - Hue
        - OD280/OD315 of diluted wines
        - Proline

    - class:
            - class_0
            - class_1
            - class_2

    :Summary Statistics:

    ============================= ==== ======= ========
                                   Min   Max     Mean
    ============================= ==== ======= ========
    Alcohol                          11.0  14.8    13.0
    Malic Acid                        0.74   5.8     2.34
    Ash                               1.3   3.0     2.36
    Alcalinity of Ash                10.6  30.0    19.5
    Magnesium                        70.0 162.0    99.7
    Total Phenols                     0.98   3.9     2.29
    Flavanoids                        0.34   5.1     2.03
    Nonflavanoid Phenols              0.13   0.66    0.36
    Proanthocyanins                   0.41   3.6     1.59
    Color Intensity                   1.3  13.0     5.06
    Hue                               0.48   1.7     0.96
    OD280/OD315 of Diluted Wines      1.27   4.0     2.61
    Proline                         278.0 1680.0   746.9

    :Missing Attribute Values: None

    :Class Distribution: class_0 (59), class_1 (71), class_2 (48)

    :Creator: R.A. Fisher

    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)

    :Date: July, 1988

This is a copy of UCI ML Wine recognition datasets.
http://archive.ics.uci.edu/ml/datasets/Wine


The data is the results of a chemical analysis of wines grown in the same region in Italy by three different cultivators.
There are thirteen different measurements taken for different constituents found in the three types of wine.
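
The description is only text, but the same information is also available programmatically. The sketch below prints the class names and recomputes the minimum, maximum, and mean of each feature from wine.data with NumPy. The column widths in the format string are an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Names of the three classes, in label order (0, 1, 2)
print(list(wine.target_names))

# Per-feature statistics, computed column by column
mins = wine.data.min(axis=0)
maxs = wine.data.max(axis=0)
means = wine.data.mean(axis=0)

for name, lo, hi, mean in zip(wine.feature_names, mins, maxs, means):
    print(f"{name:<30} {lo:8.2f} {hi:8.2f} {mean:8.2f}")
```

Note how different the feature scales are: proline runs into the hundreds and thousands, while hue stays close to 1. This matters for distance-based methods such as KMeans.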

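Step 3: Get the data and labels

Assign the data and the labels to X and y respectively:

X = wine.data    # chemical measurements of the wines, shape (178, 13)
y = wine.target  # class label of each wine (0, 1, or 2)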
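
If only the arrays are needed, load_wine also accepts return_X_y=True, which returns the data and labels as a tuple instead of a Bunch. A minimal sketch:

```python
from sklearn.datasets import load_wine

# Skip the Bunch object and unpack the feature matrix and labels directly
X, y = load_wine(return_X_y=True)

print(X.shape)
print(y.shape)
```

This form is convenient in short scripts, at the cost of losing access to DESCR, feature_names, and target_names.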

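Example 1: Clustering the Wine dataset with KMeans

The following code clusters the Wine dataset:

from sklearn.cluster import KMeans

# Three clusters, matching the number of wine classes; random_state fixes the initialization
kmeans = KMeans(n_clusters=3, random_state=0)
pred = kmeans.fit_predict(X)

print('Cluster labels assigned to the wine samples by KMeans:')
print(pred)

Sample output (illustrative; the exact labels depend on the scikit-learn version, and cluster indices are arbitrary, so they need not match the class labels in y):

Cluster labels assigned to the wine samples by KMeans: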
[0 0 0 0 1 1 1 0 0 0 1 1 0 0 1 1 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 0
 1 1 2 2 2 1 1 1 1 1 1 1 0 0 1 1 0 0 1 1 1 1 2 2 2 1 0 0 0 1 2 2 1 1 0 0 2 2
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 1 0 0 0 0 0 1 2 0 1
 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 2 2 2 2 2 2 2 2]
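
Because cluster indices are arbitrary, comparing pred with y element by element is not meaningful. One way to judge the clustering is the adjusted Rand index, which ignores how clusters are numbered. The sketch below also standardizes the features first, on the assumption that the large-valued proline column otherwise dominates the Euclidean distances. Setting n_init=10 is just an explicit choice here, not something the example above requires.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Cluster the raw features
raw_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
ari_raw = adjusted_rand_score(y, raw_labels)

# Cluster again after scaling every feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)
scaled_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
ari_scaled = adjusted_rand_score(y, scaled_labels)

print(f"ARI on raw features:    {ari_raw:.3f}")
print(f"ARI on scaled features: {ari_scaled:.3f}")
```

On this dataset the scaled version usually agrees with the true classes far better than the raw version.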

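Example 2: Classifying the Wine dataset with a decision tree

The following code trains a classifier on the Wine dataset and predicts on a held-out test set: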

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

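# Split the dataset into a training set (70%) and a test set (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = DecisionTreeClassifier(criterion='entropy')  # create a decision tree that splits on entropy (information gain)
clf.fit(X_train, y_train)  # fit the classifier on the training set

# Predict on the test set
y_pred = clf.predict(X_test)

print('Predictions of the decision tree classifier:')
print(y_pred)

Sample output (illustrative; no random_state is set, so the split and the predictions change from run to run):

Predictions of the decision tree classifier: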
[0 0 2 2 2 2 2 1 0 1 0 1 1 0 0 0 0 2 0 0 0 1 0 1 1 2 2 1 0 0 0 2 2 0 1 2 2
 2 2 0 0 0 0 0 0]
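
Printing the predictions alone does not say how good they are. The sketch below compares them with the true labels using accuracy_score. The random_state=0 and stratify=y arguments are additions made here for reproducibility and balanced splits; they are not part of the example above.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Fixed seed for a repeatable split; stratify keeps class proportions similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)

# Fraction of test samples whose predicted class matches the true class
accuracy = accuracy_score(y_test, clf.predict(X_test))

print(f"test samples: {len(y_test)}")
print(f"accuracy: {accuracy:.3f}")
```

A single split gives only a rough estimate on a dataset this small; cross-validation would give a steadier number.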

That covers how to use sklearn.datasets.load_wine, along with two worked examples. I hope you find it helpful.