This walkthrough covers the sklearn.datasets.load_wine function from Scikit-learn in detail.
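What the function does
sklearn.datasets.load_wine loads the wine dataset and returns a Bunch object that holds the data and the labels, where:
- data is a two-dimensional array containing the chemical measurements of the wines;
- target is a one-dimensional array containing the class of each wine (0, 1, or 2);
- DESCR is a string containing a description of the dataset.
The wine dataset contains 178 samples. Each sample has 13 chemical features and belongs to one of 3 classes.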
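To make the attributes above concrete, here is a minimal sketch that loads the dataset and inspects the shapes. It also prints feature_names and target_names, two further Bunch attributes not covered in the list above.

```python
from sklearn.datasets import load_wine

wine = load_wine()

# data: one row per sample, one column per chemical feature
print(wine.data.shape)
# target: one integer class label per sample
print(wine.target.shape)
# feature_names and target_names hold the human-readable names
print(wine.feature_names[:3])
print(list(wine.target_names))
```

The shapes should come out as (178, 13) and (178,), matching the sample and feature counts given above.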
Usage
Step 1: Import and load the dataset
First import load_wine from the sklearn.datasets module, then call load_wine() to load the wine dataset:
from sklearn.datasets import load_wine
wine = load_wine()
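Besides returning a Bunch, load_wine accepts two optional parameters that change the return type. The sketch below assumes a reasonably recent scikit-learn (as_frame is not available in old releases) and an installed pandas. The name wine_df is only illustrative.

```python
from sklearn.datasets import load_wine

# return_X_y=True skips the Bunch and returns the two arrays directly
X, y = load_wine(return_X_y=True)
print(X.shape, y.shape)

# as_frame=True returns pandas objects; .frame holds the features plus a 'target' column
wine_df = load_wine(as_frame=True).frame
print(wine_df.head())
```

The DataFrame form is convenient for exploration, while the plain arrays are all an estimator needs.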
Step 2: Inspect the dataset description
Print wine.DESCR to see the detailed description of the dataset:
print(wine.DESCR)
The output looks like this (the exact text varies with the scikit-learn version):
.. _wine_dataset:
Wine recognition dataset
--------------------------
**Data Set Characteristics:**
:Number of Instances: 178 (50 in each of three classes)
:Number of Attributes: 13 numeric, predictive attributes and the class
:Attribute Information:
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline
- class:
- class_0
- class_1
- class_2
:Summary Statistics:
============================= ===== ====== =====
                                Min    Max  Mean
============================= ===== ====== =====
Alcohol                        11.0   14.8  13.0
Malic Acid                     0.74    5.8  2.34
Ash                             1.3    3.0  2.36
Alcalinity of Ash              10.6   30.0  19.5
Magnesium                      70.0  162.0  99.7
Total Phenols                  0.98    3.9  2.29
Flavanoids                     0.34    5.1  2.03
Nonflavanoid Phenols           0.13   0.66  0.36
Proanthocyanins                0.41    3.6  1.59
Color Intensity                 1.3   13.0  5.06
Hue                            0.48    1.7  0.96
OD280/OD315 of Diluted Wines   1.27    4.0  2.61
Proline                       278.0 1680.0 746.9
============================= ===== ====== =====
:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988
This is a copy of UCI ML Wine recognition datasets.
http://archive.ics.uci.edu/ml/datasets/Wine
The data is the results of a chemical analysis of wines grown in the same region in Italy by three different cultivators.
There are thirteen different measurements taken for different constituents found in the three types of wine.
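The description text is only documentation, so it is worth checking its class-distribution figures against the labels themselves. A small sketch that counts the samples per class with NumPy:

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# np.bincount counts how often each integer label (0, 1, 2) occurs
counts = np.bincount(wine.target)
for name, n in zip(wine.target_names, counts):
    print(f"{name}: {n} samples ({n / counts.sum():.1%})")
```

In the copy of the data shipped with scikit-learn, the three classes hold 59, 71, and 48 samples, so they are not perfectly balanced.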
Step 3: Get the data and the labels
Assign the data and the labels to X and y:
X = wine.data    # chemical measurements of the wines
y = wine.target  # class label of each wine
Example 1: Clustering the wine dataset with KMeans
The following code clusters the wine dataset:
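from sklearn.cluster import KMeans
# n_init=10 runs KMeans from 10 initializations and keeps the best result
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
pred = kmeans.fit_predict(X)
print('KMeans cluster assignments for the wine samples:')
print(pred)
The output is similar to the following. Cluster indices are arbitrary, so they need not match the class labels in y:
KMeans cluster assignments for the wine samples: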
[0 0 0 0 1 1 1 0 0 0 1 1 0 0 1 1 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 0
1 1 2 2 2 1 1 1 1 1 1 1 0 0 1 1 0 0 1 1 1 1 2 2 2 1 0 0 0 1 2 2 1 1 0 0 2 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 1 0 0 0 0 0 1 2 0 1
1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 2 2 2 2 2 2 2 2]
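Because cluster indices are arbitrary, comparing pred with y element by element is not meaningful. One common way to score a clustering against known labels is the adjusted Rand index, which ignores how the clusters are numbered. The sketch below is an addition to the example, not part of the original code. It also compares raw features with standardized ones, since KMeans relies on Euclidean distance and the features here have very different ranges (Proline reaches the thousands, Hue stays below 2).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Cluster the raw features
raw_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Cluster again after scaling every feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)
scaled_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# The adjusted Rand index is invariant to how the clusters are numbered
ari_raw = adjusted_rand_score(y, raw_pred)
ari_scaled = adjusted_rand_score(y, scaled_pred)
print(f"ARI on raw features:    {ari_raw:.3f}")
print(f"ARI on scaled features: {ari_scaled:.3f}")
```

On this dataset the scaled version should agree with the true classes noticeably better, because no single large-valued feature dominates the distance.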
Example 2: Classifying the wine dataset with a decision tree
The following code trains a classifier on the wine dataset:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
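# Split the dataset into a training set and a test set (30% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = DecisionTreeClassifier(criterion='entropy')  # initialize the tree with the entropy split criterion
clf.fit(X_train, y_train)  # fit the tree on the training set
# Predict on the test set
y_pred = clf.predict(X_test)
print('Predictions of the decision tree classifier:')
print(y_pred)
A sample run printed the following. No random_state is set, so the split is random and the labels differ from run to run:
Predictions of the decision tree classifier: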
[0 0 2 2 2 2 2 1 0 1 0 1 1 0 0 0 0 2 0 0 0 1 0 1 1 2 2 1 0 0 0 2 2 0 1 2 2
2 2 0 0 0 0 0 0]
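Printing the predicted labels alone does not show how well the classifier performs. As a hedged extension of the example, the sketch below fixes random_state so the run is repeatable, stratifies the split so each class keeps its proportion, and reports accuracy on the test set. The seed value 0 and the use of stratify are choices made here, not part of the original code.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Fixed seed for repeatability; stratify keeps class proportions in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Fraction of test samples whose predicted class matches the true class
acc = accuracy_score(y_test, y_pred)
print(f"Test samples: {len(y_test)}")
print(f"Accuracy: {acc:.3f}")
```

With 178 samples and test_size=0.3, the test set holds 54 samples, so y_pred contains 54 labels.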
That covers how to use sklearn.datasets.load_wine, along with two examples. I hope you find it helpful.