Throughout most of this book, we will rely on datasets to fit machine-learning algorithms. This section shows how to obtain these data sources through TensorFlow and Python.

Some of these datasets are built into Python libraries, some require a Python script to download, and some must be downloaded manually from the web. In every case, an internet connection is needed to retrieve the data.

First, we initialize a TensorFlow graph session:

>>> import tensorflow.compat.v1 as tf
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> from tensorflow.python.framework import ops
>>> ops.reset_default_graph()
>>> tf.disable_eager_execution()
>>> sess = tf.Session()

Iris Dataset

The Iris Dataset is arguably the most classic dataset in machine learning, and perhaps in all of statistics. It records the sepal length, sepal width, petal length, and petal width of three iris species: Iris setosa, Iris versicolor, and Iris virginica. There are 150 measurements in total, 50 per species. To use this dataset in Python, we load it with Scikit-learn's dataset functions:

>>> from sklearn.datasets import load_iris
>>> import pandas as pd
>>> iris = load_iris()
>>> print(len(iris.data))
150
>>> print(len(iris.target))
150
>>> print(iris.data[0])
[5.1 3.5 1.4 0.2]
>>> print(set(iris.target))
{0, 1, 2}
>>> pd.DataFrame(data=iris.data, columns=iris.feature_names)
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
5 5.4 3.9 1.7 0.4
6 4.6 3.4 1.4 0.3
7 5.0 3.4 1.5 0.2
8 4.4 2.9 1.4 0.2
9 4.9 3.1 1.5 0.1
10 5.4 3.7 1.5 0.2
11 4.8 3.4 1.6 0.2
12 4.8 3.0 1.4 0.1
13 4.3 3.0 1.1 0.1
14 5.8 4.0 1.2 0.2
15 5.7 4.4 1.5 0.4
16 5.4 3.9 1.3 0.4
17 5.1 3.5 1.4 0.3
18 5.7 3.8 1.7 0.3
19 5.1 3.8 1.5 0.3
20 5.4 3.4 1.7 0.2
21 5.1 3.7 1.5 0.4
22 4.6 3.6 1.0 0.2
23 5.1 3.3 1.7 0.5
24 4.8 3.4 1.9 0.2
25 5.0 3.0 1.6 0.2
26 5.0 3.4 1.6 0.4
27 5.2 3.5 1.5 0.2
28 5.2 3.4 1.4 0.2
29 4.7 3.2 1.6 0.2
... ... ... ... ...
120 6.9 3.2 5.7 2.3
121 5.6 2.8 4.9 2.0
122 7.7 2.8 6.7 2.0
123 6.3 2.7 4.9 1.8
124 6.7 3.3 5.7 2.1
125 7.2 3.2 6.0 1.8
126 6.2 2.8 4.8 1.8
127 6.1 3.0 4.9 1.8
128 6.4 2.8 5.6 2.1
129 7.2 3.0 5.8 1.6
130 7.4 2.8 6.1 1.9
131 7.9 3.8 6.4 2.0
132 6.4 2.8 5.6 2.2
133 6.3 2.8 5.1 1.5
134 6.1 2.6 5.6 1.4
135 7.7 3.0 6.1 2.3
136 6.3 3.4 5.6 2.4
137 6.4 3.1 5.5 1.8
138 6.0 3.0 4.8 1.8
139 6.9 3.1 5.4 2.1
140 6.7 3.1 5.6 2.4
141 6.9 3.1 5.1 2.3
142 5.8 2.7 5.1 1.9
143 6.8 3.2 5.9 2.3
144 6.7 3.3 5.7 2.5
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

150 rows x 4 columns
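A quick visual check often helps before modeling. The sketch below is our own hedged addition (the column choices and markers are illustrative assumptions, not part of the original recipe); it scatters sepal length against petal width per species using the matplotlib session imported earlier:

>>> # Minimal sketch: sepal length vs. petal width, one marker per species.
>>> for target, marker in zip([0, 1, 2], ['x', 'o', '^']):
...     idx = iris.target == target
...     plt.scatter(iris.data[idx, 0], iris.data[idx, 3], marker=marker,
...                 label=iris.target_names[target])
>>> plt.xlabel('sepal length (cm)')
>>> plt.ylabel('petal width (cm)')
>>> plt.legend()
>>> plt.show()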

Low Birthrate Dataset (Hosted on GitHub)

The University of Massachusetts at Amherst has compiled many interesting statistical datasets. One of them is the Low Birthrate Dataset ("Low Infant Birth Weight Risk Factor Study", 1989, Hosmer and Lemeshow), which records infant birth weights together with demographic data and medical measurements of the mothers and their family histories. The full study measured 11 variables on 189 observations; the copy hosted on GitHub keeps 9 of those variables. Here is how to retrieve the data with Python:

>>> import requests
>>> birthdata_url='https://github.com/nfmcclure/tensorflow_cookbook/raw/master/01_Introduction/07_Working_with_Data_Sources/birthweight_data/birthweight.dat'
>>> birth_file = requests.get(birthdata_url)
>>> birth_data = birth_file.text.split('\r\n')
>>> birth_header = birth_data[0].split('\t')
>>> birth_data = [[float(x) for x in y.split('\t') if len(x)>=1] for y in birth_data[1:] if len(y)>=1]
>>> print(len(birth_data))
189
>>> print(len(birth_data[0]))
9
>>> print(birth_header)
['LOW', 'AGE', 'LWT', 'RACE', 'SMOKE', 'PTL', 'HT', 'UI', 'BWT']

>>> import pandas as pd
>>> pd.DataFrame(data=birth_data, columns=birth_header)
LOW AGE LWT RACE SMOKE PTL HT UI BWT
0 1.0 28.0 113.0 1.0 1.0 1.0 0.0 1.0 709.0
1 1.0 29.0 130.0 0.0 0.0 0.0 0.0 1.0 1021.0
2 1.0 34.0 187.0 1.0 1.0 0.0 1.0 0.0 1135.0
3 1.0 25.0 105.0 1.0 0.0 1.0 1.0 0.0 1330.0
4 1.0 25.0 85.0 1.0 0.0 0.0 0.0 1.0 1474.0
5 1.0 27.0 150.0 1.0 0.0 0.0 0.0 0.0 1588.0
6 1.0 23.0 97.0 1.0 0.0 0.0 0.0 1.0 1588.0
7 1.0 24.0 128.0 1.0 0.0 1.0 0.0 0.0 1701.0
8 1.0 24.0 132.0 1.0 0.0 0.0 1.0 0.0 1729.0
9 1.0 21.0 165.0 0.0 1.0 0.0 1.0 0.0 1790.0
10 1.0 32.0 105.0 1.0 1.0 0.0 0.0 0.0 1818.0
11 1.0 19.0 91.0 0.0 1.0 1.0 0.0 1.0 1885.0
12 1.0 25.0 115.0 1.0 0.0 0.0 0.0 0.0 1893.0
13 1.0 16.0 130.0 1.0 0.0 0.0 0.0 0.0 1899.0
14 1.0 25.0 92.0 0.0 1.0 0.0 0.0 0.0 1928.0
15 1.0 20.0 150.0 0.0 1.0 0.0 0.0 0.0 1928.0
16 1.0 21.0 190.0 1.0 0.0 0.0 0.0 1.0 1928.0
17 1.0 24.0 155.0 0.0 1.0 1.0 0.0 0.0 1936.0
18 1.0 21.0 103.0 1.0 0.0 0.0 0.0 0.0 1970.0
19 1.0 20.0 125.0 1.0 0.0 0.0 0.0 1.0 2055.0
20 1.0 25.0 89.0 1.0 0.0 1.0 0.0 0.0 2055.0
21 1.0 19.0 102.0 0.0 0.0 0.0 0.0 0.0 2082.0
22 1.0 19.0 112.0 0.0 1.0 0.0 0.0 1.0 2084.0
23 1.0 26.0 117.0 0.0 1.0 1.0 0.0 1.0 2084.0
24 1.0 24.0 138.0 0.0 0.0 0.0 0.0 0.0 2100.0
25 1.0 17.0 130.0 1.0 1.0 1.0 0.0 1.0 2125.0
26 1.0 20.0 120.0 1.0 1.0 0.0 0.0 0.0 2126.0
27 1.0 22.0 130.0 0.0 1.0 1.0 0.0 1.0 2187.0
28 1.0 27.0 130.0 1.0 0.0 0.0 0.0 1.0 2187.0
29 1.0 20.0 80.0 1.0 1.0 0.0 0.0 1.0 2211.0
... ... ... ... ... ... ... ... ... ...
159 0.0 24.0 110.0 0.0 0.0 0.0 0.0 0.0 3728.0
160 0.0 19.0 184.0 0.0 1.0 0.0 1.0 0.0 3756.0
161 0.0 24.0 110.0 0.0 0.0 1.0 0.0 0.0 3770.0
162 0.0 23.0 110.0 0.0 0.0 0.0 0.0 0.0 3770.0
163 0.0 20.0 120.0 1.0 0.0 0.0 0.0 0.0 3770.0
164 0.0 25.0 141.0 0.0 0.0 0.0 1.0 0.0 3790.0
165 0.0 30.0 112.0 0.0 0.0 0.0 0.0 0.0 3799.0
166 0.0 22.0 169.0 0.0 0.0 0.0 0.0 0.0 3827.0
167 0.0 18.0 120.0 0.0 1.0 0.0 0.0 0.0 3856.0
168 0.0 16.0 170.0 1.0 0.0 0.0 0.0 0.0 3860.0
169 0.0 32.0 186.0 0.0 0.0 0.0 0.0 0.0 3860.0
170 0.0 18.0 120.0 1.0 0.0 0.0 0.0 0.0 3884.0
171 0.0 29.0 130.0 0.0 1.0 0.0 0.0 0.0 3884.0
172 0.0 33.0 117.0 0.0 0.0 0.0 0.0 1.0 3912.0
173 0.0 20.0 170.0 0.0 1.0 0.0 0.0 0.0 3940.0
174 0.0 28.0 134.0 1.0 0.0 0.0 0.0 0.0 3941.0
175 0.0 14.0 135.0 0.0 0.0 1.0 0.0 0.0 3941.0
176 0.0 28.0 130.0 1.0 0.0 0.0 0.0 0.0 3969.0
177 0.0 25.0 120.0 0.0 0.0 0.0 0.0 0.0 3983.0
178 0.0 16.0 135.0 1.0 0.0 0.0 0.0 0.0 3997.0
179 0.0 20.0 158.0 0.0 0.0 0.0 0.0 0.0 3997.0
180 0.0 26.0 160.0 0.0 0.0 0.0 0.0 0.0 4054.0
181 0.0 21.0 115.0 0.0 0.0 0.0 0.0 0.0 4054.0
182 0.0 22.0 129.0 0.0 0.0 0.0 0.0 0.0 4111.0
183 0.0 25.0 130.0 0.0 0.0 0.0 0.0 0.0 4153.0
184 0.0 31.0 120.0 0.0 0.0 0.0 0.0 0.0 4167.0
185 0.0 35.0 170.0 0.0 0.0 1.0 0.0 0.0 4174.0
186 0.0 19.0 120.0 0.0 1.0 0.0 1.0 0.0 4238.0
187 0.0 24.0 216.0 0.0 0.0 0.0 0.0 0.0 4593.0
188 0.0 45.0 123.0 0.0 0.0 1.0 0.0 0.0 4990.0

189 rows x 9 columns
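For later modeling it is convenient to separate a target column from the predictors. A minimal sketch, assuming we use 'BWT' (the birth weight in grams, the last column) as the target:

>>> # Split the parsed rows into predictors and target; BWT is column index 8.
>>> y_vals = np.array([row[8] for row in birth_data])
>>> x_vals = np.array([row[1:8] for row in birth_data])  # AGE through UI
>>> print(x_vals.shape, y_vals.shape)
(189, 7) (189,)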

Boston Housing Dataset (University of California at Irvine)

Carnegie Mellon University maintains many datasets in its statistics library. One of them, the Boston Housing data, can be obtained from the University of California at Irvine machine-learning repository. It contains 506 observations of house prices together with various demographic measurements and housing attributes (14 variables). Here is how to retrieve the data in Python:

>>> import requests
>>> housing_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data'
>>> housing_header = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
>>> housing_file = requests.get(housing_url)
>>> housing_data = [[float(x) for x in y.split(' ') if len(x)>=1] for y in housing_file.text.split('\n') if len(y)>=1]
>>> print(len(housing_data))
506
>>> print(len(housing_data[0]))
14
>>> pd.DataFrame(data=housing_data,columns=housing_header)
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2
5 0.02985 0.0 2.18 0.0 0.458 6.430 58.7 6.0622 3.0 222.0 18.7 394.12 5.21 28.7
6 0.08829 12.5 7.87 0.0 0.524 6.012 66.6 5.5605 5.0 311.0 15.2 395.60 12.43 22.9
7 0.14455 12.5 7.87 0.0 0.524 6.172 96.1 5.9505 5.0 311.0 15.2 396.90 19.15 27.1
8 0.21124 12.5 7.87 0.0 0.524 5.631 100.0 6.0821 5.0 311.0 15.2 386.63 29.93 16.5
9 0.17004 12.5 7.87 0.0 0.524 6.004 85.9 6.5921 5.0 311.0 15.2 386.71 17.10 18.9
10 0.22489 12.5 7.87 0.0 0.524 6.377 94.3 6.3467 5.0 311.0 15.2 392.52 20.45 15.0
11 0.11747 12.5 7.87 0.0 0.524 6.009 82.9 6.2267 5.0 311.0 15.2 396.90 13.27 18.9
12 0.09378 12.5 7.87 0.0 0.524 5.889 39.0 5.4509 5.0 311.0 15.2 390.50 15.71 21.7
13 0.62976 0.0 8.14 0.0 0.538 5.949 61.8 4.7075 4.0 307.0 21.0 396.90 8.26 20.4
14 0.63796 0.0 8.14 0.0 0.538 6.096 84.5 4.4619 4.0 307.0 21.0 380.02 10.26 18.2
15 0.62739 0.0 8.14 0.0 0.538 5.834 56.5 4.4986 4.0 307.0 21.0 395.62 8.47 19.9
16 1.05393 0.0 8.14 0.0 0.538 5.935 29.3 4.4986 4.0 307.0 21.0 386.85 6.58 23.1
17 0.78420 0.0 8.14 0.0 0.538 5.990 81.7 4.2579 4.0 307.0 21.0 386.75 14.67 17.5
18 0.80271 0.0 8.14 0.0 0.538 5.456 36.6 3.7965 4.0 307.0 21.0 288.99 11.69 20.2
19 0.72580 0.0 8.14 0.0 0.538 5.727 69.5 3.7965 4.0 307.0 21.0 390.95 11.28 18.2
20 1.25179 0.0 8.14 0.0 0.538 5.570 98.1 3.7979 4.0 307.0 21.0 376.57 21.02 13.6
21 0.85204 0.0 8.14 0.0 0.538 5.965 89.2 4.0123 4.0 307.0 21.0 392.53 13.83 19.6
22 1.23247 0.0 8.14 0.0 0.538 6.142 91.7 3.9769 4.0 307.0 21.0 396.90 18.72 15.2
23 0.98843 0.0 8.14 0.0 0.538 5.813 100.0 4.0952 4.0 307.0 21.0 394.54 19.88 14.5
24 0.75026 0.0 8.14 0.0 0.538 5.924 94.1 4.3996 4.0 307.0 21.0 394.33 16.30 15.6
25 0.84054 0.0 8.14 0.0 0.538 5.599 85.7 4.4546 4.0 307.0 21.0 303.42 16.51 13.9
26 0.67191 0.0 8.14 0.0 0.538 5.813 90.3 4.6820 4.0 307.0 21.0 376.88 14.81 16.6
27 0.95577 0.0 8.14 0.0 0.538 6.047 88.8 4.4534 4.0 307.0 21.0 306.38 17.28 14.8
28 0.77299 0.0 8.14 0.0 0.538 6.495 94.4 4.4547 4.0 307.0 21.0 387.94 12.80 18.4
29 1.00245 0.0 8.14 0.0 0.538 6.674 87.3 4.2390 4.0 307.0 21.0 380.23 11.98 21.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
476 4.87141 0.0 18.10 0.0 0.614 6.484 93.6 2.3053 24.0 666.0 20.2 396.21 18.68 16.7
477 15.02340 0.0 18.10 0.0 0.614 5.304 97.3 2.1007 24.0 666.0 20.2 349.48 24.91 12.0
478 10.23300 0.0 18.10 0.0 0.614 6.185 96.7 2.1705 24.0 666.0 20.2 379.70 18.03 14.6
479 14.33370 0.0 18.10 0.0 0.614 6.229 88.0 1.9512 24.0 666.0 20.2 383.32 13.11 21.4
480 5.82401 0.0 18.10 0.0 0.532 6.242 64.7 3.4242 24.0 666.0 20.2 396.90 10.74 23.0
481 5.70818 0.0 18.10 0.0 0.532 6.750 74.9 3.3317 24.0 666.0 20.2 393.07 7.74 23.7
482 5.73116 0.0 18.10 0.0 0.532 7.061 77.0 3.4106 24.0 666.0 20.2 395.28 7.01 25.0
483 2.81838 0.0 18.10 0.0 0.532 5.762 40.3 4.0983 24.0 666.0 20.2 392.92 10.42 21.8
484 2.37857 0.0 18.10 0.0 0.583 5.871 41.9 3.7240 24.0 666.0 20.2 370.73 13.34 20.6
485 3.67367 0.0 18.10 0.0 0.583 6.312 51.9 3.9917 24.0 666.0 20.2 388.62 10.58 21.2
486 5.69175 0.0 18.10 0.0 0.583 6.114 79.8 3.5459 24.0 666.0 20.2 392.68 14.98 19.1
487 4.83567 0.0 18.10 0.0 0.583 5.905 53.2 3.1523 24.0 666.0 20.2 388.22 11.45 20.6
488 0.15086 0.0 27.74 0.0 0.609 5.454 92.7 1.8209 4.0 711.0 20.1 395.09 18.06 15.2
489 0.18337 0.0 27.74 0.0 0.609 5.414 98.3 1.7554 4.0 711.0 20.1 344.05 23.97 7.0
490 0.20746 0.0 27.74 0.0 0.609 5.093 98.0 1.8226 4.0 711.0 20.1 318.43 29.68 8.1
491 0.10574 0.0 27.74 0.0 0.609 5.983 98.8 1.8681 4.0 711.0 20.1 390.11 18.07 13.6
492 0.11132 0.0 27.74 0.0 0.609 5.983 83.5 2.1099 4.0 711.0 20.1 396.90 13.35 20.1
493 0.17331 0.0 9.69 0.0 0.585 5.707 54.0 2.3817 6.0 391.0 19.2 396.90 12.01 21.8
494 0.27957 0.0 9.69 0.0 0.585 5.926 42.6 2.3817 6.0 391.0 19.2 396.90 13.59 24.5
495 0.17899 0.0 9.69 0.0 0.585 5.670 28.8 2.7986 6.0 391.0 19.2 393.29 17.60 23.1
496 0.28960 0.0 9.69 0.0 0.585 5.390 72.9 2.7986 6.0 391.0 19.2 396.90 21.14 19.7
497 0.26838 0.0 9.69 0.0 0.585 5.794 70.6 2.8927 6.0 391.0 19.2 396.90 14.10 18.3
498 0.23912 0.0 9.69 0.0 0.585 6.019 65.3 2.4091 6.0 391.0 19.2 396.90 12.92 21.2
499 0.17783 0.0 9.69 0.0 0.585 5.569 73.5 2.3999 6.0 391.0 19.2 395.77 15.10 17.5
500 0.22438 0.0 9.69 0.0 0.585 6.027 79.7 2.4982 6.0 391.0 19.2 396.90 14.33 16.8
501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1.0 273.0 21.0 391.99 9.67 22.4
502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1.0 273.0 21.0 396.90 9.08 20.6
503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1.0 273.0 21.0 396.90 5.64 23.9
504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1.0 273.0 21.0 393.45 6.48 22.0
505 0.04741 0.0 11.93 0.0 0.573 6.030 80.8 2.5050 1.0 273.0 21.0 396.90 7.88 11.9

506 rows x 14 columns
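The same feature/target split applies here. A minimal sketch, assuming MEDV (the median home value, the last column) is the regression target:

>>> # MEDV is column index 13; the first 13 columns are the predictors.
>>> y_vals = np.array([row[13] for row in housing_data])
>>> x_vals = np.array([row[:13] for row in housing_data])
>>> print(x_vals.shape, y_vals.shape)
(506, 13) (506,)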

Alternatively, the same data used to ship with Scikit-learn (note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the following requires an older version):

>>> from sklearn.datasets import load_boston
>>> import pandas as pd
>>> boston = load_boston()
>>> print(len(boston.data))
506
>>> print(len(boston.target))
506
>>> print(boston.data[0])
[6.320e-03 1.800e+01 2.310e+00 0.000e+00 5.380e-01 6.575e+00 6.520e+01
 4.090e+00 1.000e+00 2.960e+02 1.530e+01 3.969e+02 4.980e+00]
>>> print(set(boston.target))
 {5.0, 6.3, 7.2, 8.8, 7.4, 10.2, 11.8, 12.7, 13.6, 14.5, 15.2, 15.0, 16.5, 17.5, 19.6, 18.9, 18.2, 20.4, 21.6, 22.9, 21.7, 26.6, 26.5, 27.5, 24.0, 23.1, 27.1, 28.7, 24.7, 30.8, 33.4, 34.7, 34.9, 36.2, 35.4, 31.6, 33.0, 38.7, 43.8, 41.3, 37.2, 39.8, 42.3, 48.5, 44.8, 50.0, 46.7, 48.3, 44.0, 48.8, 46.0, 10.5, 11.5, 11.0, 12.5, 12.0, 13.5, 13.0, 14.0, 16.6, 16.0, 16.1, 16.4, 17.4, 17.1, 17.0, 17.6, 17.9, 18.4, 18.6, 18.5, 18.0, 18.1, 19.9, 19.4, 19.5, 19.1, 19.0, 20.1, 20.0, 20.5, 20.9, 20.6, 21.0, 21.4, 21.5, 21.9, 21.1, 22.0, 22.5, 22.6, 22.4, 22.1, 23.4, 23.5, 23.9, 23.6, 23.0, 24.1, 24.6, 24.4, 24.5, 25.0, 25.1, 26.4, 27.0, 27.9, 28.0, 28.4, 28.1, 28.5, 28.6, 29.4, 29.9, 29.6, 29.1, 29.0, 30.5, 30.1, 31.1, 31.5, 31.0, 32.5, 32.0, 32.9, 32.4, 32.2, 33.2, 33.3, 33.8, 33.1, 32.7, 34.6, 8.4, 35.2, 35.1, 10.4, 10.9, 7.0, 36.4, 36.0, 36.5, 36.1, 11.9, 37.9, 37.0, 37.6, 37.3, 13.9, 13.4, 14.4, 14.9, 15.4, 8.5, 41.7, 42.8, 43.1, 43.5, 45.4, 9.5, 8.3, 8.7, 9.7, 10.8, 11.3, 11.7, 12.3, 12.8, 13.2, 13.3, 13.8, 14.8, 14.3, 14.2, 15.7, 15.3, 16.2, 16.8, 16.3, 16.7, 17.3, 17.8, 17.2, 17.7, 18.3, 18.7, 18.8, 19.2, 19.3, 19.7, 19.8, 20.2, 20.8, 20.3, 20.7, 21.2, 21.8, 22.2, 22.8, 22.7, 22.3, 23.3, 23.8, 23.2, 23.7, 24.8, 24.2, 24.3, 25.3, 25.2, 26.7, 26.2, 7.5, 28.2, 29.8, 30.3, 30.7, 5.6, 31.7, 31.2, 8.1, 9.6, 12.1, 12.6, 13.1, 14.6, 14.1, 15.6, 15.1}
>>> pd.DataFrame(data=boston.data, columns=boston.feature_names)
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33
5 0.02985 0.0 2.18 0.0 0.458 6.430 58.7 6.0622 3.0 222.0 18.7 394.12 5.21
6 0.08829 12.5 7.87 0.0 0.524 6.012 66.6 5.5605 5.0 311.0 15.2 395.60 12.43
7 0.14455 12.5 7.87 0.0 0.524 6.172 96.1 5.9505 5.0 311.0 15.2 396.90 19.15
8 0.21124 12.5 7.87 0.0 0.524 5.631 100.0 6.0821 5.0 311.0 15.2 386.63 29.93
9 0.17004 12.5 7.87 0.0 0.524 6.004 85.9 6.5921 5.0 311.0 15.2 386.71 17.10
10 0.22489 12.5 7.87 0.0 0.524 6.377 94.3 6.3467 5.0 311.0 15.2 392.52 20.45
11 0.11747 12.5 7.87 0.0 0.524 6.009 82.9 6.2267 5.0 311.0 15.2 396.90 13.27
12 0.09378 12.5 7.87 0.0 0.524 5.889 39.0 5.4509 5.0 311.0 15.2 390.50 15.71
13 0.62976 0.0 8.14 0.0 0.538 5.949 61.8 4.7075 4.0 307.0 21.0 396.90 8.26
14 0.63796 0.0 8.14 0.0 0.538 6.096 84.5 4.4619 4.0 307.0 21.0 380.02 10.26
15 0.62739 0.0 8.14 0.0 0.538 5.834 56.5 4.4986 4.0 307.0 21.0 395.62 8.47
16 1.05393 0.0 8.14 0.0 0.538 5.935 29.3 4.4986 4.0 307.0 21.0 386.85 6.58
17 0.78420 0.0 8.14 0.0 0.538 5.990 81.7 4.2579 4.0 307.0 21.0 386.75 14.67
18 0.80271 0.0 8.14 0.0 0.538 5.456 36.6 3.7965 4.0 307.0 21.0 288.99 11.69
19 0.72580 0.0 8.14 0.0 0.538 5.727 69.5 3.7965 4.0 307.0 21.0 390.95 11.28
20 1.25179 0.0 8.14 0.0 0.538 5.570 98.1 3.7979 4.0 307.0 21.0 376.57 21.02
21 0.85204 0.0 8.14 0.0 0.538 5.965 89.2 4.0123 4.0 307.0 21.0 392.53 13.83
22 1.23247 0.0 8.14 0.0 0.538 6.142 91.7 3.9769 4.0 307.0 21.0 396.90 18.72
23 0.98843 0.0 8.14 0.0 0.538 5.813 100.0 4.0952 4.0 307.0 21.0 394.54 19.88
24 0.75026 0.0 8.14 0.0 0.538 5.924 94.1 4.3996 4.0 307.0 21.0 394.33 16.30
25 0.84054 0.0 8.14 0.0 0.538 5.599 85.7 4.4546 4.0 307.0 21.0 303.42 16.51
26 0.67191 0.0 8.14 0.0 0.538 5.813 90.3 4.6820 4.0 307.0 21.0 376.88 14.81
27 0.95577 0.0 8.14 0.0 0.538 6.047 88.8 4.4534 4.0 307.0 21.0 306.38 17.28
28 0.77299 0.0 8.14 0.0 0.538 6.495 94.4 4.4547 4.0 307.0 21.0 387.94 12.80
29 1.00245 0.0 8.14 0.0 0.538 6.674 87.3 4.2390 4.0 307.0 21.0 380.23 11.98
... ... ... ... ... ... ... ... ... ... ... ... ... ...
476 4.87141 0.0 18.10 0.0 0.614 6.484 93.6 2.3053 24.0 666.0 20.2 396.21 18.68
477 15.02340 0.0 18.10 0.0 0.614 5.304 97.3 2.1007 24.0 666.0 20.2 349.48 24.91
478 10.23300 0.0 18.10 0.0 0.614 6.185 96.7 2.1705 24.0 666.0 20.2 379.70 18.03
479 14.33370 0.0 18.10 0.0 0.614 6.229 88.0 1.9512 24.0 666.0 20.2 383.32 13.11
480 5.82401 0.0 18.10 0.0 0.532 6.242 64.7 3.4242 24.0 666.0 20.2 396.90 10.74
481 5.70818 0.0 18.10 0.0 0.532 6.750 74.9 3.3317 24.0 666.0 20.2 393.07 7.74
482 5.73116 0.0 18.10 0.0 0.532 7.061 77.0 3.4106 24.0 666.0 20.2 395.28 7.01
483 2.81838 0.0 18.10 0.0 0.532 5.762 40.3 4.0983 24.0 666.0 20.2 392.92 10.42
484 2.37857 0.0 18.10 0.0 0.583 5.871 41.9 3.7240 24.0 666.0 20.2 370.73 13.34
485 3.67367 0.0 18.10 0.0 0.583 6.312 51.9 3.9917 24.0 666.0 20.2 388.62 10.58
486 5.69175 0.0 18.10 0.0 0.583 6.114 79.8 3.5459 24.0 666.0 20.2 392.68 14.98
487 4.83567 0.0 18.10 0.0 0.583 5.905 53.2 3.1523 24.0 666.0 20.2 388.22 11.45
488 0.15086 0.0 27.74 0.0 0.609 5.454 92.7 1.8209 4.0 711.0 20.1 395.09 18.06
489 0.18337 0.0 27.74 0.0 0.609 5.414 98.3 1.7554 4.0 711.0 20.1 344.05 23.97
490 0.20746 0.0 27.74 0.0 0.609 5.093 98.0 1.8226 4.0 711.0 20.1 318.43 29.68
491 0.10574 0.0 27.74 0.0 0.609 5.983 98.8 1.8681 4.0 711.0 20.1 390.11 18.07
492 0.11132 0.0 27.74 0.0 0.609 5.983 83.5 2.1099 4.0 711.0 20.1 396.90 13.35
493 0.17331 0.0 9.69 0.0 0.585 5.707 54.0 2.3817 6.0 391.0 19.2 396.90 12.01
494 0.27957 0.0 9.69 0.0 0.585 5.926 42.6 2.3817 6.0 391.0 19.2 396.90 13.59
495 0.17899 0.0 9.69 0.0 0.585 5.670 28.8 2.7986 6.0 391.0 19.2 393.29 17.60
496 0.28960 0.0 9.69 0.0 0.585 5.390 72.9 2.7986 6.0 391.0 19.2 396.90 21.14
497 0.26838 0.0 9.69 0.0 0.585 5.794 70.6 2.8927 6.0 391.0 19.2 396.90 14.10
498 0.23912 0.0 9.69 0.0 0.585 6.019 65.3 2.4091 6.0 391.0 19.2 396.90 12.92
499 0.17783 0.0 9.69 0.0 0.585 5.569 73.5 2.3999 6.0 391.0 19.2 395.77 15.10
500 0.22438 0.0 9.69 0.0 0.585 6.027 79.7 2.4982 6.0 391.0 19.2 396.90 14.33
501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1.0 273.0 21.0 391.99 9.67
502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1.0 273.0 21.0 396.90 9.08
503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1.0 273.0 21.0 396.90 5.64
504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1.0 273.0 21.0 393.45 6.48
505 0.04741 0.0 11.93 0.0 0.573 6.030 80.8 2.5050 1.0 273.0 21.0 396.90 7.88

506 rows x 13 columns

>>> # Named colors used for the plots below
>>> cnames = {'aliceblue': '#F0F8FF', 'antiquewhite': '#FAEBD7', 'aqua': '#00FFFF', 'aquamarine': '#7FFFD4', 'azure': '#F0FFFF', 'beige': '#F5F5DC', 'bisque': '#FFE4C4', 'black': '#000000', 'blanchedalmond': '#FFEBCD', 'blue': '#0000FF', 'blueviolet': '#8A2BE2', 'brown': '#A52A2A', 'burlywood': '#DEB887', 'cadetblue': '#5F9EA0', 'chartreuse': '#7FFF00', 'chocolate': '#D2691E', 'coral': '#FF7F50', 'cornflowerblue': '#6495ED', 'cornsilk': '#FFF8DC', 'crimson': '#DC143C', 'cyan': '#00FFFF', 'darkblue': '#00008B', 'darkcyan': '#008B8B', 'darkgoldenrod': '#B8860B', 'darkgray': '#A9A9A9', 'darkgreen': '#006400', 'darkkhaki': '#BDB76B', 'darkmagenta': '#8B008B', 'darkolivegreen': '#556B2F', 'darkorange': '#FF8C00', 'darkorchid': '#9932CC', 'darkred': '#8B0000', 'darksalmon': '#E9967A', 'darkseagreen': '#8FBC8F', 'darkslateblue': '#483D8B', 'darkslategray': '#2F4F4F', 'darkturquoise': '#00CED1', 'darkviolet': '#9400D3', 'deeppink': '#FF1493', 'deepskyblue': '#00BFFF', 'dimgray': '#696969', 'dodgerblue': '#1E90FF', 'firebrick': '#B22222', 'floralwhite': '#FFFAF0', 'forestgreen': '#228B22', 'fuchsia': '#FF00FF', 'gainsboro': '#DCDCDC', 'ghostwhite': '#F8F8FF', 'gold': '#FFD700', 'goldenrod': '#DAA520', 'gray': '#808080', 'green': '#008000', 'greenyellow': '#ADFF2F', 'honeydew': '#F0FFF0', 'hotpink': '#FF69B4', 'indianred': '#CD5C5C', 'indigo': '#4B0082', 'ivory': '#FFFFF0', 'khaki': '#F0E68C', 'lavender': '#E6E6FA', 'lavenderblush': '#FFF0F5', 'lawngreen': '#7CFC00', 'lemonchiffon': '#FFFACD', 'lightblue': '#ADD8E6', 'lightcoral': '#F08080', 'lightcyan': '#E0FFFF', 'lightgoldenrodyellow': '#FAFAD2', 'lightgreen': '#90EE90', 'lightgray': '#D3D3D3', 'lightpink': '#FFB6C1', 'lightsalmon': '#FFA07A', 'lightseagreen': '#20B2AA', 'lightskyblue': '#87CEFA', 'lightslategray': '#778899', 'lightsteelblue': '#B0C4DE', 'lightyellow': '#FFFFE0', 'lime': '#00FF00', 'limegreen': '#32CD32', 'linen': '#FAF0E6', 'magenta': '#FF00FF', 'maroon': '#800000', 'mediumaquamarine': '#66CDAA', 'mediumblue': '#0000CD', 'mediumorchid': '#BA55D3', 'mediumpurple': '#9370DB', 'mediumseagreen': '#3CB371', 'mediumslateblue': '#7B68EE', 'mediumspringgreen': '#00FA9A', 'mediumturquoise': '#48D1CC', 'mediumvioletred': '#C71585', 'midnightblue': '#191970', 'mintcream': '#F5FFFA', 'mistyrose': '#FFE4E1', 'moccasin': '#FFE4B5', 'navajowhite': '#FFDEAD', 'navy': '#000080', 'oldlace': '#FDF5E6', 'olive': '#808000', 'olivedrab': '#6B8E23', 'orange': '#FFA500', 'orangered': '#FF4500', 'orchid': '#DA70D6', 'palegoldenrod': '#EEE8AA', 'palegreen': '#98FB98', 'paleturquoise': '#AFEEEE', 'palevioletred': '#DB7093', 'papayawhip': '#FFEFD5', 'peachpuff': '#FFDAB9', 'peru': '#CD853F', 'pink': '#FFC0CB', 'plum': '#DDA0DD', 'powderblue': '#B0E0E6', 'purple': '#800080', 'red': '#FF0000', 'rosybrown': '#BC8F8F', 'royalblue': '#4169E1', 'saddlebrown': '#8B4513', 'salmon': '#FA8072', 'sandybrown': '#FAA460', 'seagreen': '#2E8B57', 'seashell': '#FFF5EE', 'sienna': '#A0522D', 'silver': '#C0C0C0', 'skyblue': '#87CEEB', 'slateblue': '#6A5ACD', 'slategray': '#708090', 'snow': '#FFFAFA', 'springgreen': '#00FF7F', 'steelblue': '#4682B4', 'tan': '#D2B48C', 'teal': '#008080', 'thistle': '#D8BFD8', 'tomato': '#FF6347', 'turquoise': '#40E0D0', 'violet': '#EE82EE', 'wheat': '#F5DEB3', 'white': '#FFFFFF', 'whitesmoke': '#F5F5F5', 'yellow': '#FFFF00', 'yellowgreen': '#9ACD32'}
>>> colorname = list(cnames.keys())
>>> print(colorname)
['aliceblue', 'antiquewhite', 'aqua', 'aquamarine', 'azure', 'beige', 'bisque', 'black', 'blanchedalmond', 'blue', 'blueviolet', 'brown', 'burlywood', 'cadetblue', 'chartreuse', 'chocolate', 'coral', 'cornflowerblue', 'cornsilk', 'crimson', 'cyan', 'darkblue', 'darkcyan', 'darkgoldenrod', 'darkgray', 'darkgreen', 'darkkhaki', 'darkmagenta', 'darkolivegreen', 'darkorange', 'darkorchid', 'darkred', 'darksalmon', 'darkseagreen', 'darkslateblue', 'darkslategray', 'darkturquoise', 'darkviolet', 'deeppink', 'deepskyblue', 'dimgray', 'dodgerblue', 'firebrick', 'floralwhite', 'forestgreen', 'fuchsia', 'gainsboro', 'ghostwhite', 'gold', 'goldenrod', 'gray', 'green', 'greenyellow', 'honeydew', 'hotpink', 'indianred', 'indigo', 'ivory', 'khaki', 'lavender', 'lavenderblush', 'lawngreen', 'lemonchiffon', 'lightblue', 'lightcoral', 'lightcyan', 'lightgoldenrodyellow', 'lightgreen', 'lightgray', 'lightpink', 'lightsalmon', 'lightseagreen', 'lightskyblue', 'lightslategray', 'lightsteelblue', 'lightyellow', 'lime', 'limegreen', 'linen', 'magenta', 'maroon', 'mediumaquamarine', 'mediumblue', 'mediumorchid', 'mediumpurple', 'mediumseagreen', 'mediumslateblue', 'mediumspringgreen', 'mediumturquoise', 'mediumvioletred', 'midnightblue', 'mintcream', 'mistyrose', 'moccasin', 'navajowhite', 'navy', 'oldlace', 'olive', 'olivedrab', 'orange', 'orangered', 'orchid', 'palegoldenrod', 'palegreen', 'paleturquoise', 'palevioletred', 'papayawhip', 'peachpuff', 'peru', 'pink', 'plum', 'powderblue', 'purple', 'red', 'rosybrown', 'royalblue', 'saddlebrown', 'salmon', 'sandybrown', 'seagreen', 'seashell', 'sienna', 'silver', 'skyblue', 'slateblue', 'slategray', 'snow', 'springgreen', 'steelblue', 'tan', 'teal', 'thistle', 'tomato', 'turquoise', 'violet', 'wheat', 'white', 'whitesmoke', 'yellow', 'yellowgreen']
>>> X = boston.data
>>> y = boston.target
>>> features = boston.feature_names
>>> def Boston():
...     # Plot each of the 13 features against the house price, one subplot
...     # per feature, each drawn in a different named color.
...     for i, colorn in enumerate(colorname[13:26]):
...         if i < 12:
...             plt.figure(43)
...             plt.subplot(5, 3, i+1)
...             plt.plot(X[:, i], y, color=colorn)
...         if i == 12:
...             plt.subplot(5, 1, 5)
...             plt.plot(X[:, i], y, color=colorn)
...     plt.savefig('Boston_Housing_Data.png', dpi=700)
...     plt.show()
>>> Boston()

Extracting just the plot of LSTAT versus the Boston house price:

>>> X = boston.data
>>> y = boston.target
>>> features = boston.feature_names[12]
>>> targets = 'Boston Housing Price versus %s' % (features)  # legend label for the plot
>>> plt.figure(figsize=(8, 5))
>>> plt.plot(X[:, 12], y, 'bx', label=targets)
>>> plt.title('Boston Housing Data')
>>> plt.legend()
>>> plt.savefig('Boston Housing Data.png', dpi=500)
>>> plt.show()
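To put a number on the relationship the plot suggests, we can compute the Pearson correlation between LSTAT and the price (our own addition, not part of the original recipe):

>>> # Roughly -0.74: higher LSTAT (% lower-status population) means lower price.
>>> corr = np.corrcoef(X[:, 12], y)[0, 1]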

MNIST Handwriting Dataset (Yann LeCun)

MNIST (Mixed National Institute of Standards and Technology) is a subset of the larger NIST handwriting database from the US National Institute of Standards and Technology, and it is the Hello World of image recognition. The renowned scientist Yann LeCun hosts the dataset at mnist , but because it is used so often, many libraries, TensorFlow included, bundle it as well. The data are split into a training set of 60,000 images and a test set of 10,000 images, each with a label. The open-source dataset ( MNIST ) consists of four files.
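Because TensorFlow bundles MNIST, the quickest way to load it is through tf.keras.datasets; a minimal sketch (the data are downloaded and cached on first use):

>>> # Load MNIST directly from TensorFlow (downloaded on the first call).
>>> (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
>>> print(x_train.shape, x_test.shape)
(60000, 28, 28) (10000, 28, 28)

To work with the raw files instead, the following downloads and unpacks the four archives (the paths assume a ModelArts notebook environment, as in the original example):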

>>> import os, sys
>>> directory = "mxnet_mnist_digit_recognition_train"
>>> if os.getenv("JUPYTER_ENABLE_OBS") == "false":
...     project_path = os.getcwd() + "/"
... else:
...     # OBS-type notebook instance; files live under the default example directory
...     project_path = os.environ['HOME'] + '/work/' + directory + "/"
>>> sys.path.append(project_path)

>>> dataset_url = "https://modelarts-cnnorth1-market-dataset.obs.cn-north-1.myhuaweicloud.com/dataset-market/Mnist-Data-Set/archiver/Mnist-Data-Set.zip"
>>> dataset_file_names = ["train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz", "t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz"]
>>> dataset_local_path = project_path + 'dataset/'
>>> dataset_local_name = dataset_local_path + 'Mnist-Data-Set.zip'
In a Jupyter notebook, the ! shell escape expands the {braced} Python variables into the shell command:

!wget {dataset_url} -P {dataset_local_path}
!unzip -d {dataset_local_path} -o {dataset_local_name}
--2020-08-12 10:15:31--  https://modelarts-cnnorth1-market-dataset.obs.cn-north-1.myhuaweicloud.com/dataset-market/Mnist-Data-Set/archiver/Mnist-Data-Set.zip
Resolving proxy-notebook.modelarts-dev-proxy.com (proxy-notebook.modelarts-dev-proxy.com)... xxx.xxx.x.xxx
Connecting to proxy-notebook.modelarts-dev-proxy.com (proxy-notebook.modelarts-dev-proxy.com)|xxx.xxx.x.xxx|:8083... connected.
Proxy request sent, awaiting response... 200 OK
Length: 23192478 (22M) [application/octet-stream]
Saving to: ‘/home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/Mnist-Data-Set.zip’

Mnist-Data-Set.zip  100%[===================>]  22.12M   124MB/s    in 0.2s

2020-08-12 10:15:33 (124 MB/s) - ‘/home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/Mnist-Data-Set.zip’ saved [23192478/23192478]

Archive:  /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/Mnist-Data-Set.zip
  inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/t10k-images-idx3-ubyte
  inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/t10k-images-idx3-ubyte.gz
  inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/t10k-labels-idx1-ubyte
 extracting: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/t10k-labels-idx1-ubyte.gz
  inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/train-images-idx3-ubyte
  inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/train-images-idx3-ubyte.gz
  inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/train-labels-idx1-ubyte
 extracting: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/train-labels-idx1-ubyte.gz
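Once unpacked, the files are in the raw IDX format. Below is a minimal sketch of a parser for the image files, assuming the standard IDX layout (a 16-byte header of four big-endian unsigned 32-bit integers, followed by uint8 pixels); the helper name load_idx_images is our own:

>>> import struct
>>> def load_idx_images(path):
...     # IDX header: magic number, image count, rows, cols (big-endian uint32).
...     with open(path, 'rb') as f:
...         magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
...         return np.frombuffer(f.read(), dtype=np.uint8).reshape(num, rows, cols)
>>> train_images = load_idx_images(dataset_local_path + 'train-images-idx3-ubyte')
>>> print(train_images.shape)
(60000, 28, 28)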