Introduction

Google open-sourced TensorFlow in November 2015. Since then, it has become the most-starred machine learning repository on GitHub.

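Why TensorFlow? Its popularity has many causes, but the main ones are its computational graph concept, automatic differentiation, and the design of its Python API. Together these make it convenient for programmers to solve real problems with TensorFlow.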

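Google's TensorFlow engine has a distinctive way of solving problems, and that approach makes it very effective for machine learning. Below, we walk through the basic steps of how TensorFlow works.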

How TensorFlow Works

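At first, computation in TensorFlow may seem needlessly complicated. There is a reason for it: because of how TensorFlow handles computation, building more complex algorithms becomes comparatively easy. This section walks you through how a TensorFlow algorithm typically works.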

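TensorFlow is now supported on all major operating systems (Windows, Linux, and Mac). Throughout this book we only concern ourselves with TensorFlow's Python library. The book uses Python 3.x and TensorFlow 0.12+ (here we use Python 3.7 and TensorFlow 1.8). Although TensorFlow can run on the CPU, it generally runs faster on a GPU (Graphics Processing Unit). TensorFlow supports Nvidia graphics cards with Compute Capability 3.0 or higher. To run on a GPU, you need to download and install the Nvidia CUDA Toolkit. Some chapters also depend on SciPy, NumPy, and scikit-learn. You can satisfy these requirements by downloading the requirements.txt below and running the command that follows.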

Download requirements.txt

$ pip install -r requirements.txt

General Outline of TensorFlow Algorithms

Here we briefly introduce the general workflow of a TensorFlow algorithm. Most machine learning algorithms follow this outline.

Import or Generate Data

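All of our machine learning algorithms depend on data. In this book we either generate data or use an external data source. Sometimes it is better to rely on generated data, because we want to know whether the model the algorithm fits produces the expected result, and generated data give us a known reference. At other times we need to access public datasets; how to do so is covered in section 8 of this chapter.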
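As a minimal sketch of the "generate data" option, the snippet below draws points from a known line and adds noise, so that a fitted model can later be checked against the true slope and intercept. The function name `make_linear_data` and all of its default values are assumptions chosen for illustration; they do not come from the book.

```python
import random


def make_linear_data(n, slope=2.0, intercept=1.0, noise=0.1, seed=0):
    """Generate n (x, y) pairs from y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)  # seeded so the data are reproducible
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


# Hypothetical usage: 100 noisy samples around the line y = 2x + 1.
xs, ys = make_linear_data(100)
```

Because the generating slope and intercept are known, any model trained on `xs` and `ys` has a reference answer to be compared with.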

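Transform and Normalize Data

Data are often not in the shape or dimensions that TensorFlow expects, so we must transform them before use. Most algorithms also expect normalized data, and we normalize here as well. TensorFlow has built-in functions that can help normalize data. For example:

# Usage in older TensorFlow versions
>>> data = tf.nn.batch_norm_with_global_normalization(...)
# Usage in TensorFlow 2.2
>>> data = tf.nn.batch_normalization(...)

Note

Usage of tensorflow.nn.batch_normalization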

Batch normalization.

Normalizes a tensor by mean and variance, and applies (optionally) a scale \(\gamma\) to it, as well as an offset \(\beta\):

\(\frac{\gamma(x-\mu)}{\sigma}+\beta\)

mean, variance, offset and scale are all expected to be of one of two shapes:

In all generality, they can have the same number of dimensions as the input x, with identical sizes as x for the dimensions that are not normalized over (the ‘depth’ dimension(s)), and dimension 1 for the others which are being normalized over. mean and variance in this case would typically be the outputs of tf.nn.moments(…, keepdims=True) during training, or running averages thereof during inference.

In the common case where the ‘depth’ dimension is the last dimension in the input tensor x, they may be one dimensional tensors of the same size as the ‘depth’ dimension. This is the case for example for the common [batch, depth] layout of fully-connected layers, and [batch, height, width, depth] for convolutions. mean and variance in this case would typically be the outputs of tf.nn.moments(…, keepdims=False) during training, or running averages thereof during inference.

See equation 11 in Algorithm 2 of source: [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift; S. Ioffe, C. Szegedy] (http://arxiv.org/abs/1502.03167).

param x:	Input Tensor of arbitrary dimensionality.
param mean:	A mean Tensor.
param variance:	A variance Tensor.
param offset:	An offset Tensor, often denoted \(\beta\) in equations, or None. If present, will be added to the normalized tensor.
param scale:	A scale Tensor, often denoted \(\gamma\) in equations, or None. If present, the scale is applied to the normalized tensor.
param variance_epsilon:	A small float number to avoid dividing by 0.
param name:	A name for this operation (optional).
returns:	the normalized, scaled, offset tensor.

References

Batch Normalization - Accelerating Deep Network Training by Reducing Internal Covariate Shift:

[Ioffe et al., 2015](http://arxiv.org/abs/1502.03167) ([pdf](http://proceedings.mlr.press/v37/ioffe15.pdf))
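To make the batch normalization formula concrete, here is a framework-free sketch for a single feature column. It is an illustration of the arithmetic only, not TensorFlow's implementation: the function name `batch_normalize` and its default values are assumptions, and the mean and variance are computed from the column itself rather than passed in.

```python
import math


def batch_normalize(column, gamma=1.0, beta=0.0, variance_epsilon=1e-3):
    """Normalize one feature column by its batch mean and variance."""
    n = len(column)
    mean = sum(column) / n
    variance = sum((x - mean) ** 2 for x in column) / n  # population variance
    # variance_epsilon keeps the denominator away from zero.
    denom = math.sqrt(variance + variance_epsilon)
    return [gamma * (x - mean) / denom + beta for x in column]


# Hypothetical input: four values of one feature.
normalized = batch_normalize([1.0, 2.0, 3.0, 4.0])
```

With the default `gamma` and `beta`, the output has zero mean and a variance slightly below 1, the small gap being the effect of `variance_epsilon`.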

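Set Algorithm Parameters

Our algorithms usually have a set of parameters that we hold constant throughout the procedure, such as the number of iterations, the learning rate, or other fixed settings. It is good practice to initialize them together so that the reader or user can find them easily. For example: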

>>> learning_rate = 0.01
>>> iterations = 1000

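Initialize Variables and Placeholders

TensorFlow needs us to tell it what it can and cannot modify. During optimization it changes the variables in order to minimize the loss function. To make this possible, we feed data in through placeholders. We must initialize both variables and placeholders with a size and a type so that TensorFlow knows what to expect. Note that tf.placeholder belongs to the TensorFlow 1.x API; in TensorFlow 2.x it is available as tf.compat.v1.placeholder. For example: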

>>> a_var = tf.constant(42)
>>> x_input = tf.placeholder(tf.float32, [None, input_size])
>>> y_input = tf.placeholder(tf.float32, [None, num_classes])

Note

Usage of tensorflow.constant

Creates a constant tensor from a tensor-like object.

Note: All eager tf.Tensor values are immutable (in contrast to tf.Variable). There is nothing especially _constant_ about the value returned from tf.constant. This function is not fundamentally different from tf.convert_to_tensor. The name tf.constant comes from the value being embedded in a Const node in the tf.Graph. tf.constant is useful for asserting that the value can be embedded that way.

If the argument dtype is not specified, then the type is inferred from the type of value.

>>> # Constant 1-D Tensor from a python list.
>>> tf.constant([1, 2, 3, 4, 5, 6])
<tf.Tensor: shape=(6,), dtype=int32,
    numpy=array([1, 2, 3, 4, 5, 6], dtype=int32)>
>>> # Or a numpy array
>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> tf.constant(a)
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
  array([[1, 2, 3],
         [4, 5, 6]])>

If dtype is specified, the resulting tensor values are cast to the requested dtype.

>>> tf.constant([1, 2, 3, 4, 5, 6], dtype=tf.float64)
<tf.Tensor: shape=(6,), dtype=float64,
    numpy=array([1., 2., 3., 4., 5., 6.])>

If shape is set, the value is reshaped to match. Scalars are expanded to fill the shape:

>>> tf.constant(0, shape=(2, 3))
  <tf.Tensor: shape=(2, 3), dtype=int32, numpy=
  array([[0, 0, 0],
         [0, 0, 0]], dtype=int32)>
>>> tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3])
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
  array([[1, 2, 3],
         [4, 5, 6]], dtype=int32)>

tf.constant has no effect if an eager Tensor is passed as the value, it even transmits gradients:

>>> v = tf.Variable([0.0])
>>> with tf.GradientTape() as g:
...     loss = tf.constant(v + v)
>>> g.gradient(loss, v).numpy()
array([2.], dtype=float32)

But, since tf.constant embeds the value in the tf.Graph this fails for symbolic tensors:

>>> with tf.compat.v1.Graph().as_default():
...   i = tf.compat.v1.placeholder(shape=[None, None], dtype=tf.float32)
...   t = tf.constant(i)
Traceback (most recent call last):
...
TypeError: ...

tf.constant will create tensors on the current device. Inputs which are already tensors maintain their placements unchanged.

Related Ops:

tf.convert_to_tensor is similar but:
- It has no shape argument.
- Symbolic tensors are allowed to pass through.

>>> with tf.compat.v1.Graph().as_default():
...   i = tf.compat.v1.placeholder(shape=[None, None], dtype=tf.float32)
...   t = tf.convert_to_tensor(i)

tf.fill: differs in a few ways:
- tf.constant supports arbitrary constants, not just uniform scalar Tensors like tf.fill.
- tf.fill creates an Op in the graph that is expanded at runtime, so it can efficiently represent large tensors.
- Since tf.fill does not embed the value, it can produce dynamically sized outputs.

param value:	A constant value (or list) of output type dtype.
param dtype:	The type of the elements of the resulting tensor.
param shape:	Optional dimensions of resulting tensor.
param name:	Optional name for the tensor.
returns:	A Constant Tensor.
raises:	`TypeError` – if shape is incorrectly specified or unsupported. `ValueError` – if called on a symbolic tensor.

Note

Usage of tensorflow.float32

Represents the type of the elements in a Tensor.

DType’s are used to specify the output data type for operations which require it, or to inspect the data type of existing Tensor’s.

Examples:

>>> tf.constant(1, dtype=tf.int64)
<tf.Tensor: shape=(), dtype=int64, numpy=1>
>>> tf.constant(1.0).dtype
tf.float32

See tf.dtypes for a complete list of DType’s defined.

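Define the Model Structure

Once we have the data and have initialized our variables and placeholders, we can define the model. We do this by building a computational graph: we tell TensorFlow which operations must be performed on the variables and placeholders to arrive at the model's predictions. Computational graphs are discussed in detail in Chapter 2. For now, here is an example of defining a model structure:

# Usage in older TensorFlow versions
>>> y_pred = tf.add(tf.mul(x_input, weight_matrix), b_matrix)
# Usage in TensorFlow 2.2
>>> y_pred = tf.add(tf.multiply(x_input, weight_matrix), b_matrix)

Note

Usage of tensorflow.add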

Returns x + y element-wise.

Example usages below.

Add a scalar and a list:

>>> x = [1, 2, 3, 4, 5]
>>> y = 1
>>> tf.add(x, y)
<tf.Tensor: shape=(5,), dtype=int32, numpy=array([2, 3, 4, 5, 6],
dtype=int32)>

Note that binary + operator can be used instead:

>>> x = tf.convert_to_tensor([1, 2, 3, 4, 5])
>>> y = tf.convert_to_tensor(1)
>>> x + y
<tf.Tensor: shape=(5,), dtype=int32, numpy=array([2, 3, 4, 5, 6],
dtype=int32)>

Add a tensor and a list of same shape:

>>> x = [1, 2, 3, 4, 5]
>>> y = tf.constant([1, 2, 3, 4, 5])
>>> tf.add(x, y)
<tf.Tensor: shape=(5,), dtype=int32,
numpy=array([ 2,  4,  6,  8, 10], dtype=int32)>

Warning: If one of the inputs (x or y) is a tensor and the other is a non-tensor, the non-tensor input will adopt (or get casted to) the data type of the tensor input. This can potentially cause unwanted overflow or underflow conversion.

For example,

>>> x = tf.constant([1, 2], dtype=tf.int8)
>>> y = [2**7 + 1, 2**7 + 2]
>>> tf.add(x, y)
<tf.Tensor: shape=(2,), dtype=int8, numpy=array([-126, -124], dtype=int8)>

When adding two input values of different shapes, Add follows NumPy broadcasting rules. The two input array shapes are compared element-wise. Starting with the trailing dimensions, the two dimensions either have to be equal or one of them needs to be 1.

For example,

>>> x = np.ones(6).reshape(1, 2, 1, 3)
>>> y = np.ones(6).reshape(2, 1, 3, 1)
>>> tf.add(x, y).shape.as_list()
[2, 2, 3, 3]

Another example with two arrays of different dimension.

>>> x = np.ones([1, 2, 1, 4])
>>> y = np.ones([3, 4])
>>> tf.add(x, y).shape.as_list()
[1, 2, 3, 4]

The reduction version of this elementwise operation is tf.math.reduce_sum

param x:	A tf.Tensor. Must be one of the following types: bfloat16, half, float32, float64, uint8, int8, int16, int32, int64, complex64, complex128, string.
param y:	A tf.Tensor. Must have the same type as x.
param name:	A name for the operation (optional)
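The broadcasting rule described in the note above can be sketched without TensorFlow or NumPy. The helper below, whose name `broadcast_shape` is an assumption for illustration, compares two shapes starting from the trailing dimension and reproduces the result shapes of the two examples in the note.

```python
from itertools import zip_longest


def broadcast_shape(shape_x, shape_y):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension counts as 1.
    for dx, dy in zip_longest(reversed(shape_x), reversed(shape_y), fillvalue=1):
        if dx == dy or dy == 1:
            result.append(dx)
        elif dx == 1:
            result.append(dy)
        else:
            raise ValueError(f"incompatible dimensions {dx} and {dy}")
    return result[::-1]


print(broadcast_shape([1, 2, 1, 3], [2, 1, 3, 1]))  # [2, 2, 3, 3]
print(broadcast_shape([1, 2, 1, 4], [3, 4]))        # [1, 2, 3, 4]
```

Shapes such as `[2, 3]` and `[4, 3]` raise `ValueError`, because the leading dimensions 2 and 4 are neither equal nor 1.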

Note

Usage of tensorflow.multiply

Returns an element-wise x * y.

For example:

>>> x = tf.constant(([1, 2, 3, 4]))
>>> tf.math.multiply(x, x)
<tf.Tensor: shape=(4,), dtype=..., numpy=array([ 1,  4,  9, 16], dtype=int32)>

Since tf.math.multiply will convert its arguments to Tensors, you can also pass in non-Tensor arguments:

>>> tf.math.multiply(7,6)
<tf.Tensor: shape=(), dtype=int32, numpy=42>

If x.shape is not the same as y.shape, they will be broadcast to a compatible shape. (More about broadcasting [here](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).)

For example:

>>> x = tf.ones([1, 2]);
>>> y = tf.ones([2, 1]);
>>> x * y  # Taking advantage of operator overriding
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[1., 1.],
     [1., 1.]], dtype=float32)>

The reduction version of this elementwise operation is tf.math.reduce_prod

param x:	A Tensor. Must be one of the following types: bfloat16, half, float32, float64, uint8, int8, uint16, int16, int32, int64, complex64, complex128.
param y:	A Tensor. Must have the same type as x.
param name:	A name for the operation (optional).

Returns:

A Tensor. Has the same type as x.

raises:	* InvalidArgumentError – When x and y have incompatible shapes or types.
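The idea of defining a model as a computational graph, with placeholders fed at run time and variables held inside the graph, can be illustrated with a toy sketch. Everything here (the classes `Placeholder`, `Variable`, `Op` and the function `run`) is a hypothetical, scalar-only stand-in written for this example; it is not how TensorFlow is implemented.

```python
class Placeholder:
    """A graph input whose value is supplied only when the graph is run."""

    def __init__(self, name):
        self.name = name


class Variable:
    """A graph node holding a value that training is allowed to change."""

    def __init__(self, value):
        self.value = value


class Op:
    """A graph node that combines two input nodes with a binary function."""

    def __init__(self, fn, left, right):
        self.fn, self.left, self.right = fn, left, right


def add(a, b):
    return Op(lambda x, y: x + y, a, b)


def multiply(a, b):
    return Op(lambda x, y: x * y, a, b)


def run(node, feed_dict):
    """Evaluate a node by recursively evaluating its inputs."""
    if isinstance(node, Placeholder):
        return feed_dict[node]
    if isinstance(node, Variable):
        return node.value
    return node.fn(run(node.left, feed_dict), run(node.right, feed_dict))


x_input = Placeholder("x_input")
weight = Variable(3.0)
bias = Variable(0.5)
y_pred = add(multiply(x_input, weight), bias)  # builds the graph; nothing is computed yet

print(run(y_pred, {x_input: 2.0}))  # 6.5
```

Building `y_pred` only records which operations to perform; the arithmetic happens in `run`, once a value for the placeholder is supplied. Changing `weight.value` changes the result of the next run without rebuilding the graph.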

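Declare the Loss Function

After defining the model, we can compute its output with TensorFlow. At this point we need to declare a loss function. The loss function is very important because it tells us how far our predictions are from the actual values. The different types of loss functions are covered in detail in Chapter 2, Section 5.

>>> loss = tf.reduce_mean(tf.square(y_actual - y_pred))

Note

Usage of tensorflow.reduce_mean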

Computes the mean of elements across dimensions of a tensor.

Reduces input_tensor along the dimensions given in axis by computing the mean of elements across the dimensions in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each of the entries in axis, which must be unique. If keepdims is true, the reduced dimensions are retained with length 1.

If axis is None, all dimensions are reduced, and a tensor with a single element is returned.

For example:

>>> x = tf.constant([[1., 1.], [2., 2.]])
>>> tf.reduce_mean(x)
<tf.Tensor: shape=(), dtype=float32, numpy=1.5>
>>> tf.reduce_mean(x, 0)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1.5, 1.5], dtype=float32)>
>>> tf.reduce_mean(x, 1)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 2.], dtype=float32)>

param input_tensor:	The tensor to reduce. Should have numeric type.
param axis:	The dimensions to reduce. If None (the default), reduces all dimensions. Must be in the range [-rank(input_tensor), rank(input_tensor)).
param keepdims:	If true, retains reduced dimensions with length 1.
param name:	A name for the operation (optional).
returns:	The reduced tensor.

@compatibility(numpy) Equivalent to np.mean

Please note that np.mean has a dtype parameter that could be used to specify the output type. By default this is dtype=float64. On the other hand, tf.reduce_mean has an aggressive type inference from input_tensor, for example:

>>> x = tf.constant([1, 0, 1, 0])
>>> tf.reduce_mean(x)
<tf.Tensor: shape=(), dtype=int32, numpy=0>
>>> y = tf.constant([1., 0., 1., 0.])
>>> tf.reduce_mean(y)
<tf.Tensor: shape=(), dtype=float32, numpy=0.5>

@end_compatibility

Note

Usage of tensorflow.square

Computes square of x element-wise.

I.e., \(y = x * x = x^2\).

>>> tf.math.square([-2., 0., 3.])
<tf.Tensor: shape=(3,), dtype=float32, numpy=array([4., 0., 9.], dtype=float32)>

param x:	A Tensor. Must be one of the following types: bfloat16, half, float32, float64, int8, int16, int32, int64, uint8, uint16, uint32, uint64, complex64, complex128.
param name:	A name for the operation (optional).
returns:	A Tensor. Has the same type as x.

If x is a SparseTensor, returns SparseTensor(x.indices, tf.math.square(x.values, …), x.dense_shape)
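The loss in this section, the mean of the squared differences between actual and predicted values, can be written out in plain Python to show exactly what is being computed. The function name `mean_squared_error` and the sample numbers are assumptions for illustration.

```python
def mean_squared_error(y_actual, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_actual) != len(y_pred):
        raise ValueError("y_actual and y_pred must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(y_actual, y_pred)]
    return sum(squared) / len(squared)


# Hypothetical targets and predictions: squared errors are 0, 0.25 and 1.
loss = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

A loss of zero means every prediction matches its target; larger values mean the predictions are further away on average.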

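Initialize and Train the Model

Now that everything is in place, we create an instance of our computational graph, feed in the data through the placeholders, and let TensorFlow change the variables so that they better predict our training data. Here is one way to initialize the computational graph:

>>> with tf.Session(graph=graph) as session:
...     ...
...     session.run(...)
...     ...

Note that we can also initialize the graph like this:

>>> session = tf.Session(graph=graph)
>>> session.run(...)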

Evaluate the Model (Optional)

Once we have built and trained the model, we should evaluate it by looking at how well it predicts on new data.
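One common way to obtain "new data" for evaluation is to hold some data back before training. The sketch below assumes that approach; the helpers `train_test_split` and `evaluate`, the 20% test fraction, and the stand-in model are all hypothetical and chosen only to illustrate the idea.

```python
import random


def train_test_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle (x, y) pairs and split them into training and test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(predict, pairs):
    """Mean squared error of predict(x) against y over the given pairs."""
    return sum((y - predict(x)) ** 2 for x, y in pairs) / len(pairs)


# Hypothetical data from y = 2x + 1 and a stand-in for a trained model.
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(50)]
train_pairs, test_pairs = train_test_split(data)
model = lambda x: 2.0 * x + 1.0

test_loss = evaluate(model, test_pairs)
```

Only `train_pairs` would be used for fitting; `test_loss` then measures how the model behaves on examples it has not seen.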

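Predict New Outcomes (Optional)

It is also important to know how to make predictions on new, unseen data. Fortunately, once a model has been trained, we can use it to do exactly that.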

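Summary

In TensorFlow, we must set up the data, variables, placeholders, and model before the program can train and change the variables to improve its predictions. TensorFlow accomplishes this through the computational graph. We tell it to minimize a loss function, and TensorFlow does so by modifying the variables in the model. TensorFlow knows how to modify the variables because it keeps track of the computations in the model and automatically computes the gradient for every variable. Because of this, we can see how easy it is to make changes and to try different kinds of data.

In general, algorithms in TensorFlow are designed to be cyclic. We set up this cycle as a computational graph, feed in data through the placeholders, calculate the output of the graph, compare the output to the desired output with a loss function, modify the model variables through automatic backpropagation, and repeat the process until a stopping criterion is met.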
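The cycle described above can be sketched in plain Python for a one-variable linear model. This is an illustration under stated assumptions, not TensorFlow code: the data, the learning rate of 0.1, the 500-iteration cap, and the stopping threshold are hypothetical, and the gradients are derived by hand where TensorFlow would compute them automatically.

```python
learning_rate = 0.1
iterations = 500

# Hypothetical training data drawn from y = 2x + 1 (no noise).
xs = [i / 10.0 for i in range(-10, 11)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0  # the "variables" that training is allowed to change
n = len(xs)

for step in range(iterations):
    # Forward pass: compute predictions and the mean squared error.
    preds = [w * x + b for x in xs]
    loss = sum((y - p) ** 2 for y, p in zip(ys, preds)) / n

    # Backward pass: gradients of the loss with respect to w and b,
    # derived by hand here (TensorFlow would compute them automatically).
    grad_w = sum(-2.0 * (y - p) * x for x, y, p in zip(xs, ys, preds)) / n
    grad_b = sum(-2.0 * (y - p) for y, p in zip(ys, preds)) / n

    # Update step: move each variable against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

    if loss < 1e-10:  # stop once the loss meets the chosen criterion
        break
```

After the loop, `w` and `b` are close to the slope 2 and intercept 1 that generated the data, which is the sense in which training "changes the variables" to fit the data.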

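Did You Know?

When learning about machine learning, you will come across many different terms, such as artificial intelligence, machine learning, neural networks, and deep learning. What do these terms mean, and how are they related?

Here is a brief introduction to each:

- Artificial intelligence: a branch of computer science that aims to give computers human-level intelligence. There are many ways to pursue this goal, including machine learning and deep learning.

- Machine learning: a set of related techniques for training computers to perform specific tasks.

- Neural network: a machine learning structure inspired by the networks of neurons in the human brain. Neural networks are the basic concept behind deep learning.

- Deep learning: a branch of machine learning that uses multi-layer neural networks to reach its goals. In everyday usage, "machine learning" and "deep learning" are often used interchangeably.

Machine learning and deep learning also have many branches and specialized techniques. A typical example is the distinction between supervised and unsupervised learning.

In short, in supervised learning you know what you want the computer to learn, whereas in unsupervised learning you let the computer work out for itself what to learn. Supervised learning is the most common type of machine learning and is the focus of this course.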