## Automatic differentiation and gradient tape

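Previously we introduced Tensors and the operations on them. Next we introduce automatic differentiation, a key technique for optimizing model parameters.

TensorFlow provides APIs for automatic differentiation, that is, for computing the derivative of a function. The approach that most closely mirrors the math is to write a Python function that encapsulates the computation on its parameters, then use tf.contrib.eager.gradients_function to create a function that computes the derivative of that function (you can specify which parameters to differentiate with respect to). Nesting calls to gradients_function yields higher-order derivatives.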
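To make the idea of "a function that returns the derivative function" concrete without depending on a particular TensorFlow version, here is a minimal pure-Python sketch. The helper `numeric_grad` is hypothetical and not part of TensorFlow; it approximates the derivative with a central finite difference, whereas TensorFlow computes exact derivatives by automatic differentiation. Only the calling pattern (wrap a function, nest the wrapper for higher orders) is meant to carry over.

```python
from math import sin, cos, pi

def numeric_grad(f, h=1e-4):
    """Hypothetical helper: return a function approximating f'(x).

    Uses a central finite difference, so the result is approximate.
    """
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return sin(x) ** 2

df = numeric_grad(f)                 # first derivative, analytically sin(2x)
d2f = numeric_grad(numeric_grad(f))  # nesting gives the second derivative, 2*cos(2x)

# Compare the approximations with the analytic values
print(abs(df(pi / 4) - sin(pi / 2)) < 1e-6)
print(abs(d2f(0.0) - 2 * cos(0.0)) < 1e-4)
```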

The alternative is the tf.GradientTape context manager. All computation inside the context of a tf.GradientTape is "recorded" for automatic differentiation. If gradients are computed inside that context, the gradient computation is recorded as well, so the exact same API works for higher-order gradients.
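What "recording" means can be illustrated with a toy tape written in plain Python. This is only a sketch under simplifying assumptions (scalars only, just `+` and `*`); the names `Tape`, `Var`, and `gradient` are hypothetical and do not reflect how tf.GradientTape is implemented internally. Each operation appends its inputs and local derivatives to the tape, and the gradient is obtained by replaying the tape backwards with the chain rule.

```python
class Tape:
    """Toy tape: stores (output, [(input, local_derivative), ...]) in execution order."""
    def __init__(self):
        self.records = []

class Var:
    """A scalar value whose operations are recorded on a tape."""
    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # d(out)/d(self) = other.value and d(out)/d(other) = self.value
        self.tape.records.append((out, [(self, other.value), (other, self.value)]))
        return out

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        self.tape.records.append((out, [(self, 1.0), (other, 1.0)]))
        return out

def gradient(tape, target, source):
    """Replay the tape backwards, accumulating d(target)/d(node) for every node."""
    grads = {id(target): 1.0}
    for out, inputs in reversed(tape.records):
        upstream = grads.get(id(out), 0.0)
        for node, local in inputs:
            grads[id(node)] = grads.get(id(node), 0.0) + upstream * local
    return grads.get(id(source), 0.0)

tape = Tape()
x = Var(3.0, tape)
y = x * x + x                 # y = x^2 + x
dy_dx = gradient(tape, y, x)  # analytically 2*x + 1
print(dy_dx)
```

Because the intermediate result `x * x` is also on the tape, the same `gradient` call could target it as the source, which mirrors asking a gradient tape for the derivative with respect to an intermediate value.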

## Code1

import tensorflow as tf
import matplotlib.pyplot as plt
from math import pi

tf.enable_eager_execution()
tfe = tf.contrib.eager

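def f(x):
    return tf.square(tf.sin(x))

def grad(f):
    # gradients_function returns a list with one derivative per argument of f;
    # f takes a single argument, so take the first element.
    return lambda x: tfe.gradients_function(f)(x)[0]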

print( f(pi/2).numpy() == 1.0 )

x = tf.lin_space(-2*pi,2*pi,100) # 100 points between -2pi ~ 2pi

plt.plot(x, f(x), label = 'f')
plt.plot(x, grad(f)(x), label = 'first derivative')
plt.legend()
plt.show()
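
As a hand check on the plot, the derivative of f follows from the chain rule. The curve labelled 'first derivative' should match sin 2x, which crosses zero wherever f reaches a peak or a trough, for example at x = π/2.

```latex
\[
f(x) = \sin^2 x
\quad\Longrightarrow\quad
f'(x) = 2 \sin x \cos x = \sin 2x,
\qquad
f'\!\left(\tfrac{\pi}{2}\right) = \sin \pi = 0 .
\]
```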


## Code2

import tensorflow as tf

tf.enable_eager_execution()
tfe = tf.contrib.eager # shorthand for some symbols

# x^y
def f(x,y):
    output = 1
    # Must use range(int(y)) instead of range(y) in Python 3 when
    # using TensorFlow 1.10 and earlier. Can use range(y) in 1.11+
    for i in range(int(y)): # you can use a plain Python for loop (#^.^#)
        output = tf.multiply(output, x)
    return output

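# d x^y / d x
def g(x,y):
    # Return the gradient of 'f' with respect to its first parameter.
    # With the default params=None, gradients_function differentiates with
    # respect to all parameters, so [0] selects the derivative for x.
    return tfe.gradients_function(f)(x, y)[0]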

print( f(3,2).numpy() )
print( g(3.0,2).numpy() )
print( f(3,3).numpy() )
print( g(3.0,3).numpy() )
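
The four printed values can be sanity-checked against the closed form d(x^y)/dx = y·x^(y-1). The sketch below is a pure-Python stand-in that needs no TensorFlow; `analytic_df_dx` is a hypothetical helper introduced only for this check, and `f` repeats the loop from the code above with ordinary multiplication.

```python
def f(x, y):
    # Same loop as the TensorFlow version, using plain Python multiplication
    output = 1
    for _ in range(int(y)):
        output = output * x
    return output

def analytic_df_dx(x, y):
    # Closed-form derivative of x**y with respect to x
    return y * x ** (y - 1)

for x, y in [(3.0, 2), (3.0, 3)]:
    print(f(x, y), analytic_df_dx(x, y))
```

The first pair gives 9.0 and 6.0, the second 27.0 and 27.0, which are the values f and g should report for the same inputs.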


## Code3

'''
At times it may be inconvenient to encapsulate the computation of
interest in a function, for example when you want the gradient
of the output with respect to intermediate values computed inside
the function. In such cases, the slightly more verbose but
explicit **tf.GradientTape** context is useful. All computation
inside the context of a **tf.GradientTape** is "recorded".
'''
import tensorflow as tf

tf.enable_eager_execution()
tfe = tf.contrib.eager # shorthand for some symbols

x = tf.ones((2, 2))

# persistent=True lets the tape serve more than one gradient() call
with tf.GradientTape(persistent=True) as t:
    t.watch(x)  # x is a constant tensor, so it must be watched explicitly
    y = tf.reduce_sum(x)  # sum over all elements, reducing x to a scalar
    z = tf.multiply(y, y)

# Use the same tape to compute the derivative of z with
# respect to the intermediate value y
dz_dy = t.gradient(z, y)
print(dz_dy.numpy())

# Derivative of z with respect to the original input tensor x
dz_dx = t.gradient(z, x)
print(dz_dx.numpy())

'''
Operations inside of the GradientTape context manager
are recorded for automatic differentiation.

If gradients are computed in that context,
then the gradient computation is recorded as well.

As a result, the exact same API works for higher-order
gradients as well.
'''

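x = tf.Variable(1.0)  # A variable initialized to 1.0; tapes watch variables automatically

with tf.GradientTape() as t:
    with tf.GradientTape() as t2:
        y = x*x*x
    # Compute the gradient inside the 't' context manager
    # which means the gradient computation is differentiable as well.
    dy_dx = t2.gradient(y, x)
d2y_dx2 = t.gradient(dy_dx, x)

print(dy_dx.numpy())    # 3.0
print(d2y_dx2.numpy())  # 6.0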
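
The values the Code3 snippets should produce can be worked out by hand. In the first snippet x is a 2×2 tensor of ones, so y is the sum of four ones; in the second, y = x^3 is evaluated at x = 1.

```latex
\[
y = \sum_{i,j} x_{ij} = 4, \quad z = y^2
\quad\Longrightarrow\quad
\frac{\partial z}{\partial y} = 2y = 8,
\qquad
\frac{\partial z}{\partial x_{ij}} = 2y \cdot \frac{\partial y}{\partial x_{ij}} = 8 \ \text{ for every } i, j .
\]

\[
y = x^3
\quad\Longrightarrow\quad
\frac{dy}{dx} = 3x^2 = 3,
\qquad
\frac{d^2 y}{dx^2} = 6x = 6
\quad \text{at } x = 1 .
\]
```

So dz_dy should print 8.0 and dz_dx a 2×2 array filled with 8.0, while the first and second derivatives of x^3 at x = 1 are 3.0 and 6.0.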