Programming Machine Learning
From Coding to Deep Learning
German edition
Chapter 11
Training the network
Page 191
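import numpy as np

def back(X, Y, y_hat, w2, h):
    # Gradient of the loss with respect to w2 (hidden layer -> output layer)
    w2_gradient = np.matmul(prepend_bias(h).T, y_hat - Y) / X.shape[0]
    # Gradient with respect to the hidden layer's weighted sum a, where h = sigmoid(a)
    a_gradient = np.matmul(y_hat - Y, w2[1:].T) * sigmoid_gradient(h)
    # Gradient of the loss with respect to w1 (input layer -> hidden layer)
    w1_gradient = np.matmul(prepend_bias(X).T, a_gradient) / X.shape[0]
    return (w1_gradient, w2_gradient)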
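One way to check that the element-wise product gives the right gradient is to compare back() against a numerical gradient. The sketch below is a minimal, self-contained check. The helpers sigmoid(), softmax(), prepend_bias(), sigmoid_gradient(), forward() and loss() are assumptions written to resemble a three-layer network with a sigmoid hidden layer and a softmax output, so they may differ in detail from the book's code. numerical_gradient(), current_loss(), the layer sizes and the random data are hypothetical choices made only for this check.

```python
import numpy as np

# Assumed helpers, written so the check runs on its own
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(logits):
    exponentials = np.exp(logits)
    return exponentials / np.sum(exponentials, axis=1).reshape(-1, 1)

def sigmoid_gradient(sigmoid_output):
    # Derivative of the sigmoid, expressed through its output h = sigmoid(a)
    return sigmoid_output * (1 - sigmoid_output)

def prepend_bias(X):
    # Insert a column of ones at index 0 for the bias weights
    return np.insert(X, 0, 1, axis=1)

def forward(X, w1, w2):
    h = sigmoid(np.matmul(prepend_bias(X), w1))
    y_hat = softmax(np.matmul(prepend_bias(h), w2))
    return (y_hat, h)

def loss(Y, y_hat):
    # Mean cross-entropy over all examples
    return -np.sum(Y * np.log(y_hat)) / Y.shape[0]

def back(X, Y, y_hat, w2, h):
    w2_gradient = np.matmul(prepend_bias(h).T, y_hat - Y) / X.shape[0]
    a_gradient = np.matmul(y_hat - Y, w2[1:].T) * sigmoid_gradient(h)
    w1_gradient = np.matmul(prepend_bias(X).T, a_gradient) / X.shape[0]
    return (w1_gradient, w2_gradient)

def numerical_gradient(f, w, eps=1e-6):
    # Hypothetical helper: central finite differences, one weight at a time
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        original = w[idx]
        w[idx] = original + eps
        plus = f()
        w[idx] = original - eps
        minus = f()
        w[idx] = original  # restore the weight
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                # 5 examples, 3 input features
Y = np.eye(4)[rng.integers(0, 4, size=5)]  # one-hot labels, 4 classes
w1 = rng.normal(size=(3 + 1, 6))           # 6 hidden nodes, +1 row for the bias
w2 = rng.normal(size=(6 + 1, 4))           # 4 output classes, +1 row for the bias

def current_loss():
    # Loss for the current values of w1 and w2 (modified in place by numerical_gradient)
    return loss(Y, forward(X, w1, w2)[0])

y_hat, h = forward(X, w1, w2)
w1_gradient, w2_gradient = back(X, Y, y_hat, w2, h)

print(np.allclose(w1_gradient, numerical_gradient(current_loss, w1), atol=1e-7))
print(np.allclose(w2_gradient, numerical_gradient(current_loss, w2), atol=1e-7))
```

Both lines print True when the analytical gradients from back() agree with the finite-difference estimates, including w1_gradient, which is the one that depends on the element-wise product.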
Hi Paolo,
Can you explain to me how you decided to multiply the two factors of this expression
np.matmul(y_hat - Y, w2[1:].T) * sigmoid_gradient(h)
element by element (with *), rather than with np.matmul() as in the other lines?
Sincerely,
Kai
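A short way to see where the element-wise product comes from: the hidden layer computes h = sigmoid(a), and the sigmoid acts on each element of a separately. For a single example, ∂h_j/∂a_k is therefore zero unless j = k, so the Jacobian ∂h/∂a is the diagonal matrix diag(h ⊙ (1 − h)). By the chain rule, ∂L/∂a = ∂L/∂h · ∂h/∂a, and multiplying a row vector by a diagonal matrix is the same as multiplying it element by element by that diagonal. That is what `* sigmoid_gradient(h)` does for all examples at once. The sketch below checks both steps numerically. The shapes, the random data and the names a, g, via_jacobian and via_elementwise are hypothetical; g only stands in for the upstream gradient np.matmul(y_hat - Y, w2[1:].T).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
m, n_hidden = 5, 6
a = rng.normal(size=(m, n_hidden))  # hypothetical weighted sums of the hidden layer
h = sigmoid(a)                      # applied to each element of a separately
g = rng.normal(size=(m, n_hidden))  # stands in for np.matmul(y_hat - Y, w2[1:].T)

# The sigmoid's derivative, written through its output: sigmoid'(a) = h * (1 - h)
eps = 1e-6
numeric_derivative = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
print(np.allclose(numeric_derivative, h * (1 - h)))  # True

# Full chain rule, one example (row) at a time, with the diagonal Jacobian dh/da
via_jacobian = np.stack([g[i] @ np.diag(h[i] * (1 - h[i])) for i in range(m)])

# Shortcut used in back(): element-wise product with the sigmoid gradient
via_elementwise = g * (h * (1 - h))

print(np.allclose(via_jacobian, via_elementwise))  # True
```

Replacing the * with np.matmul() would not even be defined here: both operands have shape (m, n_hidden), in this case (5, 6), and np.matmul requires the inner dimensions to match.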