Programming Machine Learning - gradient descent: calculating the derivative (page 35)

Can someone help me understand how to calculate this derivative?

(2/m) ∑ xᵢ ((wxᵢ + b) - yᵢ)

I thought it was: (2/m) ∑ ((wxᵢ + b) - yᵢ)

Thanks!

Hei, @ggerico! Are you calculating the derivative with respect to x, w, or b here?

[Screenshot from the book: the loss function L = (1/m) ∑ ((wxᵢ + b) - yᵢ)²]

This is the loss function that introduces gradient descent in the book.
For simplicity, the bias b is set to 0 here; m, the xᵢ and the yᵢ are all constants, and only w varies.
So yes, I am calculating the derivative of L with respect to w only.

The book shows the derivative of the loss function with respect to w, and I don't understand why each element of the sum has been multiplied by xᵢ.
I thought it was: (2/m) ∑ ((wxᵢ + b) - yᵢ)
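To make the setup concrete, here is a minimal NumPy sketch of that loss (my own code, not the book's; the names predict and loss are just placeholders I picked):

```python
import numpy as np

def predict(X, w, b):
    # Linear model: y_hat = w * x_i + b for every example
    return X * w + b

def loss(X, Y, w, b):
    # Mean squared error: (1/m) * sum of ((w * x_i + b) - y_i)^2
    return np.average((predict(X, w, b) - Y) ** 2)
```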

Thanks for your reply!

Aaah, OK! The xᵢ comes from the “chain rule” of differentiation. Here is a video that explains the rule.

Because of the chain rule, the derivative of ((wxᵢ + b) - yᵢ)² with respect to w is equal to 2((wxᵢ + b) - yᵢ) (as you mentioned), multiplied by the derivative of the inner expression ((wxᵢ + b) - yᵢ) with respect to w. The only term inside that expression that depends on w is wxᵢ, and its derivative is xᵢ. That is where the xᵢ comes from.
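Written out step by step, it is the standard chain-rule computation (in LaTeX):

```latex
\begin{aligned}
\frac{\partial L}{\partial w}
  &= \frac{\partial}{\partial w}\,\frac{1}{m}\sum_{i=1}^{m}\bigl((w x_i + b) - y_i\bigr)^2 \\
  &= \frac{1}{m}\sum_{i=1}^{m} 2\bigl((w x_i + b) - y_i\bigr)\cdot
     \frac{\partial}{\partial w}\bigl((w x_i + b) - y_i\bigr) \\
  &= \frac{2}{m}\sum_{i=1}^{m} x_i\bigl((w x_i + b) - y_i\bigr)
\end{aligned}
```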

Did that clarify it, or should I go into more detail?

Without multiplying by xᵢ, I had calculated the derivative of the loss function with respect to (wxᵢ + b) - yᵢ, that is, the derivative of the outer function only. Multiplying the derivative of the outer function by the derivative of the inner function finally gives the derivative of the loss function with respect to w.

Should be something like that :smile:
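To convince myself, I also checked it numerically with a quick NumPy sketch (my own code, not from the book; the tiny dataset is made up):

```python
import numpy as np

# Tiny made-up dataset; bias kept at 0 as in the example above
X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.5])
w, b = 0.5, 0.0

def loss(w):
    return np.average(((w * X + b) - Y) ** 2)

# Derivative with the x_i factor (outer derivative times inner derivative)
grad_with_x = 2 * np.average(X * ((w * X + b) - Y))

# Derivative without the x_i factor (outer derivative only)
grad_without_x = 2 * np.average((w * X + b) - Y)

# Finite-difference approximation of dL/dw for comparison
h = 1e-6
numerical = (loss(w + h) - loss(w - h)) / (2 * h)

print(grad_with_x, numerical)  # these two agree
print(grad_without_x)          # this one does not
```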
Thanks for your reply, that was helpful.

You’re welcome!