Programming Machine Learning: gradient descent, calculating the derivative (page 35)

Aaah, OK! The x_i comes out of the “chain rule” of differentiation. Here is a video that explains the rule.

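Because of the chain rule, the derivative of ((w * x_i + b) - y)^2 with respect to w is equal to 2 * ((w * x_i + b) - y) (as you mentioned), multiplied by the derivative of the inner expression ((w * x_i + b) - y) with respect to w. The only term inside that expression that depends on w is w * x_i, and its derivative with respect to w is x_i. The terms b and y do not depend on w, so their derivatives are zero. Putting it together, the result is 2 * ((w * x_i + b) - y) * x_i. That is where that x_i is coming from.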
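If it helps to see it with numbers, here is a small sketch that compares the chain-rule formula against a numerical estimate of the slope. This is not code from the book: the function names (`loss`, `dloss_dw`, `numeric_dloss_dw`) and the sample values are my own assumptions, chosen only for illustration.

```python
# Squared error for a single example (names and values are illustrative).
def loss(w, b, x_i, y):
    return ((w * x_i + b) - y) ** 2


# Derivative with respect to w, from the chain rule:
# outer derivative 2 * (...) times inner derivative x_i.
def dloss_dw(w, b, x_i, y):
    return 2 * ((w * x_i + b) - y) * x_i


# Numerical estimate of the same derivative (central difference):
# nudge w a little in both directions and measure the slope.
def numeric_dloss_dw(w, b, x_i, y, eps=1e-6):
    return (loss(w + eps, b, x_i, y) - loss(w - eps, b, x_i, y)) / (2 * eps)


w, b, x_i, y = 1.5, 0.5, 3.0, 4.0
analytic = dloss_dw(w, b, x_i, y)
numeric = numeric_dloss_dw(w, b, x_i, y)
print(analytic, numeric)
```

The two values should agree up to a tiny rounding error. If you drop the `* x_i` factor from `dloss_dw`, they stop matching, which is a quick way to convince yourself that the factor really belongs there.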

Did that clarify it, or should I go into more detail?