Here I am. Hello, @samuiweb_gm!
The trick with the bias can be confusing, so let me try to explain it here.
In Chapter 2, we use a line to approximate the data. Here is its equation:
ŷ = x * w + b
So we calculate the output ŷ based on the value of the input x. We do it with two variables, or “parameters”: w and b.
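For concreteness, here is that line as a tiny Python function (a sketch; the function name `predict` and the numbers are just mine, not from the book):

```python
def predict(x, w, b):
    """The line from Chapter 2: y-hat = x * w + b."""
    return x * w + b

print(predict(20, 1.5, 10))  # 20 * 1.5 + 10 = 40.0
```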
By contrast, in Chapter 4 we have multiple inputs: x1, x2, and so on. So we start by calculating the output based on those inputs, each given a weight… and a final bias, like we did before:
ŷ = x1 * w1 + x2 * w2 + x3 * w3 + b
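In NumPy, that formula is a matrix multiplication followed by adding b. A minimal sketch, assuming a single example with three inputs (the variable names and numbers are made up for illustration):

```python
import numpy as np

def predict(X, w, b):
    """Weighted sum of the inputs, plus a separate bias term."""
    return np.matmul(X, w) + b

X = np.array([[1.0, 2.0, 3.0]])  # one example with inputs x1, x2, x3
w = np.array([0.5, 0.5, 0.5])    # the weights w1, w2, w3
b = 2.0
print(predict(X, w, b))          # 5.0 = 0.5 + 1.0 + 1.5 + 2
```

Note how b needs its own special treatment here: it lives outside the matrix multiplication.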
The trick in “Bye, bye, bias” is all about turning that b into just another weight (let’s call it w0), by associating it with an artificial input:
ŷ = x1 * w1 + x2 * w2 + x3 * w3 + x0 * w0
The last two formulae are the same as long as we do two things:
- We rename b to w0.
- We add an artificial input x0 that has a value of 1, so that when we multiply it by w0, nothing changes.
So, to answer your question directly: b is still a variable, and it’s become a weight like any other. What we added is another input, and that one has a constant value of 1. By doing that, we can remove all the code that deals with the special case of b, and treat all the weights and the bias the same.
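In code, the trick amounts to prepending a column of ones to the inputs and folding b into the weights. A minimal sketch (again, my own variable names, not necessarily the book’s exact code):

```python
import numpy as np

def predict(X, w):
    """All-weights version: the bias is now w[0], paired with the ones column."""
    return np.matmul(X, w)

X = np.array([[1.0, 2.0, 3.0]])     # same example as before
X = np.insert(X, 0, 1, axis=1)      # add the artificial input x0 = 1 to every row
w = np.array([2.0, 0.5, 0.5, 0.5])  # w0 = 2.0 plays the role of the old b
print(predict(X, w))                # same result as before: 5.0
```

The prediction code no longer mentions b at all, which is the whole point of the trick.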
Does that make it clear?