Here I am. Hello, @samuiweb_gm!
The trick with the bias can be confusing, so let me try to explain it here.
In Chapter 2, we use a line to approximate the data. Here is its equation:
ŷ = x * w + b
So we calculate the output ŷ based on the value of the input x. We do it with two variables, or “parameters”: w and b.
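For concreteness, here is that line as a tiny Python function (a sketch; the function name `predict` and the numbers are just mine, not from the book):

```python
def predict(x, w, b):
    """The line from Chapter 2: y-hat = x * w + b."""
    return x * w + b

print(predict(20, 1.5, 10))  # 20 * 1.5 + 10 = 40.0
```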
By contrast, in Chapter 4 we have multiple inputs: x1, x2, and so on. So we start by calculating the output based on those inputs, each given a weight… and a final bias, like we did before:
ŷ = x1 * w1 + x2 * w2 + x3 * w3 + b
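In NumPy, that formula is a matrix multiplication followed by adding b. A minimal sketch, assuming a single example with three inputs (the variable names and numbers are made up for illustration):

```python
import numpy as np

def predict(X, w, b):
    """Weighted sum of the inputs, plus a separate bias term."""
    return np.matmul(X, w) + b

X = np.array([[1.0, 2.0, 3.0]])  # one example with inputs x1, x2, x3
w = np.array([0.5, 0.5, 0.5])    # the weights w1, w2, w3
b = 2.0
print(predict(X, w, b))          # 5.0 = 0.5 + 1.0 + 1.5 + 2
```

Note how b needs its own special treatment here: it lives outside the matrix multiplication.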
The trick in “Bye, bye, bias” is all about turning that b into just another weight (let’s call it w0), by associating it with an artificial input:
ŷ = x1 * w1 + x2 * w2 + x3 * w3 + x0 * w0
The last two formulae are the same as long as we do two things:
- We rename b to w0.
- We add an artificial input x0 that has a value of 1, so that when we multiply it by w0, nothing changes.
So, to answer your question directly: b is still a variable, and it’s become a weight like any other. What we added is another input, and that one has a constant value of 1. By doing that, we can remove all the code that deals with the special case of b, and treat all the weights and the bias the same.
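In code, the trick amounts to prepending a column of ones to the inputs and folding b into the weights. A minimal sketch (again, my own variable names, not necessarily the book’s exact code):

```python
import numpy as np

def predict(X, w):
    """All-weights version: the bias is now w[0], paired with the ones column."""
    return np.matmul(X, w)

X = np.array([[1.0, 2.0, 3.0]])     # same example as before
X = np.insert(X, 0, 1, axis=1)      # add the artificial input x0 = 1 to every row
w = np.array([2.0, 0.5, 0.5, 0.5])  # w0 = 2.0 plays the role of the old b
print(predict(X, w))                # same result as before: 5.0
```

The prediction code no longer mentions b at all, which is the whole point of the trick.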
Does that make it clear?