@seanmor5
On p. 41 it says:
For now, you’ll use standardization to scale your data…
Then the following code is given:
cols = ~w(sepal_width sepal_length petal_length petal_width)

normalized_iris =
  DF.mutate(
    iris,
    for col <- across(^cols) do
      {col.name, (col - mean(col)) / variance(col)}
    end
  )
I’m very much a statistics newbie and learning as I go, but according to this page:
So to convert a value to a Standard Score (“z-score”):
- first subtract the mean,
- then divide by the Standard Deviation [emphasis mine]
And doing that is called “Standardizing”
That site explains Standard Deviation here: Standard Deviation and Variance
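Written out as formulas, this is how I understand the difference. The symbols are my own notation (x for a value, μ for the column mean, σ for the column's standard deviation), not taken from the book or from that site:

```latex
\[
  z = \frac{x - \mu}{\sigma},
  \qquad
  \sigma = \sqrt{\sigma^{2}}
  \quad\Longrightarrow\quad
  \operatorname{Var}(z) = \frac{\sigma^{2}}{\sigma^{2}} = 1
\]
\[
  z' = \frac{x - \mu}{\sigma^{2}}
  \quad\Longrightarrow\quad
  \operatorname{Var}(z') = \frac{\sigma^{2}}{\sigma^{4}} = \frac{1}{\sigma^{2}}
\]
```

So if I have this right, dividing by the variance still centers each column at 0, but leaves it with a standard deviation of 1/σ instead of 1, which means the columns do not end up on a common scale.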
So in this code sample, shouldn’t we divide by standard_deviation(col) instead of by variance(col), like this?
cols = ~w(sepal_width sepal_length petal_length petal_width)

normalized_iris =
  DF.mutate(
    iris,
    for col <- across(^cols) do
      {col.name, (col - mean(col)) / standard_deviation(col)}
    end
  )
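As a sanity check, here is a minimal sketch in plain Elixir, with no Explorer or Nx involved. The ScaleCheck module, its function names, and the sample numbers are all made up for illustration, and I'm assuming the sample variance (dividing by n - 1); the comparison comes out the same way if you divide by n instead.

```elixir
# Hypothetical helper module for illustration only; not from the book or Explorer.
defmodule ScaleCheck do
  def mean(xs), do: Enum.sum(xs) / length(xs)

  # Sample variance (assumption: divides by n - 1).
  def variance(xs) do
    m = mean(xs)
    Enum.sum(Enum.map(xs, fn x -> (x - m) * (x - m) end)) / (length(xs) - 1)
  end

  def standard_deviation(xs), do: :math.sqrt(variance(xs))

  # Subtract the mean, then divide every value by `divisor`.
  def scale(xs, divisor) do
    m = mean(xs)
    Enum.map(xs, fn x -> (x - m) / divisor end)
  end
end

# Made-up sample data.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

by_std = ScaleCheck.scale(xs, ScaleCheck.standard_deviation(xs))
by_var = ScaleCheck.scale(xs, ScaleCheck.variance(xs))

IO.inspect(ScaleCheck.standard_deviation(by_std), label: "std after dividing by std")
IO.inspect(ScaleCheck.standard_deviation(by_var), label: "std after dividing by variance")
```

The first value should come out at about 1.0 (up to floating-point rounding), while the second should equal 1 divided by the original standard deviation, so only the first version gives a unit-scale column.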
Using the variance(col) version, I get an evaluated accuracy like this:
%{
  0 => %{
    "accuracy" => #Nx.Tensor<
      f32
      0.8999999761581421
    >
  }
}
Using the standard_deviation(col) version, that goes up to 0.9666666388511658.