Machine Learning in Elixir: shouldn't z-score be based on standard deviation (ch 1, B3.0)?

nathanl · 15 March 2024 13:31

@seanmor5
On p. 41 it says:

For now, you’ll use standardization to scale your data…

Then the following code is given:

cols = ~w(sepal_width sepal_length petal_length petal_width)
normalized_iris =
  DF.mutate(
    iris,
    for col <- across(^cols) do
      {col.name, (col - mean(col)) / variance(col)}
    end
  )

I’m very much a statistics newbie and learning as I go, but according to this page:

So to convert a value to a Standard Score (“z-score”):

first subtract the mean,

then divide by the Standard Deviation [emphasis mine]

And doing that is called “Standardizing”

That site explains Standard Deviation here: Standard Deviation and Variance

So in this code sample, shouldn’t we divide by standard_deviation(col) instead of by variance(col), like this?

cols = ~w(sepal_width sepal_length petal_length petal_width)
normalized_iris =
  DF.mutate(
    iris,
    for col <- across(^cols) do
      {col.name, (col - mean(col)) / standard_deviation(col)}
    end
  )
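
To spell out why I think the divisor matters, here is a rough derivation. This is my own working, not from the book; I'm writing \mu and \sigma for a column's mean and standard deviation, so the variance is \sigma^2:

```latex
% Dividing by the standard deviation gives unit variance:
z = \frac{x - \mu}{\sigma}
\quad\Longrightarrow\quad
\operatorname{Var}(z) = \frac{\operatorname{Var}(x)}{\sigma^{2}} = \frac{\sigma^{2}}{\sigma^{2}} = 1

% Dividing by the variance does not, unless \sigma happens to be 1:
z' = \frac{x - \mu}{\sigma^{2}}
\quad\Longrightarrow\quad
\operatorname{Var}(z') = \frac{\operatorname{Var}(x)}{\sigma^{4}} = \frac{1}{\sigma^{2}}
```

If that is right, the variance(col) version still centers each column at zero, but leaves each column with its own spread of 1/\sigma rather than a common spread of 1.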

Using the variance(col) version, I get an evaluated accuracy like this:

%{
  0 => %{
    "accuracy" => #Nx.Tensor<
      f32
      0.8999999761581421
    >
  }
}

Using the standard_deviation(col) version, that goes up to 0.9666666388511658.