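Test data should be scaled with the parameters learned from the training data, not with parameters refit on the test data. For example, with max scaling (dividing by the largest value), training data [1, 2, 3, 10] is scaled to [0.1, 0.2, 0.3, 1.0]. If the test data [1, 2, 3] is scaled on its own, it becomes [0.33, 0.67, 1.0], whereas the training parameters would give [0.1, 0.2, 0.3]. The same raw value then maps to a different scaled value than the model saw during training, which distorts the metrics when the model is evaluated on test data.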
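A minimal sketch of the example above in plain Python, assuming the scaling in question is division by the largest absolute value. The helper names `fit_max_abs` and `transform` are hypothetical, introduced only for illustration.

```python
# Hypothetical helpers for illustration: max-abs scaling, x / max(|x_train|).

def fit_max_abs(values):
    """Learn the scaling parameter (largest absolute value) from the data."""
    return max(abs(v) for v in values)


def transform(values, scale):
    """Scale values with a previously learned parameter."""
    return [v / scale for v in values]


train = [1, 2, 3, 10]
test = [1, 2, 3]

# Learn the parameter from the training data only.
scale = fit_max_abs(train)

train_scaled = transform(train, scale)
test_scaled = transform(test, scale)  # reuses the training parameter

# Refitting on the test data maps the same raw values to different outputs.
test_refit = transform(test, fit_max_abs(test))

print("train scaled:        ", train_scaled)
print("test, train params:  ", test_scaled)
print("test, refit on test: ", [round(v, 2) for v in test_refit])
```

With the training parameter, the test values 1, 2, 3 receive the same scaled values as they do in the training data. If you use scikit-learn, the equivalent pattern is to call `fit` on the training data and only `transform` on the test data.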