Thanks for pointing this out, I’ve updated the code and language for the next beta. The approach is to cast the entire dataframe to a categorical variable before splitting/shuffling, this ensures that the one-hot encoding across train/test sets are consistent
2 Likes