Should I encode before or after train_test_split?

Jul 16, 2025

I want to do a machine learning project on a dataset that has a class imbalance, so I'd like to combine it with a second dataset to balance things out. However, the second dataset is already one-hot encoded, and my initial dataset is not. Should I encode the first dataset, combine it with the second, and then split the combined data with train_test_split? I know you generally encode after train_test_split, so I'm wondering whether this is a good idea. Any help is appreciated, thanks!
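To make the question concrete, here is a minimal sketch of the workflow I have in mind. The column names (`color`, `size`, `label`) and all values are made up for illustration, and it assumes the second dataset's one-hot columns follow the `color_<category>` naming that `pd.get_dummies` produces.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical first dataset: raw categorical column, imbalanced labels.
df_raw = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red", "green", "red"],
    "size": [1.0, 2.5, 3.1, 0.7, 1.8, 2.2, 2.9, 1.1],
    "label": [0, 0, 0, 0, 0, 0, 1, 0],
})

# Hypothetical second dataset: already one-hot encoded, minority class only.
df_encoded = pd.DataFrame({
    "size": [1.4, 2.0, 3.3, 0.9],
    "color_blue": [1, 0, 0, 1],
    "color_green": [0, 1, 0, 0],
    "color_red": [0, 0, 1, 0],
    "label": [1, 1, 1, 1],
})

# One-hot encode the first dataset so it matches the second one's layout.
df_first = pd.get_dummies(df_raw, columns=["color"], dtype=int)

# Align to the second dataset's columns; a category missing from the
# first dataset becomes an all-zero column instead of NaN.
df_first = df_first.reindex(columns=df_encoded.columns, fill_value=0)

# Combine, then split. stratify keeps the class ratio similar in both parts.
combined = pd.concat([df_first, df_encoded], ignore_index=True)
X = combined.drop(columns="label")
y = combined["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
```

As far as I understand, one-hot encoding with a fixed set of categories is just a column mapping and does not learn anything from the data, so doing it before the split should not leak information the way fitting a scaler on the full dataset would. I'm not certain about that, which is why I'm asking.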

submitted by /u/sassybitch4 to r/learnmachinelearning

