Should I encode before or after train_test_split?

I want to do a machine learning project on a dataset that has a class imbalance, so I'd like to combine it with a second dataset to balance the classes. The problem is that the second dataset is already one-hot encoded and my initial dataset is not. Should I one-hot encode the first dataset, combine it with the second, and then split the combined data with train_test_split? I know the general advice is to encode after train_test_split, so I'm wondering whether this is a good idea. Any help is appreciated, thanks!
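To make the question concrete, here is a minimal sketch of the workflow I'm describing. The column names, category values, and labels are made up for illustration and are not my real data. It assumes both datasets share the same categorical feature and that the second dataset's columns define the full set of categories, so the encoding is a fixed mapping rather than something learned from the rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical first dataset: raw categorical column, mostly class 0.
raw = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "label": [0, 0, 0, 0, 0, 1],
})

# Hypothetical second dataset: already one-hot encoded, all class 1.
encoded = pd.DataFrame({
    "color_blue": [0, 1, 0, 0],
    "color_green": [1, 0, 0, 0],
    "color_red": [0, 0, 1, 1],
    "label": [1, 1, 1, 1],
})

# One-hot encode the first dataset.
raw_encoded = pd.get_dummies(raw, columns=["color"], dtype=int)

# Align to the second dataset's columns. Categories missing from `raw`
# become all-zero columns; categories present only in `raw` would be
# dropped here, so that case needs checking on real data.
raw_encoded = raw_encoded.reindex(columns=encoded.columns, fill_value=0)

# Combine, then split. `stratify` keeps the class ratio in both parts.
combined = pd.concat([raw_encoded, encoded], ignore_index=True)
X = combined.drop(columns="label")
y = combined["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
```

As far as I understand, the "encode after the split" advice is mainly about transforms that are fitted on the data (scaling, target encoding, categories inferred from the rows), which is why I'm unsure whether it applies when the columns are fixed up front like this.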

submitted by /u/sassybitch4 to r/learnmachinelearning