I want to do a machine learning project on a dataset that has a class imbalance, so I'd like to combine it with a second dataset to balance the classes. However, the second dataset is already one-hot encoded and my original dataset is not. Should I one-hot encode the first dataset, combine it with the second, and then split the combined data with train_test_split? I know you generally encode after train_test_split, so I'm wondering whether this is a good idea. Any help is appreciated, thanks!
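For reference, here is a minimal sketch of the workflow being asked about: encode the first dataset, align it to the second, combine, then split. Everything in it is an assumption made for illustration, not the actual data: the frames `df_a` and `df_b`, the columns `color`, `size`, and `label`, and the `color_*` naming of the encoded columns are all hypothetical. It only shows the mechanics, and whether the two datasets are similar enough to merge is a separate question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical first dataset: the categorical column is still raw text.
df_a = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size": [1.0, 2.0, 1.5, 3.0, 2.5, 1.2],
    "label": [0, 0, 0, 0, 1, 0],
})

# Hypothetical second dataset: the same feature, already one-hot encoded.
df_b = pd.DataFrame({
    "color_blue": [1, 0, 0, 1],
    "color_green": [0, 1, 0, 0],
    "color_red": [0, 0, 1, 0],
    "size": [2.2, 3.1, 1.1, 2.7],
    "label": [1, 1, 1, 1],
})

# One-hot encode the first dataset so it uses the same column layout.
df_a_encoded = pd.get_dummies(df_a, columns=["color"], dtype=int)

# Match the column set and order of the second dataset. Columns missing
# from df_a are filled with 0. Note that reindex also silently drops any
# category that exists only in df_a, so check for that on real data.
df_a_encoded = df_a_encoded.reindex(columns=df_b.columns, fill_value=0)

# Stack the two datasets into one frame.
combined = pd.concat([df_a_encoded, df_b], ignore_index=True)

# Split after combining; stratify keeps the class ratio in both parts.
X = combined.drop(columns="label")
y = combined["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
```

One-hot encoding with a known set of categories does not learn any statistics from the data, so doing it before the split is usually less of a leakage concern than fitting something like a scaler or an imputer on the full dataset. Those fitted steps are the ones that should generally come after train_test_split.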
submitted by /u/sassybitch4 to r/learnmachinelearning