Training with a fixed masking %, and changing the % during inference (BERT)

I was training a small BERT-like model with masked tokens and the masked-language-modeling objective, as in BERT.

It was a model trained from scratch (I don't know if this matters).

During training I masked a consistent X% of tokens.
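For reference, here is a minimal sketch of what I mean by consistent X% masking (PyTorch-style; `mask_token_id` is a placeholder for whatever the tokenizer uses for [MASK], and this skips BERT's 80/10/10 replacement trick and just masks a flat `mask_rate` of positions):

```python
import torch

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int, mask_rate: float = 0.15):
    # Labels keep the true token ids so the loss can be computed on them.
    labels = input_ids.clone()
    # Sample which positions to mask, each independently with probability mask_rate.
    masked = torch.bernoulli(torch.full(input_ids.shape, mask_rate)).bool()
    # Only compute loss on masked positions (-100 is ignored by nn.CrossEntropyLoss).
    labels[~masked] = -100
    # Replace the masked positions with the [MASK] token in the model input.
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id
    return corrupted, labels
```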

During testing, the model scored best when I used the same X% of masked tokens (regardless of whether I increased the sequence length).

I would have expected a lower masking % at test time to lead to better scores, since the model gets more unmasked context. Why doesn't it?

Thanks in advance.

submitted by /u/Pristine-Staff-5250 to r/learnmachinelearning

