Training with a fixed masking %, and changing the % during inference (BERT)

I was training a small BERT-like model with masked tokens and the masked-language-modeling objective, as in BERT.

It was a model trained from scratch (I don't know if this matters).

During training I masked a consistent X% of tokens.
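For reference, here is a minimal sketch of what I mean by consistent X% masking (PyTorch-style; `mask_token_id` is a placeholder for whatever the tokenizer uses for [MASK], and this skips BERT's 80/10/10 replacement trick and just masks a flat `mask_rate` of positions):

```python
import torch

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int, mask_rate: float = 0.15):
    # Labels keep the true token ids so the loss can be computed on them.
    labels = input_ids.clone()
    # Sample which positions to mask, each independently with probability mask_rate.
    masked = torch.bernoulli(torch.full(input_ids.shape, mask_rate)).bool()
    # Only compute loss on masked positions (-100 is ignored by nn.CrossEntropyLoss).
    labels[~masked] = -100
    # Replace the masked positions with the [MASK] token in the model input.
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id
    return corrupted, labels
```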

During testing, the model scored best when I used the same X% of masked tokens (regardless of whether I increased the sequence length).

I would have expected a lower masking % at test time to lead to better scores, since the model gets more unmasked context. Why doesn't it?

Thanks in advance.

submitted by /u/Pristine-Staff-5250 to r/learnmachinelearning

