During fine-tuning, it is often beneficial to use a higher resolution than pre-training (Touvron et al., 2019; Kolesnikov et al., 2020). In order to fine-tune at higher resolution, the authors perform 2D interpolation of the pre-trained position embeddings, according to their location in the original image. The best results are obtained with supervised pre-training, which is not the case in NLP. The authors also performed an experiment with a self-supervised pre-training objective, namely masked patch prediction (inspired by masked language modeling). With this approach, the smaller ViT-B/16 model achieves 79.9% accuracy on ImageNet, a significant improvement of 2% over training from scratch, but still 4% behind supervised pre-training.
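To make the resolution-change step concrete, here is a minimal sketch of 2D position embedding interpolation. It is written in PyTorch (the original ViT implementation is in JAX), and the function name, tensor shapes, and grid sizes are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn.functional as F

def interpolate_pos_embeddings(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    """Resize pre-trained ViT position embeddings to a new patch grid size.

    pos_embed: (1, 1 + old_grid**2, dim) -- the [CLS] embedding followed by one
               embedding per patch, laid out row-major on an old_grid x old_grid grid.
    new_grid:  side length of the patch grid at the new (higher) resolution.
    """
    cls_embed, patch_embed = pos_embed[:, :1], pos_embed[:, 1:]
    dim = patch_embed.shape[-1]
    old_grid = int(patch_embed.shape[1] ** 0.5)

    # Reshape the flat patch sequence back into its 2D grid layout.
    patch_embed = patch_embed.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    # Interpolate in 2D, treating embedding channels like image channels.
    patch_embed = F.interpolate(patch_embed, size=(new_grid, new_grid),
                                mode="bicubic", align_corners=False)
    # Flatten back to a sequence and re-attach the [CLS] embedding.
    patch_embed = patch_embed.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_embed, patch_embed], dim=1)

# Example: ViT-B/16 pre-trained at 224x224 (14x14 patches of 16 pixels),
# fine-tuned at 384x384 (24x24 patches).
pos_embed = torch.randn(1, 1 + 14 * 14, 768)
resized = interpolate_pos_embeddings(pos_embed, new_grid=24)
print(resized.shape)  # torch.Size([1, 577, 768])
```

The key idea is that each patch embedding keeps its relative 2D location in the image: the sequence is unflattened into its grid, resized like an image, and flattened again, so the model can be fine-tuned at the higher resolution without retraining position embeddings from scratch.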