Abstract: Vision-language models (VLMs) have made significant progress in image classification by training on large-scale paired image-text data. Their performance depends largely on the prompt ...