Sentence Transformers v5.4 adds support for multimodal embedding and reranker models
Hugging Face's Sentence Transformers library has been updated to version 5.4, which introduces multimodal embedding and reranker models. Users can now map text, images, audio, and video into a shared embedding space and compute cross-modal similarity, unifying search across different content types.
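To illustrate what cross-modal similarity in a shared embedding space means, here is a minimal Python sketch. It does not call the library: the vectors are hand-written placeholders standing in for model outputs, and the file names and the `rank` helper are hypothetical. It only shows that once every item is a vector in the same space, a text query can be ranked against images, audio, and video with ordinary cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written vectors; a real multimodal model would produce
# these from the raw text, image, audio, and video inputs.
query_text_embedding = [0.9, 0.1, 0.0]
corpus = {
    "dog.jpg (image)": [0.8, 0.2, 0.1],
    "rain.wav (audio)": [0.0, 0.1, 0.9],
    "recipe.mp4 (video)": [0.1, 0.9, 0.2],
}


def rank(query, items):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(query_text_embedding, corpus)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With a real model, the placeholder vectors would be replaced by the embeddings the model returns for each input; the ranking step stays the same regardless of modality.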