Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval
posted on April 1, 2025


By Mohammad Omama

Proceedings of the International Conference on Learning Representations (ICLR)

TLDR: Image Retrieval with Foundation Models: Better, Faster, Distribution-Aware!

ArXiv

Project Website

Motivation

Image retrieval is pivotal in many real-world applications, from visual place recognition in robotics to personalized recommendations in e-commerce. However, current state-of-the-art (SOTA) image retrieval methods face two significant problems:

  1. Scalability Issue: SOTA image retrieval methods train large models separately for each dataset. This is not scalable.

  2. Efficiency Issue: SOTA image retrieval methods use large embeddings. Because retrieval time grows in direct proportion to embedding size, this is not efficient.
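To make the efficiency point concrete, here is a minimal sketch of exhaustive cosine-similarity retrieval, assuming a brute-force search with no approximate index. The helper names (`l2_normalize`, `top_k_cosine`, `multiply_adds`) are hypothetical and chosen for illustration only. Each query costs one dot product of length d per database item, so the work scales linearly with the embedding size d.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def top_k_cosine(query: np.ndarray, database: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k database rows most similar to the query."""
    q = l2_normalize(query[None, :])[0]
    db = l2_normalize(database)
    scores = db @ q  # one length-d dot product per database item
    return np.argsort(-scores)[:k]


def multiply_adds(num_items: int, dim: int) -> int:
    """Multiply-add count for one exhaustive query (hypothetical cost model)."""
    return num_items * dim
```

Under this cost model, shrinking an embedding from 512 to 128 dimensions cuts the per-query work by a factor of four for the same database.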

Our research specifically targets these challenges with two crucial questions:

Contributions

To tackle the scalability and efficiency challenges, our work introduces the following novel ideas:

Methodology

Our proposed approach follows a two-step pipeline:

  1. AE-SVC first trains an autoencoder with the constraints mentioned to enhance the embeddings from foundation models.
  2. The improved embeddings from AE-SVC are then distilled using (SS)2D, producing embeddings that are both efficient and adaptive at various sizes.
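As a rough illustration of step 1, the sketch below computes penalty terms that an autoencoder could add to its reconstruction loss. The specific constraint set used here (zero mean, unit variance, and decorrelated latent dimensions) is an assumption made for this example and is not spelled out in this post; the function names and the `weight` parameter are likewise hypothetical. See the paper for the actual formulation.

```python
import numpy as np


def latent_constraint_penalties(z: np.ndarray) -> dict:
    """Penalties for a batch of latent codes z with shape (batch, dim).

    Assumed constraint set (illustrative only): zero mean, unit variance,
    and decorrelated latent dimensions.
    """
    mean = z.mean(axis=0)
    centered = z - mean
    cov = centered.T @ centered / (z.shape[0] - 1)  # sample covariance
    var = np.diag(cov)
    off_diag = cov - np.diag(var)
    return {
        "mean": float(np.sum(mean ** 2)),
        "variance": float(np.sum((var - 1.0) ** 2)),
        "decorrelation": float(np.sum(off_diag ** 2)),
    }


def autoencoder_objective(x: np.ndarray, x_hat: np.ndarray, z: np.ndarray,
                          weight: float = 1.0) -> float:
    """Reconstruction error plus the weighted sum of the assumed penalties."""
    reconstruction = float(np.mean((x - x_hat) ** 2))
    penalties = latent_constraint_penalties(z)
    return reconstruction + weight * sum(penalties.values())
```

In a real training loop these terms would be written in an autodiff framework so that gradients flow back to the encoder; NumPy is used here only to keep the arithmetic visible.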

The training process ensures that the resulting embeddings, even at smaller sizes, preserve similarity relationships, making them highly effective for retrieval tasks.
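One way to picture "preserving similarity relationships at smaller sizes" is a loss that compares pairwise cosine similarities of the full embeddings against those of shortened embeddings. The sketch below assumes that smaller embeddings are taken as prefixes of a single student embedding and uses a mean squared error between similarity matrices. Both choices, along with the function names, are illustrative assumptions and not necessarily the loss used by (SS)2D.

```python
import numpy as np


def cosine_similarity_matrix(x: np.ndarray) -> np.ndarray:
    """All-pairs cosine similarity for the rows of x."""
    x = x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
    return x @ x.T


def similarity_preservation_loss(teacher: np.ndarray, student: np.ndarray,
                                 sizes: list) -> float:
    """Average gap between teacher similarities and each student prefix.

    teacher: (batch, d_teacher) embeddings to imitate.
    student: (batch, d_student) embeddings whose first m dimensions are
             treated as the size-m embedding (assumed nesting scheme).
    sizes:   prefix lengths m to evaluate.
    """
    target = cosine_similarity_matrix(teacher)
    losses = []
    for m in sizes:
        prefix = student[:, :m]  # keep only the first m dimensions
        gap = cosine_similarity_matrix(prefix) - target
        losses.append(np.mean(gap ** 2))
    return float(np.mean(losses))
```

A loss of zero would mean every evaluated prefix ranks neighbors exactly as the full teacher embedding does, which is the property that makes a short embedding usable for retrieval.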

Methodology Pipeline

Impact on Cosine Similarity Distribution

AE-SVC has a pronounced effect on the cosine similarity distribution of the embeddings: it significantly reduces the distribution's variance. Lower variance in the similarity distribution corresponds to greater discriminative power, a relationship we prove mathematically in our paper. The benefit is largest for general-purpose foundation models like DINO and smaller for dataset-specific models such as CosPlace, which are already optimized for their target data.
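The variance of a cosine similarity distribution is easy to measure directly. The sketch below does so on synthetic data, using PCA whitening as a simple stand-in for a transform that equalizes variance across dimensions. This is an assumption for illustration and is not AE-SVC itself; the synthetic embeddings, their scales, and the helper names are all hypothetical.

```python
import numpy as np


def pairwise_cosine(x: np.ndarray) -> np.ndarray:
    """Cosine similarity for every distinct pair of rows in x."""
    x = x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
    sims = x @ x.T
    i, j = np.triu_indices(len(x), k=1)  # skip self-pairs and duplicates
    return sims[i, j]


def pca_whiten(x: np.ndarray) -> np.ndarray:
    """Center x, then rescale its principal axes to unit variance."""
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / (len(x) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return centered @ eigvecs / np.sqrt(np.clip(eigvals, 1e-12, None))


# Synthetic embeddings in which two dimensions dominate the variance.
rng = np.random.default_rng(0)
scales = np.concatenate(([10.0, 5.0], np.ones(30)))
raw = rng.normal(size=(200, 32)) * scales

raw_var = float(np.var(pairwise_cosine(raw)))
white_var = float(np.var(pairwise_cosine(pca_whiten(raw))))
print(f"cosine similarity variance, raw:      {raw_var:.4f}")
print(f"cosine similarity variance, whitened: {white_var:.4f}")
```

When a few dimensions dominate, most pairs look either very similar or very dissimilar along those axes, which spreads the similarity distribution out. Equalizing the variance across dimensions narrows it.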

Cosine Similarity Distribution

Results

Our experimental validation demonstrates:

This advancement represents a significant step towards more practical, scalable, and efficient image retrieval solutions, enhancing both speed and accuracy.

Retrieval Performance Results