Fast retrieval of multi-modal embeddings for e-commerce applications
Portinale, Luigi
2024-01-01
Abstract
In this paper, we introduce a retrieval framework designed for e-commerce applications that employs a multi-modal approach to represent items of interest. The approach incorporates both textual descriptions and images of products, alongside a locality-sensitive hashing (LSH) indexing scheme for the rapid retrieval of potentially relevant products. We focus on a data-independent methodology, in which the indexing mechanism is unaffected by the specific dataset, while the multi-modal representation is learned beforehand. Specifically, we use the multi-modal architecture CLIP to learn a latent representation of items by combining text and images in a contrastive manner. The resulting item embeddings capture both the visual and textual information of the products and are then indexed with several types of LSH to balance result quality against retrieval speed. We report experiments on two real-world datasets sourced from e-commerce platforms, comprising both product images and textual descriptions. The results are promising, showing favorable retrieval time and average precision, both on a manually selected set of queries and on synthetic queries generated by a Large Language Model.
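To make the pipeline concrete, below is a minimal sketch (not the authors' implementation) of the two ingredients the abstract describes: CLIP embeddings that fuse a product's image and description, and a data-independent random-hyperplane LSH index over those embeddings. The checkpoint name, the averaging fusion of the two modality embeddings, and the single-table index are illustrative assumptions made here for brevity; the paper itself compares several LSH variants.

```python
# Sketch only: CLIP product embeddings + data-independent LSH.
# Assumptions (not from the paper): the openai/clip-vit-base-patch32
# checkpoint, averaging as the image/text fusion, and a single hash table.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_item(image: Image.Image, description: str) -> np.ndarray:
    """Embed one product by fusing its image and text CLIP embeddings."""
    inputs = processor(text=[description], images=[image],
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    # Average the two modality embeddings and L2-normalize (an assumption;
    # other fusion strategies are possible).
    vec = (out.image_embeds[0] + out.text_embeds[0]) / 2
    return (vec / vec.norm()).numpy()

class RandomHyperplaneLSH:
    """Data-independent LSH for cosine similarity (random-hyperplane hashing).

    The hyperplanes are drawn once at random, independently of the data,
    which is what makes the index data-independent.
    """
    def __init__(self, dim: int, n_bits: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets: dict[int, list[int]] = {}
        self.vectors: list[np.ndarray] = []

    def _hash(self, v: np.ndarray) -> int:
        # One bit per hyperplane: which side of the plane v falls on.
        key = 0
        for b in (self.planes @ v) > 0:
            key = (key << 1) | int(b)
        return key

    def add(self, v: np.ndarray) -> None:
        idx = len(self.vectors)
        self.vectors.append(v)
        self.buckets.setdefault(self._hash(v), []).append(idx)

    def query(self, q: np.ndarray, k: int = 10) -> list[int]:
        # Retrieve the query's bucket, then rank candidates by exact
        # cosine similarity (vectors are unit-normalized, so dot product).
        cand = self.buckets.get(self._hash(q), [])
        return sorted(cand, key=lambda i: -float(self.vectors[i] @ q))[:k]
```

A single table with many bits favors speed over recall; using fewer bits, several independent tables, or multi-probe querying shifts the balance back toward result quality, which is the trade-off the abstract refers to when mentioning "several types of LSH".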