Summary

On the dev split of the MIRACL dataset, Pongo’s semantic filter outperforms both BGE’s large reranker and Cohere’s reranker on NDCG@10, improving from 63% and 78% accuracy respectively all the way up to 90%.

Methodology

This benchmark uses the dev split of the MIRACL dataset. It consists of 800 queries and 8,350 passages. Each query has one or more “positive” passages that are marked as relevant, as well as “negative” passages that are irrelevant noise.

We used Pinecone serverless for the vector database and OpenAI’s new text-embedding-3-large model with the full 3072 dimensions for embeddings. For BGE we used their most powerful reranker, bge-reranker-large, and for Cohere we used english-v2.0 via the production API.

For each query, we passed the top 100 results from raw vector search into each system and then scored them on MRR, recall, and NDCG based on each system’s top 10 results.
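For readers who want to reproduce the scoring step, the metrics above can be sketched as follows. This is a minimal illustration, not the benchmark’s actual code: the function names are our own, and it assumes binary relevance (a passage is either a marked positive or not), with each metric computed over a system’s top 10 results.

```python
import math

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant passage in the top k (0 if none)."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the query's relevant passages that appear in the top k."""
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance: DCG of the ranking over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, pid in enumerate(ranked_ids[:k], start=1)
        if pid in relevant_ids
    )
    # Ideal DCG: all relevant passages ranked at the top.
    ideal = sum(1.0 / math.log2(r + 1) for r in range(1, min(len(relevant_ids), k) + 1))
    return dcg / ideal if ideal else 0.0

# Example: one query with two relevant passages, ranked 1st and 4th.
ranked = ["p1", "p7", "p3", "p2", "p9"]
relevant = {"p1", "p2"}
print(mrr_at_k(ranked, relevant))     # 1.0 (first hit at rank 1)
print(recall_at_k(ranked, relevant))  # 1.0 (both positives in the top 10)
print(round(ndcg_at_k(ranked, relevant), 3))
```

Per-system scores are then averaged over all 800 queries.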

The source code for this benchmark is available in this GitHub repo.

Results

Sheets for each chart containing questions, retrieval results, and the full MRR/DCG assessments for each system are available here.

NDCG@10

Full Results

[Chart: rerank-graphic.png]