Summary

On a dataset of 192 episodes and 65 questions from full transcriptions of the Acquired podcast, Pongo’s semantic filter significantly outperforms both vector search alone and vector search with the Cohere Reranker at MRR@3, raising the score from 0.69 and 0.76 respectively to 0.93 with just one line of code.

Methodology

This benchmark uses a dataset derived from the Acquired podcast. You can see the transcripts in the benchmark’s GitHub and the 65 questions along with their answers in this Google Sheet.

The dataset contains around 150 hours of video transcripts, or around 10 million tokens, across 192 individual episodes.

For our vector search, we used OpenAI’s new text-embedding-3-large model with the full 3072 dimensions and a standard Pinecone pod. For the code, we followed LlamaIndex’s tutorials for splitting text, vectorizing, storing vectors, and retrieval with Pinecone and OpenAI.

We then fed the top 100 vector search results into the Cohere Rerank API and, separately, into the Pongo semantic filter API.
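The two-stage setup described above (retrieve a wide candidate set, then reorder it) can be sketched roughly as follows. This is only an illustration of the shape of the pipeline, not the benchmark's actual code: `retrieve_and_rerank`, `toy_retriever`, and `toy_reranker` are hypothetical names, and the toy functions stand in for the real Pinecone query and the Cohere or Pongo API calls.

```python
from typing import Callable, List

# Hypothetical type aliases: a retriever returns up to n passages for a query,
# a reranker returns the same passages in a new order.
Retriever = Callable[[str, int], List[str]]
Reranker = Callable[[str, List[str]], List[str]]


def retrieve_and_rerank(
    query: str,
    retriever: Retriever,
    reranker: Reranker,
    candidates: int = 100,
    top_k: int = 5,
) -> List[str]:
    """Fetch `candidates` passages, reorder them, and keep the best `top_k`."""
    passages = retriever(query, candidates)
    reranked = reranker(query, passages)
    return reranked[:top_k]


# Toy stand-ins (assumptions for illustration only).
CORPUS = [
    "The hosts discuss pricing strategy",
    "The company was founded in 1999",
    "An unrelated aside about sports",
]


def toy_retriever(query: str, n: int) -> List[str]:
    # A real retriever would embed the query and search a vector index.
    return CORPUS[:n]


def toy_reranker(query: str, passages: List[str]) -> List[str]:
    # Order passages by word overlap with the query; a real reranker
    # would call a hosted model instead.
    terms = set(query.lower().split())
    return sorted(
        passages,
        key=lambda p: len(terms & set(p.lower().split())),
        reverse=True,
    )


top = retrieve_and_rerank(
    "when was the company founded", toy_retriever, toy_reranker, top_k=2
)
```

Keeping the retriever and reranker as plain callables means the same candidate list can be passed to each reranker in turn, which is what makes the comparison between them fair.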

To remove an LLM’s impact from this benchmark, we evaluated answers manually. Scores were calculated using [Mean Reciprocal Rank](https://www.evidentlyai.com/ranking-metrics/mean-reciprocal-rank-mrr), measured for the top 3 and top 5 results. In short, this means a correct answer in 1st place is worth 100% (1/1), 2nd place is worth 50% (1/2), 3rd place is worth 33% (1/3), and so on.
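As a rough illustration of the scoring, here is a minimal sketch of MRR@k. It assumes each question has been reduced to the 1-based rank of its first correct result, or `None` when no correct result was retrieved, and that a rank beyond the cutoff scores zero. The function names and the sample ranks are made up for the example and are not taken from the benchmark sheets.

```python
from typing import Optional, Sequence


def reciprocal_rank(first_correct_rank: Optional[int], k: int) -> float:
    """Return 1/rank if the first correct result is within the top k, else 0."""
    if first_correct_rank is None or first_correct_rank > k:
        return 0.0
    if first_correct_rank < 1:
        raise ValueError("ranks are 1-based")
    return 1.0 / first_correct_rank


def mrr_at_k(first_correct_ranks: Sequence[Optional[int]], k: int) -> float:
    """Average the reciprocal ranks over all questions."""
    if not first_correct_ranks:
        raise ValueError("need at least one question")
    total = sum(reciprocal_rank(rank, k) for rank in first_correct_ranks)
    return total / len(first_correct_ranks)


# Hypothetical ranks for four questions: 1st place, 2nd place,
# no correct result at all, and 4th place.
sample_ranks = [1, 2, None, 4]

mrr_at_3 = mrr_at_k(sample_ranks, 3)  # (1 + 1/2 + 0 + 0) / 4 = 0.375
mrr_at_5 = mrr_at_k(sample_ranks, 5)  # (1 + 1/2 + 0 + 1/4) / 4 = 0.4375
```

The 4th-place answer counts toward MRR@5 but not MRR@3, which is why the two cutoffs can give different scores for the same retrieval results.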

Results

Sheets for each chart containing questions, retrieval results, and the MRR assessments are available here.

MRR@Rank 3

VS + Pongo - 0.93

VS + Cohere - 0.76

Vector Search - 0.69

MRR@Rank 5