Hamel Husain · 27.04.2026

P2: Modern IR Evals For RAG – Hamel’s Blog - Hamel Husain

Speaker Introduction
Overview of the Talk
History of Traditional IR
IR’s 60-Year History
TREC Conference Tracks
The Cranfield Paradigm
Examples of Test Collections
Introducing the BEIR Benchmark
What is Zero-Shot Evaluation?
The Problem BEIR Solved: Overfitting
The Problem with BEIR Today
Leaderboard Saturation
Summary of Challenges
The Evolution of Search
Information Retrieval: Before and After RAG
Traditional vs. Modern Day RAG Users
The Mismatch in Evaluation Objectives
Why RAG Metrics Need to Change
Introducing FreshStack
Motivation for FreshStack
FreshStack Queries: Stack Overflow
FreshStack Corpus: GitHub Repositories
The FreshStack Pipeline
Step I: Nuggetization Example
Step II & III: Oracle Retrieval & Nugget Support
FreshStack Evaluation Metrics
FreshStack Results & Takeaways
The FreshStack Leaderboard & Colab
What Did We Learn Today?
Thank You