Hamel Husain · 27.04.2026

P2: Modern IR Evals For RAG – Hamel’s Blog

Speaker Introduction

Overview of the Talk

History of Traditional IR

IR’s 60-Year History

TREC Conference Tracks

The Cranfield Paradigm

Examples of Test Collections

Introducing the BEIR Benchmark

What is Zero-Shot Evaluation?

The Problem BEIR Solved: Overfitting

The Problem with BEIR Today

Leaderboard Saturation

Summary of Challenges

The Evolution of Search

Information Retrieval: Before and After RAG

Traditional vs. Modern Day RAG Users

The Mismatch in Evaluation Objectives

Why RAG Metrics Need to Change

Introducing FreshStack

Motivation for FreshStack

FreshStack Queries: Stack Overflow

FreshStack Corpus: GitHub Repositories

The FreshStack Pipeline

Step I: Nuggetization Example

Step II & III: Oracle Retrieval & Nugget Support

FreshStack Evaluation Metrics

FreshStack Results & Takeaways

The FreshStack Leaderboard & Colab

What Did We Learn Today?

Thank You