Information Processing and Management 41 (2005) 1019–1033
Binary and graded relevance in IR evaluations—
Comparison of the effects on ranking of IR systems
Jaana Kekäläinen
Department of Information Studies, FIN-33014 University of Tampere, Finland
Received 15 October 2003; accepted 5 January 2005
Available online 5 March 2005
Abstract
In this study the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared.
The relevance of a sample of TREC results is reassessed using a four-level relevance scale: non-relevant, marginally
relevant, fairly relevant and highly relevant. Twenty-one topics and 90 systems from TREC 7 and 20 topics and 121 systems
from TREC 8 form the data. The measures compared are binary precision, cumulated gain, discounted cumulated gain and
normalised discounted cumulated gain. Different weighting schemes for relevance levels are tested with the
cumulated gain measures. Kendall's rank correlations are computed to determine to what extent the rankings produced
by different measures are similar. Weighting schemes, from binary to those emphasising highly relevant documents, form a
continuum in which the measures correlate strongly at the binary end and less strongly at the heavily weighted end. The results
show the different character of the measures.
© 2005 Elsevier Ltd. All rights reserved.
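The measures named in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited definitions of cumulated gain (CG), discounted cumulated gain (DCG) with logarithm base b, and normalised DCG, none of which are restated in this excerpt. The weighting schemes, the relevance levels and the function names below are hypothetical and chosen only for illustration; they are not the study's data or code. Kendall's rank correlation is computed here in its simplest form (tau-a, without tie correction), which may differ from the variant used in the study.

```python
import math


def cumulated_gain(gains):
    """Running sum of gain values down the ranked list."""
    out, total = [], 0.0
    for g in gains:
        total += g
        out.append(total)
    return out


def discounted_cumulated_gain(gains, base=2):
    """DCG: gains at ranks >= base are divided by log_base(rank).

    Assumed definition; ranks below the base are not discounted.
    """
    out, total = [], 0.0
    for rank, g in enumerate(gains, start=1):
        total += g if rank < base else g / math.log(rank, base)
        out.append(total)
    return out


def normalised_dcg(gains, ideal_gains, base=2):
    """DCG divided position by position by the DCG of an ideal ranking."""
    dcg = discounted_cumulated_gain(gains, base)
    ideal = discounted_cumulated_gain(ideal_gains, base)
    return [d / i if i > 0 else 0.0 for d, i in zip(dcg, ideal)]


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists for the same systems."""
    n = len(scores_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical weighting schemes mapping the four relevance levels
# (0 = non-relevant ... 3 = highly relevant) to gain values.
BINARY = {0: 0, 1: 1, 2: 1, 3: 1}
HEAVY = {0: 0, 1: 1, 2: 10, 3: 100}

if __name__ == "__main__":
    # Hypothetical relevance levels of the top four documents of one run.
    levels = [1, 3, 0, 2]
    for name, scheme in (("binary", BINARY), ("heavy", HEAVY)):
        gains = [scheme[level] for level in levels]
        ideal = sorted(gains, reverse=True)  # simplification: ideal from the same list
        print(name, normalised_dcg(gains, ideal))
```

In an evaluation of this kind, each system would receive one score per measure and weighting scheme (for example, nDCG at a fixed rank averaged over topics), and `kendall_tau` would then compare the system orderings induced by two such score lists.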
1. Introduction
Relevance has always been an equivocal concept in information retrieval (IR) evaluation, and thus it has
given rise to many studies and discussions. Relevance is defined multidimensional (Barry, 19