KDD CUP 2004_Data Mining_Research Dataset.pdf

Published 2017-10-18 (Hebei)

KDD CUP 2004

Keywords (English): KDD CUP 2004, performance criteria, bioinformatics, quantum physics
Keywords (Chinese): KDD 杯 2004、业绩标准、生物信息学、量子物理
Data format: TEXT

Data description:

This year's competition focuses on data mining for a variety of performance criteria such as Accuracy, Squared Error, Cross Entropy, and ROC Area. As described on this WWW-site, there are two main tasks based on two datasets from the areas of bioinformatics and quantum physics.

The file you downloaded is a TAR archive that is compressed with GZIP. Most decompression programs (e.g. winzip) can decompress these formats. If you run into problems, send us email. The archive should contain four files:

phy_train.dat: Training data for the quantum physics task (50,000 train cases)
phy_test.dat: Test data for the quantum physics task (100,000 test cases)
bio_train.dat: Training data for the protein homology task (145,751 lines)
bio_test.dat: Test data for the protein homology task (139,658 lines)

The file formats for the two tasks are as follows.

Format of the Quantum Physics Dataset

Each line in the training and the test file describes one example. The structure of each line is as follows:

The first element of each line is an EXAMPLE ID that uniquely identifies the example. You will need this EXAMPLE ID when you submit results.

The second element is the class of the example. Positive examples are denoted by 1, negative examples by 0. Test examples have a ? in this position. This is a balanced problem, so the target values are roughly half 0s and half 1s.

All following elements are feature values. There are 78 feature values in each line.

Missing values: columns 22, 23, 24 and 46, 47, 48 use a value of 999 to denote "not available", and columns 31 and 57 use 9999 to denote "not available". These are the column numbers in the data tables, starting with 1 for the first column (the case ID numbers). If you remove the first two columns (the case ID numbers and the ta
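The line format described above for the quantum physics files can be sketched in Python as follows. This is only an illustrative reading of the description, not code shipped with the dataset: the function name `parse_phy_line` is hypothetical, and the excerpt does not state the field separator, so whitespace-separated fields are assumed. Column numbers are 1-based with the example ID as column 1, as stated above, so column 3 is the first feature value.

```python
import math

# 1-based column numbers taken from the dataset description
# (column 1 = example ID, column 2 = class, column 3 = first feature).
MISSING_SENTINELS = {
    999.0: (22, 23, 24, 46, 47, 48),
    9999.0: (31, 57),
}
N_FEATURES = 78


def parse_phy_line(line):
    """Parse one line into (example_id, label, features).

    Hypothetical helper. Assumes whitespace-separated fields, which the
    description does not confirm. The label is None for test examples
    (marked with '?'); sentinel values in the listed columns become NaN.
    """
    fields = line.split()
    if len(fields) != 2 + N_FEATURES:
        raise ValueError(
            f"expected {2 + N_FEATURES} fields, got {len(fields)}"
        )

    example_id = fields[0]
    label = None if fields[1] == "?" else int(fields[1])
    features = [float(value) for value in fields[2:]]

    for sentinel, columns in MISSING_SENTINELS.items():
        for column in columns:
            index = column - 3  # 1-based file column -> 0-based feature index
            if features[index] == sentinel:
                features[index] = math.nan

    return example_id, label, features
```

Only the listed columns are checked for sentinels, so a value of 999 that happens to appear in any other column is left unchanged.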
