A. Keller, S. Purvine, A.I. Nesvizhskii, S. Stolyar, D.R. Goodlett, and E. Kolker, "Experimental Protein Mixture for Validating Tandem Mass Spectra Analysis", OMICS 6(2), 207-212 (2002).
Download the protein mixture dataset here: omics_6_2002_dataset.zip
Contents of the readme.txt:
This disk contains the dataset described in A. Keller, S. Purvine, A.I. Nesvizhskii, S. Stolyar, D.R. Goodlett, and E. Kolker, "Experimental Protein Mixture for Validating Tandem Mass Spectra Analysis", OMICS 6(2), 207-212 (2002).

The content:

1) 22 .tar files, one per LC/MS/MS run: 14 runs on control mixture A and 8 on control mixture B. For example, the file 'sergey_digest_A_full_01.tar' contains all MS/MS (.dta) files generated in the first LC/MS/MS run on mixture A.

2) Annotation file, 'list_of_positives.txt'. This file lists all MS/MS spectra that were correctly identified by SEQUEST. The format:

   spectrum charge_state protein peptide

   The ending '...2/3' in the spectrum name indicates that the same MS/MS spectrum was searched using SEQUEST twice, once assuming a 2+ precursor ion (spectrum name ending with .2) and once assuming a 3+ ion (spectrum name ending with .3). The number following the spectrum name indicates the correct charge state. For example,

   ./sergei_digest_A_full_01.0469.0471.2/3 2 sp|P02666|CASB_BOVIN K.VKEAMAPK.H

   means that the spectrum 'sergei_digest_A_full_01.0469.0471.2' was assigned the peptide K.VKEAMAPK.H, corresponding to the CASB_BOVIN protein (one of the 18 control proteins), and that this identification was correct.

3) Database file, 'control_mixture.db', with the sequences of the 18 control proteins and known contaminants.

4) Folder called SEQUEST with
   a) 22 SEQUEST output files (html files)
   b) sequest.xls, which contains all assignments from all 22 SEQUEST output files in Excel format. The most important columns in that file are: B (spectrum name), C (precursor ion mass), D (difference between measured and theoretical peptide mass), E (Xcorr), F (delta Cn), H (Sp rank), K (protein name), and M (peptide sequence).

5) omics_6_2002.pdf - the manuscript describing the data set.
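The annotation format described in item 2 can be read with a few lines of Python. The following is a minimal sketch, not part of the original dataset. It assumes that the four fields are separated by whitespace, that a leading './' can be dropped, and that a name without a '/' in its last component is already unambiguous. The names `Positive` and `parse_positive` are hypothetical.

```python
from typing import NamedTuple


class Positive(NamedTuple):
    spectrum: str  # spectrum name with the charge ambiguity resolved
    charge: int    # correct precursor charge state
    protein: str   # protein entry, e.g. 'sp|P02666|CASB_BOVIN'
    peptide: str   # peptide with flanking residues, e.g. 'K.VKEAMAPK.H'


def parse_positive(line: str) -> Positive:
    """Parse one line of 'list_of_positives.txt' (assumed whitespace-separated)."""
    name, charge, protein, peptide = line.split()
    # Assumption: the leading './' is only a path prefix and can be removed.
    if name.startswith("./"):
        name = name[2:]
    # A last component such as '2/3' means the spectrum was searched at both
    # charge states; replace it with the charge state marked as correct.
    base, _, suffix = name.rpartition(".")
    if "/" in suffix:
        name = f"{base}.{charge}"
    return Positive(name, int(charge), protein, peptide)
```

Applied to the example line from the readme, `parse_positive` yields the spectrum name 'sergei_digest_A_full_01.0469.0471.2' with charge 2, protein 'sp|P02666|CASB_BOVIN', and peptide 'K.VKEAMAPK.H'.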