Control mixture dataset

This page distributes the data associated with the publication:

A. Keller, S. Purvine, A.I. Nesvizhskii, S. Stolyar, D.R. Goodlett, and E. Kolker, "Experimental Protein Mixture for Validating Tandem Mass Spectra Analysis", OMICS 6(2), 207-212 (2002).

Download the protein mixture dataset here: omics_6_2002_dataset.zip

Contents of the readme.txt:

This disk contains the dataset described in 
A. Keller, S. Purvine, A.I. Nesvizhskii, S. Stolyar, D.R. Goodlett, and E. 
Kolker, "Experimental Protein Mixture for Validating Tandem Mass Spectra 
Analysis", OMICS 6(2), 207-212 (2002).

The content:

1) 22 .tar files, one per LC/MS/MS run: 14 runs on control mixture A and 8 on 
control mixture B. For example, the file
'sergey_digest_A_full_01.tar' contains all MS/MS (.dta) files generated in the 
first LC/MS/MS run on mixture A. 
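
As a quick check on one run archive, the .dta files it holds can be listed 
without unpacking it. This is a minimal sketch that assumes the archives are 
ordinary tar files readable by Python's standard tarfile module; the helper 
name list_dta_members is hypothetical, and the directory layout inside each 
archive is not described here, so member names are returned as stored.

```python
import tarfile


def list_dta_members(tar_path):
    """Return the sorted names of all .dta members in one run archive.

    Hypothetical helper: assumes tar_path is a regular tar file.
    """
    with tarfile.open(tar_path, "r") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            # Keep regular files only, and only those with a .dta extension.
            if member.isfile() and member.name.endswith(".dta")
        )
```

For instance, list_dta_members("sergey_digest_A_full_01.tar") would return the 
spectrum file names of the first run on mixture A, provided the archive is in 
the current directory.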

2) annotation file, 'list_of_positives.txt'.
This file lists all MS/MS spectra that were correctly identified by SEQUEST.
The format:

spectrum  charge_state  protein  peptide

The ending '...2/3' in the spectrum name indicates that the same MS/MS spectrum 
was searched with SEQUEST twice: once assuming a 2+ precursor ion 
(spectrum name ending with .2), and once assuming a 3+ ion (spectrum
name ending with .3). The number following the spectrum name indicates
the correct charge state. 

For example, 

./sergei_digest_A_full_01.0469.0471.2/3 2   sp|P02666|CASB_BOVIN   K.VKEAMAPK.H  

means that the spectrum 'sergei_digest_A_full_01.0469.0471.2' was assigned the 
peptide K.VKEAMAPK.H, which corresponds to the CASB_BOVIN protein (one of the 18 
control proteins), and that this identification was correct. 
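
The annotation format can be read with a few lines of code. The sketch below 
assumes that the four fields are separated by whitespace and contain no 
embedded whitespace, as in the example line; the function name 
parse_positive_line and the dictionary keys are hypothetical and chosen only 
for illustration.

```python
def parse_positive_line(line):
    """Parse one line of list_of_positives.txt into a dictionary.

    Assumes four whitespace-separated fields:
    spectrum, charge_state, protein, peptide.
    """
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    spectrum, charge, protein, peptide = fields

    # Drop the leading './' path prefix, if present.
    if spectrum.startswith("./"):
        spectrum = spectrum[2:]

    # Split off the trailing charge suffix, e.g. '2/3' or '2'.
    base, _, searched = spectrum.rpartition(".")

    return {
        # Name of the spectrum searched with the correct charge state.
        "spectrum": f"{base}.{charge}",
        # Charge states under which the spectrum was searched.
        "searched_charges": searched.split("/"),
        "charge": int(charge),
        "protein": protein,
        "peptide": peptide,
    }
```

Applied to the example line above, the "spectrum" entry resolves to 
'sergei_digest_A_full_01.0469.0471.2', the name under which the correct 
assignment was made.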
 
3) database file, 'control_mixture.db', with the sequences of the 18 control 
proteins and known contaminants.

4) Folder called SEQUEST with 
   a) 22 SEQUEST output files (html files) 

   b) sequest.xls, which contains all assignments from all 22 SEQUEST output files
   in Excel format.

   The most important columns in that file are: B (spectrum name), C (precursor ion mass), 
   D (difference between measured and theoretical peptide mass), E (Xcorr),
   F (delta Cn), H (Sp rank), K (protein name), and M (peptide sequence).
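
When the spreadsheet is read programmatically, the column letters usually have 
to be turned into numeric positions. The following sketch converts the letters 
listed above into zero-based indices. The names column_index, SEQUEST_COLUMNS 
and COLUMN_POSITIONS are hypothetical, and the sketch does not read sequest.xls 
itself, since that would require a spreadsheet library.

```python
def column_index(letters):
    """Convert an Excel column label ('A', 'B', ..., 'AA') to a zero-based index."""
    if not letters:
        raise ValueError("empty column label")
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Column labels are a base-26 number with digits A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# Column letters and meanings as listed in the readme.
SEQUEST_COLUMNS = {
    "B": "spectrum name",
    "C": "precursor ion mass",
    "D": "mass difference",  # measured minus theoretical peptide mass
    "E": "Xcorr",
    "F": "delta Cn",
    "H": "Sp rank",
    "K": "protein name",
    "M": "peptide sequence",
}

# Zero-based position of each listed column within a spreadsheet row.
COLUMN_POSITIONS = {
    name: column_index(letter) for letter, name in SEQUEST_COLUMNS.items()
}
```

With these positions, a row loaded as a list of cell values can be indexed 
directly, for example row[COLUMN_POSITIONS["Xcorr"]].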

5) omics_6_2002.pdf - the manuscript describing the data set.