Aggregating the data involves several steps. This article walks through the process step by step to make the data transformation easier to follow.
Let’s see how the data is transformed. We use the example file “KD_180110_CD160_HVEM.csv” from the HaDeX package and focus on a single peptide, “LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL”, in the state “CD160”, measured at the 1 min time point.
The original, non-aggregated data for the chosen peptide are shown below.
##      Protein Start   End                             Sequence Modification
##       <char> <int> <int>                               <char>       <lgcl>
##  1: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
##  2: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
##  3: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
##  4: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
##  5: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
##  6: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
##  7: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
##  8: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
##  9: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 10: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 11: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 12: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 13: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 14: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 15: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 16: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 17: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 18: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 19: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
## 20: db_CD160    34    69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL           NA
##      Protein Start   End                             Sequence Modification
##     Fragment MaxUptake   MHP  State Exposure                    File     z
##       <lgcl>     <num> <num> <char>    <num>                  <char> <int>
##  1:       NA        33  3901  CD160        1 KD_160530_CD160_1min_01     3
##  2:       NA        33  3901  CD160        1 KD_160530_CD160_1min_01     4
##  3:       NA        33  3901  CD160        1 KD_160530_CD160_1min_01     5
##  4:       NA        33  3901  CD160        1 KD_160530_CD160_1min_01     6
##  5:       NA        33  3901  CD160        1 KD_160530_CD160_1min_01     7
##  6:       NA        33  3901  CD160        1 KD_160530_CD160_1min_02     3
##  7:       NA        33  3901  CD160        1 KD_160530_CD160_1min_02     4
##  8:       NA        33  3901  CD160        1 KD_160530_CD160_1min_02     5
##  9:       NA        33  3901  CD160        1 KD_160530_CD160_1min_02     6
## 10:       NA        33  3901  CD160        1 KD_160530_CD160_1min_02     7
## 11:       NA        33  3901  CD160        1 KD_160530_CD160_1min_03     3
## 12:       NA        33  3901  CD160        1 KD_160530_CD160_1min_03     4
## 13:       NA        33  3901  CD160        1 KD_160530_CD160_1min_03     5
## 14:       NA        33  3901  CD160        1 KD_160530_CD160_1min_03     6
## 15:       NA        33  3901  CD160        1 KD_160530_CD160_1min_03     7
## 16:       NA        33  3901  CD160        1 KD_160530_CD160_1min_04     3
## 17:       NA        33  3901  CD160        1 KD_160530_CD160_1min_04     4
## 18:       NA        33  3901  CD160        1 KD_160530_CD160_1min_04     5
## 19:       NA        33  3901  CD160        1 KD_160530_CD160_1min_04     6
## 20:       NA        33  3901  CD160        1 KD_160530_CD160_1min_04     7
##     Fragment MaxUptake   MHP  State Exposure                    File     z
##        RT   Inten Center
##     <num>   <num>  <num>
##  1:  4.52  325032   1308
##  2:  4.52  753259    981
##  3:  4.52 1340447    785
##  4:  4.52 2076956    654
##  5:  4.53  759271    561
##  6:  4.52  239810   1308
##  7:  4.52  583325    981
##  8:  4.52 1011160    785
##  9:  4.52 1584254    654
## 10:  4.52  600218    561
## 11:  4.52  176788   1308
## 12:  4.52  402630    981
## 13:  4.52  746309    785
## 14:  4.52 1117344    654
## 15:  4.52  397718    561
## 16:  4.53  189258   1308
## 17:  4.53  441817    981
## 18:  4.53  796722    785
## 19:  4.53 1186263    654
## 20:  4.53  451071    561
##        RT   Inten Center
As we can see from the File column, there are four
replicates of the experiment. Each replicate provides
measurements for the different charge states (column z) of the peptide.
The result of a measurement is in the Center column: the
geometric centroid of the isotopic envelope, as measured by the
mass spectrometer.
Let’s take a look at the values for each replicate.
The centroid values for different charge states cannot be compared directly, because Center is a mass-to-charge value. We have to transform them into mass values according to the equation:
\[ aggMass = z \cdot (Center - protonMass) \]
The results are shown below.
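As a minimal sketch of this conversion, the R snippet below applies the equation to the first replicate (file KD_160530_CD160_1min_01), using the z and Center values from the table above. The variable names and the proton mass constant (about 1.00727647 Da) are assumptions made for this illustration, and because the printed Center values are rounded, the resulting masses are only approximate.

```r
# First replicate (KD_160530_CD160_1min_01): charge states and centroids
# copied from the table above. Center is printed rounded, so the masses
# computed here are approximate.
z      <- 3:7
center <- c(1308, 981, 785, 654, 561)

# Assumption: mass of a proton in daltons.
proton_mass <- 1.00727647

# aggMass = z * (Center - protonMass), evaluated per charge state.
agg_mass <- z * (center - proton_mass)

data.frame(z = z, Center = center, aggMass = agg_mass)
```

Each charge state yields a similar mass for the same peptide, which is what makes the values comparable after the conversion.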
These results are for just one replicate. We have four of them.
The values within each replicate are aggregated into a single value using a weighted mean, with intensity as the weight.
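The weighted mean can be sketched in base R as follows. The data frame is rebuilt by hand from the File, z, Inten and Center columns of the table above. The column name `mass`, the helper variable names and the proton mass constant are assumptions for illustration, not the package's internal code.

```r
# Assumption: mass of a proton in daltons.
proton_mass <- 1.00727647

# Four replicates x five charge states, copied from the table above.
dat <- data.frame(
  File   = rep(paste0("KD_160530_CD160_1min_0", 1:4), each = 5),
  z      = rep(3:7, times = 4),
  Inten  = c(325032, 753259, 1340447, 2076956, 759271,
             239810, 583325, 1011160, 1584254, 600218,
             176788, 402630,  746309, 1117344, 397718,
             189258, 441817,  796722, 1186263, 451071),
  Center = rep(c(1308, 981, 785, 654, 561), times = 4)
)

# Convert each centroid to a mass value.
dat$mass <- dat$z * (dat$Center - proton_mass)

# One intensity-weighted mean mass per replicate (one value per File).
rep_mass <- sapply(
  split(dat, dat$File),
  function(d) weighted.mean(d$mass, w = d$Inten)
)
rep_mass
```

Weighting by intensity lets the charge states with the strongest signal contribute most to the replicate's mass.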
The per-replicate results are then averaged to give the final result (mean), and its uncertainty (standard deviation of the mean) is calculated.
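Written out, and assuming the usual definition of the standard deviation of the mean, the final step for \(n\) replicates (here \(n = 4\)) with per-replicate masses \(m_i\) is as follows. The symbols \(m_i\), \(\overline{m}\) and \(u(\overline{m})\) are introduced here for illustration only.
\[ \overline{m} = \frac{1}{n} \sum_{i=1}^{n} m_i, \qquad u(\overline{m}) = \sqrt{\frac{\sum_{i=1}^{n} (m_i - \overline{m})^2}{n(n-1)}} \]
For a numeric vector `x` holding the per-replicate masses, this corresponds in R to `mean(x)` and `sd(x) / sqrt(length(x))`.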
Now we have the mass value for the chosen peptide in the chosen state,
measured at the chosen time point. The same calculation is done for every
other peptide, and these values of mass and uncertainty are used in the
calculation of deuterium uptake, as described in the
Data processing article.