1 Package description

Infrared, near-infrared and Raman spectroscopic data measured during chemical reactions, provide structural fingerprints by which molecules can be identified and quantified. The application of these spectroscopic techniques as inline process analytical tools (PAT), provides the (pharma-)chemical industry with novel tools, allowing to monitor their chemical processes, resulting in a better process understanding through insight in reaction rates, mechanistics, stability, etc. Data can be read into R via the generic spc-format, which is generally supported by spectrometer vendor software. Versatile pre-processing functions are available to perform baseline correction by linking to the ‘baseline’ package; noise reduction via the ‘signal’ package; as well as time alignment, normalization, differentiation, integration and interpolation. Implementation based on the S4 object system allows storing a pre-processing pipeline as part of a spectral data object, and easily transferring it to other datasets. Interactive plotting tools are provided based on the ‘plotly’ package. Non-negative matrix factorization (NMF) has been implemented to perform multivariate analyses on individual spectral datasets or on multiple datasets at once. NMF provides a parts-based representation of the spectral data in terms of spectral signatures of the chemical compounds and their relative proportions. The functionality to read in spc-files was adapted from the ‘hyperSpec’ package.

In this vignette we review the package functionality from loading data to NMF-analysis.

2 Basic functionality

2.1 SpectraInTime-class

Spectral data are represented as en S3-class which containt a data matrix together with meta data such as wavelengths, time points and preprocessing steps executed on the data. An artificial example is added in the package by which we will illustrate the basic functionality.

library( spectralAnalysis )
spectralEx                 <-  getSpectraInTimeExample( )
str( spectralEx)
#> Formal class 'SpectraInTime' [package "spectralAnalysis"] with 9 slots
#>   ..@ spectra       : num [1:101, 1:801] 1.92e-24 1.88e-24 1.83e-24 1.78e-24 1.74e-24 ...
#>   ..@ experimentName: chr "ABLOMMAERT-01-00376"
#>   ..@ spectralAxis  : num [1:801] 100 100 101 102 102 ...
#>   ..@ timePoints    : num [1:101] 0 360 720 1080 1440 1800 2160 2520 2880 3240 ...
#>   ..@ timePointsAlt : num [1:101] -10800 -10440 -10080 -9720 -9360 ...
#>   ..@ extraInfo     : list()
#>   ..@ startTime     : POSIXct[1:1], format: "2013-06-03 20:00:03"
#>   ..@ units         :List of 4
#>   .. ..$ wavelength: chr "expression(tilde(nu)/cm^-1)"
#>   .. ..$ spc       : chr "expression('A')"
#>   .. ..$ z         : chr "expression(t/s)"
#>   .. ..$ z.end     : chr "expression(t/s)"
#>   ..@ preprocessing : list()

You can extract slots via specific methods:

dim( spectralEx)
#>         time spectralAxis 
#>          101          801
getExperimentName(spectralEx)
#> [1] "ABLOMMAERT-01-00376"
getExtraInfo( spectralEx )
#> list()
getStartTime( spectralEx )
#> [1] "2013-06-03 20:00:03 CEST"
getTimePoints( spectralEx )
#>   [1]     0   360   720  1080  1440  1800  2160  2520  2880  3240  3600  3960
#>  [13]  4320  4680  5040  5400  5760  6120  6480  6840  7200  7560  7920  8280
#>  [25]  8640  9000  9360  9720 10080 10440 10800 11160 11520 11880 12240 12600
#>  [37] 12960 13320 13680 14040 14400 14760 15120 15480 15840 16200 16560 16920
#>  [49] 17280 17640 18000 18360 18720 19080 19440 19800 20160 20520 20880 21240
#>  [61] 21600 21960 22320 22680 23040 23400 23760 24120 24480 24840 25200 25560
#>  [73] 25920 26280 26640 27000 27360 27720 28080 28440 28800 29160 29520 29880
#>  [85] 30240 30600 30960 31320 31680 32040 32400 32760 33120 33480 33840 34200
#>  [97] 34560 34920 35280 35640 36000
getTimePoints( spectralEx , timePointsAlt = TRUE , timeUnit = "seconds" ) 
#>   [1] -10800 -10440 -10080  -9720  -9360  -9000  -8640  -8280  -7920  -7560
#>  [11]  -7200  -6840  -6480  -6120  -5760  -5400  -5040  -4680  -4320  -3960
#>  [21]  -3600  -3240  -2880  -2520  -2160  -1800  -1440  -1080   -720   -360
#>  [31]      0    360    720   1080   1440   1800   2160   2520   2880   3240
#>  [41]   3600   3960   4320   4680   5040   5400   5760   6120   6480   6840
#>  [51]   7200   7560   7920   8280   8640   9000   9360   9720  10080  10440
#>  [61]  10800  11160  11520  11880  12240  12600  12960  13320  13680  14040
#>  [71]  14400  14760  15120  15480  15840  16200  16560  16920  17280  17640
#>  [81]  18000  18360  18720  19080  19440  19800  20160  20520  20880  21240
#>  [91]  21600  21960  22320  22680  23040  23400  23760  24120  24480  24840
#> [101]  25200
getTimePoints( spectralEx , timeUnit = "minutes" )
#>   [1]   0   6  12  18  24  30  36  42  48  54  60  66  72  78  84  90  96 102
#>  [19] 108 114 120 126 132 138 144 150 156 162 168 174 180 186 192 198 204 210
#>  [37] 216 222 228 234 240 246 252 258 264 270 276 282 288 294 300 306 312 318
#>  [55] 324 330 336 342 348 354 360 366 372 378 384 390 396 402 408 414 420 426
#>  [73] 432 438 444 450 456 462 468 474 480 486 492 498 504 510 516 522 528 534
#>  [91] 540 546 552 558 564 570 576 582 588 594 600
getTimePoints( spectralEx , timeUnit = "hours" )
#>   [1]  0.0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0  1.1  1.2  1.3  1.4
#>  [16]  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9
#>  [31]  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9  4.0  4.1  4.2  4.3  4.4
#>  [46]  4.5  4.6  4.7  4.8  4.9  5.0  5.1  5.2  5.3  5.4  5.5  5.6  5.7  5.8  5.9
#>  [61]  6.0  6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9  7.0  7.1  7.2  7.3  7.4
#>  [76]  7.5  7.6  7.7  7.8  7.9  8.0  8.1  8.2  8.3  8.4  8.5  8.6  8.7  8.8  8.9
#>  [91]  9.0  9.1  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9 10.0
getUnits( spectralEx )
#> $wavelength
#> [1] "expression(tilde(nu)/cm^-1)"
#> 
#> $spc
#> [1] "expression('A')"
#> 
#> $z
#> [1] "expression(t/s)"
#> 
#> $z.end
#> [1] "expression(t/s)"
getTimePoints( spectralEx , timePointsAlt = TRUE ) 
#>   [1] -10800 -10440 -10080  -9720  -9360  -9000  -8640  -8280  -7920  -7560
#>  [11]  -7200  -6840  -6480  -6120  -5760  -5400  -5040  -4680  -4320  -3960
#>  [21]  -3600  -3240  -2880  -2520  -2160  -1800  -1440  -1080   -720   -360
#>  [31]      0    360    720   1080   1440   1800   2160   2520   2880   3240
#>  [41]   3600   3960   4320   4680   5040   5400   5760   6120   6480   6840
#>  [51]   7200   7560   7920   8280   8640   9000   9360   9720  10080  10440
#>  [61]  10800  11160  11520  11880  12240  12600  12960  13320  13680  14040
#>  [71]  14400  14760  15120  15480  15840  16200  16560  16920  17280  17640
#>  [81]  18000  18360  18720  19080  19440  19800  20160  20520  20880  21240
#>  [91]  21600  21960  22320  22680  23040  23400  23760  24120  24480  24840
#> [101]  25200
spectra       <-  getSpectra( spectralEx )
dim( spectra )
#> [1] 101 801

One can modify slot using specific methods:

setTimePointsAlt( spectralEx  )  <-  getTimePoints( spectralEx )  -  200 

Using these methods, will avoid you to make some errors since validity testing is included.

setTimePointsAlt( spectralEx  )  <-  getTimePoints( spectralEx ) * 5 
#> Error in !checkEqualTimeIntervals: invalid argument type

to get more information on the SpectraIntime class and its methods:

?SpectraInTime

2.2 Reading in data

The function loadAllSPCFiles() allows to read in all spectra data into a list:

allSPCFiles     <- loadAllSPCFiles(directoryFiles)

2.3 Subsetting

Different subsetting methods have been defined:

  • logical, integer (by order)
  • range r() function
  • closest element matching e() function
  • for time matching one can change the time unit between ‘seconds’ , ‘minutes’ and ‘hours’.
print( r( 2.5 , 10.8) )
#> An object of class "RangeToSubset"
#> Slot "range":
#>  min  max 
#>  2.5 10.8
print( r( c(1 , 3 , 2 , 6 , 9 , 3 ) ) )
#> An object of class "RangeToSubset"
#> Slot "range":
#> min max 
#>   1   9

print( e( 1, 3 ,5 ,6 ,7 ,8 ) )
#> An object of class "ElementsToSelect"
#> Slot "elements":
#> [1] 1 3 5 6 7 8

Note that r() and e() work similar to the usual c() function:

# range subsetting 
spectralEx                      <-   getSpectraInTimeExample()
spectraSubset                   <-  spectralEx[ r( 1000 , 30000 ) , r(130 , 135 ) ]
getTimePoints( spectraSubset )
#>  [1]  1080  1440  1800  2160  2520  2880  3240  3600  3960  4320  4680  5040
#> [13]  5400  5760  6120  6480  6840  7200  7560  7920  8280  8640  9000  9360
#> [25]  9720 10080 10440 10800 11160 11520 11880 12240 12600 12960 13320 13680
#> [37] 14040 14400 14760 15120 15480 15840 16200 16560 16920 17280 17640 18000
#> [49] 18360 18720 19080 19440 19800 20160 20520 20880 21240 21600 21960 22320
#> [61] 22680 23040 23400 23760 24120 24480 24840 25200 25560 25920 26280 26640
#> [73] 27000 27360 27720 28080 28440 28800 29160 29520 29880
getSpectralAxis(  spectraSubset )
#>  [1] 130.0 130.5 131.0 131.5 132.0 132.5 133.0 133.5 134.0 134.5 135.0
spectraTimeSubset               <-  spectralEx[ r( 1000 , 30000 ) ,  ]
spectraWavelengthSubset         <-  spectralEx[  ,  r(130 , 135 ) ]

# other types of subsetting 
# logical
spectraSubsetLogical   <-  spectralEx[ getTimePoints( spectralEx ) > 300  ,
    getSpectralAxis( spectralEx ) <= 500 ]
subsetLogTimeAlt       <-  spectralEx[ 
    getTimePoints( spectralEx , timePointsAlt = TRUE ) > 0 ,
    getSpectralAxis( spectralEx ) <= 500 ]

# integer

subsetInteger                   <-  spectralEx[ c( 1, 5, 10)  , c( 4 , 4 , 4 , 8 , 16) ] 

# closest element matching 

spectraSubsetElem               <-  spectralEx[ e( 1.234 , 3.579 ) ,
    e( 200.001 , 466.96  ) , timeUnit = "hours" ]
getTimePoints( spectraSubsetElem , timeUnit = "hours" )
#> [1] 1.2 3.6
getSpectralAxis( spectraSubsetElem  )
#> [1] 200 467

2.4 Summary

summarySpec       <-  summary( spectralEx )
str( summarySpec )
#> Formal class 'SummaryByWavelengths' [package "spectralAnalysis"] with 6 slots
#>   ..@ experimentName: chr "ABLOMMAERT-01-00376"
#>   ..@ timeRange     : num [1:2] 0 36000
#>   ..@ timeRangeAlt  : num [1:2] -10800 25200
#>   ..@ spectralRange : num [1:2] 100 500
#>   ..@ spectra       :'data.frame':   801 obs. of  6 variables:
#>   .. ..$ wavelengths  : num [1:801] 100 100 101 102 102 ...
#>   .. ..$ firstSpectrum: num [1:801] 1.92e-24 2.47e-24 3.17e-24 4.06e-24 5.20e-24 ...
#>   .. ..$ lastSpectrum : num [1:801] 1.58e-25 2.03e-25 2.60e-25 3.33e-25 4.27e-25 ...
#>   .. ..$ mean         : num [1:801] 7.10e-25 9.11e-25 1.17e-24 1.50e-24 1.92e-24 ...
#>   .. ..$ median       : num [1:801] 5.51e-25 7.07e-25 9.08e-25 1.16e-24 1.49e-24 ...
#>   .. ..$ sd           : num [1:801] 4.95e-25 6.36e-25 8.15e-25 1.05e-24 1.34e-24 ...
#>   ..@ units         :List of 4
#>   .. ..$ wavelength: chr "expression(tilde(nu)/cm^-1)"
#>   .. ..$ spc       : chr "expression('A')"
#>   .. ..$ z         : chr "expression(t/s)"
#>   .. ..$ z.end     : chr "expression(t/s)"

2.5 Graphics

2.5.1 One spectraInTime

4 different visualization options are implemented:

data = getSpectraInTimeExample()
library( plotly )
plot( x =  data[ e( 1 , 2 , 3) , , timeUnit = "hours" ] , type = "time" , timeUnit = "hours" , timePointsAlt = FALSE ) 

Figure 2.1: time view plot

plot( x =  data[ , e( seq( 200 , 400 , 50 ) ) ] , type = "spectralAxis" , timeUnit = "minutes" , timePointsAlt = TRUE ) 

Figure 2.2: Wavelength view plot

plot( x = data , type = "contour" , nColors = 200 , colors = "C" , timeUnit = "seconds", timePointsAlt = TRUE ) 
Contour plot

Figure 2.3: Contour plot

Note: 3D-plot not run to save space:

plot( x =  data , type = "3D" , timeUnit = "hours" , timePointsAlt = FALSE ) 

note that subsetting with element matching is useful for the time and wavelength views.

2.5.2 List of spectra in time

We have also 2 options to plot a list of SpectraInTime-objects directly.

Line plot:

listOfSpectra     <-  getListOfSpectraExample()
plot( listOfSpectra , times = 1:3 , timeUnit = "hours" , colors = "A" ) 
Line plot for list of SpectraInTime

Figure 2.4: Line plot for list of SpectraInTime

and contour plot:

plot( listOfSpectra , timeUnit = "hours" , colors = "C" , type = "contour" )
Contour plot for list of SpectraInTime

Figure 2.5: Contour plot for list of SpectraInTime

2.6 Time alignment

Time alignment of SpectraInTime refers to translating and possibly subsetting the secondary time axis. Such that different experiments have a comparable time origin.

Information to align SpectraInTime is contained in processTimes object for one experiment:

processTimes        <-  getProcessTimesExample() 
processTimes
#> An object of class "ProcessTimes"
#> Slot "experimentName":
#> [1] "ABLOMMAERT-01-00376"
#> 
#> Slot "timeHeatingAboveMin":
#> [1] "2013-06-03 20:30:03 CEST"
#> 
#> Slot "timeStartReaction":
#> [1] "2013-06-03 22:30:03 CEST"
#> 
#> Slot "timeEndProcess":
#> [1] "2013-06-04 03:30:03 CEST"
#> 
#> Slot "Tset":
#> [1] 100
#> 
#> Slot "comments":
#> [1] ""

and processTimesFrame for multiple experiments:

processTimesFrame   <-  getProcessTimesFrameExample()
processTimesFrame
#> An object of class "ProcessTimesFrame"
#> Slot "processTimes":
#>        experimentName timeHeatingAboveMin   timeStartReaction
#> 1 ABLOMMAERT-01-00376 2013-06-03 20:30:03 2013-06-03 22:30:03
#> 2 ABLOMMAERT-02-00347 2013-06-03 20:30:03 2013-06-03 22:50:03
#>        timeEndProcess Tset comments
#> 1 2013-06-04 03:30:03  100         
#> 2 2013-06-04 03:30:03  100

There are 3 methods to align a SpectraInTime:

spectra             <-  getSpectraInTimeExample()
listOfSpectra       <-  getListOfSpectraExample()
pathProcessTimes    <-  getPathProcessTimesExample()
   
ex1    <-  timeAlign( x = spectra , y = processTimes ,
    cutCooling = TRUE , cutBeforeMinTemp = TRUE )
ex2    <-  timeAlign( x = listOfSpectra , y = processTimesFrame ,
    cutCooling = TRUE , cutBeforeMinTemp = TRUE )
ex3    <-  timeAlign( x = listOfSpectra , y = pathProcessTimes ,
    cutCooling = TRUE , cutBeforeMinTemp = TRUE  , timeFormat =  "%Y-%m-%d %H:%M:%OS" )

3 Real IR example

We used 2 example spectra from a chemical synthesis monitored by an IR-probe to illustrate preprocessing and NMF-analyis.

exampleData1       <-  readSpectra( system.file( "exampleData/exampleExperiment1.txt" ,
        package = "spectralAnalysis") ) 
plot( exampleData1, type = "contour" )
Contour plot 2 real example experiments

Figure 3.1: Contour plot 2 real example experiments

3.1 Pre-processing

Pre-processing is applied to correct for any physical influences such as light scattering. It allows for better interpretation and more precise modeling. Usually several pre-processing steps are applied after each other and the ideal pre-processing procedure depends on the technology and the measured system.

3.1.1 Selection of time and wavelength range

Scientist can often indicate a relevant wavelength range. In this manner we can reduce the number of covariates.

dim( exampleData1 )
#>         time spectralAxis 
#>          199          211
wavelengthRange         <-  r ( 800 ,  1625 )
spectralDataSelect      <-  exampleData1[  r( 0 , 5 ) , wavelengthRange , timeUnit = "hours" ]
Selected spectra of an example experiment

Figure 3.2: Selected spectra of an example experiment

3.1.2 Baseline correction

Baseline correction implies fitting a global function for each measured spectra and subtracting it from the data.

timesToShow           <-  e( 0.5 , 5 )
plot( spectralDataSelect[ timesToShow , , timeUnit = "hours"] , type = "time" ) 

Figure 3.3: Raw spectra example experiment

spectralDataBaseline        <-  baselineCorrect( spectralDataSelect ,
    method = 'modpolyfit', degree = 4 )
plot( spectralDataBaseline[ timesToShow , , timeUnit = "hours"] , type = "time" ) 

Figure 3.4: Baseline corrected spectra example experiment

3.1.3 smoothing

Smoothing or filtering in the wavelength domain can reduce measurement noise, the default smoothing is the Savitky-golay filter, which uses a local polynomial approximation, and which can be also be used to calculate derivatives in the wavelength domain. This can be useful to better detect peaks, especially in NIR-spectra where peaks are not clearly distinguishable otherwise.

spectralDataSmooth       <-  smooth( spectralDataBaseline , window = 5 )
plot( spectralDataSmooth[ timesToShow , , timeUnit = "hours"] , type = "time" ) 

Figure 3.5: Smoothed spectra example experiment

smoothing has little influence here, because the measurements contain little noise.

spectralDataDerivative   <-  smooth(  spectralDataBaseline , derivative = 1  )
plot( spectralDataDerivative[ timesToShow , , timeUnit = "hours"] , type = "time" ) 

Figure 3.6: Derivative spectra example experiment

3.1.4 Normalization

Normalization standardizes the intensities by:

  • the area under the curve
  • a reference peak, usually the peak of the solvent
  • standardization by some mean and scale function
spectralDataNormalized   <-  normalize( spectralDataSmooth , method = "integration"   )
plot( spectralDataNormalized[ timesToShow , , timeUnit = "hours"] , type = "time" ) 

Figure 3.7: Integral normalized spectra example experiment

3.1.5 Scatter correction

Multiplicative scatter correction (MSC) removes scatter effects by regressing a spectra in the wavelength domain against a reference spectra and substracting the found intercept and dividing by the slope for each spectra

?scatterCorrrect

3.1.6 Transfer preprocessing steps

preprocessing steps can be replicated on new data:

allSpectraDataProcessed    <-  lapply( listOfSpectra , preprocess , with = spectralDataSmooth  )

3.2 NMF-analysis

We will perform NMF-ananalysis on the smoothed and baseline corrected spectra for the time range: 0 - 5 hours assuning 3 chemical components.

spectralExSelect  <-  spectralDataSmooth[ r( 0 , 5 ) , , timeUnit = "hours" ] 
nmfResult    <-  spectralNMF( spectralExSelect , rank = 3 , subsamplingFactor = 5 )
#> Warning in nonNegativePreprocessing(spectra, method): Input data contain
#> negative values. These were reset to zero for NMF analysis
nmfObject    <-  getDimensionReduction( nmfResult , type = "NMF")$NMF

nmfTrends    <-  t( NMF::coef( nmfObject ) )
matplot( nmfTrends , type = "l" , x = getTimePoints( spectralExSelect , timeUnit = "hours"  ) , xlab = "time in hours"  )
NMF-trends on example experiments

Figure 3.8: NMF-trends on example experiments

NMF-analysis is a useful exploratory tool to get insight in the chemical process.

Including pure-component spectra can increase the quality of NMF-analysis.

see

?spectralNMF