Title: | massSpectrometryR |
---|---|
Description: | Provides calculations, plotting etc for chemistry & mass spectrometry. |
Authors: | Ben Bruyneel <[email protected]> |
Maintainer: | Ben Bruyneel <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.4.1 |
Built: | 2025-01-27 05:56:51 UTC |
Source: | https://github.com/BenBruyneel/massSpectrometryR |
custom operator for subtracting formulas from one another, to make calculating with formulas a little more clear
formula1 %f-% formula2
formula1 |
named numeric vector, example c(O = 2, C = 1); formula to be subtracted from |
formula2 |
named numeric vector, example c(H = 2, S = 1); formula to subtract |
c(H = 2, O = 1) %f-% c(H = 1)
c(H = 2, O = 1) %f-% c(S = 1, O = 2)
custom operator for adding up formulas, to make calculating with formulas a little more clear
formula1 %f+% formula2
formula1 |
named numeric vector, example c(O = 2, C = 1) |
formula2 |
named numeric vector, example c(H = 2, S = 1) |
waterFormula() %f+% protonFormula()
waterFormula() %f+% c(C=1, O = 2)
c(H = 2, O = 1) %f+% c(S = 1, O = 2)
Adding up two formulas, taking into account possible differing elements
addFormulas(formula1, formula2)
formula1 |
named numeric vector, example c(O = 2, C = 1) |
formula2 |
named numeric vector, example c(H = 2, S = 1) |
a named numeric vector (formula)
addFormulas(waterFormula(), protonFormula())
addFormulas(waterFormula(), c(C=1, O=2))
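The handling of differing elements can be illustrated in base R. The following is a minimal sketch of the idea, not the package's actual implementation: `add_formulas_sketch` is a hypothetical helper name, and the order of the elements in the result is an assumption.

```r
# Hypothetical helper: add two formulas (named numeric vectors),
# treating an element that is missing from one formula as 0.
add_formulas_sketch <- function(formula1, formula2) {
  elements <- union(names(formula1), names(formula2))
  result <- setNames(numeric(length(elements)), elements)
  result[names(formula1)] <- result[names(formula1)] + formula1
  result[names(formula2)] <- result[names(formula2)] + formula2
  result
}

water <- c(H = 2, O = 1)
carbon_dioxide <- c(C = 1, O = 2)

# Oxygen occurs in both formulas and is summed; H and C are carried over
add_formulas_sketch(water, carbon_dioxide)
```

The sketch assumes that element names are unique within each formula.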
Takes a list of formulas and adds them all up
addListFormulas(formulas)
formulas |
list of formulas |
a named numeric vector (formula)
addListFormulas(list(c(H = 2, O = 1), c(H = 1), c(H = 2, O = 1), c(S = 1, O = 2)))
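Summing a list of formulas amounts to folding a two-formula addition over the list. A minimal base R sketch of that idea follows; `add_two` is a hypothetical helper written for this example, not the package's code.

```r
# Hypothetical helper: element-wise addition of two named numeric vectors,
# treating elements missing from one formula as 0.
add_two <- function(f1, f2) {
  elements <- union(names(f1), names(f2))
  out <- setNames(numeric(length(elements)), elements)
  out[names(f1)] <- out[names(f1)] + f1
  out[names(f2)] <- out[names(f2)] + f2
  out
}

formulas <- list(c(H = 2, O = 1), c(H = 1), c(H = 2, O = 1), c(S = 1, O = 2))

# Reduce() applies add_two cumulatively from left to right
total <- Reduce(add_two, formulas)
total
```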
R6 Class representing a set of amino acids. It adds three functions to quickly switch between different writing 'styles' of peptides
Note: this class is meant to be used only for amino acids and such
massSpectrometryR::chemicals
-> aminoacids
getName()
Function to retrieve the full name of an amino acid via its 1-letter or 3-letter code
aminoAcidClass$getName(searchString, checkCase = TRUE)
searchString
either a 1- or 3- letter character vector
checkCase
default = TRUE. If FALSE, the function will ignore the case of the searchString argument
character vector, name of the aminoacid
getShort()
Function to retrieve either the 1- or 3- letter code of an amino acid
aminoAcidClass$getShort(searchString, checkCase = TRUE)
searchString
either a 1- or 3-letter character vector: if 1-letter, then the corresponding 3-letter character vector will be returned and vice versa
checkCase
default = TRUE. If FALSE, the function will ignore the case of the searchString argument
character vector, 1- or 3- letter code of the aminoacid
translatePeptide()
Translates an amino acid sequence from 1-letter codes to 3-letter codes and vice versa
aminoAcidClass$translatePeptide( sequence, from1to3 = FALSE, splitCharacter = NA, joinCharacter = NA, checkCase = TRUE )
sequence
character vector: amino acid sequence in 1-letter or 3-letter codes
from1to3
logical vector: if TRUE, then translation will be from 1-letter code to 3-letter code. If FALSE, then vice versa. Default = FALSE
splitCharacter
character vector specifying the character(s) between the 1- or 3-letter codes in the sequence. Default NA (same as "")
joinCharacter
character vector specifying the character(s) between the translated codes. Default NA (same as "")
checkCase
default = TRUE. If FALSE, the function will ignore the case of the sequence
character vector, sequence in either 1- or 3-letter codes
clone()
The objects of this class are cloneable with this method.
aminoAcidClass$clone(deep = FALSE)
deep
Whether to make a deep clone.
aminoAcidResidues()$getShort("L")
aminoAcidResidues()$getShort("Leu")
aminoAcidResidues()$getName("L")
aminoAcidResidues()$getName("Leu")
aminoAcidResidues()$translatePeptide("Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu", from1to3 = TRUE, splitCharacter ="-")
aminoAcidResidues()$translatePeptide("DRVYIHPFHL", joinCharacter = "-")
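The translation between the two writing styles is essentially a table lookup. The base R sketch below illustrates that with a small hand-written table; `to_three_letter` and `to_one_letter` are hypothetical helpers, and the sketch does not reproduce the class's `from1to3` or `checkCase` handling.

```r
# Small, hand-written lookup table (only the residues used below)
one_to_three <- c(D = "Asp", R = "Arg", V = "Val", Y = "Tyr", I = "Ile",
                  H = "His", P = "Pro", F = "Phe", L = "Leu")

# Hypothetical helper: 1-letter sequence -> joined 3-letter codes
to_three_letter <- function(sequence, joinCharacter = "-") {
  letters1 <- strsplit(sequence, "")[[1]]
  paste(unname(one_to_three[letters1]), collapse = joinCharacter)
}

# Hypothetical helper: separated 3-letter codes -> 1-letter sequence
to_one_letter <- function(sequence, splitCharacter = "-") {
  codes3 <- strsplit(sequence, splitCharacter, fixed = TRUE)[[1]]
  three_to_one <- setNames(names(one_to_three), one_to_three)
  paste(unname(three_to_one[codes3]), collapse = "")
}

to_three_letter("DRVYIHPFHL")
to_one_letter("Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu")
```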
Returns a pre-defined object which contains info on some common amino acid modifications
aminoAcidModifications()
An object of class modifications containing info on amino acid modifications
the resulting modification table cannot be used immediately: it contains two fixed modifications for Cysteine. Remove one of them to prevent errors when using peptide calculations
print(aminoAcidModifications())
Generates a pre-defined object which contains info on 'normal' amino acid residues
aminoAcidResidues()
an R6 object of class 'chemicals'
The formulas in the object are amino acid residues as they are present in proteins. To get the actual formula of the amino acid in its 'free' form, add c(H=2, O=1) (water)
this object is used in all protein calculations in this package
print(aminoAcidResidues())
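The relation between a residue formula and the 'free' amino acid can be worked through in base R. This is a small sketch using glycine, with the residue formula entered by hand rather than retrieved from the package object.

```r
# Glycine residue as it occurs within a protein chain: C2H3NO
glycine_residue <- c(C = 2, H = 3, N = 1, O = 1)
water <- c(H = 2, O = 1)

# Add water element by element to obtain free glycine (C2H5NO2).
# This works directly because H and O are both present in the residue.
free_glycine <- glycine_residue
free_glycine[names(water)] <- free_glycine[names(water)] + water
free_glycine
```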
Every chemical inside the object has a name, letter, short and a formula. The first 3 can be any length of string (though the letter and short field should be maximum length (nchar) 1 and 2-4 respectively). Formula should be in the form of a named numeric with the names representing elements and the values themselves being the number of atoms of that element, eg c(C = 3, H = 5, N = 1, O = 1, S = 0)
Note: this class is meant to be used for classes of compounds, eg amino acids
Also: this class is meant as a base class to be expanded via inheritance
Warning: all chemicals inside this object should be unique (names, letters & shorts)
number
retrieve the number of compounds present in the object, read only
letters
to access the letters of the compounds in the object
names
to access the names of the compounds in the object
shorts
to access the shorts of the compounds in the object
formulas
to access the formulas of the compounds in the object
table
retrieves all info on the compounds in data.frame format, read only
new()
Create a new chemicals object
chemicals$new(letters, shorts, names, formulas)
letters
character vector specifying the letters (or numbers or whatever) for the chemicals. In case of amino acids it should be eg "A" for Alanine, "G" for Glycine, etc etc
shorts
character vector specifying the short names for the chemicals, eg Ala for Alanine
names
character vector specifying the names of the chemicals
formulas
list of named numeric vectors specifying the formulas of the chemicals, eg c(C = 6, H = 12, N = 4, O = 1, S = 0) for Arginine
a new 'chemicals' object
print()
For printing purposes: prints a table of the chemicals with columns letter, name & short
chemicals$print(...)
...
no arguments, the function takes care of printing
getFormula()
Retrieves the formula of one of the compounds in the object
chemicals$getFormula(which1)
which1
specifies which chemical should be retrieved. Either the number (row number in the chemicals table), or the name, letter or short as a character vector. Case does not matter, since everything is converted to upper case before comparing with what's in the chemicals table
a formula in the shape of a named numeric vector, eg c(C = 6, H = 12, N = 4, O = 1, S = 0)
clone()
The objects of this class are cloneable with this method.
chemicals$clone(deep = FALSE)
deep
Whether to make a deep clone.
estrogens <- chemicals$new(letters = c("1","2","3","4"),
                           shorts = c("E1","E2","E3","E4"),
                           names = c("Estrone","Estradiol", "Estriol","Estetrol"),
                           formulas = list(c(C=18, H=22, O=2),
                                           c(C=18, H=24, O=2),
                                           c(C=18, H=24, O=3),
                                           c(C=18, H=24, O=4)))
Digests an amino acid sequence with the specified enzyme and returns the resulting peptides
digest(sequence, enzyme = "trypsin", missed = 0)
sequence |
character vector representing the amino acid sequence to be digested. Note: the letters in sequence will be changed to upper case. |
enzyme |
character string specifying the enzyme to be used for the digestion. Default is 'trypsin'. Other options are 'trypsin.strict', 'pepsin', 'chymotrypsin','chymotrypsin.strict' and 'Glu-C' |
missed |
integer vector: the maximum number of allowed missed cleavages |
data.frame with the columns 'peptide', 'start', 'stop' and 'mc' (missed cleavages)
This function is a modified version of the Digest function found in the package 'OrgMassSpecR'
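The cleavage step can be illustrated in base R. The sketch below assumes the commonly described trypsin rule (cleave C-terminal to K or R, but not when the next residue is P). `cleave_sketch` is a hypothetical helper, not the package's digest(): it does not handle missed cleavages, and which of the package's enzyme options this rule corresponds to is not stated here.

```r
# Hypothetical helper: split a sequence after K or R unless followed by P
cleave_sketch <- function(sequence) {
  sequence <- toupper(sequence)
  aa <- strsplit(sequence, "")[[1]]
  n <- length(aa)
  next_aa <- c(aa[-1], "")
  # positions after which the chain is cut (a cut after the last residue is dropped)
  sites <- which(aa %in% c("K", "R") & next_aa != "P")
  sites <- sites[sites < n]
  start <- c(1, sites + 1)
  stop <- c(sites, n)
  data.frame(peptide = substring(sequence, start, stop),
             start = start, stop = stop, mc = 0L)
}

# K at position 2 is followed by P, so no cleavage happens there
cleave_sketch("AKPGRLLKM")
```

The column names mirror those of the data.frame described above, with 'mc' fixed at 0 because no missed cleavages are generated.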
generates a pre-defined formula for electron
electronFormula()
a named numeric vector (formula)
this is used for calculations
print(electronFormula())
Every element inside the object has a name, short and a mass. The first 2 can be any length of string, mass should be a numeric
Note: this class is meant to be used for elements.
Warning: all elements inside this object should be unique (names & shorts, not mass)
number
retrieve the number of elements present in the object, read only
names
to access the names of the elements in the object
shorts
to access the shorts of the elements in the object
mass
to access the masses of the elements in the object
table
retrieves all info on the elements in data.frame format, read only
new()
creates a new elements object
elements$new(shorts, names, mass)
shorts
character vector specifying the short names for the elements, eg Hg for Mercury
names
character vector specifying the names of the elements
mass
numeric vector specifying the masses of the elements
a new 'elements' object
addElement()
adds one or more elements to the object. Elements to be added must have unique names and shorts.
elements$addElement(shorts, names, mass)
shorts
character vector specifying the short names for the elements, eg Hg for Mercury
names
character vector specifying the names of the elements
mass
numeric vector specifying the masses of the elements
nothing
print()
For printing purposes: prints a table of the chemicals with columns letter, name & short
elements$print(...)
...
no arguments, the function takes care of printing
getMass()
Retrieves the mass of one of the elements in the object
elements$getMass(which1)
which1
specifies which element should be retrieved. Either the number (row number in the elements table), or the name or short as a character vector. Case does not matter, since everything is converted to upper case before comparing with what's in the elements table
a numeric value
clone()
The objects of this class are cloneable with this method.
elements$clone(deep = FALSE)
deep
Whether to make a deep clone.
randomElements <- elements$new(shorts = c("X1","X2","X3"),
                               names = c("Secret Element 1", "Secret Element 2", "Secret Element 3"),
                               mass = c(301, 312, 323))
generates a pre-defined object which contains info on elements, mass values are average masses (weighted mean mass of elements based on their natural occurrence)
elementsAverage()
an elements object
electron is not really meant to be used in formulas, but is needed to calculate mass & m/z of ions
print(elementsAverage())
retrieves the names of the elements in a formula
elementsInFormula(formula, removeZero = FALSE)
formula |
named numeric vector, example c(O = 2, C = 1) |
removeZero |
logical flag on how to deal with elements which are zero |
character vector
glucose = c(C=6, H=12, O=6, S=0)
elementsInFormula(glucose)
elementsInFormula(glucose, removeZero = TRUE)
Retrieves the names of all elements present in two formulas. Especially important when formula1 contains elements that formula2 does not contain and vice versa. Note: also sorts the result
elementsInFormulas(formula1, formula2, decrease = FALSE)
formula1 |
named numeric vector, example c(O = 2, C = 1) |
formula2 |
named numeric vector, example c(H = 2, S = 1) |
decrease |
logical flag on how to sort, default = FALSE: increasing |
a character vector
elementsInFormulas(c(O = 2, C = 1), c(H = 2, S = 1))
generates a pre-defined R6 elements object which contains info on elements, mass values are monoisotopic
elementsMonoisotopic()
an elements object
this object is (by default) used in all chemical calculations in this package
electron is not really meant to be used in formulas, but is needed to calculate mass & m/z of ions
print(elementsMonoisotopic())
generates an empty pre-defined formula
emptyFormula()
a named numeric vector (formula)
this can be used for calculations and setup of unknowns
print(emptyFormula())
Translates regular formula format into a character vector, eg C6H12O6
formulaString(formula, removeSingle = FALSE, useMarkdown = FALSE)
formula |
named numeric vector, example c(O = 2, C = 1) |
removeSingle |
if TRUE then for elements that are present in the formula only a single time, the number (1) will not be included. Default is FALSE. See also examples |
useMarkdown |
default = FALSE. If TRUE, then it will use HTML/Markdown codes <sub> in the formulas, which can be used with the library 'gt' to generate 'proper' notation for chemical formulas (numbers in subscript) |
character vector
formulaString(c(C=6, H=12, O=6))
formulaString(c(H=3,O=4,P=1))
formulaString(c(H=3,O=4,P=1), removeSingle = TRUE)
formulaString(c(H=2, O=1))
formulaString(c(H=2, O=1), removeSingle = TRUE)
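The effect of removeSingle can be illustrated with a few lines of base R. This is a minimal sketch of the string construction only. `formula_string_sketch` is a hypothetical helper, and how the package orders elements, treats zero counts or applies useMarkdown is not covered.

```r
# Hypothetical helper: paste element symbols and counts together
formula_string_sketch <- function(formula, removeSingle = FALSE) {
  counts <- as.character(formula)
  if (removeSingle) {
    # drop the "1" for elements that occur exactly once
    counts[formula == 1] <- ""
  }
  paste0(names(formula), counts, collapse = "")
}

formula_string_sketch(c(C = 6, H = 12, O = 6))
formula_string_sketch(c(H = 2, O = 1))
formula_string_sketch(c(H = 2, O = 1), removeSingle = TRUE)
```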
calculates the neutral mono-isotopic mass of a formula
formulaToMass( formula = NULL, removeNA = FALSE, elementsInfo = elementsMonoisotopic(), enviPat = FALSE, exact = TRUE )
formula |
named numeric vector, example c(O = 2, C = 1) |
removeNA |
logical vector: what to do if any of the elements is NA. If TRUE, then remove before calculation, if FALSE, then do not remove |
elementsInfo |
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic(). Masses calculated with elementsAverage() are not fully accurate for larger molecules because of the complex isotope patterns that emerge. In case average masses are needed it's better to use the enviPat option |
enviPat |
logical argument that determines if the enviPat based calculations should be used. Default is FALSE. For monoisotopic masses there is no difference, but for average masses of larger molecules (with complicated isotope patterns) it's highly recommended to use enviPat = TRUE with exact = FALSE |
exact |
determines if the exact (TRUE, default) or the average (FALSE) mass is calculated (ignored if enviPat is FALSE) |
numeric vector
formulaToMass(c(H=2, O=1))
formulaToMass(c(H=2, O=1), elementsInfo = elementsAverage())
formulaToMass(c(C = 50, H=102))
formulaToMass(c(C = 50, H=102), elementsInfo = elementsAverage())
formulaToMass(c(C = 50, H=102), enviPat = TRUE)
formulaToMass(c(C = 50, H=102), enviPat = TRUE, exact = FALSE)
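Without the enviPat option, a monoisotopic mass is in essence a weighted sum: the number of atoms of each element times that element's monoisotopic mass. A minimal base R sketch of that arithmetic follows. The mass table is entered by hand for a few elements and `formula_to_mass_sketch` is a hypothetical helper, not the package's function.

```r
# Monoisotopic masses (Da) of a few elements, hand-entered for this sketch
monoisotopic <- c(H = 1.00782503207, C = 12, N = 14.0030740048,
                  O = 15.99491461956, S = 31.97207100)

# Hypothetical helper: sum of (atom count x element mass)
formula_to_mass_sketch <- function(formula, masses = monoisotopic) {
  sum(formula * masses[names(formula)])
}

formula_to_mass_sketch(c(H = 2, O = 1))          # water, about 18.0106
formula_to_mass_sketch(c(C = 6, H = 12, O = 6))  # glucose, about 180.0634
```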
Calculates the m/z value of a charged/adducted ion
massToMz( mass, adducts = 0, adductFormula = electronFormula(), adductCharge = -1, elementsInfo = elementsMonoisotopic() )
mass |
numeric vector, (neutral) mass of the molecule |
adducts |
numeric vector, number of adducts 'attached to' or 'removed from' the (originally neutral) molecule |
adductFormula |
formula (named numeric vector) of the adduct |
adductCharge |
numeric vector indicating the actual charge per adduct |
elementsInfo |
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic() |
numeric vector
# amino acid residue lysine + water
lysineMass <- formulaToMass(aminoAcidResidues()$getFormula("K") %f+% waterFormula())
lysineMass
# M+H+ : adducts = 1, adductFormula = protonFormula(), adductCharge = 1
# singly charged/protonated ion (ESI)
massToMz(lysineMass, adducts = 1, adductFormula = protonFormula(), adductCharge = 1)
# Doubly charged/protonated ion (ESI)
massToMz(lysineMass, adducts = 2, adductFormula = protonFormula(), adductCharge = 1)
# M-H- : single, negatively charged
massToMz(lysineMass, adducts = -1, adductFormula = protonFormula(), adductCharge = 1)
# M+ : singly positively charged (molecular) ion (EI)
massToMz(lysineMass, adducts = -1, adductFormula = electronFormula(), adductCharge = 1)
# M- : singly negatively charged (molecular) ion (EI)
massToMz(lysineMass, adducts = 1, adductFormula = electronFormula(), adductCharge = 1)
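For the protonation cases, the arithmetic behind an m/z value can be sketched in base R. The formula used here, (mass + adducts × adduct mass) / |adducts × adductCharge|, is an assumption about how such a calculation is typically done, not a statement about the package's internals. `mass_to_mz_sketch` is a hypothetical helper and it requires adducts to be non-zero.

```r
# Proton mass: monoisotopic hydrogen atom minus one electron (Da)
proton_mass <- 1.00782503207 - 0.00054857990946

# Hypothetical helper; assumes adducts != 0
mass_to_mz_sketch <- function(mass, adducts, adductMass, adductCharge) {
  (mass + adducts * adductMass) / abs(adducts * adductCharge)
}

lysine_mass <- 146.1055277  # free lysine, C6H14N2O2

mass_to_mz_sketch(lysine_mass, adducts = 1, adductMass = proton_mass, adductCharge = 1)   # [M+H]+
mass_to_mz_sketch(lysine_mass, adducts = 2, adductMass = proton_mass, adductCharge = 1)   # [M+2H]2+
mass_to_mz_sketch(lysine_mass, adducts = -1, adductMass = proton_mass, adductCharge = 1)  # [M-H]-
```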
a wrapper around massToMz for positively charged, protonated ions in ESI
massToMzH(mass, charge = 1, elementsInfo = elementsMonoisotopic())
mass |
numeric vector, (neutral) mass of the molecule |
charge |
charge state |
elementsInfo |
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic() |
numeric vector
# amino acid residue lysine + water
lysineMass <- formulaToMass(aminoAcidResidues()$getFormula("K") %f+% waterFormula())
lysineMass
# M+H+ : adducts = 1, adductFormula = protonFormula(), adductCharge = 1
# singly charged/protonated ion (ESI)
massToMz(lysineMass, adducts = 1, adductFormula = protonFormula(), adductCharge = 1)
massToMzH(lysineMass)
Every modification inside the object has a name, position, fixed (flag), gain (formula), loss (formula) and category.
'name' is a character vector.
Position is a character vector specifying the amino acids which always have the modification (fixed = TRUE) or can have the modification (fixed = FALSE). More than one amino acid can be specified, eg NQ (for Asparagine & glutamine). Please note that currently the package does NOT support 'exotic' amino acids (eg Selenocysteine) or 'combination' letters, such as 'J' (Leucine or Isoleucine). For the C- and N-terminus, use 'C_Term' or 'N_Term' for position.
Gain and loss specify what is lost and/or gained when an amino acid is modified. For example Carbamidomethylation of Cysteine has both a loss formula c(H=1) and a gain formula c(C=2, H=4, N=1, O=1); obviously this could also be defined as: loss formula = emptyFormula(), gain formula = c(C=2, H=3, N=1, O=1).
For the category field (character vector) there is no real 'rule' on how to classify modifications. I usually stick to the categorisation of Mascot or Sequest.
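That the two ways of defining Carbamidomethylation are equivalent can be checked with a little arithmetic. The base R sketch below uses a hypothetical helper `net_change` and represents the empty loss formula as a zero-length numeric vector, which is an assumption made for this example only.

```r
# Hypothetical helper: net elemental change = gain minus loss
net_change <- function(gain, loss) {
  net <- gain
  for (el in names(loss)) {
    current <- if (el %in% names(net)) net[[el]] else 0
    net[el] <- current - loss[[el]]
  }
  net
}

# Definition 1: explicit loss of one hydrogen
net1 <- net_change(gain = c(C = 2, H = 4, N = 1, O = 1), loss = c(H = 1))

# Definition 2: nothing lost, one hydrogen less gained
net2 <- net_change(gain = c(C = 2, H = 3, N = 1, O = 1), loss = numeric(0))

net1
identical(net1, net2)
```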
number
retrieve the number of modifications present in the object, read only
fixed
retrieve a table of the fixed modifications in the modification table, read only
variable
retrieve a table of the variable modifications in the modification table, read only
table
to access the table of modifications directly
new()
Create a new modifications object
modifications$new(data = NA)
data
default = NA. If not NA, then should be a tibble with 6 columns: name, position, fixed, gain, loss and category. This is checked, but the contents of each column are not checked.
print()
For printing purposes: prints a table of the modifications
modifications$print(...)
...
no arguments, the function takes care of printing
add()
Adds a single modification
modifications$add(name, position, fixed, gain, loss, category)
name
character vector
position
character vector, should be a valid amino acid residue
fixed
logical vector, specifies whether the modification is fixed (TRUE) or dynamic (FALSE)
gain
named numeric vector (formula) that specifies what (atoms) are gained when a modification is applied to an amino acid
loss
named numeric vector (formula) that specifies what (atoms) are lost when a modification is applied to an amino acid
category
character vector. Not rigidly defined: for user to be able to select/filter etc which type of modifications to use
nothing
addTable()
Add (a set of) modifications via a tibble
modifications$addTable(data = NA)
data
default = NA. If not NA, then should be a tibble with 6 columns: name, position, fixed, gain, loss and category. This is checked, but the contents of each column are not checked.
nothing
clone()
The objects of this class are cloneable with this method.
modifications$clone(deep = FALSE)
deep
Whether to make a deep clone.
the logical vector fixed is very important. If TRUE, then a modification is considered to be always present, if FALSE then its presence is optional.
aaModifications <- modifications$new()
aaModifications$addTable(
  tibble::tibble(
    name = c("Carbamidomethyl (C)", "Carboxymethyl (C)", "Oxidation (M)"),
    position = c("C","C","M"),
    fixed = c(TRUE,TRUE,FALSE),
    gain = list(c(C = 2, H = 4, N = 1, O = 1),
                c(C = 2, H = 3, N = 0, O = 2),
                c(C = 0, H = 0, N = 0, O = 1)),
    loss = list(c(protonFormula()), c(protonFormula()), c(emptyFormula())),
    category = c("Cys-state","Cys-state","Preparation Artefact")
  )
)
aaModifications
aaModifications$add(name = "Deamidation", position = "NQ", fixed = FALSE,
                    gain = c(O = 1, H = 1), loss = c(N = 1, H =2),
                    category = "Preparation Artefact")
aaModifications
a wrapper around mzToMass for positively charged, protonated ions in ESI
mzHToMass(mz, charge = 1, elementsInfo = elementsMonoisotopic())
mz |
numeric vector, mass to charge ratio of the ion |
charge |
charge state |
elementsInfo |
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic() |
numeric vector
massToMzH(mass = 174.1117, charge = 2) |> mzHToMass(charge = 2)
massToMzH(mass = 174.1117, charge = 1) |> mzHToMass(charge = 1)
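The round trip in these examples can be reproduced with plain arithmetic. In the base R sketch below, `mass_to_mz_h` and `mz_h_to_mass` are hypothetical stand-ins for the two wrappers, assuming m/z = (M + z × proton mass) / z for protonated ions.

```r
# Proton mass: monoisotopic hydrogen atom minus one electron (Da)
proton_mass <- 1.00782503207 - 0.00054857990946

# Hypothetical stand-ins for the protonated-ion wrappers
mass_to_mz_h <- function(mass, charge = 1) (mass + charge * proton_mass) / charge
mz_h_to_mass <- function(mz, charge = 1) mz * charge - charge * proton_mass

mz <- mass_to_mz_h(174.1117, charge = 2)
mz                            # m/z of the doubly protonated ion
mz_h_to_mass(mz, charge = 2)  # recovers the neutral mass
```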
Calculates the (neutral) mass of a molecule from the m/z value of its ion; essentially the reverse of massToMz
mzToMass( mz, adducts = 0, adductFormula = electronFormula(), adductCharge = -1, elementsInfo = elementsMonoisotopic() )
mz |
mass to charge ratio of the ion |
adducts |
numeric vector, number of adducts 'attached to' or 'removed from' the (originally neutral) molecule |
adductFormula |
formula (named numeric vector) of the adduct |
adductCharge |
numeric vector indicating the actual charge per adduct |
elementsInfo |
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic() |
numeric vector
massToMz(mass = 174.1117, adductFormula = c(e=1), adducts = 2, adductCharge = -1) |>
  mzToMass(adductFormula = c(e=1), adducts = 2, adductCharge = -1)
massToMz(mass = 174.1117, adductFormula = c(H=1), adducts = 1, adductCharge = 1) |>
  mzToMass(adductFormula = c(H=1), adducts = 1, adductCharge = 1)
translates a Proteome Discoverer (Thermo Scientific) elements formula string to a formula as used by this package
pdToFormula(pdFormula)
pdFormula |
a character vector. Formula in a format as used by the Proteome Discoverer software |
formula of format c(H=2, O=1)
glucose <- pdToFormula("C(6) H(12) O(6)")
glucose
water <- pdToFormula("H(2) O")
water
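Judging from the two example strings, the format consists of space-separated element symbols, each optionally followed by a count in parentheses. A base R sketch of a parser for that reading of the format is given below. `pd_to_formula_sketch` is a hypothetical helper, and other variants of the format (if any exist) are not handled.

```r
# Hypothetical helper: parse strings such as "C(6) H(12) O(6)" or "H(2) O"
pd_to_formula_sketch <- function(pdFormula) {
  tokens <- strsplit(trimws(pdFormula), " +")[[1]]
  # element symbol: everything before an optional "(count)"
  elements <- sub("\\(.*$", "", tokens)
  # a token without parentheses counts as a single atom
  counts <- rep(1, length(tokens))
  has_count <- grepl("\\([0-9]+\\)$", tokens)
  counts[has_count] <- as.numeric(sub("^.*\\(([0-9]+)\\)$", "\\1", tokens[has_count]))
  setNames(counts, elements)
}

pd_to_formula_sketch("C(6) H(12) O(6)")
pd_to_formula_sketch("H(2) O")
```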
Contains two character vectors: one representing the amino acid sequence, and a second containing info on the positions of 'variable' modifications. The object also contains a modification table specifying the 'fixed' and 'variable' modifications.
sequence
returns the amino acid sequence as a character vector, can be set but is not checked against the length of the modifications string
length
returns the length of the peptide (read only)
modifications
returns the modifications string, can be set but is not checked against the length of the sequence string
modificationsTable
returns the modification table, can be modified. Note: 'variable' modifications should match the modifications string
new()
Create a new peptide object
peptide$new(sequence = "", modificationTable = NA, variableModifications = NA)
sequence
character vector, the amino acid sequence of the peptide
modificationTable
the table from a R6 'modifications' object containing the variable and fixed modifications present in the amino acid sequence
variableModifications
character vector specifying the position of variable modifications. The length of this vector must be the same length as the sequence. Each character specifies the modification at that position, eg "00010", means that position 1,2,3 & 5 are unmodified, while position 4 has the first variable modification in the modification table. Note that the numbering follows the original row order of the modification table (fixed modifications filtered out). Additions to a modification table should not be a problem, deletions or editing can cause problems however as the object currently cannot deal with this itself. If this character vector is NA, then a character vector of "0"'s will be created (with the same length as the sequence)
a new 'peptide' object
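How a variable modifications string lines up with a sequence can be sketched in base R. The reading used here, where each character is a row index into the variable modifications and "0" means unmodified, is an interpretation of the description of variableModifications. The sequence and string are made up for illustration.

```r
sequence <- "PEPTIDE"
variableModifications <- "0001020"  # same length as the sequence

residues <- strsplit(sequence, "")[[1]]
codes <- as.integer(strsplit(variableModifications, "")[[1]])

# positions that carry a variable modification
modified <- which(codes > 0)

data.frame(position = modified,
           residue = residues[modified],
           modification = codes[modified])
```

In this made-up case, position 4 (T) points at row 1 and position 6 (D) at row 2 of the variable modifications.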
print()
For printing purposes: prints the sequence string, the variable modifications string and the modification table
peptide$print(...)
...
no arguments, the function takes care of printing
sequence.part()
Retrieve part of the amino acid sequence. Note: intended for internal use
peptide$sequence.part(startSeq = 1L, endSeq = 1L)
startSeq
integer vector, specifies the start of the part of the amino acid sequence to retrieve
endSeq
integer vector, specifies the end of the part of the amino acid sequence to retrieve
character vector
modifications.part()
Retrieve part of the variable modification string. Note: intended for internal use
peptide$modifications.part(startSeq = 1L, endSeq = 1L)
startSeq
integer vector, specifies the start of the part of the variable modification string to retrieve
endSeq
integer vector, specifies the end of the part of the variable modification string to retrieve
character vector
modifications.formula.part()
Determines the gain & loss formulas for a part of the peptide (the variable modification string and modification table are used for this): adds up all the losses and gains. If the position of a variable modification in the variable modification string does not match the amino acid in the modification table, then a warning is produced
peptide$modifications.formula.part( startSeq = 1L, endSeq = 1L, Nterminal = TRUE, Cterminal = TRUE )
startSeq
integer vector, specifies the start of the part of the variable modification string to retrieve
endSeq
integer vector, specifies the end of the part of the variable modification string to retrieve
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
a list of 2 formulas: the summed up gain formulas & the summed up loss formulas which are present in the part selected by startSeq and endSeq
modifications.formula()
Determines the gain & loss formulas for the full length of the peptide sequence. Essentially a wrapper for modifications.formula.part
peptide$modifications.formula(Nterminal = TRUE, Cterminal = TRUE)
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
a list of 2 formulas: the summed up gain formulas & the summed up loss formulas which are present in the part selected by startSeq and endSeq
formula.part()
Determines the chemical formula of part of the peptide with or without the modifications.
peptide$formula.part( startSeq = 1, endSeq = 1, ignoreModifications = FALSE, Nterminal = TRUE, Cterminal = TRUE )
startSeq
integer vector, specifies the start of the part of the peptide sequence
endSeq
integer vector, specifies the end of the part of the peptide sequence
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide. Note: if TRUE then the 'Nterminal' and 'Cterminal' parameters are ignored
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
a named numeric vector, eg: c(C=6, H=12, O=6)
formula()
Determines the chemical formula of the full length peptide with or without modifications. Essentially a wrapper around 'formula.part'
peptide$formula( ignoreModifications = FALSE, Nterminal = TRUE, Cterminal = TRUE )
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide. Note: if TRUE then the 'Nterminal' and 'Cterminal' parameters are ignored
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
a named numeric vector, eg: c(C=6, H=12, O=6)
mass.part()
Calculate the mass of part of the peptide with or without modifications
peptide$mass.part( startSeq = 1, endSeq = 1, ignoreModifications = FALSE, Nterminal = TRUE, Cterminal = TRUE, elementsInfo = elementsMonoisotopic() )
startSeq
integer vector, specifies the start of the part of the peptide sequence
endSeq
integer vector, specifies the end of the part of the peptide sequence
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide. Note: if TRUE then the 'Nterminal' and 'Cterminal' parameters are ignored
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
elementsInfo
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic()
numeric vector
mass()
Calculate the mass of the full length peptide with or without modifications
peptide$mass( ignoreModifications = FALSE, Nterminal = TRUE, Cterminal = TRUE, elementsInfo = elementsMonoisotopic() )
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide. Note: if TRUE then the 'Nterminal' and 'Cterminal' parameters are ignored
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
elementsInfo
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic()
numeric vector
mz.part()
Calculate the m/z of part of the peptide (as an ion) with or without modifications
peptide$mz.part( startSeq = 1, endSeq = 1, ignoreModifications = FALSE, Nterminal = TRUE, Cterminal = TRUE, elementsInfo = elementsMonoisotopic(), adducts = 1, adductFormula = protonFormula(), adductCharge = 1 )
startSeq
integer vector, specifies the start of the part of the peptide sequence
endSeq
integer vector, specifies the end of the part of the peptide sequence
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide. Note: if TRUE then the 'Nterminal' and 'Cterminal' parameters are ignored
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
elementsInfo
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic()
adducts
numeric vector, number of adducts 'attached to' or 'removed from' the (originally neutral) peptide
adductFormula
formula (named numeric vector) of the adduct
adductCharge
numeric vector indicating the actual charge per adduct
numeric vector
mz()
Calculate the m/z of the full length peptide (as an ion) with or without modifications
peptide$mz( ignoreModifications = FALSE, Nterminal = TRUE, Cterminal = TRUE, elementsInfo = elementsMonoisotopic(), adducts = 1, adductFormula = protonFormula(), adductCharge = 1 )
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide. Note: if TRUE then the 'Nterminal' and 'Cterminal' parameters are ignored
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
elementsInfo
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic()
adducts
numeric vector, number of adducts 'attached to' or 'removed from' the (originally neutral) peptide
adductFormula
formula (named numeric vector) of the adduct
adductCharge
numeric vector indicating the actual charge per adduct
numeric vector
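The relation between a neutral mass and an ion m/z that the 'adducts', 'adductFormula' and 'adductCharge' parameters describe can be sketched in base R as below. This is a minimal illustration, not the package's implementation: 'mzSketch' is a hypothetical helper, the adduct is passed as a mass instead of a formula, and the value 1.007276 assumes a proton mass that already excludes the electron (whether protonFormula() handles the electron this way is not stated here).

```r
# Hypothetical helper, not part of massSpectrometryR.
# mass         : neutral monoisotopic mass
# adducts      : number of adducts added (positive) or removed (negative)
# adductMass   : mass of one adduct (assumed: proton, 1.007276)
# adductCharge : charge carried by one adduct
mzSketch <- function(mass, adducts = 1, adductMass = 1.007276, adductCharge = 1) {
  ionMass <- mass + adducts * adductMass   # mass of the resulting ion
  charge <- abs(adducts * adductCharge)    # magnitude of the total charge
  ionMass / charge
}

neutralMass <- 802.40071            # illustrative neutral mass
mzSketch(neutralMass)               # one proton added
mzSketch(neutralMass, adducts = 2)  # two protons added
mzSketch(neutralMass, adducts = -1) # one proton removed
```

A negative 'adducts' value models an ion formed by removing the adduct, which matches the "removed from" wording of the parameter description.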
mzH.part()
Calculate the m/z of part of the peptide (as a protonated ion) with or without modifications
peptide$mzH.part( startSeq = 1, endSeq = 1, ignoreModifications = FALSE, Nterminal = TRUE, Cterminal = TRUE, charge = 1, elementsInfo = elementsMonoisotopic() )
startSeq
integer vector, specifies the start of the part of the peptide sequence
endSeq
integer vector, specifies the end of the part of the peptide sequence
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide. Note: if TRUE then the 'Nterminal' and 'Cterminal' parameters are ignored
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
charge
charge state
elementsInfo
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic()
numeric vector
mzH()
Calculate the m/z of the full length peptide (as a protonated ion) with or without modifications
peptide$mzH( charge = 1, ignoreModifications = FALSE, Nterminal = TRUE, Cterminal = TRUE, elementsInfo = elementsMonoisotopic() )
charge
charge state
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide. Note: if TRUE then the 'Nterminal' and 'Cterminal' parameters are ignored
Nterminal
logical vector if TRUE then Nterminal modifications are included (if N-terminus is present in the part selected by startSeq and endSeq)
Cterminal
logical vector if TRUE then Cterminal modifications are included (if C-terminus is present in the part selected by startSeq and endSeq)
elementsInfo
elements masses to be used, needs to be of class elements, default is elementsMonoisotopic()
numeric vector
fragments.part()
generates a table of fragments which could arise from fragmenting part of the peptide. The ion series generated are: a, a-H2O, a-NH3, b, b-H2O, b-NH3, b+H2O, c, x, y, y-H2O, y-NH3, z. Please note that the calculation is relatively 'dumb': it does NOT check whether a fragment is possible at all. A prime example is the b+H2O ion series: these fragment ions can only form if certain conditions are met. Currently this function does not check these conditions/assumptions
peptide$fragments.part( startSeq = 1, endSeq = 1, ignoreModifications = FALSE, onlyIons = TRUE, chargeState = 1, returnFormulas = FALSE, formulaIncludeChargeProtons = FALSE )
startSeq
integer vector, specifies the start of the part of the peptide sequence
endSeq
integer vector, specifies the end of the part of the peptide sequence
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide
onlyIons
default = TRUE, only information on the 13 (earlier mentioned) ion series is generated. If FALSE then an additional 10 columns are generated with info on the ion series
chargeState
charge state of the ions in the generated table
returnFormulas
default = FALSE, if TRUE then instead of numerical values the table will be populated by the chemical formulas of the neutral fragments or charged fragment ions
formulaIncludeChargeProtons
default = FALSE, if TRUE then protons will be included in the formulas (ignored when returnFormulas = FALSE)
a data.frame with fragment information
fragments()
generates a table of fragments which could arise from fragmenting the full sequence of the peptide. The ion series generated are: a, a-H2O, a-NH3, b, b-H2O, b-NH3, b+H2O, c, x, y, y-H2O, y-NH3, z. Please note that the calculation is relatively 'dumb': it does NOT check whether a fragment is possible at all. A prime example is the b+H2O ion series: these fragment ions can only form if certain conditions are met. Currently this function does not check these conditions/assumptions
peptide$fragments( ignoreModifications = FALSE, onlyIons = TRUE, chargeState = 1, returnFormulas = FALSE, formulaIncludeChargeProtons = FALSE )
ignoreModifications
if FALSE then modifications (both fixed & variable) are taken into account when calculating the chemical formula of the peptide
onlyIons
default = TRUE, only information on the 13 (earlier mentioned) ion series is generated. If FALSE then an additional 10 columns are generated with info on the ion series
chargeState
charge state of the ions in the generated table
returnFormulas
default = FALSE, if TRUE then instead of numerical values the table will be populated by the chemical formulas of the neutral fragments or charged fragment ions
formulaIncludeChargeProtons
default = FALSE, if TRUE then protons will be included in the formulas (ignored when returnFormulas = FALSE)
a data.frame with fragment information
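To make the idea of an ion series concrete, the following base R sketch computes only the b and y series for an unmodified peptide. It is an illustration under stated assumptions, not the method used by fragments(): 'byIons' is a hypothetical helper, the residue table covers only the letters of "SAMPLER" with commonly tabulated monoisotopic residue masses, and, like the package function, it does not check whether a fragment can actually form.

```r
# Monoisotopic residue masses (assumed values, only the letters needed here)
residueMass <- c(S = 87.03203, A = 71.03711, M = 131.04049, P = 97.05276,
                 L = 113.08406, E = 129.04259, R = 156.10111)
water <- 18.01056    # H2O, monoisotopic
proton <- 1.007276   # assumed proton mass

# Hypothetical helper, not part of massSpectrometryR
byIons <- function(sequence, chargeState = 1) {
  aa <- strsplit(sequence, "")[[1]]
  masses <- residueMass[aa]
  n <- length(masses)
  # b ions: N-terminal residues 1..i; y ions: C-terminal residues plus water
  bNeutral <- cumsum(masses)[-n]
  yNeutral <- cumsum(rev(masses))[-n] + water
  toMz <- function(m) (m + chargeState * proton) / chargeState
  data.frame(position = seq_len(n - 1),
             b = unname(toMz(bNeutral)),
             y = unname(toMz(yNeutral)))
}

fragmentTable <- byIons("SAMPLER")
fragmentTable
```

For singly charged ions, each complementary pair (b at position i, y at position n - i) sums to the neutral peptide mass plus two protons, which is a quick consistency check on such a table.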
fragments.part.immoniumIons()
generates a numeric vector containing 'expected' immonium ions based on the amino acid content of part of the peptide. Please note that this function does NOT take into account possible (fixed or variable) modifications
peptide$fragments.part.immoniumIons(startSeq = 1, endSeq = 1)
startSeq
integer vector, specifies the start of the part of the peptide sequence
endSeq
integer vector, specifies the end of the part of the peptide sequence
numeric vector
fragments.immoniumIons()
generates a numeric vector containing 'expected' immonium ions based on the amino acid content of the full sequence of the peptide. Please note that this function does NOT take into account possible (fixed or variable) modifications
peptide$fragments.immoniumIons()
numeric vector
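An immonium ion is commonly described as the amino acid residue minus CO, carrying one proton. The sketch below applies that rule in base R; it is not the package's implementation. 'immoniumSketch' is a hypothetical helper, the residue masses are assumed monoisotopic values for the letters of "SAMPLER" only, and reporting each amino acid once is a choice made here for illustration.

```r
# Monoisotopic residue masses (assumed values, only the letters needed here)
residueMass <- c(S = 87.03203, A = 71.03711, M = 131.04049, P = 97.05276,
                 L = 113.08406, E = 129.04259, R = 156.10111)
carbonMonoxide <- 27.994915  # CO, monoisotopic
proton <- 1.007276           # assumed proton mass

# Hypothetical helper, not part of massSpectrometryR
immoniumSketch <- function(sequence) {
  aa <- unique(strsplit(sequence, "")[[1]])   # each amino acid once
  residueMass[aa] - carbonMonoxide + proton   # residue - CO + H+
}

immoniumSketch("SAMPLER")
```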
clone()
The objects of this class are cloneable with this method.
peptide$clone(deep = FALSE)
deep
Whether to make a deep clone.
testPeptide <- peptide$new(sequence = "SAMPLER", modificationTable = aminoAcidModifications()$table, variableModifications = "0010000")
testPeptide
testPeptide$formula()
counts the occurrences of an amino acid (sequence) in another amino acid sequence
peptideCount( thePeptide = NA, searchPeptide = NA, doNotSplice = TRUE, upper = TRUE )
thePeptide |
character vector, the peptide to be searched |
searchPeptide |
character vector, the amino acid sequence to search for |
doNotSplice |
if FALSE then all characters in the searchPeptide are searched individually. If TRUE then the searchPeptide is searched as a whole. Default = TRUE |
upper |
convert both thePeptide & searchPeptides to uppercase before searching |
numeric vector
peptideCount("SAMPLER", "P")
peptideCount("SAMPLER", "PLER", doNotSplice = TRUE)
peptideCount("SAMPLER", "PLER", doNotSplice = FALSE)
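The difference between searching the sequence as a whole and searching its letters individually can be sketched in base R as follows. 'countSketch' is a hypothetical stand-in, not the code behind peptideCount(); in particular, returning one count per letter when doNotSplice = FALSE is an assumption, and overlapping matches are not counted.

```r
# Hypothetical helper, not part of massSpectrometryR
countSketch <- function(thePeptide, searchPeptide, doNotSplice = TRUE, upper = TRUE) {
  if (upper) {
    thePeptide <- toupper(thePeptide)
    searchPeptide <- toupper(searchPeptide)
  }
  # count non-overlapping literal matches of one pattern
  countOne <- function(pattern) {
    hits <- gregexpr(pattern, thePeptide, fixed = TRUE)[[1]]
    as.numeric(sum(hits > 0))   # gregexpr returns -1 when nothing matches
  }
  if (doNotSplice) {
    countOne(searchPeptide)
  } else {
    residues <- strsplit(searchPeptide, "")[[1]]
    vapply(residues, countOne, numeric(1))   # one count per letter
  }
}

countSketch("SAMPLER", "PLER", doNotSplice = TRUE)
countSketch("SAMPLER", "PLER", doNotSplice = FALSE)
```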
gives formula of a peptide string
peptideFormula(peptide, aminoAcids = aminoAcidResidues())
peptide |
character vector specifying the sequence of amino acids in a peptide |
aminoAcids |
R6 object of type 'chemicals' with the amino acid info, default = aminoAcidResidues() |
a numeric vector
does not check for non-amino acid letters, modifications cannot be specified
peptideFormula("SAMPLER")
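Conceptually, the formula of an unmodified peptide is the sum of its residue formulas plus one water. The base R sketch below shows that bookkeeping with named numeric vectors. It is an illustration only: 'residueFormula', 'addTwo' and 'peptideFormulaSketch' are hypothetical names, the residue table is limited to the letters of "SAMPLER", and the package itself takes this information from aminoAcidResidues().

```r
# Residue formulas (amino acid minus H2O), only the letters needed here
residueFormula <- list(
  S = c(C = 3, H = 5, N = 1, O = 2),
  A = c(C = 3, H = 5, N = 1, O = 1),
  M = c(C = 5, H = 9, N = 1, O = 1, S = 1),
  P = c(C = 5, H = 7, N = 1, O = 1),
  L = c(C = 6, H = 11, N = 1, O = 1),
  E = c(C = 5, H = 7, N = 1, O = 3),
  R = c(C = 6, H = 12, N = 4, O = 1)
)

# Hypothetical helper: add two formulas that may contain different elements
addTwo <- function(f1, f2) {
  elements <- union(names(f1), names(f2))
  total <- setNames(numeric(length(elements)), elements)
  total[names(f1)] <- total[names(f1)] + f1
  total[names(f2)] <- total[names(f2)] + f2
  total
}

# Hypothetical helper: start from water, then add every residue in turn
peptideFormulaSketch <- function(sequence) {
  aa <- strsplit(sequence, "")[[1]]
  Reduce(addTwo, residueFormula[aa], c(H = 2, O = 1))
}

samplerFormula <- peptideFormulaSketch("SAMPLER")
samplerFormula
```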
Generates a pre-defined (incomplete) table of names of fragments for the ions resulting when fragmenting a peptide in MS
peptideFragments()
a data.frame of two columns: 'name' and 'series name' of fragments
will possibly be removed in the future
gives ion m/z of the protonated peptide
peptideMzH( peptide, charge = 1, aminoAcids = aminoAcidResidues(), elementsInfo = elementsMonoisotopic() )
peptide |
character vector specifying the sequence of amino acids in a peptide |
charge |
numeric vector specifying the charge of the peptide ion |
aminoAcids |
R6 object of type 'chemicals' with the amino acid info, default = aminoAcidResidues() |
elementsInfo |
R6 object of type 'elements' with the elements masses info, default = elementsMonoisotopic() |
a numeric vector
does not check for non-amino acid letters, modifications cannot be specified
peptideMzH("SAMPLER")
peptideMzH("SAMPLER", charge = 2)
peptideMzH("SAMPLER", elementsInfo = elementsAverage())
generates a pre-defined formula for proton
protonFormula()
a named numeric vector (formula)
print(protonFormula())
translates a cdkFormula object to a 'regular' formula format
rcdkFormula(cdkformula)
cdkformula |
an object of type rcdkFormula |
formula of format c(H=2, O=1)
This function does not deal with the charge state which is possibly defined in the rcdkFormula object
glucose <- rcdk::get.formula("C6H12O6")
rcdkFormula(glucose)
glucoseAdductIon <- rcdk::get.formula("C6H12O6Na1", charge = 1)
glucoseAdductIon
# to get to the same m/z value
glucoseAdductIon |> massSpectrometryR::rcdkFormula() |> formulaToMass() |> massToMz(adducts = -1)
removes elements that have number zero
removeZeros(formula)
formula |
named numeric vector, example c(O = 2, C = 1) |
named numeric vector (formula)
glucose = c(O=6, H=12, C=6)
glucose %f+% emptyFormula()
removeZeros(glucose %f+% emptyFormula())
sorts the elements of a formula in alphabetical order (increasing/decreasing)
sortFormula(formula, decrease = FALSE)
formula |
named numeric vector, example c(O = 2, C = 1) |
decrease |
logical flag on how to sort, default = FALSE: increasing |
named numeric vector (formula)
glucose = c(O=6, H=12, C=6)
glucose
sortFormula(glucose)
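Because a formula is just a named numeric vector, dropping zero counts and sorting by element symbol can both be expressed with plain indexing. The lines below are a base R sketch of those two operations, not the package's own code; the extra 'S = 0' entry is added here only to have something to remove.

```r
# A formula with a zero count, as can result from adding or subtracting formulas
glucoseWithZero <- c(O = 6, H = 12, C = 6, S = 0)

# keep only elements with a non-zero count
nonZero <- glucoseWithZero[glucoseWithZero != 0]

# sort by element symbol, increasing and decreasing
increasing <- nonZero[order(names(nonZero))]
decreasing <- nonZero[order(names(nonZero), decreasing = TRUE)]

increasing
decreasing
```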
Translates a character vector formula, eg 'C6H12O6' to a regular formula c(C=6, H=12, O=6)
stringFormula(string)
string |
character vector, format eg: 'C6H12O6' |
formula of format c(H=2, O=1)
it's imperative that every element has a number (count), otherwise this function is highly likely to malfunction and return NA
stringFormula("H3O4P1")
stringFormula("C6H12O6")
Translates a character vector formula, eg 'C6H12O6' to a regular formula c(C=6, H=12, O=6)
stringToFormula(string)
string |
character vector, format eg: 'C6H12O6' |
formula of format c(H=2, O=1)
this function is an improved version of stringFormula(). Every element with count 1 can now have the number omitted. However, the function depends on 'correct' element symbols (first letter is uppercase, second letter is lowercase). This function also allows for the presence of isotopes, eg '[13]C' or '[2]H2O'
stringToFormula("H3O4P1")
stringToFormula("C6H12O6")
stringToFormula("C6H5Br")
stringToFormula("[13]C6H12O5[18]O")
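The dependence on 'correct' element symbols follows naturally from a pattern-based parser. The base R sketch below shows one way such parsing could work; it is not the implementation of stringToFormula(). 'parseFormulaSketch' is a hypothetical helper, it treats a missing count as 1, it sums repeated elements, and it does not handle the isotope notation.

```r
# Hypothetical helper, not part of massSpectrometryR
parseFormulaSketch <- function(string) {
  # one token per element: uppercase letter, optional lowercase letter, optional count
  tokens <- regmatches(string, gregexpr("[A-Z][a-z]?[0-9]*", string))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"   # omitted count means 1
  counts <- as.numeric(counts)
  # sum counts per element, keeping the order of first appearance
  vapply(unique(elements), function(e) sum(counts[elements == e]), numeric(1))
}

parseFormulaSketch("C6H5Br")
parseFormulaSketch("CH3COOH")
```

With this pattern a lowercase second letter is what separates 'Co' (cobalt) from 'CO' (carbon and oxygen), which is why the capitalization of the input matters.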
subtracting one formula from another, taking into account possible differing elements
subtractFormulas(formula1, formula2)
formula1 |
named numeric vector, example c(O = 2, C = 1); formula to be subtracted from |
formula2 |
named numeric vector, example c(H = 2, S = 1); formula to subtract |
a named numeric vector (formula)
There are no checks for negative values!
subtractFormulas(c(H = 2, O = 1), c(H = 1))
subtractFormulas(c(H = 2, O = 1), c(S = 1, O = 2))
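Since negative counts are not checked, callers may want to test the result themselves. The base R sketch below shows element-wise subtraction over the union of element names, followed by such a check. 'subtractSketch' is a hypothetical helper written for illustration; it is not the package's code and assumes each element appears at most once per formula.

```r
# Hypothetical helper, not part of massSpectrometryR
subtractSketch <- function(formula1, formula2) {
  elements <- union(names(formula1), names(formula2))
  result <- setNames(numeric(length(elements)), elements)
  result[names(formula1)] <- formula1                              # start from formula1
  result[names(formula2)] <- result[names(formula2)] - formula2    # subtract formula2
  result
}

difference <- subtractSketch(c(H = 2, O = 1), c(S = 1, O = 2))
difference

# explicit check that the caller has to do
if (any(difference < 0)) {
  message("formula contains negative element counts: ",
          paste(names(difference)[difference < 0], collapse = ", "))
}
```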
checks if formula is valid
validFormula(formula, string = FALSE)
formula |
character vector or named numeric vector, representing the formula to be checked |
string |
logical vector specifying if the formula is a character vector or not |
logical vector
This is done via the check_chemform function from the package enviPat
glucose <- c(C=6, H=12, O=6)
validFormula(glucose)
formulaString(glucose)
validFormula(formulaString(glucose), string = TRUE)
generates a pre-defined formula for water
waterFormula()
a named numeric vector (formula)
print(waterFormula())