Data-dependent versus data-independent acquisition
What is the difference between data-dependent and
data-independent acquisition?
Because of all-ion fragmentation, the tandem mass
spectra generated in DIA experiments are chimeric (they contain MS/MS peaks from
many precursor ions) and are therefore much more complex than typical DDA spectra.
Increasing instrument resolution, as well as decreasing the number of peptides
eluting at any one time within the same isolation window, can help, but
ultimately it is a bioinformatics problem to solve.
Modern mass spectrometers are versatile instruments and,
when combined with clever design, can acquire data in many different modes,
which is always useful in proteomics experiments. The two most popular
strategies are data-dependent acquisition (DDA) and data-independent
acquisition (DIA). Many variations of these exist, often under different names
that describe very similar things. Proteomics scientists also employ a
third broad category, called targeted acquisition, but this will not be
discussed here.
Data-dependent acquisition
In DDA, a high-resolution precursor ion scan (also known as a
survey scan or just ‘MS1’) is performed first. This generates a list of MS1
masses, most of which correspond to intact peptide ions. The exception is an
experiment with no protein digestion step, in which case the MS1 scan
represents a much more complex protein isotopic envelope. In order to sequence
and identify these peptides (or proteins), the top n most abundant precursor
ions are sequentially isolated and fragmented. The “sequential” aspect is
important here; it means the peptides are fragmented one by one. If a complex
mixture is eluting from a chromatographic column, there is only so much time to “catch”
the peptides before they disappear (elute off the column). In addition, when
a few highly abundant peptides are present in the mixture, the
instrument tends to sample these repeatedly and completely miss the rest.
To offset this, a ‘dynamic exclusion’ time window is also set. During this window,
any mass already fragmented will not be considered again.
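To make the top n selection and dynamic exclusion logic a bit more concrete, here is a minimal Python sketch. It is only an illustration and not how any instrument's control software actually works; the function name, the 30 second exclusion window, the 0.01 m/z tolerance and the peak list are all assumptions of mine.

```python
def select_precursors(survey_scan, scan_time, exclusion, top_n=10,
                      exclusion_window=30.0, mz_tolerance=0.01):
    """Pick up to top_n of the most intense precursors that are not excluded.

    survey_scan: list of (mz, intensity) tuples from one MS1 scan.
    exclusion:   dict mapping an excluded m/z to the time (in seconds) at
                 which its exclusion expires; it is updated in place.
    """
    # Forget exclusions that have expired by the time of this scan.
    for mz in [m for m, expiry in exclusion.items() if expiry <= scan_time]:
        del exclusion[mz]

    selected = []
    # Walk through the peaks from most to least intense.
    for mz, _intensity in sorted(survey_scan, key=lambda p: p[1], reverse=True):
        if len(selected) == top_n:
            break
        # Skip any mass that was fragmented recently.
        if any(abs(mz - excluded) <= mz_tolerance for excluded in exclusion):
            continue
        selected.append(mz)
        exclusion[mz] = scan_time + exclusion_window
    return selected


# Hypothetical survey scan, seen unchanged at three time points.
scan = [(500.25, 1e6), (620.30, 8e5), (710.40, 5e4), (455.20, 2e4)]
exclusion = {}
first = select_precursors(scan, 0.0, exclusion, top_n=2)
second = select_precursors(scan, 1.0, exclusion, top_n=2)
third = select_precursors(scan, 40.0, exclusion, top_n=2)
print(first, second, third)
```

In the second scan the two most intense ions are still excluded, so the two weaker ions finally get their turn. By the third scan all exclusions have expired and the most intense ions are picked again.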
The fragmentation process generates tandem mass spectra
(product ion spectra, MS/MS or MS2), which can then be searched against a
database containing all theoretical spectra generated from a set of proteins
thought to be present in the sample. An alternative searching approach is to
perform de novo sequencing, where peptide sequence is inferred directly from
MS2 spectra.
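As a rough illustration of the database searching idea, the sketch below generates theoretical singly charged b- and y-ion m/z values for candidate peptides and counts how many of them are found in an observed peak list. It is a toy under stated assumptions: unmodified peptides, monoisotopic masses, a fixed 0.02 m/z tolerance and a simple matched-peak count instead of the statistical scoring that real search engines use. All function names are made up for this example.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_mzs(peptide):
    """Return singly charged b-ion then y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions + y_ions


def count_matches(observed_mzs, peptide, tolerance=0.02):
    """Count theoretical fragments that have an observed peak within tolerance."""
    return sum(
        any(abs(obs - theo) <= tolerance for obs in observed_mzs)
        for theo in fragment_mzs(peptide)
    )


def best_candidate(observed_mzs, candidates, tolerance=0.02):
    """Return the candidate peptide explaining the most fragment peaks."""
    return max(candidates, key=lambda pep: count_matches(observed_mzs, pep, tolerance))


# Pretend the observed spectrum is a perfect fragmentation of PEPTIDE.
observed = fragment_mzs("PEPTIDE")
print(best_candidate(observed, ["EDITPEP", "PEPTIDE", "SAMPLER"]))
```

A real search engine would first restrict the candidates to peptides whose intact mass matches the precursor, and would also deal with charge states, modifications and noise peaks.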
Although DDA is extensively used in proteomics, it has some
disadvantages. An obvious one is that peptides tend to co-elute from the
chromatographic column and arrive at the mass spectrometer simultaneously and
in very different concentrations. The data-dependent nature of the experiment
(usually the top 5 to top 20 most intense ions are selected for fragmentation) limits
the dynamic range of this method: peptides present at high
relative concentrations are preferentially sampled, whereas information
about the proteins represented by low-concentration peptides in the mixture can
be incomplete.
Additionally, the selection of precursor ions is somewhat
stochastic. As a result, the overlap between peptide identifications can be
poor, even for technical replicates of the same sample (below 75% has been
reported; I know I need to add a reference here, and there is one from David Tabb).
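For what it is worth, here is one simple way such an overlap could be computed from two lists of identified peptides. Note the assumption: I define overlap here as shared identifications divided by all distinct identifications (the Jaccard index), which is not necessarily the definition used in the reports mentioned above. The peptide sequences are invented purely for illustration.

```python
def identification_overlap(run_a, run_b):
    """Fraction of distinct peptide identifications shared by two runs."""
    a, b = set(run_a), set(run_b)
    union = a | b
    if not union:
        return 0.0  # nothing identified in either run
    return len(a & b) / len(union)


# Made-up identifications from two technical replicates.
replicate_1 = ["PEPTIDEK", "SAMPLER", "ELVISK", "LIVESK"]
replicate_2 = ["PEPTIDEK", "SAMPLER", "ELVISK", "DIGESTK"]
overlap = identification_overlap(replicate_1, replicate_2)
print(f"{overlap:.0%}")
```

Three of the five distinct peptides are seen in both runs, so this toy example gives an overlap of 60%.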
Another problem, pertinent to all database searching
approaches, is the availability and reliability of protein sequence information,
which cannot always be guaranteed.
While it might be easy to criticise DDA, the fact remains it
is still the workhorse of proteomics.
Data-independent acquisition
DIA is a technique that was popularised much later than DDA,
and has gained community attention because it attempts to solve some of the
problems inherent to DDA.
In this mode of acquisition, scientists abandon the idea of
sequencing one peptide at a time, and instead all peptides eluting into the
instrument at any given time are fragmented together; this concept is known as
multiplex fragmentation. The process happens in a serial manner: all precursor
ions are measured in the survey scan, which is followed by all-ion fragmentation.
An implementation of this method on the Waters TOF
instruments was termed MSE (I think E stands for elevated energy, or maybe everything
fragmentation; if anyone reading knows the answer, do let me know!). In an
example of great instrument design, Waters also added an extra mode of separation
by means of ion mobility; this has been marketed as HDMSE (and for this
one I do know what the acronym stands for: HD is high definition).
Another variation of DIA, performed on orthogonal
time-of-flight mass specs and recently very popular in the field, is SWATH (this
one stands for Sequential Window Acquisition of all THeoretical mass spectra).
In SWATH, instead of fragmenting the entire mass range at once, smaller independent
mass windows (e.g. 25 Da) are isolated and fragmented sequentially until the
complete desired mass range is covered. This happens on a chromatographic time
scale, and there is a small mass overlap between adjacent windows. The overlap is
necessary because, as it turns out, it greatly helps with peptide identification
later on.
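The windowing scheme is easy to sketch in Python. The 25 Da width comes from the example above, but the 400 to 1200 m/z range and the 1 Da overlap are values I picked for illustration, and the function names are hypothetical; real methods may use different or even variable window widths.

```python
def swath_windows(mz_start, mz_end, width=25.0, overlap=1.0):
    """Return (lower, upper) isolation windows covering mz_start..mz_end.

    Each window starts `overlap` Da below the upper edge of the previous
    one, so neighbouring windows share a small mass range.
    """
    if width <= overlap:
        raise ValueError("width must be larger than overlap")
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)  # clip the last window
        windows.append((lower, upper))
        if upper >= mz_end:
            break
        lower = upper - overlap
    return windows


def windows_containing(mz, windows):
    """Return every isolation window that would co-isolate a precursor m/z."""
    return [(lo, hi) for lo, hi in windows if lo <= mz <= hi]


windows = swath_windows(400.0, 1200.0)
print(len(windows), windows[:2], windows[-1])
print(windows_containing(424.5, windows))
```

With these settings each cycle consists of 34 fragmentation windows, and a precursor at m/z 424.5 falls into the overlap of the first two windows, so it is fragmented in both.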
DIA-type acquisition is also possible on Orbitrap instruments
and has been shown to increase the throughput of analysis. I am going to have a
much more extensive post about this, so that’s all I’m going to say here.
I have not yet mentioned any drawbacks of DIA. Does it mean
there are none? Certainly not, but as the technique (both the experimental and
the bioinformatics part) is still in active development, I think the jury is still
out. One obvious point should be mentioned though: the complexity of the chimeric
spectra, as described at the start of this post.