DNA sequencing
DNA sequencing is the process of determining the order of nucleotides (As, Ts, Cs, and Gs) in a piece of DNA. The base sequence of DNA carries the information a cell needs to assemble protein and RNA molecules, so knowing the sequence is essential for identifying genes and studying their functions. There are several approaches to DNA sequencing:
- Maxam and Gilbert method
- Chain termination method
- Semiautomated method
- Automated method
- Pyrosequencing
- Whole-genome shotgun sequencing method
- Clone-by-clone sequencing method
- Next-generation sequencing method
Next-generation sequencing techniques are new, large-scale approaches that increase the speed and reduce the cost of DNA sequencing.
Maxam–Gilbert method of DNA sequencing
Maxam–Gilbert sequencing is a method of DNA sequencing developed by Allan Maxam and Walter Gilbert in 1976–1977. It is also known as the chemical cleavage method. It was the first widely adopted method for DNA sequencing and, along with the Sanger dideoxy method, is considered part of the first generation of DNA sequencing methods.
Its main advantage over the early Sanger method was that purified DNA could be used directly for sequencing, which made it well suited to DNA fingerprinting, genetic engineering studies and structural studies.
However, because it scales poorly (only about 400 bp can be sequenced) and relies on hazardous chemicals and radioactive labels, Maxam–Gilbert sequencing is no longer in widespread use.
Brief principle of the method
The first step is DNA extraction.
The 5’ end of the DNA is tagged with 32P so that the DNA molecule can be detected using radioactive techniques.
The DNA strands are separated by denaturation to obtain single-stranded DNA, and each strand is sequenced separately.
The DNA is divided into two portions, I and II.
Portion I is treated with dimethyl sulfate, which methylates guanine and adenine; guanine is methylated more readily than adenine.
The treatment time is adjusted so that only a very few bases are methylated per strand.
Sample I is then divided into two, Ia and Ib.
Ia is heated, which breaks the sugar-phosphate backbone at methylated guanine positions and yields fragments of varying length, each ending at a G (G-only fragments).
Ib is treated with dilute alkali, which breaks the backbone at both methylated A and methylated G and yields fragments of varying length ending at either A or G (A+G fragments).
After electrophoresis on a polyacrylamide gel, a fragment that shows a band in both the G and A+G lanes ends in G; a fragment that shows a band only in the A+G lane ends in A.
Sample II is divided into two, IIa and IIb.
IIa is treated with hydrazine in the presence of buffer and then with piperidine, which causes breakage at T and C (T+C fragments).
IIb is treated with hydrazine in the presence of 2 M NaCl and then with piperidine, which causes breakage at C only (C fragments).
After electrophoresis on a polyacrylamide gel, a fragment that shows a band in both the C and C+T lanes ends in C; a fragment that shows a band only in the C+T lane ends in T.
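The lane-reading rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `read_maxam_gilbert` and the input layout (a dictionary mapping each lane to the set of fragment lengths that show a band) are hypothetical, and real gels need band calling that is not modeled here.

```python
def read_maxam_gilbert(lanes):
    """Infer the base at each fragment length from the four cleavage lanes.

    `lanes` maps a lane name ("G", "A+G", "C", "C+T") to the set of
    fragment lengths that show a band in that lane (hypothetical layout).
    """
    sequence = []
    all_lengths = set().union(*lanes.values())
    # Shortest fragments lie closest to the labeled 5' end, so reading
    # the bands in order of increasing length gives the sequence 5' -> 3'.
    for length in sorted(all_lengths):
        if length in lanes["G"] and length in lanes["A+G"]:
            sequence.append("G")   # band in both purine lanes
        elif length in lanes["A+G"]:
            sequence.append("A")   # band only in the A+G lane
        elif length in lanes["C"] and length in lanes["C+T"]:
            sequence.append("C")   # band in both pyrimidine lanes
        elif length in lanes["C+T"]:
            sequence.append("T")   # band only in the C+T lane
        else:
            sequence.append("N")   # pattern not covered by the rules
    return "".join(sequence)


example_lanes = {"G": {1}, "A+G": {1, 2}, "C": {4}, "C+T": {3, 4}}
print(read_maxam_gilbert(example_lanes))  # GATC
```

Any band pattern that the four rules do not cover is reported as "N" rather than guessed.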
Sanger’s dideoxy method or chain termination method
Sanger sequencing was developed by the British biochemist Fred Sanger and his colleagues in 1977. It is also known as a first-generation DNA sequencing method. The chain termination method is also called dideoxynucleotide sequencing because it uses special nucleotides, the ddNTPs. A ddNTP differs from a normal dNTP in having a hydrogen atom in place of the 3’ hydroxyl group.
Dideoxynucleotides lack a hydroxyl group on the 3’ carbon of the sugar ring. In a regular nucleotide, the 3’ hydroxyl group allows a new nucleotide to be added to an existing chain, thereby extending the DNA molecule. Once a dideoxynucleotide is added to the chain, there is no 3’ hydroxyl available and no further nucleotides can be added. The chain ends with the dideoxynucleotide.
Sanger sequencing involves making many copies of a target DNA region. Briefly, the process is divided into 3 steps:
- DNA extraction. The DNA to be sequenced is denatured to obtain a single-stranded template.
- Chain-termination reaction (similar to PCR). The DNA is divided into four tubes. To each tube, DNA polymerase, a primer and all 4 deoxynucleotides (dCTP, dATP, dTTP, dGTP) are added, along with one of the 4 dideoxynucleotides (ddATP, ddTTP, ddCTP or ddGTP). The dideoxy molecules lack a 3’ OH, so they cannot form a phosphodiester bond with the next nucleotide and replication stops at that position.
- Size separation. The fragments from the four tubes are separated by gel electrophoresis, and the sequence is read from the pattern of bands.
The following ingredients are involved:
- A DNA polymerase enzyme
- A primer, which is a short piece of single-stranded DNA that binds to the template DNA and acts as a "starter" for the polymerase
- The four DNA nucleotides (dATP, dTTP, dCTP, dGTP)
- The template DNA to be sequenced
- Dideoxy, or chain-terminating, versions of all four nucleotides (ddATP, ddTTP, ddCTP, ddGTP), each labeled with a different color of dye.
Sanger sequencing is regarded as the gold standard method for research and diagnosis. It is easy to perform and automate, and it is highly reproducible.
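To make the chain-termination idea concrete, here is a minimal Python sketch of an idealized four-tube reaction. It assumes that termination happens at every possible position and that fragment lengths are counted from the end of the primer. The function names (`synthesized_strand`, `chain_termination_fragments`, `read_gel`) are made up for this illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template):
    """Return the strand the polymerase builds (reverse complement, 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def chain_termination_fragments(template):
    """Map each ddNTP tube to the fragment lengths it would produce.

    In the tube containing ddATP, synthesis can stop wherever an A is
    incorporated, so that tube holds one fragment length per A, and so on.
    """
    new_strand = synthesized_strand(template)
    fragments = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        fragments[base].append(length)
    return fragments


def read_gel(fragments):
    """Read the new strand by sorting all fragments from shortest to longest."""
    by_length = {
        length: base
        for base, lengths in fragments.items()
        for length in lengths
    }
    return "".join(by_length[length] for length in sorted(by_length))


tubes = chain_termination_fragments("ATGC")
print(tubes)            # {'A': [3], 'C': [2], 'G': [1], 'T': [4]}
print(read_gel(tubes))  # GCAT
```

The sequence read from the gel is that of the newly synthesized strand, which is the reverse complement of the template.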
Automated sequencing
Identifying the sequence from the electrophoretic pattern in the manual Sanger method was tedious. Later advances led to the semi-automated Sanger sequencing method, which is Sanger’s method with some minor variations.
Here, instead of 4 different reaction tubes, a single tube is used, so during electrophoresis the DNA runs in a single lane. Fluorescently labeled ddNTPs are used. Capillary electrophoresis separates the DNA molecules by size and is powerful enough to resolve fragments that differ by a single base. The chromatogram generated after capillary electrophoresis shows the output as fluorescent peaks, with each colour representing a particular ddNTP.
In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, each of which emits light at a different wavelength.
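As a rough illustration of how a chromatogram becomes a sequence, the sketch below calls one base per peak by picking the dye channel with the strongest signal. The input layout (one dictionary of base to intensity per peak, ordered by fragment size) and the function name `call_bases` are assumptions for this example. Real base callers also handle noise, peak spacing and quality scores.

```python
def call_bases(peaks):
    """Call one base per peak: the dye channel with the highest intensity.

    `peaks` is a list of dicts mapping a base to its fluorescence intensity,
    ordered from the shortest fragment to the longest (hypothetical layout).
    """
    return "".join(max(peak, key=peak.get) for peak in peaks)


chromatogram = [
    {"A": 0.05, "C": 0.02, "G": 0.91, "T": 0.03},
    {"A": 0.88, "C": 0.04, "G": 0.06, "T": 0.01},
    {"A": 0.02, "C": 0.07, "G": 0.03, "T": 0.93},
]
print(call_bases(chromatogram))  # GAT
```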
Pyrosequencing:
This method was described in 1993 by Bertil Pettersson, Mathias Uhlen and Pål Nyren. Its principle is the detection of the pyrophosphate (PPi) released during nucleotide addition in the chain reaction. The order of the nucleotides is determined from the PPi released each time a nucleotide is joined to the growing strand.
Three enzymes are required in the pyrosequencing method; they work sequentially to detect the PPi:
- DNA polymerase (without exonuclease activity)
- Sulfurylase
- Luciferase
The polymerase adds dNTPs to the single-stranded DNA. If the correct complementary base is added, pyrophosphate is released.
Sulfurylase converts the PPi into ATP (energy) with the help of APS (adenosine 5´ phosphosulfate).
Luciferase then uses the ATP to convert luciferin into oxyluciferin in the presence of oxygen, releasing a photon of light.
So, once the correct nucleotide is added, the enzymatic reaction releases light, which is detected by a photodiode or a photomultiplier tube.
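The light-detection logic can be sketched as a small simulation. The sketch assumes an idealized instrument in which nucleotides are dispensed one at a time in a fixed repeating order and the light signal equals the number of nucleotides incorporated in that dispensation. The names `simulate_pyrogram` and `decode_pyrogram` and the default dispensation order are hypothetical.

```python
from itertools import cycle


def simulate_pyrogram(new_strand, dispensation_order="ACGT"):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    `new_strand` is the strand being synthesized. A signal of 0 means the
    dispensed nucleotide was not the next base, so no PPi (and no light)
    was produced; a run of identical bases gives a proportionally larger signal.
    """
    unknown = set(new_strand) - set(dispensation_order)
    if unknown:
        raise ValueError(f"bases never dispensed: {sorted(unknown)}")
    signals = []
    position = 0
    for nucleotide in cycle(dispensation_order):
        if position >= len(new_strand):
            break
        incorporated = 0
        while position < len(new_strand) and new_strand[position] == nucleotide:
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
    return signals


def decode_pyrogram(signals):
    """Rebuild the sequence: repeat each nucleotide by its signal height."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in signals)


run = simulate_pyrogram("GGAT")
print(run)
# [('A', 0), ('C', 0), ('G', 2), ('T', 0), ('A', 1), ('C', 0), ('G', 0), ('T', 1)]
print(decode_pyrogram(run))  # GGAT
```

In this simplified model the two consecutive Gs produce a single peak of double height, which is how runs of the same base show up in the signal.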
Based on the substrate used, there are two types of pyrosequencing: solid-phase pyrosequencing and liquid-phase pyrosequencing.
The pyrosequencing method is more accurate and faster than Sanger sequencing.
However, it involves more chemical steps and is therefore more complex.
Whole-genome shotgun sequencing:
This technique is also a modification of Sanger’s chain termination method; the shotgun sequencing concept was originally developed by Fred Sanger and his colleagues. It can be used to sequence the entire genome of an organism.
The principle is the same as Sanger’s method, with an additional DNA fragmentation step that makes it possible to read many fragments.
The entire genome of an organism is fragmented, either with endonuclease enzymes or mechanically, and the smaller fragments are sequenced individually.
Computer software then analyses the overlapping fragments and reassembles them to generate the complete sequence of the entire genome.
Steps involved:
- Fragmentation of DNA into pieces of about 2–20 kb.
- Formation of libraries of subfragments: the fragments are ligated into vectors and an entire library is generated.
- Sequencing of the subfragments.
- Assembly of the overlapping fragments into contigs by computer.
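The assembly step can be illustrated with a toy greedy overlap assembler. This is a sketch and not how production assemblers work: it assumes error-free reads from a single strand, ignores repeats, and repeatedly merges the pair of reads with the longest suffix-to-prefix overlap. The names `overlap` and `greedy_assemble` and the `min_length` parameter are made up for this example.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge reads pairwise by best overlap until no overlaps remain.

    Returns the list of contigs left at the end (one contig if every
    read could be chained together).
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_length, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_length:
                        best_length, best_i, best_j = length, i, j
        if best_length == 0:
            break  # remaining reads do not overlap: separate contigs
        merged = reads[best_i] + reads[best_j][best_length:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCGGA"]))  # ['ATGCGTACCGGA']
```

Reads that share no overlap of at least `min_length` bases stay as separate contigs, which mirrors the gaps left when coverage is incomplete.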
The technique is faster and cheaper, and can be used to sequence the whole genome of an organism. It depends on computational analysis, however, and requires substantial computing power.
In 1981, the shotgun sequencing method was used to sequence the cauliflower mosaic virus genome.
Clone by clone sequencing:
The clone-by-clone method can also be used to sequence a whole genome. In the 1990s, the genomes of S. cerevisiae and C. elegans were sequenced using clone-by-clone sequencing, and the technique was also used during the Human Genome Project.
This method is similar to the shotgun sequencing method, but has additional steps.
1. In the first step, instead of smaller fragments, large fragments of DNA are generated and the location of each fragment is recorded through gene mapping. Multiple copies of each fragment are then produced using bacterial artificial chromosomes.
2. In the next step, the copied fragments are further broken into smaller pieces and inserted into vectors.
3. These short fragments are then sequenced as in the shotgun technique, and the overlapping fragments are assembled by computer.
4. In the last step, the data obtained during gene mapping is used to assemble the complete sequence, so that the sequences can be arranged on each chromosome based on their location.
In principle, whole chromosomes can be sequenced without gaps.
The method is more tedious, time-consuming and costly, since more procedures such as mapping, cloning and restriction digestion are involved.
Cloning of telomeres and centromeres is difficult.
Next-generation sequencing (NGS) or high-throughput sequencing
The most recent set of DNA sequencing technologies are collectively referred to as next-generation sequencing. Next-generation sequencing involves amplifying millions of copies of a particular fragment, and the sequences are analyzed by computer programs. Examples include Polony sequencing, massively parallel signature sequencing (MPSS), 454 pyrosequencing, Illumina (Solexa) sequencing, combinatorial probe anchor synthesis (cPAS), SOLiD sequencing, Ion Torrent semiconductor sequencing, DNA nanoball sequencing, Nanopore DNA sequencing, etc.
These techniques use different technologies; however, most share a common set of features:
- Highly parallel: many sequencing reactions take place at the same time
- Micro scale: reactions are tiny and many can be done at once on a chip
- Fast: because reactions are done in parallel, results are ready much faster
- Low-cost: sequencing a genome is cheaper than with Sanger sequencing
Conceptually, next-generation sequencing is like running a very large number of tiny Sanger sequencing reactions in parallel. Thanks to this parallelization and small scale, large quantities of DNA can be sequenced much more quickly and cheaply with next-generation methods.
The NGS process can be divided into 4 different steps:
- Library preparation
- Cluster generation
- DNA sequencing
- Data analysis
1. Library preparation:
During library preparation, the DNA or cDNA is fragmented, for example by restriction digestion, and the smaller fragments are ligated to known DNA sequences (adapters). This process is called adapter ligation, and it produces the library of smaller DNA fragments. Any unbound DNA fragments are washed away with washing buffer. When a transposase fragments the DNA and attaches the adapters in a single step, the process is called tagmentation.
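Library preparation can be pictured with a very small Python sketch: cut the input into fragments and attach a known adapter to each end. The fixed fragment length and the adapter sequences used here are invented for illustration and do not correspond to any real kit. The function name `prepare_library` is also hypothetical.

```python
def prepare_library(dna, fragment_length, adapter_5, adapter_3):
    """Fragment `dna` and flank every fragment with known adapter sequences.

    Real fragmentation gives a range of sizes; a fixed length is used
    here only to keep the example easy to follow.
    """
    fragments = [
        dna[start:start + fragment_length]
        for start in range(0, len(dna), fragment_length)
    ]
    return [adapter_5 + fragment + adapter_3 for fragment in fragments]


# Made-up adapter sequences, for illustration only.
library = prepare_library("ACGTACGTAC", 4, adapter_5="AAT", adapter_3="GGC")
print(library)  # ['AATACGTGGC', 'AATACGTGGC', 'AATACGGC']
```

Because every library molecule now starts and ends with a known sequence, the same primers and surface-bound oligos can be used for all fragments.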
2. Cluster generation:
Short oligonucleotides are immobilized on a solid surface, and the library fragments bind to them through complementary base pairing with their adapter sequences.
The library of fragmented DNA bound to the immobilized oligos is then used to generate clusters of each DNA sequence through bridge amplification.
In bridge amplification, a bound DNA fragment bends over and binds to a neighbouring oligo, creating a bridge. A primer binds to this DNA and the strand is copied, so that two single-stranded DNA molecules are generated by each round of bridge amplification.
3. Sequencing:
A polymerase adds nucleotides to the strands produced by bridge amplification, and the signal from each addition is recorded. This generates a large number of sequence reads for the DNA.
4. Data analysis:
The reads generated by sequencing are aligned to a reference genome sequence. This helps to identify any insertion, deletion or other variation in the sequence.
(https://geneticeducation.co.in)
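The data-analysis step can be illustrated with a deliberately simple aligner. The sketch below slides a read along a reference, keeps the placement with the fewest mismatches and reports the differing positions. It is ungapped, so it only finds substitutions; detecting insertions and deletions needs a gapped alignment algorithm, which is not shown. The function name `align_read` and the example sequences are made up.

```python
def align_read(reference, read):
    """Find the best ungapped placement of `read` on `reference`.

    Returns (offset, mismatches), where each mismatch is a tuple of
    (reference position, reference base, read base). Returns (None, None)
    if the read is longer than the reference.
    """
    best_offset, best_mismatches = None, None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = [
            (offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


reference = "TTGACCGTAAGCTA"
read = "CCGTTAGC"  # matches reference[4:12] except for one base
print(align_read(reference, read))  # (4, [(8, 'A', 'T')])
```

Here the read maps to position 4 of the reference, and the single mismatch at position 8 (A in the reference, T in the read) is the kind of variation the alignment step is meant to reveal.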
NGS is currently the most advanced and fastest approach to DNA sequencing, although, like any technique, it is not free of errors.
There are several other sequencing methods as well:
- Single-molecule real-time (RNAP) sequencing
- Illumina (Solexa) sequencing
- Polony sequencing
- DNA nanoball sequencing
- SOLiD sequencing
- Single-molecule SMRT(TM) sequencing
- Massively parallel signature sequencing (MPSS)
- High throughput sequencing
- Heliscope (TM) single-molecule sequencing
Applications of DNA sequencing
Used for the identification of genes or mutations responsible for hereditary disorders.
Used for parental verification, criminal investigation and identification of individuals using available samples such as hair, nail, blood or tissue.
Used for the identification of GMO species and any minor variations in the plant genome.
Used to construct whole chromosomal maps, restriction digestion maps and genome maps.
Used to identify open reading frames, non-coding regions and protein-coding DNA sequences.
Used in the identification and detection of exons/introns, repeat sequences and tandem repeats.
Used in gene manipulation and gene editing.
Used to determine new variations in nature.
Used in metagenomic studies.
Used for microbial identification and the study of new bacterial species.
Used for evolutionary studies and for generating evolutionary maps.
Used for studying asymptomatic high-risk populations, prior to the occurrence of disease.
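As one example of these applications, the sketch below scans a sequence for simple open reading frames: an ATG start codon followed, in the same reading frame, by a stop codon. It only looks at the three forward frames and uses a hypothetical `min_codons` threshold. Real gene finders also check the reverse strand and use statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for simple ORFs on the forward strand.

    `start` is the index of the A in ATG and `end` is one past the stop
    codon. `min_codons` is the minimum number of codons before the stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                   # close it at the stop codon
    return orfs


print(find_orfs("CCATGAAATTTTAGGC"))  # [(2, 14, 'ATGAAATTTTAG')]
```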
Limitations of DNA sequencing
DNA sequencing relies on computer algorithms for data processing, so high-speed computing resources are required.
It is difficult to sequence regions such as tandem repeats, repetitive DNA, fragmented genes and other duplicated regions.
Errors can occur during sample pre-processing, and these result in economic losses.