DNA sequencing

DNA sequencing is the process of determining the order of nucleotides (As, Ts, Cs, and Gs) in a piece of DNA. The base sequence of DNA carries the information a cell needs to assemble protein and RNA molecules, so knowing the sequence is very important for identifying genes and studying their functions. There are several approaches to DNA sequencing:

Maxam and Gilbert method
Chain termination method
Semiautomated method
Automated method
Pyrosequencing
The whole-genome shotgun sequencing method
Clone-by-clone sequencing method
Next-generation sequencing method

Next-generation sequencing techniques are new, large-scale approaches that increase the speed and reduce the cost of DNA sequencing.

MAXAM-GILBERT method of DNA sequencing

Maxam–Gilbert sequencing is a method of DNA sequencing developed by Allan Maxam and Walter Gilbert in 1976–1977. It is also known as the chemical cleavage method. Maxam–Gilbert sequencing was the first widely adopted method for DNA sequencing and, along with the Sanger dideoxy method, is considered part of the first generation of DNA sequencing methods.

Its main advantage over the original Sanger method was that purified DNA could be used directly for sequencing, without first being cloned. This made it highly suitable for DNA fingerprinting, genetic engineering studies and structural studies.

However, the method scales poorly (only about 400 bp can be sequenced) and involves hazardous chemicals and radioactive labels, so Maxam–Gilbert sequencing is no longer in widespread use.

Brief principle of the method

The first step is DNA extraction.

The 5’ end of the DNA is labelled with ³²P so that the DNA fragments can be detected by autoradiography.

The DNA strands are separated by denaturation to obtain single-stranded DNA, and each strand is sequenced separately.

The DNA is divided into two portions, I and II.

Portion I is treated with dimethyl sulfate, which methylates guanine and adenine; guanine is methylated more readily than adenine.

The treatment time is adjusted so that only a very few bases are methylated per strand.

Sample I is then divided into two, Ia and Ib.

Ia is heated, which breaks the sugar-phosphate backbone at methylated guanine positions. This yields fragments of varying length that each end at a G position (G-only fragments).

Ib is treated with dilute alkali, which breaks the backbone at both methylated A and methylated G. This yields fragments of varying length that end at either an A or a G position (A+G fragments).

After electrophoresis on a polyacrylamide gel, a fragment that gives a band in both the G and A+G lanes ends at G; a fragment that gives a band only in the A+G lane ends at A.

Sample II is divided into two, IIa and IIb.

IIa is treated with hydrazine in the presence of buffer and then with piperidine, which causes breakage at T and C (C+T fragments).

IIb is treated with hydrazine in the presence of 2 M NaCl and then with piperidine, which causes breakage at C only (C fragments).

After electrophoresis on a polyacrylamide gel, a fragment that gives a band in both the C and C+T lanes ends at C; a fragment that gives a band only in the C+T lane ends at T.

Both strands must be sequenced to obtain an error-free sequence of the DNA molecule.
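The lane-reading logic described above can be illustrated with a small Python sketch. It is an idealized model rather than a laboratory protocol: the names `cleavage_lanes` and `read_gel` and the example sequence are assumptions made for illustration, every reaction is treated as perfectly base-specific, and a band is assumed to be visible for every cleavage position.

```python
# Which bases each of the four chemical reactions cleaves at (idealized).
REACTIONS = {"G": "G", "A+G": "AG", "C": "C", "C+T": "CT"}


def cleavage_lanes(seq):
    """Return {lane name: set of cleaved positions}, counted from the labelled 5' end.

    Cleavage at position p leaves a labelled fragment one nucleotide shorter,
    so sorting bands by position is the same as sorting them by fragment size.
    """
    return {
        name: {pos for pos, base in enumerate(seq, start=1) if base in targets}
        for name, targets in REACTIONS.items()
    }


def read_gel(lanes, length):
    """Read the gel from the shortest fragment to the longest."""
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:        # band in both G and A+G lanes
            calls.append("G")
        elif pos in lanes["A+G"]:    # band in the A+G lane only
            calls.append("A")
        elif pos in lanes["C"]:      # band in both C and C+T lanes
            calls.append("C")
        elif pos in lanes["C+T"]:    # band in the C+T lane only
            calls.append("T")
        else:
            calls.append("N")        # no band: base cannot be called
    return "".join(calls)


example = "GATTCAGC"  # hypothetical sequence
lanes = cleavage_lanes(example)
print(lanes["A+G"])
print(read_gel(lanes, len(example)))
```

In this model the sequence read back from the four lanes is the same as the input, which is the reason the G and C lanes are run alongside the A+G and C+T lanes.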

(https://geneticeducation.co.in)

Sanger’s dideoxy method or chain termination method

Sanger sequencing was developed by the British biochemist Fred Sanger and his colleagues in 1977. It is also regarded as a first-generation DNA sequencing method. The chain termination method is also called dideoxynucleotide sequencing because it uses special nucleotides, the ddNTPs. A ddNTP differs from a normal dNTP in having a hydrogen atom in place of the hydroxyl group on the 3’ carbon.

Dideoxy nucleotides lack a hydroxyl group on the 3’ carbon of the sugar ring. In a regular nucleotide, the 3’ hydroxyl group allows a new nucleotide to be added to an existing chain, thereby extending the DNA molecule. Once a dideoxy nucleotide is added to the chain, there is no 3’ hydroxyl available and no further nucleotides can be added. The chain ends with the dideoxy nucleotide.

Sanger sequencing involves making many copies of a target DNA region. Briefly, the process of Sanger sequencing is divided into 3 steps:

DNA extraction. The DNA to be sequenced is denatured to obtain a single-stranded template.
PCR amplification. The DNA is divided into four tubes. To each tube are added DNA polymerase, a primer, all four deoxynucleotides (dCTP, dATP, dTTP, dGTP) and one of the four dideoxynucleotides (ddATP, ddTTP, ddCTP or ddGTP). The dideoxy molecules lack a 3’ OH group, so they cannot form a phosphodiester bond with the next nucleotide, and replication stops at that position.

The following ingredients are involved:

· A DNA polymerase enzyme

· A primer, which is a short piece of single-stranded DNA that binds to the template DNA and acts as a "starter" for the polymerase

· The four DNA nucleotides (dATP, dTTP, dCTP, dGTP)

· The template DNA to be sequenced

· Dideoxy, or chain-terminating, versions of all four nucleotides (ddATP, ddTTP, ddCTP, ddGTP), each labeled with a different color of dye.

The third step is identification of the amplified fragments using autoradiography, PAGE, or capillary gel electrophoresis. The DNA from the four tubes is denatured and electrophoresed in four lanes, and the location of each base is determined from the electrophoretic pattern. Because shorter fragments migrate further, the relative positions of the bands across the four lanes are read from the shortest fragment to the longest to give the DNA sequence.
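The four-tube reaction and the gel read-out can be sketched in Python as follows. This is a simplified model under stated assumptions: the names `sanger_fragments` and `read_lanes` are hypothetical, the template is written in the order the polymerase reads it (3’ to 5’), the primer length is ignored, and every possible termination product is assumed to be present.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template_3to5):
    """Return {ddNTP base: set of fragment lengths} for the four tubes.

    In the tube containing ddATP, synthesis can stop wherever an A is added
    to the new strand, and likewise for the other three tubes.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    tubes = {base: set() for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].add(length)
    return tubes


def read_lanes(tubes):
    """Read the bands from the shortest fragment to the longest."""
    by_length = {
        length: base for base, lengths in tubes.items() for length in lengths
    }
    return "".join(by_length[length] for length in sorted(by_length))


template = "TACGGT"  # hypothetical template, written 3' to 5'
tubes = sanger_fragments(template)
print(tubes["A"])          # lengths of fragments ending in ddA
print(read_lanes(tubes))   # sequence of the newly synthesized strand
```

Note that the sequence read from the gel is that of the newly synthesized strand, which is complementary to the template.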

Sanger sequencing is considered the gold standard method for research and diagnosis. It is easy to perform and automate, and it is highly reproducible.

Automated sequencing

Reading the sequence from the electrophoretic pattern in the manual Sanger method was tedious. Later advances led to semi-automated and automated Sanger sequencing, which is Sanger’s method with some minor variations.

Here, instead of four different reaction tubes, a single tube is used, so during electrophoresis the DNA runs in a single lane. Fluorescently labelled ddNTPs are used. Capillary electrophoresis separates the DNA molecules on the basis of size and is powerful enough to resolve fragments that differ in length by a single base. The chromatogram generated after capillary electrophoresis shows fluorescent peaks, each colour representing a particular ddNTP.

In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a fluorescent dye, and each dye emits light at a different wavelength.
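Reading a chromatogram amounts to asking, at each peak position, which dye channel is brightest. A minimal Python sketch of that idea is shown below. The function name `call_bases` and the intensity values are invented for illustration; real base-calling software also has to handle noise, peak spacing and overlapping peaks, which this sketch does not attempt.

```python
def call_bases(traces):
    """Call one base per peak position from four fluorescence channels.

    traces maps each base to a list of intensities, one value per position.
    The base whose channel is brightest at a position is called there.
    """
    n_positions = len(next(iter(traces.values())))
    calls = []
    for i in range(n_positions):
        brightest = max(traces, key=lambda base: traces[base][i])
        calls.append(brightest)
    return "".join(calls)


# Hypothetical peak heights for a four-base read.
traces = {
    "A": [90, 5, 3, 8],
    "C": [4, 80, 6, 2],
    "G": [2, 7, 85, 5],
    "T": [6, 3, 4, 95],
}
print(call_bases(traces))
```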

Pyrosequencing:

This was described in 1993 by Bertil Pettersson, Mathias Uhlen and Pål Nyren. The principle of the method is the detection of the pyrophosphate (PPi) released each time a nucleotide is added to the growing chain. The order of the nucleotides is determined from the PPi released as each nucleotide is joined to the one before it.

Three enzymes are required in the pyrosequencing method, and they work sequentially to detect the PPi. The three enzymes, in the order in which they act, are:

DNA polymerase (without exonuclease activity)
Sulfurylase
Luciferase

Polymerase activity is monitored in real time by detecting the released pyrophosphate:

DNA polymerase adds dNTPs to the single-stranded DNA template. If the nucleotide supplied is complementary to the next template base, it is incorporated and pyrophosphate is released.

Sulfurylase converts the PPi into ATP with the help of APS (adenosine 5´ phosphosulfate).

Luciferase then uses the ATP to convert luciferin into oxyluciferin in the presence of oxygen, and light is released.

So, each time the correct nucleotide is added, the enzymatic reaction releases light, which is detected by a photodiode or a photomultiplier tube.
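The light-based read-out can be modelled with a short Python sketch. Several things here are assumptions rather than statements from the text above: the nucleotides are supplied one at a time in a fixed, repeating order (`dispensation_order`), the light signal is taken to be proportional to the number of bases incorporated in that step, and the function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pyrogram(template_3to5, dispensation_order="ACGT", max_cycles=20):
    """Return a list of (nucleotide, light_units), one entry per dispensation."""
    pos = 0
    signals = []
    for _ in range(max_cycles):
        for nucleotide in dispensation_order:
            if pos >= len(template_3to5):
                return signals
            incorporated = 0
            # The polymerase keeps adding this nucleotide while it pairs with
            # the next template base; each addition releases one PPi.
            while (pos < len(template_3to5)
                   and COMPLEMENT[template_3to5[pos]] == nucleotide):
                incorporated += 1
                pos += 1
            signals.append((nucleotide, incorporated))
    return signals


def read_pyrogram(signals):
    """Convert the light signals back into the synthesized sequence."""
    return "".join(nucleotide * units for nucleotide, units in signals)


signals = pyrogram("TTGCA")  # hypothetical template, written 3' to 5'
print(signals)
print(read_pyrogram(signals))
```

A dispensation that gives no light means the supplied nucleotide was not complementary to the next template base, and a signal of two units corresponds to two identical bases in a row.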

Based on the substrate used, there are two types of pyrosequencing: solid-phase and liquid-phase pyrosequencing.

Because the sequence is read in real time without gel electrophoresis, pyrosequencing can be faster than Sanger sequencing for short reads.

However, this method involves more chemical steps and is therefore more complex.

Whole-genome shotgun sequencing:

This technique is also a modification of Sanger’s chain termination method. The shotgun sequencing concept was originally developed by F. Sanger and his colleagues, and it can be used to sequence the entire genome of an organism.

The principle is the same as Sanger’s method, with an additional DNA fragmentation step that allows multiple fragments to be read.

The entire genome of an organism is fragmented, either with endonuclease enzymes or mechanically, and the smaller fragments are sequenced individually.

Computer software then analyses the overlapping fragments and reassembles them to generate the complete sequence of the entire genome.

Steps involved:

Fragmentation of DNA into pieces of about 2–20 kb.
Formation of libraries of subfragments: the fragments are ligated into vectors and an entire library is generated.
Sequencing of the subfragments.
Assembly of the overlapping fragments into contigs by computer.
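The computational assembly step can be illustrated with a toy greedy assembler in Python. This is only a sketch of the overlap idea, not how production assemblers work: the names `overlap` and `greedy_assemble`, the minimum overlap of 3 bases and the example reads are all assumptions, and the sketch ignores sequencing errors, reverse-complement reads and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps: the contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs


reads = ["ATGCGT", "CGTACC", "ACCGGA"]  # hypothetical overlapping reads
print(greedy_assemble(reads))
```

If some reads share no overlap with the rest, the function returns more than one contig, which corresponds to a gap in the assembled sequence.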

The technique is fast and relatively cheap, and can be used to sequence the whole genome of an organism. It depends heavily on computational analysis, so substantial computing power is required for assembly.

In 1981, the shotgun sequencing method was used to sequence the cauliflower mosaic virus genome.

Clone by clone sequencing:

The clone-by-clone method can also be used to sequence a whole genome. In the 1990s, the genomes of S. cerevisiae (completed in 1996) and C. elegans (completed in 1998) were sequenced using clone-by-clone sequencing, and the technique was also used during the Human Genome Project.

This method is similar to the shotgun sequencing method, but has additional steps.

1. In the first step, instead of small fragments, large DNA fragments are generated and the location of each fragment is recorded through gene mapping. Multiple copies of each fragment are generated using bacterial artificial chromosomes.

2. In the next step, these copied fragments are further broken into smaller pieces and inserted into vectors.

3. The short fragments are then sequenced as in the shotgun technique, and the overlapping fragments are assembled by computer.

4. In the last step, the data obtained during gene mapping is used to assemble the complete sequence, so that the sequences can be arranged on each chromosome according to their location.
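The last step, placing the assembled clone sequences according to their map positions, can be sketched in Python as below. The function name `order_and_join`, the coordinate convention (0-based start position of each clone on the chromosome) and the example data are assumptions for illustration; the sketch also trusts the map completely and does not check that overlapping regions actually agree.

```python
def order_and_join(clones):
    """Join clone sequences using their mapped start positions.

    clones is a list of (map_start, sequence) pairs, where map_start is the
    0-based position of the clone's first base on the chromosome.
    Regions not covered by any clone are filled with 'N'.
    """
    chromosome = ""
    for start, seq in sorted(clones):
        already_covered = len(chromosome) - start
        if already_covered < 0:
            chromosome += "N" * (-already_covered)  # gap between clones
            already_covered = 0
        chromosome += seq[already_covered:]         # append only the new part
    return chromosome


# Hypothetical clones, given out of order with their map positions.
clones = [(4, "CGTACCGG"), (0, "ATGCCGTA"), (10, "GGATTC")]
print(order_and_join(clones))
```

Because each clone carries its own map position, the clones can be sequenced in any order and still be placed correctly, which is what distinguishes this approach from whole-genome shotgun assembly.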

Whole chromosomes can be sequenced without any gaps.

However, the method is more tedious, time-consuming and costly, since additional procedures such as mapping, cloning, and restriction digestion are involved.

Cloning of telomeres and centromeres is difficult.

Next-generation sequencing (NGS) or High-throughput sequencing

The most recent DNA sequencing technologies are collectively referred to as next-generation sequencing. Next-generation sequencing typically involves amplifying each fragment into millions of copies, and the resulting sequences are analyzed by computer programs. There are a variety of next-generation sequencing techniques. Examples include Polony sequencing, Massively parallel signature sequencing (MPSS), 454 pyrosequencing, Illumina (Solexa) sequencing, Combinatorial probe anchor synthesis (cPAS), SOLiD sequencing, Ion Torrent semiconductor sequencing, DNA nanoball sequencing, Nanopore DNA sequencing, etc.

These techniques use different technologies; however, most share a common set of features:

Highly parallel: many sequencing reactions take place at the same time
Micro scale: reactions are tiny and many can be done at once on a chip
Fast: because reactions are done in parallel, results are ready much faster
Low-cost: sequencing a genome is cheaper than with Sanger sequencing

Conceptually, next-generation sequencing is kind of like running a very large number of tiny Sanger sequencing reactions in parallel. Thanks to this parallelization and small scale, large quantities of DNA can be sequenced much more quickly and cheaply with next-generation methods.

The NGS process can be divided into 4 different steps:

Library preparation
Cluster generation
DNA sequencing
Data analysis

1. Library preparation:

During library preparation, the DNA or cDNA sample is fragmented (for example by restriction digestion) and the smaller DNA fragments are ligated to known DNA sequences (adapters). This step is called adapter ligation, and it yields the library of smaller DNA fragments. Any unbound DNA fragments are washed away with washing buffer. When fragmentation and adapter tagging are carried out in a single step by a transposase enzyme, the process is called tagmentation.

2. Cluster generation:

Short oligonucleotide sequences are immobilized on a solid surface. The library fragments attach to them through complementary binding by their adapter sequences.

This library of fragmented DNA, bound to the immobilized oligos on the solid surface, is used to generate clusters of each DNA sequence through bridge amplification.

In bridge amplification, a bound DNA fragment bends over and binds to a neighbouring oligo, creating a bridge. A primer binds to this DNA sequence and the polymerase copies it, so that after denaturation two single-stranded DNA molecules are attached to the surface. Repeating this cycle builds up a cluster of identical copies.

(https://geneticeducation.co.in)

3. Sequencing:

The polymerase enzyme adds nucleotides to the strands produced by bridge amplification, and the signal from each addition is recorded every time. This generates a large number of sequence reads for the DNA sample.

4. Data analysis:

The reads generated by sequencing are aligned to a reference genome sequence. This helps to identify any insertion, deletion or other variation in the sequence.
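As a rough illustration of comparing a read with a reference, the Python sketch below uses `difflib.SequenceMatcher` from the standard library. This is a general-purpose text-comparison tool used here as a stand-in, not a real read aligner, and the function name `find_variants` and the example sequences are assumptions. It also assumes the read already covers the same region as the reference.

```python
from difflib import SequenceMatcher


def find_variants(reference, read):
    """List the differences between a read and the reference.

    Each entry is (kind, reference_position, reference_bases, read_bases),
    where kind is 'replace', 'delete' or 'insert' and positions are 0-based.
    """
    matcher = SequenceMatcher(None, reference, read, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, i1, reference[i1:i2], read[j1:j2]))
    return variants


reference = "ACGTTAGCCA"  # hypothetical reference region
print(find_variants(reference, "ACGTTCGCCA"))  # read with one substituted base
print(find_variants(reference, "ACGTTGCCA"))   # read with one deleted base
```

Dedicated alignment software additionally has to find where in the genome each read belongs and to weigh base-quality scores, neither of which is modelled here.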

(https://geneticeducation.co.in)

NGS is among the most advanced, fast and accurate techniques available for DNA sequencing.

There are several other sequencing methods as well:

Single-molecule real-time (RNAP) sequencing
Illumina (Solexa) sequencing
Polony sequencing
DNA nanoball sequencing
SOLiD sequencing
Single-molecule SMRT(TM) sequencing
Massively parallel signature sequencing (MPSS)
High throughput sequencing
HeliScope (TM) single-molecule sequencing

Applications of DNA sequencing

Used for the identification of genes or mutations responsible for hereditary disorders.

Used for parentage verification, criminal investigation and identification of individuals using available samples such as hair, nail, blood or tissue.

Identification of GMO species and any minor variations in the plant genome.

Used to construct whole chromosomal maps, restriction digestion maps, and genome maps.

Open reading frames, non-open reading frames and protein-coding DNA sequences can be identified.

Used in exon/ intron, repeat sequence and tandem repeat identification and detection.

Used in gene manipulation and gene editing

New variations in nature can be determined through sequencing.

Used in metagenomic studies

For microbial identification and the study of new bacterial species.

For evolutionary studies and for generating evolutionary maps.

For studying asymptomatic high-risk populations before the occurrence of disease.

Limitations of DNA sequencing

DNA sequencing relies on computer algorithms to process the data, so high-performance computing resources are required.

It is difficult to sequence regions such as tandem repeats, repetitive DNA, fragmented genes and other duplicated regions.

Errors during sample pre-processing can occur and result in economic losses.

Dhanus Micro Notes

Monday, December 6, 2021
