DNA sequencing: January 2008

DNA sequencing encompasses biochemical methods for determining the order of the nucleotide bases (adenine, guanine, cytosine, and thymine) in a DNA molecule. The DNA sequence constitutes the heritable genetic information in nuclei, plasmids, mitochondria, and chloroplasts that forms the basis for the developmental programs of all living organisms. Determining DNA sequences is therefore useful both in basic research into fundamental biological processes and in applied fields such as diagnostics and forensics. The advent of DNA sequencing has significantly accelerated biological research and discovery. The speed attainable with modern DNA sequencing technology was instrumental in the large-scale sequencing of the human genome by the Human Genome Project. Related projects, often carried out through scientific collaborations spanning several continents, have generated the complete DNA sequences of many animal, plant, and microbial genomes.

For thirty years, a large proportion of DNA sequencing has been carried out with the chain-termination method developed by Frederick Sanger and coworkers in 1977. Before rapid DNA sequencing methods were developed in the mid-1970s by Sanger in England and by Gilbert and coworkers at Harvard, a number of laborious methods were used. For instance, in 1973 Gilbert and Maxam reported a sequence of 24 base pairs using a method known as wandering-spot analysis. Notably, RNA sequencing, which for technical reasons was easier to perform than DNA sequencing, could be considered one of the earliest forms of nucleotide sequencing. The major landmark of RNA sequencing from the pre-recombinant-DNA era is the complete genome sequence of bacteriophage MS2, determined and published by Walter Fiers and coworkers.

Chain-termination methods

While the chemical sequencing method of Maxam and Gilbert and the plus-minus method of Sanger and Coulson were orders of magnitude faster than previous methods, the chain-terminator method developed by Sanger was even more efficient and rapidly became the method of choice. The Maxam-Gilbert technique requires highly toxic chemicals and large amounts of radiolabeled DNA, whereas the chain-terminator method uses fewer toxic chemicals and less radioactivity. The key principle of the Sanger method is the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.

The classical chain-termination or Sanger method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, radioactively or fluorescently labeled nucleotides, and modified nucleotides that terminate DNA strand elongation. The DNA sample is divided into four separate sequencing reactions, each containing the four standard deoxynucleotides (dATP, dGTP, dCTP, and dTTP) and the DNA polymerase. To each reaction, only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP) is added. These dideoxynucleotides are the chain-terminating nucleotides: they lack the 3'-OH group required to form a phosphodiester bond with the next nucleotide during strand elongation. Incorporation of a dideoxynucleotide into the nascent (elongating) DNA strand therefore terminates extension, producing DNA fragments of varying length. The dideoxynucleotides are added at a lower concentration than the standard deoxynucleotides, so that most strands are extended far enough for sequence analysis before a terminator is incorporated. The newly synthesized, labeled DNA fragments are heat-denatured and separated by size, with single-nucleotide resolution, by electrophoresis on a denaturing polyacrylamide-urea gel.
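The four reactions can be modelled in a few lines. The sketch below is a simplified, hypothetical model rather than a description of any real protocol: it takes a template region written 5' to 3', assumes the primer anneals just 3' of that region, derives the newly synthesized strand as the reverse complement, and lists for each ddNTP lane the fragment lengths (counted from the end of the primer) at which termination can occur. It also assumes, for illustration only, a single constant probability `p_term` that a chain terminates at each site where the lane's ddNTP can be incorporated, so the n-th band in a lane has relative abundance p(1 - p)^(n-1). The names `synthesized_strand`, `sanger_lanes`, and `p_term` are illustrative.

```python
from typing import Dict, List, Tuple

# Watson-Crick pairing used to derive the newly synthesized strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_5to3: str) -> str:
    """Return the strand the polymerase builds (5'->3') on this template.

    Assumes the primer anneals just 3' of the given template region, so the
    new strand is the reverse complement of the template.
    """
    return "".join(COMPLEMENT[b] for b in reversed(template_5to3.upper()))


def sanger_lanes(template_5to3: str,
                 p_term: float = 0.05) -> Dict[str, List[Tuple[int, float]]]:
    """Model the four ddNTP reactions (lanes A, T, G, C).

    For each lane, return (fragment_length, relative_abundance) pairs.
    fragment_length counts nucleotides added after the primer;
    relative_abundance is the fraction of chains stopping at that band under
    the hypothetical constant per-site termination probability p_term.
    """
    new_strand = synthesized_strand(template_5to3)
    lanes: Dict[str, List[Tuple[int, float]]] = {b: [] for b in "ATGC"}
    seen = {b: 0 for b in "ATGC"}  # terminable sites already passed, per lane
    for length, base in enumerate(new_strand, start=1):
        # A chain reaching this site has escaped every earlier site of the
        # same base, each with probability (1 - p_term).
        abundance = p_term * (1 - p_term) ** seen[base]
        lanes[base].append((length, abundance))
        seen[base] += 1
    return lanes
```

For the template `ATGCA` the new strand is `TGCAT`, so lane T shows bands at lengths 1 and 5 (the second one weaker), while lanes G, C, and A each show one band. Lowering `p_term` flattens the fall-off, which mirrors why ddNTPs are kept at a low concentration.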
Each of the four DNA synthesis reactions is run in its own lane (lanes A, T, G, C); the DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image. In the image on the right, X-ray film was exposed to the gel, and the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment produced by chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The terminal base of each fragment is identified by which dideoxynucleotide was added to the reaction that produced that band. The relative positions of the bands across the four lanes are then used to read the DNA sequence from bottom (shortest fragments) to top (longest fragments), as indicated. The sequence read this way is that of the newly synthesized strand, which is complementary to the template.

There are some technical variations of chain-termination sequencing. In one method, the DNA fragments are tagged with nucleotides containing radioactive phosphorus. Alternatively, a primer labeled at the 5' end with a fluorescent dye is used for tagging. Four separate reactions are still required, but dye-labeled DNA fragments can be read with an optical system, allowing faster, more economical analysis and automation. This approach is known as 'dye-primer sequencing'. The later development by Leroy Hood and coworkers of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.

The different chain-termination methods have greatly reduced the amount of work and planning needed for DNA sequencing. For example, the chain-termination-based "Sequenase" kit from USB Biochemicals contains most of the reagents needed for sequencing, prealiquoted and ready to use. Some problems can occur with the Sanger method, such as non-specific binding of the primer to the DNA, which affects accurate readout of the DNA sequence.
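Reading the gel from bottom to top amounts to sorting every band by fragment length and reporting which lane it sits in. The sketch below assumes the bands have already been converted to integer fragment lengths counted from the end of the primer; `read_ladder` and `template_from_read` are hypothetical helper names. A length with no band in any lane is reported as `N`, and two lanes showing a band at the same length are treated as an error.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_ladder(bands: Dict[str, List[int]]) -> str:
    """Read a four-lane sequencing gel from bottom (shortest) to top.

    `bands` maps each lane (the ddNTP base) to the fragment lengths seen in
    that lane. Returns the newly synthesized strand, 5'->3'.
    """
    by_length: Dict[int, str] = {}
    for base, lengths in bands.items():
        for length in lengths:
            if length in by_length:
                raise ValueError(
                    f"lanes {by_length[length]} and {base} both have a band "
                    f"at length {length}")
            by_length[length] = base
    if not by_length:
        return ""
    # Shortest fragments run farthest, so walk lengths upward; a length with
    # no band in any lane is an unreadable position.
    return "".join(by_length.get(n, "N") for n in range(1, max(by_length) + 1))


def template_from_read(read_5to3: str) -> str:
    """The read is complementary to the template; recover the template 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(read_5to3))
```

For bands `{"T": [1, 5], "G": [2], "C": [3], "A": [4]}`, `read_ladder` gives `TGCAT`, and `template_from_read` turns that back into the template `ATGCA`.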
In addition, secondary structures within the DNA template, or contaminating RNA randomly priming on the DNA template, can reduce the fidelity of the obtained sequence. Other contaminants affecting the reaction may include extraneous DNA or inhibitors of the DNA polymerase.

Dye-terminator sequencing

An alternative to primer labelling is labelling of the chain terminators, a method commonly called 'dye-terminator sequencing'. Its major advantage is that sequencing can be performed in a single reaction, rather than the four reactions required by the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, each fluorescing at a different wavelength. This method is attractive because of its greater expediency and speed, and it is now the mainstay of automated sequencing with computer-controlled sequence analyzers (see below). Its potential limitations include dye effects: differences in how efficiently the dye-labelled chain terminators are incorporated into the DNA fragments result in unequal peak heights and shapes in the electronic DNA sequence trace chromatogram after capillary electrophoresis (see figure to the right). This problem has largely been overcome by new DNA polymerase enzyme systems and dyes that minimize incorporation variability, and by methods for eliminating "dye blobs", artifacts in DNA sequence traces caused by certain chemical characteristics of the dyes. The dye-terminator method, together with automated high-throughput DNA sequence analyzers, is now used for the vast majority of sequencing projects, as it is both easier to perform and lower in cost than most previous sequencing methods.
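Because all four terminators are read in one lane, each peak in the trace must be assigned to a base by comparing the four dye channels. The sketch below is a deliberately naive base caller, not how any particular instrument's software works: it assumes the trace has already been segmented into peaks, each given as a dict of four channel intensities, and it uses hypothetical channel names `dye1` to `dye4`. When the strongest channel is not clearly above the runner-up (for example because of unequal dye incorporation), it calls `N`; the threshold `min_ratio` is likewise an assumption.

```python
from typing import Dict, List

# Hypothetical mapping from dye channel to the ddNTP it labels; real
# instruments use their own dye sets and channel names.
DYE_TO_BASE = {"dye1": "A", "dye2": "C", "dye3": "G", "dye4": "T"}


def call_bases(peaks: List[Dict[str, float]], min_ratio: float = 2.0) -> str:
    """Call one base per electrophoresis peak from four-colour intensities.

    The base is the channel with the strongest signal, unless that signal is
    not at least `min_ratio` times the second strongest, in which case the
    position is ambiguous and called 'N'. Each peak needs >= 2 channels.
    """
    calls = []
    for peak in peaks:
        ranked = sorted(peak.items(), key=lambda kv: kv[1], reverse=True)
        (best_dye, best), (_, second) = ranked[0], ranked[1]
        if best <= 0 or (second > 0 and best / second < min_ratio):
            calls.append("N")  # no signal, or two dyes too close to separate
        else:
            calls.append(DYE_TO_BASE[best_dye])
    return "".join(calls)
```

With peaks dominated by `dye1` and `dye3`, followed by one where `dye1` and `dye2` are nearly equal, the result is `AGN`.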
Automation and sample preparation

Modern automated DNA sequencing instruments (DNA sequencers) can sequence up to 384 fluorescently labelled samples in a single batch (run) and perform as many as 24 runs a day. However, automated DNA sequencers carry out only DNA size separation by capillary electrophoresis, detection and recording of dye fluorescence, and data output as fluorescent peak trace chromatograms. The sequencing reactions themselves (run by thermocycling), cleanup, and re-suspension in a buffer solution before loading onto the sequencer are performed separately.
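The throughput figures quoted above translate into a simple upper bound on raw output per instrument. The sketch assumes every run is full and uses an average read length that the caller picks from the 300-1000 nucleotide range given in this post; `daily_capacity` is a hypothetical helper, and the bound ignores the separate reaction and cleanup steps.

```python
from typing import Tuple


def daily_capacity(samples_per_run: int = 384, runs_per_day: int = 24,
                   read_length: int = 500) -> Tuple[int, int]:
    """Return (sequencing reactions per day, raw bases per day).

    Defaults are the figures quoted for modern sequencers (384 samples per
    run, up to 24 runs a day); read_length is an assumed average read.
    """
    reactions = samples_per_run * runs_per_day
    return reactions, reactions * read_length


# 9,216 reactions a day: roughly 2.8 million bases at 300 nt per read,
# or about 9.2 million bases at 1000 nt per read.
for length in (300, 1000):
    print(daily_capacity(read_length=length))
```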

Current methods can directly sequence only relatively short DNA fragments (300-1000 nucleotides long) in a single reaction. The main obstacle to sequencing longer fragments is insufficient separating power to resolve large DNA fragments that differ in length by only one nucleotide. Limitations on ddNTP incorporation by DNA polymerase were largely solved by Stanley Tabor at Harvard Medical School, Carl Fuller at USB Biochemicals, and their coworkers.

Large-scale sequencing aims at sequencing very long DNA molecules. Even relatively small bacterial genomes contain millions of nucleotides, and human chromosome 1 alone contains about 246 million bases. Some approaches therefore cut (with restriction enzymes) or shear (with mechanical forces) large DNA molecules into shorter fragments. The fragmented DNA is cloned into a DNA vector, usually a bacterial plasmid, and amplified in Escherichia coli. The amplified DNA can then be purified from the bacterial cells. (A disadvantage of bacterial cloning for sequencing is that some DNA sequences may be inherently unclonable in some or all available bacterial strains, because the cloned sequence is deleterious to the host bacterium or for other reasons.) These short DNA fragments, purified from individual bacterial colonies, are then individually sequenced and assembled electronically into one long, contiguous sequence by identifying identical overlapping sequences between them (shotgun sequencing). This method requires no pre-existing information about the sequence of the DNA and is often referred to as de novo sequencing. Gaps in the assembled sequence may be filled by primer walking, often with sub-cloning steps (or by transposon-based sequencing, depending on the size of the remaining region). These strategies all involve taking many short reads of the DNA by one of the methods above and then assembling them into a contiguous sequence.
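The assembly step of shotgun sequencing, joining reads through identical overlaps, can be illustrated with a greedy merge. This is a toy sketch under strong assumptions: reads are error-free, all come from the same strand, and the sequence has no repeats longer than the overlaps. Real assemblers must cope with sequencing errors, reverse-complement reads, and repeats, which is exactly where the process becomes error-prone. The function names and the `min_len` threshold are illustrative.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    or 0 if no such overlap is at least `min_len` long."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of contigs with the largest exact overlap.

    Returns the final contigs; more than one contig means gaps remain
    (which, in practice, primer walking would be used to close).
    """
    # Drop duplicate reads and reads wholly contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]
```

For example, the reads `ATGGCGT`, `GCGTGCA`, and `TGCAATC` overlap by four bases each and merge into the single contig `ATGGCGTGCAATC`. Each round compares every pair of contigs, which is fine for a handful of reads but far too slow for millions.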
The different strategies trade off speed against accuracy; the shotgun method is the most practical for sequencing large genomes, but its assembly process is complex and potentially error-prone, particularly in the presence of sequence repeats. The human genome is about 3 billion (3,000,000,000) bp long; if the average fragment length is 500 bases, sequencing it would take a minimum of six million reads (3 billion / 500), and that figure allows for no overlap between fragments (1-fold coverage). Keeping track of such a large number of sequences presents significant challenges, which can be managed only by developing and coordinating several procedural and computational methods, such as efficient database development and management.

Resequencing, or targeted sequencing, is used to determine how a DNA sequence differs from a "reference" sequence. It is often performed by using PCR to amplify the region of interest (pre-existing sequence information is required to design the PCR primers). Resequencing involves three steps: extraction of DNA or RNA from biological tissue; amplification of the DNA or RNA (often by PCR); and sequencing. The resulting sequence is compared with a reference or a normal sample to detect mutations.
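The six-million-read estimate generalizes to reads = coverage × genome length / read length. The sketch below also adds a standard approximation not stated in this post: if reads start at random positions, the expected fraction of bases covered by no read at all is e^(-c) for coverage c (the Poisson model used in Lander-Waterman theory). It suggests why a single-fold, non-overlapping count is only a lower bound. Function names are illustrative.

```python
import math


def reads_needed(genome_length: float, read_length: float,
                 coverage: float = 1.0) -> float:
    """Reads required so that total sequenced bases = coverage x genome length."""
    return coverage * genome_length / read_length


def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of bases hit by no read, assuming randomly placed
    reads (Poisson approximation, as in Lander-Waterman theory)."""
    return math.exp(-coverage)


print(reads_needed(3e9, 500))       # 6000000.0 reads at 1-fold coverage
print(fraction_unsequenced(1.0))    # about 0.37 of the genome still missed
print(fraction_unsequenced(8.0))    # about 0.0003 at 8-fold coverage
```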

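The final step of resequencing, comparing the new sequence with a reference, can be sketched for the simplest case. This assumes the sample sequence has already been aligned to the reference with no insertions or deletions, so both strings have equal length; detecting indels would require a real alignment step, which this sketch omits. `find_substitutions` is a hypothetical helper, and positions called `N` by the base caller are skipped rather than reported.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) for each mismatch.

    Assumes the sample is already aligned to the reference without gaps,
    so both sequences have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if ref != alt and alt != "N"  # skip positions the base caller left ambiguous
    ]
```

Comparing the reference `ATGCATGC` with the sample `ATGAATNC` reports a single substitution, C to A at position 4; the ambiguous `N` at position 7 is ignored.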
Tuesday, 8 January 2008
