Analysis of Genes and Genomes, part 7

8 PROTEIN PRODUCTION AND PURIFICATION
[Figure 8.9: chemical structure diagram; labels: Protein, Ni2+-nitrilotriacetic acid, Spacer, Resin matrix]
Figure 8.9. The binding of proteins tagged with multiple histidine residues to Ni2+-NTA resin
The purification of a his-tagged protein from E. coli cells is shown in Figure 8.10. E. coli cells containing an inducible expression vector were grown and induced to produce the tagged target protein. The cells were broken open and insoluble cell debris was removed by centrifugation. The supernatant from this process was applied to a Ni2+-NTA column. The column was washed with a low concentration (20 mM) of imidazole, which competes with low-affinity histidine–column interactions and so removes any non-specifically bound proteins, which may themselves be histidine rich. Finally, the tagged protein itself was eluted from the column by increasing the concentration of imidazole to a high level (250 mM). This process results in the single-step purification of the tagged protein to yield a very pure, almost homogeneous, sample. His-tagged proteins from any expression system, including bacteria, yeast, baculovirus and mammalian cells, can be purified to a high degree of homogeneity using this technique. Alternative elution conditions may also be used. For example, lowering the pH from 8 to 4.5 alters the protonation state of the histidine residues and results in the dissociation of the protein from the metal complex. The tagged protein can also be removed by adding chelating agents, such as EDTA, which strip the nickel ions from the column and consequently release the tagged protein. The small size of the histidine tag means that the tagged recombinant protein often behaves identically to its untagged parent.
In some cases, the tagged protein is actually found to be more biologically active than the untagged version of the same protein (Janknecht et al., 1991), although this effect is likely to be due to the speed of the purification process rather than any biological activity of the tag itself. Some proteins have been crystallized in the presence of the his-tag (Kim et al., 1996a). Additionally, the his-tag has extremely low immunogenicity and consequently the recombinant protein containing the tag can be used to produce antibodies. There are some reports of the his-tag altering protein function (see, e.g., Knapp et al., 2000), but, as we will see later, it is more important to remove some other purification tags. An additional advantage of the his-tag is that purification can be performed under denaturing conditions (Reece, Rickles and Ptashne, 1993). The interaction between the histidine residues and the metal ion does not require any special protein structure and will occur even in the presence of strong protein denaturants (e.g. 8 M urea). This is particularly important for the purification of proteins that would otherwise be insoluble.
8.5 PROTEIN PURIFICATION
[Figure 8.10: chemical structures of imidazole and histidine; SDS–polyacrylamide gel with lanes M (97, 66, 45, 31, 21, 15 kDa), Uninduced cells, Induced cells, Cell supernatant, Column flow, Wash and increasing [Imidazole]]
Figure 8.10. The purification of a his-tagged protein. The chemical structures of histidine and imidazole are shown, together with an SDS–polyacrylamide gel of the purification of a his-tagged protein. An E. coli cell extract producing a 14 kDa his-tagged protein was applied to a Ni2+-NTA column. The column was washed with a buffer containing a low concentration (20 mM) of the histidine analogue imidazole prior to elution of the tagged protein with an imidazole gradient (20–250 mM). Proteins were visualized after staining the gel with Coomassie blue
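The pH-based elution mentioned earlier in this section can be made concrete with the Henderson–Hasselbalch relationship. The sketch below is only illustrative: it assumes a side-chain pKa of about 6.0 for histidine (the true value varies with the local protein environment), and the function name is hypothetical. It estimates what fraction of imidazole side chains carry a proton, and are therefore unavailable to coordinate the metal ion, at each pH.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Fraction of histidine side chains that are protonated at a given pH.

    pka=6.0 is an assumed, approximate value for the imidazole side chain;
    it is not taken from the text and shifts with the protein environment.
    """
    # Henderson-Hasselbalch rearranged: [HA] / ([HA] + [A-])
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Compare the loading pH (8) with the elution pH (4.5) quoted in the text.
for ph in (8.0, 6.0, 4.5):
    print(f"pH {ph}: {fraction_protonated(ph):.1%} protonated")
```

Under this assumption, almost no side chains are protonated at pH 8, whereas nearly all are at pH 4.5, which is consistent with the tagged protein binding at the higher pH and dissociating at the lower one.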
8.5.2 The GST-tag
[Figure 8.11: (a) chemical structures of glutathione and its conjugate with an electrophile R, with the reaction labelled GST; (b) protein structure; (c) gel with lanes M (97, 66, 45, 31 kDa), Uninduced cells, Induced cells, Cell extract, Cell pellet, Column flow, Elution]
Figure 8.11. The purification of proteins tagged with GST. (a) The chemical structure of the tripeptide glutathione and the action of GST for its addition to electrophilic compounds (R). Glutathione is composed of three amino acids: glutamic acid, cysteine and glycine. Note that the glutamic acid is joined to the Cys–Gly dipeptide through its γ-carboxyl group. (b) The three-dimensional structure of the GST–glutathione complex. The protein is depicted in a ribbon form and the glutathione as a green stick model (García-Sáez et al., 1994). (c) The purification of a GST-tagged protein from E. coli cells. The tagged protein was bound to a glutathione-affinity column and eluted using free glutathione itself. The tagged protein is indicated by the arrow
The glutathione S-transferases (GSTs) are a family of enzymes that are involved in the cellular defense against electrophilic xenobiotic chemical compounds. They catalyse the addition of glutathione to these electrophilic substrates, which results in their increased solubility in water and promotes their subsequent enzymatic degradation (Strange, Jones and Fryer, 2000). Glutathione is a tripeptide composed of the amino acids glutamic acid, cysteine and glycine (Figure 8.11(a)). GST binds to glutathione with high affinity (Figure 8.11(b)). The enzyme from the parasitic flatworm Schistosoma japonicum is a 26 kDa dimeric protein (Walker et al., 1993). The gene encoding this protein is fused, in the correct reading frame, to the target gene and a fusion protein is produced from an expression vector. Host cells producing the fusion protein are broken open and soluble proteins are applied to a column to which glutathione is attached (e.g. glutathione-agarose).
The specific interaction between GST and glutathione will result in the binding of the fusion protein to the column, while the majority of host proteins are unable to adhere. The bound protein can then be eluted from the column by washing with a high concentration of glutathione (10 mM) to compete for the interaction with the column (Figure 8.11(c)). Both the large size of GST and its dimeric nature mean that the tag is more likely to influence the biological activity of the target protein than the his-tag. It is therefore desirable to remove the GST portion of the fusion protein to study the activity of the target protein in isolation. This can be achieved by the inclusion, in the expression vector, of DNA coding for the amino acid sequence of a specific protease cleavage site between GST and the target gene. Treatment of the purified fusion protein with the protease will then result in the generation of two polypeptides: the free target protein and GST itself. GST can then be removed from the target protein by applying the mixture back onto a glutathione column. The GST will, again, bind to the column, but the target protein will not. The column flow-through can be collected and will contain the purified target protein. A variety of specific proteases have been used to cleave purification tags from target fusion proteins (Table 8.2). Unlike restriction enzymes when they cleave DNA (see Chapter 2), many proteases do not have an absolute sequence requirement for their cleavage sites. For example, the protease Factor Xa cleaves after the arginine residue in its preferred cleavage site Ile–Glu–Gly–Arg. However, it will sometimes cleave at other basic residues, depending on the conformation of the protein substrate, and a number of the secondary sites that have been sequenced show cleavage following Gly–Arg dipeptides (Quinlan, Moir and Stewart, 1989).
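The difference between a preferred site and a secondary site can be checked on a sequence before a fusion is built. The following is a minimal sketch, not a validated prediction tool: the function names and the example fusion sequence are hypothetical, and it considers only the two motifs named above in one-letter code (IEGR for Ile–Glu–Gly–Arg, GR for Gly–Arg), ignoring the conformational effects that the text says also matter.

```python
import re

# Motifs taken from the text, written in one-letter amino acid code.
PREFERRED = "IEGR"  # Ile-Glu-Gly-Arg, Factor Xa cleaves after the Arg
SECONDARY = "GR"    # Gly-Arg dipeptides reported as secondary sites


def cleavage_positions(protein: str, site: str) -> list[int]:
    """Return 1-based positions of the residue after which cleavage would occur."""
    # A lookahead is used so that overlapping matches are all reported.
    return [m.start() + len(site) for m in re.finditer(f"(?={site})", protein)]


def factor_xa_report(protein: str) -> tuple[list[int], list[int]]:
    """Split candidate cleavage positions into preferred and secondary sites."""
    preferred = cleavage_positions(protein, PREFERRED)
    secondary = [p for p in cleavage_positions(protein, SECONDARY)
                 if p not in preferred]
    return preferred, secondary


# Hypothetical fusion: 8-residue tag stub + IEGR linker + 10-residue target.
fusion = "MSPILGYW" + "IEGR" + "MKTAGRLLVE"
preferred, secondary = factor_xa_report(fusion)
print("preferred cleavage after residue(s):", preferred)
print("possible secondary cleavage after residue(s):", secondary)
```

In this made-up example the engineered site sits at the tag–target junction, while the Gly–Arg inside the target region is flagged as a place where the target protein itself might be cut.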
Consequently, the protease may not only cleave the site between the tag and the target protein, but may also cleave the target protein itself. Obviously, this must be avoided to maintain the integrity of the target protein. Other proteases, e.g. the TEV and PreScission proteases, have larger and more specific recognition sequences and are less likely to cleave at alternative sites. The TEV protease has the added advantage that the protease can be produced in a recombinant form from E. coli and is therefore not contaminated with other plasma proteases and factors.
Table 8.2. Site-specific proteases. The recognition sequence of each protease is shown, together with the actual site of cleavage, depicted by the arrow
Protease | Recognition and cleavage site | Notes | Reference
Factor Xa | Ile–Glu–Gly–Arg↓ | 42 kDa protein, composed of two disulphide-linked chains, purified from bovine plasma | (Nagai, Perutz and Poyart, 1985)
Enterokinase | Asp–Asp–Asp–Asp–Lys↓ | 26 kDa light chain of bovine enterokinase produced in and purified from E. coli | (LaVallie et al., 1993b)
Thrombin | Leu–Val–Pro–Arg↓Gly–Ser | Purified from bovine plasma | (Chang, 1985)
TEV | Glu–Asn–Leu–Tyr–Phe–Gln↓Gly | Tobacco etch virus protease | (Dougherty et al., 1989)
PreScission | Leu–Glu–Val–Leu–Phe–Gln↓Gly–Pro | Protease from the 3C human rhinovirus | (Walker et al., 1994)
8.5.3 The MBP-tag
The target gene is inserted downstream from the malE gene of E. coli, which encodes maltose binding protein (MBP), in an expression vector that results in the production of an MBP fusion protein (Kellermann and Ferenci, 1982). Maltose is a disaccharide composed of two molecules of glucose (Figure 8.12(a)). MBP is a 40 kDa monomeric protein that forms part of the maltose/maltodextrin system of E. coli, which is responsible for the uptake and efficient catabolism of glucose polymers (Boos and Shuman, 1998).
The protein undergoes a large conformational change upon binding of maltose, which results in the formation of a stable complex (Figure 8.12(b)). One-step purification of fusion proteins is achieved using the affinity of MBP for cross-linked amylose (starch) (di Guan et al., 1988). Bound proteins can be eluted from amylose by including maltose (10 mM) in the column buffer (Figure 8.12(c)).
[Figure 8.12: (a) chemical structure of maltose; (b) protein structure; (c) gel with lanes Uninduced cells, Induced cells, Amylose elution, Protease treatment, Amylose flow; bands labelled MBP–Target, MBP and Target; the protease site in the fusion is marked X]
Figure 8.12. The purification of proteins tagged with MBP. (a) The chemical structure of maltose, a glucose disaccharide. (b) The three-dimensional structure of the MBP–maltose complex (Quiocho, Spurlino and Rodseth, 1997). The protein is depicted in a ribbon form with α-helices coloured in purple and β-sheets in blue. Maltose is shown as a green stick model. (c) The purification of an MBP-tagged protein. The tagged protein is bound to an amylose column and eluted with maltose. The MBP–target fusion is then cleaved with a protease at a site indicated by the X, and reapplied to the amylose column. The target protein will not adhere to the column when it is separated from MBP. The gel image is reprinted with permission of New England Biolabs, © 2002/2003
8.5.4 IMPACT
Intein mediated purification with an affinity chitin binding tag (IMPACT) is an approach to protein purification that uses the protein self-splicing of inteins to remove the purification tag and give pure isolated protein in one chromatographic step. Inteins are a class of proteins, found in a wide variety of organisms, that excise themselves from a precursor protein and in the process ligate the flanking protein sequences (exteins) (Cooper and Stevens, 1995). The excised intein is a site-specific DNA endonuclease that catalyses genetic mobility of its own DNA coding sequence.
The process of polypeptide cleavage and ligation is dependent on specific chemistry involving thiols and a conserved asparagine residue. Most inteins have a cysteine residue at their amino-terminal end and an asparagine at their carboxy-terminal end (Figure 8.13(a)). All the information required for the splicing reaction is contained within the intein itself, and if these sequences are placed in the context of a target protein they still splice themselves out. The mechanism of splicing is complex, but the reaction is very efficient. The IMPACT expression system exploits this unusual chemistry by mutation of the C-terminal asparagine to alanine in a yeast intein, VMA1 (Chong and Xu, 1997). This mutation prevents the cleavage reaction occurring at the carboxy-terminal side of the intein and traps the protein in a thioester that can be cleaved by β-mercaptoethanol or dithiothreitol (DTT). The target gene is cloned into an expression vector such that a three-component fusion protein is produced: target protein–intein–chitin binding domain. Chitin is a fibrous insoluble polysaccharide made of β-1,4-N-acetyl-D-glucosamine that is found in the cell walls of fungi and algae and in the exoskeletons of arthropods. Chitinase catalyses the hydrolytic degradation of chitin, and the Bacillus circulans enzyme (Mr 74 kDa) is composed of three domains: an amino-terminal catalytic domain (CatD) (417 amino acid residues), a tandem repeat of fibronectin type III-like (FnIII) domains (duplicate 95 residues) and a carboxy-terminal chitin-binding domain (CBD, 45 amino acid residues) (Watanabe et al., 1990). The isolated CBD shows high-affinity binding to chitin. In the IMPACT system, the fusion protein is made in E. coli and passed down the chitin column, where it binds. The protein can be cleaved off the column by using thiol-containing compounds, such as DTT, at 4 °C.
This is a slow process and requires an overnight incubation to complete, which may prove problematic if the target protein is not stable under these conditions. The final target protein produced by this method is native except for the DTT thioester moiety attached at the carboxy-terminal end. The thioester is, however, unstable and will spontaneously hydrolyse to yield a native protein. Other thiols can also be used to initiate the cleavage process, e.g. β-mercaptoethanol and cysteine. Cysteine-induced cleavage results in the insertion of a cysteine amino acid residue at the carboxy-terminal end of the cleaved polypeptide. The cysteine can be radio-labelled, or it can be a site for chemical modification, especially if it is the only cysteine in the protein, since it is a good site to add protein cross-linkers, fluorescent probes, spin labels or other tags.
[Figure 8.13: (a), (b) schematic of N-extein, intein (Cys ... Asn or Cys ... Ala) and C-extein before and after splicing; (c) reaction scheme for the target protein–intein–CBD fusion showing the N–S acyl shift, addition of DTT and spontaneous hydrolysis; (d) gel with lanes M, Uninduced cells, Induced cells, Column flow, Elution, SDS; bands labelled Target protein–Intein–CBD, Intein–CBD and Target protein]
Figure 8.13. The IMPACT system for the purification of tagged proteins and the subsequent removal of the tag. (a) The normal splicing reaction involves the complete removal of the intein and the joining of the polypeptide sequences on its amino- and carboxy-terminal sides. (b) A mutant form of the intein, in which an essential asparagine is replaced with an alanine, results in partial cleavage and the release of the amino-terminal side polypeptide only. (c) The chemistry of the splicing reaction used to cleave the target protein from the intein–chitin binding domain (CBD) tag. (d) The purification of Hha I methylase using the IMPACT system (Chong et al., 1997). Purified target protein was eluted from the column, while the detergent SDS was used to remove the intein–CBD fusion. The gel image was kindly provided by Ming-Qun Xu (New England Biolabs)
8.5.5 TAP-tagging
An extension of tagging over-produced proteins for purification is to tag proteins produced at wild-type levels in their native host cells. Protein purification in these circumstances, if performed under suitably mild conditions, can lead to the isolation of naturally occurring protein complexes. Most proteins do not exist as single entities within cells. They are associated, through non-covalent interactions, with a variety of other proteins that may be involved in the regulation of their function. The over-production of a single protein will not result in the over-production of other proteins in the complex. Therefore, to isolate complexes from cells, protein production should be as close to the natural state as possible. The DNA encoding what is termed a tandem affinity purification tag (TAP-tag) is cloned at the 3′-end of a target gene so that little disruption is made to its ability to be transcribed, and the fusion protein should be produced at the same level as the wild-type target protein. The TAP-tag encodes two purification elements: a calmodulin binding peptide and Protein A from Staphylococcus aureus. These elements are separated by a TEV protease cleavage site (Puig et al., 2001). Cells containing the tagged protein are gently lysed and then applied to a column containing IgG, which binds with high affinity to Protein A. The fusion protein, and its associated proteins, are removed from the column using TEV protease and then applied directly to a calmodulin bead column, in the presence of Ca2+, and eluted using the chelating agent EDTA.
The two-step purification procedure is highly specific and can result in the isolation of contaminant-free protein complexes. The TAP-tag allows the rapid purification of complexes from a relatively small number of cells without prior knowledge of the complex composition, activity or function (Rigaut et al., 1999; Gavin et al., 2002), and, combined with mass spectrometry, the TAP strategy allows for the identification of proteins interacting with a given target protein.
9 Genome sequencing projects
Key concepts
• Genetic and physical maps are used to determine the order of genes on a chromosome and their approximate distance apart
• DNA sequence determination is performed using dideoxynucleotides that halt replication at a specific base. DNA fragments that differ by a single base can be separated using polyacrylamide gels
• Sequencing reactions generate a few hundred bases of sequence
• Whole genomes can be sequenced by cloning random small DNA genomic fragments, sequencing them, and then reassembling the genome sequence based on the overlap between the sequenced fragments
• Massive computing power is required to assemble the sequenced fragments and determine the locations of genes within the genome
The ultimate goal of all genome sequencing projects is to determine the precise sequence of bases that make up each DNA molecule within the genome. The knowledge of the sequence of individual genes, and the entire genome, is vital if we are to understand not only how genes and proteins work but also how different gene products influence the activity of each other within the context of the whole organism. The sheer amount of DNA contained within the genome of an organism, however, represents a substantial barrier to attaining this level of analysis. Even in the absence of complete sequence knowledge, however, a variety of methods have been used to map the location of genes and other DNA sequences within the genome.
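The key concept that dideoxynucleotides halt replication at a specific base, and that the resulting fragments differ by a single base, can be illustrated with a small simulation. This is an idealized sketch under stated assumptions: the function names are hypothetical, the primer is ignored, and every possible termination product is assumed to be present and perfectly resolved by length.

```python
def termination_lengths(new_strand: str) -> dict[str, list[int]]:
    """For each base, list the lengths of fragments ending in its dideoxy analogue.

    new_strand is the newly synthesized strand, read 5' to 3'. Each of the
    four reactions (one per dideoxynucleotide) yields fragments that stop
    wherever that base occurs.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    by_length = {length: base
                 for base, lengths in lanes.items()
                 for length in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


lanes = termination_lengths("GATTACA")
for base, lengths in lanes.items():
    print(f"dd{base} lane: fragment lengths {lengths}")
print("sequence read from the gel:", read_gel(lanes))
```

Because each fragment length appears in exactly one lane, stepping through the lengths in order recovers the sequence one base at a time, which is the reason single-base resolution on the gel is essential.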
On a small scale, mapping DNA fragments is a relatively straightforward process (Figure 9.1). We have already seen (Chapter 2) that restriction enzymes will cleave DNA at specific sequences, termed recognition sites.
The Analysis of Genes and Genomes Richard J. Reece © 2004 John Wiley & Sons, Ltd ISBNs: 0-470-84379-9 (HB); 0-470-84380-2 (PB)
[...]
... bacteriophage φX174 genome (Sanger et al., 1977)
9.4.1 Manual DNA Sequencing
Two alternative, and improved, sequencing methods were described in 1977. Allan Maxam and Walter Gilbert devised a chemical method for cleaving the sugar–phosphate backbone of a radio-labelled DNA fragment at specific bases (Maxam and Gilbert, 1977). They used specific chemicals to modify individual DNA bases (e.g. the modification of T residues ...
... use of computers to align DNA sequences to form contigs and in the search for similar genes, but their role does not stop there. Raw sequence information, e.g. the entire sequence of a chromosome, deposited into a database is important for the analysis of gene and gene function. Perhaps more important, and certainly more useful to the majority of researchers, is to have an integrated collection of genes, ...
... new algorithms and statistics with which to assess the relationships among members of large data sets,
• analyse and interpret various types of data including DNA and amino acid sequences, protein domains and protein structures and
• develop and implement tools that enable efficient access and management of different types of information
Table 9.1 Curated genome sequencing projects Organism (type) Escherichia ...
limited in the length of the DNA that can be sequenced during a single reaction (approximately 100 bases) and by the use of harsh chemicals required to modify and cleave the DNA. Fred Sanger and his colleagues devised an alternative sequencing approach based upon the faithful replication of DNA using a DNA polymerase (Sanger, Nicklen and Coulson, 1977b). They relied on the incorporation of 2′,3′-dideoxynucleotides ...
... organism. Changes in the number and magnitude of genes expressed by cells in different conditions can give vital clues to the cellular response. The analysis of genomes, transcriptomes and proteomes has been made possible through increases in automation and computational power. With the arrival of a fully sequenced genome, readers may be forgiven for thinking that further gene analysis is not required. What ...
... transcripts expressed within a cell are an indication of the proteins it is producing, but do not reflect protein function and may not necessarily reflect protein abundance. For example, a transcript that contains many seldom used codons may be translated at ...
... the presence of repetitive DNA sequences in the genome that could lead to the incorrect assignment of contigs (Waterston, Lander and Sulston, 2002).
9.6 Hierarchical shotgun
The shotgun approach to contig assembly has proved immensely successful in sequencing comparatively small genomes. The majority of bacterial genomes can be sequenced by this method and contigs assembled within a matter of weeks using ...
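Assembling contigs from the overlap between sequenced fragments can be sketched with a simple greedy merge. This is a toy illustration and not the algorithm used by any real assembler: the function names, the minimum overlap of three bases and the example reads are all assumptions, sequencing errors and reverse-complement reads are ignored, and reads contained wholly inside another read are not handled.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three hypothetical reads that tile a 13-base region.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

A greedy merge of this kind has no way to tell two copies of a repeated sequence apart, which is one way to see why repetitive DNA can lead to the incorrect assignment of contigs.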
fully characterized genomes has led to problems in the way in which the data is stored and accessed. Bioinformatics is the study of this biological information. It brings together the avalanche of biological data (genome sequence and other experiments) with the analytical theory and practical tools of mathematics and computer science. Bioinformatics aims to
• develop new algorithms and statistics with ...
... for the gene that is being mapped and the number of crosses required to generate accurate mapping data. Additionally, a tacit assumption of mapping based on crosses is that the recombination frequency is equal for all parts of the chromosome. This is simply not the case, and many recombinational 'hot-spots' and 'cold-spots' have been identified. In humans, the segregation of naturally occurring mutant alleles ...
... complete genes, or even complete genomes. The first DNA molecule to be sequenced was that of the bacteriophage λ cohesive (cos) ends (Wu and Taylor, 1971). These sequences, which are only 12 bases long, were obtained after the synthesis of a complementary RNA molecule and the subsequent use of RNA sequencing procedures. The methods used were, however, impractical for DNA sequencing on a large scale. In 1975, ...