Association study of ABCA1 polymorphisms in Singapore populations

Chapter 3 Literature Review II: Genetic Analysis of Complex Traits

3.1 Key Definitions

3.1.1 Complex Trait

A complex trait refers to any phenotype that does not segregate in a classic, one-locus Mendelian fashion (Lander and Schork, 1994). In other words, there is poor one-to-one correspondence between genotype and phenotype. The same genotype can lead to different phenotypes because of chance, environment or interactions with other genes. Conversely, mutations in different genes can give rise to identical phenotypes (genetic or locus heterogeneity), as when the genes act in a common biochemical pathway. Some traits require the simultaneous presence of mutations in multiple genes (polygenic inheritance), each with a relatively small effect. Some individuals who inherit a predisposing allele may not manifest the trait (incomplete penetrance), whereas others who inherit no predisposing allele may nonetheless acquire the condition through environmental or random causes (phenocopy).

3.1.2 Single Nucleotide Polymorphisms (SNPs)

SNPs are single base substitutions that represent about 90% of the common genetic variation in the human genome, with insertion/deletion (indel) and length polymorphisms providing the rest. Operationally, a SNP is a variant whose minor allele frequency is at least 1%. SNPs occur roughly once every 300-1000 bp (Cargill et al., 1999; Halushka et al., 1999; Stephens et al., 2001; Carlson et al., 2003). SNPs have distinct advantages over repeat polymorphisms such as short tandem repeats (also known as microsatellites) in the genetic analysis of complex traits. They are abundant, with more than five million SNPs with minor allele frequencies greater than 10% expected to exist, and they are distributed throughout the genome in both coding and noncoding regions.
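The operational 1% threshold can be made concrete with a short Python sketch that derives allele frequencies from biallelic genotype counts (AA, AB, BB). The function names and example counts are hypothetical and chosen only for illustration; a real pipeline would also handle missing genotypes and multi-allelic sites.

```python
from typing import Tuple


def allele_frequencies(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Return (freq_A, freq_B) from counts of AA, AB and BB genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each homozygote carries two copies of its allele, each heterozygote one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    freq_b = (2 * n_bb + n_ab) / total_alleles
    return freq_a, freq_b


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    return min(allele_frequencies(n_aa, n_ab, n_bb))


def is_snp(n_aa: int, n_ab: int, n_bb: int, threshold: float = 0.01) -> bool:
    """Operational definition: minor allele frequency of at least 1%."""
    return minor_allele_frequency(n_aa, n_ab, n_bb) >= threshold


if __name__ == "__main__":
    print(minor_allele_frequency(81, 18, 1))  # 20 B alleles out of 200
    print(is_snp(81, 18, 1))
    print(is_snp(199, 1, 0))                  # MAF = 1/400, below 1%
```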
Thus SNPs can be used for fine mapping in positional cloning efforts, or tested directly as candidate causal mutations for a trait. Various experimental and in silico strategies are available for discovering novel SNPs, although resequencing remains the gold standard. SNPs are also easy to genotype and are amenable to high-throughput genotyping technologies (Schork et al., 2000; Kirk et al., 2002). They are generally more stable than microsatellites, which can undergo slippage during replication, and this stability allows more reliable assessment of linkage disequilibrium (LD) relationships, locus association and co-segregation.

3.2 Strategies for Dissecting the Genetic Basis of Complex Traits

The genetic dissection of complex traits may be carried out using linkage analysis, allele sharing methods, association studies and experimental models.

3.2.1 Linkage Analysis

Linkage studies use pedigrees to examine whether genetic markers co-segregate with a disease or other phenotype of interest. During meiosis, alleles at two loci that lie on different chromosomes, or far apart on the same chromosome, segregate independently of each other. Alleles at closely linked loci, by contrast, tend to be transmitted together because recombination between them is rare. Markers closest to the causative locus are therefore most strongly correlated with the phenotype, whereas recombination during meiosis breaks down the co-segregation of more distant markers with the trait. Linkage analysis is sometimes also referred to as positional cloning. A complete dissection of the genetic basis of a disease entails several steps: linkage studies in which kindreds with multiple affected family members are genotyped with ~400 microsatellite markers spaced 10 cM apart across the genome; narrowing of the susceptibility locus using SNPs, which are more abundant than microsatellites; and finally sequencing and identification of the causative mutation in the candidate gene (Bogardus et al., 2002).
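Co-segregation is conventionally quantified with a LOD score, which the text does not spell out. The sketch below assumes the simplest textbook case, phase-known and fully informative meioses in which recombinants can be counted directly; real pedigree analyses compute likelihoods over unobserved genotypes, which is not shown. Function names are illustrative.

```python
import math
from typing import Tuple


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 likelihood ratio of recombination fraction theta versus 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a recombinant is impossible at theta = 0
    non_recombinants = meioses - recombinants
    log_l_theta = non_recombinants * math.log10(1.0 - theta)
    if recombinants:
        log_l_theta += recombinants * math.log10(theta)
    log_l_unlinked = meioses * math.log10(0.5)  # free recombination
    return log_l_theta - log_l_unlinked


def max_lod(recombinants: int, meioses: int) -> Tuple[float, float]:
    """Maximize the LOD over the grid theta = 0.00, 0.01, ..., 0.49."""
    return max(
        (lod_score(recombinants, meioses, i / 100), i / 100) for i in range(50)
    )


if __name__ == "__main__":
    lod, theta = max_lod(2, 20)
    print(f"max LOD = {lod:.2f} at theta = {theta}")
```

With 2 recombinants among 20 meioses the maximum is at θ = 0.1 with a LOD of about 3.2, just above the conventional threshold of 3 (odds of 1000:1 in favour of linkage).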
Linkage studies have been extremely effective at locating genes for rare, monogenic Mendelian traits, which are typically caused by genes of large effect (also known as high displacement) and show strong genotype-phenotype correlation and high heritability; linkage analysis is also robust to allelic and locus heterogeneity (Risch, 2000). For complex diseases, linkage analysis of susceptibility genes is less powerful because the mode of inheritance and the allele frequency (normally estimated by segregation analysis) are unknown, and because many unaffected individuals also carry the susceptibility alleles (Bogardus et al., 2002). Linkage studies are generally effective only for loci with a genotypic risk ratio (GRR) of about four or more, and not for loci with a GRR of two or less; even when linkage is detected, positional cloning may prove daunting because the candidate region is large (Risch and Merikangas, 1996). The few successes are largely confined to genes with low-frequency alleles and Mendelian-like inheritance, e.g. BRCA-1 and BRCA-2 in breast cancer, and the β-amyloid precursor protein and presenilin-1 and -2 genes in Alzheimer's disease (Risch, 2000).

3.2.2 Allele Sharing Methods

Like classical linkage analysis, allele sharing methods use pedigree information, but they are non-parametric, making no assumptions about the mode of inheritance of the disease, the population disease gene frequency, and so on (Lander and Schork, 1994; Ewens and Spielman, 2001). The affected sib pair method is the simplest allele sharing approach. Consider a locus A at which a parent is heterozygous, A1A2. The parent can pass the same allele (either A1 or A2) to both affected sibs, or A1 to one sib and A2 to the other. Without linkage the two outcomes are equally likely, so the sibs are expected to share the parental allele 50% of the time. If locus A is linked to the disease, the affected sibs will share the parental allele more often than this expected 50%.
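The 50% expectation turns the affected sib pair idea into a simple test. As a minimal sketch, suppose one counts, over heterozygous parents, how many passed the same allele to both affected sibs; under no linkage each parent is a fair coin flip, so excess sharing can be assessed with a one-sided exact binomial test. This treats parental transmissions as independent and ignores ambiguous identity-by-descent states, simplifications that practical affected sib pair methods do not make. The function name and counts are hypothetical.

```python
import math


def excess_sharing_pvalue(shared: int, transmissions: int) -> float:
    """One-sided P(X >= shared) for X ~ Binomial(transmissions, 0.5).

    shared: heterozygous parents who passed the same allele to both affected sibs
    transmissions: total informative (heterozygous) parents scored
    """
    if not 0 <= shared <= transmissions:
        raise ValueError("need 0 <= shared <= transmissions")
    upper_tail = sum(
        math.comb(transmissions, k) for k in range(shared, transmissions + 1)
    )
    return upper_tail / 2 ** transmissions


if __name__ == "__main__":
    # Hypothetical data: 100 informative parents.
    print(excess_sharing_pvalue(50, 100))  # sharing at the null expectation
    print(excess_sharing_pvalue(70, 100))  # marked excess sharing
```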
Allele sharing methods are more robust than parametric linkage analysis because affected relatives show excess allele sharing even in the presence of incomplete penetrance, genetic heterogeneity and high-frequency disease alleles (where the expected Mendelian inheritance is confounded by multiple copies of the disease-causing allele segregating in the pedigree; Lander and Schork, 1994). The trade-off is that they can be less powerful than parametric linkage analysis when the correct genetic model is specified.

3.2.3 Genetic Association Studies

Genetic association methods detect differences in the frequencies of genetic markers between affected (case) and unaffected (control) individuals. Unlike linkage studies, which involve pedigrees, association studies are performed at the population level, and their statistical analysis is generally straightforward. A genuine association implies either that the marker itself is functional or that it is in close LD with the causative allele; hence association studies are sometimes referred to as LD mapping. LD is the non-random association of alleles at different loci: an allele at one locus is found on the same haplotype as a specific allele at another locus more (or less) often than expected from their individual frequencies. However, statistical association can also arise for other reasons, such as chance effects due to multiple testing, or confounding, for example by cryptic population structure or another risk factor. Association methods are expected to be more powerful than linkage studies for detecting common disease alleles that confer modest risk (Risch and Merikangas, 1996). This reflects the fact that, for alleles of modest risk, the patterns of allele sharing among affected family members are less striking than the corresponding patterns among unrelated affected individuals.
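The definition of LD above is usually quantified with the coefficient D and its normalized forms D' and r², which the text does not define. The sketch below assumes two biallelic loci whose haplotype frequencies are already known; in practice these are usually estimated from unphased genotypes (for example with an EM algorithm, not shown). Names are illustrative.

```python
from typing import Dict


def ld_measures(p_ab: float, p_a: float, p_b: float) -> Dict[str, float]:
    """Pairwise LD between allele A (locus 1) and allele B (locus 2).

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: marginal frequencies of A and B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # departure from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # |D'|, scaled to [0, 1]
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return {"D": d, "D_prime": d_prime, "r2": r2}


if __name__ == "__main__":
    print(ld_measures(0.09, 0.3, 0.3))  # haplotype frequency equals 0.3 * 0.3
    print(ld_measures(0.30, 0.3, 0.3))  # A always occurs with B and vice versa
```

D' reaches 1 whenever one of the four haplotypes is absent, whereas r² reaches 1 only when the two markers are perfect surrogates for each other, which is why r² is the measure usually used when one SNP is meant to stand in for another.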
Another practical advantage of association studies is that it is easier to enroll large numbers of unrelated affected individuals than large numbers of pedigrees with multiple affected members, especially for late-onset diseases. However, the region shared among unrelated affected individuals is narrower, so whole-genome association studies require far higher marker densities than linkage analysis, on the order of hundreds of thousands of markers (Kruglyak and Nickerson, 2001).

3.2.4 Insights from Model Systems

With animal models, there are two ways of assigning genes to a physiological process. The phenotype-driven approach involves linkage mapping of naturally occurring or induced mutations, followed by screening of positional candidates. Conversely, in the genotype-driven approach, the effect of a known gene on physiology is investigated by engineering transgenic or knockout models. The use of inbred strains limits the positional candidate genes to those that differ between the two strains, and rare alleles of large displacement can become fixed through many generations of positive selection. After initial mapping, the physiological effect of individual polygenic factors may be studied further by constructing transgenic or knockout animals. Once a locus has been mapped in a model organism, syntenic conservation between the animal and human genomes can be exploited to identify equivalent candidate regions in the human genome. For example, the recent identification of the Ipr1 gene in a murine model of tuberculosis suggests that its human homologue, SP110, could be a strong candidate gene for host susceptibility to tuberculosis (Pan et al., 2005).
3.2.5 Other Approaches for Studying Disease Phenotypes

Gene expression profiling using microarrays (Weiss and Terwilliger, 2000) and RNA interference (RNAi) technology enable rapid screening and testing, respectively, of candidate genes involved in a disease or biological pathway. Expression profiling assumes that the susceptibility allele causes differential gene expression between tissues from affected and unaffected individuals. In RNAi, the mRNA of a candidate gene is specifically targeted for degradation by introducing engineered double-stranded RNA homologous to that gene.

3.3 Allelic Spectrum of Complex Diseases

Much of the enthusiasm for using association studies to map complex traits subscribes, explicitly or implicitly, to the common disease-common variant (CDCV) hypothesis. The CDCV hypothesis posits that the genetic variation underlying susceptibility to complex diseases arose spontaneously within the founding population of modern humans and gradually disseminated globally. The susceptibility alleles have persisted and reached moderate frequencies, presumably because they were originally neutral and not under selective pressure. Only recently did a change, e.g. in the environment or through interaction with other genes, trigger the manifestation of the disease. In the alternative common disease-rare variant (CDRV) model, multiple rare alleles predominate, each contributing a small fraction of the population disease risk (Pritchard and Cox, 2002). Existing data lend support to both models. An example often quoted by proponents of the CDCV model is the ApoE gene, in which a single common allele, ε4, with a frequency of 5-41%, increases the risk of heart disease and Alzheimer's disease (Pritchard and Cox, 2002).
Another commonly cited example is the highly prevalent Pro12Ala polymorphism (frequency >75%) in the PPAR-γ gene, which is associated with a 25% reduction in type 2 diabetes risk (Stumvoll and Haring, 2002). On the other hand, multiple rare variants in the CARD15/NOD2 gene contribute to susceptibility in about 20% of patients with Crohn's disease, a common, chronic inflammatory bowel disorder (Hugot et al., 2001; Ogura et al., 2001). In yet another example, both common and rare variants are believed to contribute to variation in HDL levels in the general population (Cohen et al., 2004; Frikke-Schmidt et al., 2004). The two hypotheses call for different methodological considerations. Under the CDRV hypothesis, comprehensive analysis by resequencing is preferable to genotyping of known variants.

3.4 Case-Control Association Studies

The most common, simplest and oldest form of association study design draws two random samples from the population, cases and controls, and compares the distributions of risk factors between them. Another design, the cohort study, assembles a group of individuals who are followed up over time to determine how frequently the disease develops. The cohort study is prospective, whereas the case-control study is retrospective. The reasons for favoring a case-control study over a full cohort design are almost always practical. The advantage is clearest when the disease is rare and the exposure of interest is common. The retrospective design saves time, and subjects are easier to enroll. The cohort study is more expensive and labour-intensive, but it has more credibility and offers an opportunity to collect more reliable exposure information.
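In its simplest form, the comparison of case and control frequencies is a Pearson chi-square test on a contingency table: a 2x2 table of allele counts or a 2x3 table of genotype counts. The following is a minimal Python sketch without continuity correction. The counts are hypothetical, and only the 1- and 2-degree-of-freedom p-values are computed in closed form; other cases would normally use a statistics library (for example `scipy.stats.chi2.sf`).

```python
import math
from typing import List, Tuple


def pearson_chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under no association
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi_square_pvalue(stat: float, df: int) -> float:
    """Upper-tail p-value, using closed forms valid for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise NotImplementedError("use a statistics library for df > 2")


if __name__ == "__main__":
    # Rows: cases, controls. Columns: allele counts (risk, other).
    allele_table = [[120, 80], [90, 110]]
    # Rows: cases, controls. Columns: genotype counts (RR, RN, NN).
    genotype_table = [[30, 60, 10], [20, 50, 30]]
    for name, table in [("allelic 2x2", allele_table), ("genotypic 2x3", genotype_table)]:
        stat, df = pearson_chi_square(table)
        p = chi_square_pvalue(stat, df)
        print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The two hypothetical tables describe the same individuals: the allele counts are obtained from the genotype counts by counting two alleles per homozygote and one per heterozygote.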
Case-control studies can suffer from (i) selection bias due to inappropriate sampling of cases and controls, which makes the two groups non-comparable; (ii) information bias caused by measurement errors, because the disease status is known to the researcher and subject when measurements are taken; and (iii) incidence or prevalence bias, which is the failure to define the nature of the disease variable (Clayton, 2001). Proper matching, together with the use of genetic markers, which are invariant throughout life, reduces selection and information biases.

3.4.1 Case-Control Genetic Data Analysis

The analysis of genetic case-control data can range from very straightforward to complex. Traditionally, allele and genotype frequencies at each genetic marker are compared between cases and controls using conventional 2x2 and 2x3 contingency table analyses, respectively. Deviation from Hardy-Weinberg equilibrium (HWE) in the case sample, after genotyping error has been ruled out, may also be taken as initial evidence of association (Clayton, 2001; Botstein and Risch, 2003). Logistic and log-linear regression methods have the added advantage of adjusting for confounding variables that could not be controlled in the experimental design (Clayton, 2001). Multi-locus information such as haplotypes can offer greater power than individual SNPs for detecting associations, by increasing information and by accommodating potential locus heterogeneity (multiple genes contributing to overall risk) and allelic heterogeneity (multiple alleles contributing to overall risk), as well as weak disequilibria between markers and functional variant sites (Judson et al., 2000; Akey et al., 2001; Fallin et al., 2001; Morris and Kaplan, 2002; Hoh and Ott, 2003). For instance, in an association study of SNPs spanning a 1.5 Mb region around the Alzheimer disease locus APOE, Martin et al.
(2000) demonstrated that haplotype-based analysis detected stronger association than single-SNP analysis alone and, furthermore, improved fine localization of the susceptibility allele ApoE-ε4, even though ApoE-ε4 itself was not examined directly in the analysis.

3.5 Prioritizing Polymorphisms for an Association Study

Risch and Merikangas (1996) originally proposed studying coding or promoter variants, which are the most likely to affect the function of a protein or its regulation. In the alternative approach suggested by Collins et al. (1997), however, sequence variants across the entire genome, including noncoding ones, serve as genetic markers that detect association by virtue of LD. The effect of coding SNPs can be predicted using simple criteria, such as the severity of the amino acid substitution according to BLOSUM62 or Grantham values, or through comparative sequence analysis based on alignment with evolutionarily related paralogues or orthologues (Stephens et al., 2001; Leabman et al., 2003; Shu et al., 2003). A promoter SNP that resides in a highly conserved region is more likely to have a functional impact. The effects of promoter SNPs can also be verified experimentally using reporter gene assays, but this is time-consuming and requires technical expertise. Synonymous SNPs, which do not alter the protein sequence, and SNPs in non-coding regions such as the 3' untranslated region or introns have effects that are more difficult to predict and are thus less well characterized, but they may still affect mRNA stability, splicing or localization (Pagani and Baralle, 2004). Guidelines for selecting SNPs for a candidate gene association study were recently published by Tabor et al. (2002).
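Before a coding SNP can be scored with BLOSUM62 or Grantham values, it has to be classified by its effect on the encoded amino acid. A minimal Python sketch, assuming the standard nuclear genetic code and a reference codon and in-codon position already known from the gene annotation; the function name is hypothetical, and substitution scoring itself is not implemented here.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at `position` (0-2) of a reference codon."""
    codon, alt_base = codon.upper(), alt_base.upper()
    if codon not in CODON_TABLE or position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected a DNA codon, a position 0-2 and a base from ACGT")
    mutant = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"  # protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    if ref_aa == "*":
        return "stop-loss"
    return "missense"        # candidate for BLOSUM62 or Grantham scoring


if __name__ == "__main__":
    print(classify_coding_snp("GAG", 1, "T"))  # GAG -> GTG (Glu -> Val)
    print(classify_coding_snp("CTT", 2, "C"))  # CTT -> CTC (Leu -> Leu)
    print(classify_coding_snp("TGG", 2, "A"))  # TGG -> TGA (Trp -> stop)
```

Missense and nonsense changes together make up the nonsynonymous class; only the missense changes would go on to substitution scoring.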
3.5.1 Candidate Gene Approach

In the candidate gene approach, polymorphisms are examined in genes known a priori to be part of the physiological process underlying the trait. For most complex traits, numerous candidates are available. Family and twin studies can help to gauge the heritability (the proportion of phenotypic variation attributable to genetic factors), the mode of inheritance and penetrance, and the number of genes involved. Linkage studies, including those in animal models, as well as an understanding of the biological mechanisms underlying the trait, provide valuable clues to the candidate regions of the genome to investigate. One practical advantage of this hypothesis-driven approach over the whole genome approach is its lower genotyping requirement. The candidate gene approach tends to focus on polymorphisms that are likely to be functional, such as those in promoter and coding regions (Tabor et al., 2002). These SNPs can be selected from databases or identified by resequencing-based SNP discovery in a small number of randomly chosen individuals of the same ethnicity as the study population. The latter approach is preferred, because efficient selection of a minimal set of SNPs for an association study requires prior assessment of allele frequencies and LD relationships (Carlson et al., 2003).

3.5.2 Whole Genome Association Studies

The genome scan approach involves genotyping a dense map of SNPs spanning both coding and noncoding regions in cases and controls (Collins et al., 1997). The strategy assumes that the susceptibility allele descended from a single founder in the distant past, so that present-day carriers share a signature array of alleles (i.e. a haplotype) surrounding the causative allele. The causative allele therefore need not be observed directly, because adjacent SNPs in LD with it serve as surrogates.
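The need for prior LD information when choosing a minimal SNP set can be illustrated with greedy r² binning. This is a simplified sketch in the spirit of the tag-SNP selection of Carlson et al. (2003), not a reimplementation of their method. It assumes a precomputed, symmetric matrix of pairwise r² values (1.0 on the diagonal) from a resequenced reference sample; the 0.8 threshold and the example matrix are hypothetical.

```python
from typing import List, Tuple


def greedy_tag_snps(
    r2: List[List[float]], threshold: float = 0.8
) -> List[Tuple[int, List[int]]]:
    """Greedily pick tag SNPs so that every SNP has r^2 >= threshold with a tag.

    Returns (tag_index, bin_members) pairs; each bin contains its own tag.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must lie in (0, 1]")
    remaining = set(range(len(r2)))
    bins = []
    while remaining:
        # Choose the SNP that captures the most still-untagged SNPs
        # (ties broken by the lowest index).
        tag = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if r2[i][j] >= threshold),
        )
        members = sorted(j for j in remaining if r2[tag][j] >= threshold)
        bins.append((tag, members))
        remaining -= set(members)  # the tag itself is always a member
    return bins


if __name__ == "__main__":
    r2 = [
        [1.00, 0.90, 0.85, 0.10, 0.00],
        [0.90, 1.00, 0.95, 0.10, 0.00],
        [0.85, 0.95, 1.00, 0.20, 0.10],
        [0.10, 0.10, 0.20, 1.00, 0.90],
        [0.00, 0.00, 0.10, 0.90, 1.00],
    ]
    for tag, members in greedy_tag_snps(r2):
        print(f"tag SNP {tag} captures SNPs {members}")
```

Greedy selection is not guaranteed to find the smallest possible tag set, but it is simple and usually close in practice.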
The search for the causal variant can then be limited to the regions showing association. It has been suggested that a whole genome study requires on the order of 10^5 to 10^6 SNPs (Kruglyak and Nickerson, 2001). The existence of haplotype blocks may allow a reduction in map density, since only representative SNPs need to be typed to capture the haplotype diversity, with little loss of statistical power (Daly et al., 2001; Johnson et al., 2001; Reich et al., 2001; Patil et al., 2001; Gabriel et al., 2002). In contrast to the candidate gene approach, the whole genome approach is unbiased and can potentially discover novel players in the trait or disease process.

3.6 Issues Surrounding Association Studies

3.6.1 Poor Replication

Association studies are, in theory, a powerful approach to dissecting the genetic basis of complex traits (Risch and Merikangas, 1996). In practice, however, the vast majority of initial, strongly optimistic reports of association, often published in prestigious journals, fail to be replicated unequivocally (Ioannidis et al., 2001; Lohmueller et al., 2003). Only 20-30% of claims of statistically significant genetic association are believed to be true (Lohmueller et al., 2003). Several reasons for this inconsistent replication are discussed below.

3.6.2 Allelic and Genetic Heterogeneity: Small Genetic Effects on Disease Risk

Phenotypes are rarely the manifestation of a single genetic variant; rather, they reflect a combination of variants, some of which increase the risk of disease while others are protective (Romero et al., 2002). Ioannidis (2003) notes that most genetic associations with complex diseases represent modest effects that increase the relative risk of disease by a mere 10-50%, with each polymorphism accounting for at most 1-8% of the overall disease risk.
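Modest effects of this size translate directly into large sample requirements. The sketch below applies the standard normal-approximation sample size formula for comparing two proportions to allele counts. It assumes Hardy-Weinberg equilibrium (so that each subject contributes two independent alleles), a single two-sided test at the stated α, and an allelic odds ratio; it is a rough planning aid rather than a substitute for dedicated power software, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(p0: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by control frequency p0 and an allelic odds ratio."""
    odds = odds_ratio * p0 / (1 - p0)
    return odds / (1 + odds)


def subjects_per_group(
    p0: float, odds_ratio: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Approximate number of cases (and of controls) for a two-sided allelic test."""
    if not 0 < p0 < 1 or odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("need 0 < p0 < 1 and a positive odds ratio other than 1")
    p1 = case_allele_freq(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per subject


if __name__ == "__main__":
    for odds_ratio in (1.1, 1.3, 1.5):
        n = subjects_per_group(p0=0.3, odds_ratio=odds_ratio)
        print(f"OR {odds_ratio}: ~{n} cases and ~{n} controls")
```

For a risk allele with frequency 0.3 in controls and an odds ratio of 1.3, this gives roughly 500 cases and 500 controls, in line with the estimate of 1000 subjects or more cited by Ioannidis (2003). Smaller effects, or a significance level corrected for multiple testing, push the requirement much higher.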
Using regression analysis, Comings (2003) estimated that any one gene accounts for only 0.5-2.5% of the total variance, with an average of 1.5% or less. Because of the nature of complex traits, several genes rather than one modulate disease risk, in both additive and epistatic fashion (genetic heterogeneity). Several alleles within the same gene can also modulate susceptibility (allelic heterogeneity). An adequately powered study therefore requires 1000 subjects or more, depending on the frequency of the polymorphism and the estimated genetic effect. If gene-gene or gene-environment interactions are investigated, even larger samples are probably necessary (Ioannidis, 2003).

3.6.3 Chance Findings

In a typical association study, each SNP is tested for association in turn on the same dataset. If 1000 tests are performed at a nominal significance level of 0.05 and none of the SNPs is truly associated, 50 false positives are expected by chance alone. The classic approach to reducing chance findings is a simple Bonferroni correction for multiple testing, which resets the significance threshold to 0.05/n, where n is the number of tests. However, the Bonferroni method risks over-correction: because many tests are not independent (e.g. SNPs in LD), it is overly conservative, and true signals may be missed. There is thus a trade-off between the number of false positives and statistical power (Cardon and Bell, 2001). As genome-wide association studies, facilitated by millions of SNPs in public databases and high-throughput genotyping technology, become the norm, multiple testing is clearly a serious issue. Statistical procedures that minimize the overall false positive rate without incurring a heavy loss of power are gradually becoming available (Hoh et al., 2001).
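The arithmetic behind the multiple-testing problem is short enough to write out. In this minimal sketch, the expected number of false positives is n × α when no SNP is truly associated, the Bonferroni threshold is α/n, and, if the tests were (unrealistically) independent, the probability of at least one false positive would be 1 - (1 - α)^n. The function names are illustrative.

```python
from typing import List


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected false positives when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_if_independent(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive), assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_significant(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag p-values that pass the Bonferroni threshold alpha / n."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    print(expected_false_positives(1000))         # 1000 x 0.05
    print(familywise_error_if_independent(1000))  # essentially certain
    print(0.05 / 1000)                            # Bonferroni threshold for 1000 tests
```

Because SNPs in LD yield correlated tests, the true family-wise error rate is lower than the independent-test figure. This is why the Bonferroni threshold is conservative.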
An alternative approach is to select SNPs within the coding and regulatory regions of candidate genes, which have an increased prior probability of association (the candidate gene approach; Tabor et al., 2002). Permutation methods have also been suggested as the most prudent approach to cope with the high variability of LD across the genome (Cardon and Bell, 2001).

3.6.4 Publication Bias

First reports of positive associations often exaggerate the genetic effect compared with later studies, a consequence of the winner's curse phenomenon (Lohmueller et al., 2003). Moreover, precisely because initial reports overestimate the relative risk, subsequent studies designed to replicate the initial findings tend to be underpowered (Ioannidis et al., 2001; Hirschhorn and Altshuler, 2002).

3.6.5 Varied Patterns of LD amongst Populations

Because patterns of LD in genes can vary across populations (Shifman et al., 2003; Crawford et al., 2004; Sawyer et al., 2005), associations can be specific to certain populations. A comprehensive assessment of sequence variation in the study population is therefore required before suitable SNPs are selected for an association study.

3.6.6 Confounding

A main drawback of association studies is their potential for confounding, which arises when other factors are correlated with both the true risk factor and the disease, leading to artifactual rather than causal associations. Confounding can occur if disease prevalence varies between population subgroups for reasons quite unrelated to the locus of interest, such as differences in exposure to environmental causes, perhaps owing to differing customs, which may be beyond the control of the researcher (Clayton, 2001). More commonly, confounding due to population structure, or stratification, is cited as a reason for the lack of replication of a genetic association.
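A permutation test handles correlated SNPs by shuffling case-control labels while keeping each individual's genotypes intact, so the LD structure is preserved in every permuted dataset. The sketch below computes a family-wise p-value from the maximum statistic across SNPs. The statistic used (absolute difference in mean allele dosage between cases and controls) is a deliberately simple stand-in for a chi-square or trend test, and the data layout and names are hypothetical.

```python
import random
from typing import List, Sequence


def max_dosage_difference(
    genotypes: Sequence[Sequence[int]], labels: Sequence[int]
) -> float:
    """Largest |mean dosage in cases - mean dosage in controls| across SNPs.

    genotypes[i][s] is the 0/1/2 allele dosage of individual i at SNP s;
    labels[i] is 1 for a case and 0 for a control.
    """
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    best = 0.0
    for s in range(len(genotypes[0])):
        mean_case = sum(g[s] for g in cases) / len(cases)
        mean_control = sum(g[s] for g in controls) / len(controls)
        best = max(best, abs(mean_case - mean_control))
    return best


def permutation_pvalue(
    genotypes: Sequence[Sequence[int]],
    labels: Sequence[int],
    n_perm: int = 999,
    seed: int = 1,
) -> float:
    """Family-wise p-value for the strongest single-SNP signal."""
    rng = random.Random(seed)
    observed = max_dosage_difference(genotypes, labels)
    shuffled: List[int] = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link, keep LD intact
        if max_dosage_difference(genotypes, shuffled) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1] * 50 + [0] * 50
    # Hypothetical data: 10 SNPs with random dosages; SNP 0 is made case-enriched.
    genotypes = [[rng.randint(0, 2) for _ in range(10)] for _ in labels]
    for g, y in zip(genotypes, labels):
        if y == 1:
            g[0] = 2
    print(permutation_pvalue(genotypes, labels, n_perm=199))
```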
A spurious association can arise when the gene under study shows marked variation in allele frequency across subgroups of the population and those subgroups also differ in disease prevalence (Thomas and Witte, 2002; Cardon and Palmer, 2003). Stratification can, however, also mask or even reverse the effect of an association (Deng et al., 2001). There has been considerable debate about the true extent of stratification in practice. Some empirical data suggest that hidden structure is unlikely to pose a serious threat in case-control studies and that self-described ancestry is sufficient to guard against stratification (Ardlie et al., 2002b; Pankow et al., 2002; Risch et al., 2002; Rosenberg et al., 2002). Others, however, argue that population structure cannot safely be ignored (Thomas and Witte, 2003; Ziv and Burchard, 2003; Freedman et al., 2004; Marchini et al., 2004). One method of controlling for population stratification is to use family-based controls, as in the transmission/disequilibrium test devised by Spielman and co-workers (1993). A second method involves additional genotyping of null, or unlinked, genetic markers that are unrelated to the trait and the candidate gene of interest (Pritchard and Donnelly, 2001). Genetic clusters inferred from these markers can be used as covariates in the statistical analysis (Pritchard et al., 2000; Hoggart et al., 2003). Alternatively, the null markers can provide a quantitative estimate of the extent of stratification, which is then used to correct the otherwise inflated association test statistic (Devlin and Roeder, 1999; Reich and Goldstein, 2001; Freedman et al., 2004).

3.7 Genetic Variation Detection Methods

Experimental methods for the detection of genetic variation can be broadly classified into two categories: detection and screening methods.
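Two of the corrections mentioned here have compact textbook forms. The transmission/disequilibrium test compares, among heterozygous parents, the number b who transmit one allele to an affected child with the number c who transmit the other, giving (b - c)²/(b + c) on 1 degree of freedom. Genomic control, in the spirit of Devlin and Roeder (1999), estimates an inflation factor λ as the median 1-df chi-square statistic at the null markers divided by the median expected without stratification (about 0.455), then divides the candidate statistics by λ. The sketch assumes 1-df allelic tests; function names and inputs are hypothetical.

```python
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

# Median of a chi-square distribution with 1 df: (z_0.75)^2, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def tdt_statistic(b: int, c: int) -> float:
    """TDT chi-square statistic (1 df).

    b: heterozygous parents who transmitted allele 1 to the affected child
    c: heterozygous parents who transmitted allele 2
    """
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) ** 2 / (b + c)


def genomic_control(
    null_stats: Sequence[float], test_stats: Sequence[float]
) -> Tuple[float, List[float]]:
    """Estimate lambda from null markers and deflate the candidate statistics."""
    lam = max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)  # never inflate statistics
    return lam, [s / lam for s in test_stats]


if __name__ == "__main__":
    print(tdt_statistic(60, 40))
    lam, corrected = genomic_control([0.2, 0.5, 0.9, 1.4, 0.7], [12.0, 3.1])
    print(round(lam, 2), [round(s, 2) for s in corrected])
```

The median is used instead of the mean because it is robust to the few null markers that might, by chance, be genuinely associated.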
3.7.1 Detection Methods

Detection, or diagnostic, methods locate pre-established sequence changes and therefore require prior knowledge of the mutation. They are also commonly known as genotyping methods. The biochemical principles underlying the various genotyping methods include primer extension, restriction enzyme analysis, hybridization with allele-specific probes, and ligation (Syvanen, 2001).

3.7.2 Screening Methods

Screening methods generally do not define the precise nature and location of the sequence change. Most screening methods exploit the different properties of heteroduplex DNA containing mismatches compared with homoduplex DNA. These methods may detect the different physical properties of heteroduplex DNA, such as its melting behaviour (denaturing high performance liquid chromatography, Xiao and Oefner, 2001; denaturing gradient gel electrophoresis, Fodde et al., 1994; temperature gradient gel electrophoresis, Wartell et al., 1990), or detect the mismatches directly using chemicals (chemical cleavage of mismatch, Cotton et al., 1988) or enzymes (enzymatic cleavage of mismatch) that bind and/or cleave the DNA molecule at the site of mismatch (Taylor et al., 1999). Other methods rely on the alternative structural conformations adopted by single-stranded DNA fragments, which are unique to the nucleotide sequence. These conformations can either be resolved directly as differential band patterns in non-denaturing slab or capillary gel electrophoresis (single-strand conformation polymorphism (SSCP) analysis, Nataraj et al., 1999), or be probed by enzymes to produce distinct fragment profiles on denaturing gels (cleavase fragment length polymorphism analysis, Heisler and Lee, 2002).

It should be noted that the definitions of screening and detection methods may overlap. For instance, DNA sequencing both detects and defines genetic variants in one operation.
Similarly, the reproducible and characteristic gel migration patterns of variants identified by SSCP may be used to genotype DNA samples.

3.7.3 Choice of Experimental Methods for Genetic Variation Detection

Essentially, screening methods avoid sequencing whole stretches of DNA that may differ from the wild type by only a single base. Thus, where variants are expected to be rare, screening is faster and cheaper than sequencing long stretches of DNA in which no sequence variation is expected. However, screening methods may not achieve a 100% detection rate, and in some cases several conditions must be used to ensure coverage of all variants, which can increase costs and lower throughput. In addition, a newly developed screening method is often benchmarked against a second method (typically DNA sequencing) or by subjecting fragments harbouring known variants to a blinded analysis. Where cost is concerned, DNA sequencing may be seen as disadvantageous and screening methods as more desirable, but technically, DNA sequencing is simple and straightforward. The choice of genetic variation detection method will invariably differ from lab to lab; a lab with expertise in a screening method may prefer it to sequencing. With the introduction of smaller, less costly capillary DNA sequencers, modifications to sequencing protocols such as improved chemistry, omission of the PCR clean-up step and dilution of the BigDye Terminator master mix (Livingston et al., 2004), as well as the use of pooled DNA samples (Amos et al., 2000), resequencing has become popular (Carlson et al., 2003; Crawford et al., 2004; Livingston et al., 2004; Bhangale et al., 2005).

3.7.4 In silico SNP Discovery

Another category of SNP discovery method relies on in silico mining of publicly available sequences.
Here, sequences are aligned and base changes are identified in regions of overlap. Expressed sequence tags (ESTs) have several attributes that make them popular resources for in silico SNP mining (Buetow et al., 1999; Garg et al., 1999; Marth et al., 1999; Picoult-Newberg et al., 1999; Irizarry et al., 2000; Cox et al., 2001). ESTs are highly redundant because cDNA libraries are constructed from multiple tissues of one or several donors. In addition, much EST data, including the raw sequence traces, is publicly accessible. Because ESTs are derived from cDNA, candidate SNPs mined from them are potentially enriched for coding variants. Pitfalls, however, include the error-prone reverse transcriptase used in cDNA construction, the single-pass, low-quality nature of EST sequencing, and the 3' bias of ESTs. Candidate variants flagged by the in silico approach therefore require validation by a diagnostic method. With the output of the Human Genome Project, however, identifying SNPs from ESTs has become less popular, and there has been a shift towards analysis of genomic clones (Altshuler et al., 2000; Sachidanandam et al., 2001).
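The alignment-based idea can be sketched as follows, assuming the sequences have already been aligned to a common coordinate system; the alignment step itself, base-quality scores and paralogue filtering are omitted. Requiring each allele to be seen in at least two independent sequences is a simple, hypothetical heuristic against the single-pass sequencing errors noted above. Sites flagged this way would still need validation by a diagnostic method.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def candidate_snps(
    aligned: Sequence[str], min_allele_count: int = 2
) -> List[Tuple[int, List[str]]]:
    """Return (column, alleles) for alignment columns with two or more well-supported bases.

    Gaps ('-') and ambiguous calls (e.g. 'N') are ignored.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for column, bases in enumerate(zip(*aligned)):
        counts = Counter(b for b in (x.upper() for x in bases) if b in "ACGT")
        alleles = sorted(b for b, n in counts.items() if n >= min_allele_count)
        if len(alleles) >= 2:
            hits.append((column, alleles))
    return hits


if __name__ == "__main__":
    ests = [  # hypothetical overlapping EST reads, already aligned
        "ACGTAC",
        "ACGTAC",
        "ACCTAC",
        "ACCTA-",
        "ACGTTC",
    ]
    print(candidate_snps(ests))
```

In this toy alignment, the T at column 4 appears in a single read only, so it is treated as a possible sequencing error rather than a candidate SNP.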