Ebook Veterinary epidemiology (4/E): Part 1


Part 1 of “Veterinary Epidemiology” covers the development of veterinary medicine, the scope of epidemiology, causality, describing disease occurrence, determinants of disease, the transmission and maintenance of infection, the ecology of disease, patterns of disease, and comparative epidemiology, among other topics.

Veterinary Epidemiology
Fourth Edition

Michael Thrusfield
Veterinary Clinical Sciences, Royal (Dick) School of Veterinary Studies, University of Edinburgh

With Robert Christley
Epidemiology & Population Health, Institute of Infection & Global Health, and Institute of Veterinary Science, University of Liverpool

And Helen Brown, Peter J. Diggle, Nigel French, Keith Howe, Louise Kelly, Annette O’Connor, Jan Sargeant and Hannah Wood

This edition first published 2018. © 2018 by John Wiley & Sons Ltd.

Edition History: first edition © 1986 by Butterworth & Co (Publishers) Ltd; second edition © 1995 by Blackwell Science Ltd; third edition © 2005, 2007 by Blackwell Science Ltd, a Blackwell Publishing company.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions

The right of Michael Thrusfield and Robert Christley to be identified as the authors of the editorial material in this work has been asserted in accordance with law.

Registered Offices: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA; John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK.

Editorial Office: 9600 Garsington Road, Oxford, OX4 2DQ, UK. For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty: The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data
Names:
Thrusfield, M. V., author. | Christley, Robert, 1968– author.
Title: Veterinary epidemiology / by Michael Thrusfield, Veterinary Clinical Sciences, Royal (Dick) School of Veterinary Studies, University of Edinburgh; with Robert Christley and [8 others].
Description: Fourth edition. | Hoboken, NJ: Wiley, 2018. | Includes bibliographical references and index.
Identifiers: LCCN 2017051658 (print) | LCCN 2017053201 (ebook) | ISBN 9781118280263 (pdf) | ISBN 9781118280270 (epub) | ISBN 9781118280287 (paperback)
Subjects: LCSH: Veterinary epidemiology. | MESH: Epidemiologic Methods–veterinary.
Classification: LCC SF780.9 (ebook) | LCC SF780.9 T48 2018 (print) | NLM SF 780.9 | DDC 636.089/44–dc23
LC record available at https://lccn.loc.gov/2017051658

Cover images: (top, left to right) © holbox/Shutterstock; © Paul Looyen/Shutterstock; © Lorado/Gettyimages; © Lisa Van Dyke/Gettyimages; (map) © yukipon/Gettyimages; (bottom, left to right) © Palenque/Gettyimages; © lightpoet/Shutterstock; © Seiji/Shutterstock; © claire norman/Shutterstock.
Cover design by Wiley.
Set in 10/12 pt Warnock by SPi Global, Pondicherry, India.

In memory of George

Contents

Contributors
From the preface to the first edition
From the preface to the second edition
From the preface to the third edition
Preface to the fourth edition
About the companion website

1 The development of veterinary medicine (Michael Thrusfield)
  Historical perspective; Domestication of animals and early methods of healing; Changing concepts of the cause of disease; Impetus for change; Quantification in medicine; Contemporary veterinary medicine; Current perspectives; The fifth period; Recent trends; Further reading

2 The scope of epidemiology (Michael Thrusfield)
  Definition of epidemiology; The uses of epidemiology; Types of epidemiological investigation; Epidemiological subdisciplines; Components of epidemiology; Qualitative investigations; Quantitative investigations; Epidemiology’s locale; The interplay between epidemiology and other sciences; The relationship between epidemiology and other diagnostic disciplines; Epidemiology within the veterinary profession; Further reading

3 Causality (Michael Thrusfield)
  Philosophical background; Causal inference; Methods of acceptance of hypotheses; Koch’s postulates; Evans’ rules; Variables; Types of association; Non-statistical association; Statistical association; Confounding; Causal models; Formulating a causal hypothesis; Methods of deriving a hypothesis; Principles for establishing cause: Hill’s criteria; Further reading

4 Describing disease occurrence (Michael Thrusfield)
  Some basic terms; Basic concepts of disease quantification; The structure of animal populations; Contiguous populations; Separated populations; Measures of disease occurrence; Prevalence; Incidence; The relationship between prevalence and incidence rate; Application of prevalence and incidence values; Mortality; Survival; Example of calculation of prevalence, incidence, mortality, case fatality and survival; Ratios, proportions and rates; Mapping; Geographic base maps; Further reading

5 Determinants of disease (Michael Thrusfield)
  Classification of determinants; Host determinants; Genotype; Age; Sex; Species and breed; Behaviour; Other host determinants; Agent determinants; Virulence and pathogenicity; Gradient of infection; Outcome of infection; Microbial colonization of hosts; Environmental determinants; Location; Climate; Husbandry; Stress; Interaction; Biological interaction; Statistical interaction; The cause of cancer; Further reading

6 The transmission and maintenance of infection (Michael Thrusfield)
  Horizontal transmission; Types of host and vector; Factors associated with the spread of infection; Routes of infection; Methods of transmission; Long-distance transmission of infection; Vertical transmission; Types and methods of vertical transmission; Immunological status and vertical transmission; Transovarial and trans-stadial transmission in arthropods; Maintenance of infection; Hazards to infectious agents; Maintenance strategies; Transboundary diseases; Further reading

7 The ecology of disease (Michael Thrusfield)
  Basic ecological concepts; The distribution of populations; Regulation of population size; The niche; Some examples of niches relating to disease; The relationships between different types of animals and plants; Ecosystems; Types of ecosystem; Landscape epidemiology; Nidality; Objectives of landscape epidemiology; Landscape characteristics determining disease distribution; Further reading

8 Patterns of disease (Michael Thrusfield)
  Epidemic curves; Kendall’s Threshold Theorem; Basic reproductive number (R0); Dissemination rate; Common-source and propagating epidemics; The Reed–Frost model; Kendall’s waves; Trends in the temporal distribution of disease; Short-term trends; Cyclical trends; Long-term (secular) trends; True and false changes in morbidity and mortality; Detecting temporal trends: time series analysis; Trends in the spatial and temporal distribution of disease; Spatial trends in disease occurrence; Space–time clustering; Further reading

9 Comparative epidemiology (Michael Thrusfield)
  Types of biological model; Cancer; Monitoring environmental carcinogens; Identifying causes; Comparing ages; Some other diseases; Diseases with a major genetic component; Some non-infectious diseases; Diseases associated with environmental pollution; Reasoning in comparative studies; Further reading

10 The nature of data (Michael Thrusfield)
  Classification of data; Scales (levels) of measurement; Composite measurement scales; Data elements; Nomenclature and classification of disease; Diagnostic criteria; Sensitivity and specificity; Accuracy, refinement, precision, reliability and validity; Bias; Representation of data: coding; Code structure; Numeric codes; Alpha codes; Alphanumeric codes; Symbols; Choosing a code; Error detection; Further reading

11 Data collection and management (Michael Thrusfield)
  Data collection; Questionnaires; Quality control of data; Data storage; Database models; Non-computerized recording techniques; Computerized recording techniques; Veterinary recording schemes; Scales of recording; Veterinary information systems; Some examples of veterinary databases and information systems; Geographical information systems; Further reading

12 Presenting numerical data (Michael Thrusfield and Robert Christley)
  Some basic definitions; Some descriptive statistics; Measures of position; Measures of spread; Statistical distributions; The Normal distribution; The binomial distribution; The Poisson distribution; Other distributions; Transformations; Normal approximations to the binomial and Poisson distributions; Estimation of confidence intervals; The mean; The median; A proportion; The Poisson distribution; Some epidemiological parameters; Other parameters; Bootstrap estimates; Displaying numerical data; Displaying qualitative data; Displaying quantitative data; Monitoring performance: control charts; Further reading

13 Surveys (Michael Thrusfield and Helen Brown)
  Sampling: some basic concepts; Types of sampling; Non-probability sampling methods; Probability sampling methods; What sample size should be selected?; Estimation of disease prevalence; Detecting the presence of disease; The cost of surveys; Calculation of confidence intervals; Further reading

14 Demonstrating association (Michael Thrusfield)
  Some basic principles; The principle of a significance test; The null hypothesis; Errors of inference; Multiple significance testing; One- and two-tailed tests; Independent and related samples; Parametric and non-parametric techniques; Hypothesis testing versus estimation; Sample-size determination; Statistical versus clinical (biological) significance; Interval and ratio data: comparing means; Hypothesis testing; Calculation of confidence intervals; What sample size should be selected?; Ordinal data: comparing medians; Hypothesis testing; Calculation of confidence intervals; What sample size should be selected?; Nominal data: comparing proportions; Hypothesis testing; Calculation of confidence intervals; What sample size should be selected?; χ² test for trend; Correlation; Multivariate analysis; Statistical packages; Further reading

15 Observational studies (Michael Thrusfield)
  Types of observational study; Cohort, case-control and cross-sectional studies; Measures of association; Relative risk; Odds ratio; Attributable risk; Attributable proportion; Interaction; The additive model; Bias; Controlling bias; What sample size should be selected?; Calculating the power of a study; Calculating upper confidence limits; Further reading

16 Design considerations for observational studies (Robert Christley and Nigel French)
  Descriptive observational studies; Analytical observational studies; Design of cohort studies; Design of case-control studies; Design of cross-sectional analytical studies; Overview of other study designs; Further reading

17 Clinical trials

… assessed; the elementary unit is then the quarter, not the animal. The experimental unit may be a group because events at the individual level cannot be measured, even though they are of interest. For instance, in trials of in-feed compounds likely to affect weight gain in poultry and pigs, neither the amount eaten by, nor the weight increase of, individuals within a house or pen is recorded. This often arises because it is not practical to identify individual animals at weighing. Consequently, liveweight gain per house or pen is the response variable. Thus, the efficacy of in-feed antibiotic medication in reducing the incidence of streptococcal meningitis in pigs could be assessed by dividing a herd into pens containing a specified number of animals (Johnston et al., 1992). The treatment is then randomly allocated to the pens, and medicated and placebo diets supplied to pigs in the respective treatment and control pens. Additionally, when animals are penned together, external factors (e.g., farm hygiene) may affect the groups, and such ‘group effects’ cannot be separated from individual treatment effects (Donner, 1993; Speare et al., 1995). In these circumstances, sample-size determination for the trial and subsequent analyses must take into account the clustering of individuals within pens, because the clustering results in non-independent observations.

A particular problem arises with trials involving some infectious diseases. If the treatment could reduce excretion of infectious agents (e.g., vaccination in poultry
houses or anthelmintic trials on farms), then treated and control animals should not be kept together, because any reduction in infection ‘pressure’ will benefit treated and control animals; similarly, control animals constitute a source of infection to treated animals. This can lead to similar results in both categories (Thurber et al., 1977), therefore reducing the likelihood of detecting beneficial therapeutic effects. The practice of mixing animals in each group is therefore unacceptable when herd immunity or group immunity is being assessed. In these circumstances, an appropriate independent unit must be identified. Thus, separate houses could be used on an intensive poultry enterprise, or separate tanks on a fish farm. Dairy farms, in contrast, usually have a continuous production policy with mixing of animals, and so the herd may become the experimental unit.

The experimental population

The population in which a trial is conducted is the experimental population. This should be representative of the target population (see also Chapter 13). Differences between experimental and target populations may result in the trial not being generalizable (externally valid); that is, unbiased inferences regarding the target population cannot be made. For example, findings from a trial of an anaesthetic drug conducted only on thoroughbred horses may not be relevant to the general horse population because of differences in level of fitness between thoroughbreds and other types of horse (Short, 1987). External validity (which is facilitated by conducting trials ‘in the field’) contrasts with internal validity, which indicates that observed differences between treatment and control groups in the experimental population can be legitimately attributed to the treatment. Internal validity is obtained by good trial design (e.g., randomization). The evaluation of external validity usually requires much more information than assessment of internal validity.

Prophylactic trials may require selection of an experimental population that is at high risk of developing disease, so that natural challenge can be anticipated during the period of the trial. Previous knowledge of disease on potential trial sites may be sufficient to identify candidate populations (Johnston et al., 1992). However, the period of natural challenge may vary, reflecting complex patterns of infection. Many infections are seasonal (see Figure 8.15); others may be poorly predictable (Clemens et al., 1993).

Admission and exclusion criteria

Criteria for inclusion of animals in a trial (admission criteria, eligibility criteria) must be defined. These should be listed in the protocol, and include:

• a precise definition of the condition on which the treatment is being assessed;
• the criteria for diagnosis of the condition.

For example, in the trial of the efficacy of evening primrose oil in the treatment of canine atopy, chronically pruritic dogs were included only if they conformed to a documented set of diagnostic criteria (Willemse, 1986) and reacted positively to the relevant intradermal skin tests. Similarly, specific types of mastitis may need to be defined in bovine mastitis trials; other admission criteria could include parity and stage of lactation.

Exclusion criteria are the corollaries of admission criteria. Thus, dogs with positive reactions to flea allergens were excluded from the trial of evening primrose oil. Cows might be excluded from a mastitis trial if they had been previously treated for mastitis during the relevant lactation, if they had multiple mammary infections, or if they also had other diseases that could affect treatment. Trials of non-steroidal anti-inflammatory drugs would require exclusion from the treatment group of animals to which corticosteroids were being administered. However, too many exclusion criteria should be avoided; otherwise, external validity may be compromised. It may be prudent to accommodate factors either in the trial design
by stratification, or during the analysis.

Informed consent

The objectives and general outline of a trial should be explained to owners of animals that are included in the trial, and their willingness to participate recorded before the trial begins. This is informed consent. It has been poorly documented in veterinary studies in the past (Lund et al., 1998). Although research institutions and veterinary schools often have ethical review boards that are able to assess informed consent, such reviews may not be available to veterinary practitioners and others who may be involved in clinical trials. However, ethical guidelines for the conduct of research in general veterinary practice, including clinical trials, have been suggested by a number of official veterinary organizations (e.g., Trees et al., 2013).

Blinding

Blinding (masking) is a means of reducing bias. In this technique, those responsible for measurements or clinical assessment are kept unaware of the treatment assigned to each group. The traditional classification of blinding into single or double (full) is based on whether the owner or attendant (patient in human medicine) or investigator is ‘blinded’ (Table 17.4). ‘The investigator’ can be more than one category of person; for example, participating veterinary practitioners and the principal investigators that analyse the results (the term treble blinding or triple blinding has been advocated in this situation). A problem with this classification is that there is considerable variability in clinicians’ interpretations and epidemiology-textbook definitions of these terms (Devereaux et al., 2001). The blinding status of all people for whom blinding may influence the validity of a trial thus should be explicitly reported, to avoid any ambiguity, and some therefore advocate discontinuation of the traditional blinding terms (Moher et al., 2010).

Table 17.4 Summary of traditional types of blinding to assignment of treatment.

                   Knowledge of assignment of treatment
Type of blinding   Owner   Investigator
None               Yes     Yes
Single             No      Yes
Double (full)      No      No

Blinding should be employed wherever possible, and is facilitated by the use of a placebo in the control group. However, there may be circumstances in which blinding is not feasible; for example, if two radically different treatments are being compared (e.g., comparing infiltration of local anaesthetic with bloodless castrators to reduce pain associated with castration and tail-docking of lambs: Kent et al., 2004), or if formulation of visually identical ‘trial’ and ‘standard’ drugs is impracticable. Such unblinded studies are sometimes termed open-label trials (Everitt, 2006). Open-label trials can be avoided by partial blinding, through denying personnel involved in clinical assessment access to details of treatment. If full blinding is infeasible, those that are blinded (sponsor, investigator or owner) should be clearly documented, as should any intentional or unintentional breaking of blinding.

Randomization

Simple randomization

Simple randomization is the most basic type of randomization. When there are only two treatments, tossing a coin is an elementary method. However, it is usually more rigorous to randomize in advance using random numbers (using Appendix X or appropriate software), allocating units identified by odd numbers to one group, and evenly numbered units to the other. Randomization should be applied after eligible units have been identified.

When comparing a new treatment with an established one, and there is evidence that the new treatment is superior, it can be allocated to twice the number of units as the established one (Peto, 1978). This can increase the benefit to participating animals. For example, if a new treatment was expected to reduce mortality by 50%, 2:1 randomization would be expected to produce an equal number of deaths in the two groups. This randomization ratio can be obtained by using twice as many random numbers for allocation of the new treatment as those used
to allocate the established one. There is no advantage in increasing the ratio further, because of the resultant loss of statistical power, which can only be counteracted by increasing the total sample size.

Block randomization

Simple randomization can produce grossly uneven totals in each group if a small trial is undertaken. This problem can be overcome using block (restricted) randomization. This limits randomization to blocks of units, and ensures that within a block equal numbers are allocated to each treatment. For example, if randomization is restricted to units of four animals, receiving either treatment A or treatment B, the numbers 1–6 are attached to the six possible treatment allocations in a block: AABB, ABBA, ABAB, BBAA, BAAB and BABA. One of these numbers is then selected randomly for the next block of four individuals entering the trial, and given its treatment allocation.

Stratification

Some factors (e.g., age, parity or severity of disease) may be known to affect the outcome of a trial and may bias results if they are unevenly distributed between the treatment and control groups. This can be taken into account during initial randomization by stratifying both groups according to these confounding factors. The experimental units are then allocated to treatment and control groups within the strata, using simple or block randomization [18]. Stratification leads to related samples and therefore decreases the number of units that are required to detect a specified difference between treatment and control groups (see Chapter 14). These and other methods of randomization are described in detail by Rosenberger and Lachin (2002), Berger and Antsygina (2015) and Baghbaninaghadehi and others (2016).

[18] The best way to make individuals in treatment and control groups as alike as possible with respect to characteristics that may affect outcome is by individual matching (see Chapter 15). In a trial with two treatment groups, this would necessitate finding pairs of individuals matched on all factors selected for matching and then randomizing the pairs; however, this is usually impracticable.

Alternatives to randomization

Some alternatives to randomization include allocation according to date of entry (e.g., treatment on odd days, placebo on even days), clinic record number, wishes of the owner, and preceding results. An example of the last method is the ‘play-the-winner’ approach (Zelen, 1969): if a treatment is followed by success, the next unit receives the same treatment; if it is followed by failure, the next unit receives the alternative treatment. This limits the number of animals receiving an inferior treatment. All of these techniques have disadvantages and should never be considered as acceptable alternatives to randomization (Bulpitt, 1996).

Minimization is a largely non-random method, which aims to ensure that treatment groups are balanced with respect to predefined patient factors, as well as with the number of individuals in each group (Pocock and Simon, 1975; Taves, 2010). This can provide better balanced groups than block randomization, and can incorporate more factors than stratification (Scott et al., 2002).

Trial designs

There are four main trial designs:

• parallel-group (standard);
• cross-over;
• sequential;
• factorial.

Parallel-group-design (standard) trials

The parallel-group (standard) design is commonly used in confirmatory trials. Experimental units are randomized to a single treatment group using either simple or block randomization, and each group receives a single treatment. A specified number of units enter the trial and are followed for a predetermined period of time, after which the treatment is stopped. The basic design can be refined by stratification. The analytical techniques employed in a parallel trial involving two unstratified groups are listed in Table 17.2. Estimation of parameters with associated confidence intervals is preferred to hypothesis testing, for the reasons given in Chapter 14. Confidence
intervals also should be quoted for negative, as well as positive, results. Stratification may require complex analysis (Meinert, 2012), but this approach is seldom used in veterinary product development.

Cross-over-design trials

In a cross-over trial, subjects are exposed to more than one treatment consecutively, each treatment regimen being selected randomly (Hills and Armitage, 1979). Experimental units therefore serve as their own controls, and treatment and control groups are therefore individually matched. This design is useful when treatments are intended to alleviate a condition, rather than effect a cure, so that after the first treatment is withdrawn the subject is in a position to receive a second. Examples are comparisons of anti-inflammatory drugs in arthritis, and hypoglycaemics in diabetes. Moreover, a comparison on the same individuals is likely to be more precise than a comparison between subjects because the responses are paired (see Chapter 14). The cross-over trial is therefore valuable if the number of experimental units is limited. However, analysis of results is complex if a treatment effect carries over into the next treatment period. If treatment effects do not carry over into subsequent treatment periods, the techniques described in Chapter 14 for the analysis of related samples can be used. However, the absence of a carry-over effect may be difficult to prove. If there is any doubt, conclusions should be based only on the first period, using analyses of independent samples. Alternatively, more complex methods that identify interactions between treatment effect and period of treatment can be applied (Jones and Kenward, 2015).

Sequential-design trials

A sequential trial is one whose conduct at any stage depends on the results so far obtained (Whitehead, 1997). Two treatments are usually compared, and experimental units (usually individuals) enter the trial in pairs, one individual being given one treatment, and one the other.
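The precision gained by pairing, noted for cross-over trials above, can be shown numerically: the standard error of the mean within-animal difference is much smaller than the standard error of a between-group comparison when responses are correlated. A minimal sketch in Python; the lameness scores are invented purely for illustration:

```python
import math
import statistics as st

def paired_se(diffs):
    """Standard error of the mean within-animal difference (paired design)."""
    return st.stdev(diffs) / math.sqrt(len(diffs))

def independent_se(a, b):
    """Standard error of the difference between two independent group means."""
    return math.sqrt(st.variance(a) / len(a) + st.variance(b) / len(b))

# Hypothetical scores for six animals receiving treatment A then treatment B
a = [5.1, 6.3, 4.8, 7.0, 5.9, 6.4]
b = [4.2, 5.6, 4.1, 6.1, 5.0, 5.5]
diffs = [x - y for x, y in zip(a, b)]

# Because each animal's two scores are highly correlated, the paired
# standard error is far smaller than the independent-samples one
print(round(paired_se(diffs), 3), round(independent_se(a, b), 3))
```

The same data analysed as two unrelated groups would need many more animals to achieve the precision that pairing gives for free, which is why the text recommends cross-over designs when experimental units are scarce.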
Results are then analysed sequentially according to the outcome in the pairs, and boundaries are drawn to define levels at which specified differences are obtained at the desired level of statistical significance. The trial may be terminated when these levels are reached. If the desired level is not reached, the investigator may decide to increase the sample size indefinitely until the former is reached; this is an open trial [19]. Alternatively, the trial may be terminated if a specified difference is not reached by a certain stage; this is a closed trial. Sequential trials facilitate early detection of beneficial treatment effects and can require fewer experimental units. However, they may be difficult to plan because their duration is initially unknown. They are also unsuited to trials in which treatment response times are long, because responses need to be analysed quickly so that a decision can be taken to enlist more subjects, if necessary.

[19] This should not be confused with open, uncontrolled trials (mentioned earlier in this chapter) in which the same animals are compared before, and after, treatment.

A key feature of sequential trials therefore is that significance tests are conducted repeatedly on accumulating data. This tends to increase the overall significance level (see Chapter 14: ‘Multiple significance testing’, and Armitage et al., 1969). For example, if five interim analyses, rather than one, are conducted, the chance of at least one analysis showing a treatment difference at the 5% level (α = 0.05) increases to 0.23 (i.e., 1 − [1 − α]⁵); if 20 interim analyses are undertaken, it increases to 0.64 (i.e., 1 − [1 − α]²⁰). The overall Type I error therefore increases if, for any single interim analysis, α = 0.05 is used as the trial’s stopping criterion. If data are analysed frequently enough, a value of P < 0.05 is likely, regardless of whether there is a treatment difference. This problem can be overcome by choosing a more stringent nominal significance level for each
repeated test, so that the overall significance level is kept at a reasonable value such as 0.05 or 0.01 (Pocock, 1983) Table 17.5 can be used for this purpose under two-tailed conditions For example, if the overall significance level is set at α = 0.05, and if a maximum of three analyses is anticipated, P < 0.022 is used as the stopping rule for a treatment difference at each analysis; similarly, if a maximum of five analyses is anticipated, P < 0.016 is used Suitable values for one-sided tests are given by Demets and Ware (1980) Sample-size calculations should therefore be modified if more than one significance test is planned (Wittes, 2002) Trials also can be adapted during their remaining time to allow for modification of particular design elements (e.g., sample size, randomization ratio) at an interim analysis, with full control of Type I error, in which circumstance they are termed adaptive-design trials (Pong and Chow, 2010) Sequential trials are considered in detail by Armitage (1975), Whitehead (1997) and Ellenberg and others (2002) Table 17.5 Nominal significance level required for repeated significance testing with an overall significance level, α = 0.05 or 0.01, and various values of N, the maximum number of tests (From Pocock, 1977.) 
N	α = 0.05	α = 0.01
2	0.0294	0.0056
3	0.0221	0.0041
4	0.0182	0.0033
5	0.0158	0.0028
6	0.0142	0.0025
7	0.0130	0.0023
8	0.0120	0.0021
9	0.0112	0.0019
10	0.0106	0.0018
15	0.0086	0.0015
20	0.0075	0.0013

Factorial-design trials

If two factors, A and B, are to be investigated at a levels and b levels, respectively, this gives rise to ab experimental conditions, corresponding to all possible combinations of the levels of the two factors; this is a complete a × b factorial-design study (Zar, 1996). Thus, in a 2 × 2 factorial design where one factor is the absence or presence of treatment A (a = 2) and the other factor is the absence or presence of treatment B (b = 2), animals are randomly allocated to one of the four combinations of the two treatments: A alone, B alone, A and B together, and neither A nor B. This is a powerful method of testing the effect of two factors in the same study, using the same experimental units. It can be used to explore any interactions that might occur between the two treatments and, in the absence of interaction, enables groups to be combined to increase the power to detect the effects of treatment A and treatment B. The approach can be extended to any number of factors, with each factor having a different number of levels.

What sample size should be selected?
Superiority trials

The number of experimental units in treatment and control groups in a superiority trial should be determined using the techniques outlined in other chapters (Table 17.2). In summary, the following four parameters should be considered:

1. the acceptable level of Type I error, α (the probability of erroneously inferring a difference between treatment and control groups);
2. the test's power, 1 − β (the probability of correctly inferring a difference between treatment and control groups), where β = the probability of Type II error (the probability of erroneously missing a true difference between treatment and control groups);
3. the magnitude of the treatment effect (i.e., the difference between proportions, medians or means);
4. the choice of alternative hypothesis: 'one-tailed' or 'two-tailed'.

There is no rule for defining parameters 1–3. Type I error is traditionally set at 0.05, but a value as low as 0.01 can be justified if a trial is unique and its findings are unlikely to be repeated in the future. Power can vary considerably (values between 0.50 and 0.95 have been quoted in human clinical trials; 0.80 is common when α = 0.05, and 0.96 when α = 0.01: see Chapter 14). The magnitude of the treatment effect depends on its clinical and economic relevance. (In clinical trials in which treatment and control groups are matched, the formulae for sample-size determination listed in previous chapters will tend to overestimate the number of units required.)
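These four parameters combine in the standard formula for the difference between two proportions (see Chapter 14). The following is a minimal sketch, not the book's own code: the function name and the hard-coded z-values are ours, and in practice the z-values would be read from a table such as Appendix XV.

```python
import math

def n_per_group(p1, p2, z_alpha, z_beta):
    """Sample size per group for comparing two proportions.

    z_alpha: z-value for the chosen Type I error
             (for a two-tailed test, the z-value of alpha/2).
    z_beta:  z-value for the chosen Type II error (power = 1 - beta).
    """
    p_bar = (p1 + p2) / 2
    a = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    b = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    # Round upwards: the computed n is a minimum
    return math.ceil((a + b) ** 2 / (p1 - p2) ** 2)

# Superiority: alpha = 0.05 two-tailed (z = 1.96), power = 0.80 (z = 0.84),
# with hypothetical proportions of 0.93 and 0.85:
print(n_per_group(0.93, 0.85, 1.96, 0.84))  # 239 per group

# The same arithmetic, with the roles of alpha and beta reversed
# (M_alpha = 0.84 for alpha = 0.20 one-tailed, M_beta = 1.64 for beta = 0.05),
# reproduces the footrot-vaccine non-inferiority example later in this
# chapter (p1 = 0.07, p2 = 0.12, i.e., M = 5%):
print(n_per_group(0.07, 0.12, 0.84, 1.64))  # 421 per group
```

The second call illustrates the point made below for equivalence and non-inferiority trials: the formula is unchanged, but the margin of clinical tolerance replaces the difference to be detected, and α and β swap roles.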
If a placebo or no treatment has been administered to the control group, and there is therefore intuitive evidence that the treatment can cause only an improvement in comparison with the control group, then a one-tailed test (see Chapter 14) is justifiable, and the sample size can be determined accordingly. However, the use of placebos or 'negative' control groups is now ethically debatable; consequently, many contemporary clinical trials use a 'positive' control group, and it is therefore prudent to assume two-tailed conditions (i.e., the treatment under test may be either better, or worse, than the standard treatment). Additionally, the magnitude of the difference between treatment and 'positive' control groups may be small; thus, large sample sizes may be specified. These may be unattainable in practice. However, knowledge of sample-size determination is necessary to appreciate the inferential limitations that may be imposed by the number of experimental units included in a trial.

Wittes (2002) presents a general discussion of sample-size calculation in clinical trials, and Machin and others (1997) tabulate sample sizes. Sample-size determination for cross-over trials is discussed by Senn (2002). Sample-size determination for sequential trials is discussed by Armitage (1975) and Whitehead (1997); the estimated sample size for a given Type I and Type II error is smaller than for a non-sequential trial. General guidelines are provided by Shuster (1992). Hayes and Bennett (1999) review methods of sample-size calculation for cluster randomized trials. Hallstrom and Trobaugh (1985) provide formulae that incorporate diagnostic sensitivity and specificity (see Chapters 10 and 20). Norman and his colleagues (2012) discuss determination of sample size in the absence of good data for parameter values.

Equivalence and non-inferiority trials

Determination of sample size to demonstrate equivalence focusses on the maximum difference that is tolerated – termed the margin of clinical
equivalence (M). This is the largest difference that is clinically acceptable, larger differences having unwelcome consequences; for example, a difference in mean blood glucose levels induced by a new hypoglycaemic, relative to an established drug, such that signs of diabetes recur. The common context is therefore of demonstrating non-inferiority.

Thus, for a dichotomous response variable, the sample-size formula to demonstrate a difference between two proportions (see Chapter 14) is applied, but p1 − p2 is now the margin of clinical tolerance, rather than the difference to be detected. Moreover, the values of α and β are reversed because attention now focusses on the power of the comparison to detect any difference that may be present.

For example, a new footrot vaccine may be compared with the one listed in Table 17.3, which prevents disease in 296 of 317 sheep (93%). In determining equivalence of the two vaccines, it may be considered acceptable not to detect a difference as trivial as 5% or less in favour of the established vaccine (in which circumstance the vaccines are deemed to be equivalent), but desirable to detect a difference greater than 5% (in which circumstance they are identified as not being equivalent); thus M = 5%. Note that this is a one-tailed situation, because if non-equivalence is demonstrated it is only in the direction of the new vaccine being inferior to the established one – not either inferior or superior.

Assume that the two vaccines are equivalent, with the proportion of disease in vaccinated sheep = 0.07 (i.e., 100% − 93%: the estimate for the established vaccine). Thus, p1 = 0.07. If M = 5%, p2 = 0.07 + 0.05 = 0.12, and p̄ = (p1 + p2)/2 = (0.07 + 0.12)/2 = 0.095. Set β at 0.05 (that is, power = 0.95), and set α at 0.20; thus, from Appendix XV, Mβ = 1.64, and Mα = 0.84 (because the hypothesis is one-tailed). The number of animals required in each vaccinated group, n, is then derived thus:

n = [Mα√(2p̄(1 − p̄)) + Mβ√(p1(1 − p1) + p2(1 − p2))]² / (p2 − p1)²
  = [0.84√(2 × 0.095 × 0.905) + 1.64√(0.07 × 0.93 + 0.12 × 0.88)]² / (0.12 − 0.07)²
  = (0.3483 + 0.6776)² / 0.0025
  = 421

Therefore, a trial comprising 421 animals vaccinated with the established vaccine, and 421 animals vaccinated with the new vaccine, will detect any difference between the performance of the two vaccines as small as an absolute difference in disease occurrence of 5%, but no smaller, with probability 0.95.

Alternatively, the requirement may be to show that a new vaccine is neither inferior nor superior to an established one (i.e., assessment of full equivalence). In this circumstance, Mβ is obtained from Appendix XV using a β value obtained by setting 1 − 2β equal to the overall power required for the two one-sided tests that need to be conducted. For example, if the overall power is to be 0.95, then Mβ should be based on β being 0.025 (i.e., Mβ = 1.96). The same approach can be adopted for continuous and ordinal variables, using the relevant formulae for sample-size determination for differences between two means or two groups of ordinally-ranked data (see Chapter 14).

Losses to follow-up

The outcome of a trial may not be recorded in some experimental units because they are lost to follow-up. For example, owners may move house or refuse to continue with the trial. The extent of this loss to follow-up needs to be assessed, and is frequently based on the experience of the investigator. The sample size then needs to be increased by multiplying it by 1/(1 − d), where d is the anticipated proportion of experimental units lost. For example, if d = 10/100, the sample size would need to be multiplied by 1.11 (1/0.9) to compensate.

Compliance

The success of a trial depends on participants acting in accordance with the instructions of the trial's designers; that is, complying with treatment. For example, they may decide to switch from the treatment under trial to an alternative treatment. Poor compliance will decrease the statistical power of the trial because the observed difference in
outcome between treatment and control groups will be reduced, but it will not produce spurious differences between groups. Reasons for poor compliance include:

• unclear instructions;
• forgetfulness;
• inconvenience of participation;
• cost of participation;
• preference for alternative procedures;
• disappointment with results;
• side-effects.

Participants cannot be forced to comply, and so regular contact should be maintained with them so that they can be encouraged to comply, and the degree of compliance should be regularly assessed. For example, if a treatment is formulated as a tablet, the number of tablets remaining can be counted regularly by the veterinarian. Assessment may be difficult (e.g., with in-feed medication) but should, nevertheless, be attempted. Other methods of improving compliance include:

• enrolling motivated participants;
• assessing the willingness of participants to comply;
• providing incentives (e.g., free treatment);
• supplying simple, unambiguous instructions;
• limiting the duration of the trial.

If non-compliance is substantial, the required sample size should again be modified in the same way as adjustment for loss to follow-up. If both losses to follow-up and non-compliance are anticipated, a composite value for d is required.

The potential for loss to follow-up and incomplete compliance raises issues about how a trial is analysed. One option involves analysis by intention-to-treat, which compares individuals in the groups to which they were originally randomly assigned. This is generally interpreted as including all individuals, regardless of the treatment they actually received, and subsequent withdrawal or deviation from the trial protocol. This approach preserves the initial randomization; moreover, any deviations also are likely to occur when treatments are applied in routine clinical practice. Intention-to-treat analysis therefore is most suitable for trials of effectiveness (i.e., the effect of the treatment in all
to whom it is offered), particularly in superiority trials, rather than investigations of efficacy, which it tends to underestimate (Armitage et al., 2002). A second option is analysis per protocol, in which departures from the trial protocol are excluded from the groups to which they were initially assigned, and which allows regrouping and subsequent analysis of participants according to the treatment that they actually received. This therefore explores the consequences of very specific treatment regimens, which are unlikely to be typical of the target population. As such, per protocol analysis may provide initial insights into the value of a treatment, but is generally no substitute for intention-to-treat analysis20.

20 Some authors recommend per protocol analysis for equivalence and non-inferiority trials (Dasgupta et al., 2010), although the relative merits of it and the intention-to-treat approach are less clear for non-inferiority trials (Sanchez and Chen, 2006).

Terminating a trial

The number of experimental units entering a trial and the duration of treatment are specified during the design of a trial; therefore a trial will usually last as long as it takes to enlist the units and for the last unit to complete the trial. However, it may be necessary to terminate a trial (particularly a long-term one) prematurely if there are serious adverse side-effects in the treatment group, or if the number of patients recruited will not be adequate; such decision rules should be written into the trial's protocol. In sequential trials, another decision rule may be that a trial will be terminated when the specified difference is detected to the predetermined level of significance (see earlier in this chapter). Decision rules, and the advantages of early and late termination of trials, are discussed in detail by Bulpitt (1996).

Interpretation of results

In Chapter 14, the use of statistical hypothesis testing as an approach to interpreting data was discussed. Increasingly, however,
this is being replaced by estimation – in particular, by calculating confidence intervals (introduced in Chapter 12). Significance testing and confidence-interval estimation are two ways of interpreting the same data. However, an advantage of confidence intervals is that they encourage the investigator to express results in terms of the size of any treatment effect or difference. The following discussion will therefore place particular emphasis on the interpretation of confidence intervals.

Superiority trials

The goal of a superiority trial is to detect a difference between treated animals and controls. Evidence of a difference is provided, at the 5% level of significance, if the probability of a Type I error is less than 5% (exact values of P should always be quoted). The 95% confidence interval for the difference between the treatment effects will then exclude the null value (zero for differences; but one for ratio measures such as the relative risk and odds ratio, and ratios of geometric means). This is illustrated in Figure 17.1a. A 95% confidence interval with a lower bound clearly above the null value, and with a related value of P substantially lower than 0.05, provides strong evidence for superiority of the treated group over the control group. A 95% confidence interval with the lower limit touching the null value (P = 0.05) provides adequate evidence of superiority at the 5% level of significance. In contrast, if the 95% confidence interval includes the null value (P > 0.05), there is insufficient evidence to demonstrate superiority.

Equivalence and non-inferiority trials

A full equivalence trial is intended to confirm the absence of a clinically-relevant difference between treatments. This is best explored using confidence intervals. First, the margin of clinical equivalence, M, is selected. This margin should be chosen before a trial is undertaken to prevent bias, and therefore has been specified in the sample-size calculation undertaken before the study was conducted
(see earlier). Two treatments (say, control and new treatment) are considered equivalent if the 95% confidence interval21 lies entirely within the interval −M to +M (Figure 17.1b).

A non-inferiority trial aims to demonstrate that a new therapy is no less efficacious than (i.e., is noninferior to) an established preparation, although it also could be equivalent or better. Thus, concern lies with a difference only in one direction. The new preparation therefore is considered not to be inferior to the established (control) preparation only if the confidence interval lies entirely to the right of −M (Figure 17.1c). For example, a new footrot vaccine may be compared with an established one, with M set at a difference of 5% in the proportion of disease in sheep between the two groups in favour of the established vaccine (i.e., 5% less disease in sheep vaccinated with the established preparation than in sheep given the new vaccine). If the 95% confidence interval for the difference was −7%, −3%, then it would cross the −M boundary (i.e., does not fall entirely to the right of −5%), and so it could not be concluded that the new vaccine was non-inferior to the established one.

Sometimes, the goal of a comparison may switch from a non-inferiority trial to a superiority trial, or vice versa. Results then have to be interpreted cautiously (EAEMP, 2000).

Fig. 17.1 Significance levels and confidence intervals in clinical trials: (a) superiority trials; (b) equivalence trials; (c) non-inferiority trials. Each panel shows point estimates and 95% interval estimates of the treatment difference ('control better' to the left, 'new agent better' to the right), relative to the null value and, in (b) and (c), the margins −M and +M. In (a), intervals are labelled 'superiority shown more strongly' (P = 0.002), 'superiority shown' (P = 0.05) and 'superiority not shown' (P = 0.2).

21 In bioequivalence studies involving drug kinetics, 90% intervals are the accepted standard (Ocaña et al., 2008). This exemplifies the two one-sided tests (TOST) procedure (Schuirmann, 1987). Using TOST, equivalence is established at the α significance level if a (1 − 2α) × 100% confidence interval for the difference in efficacies is contained within the interval −M to +M. The reason the confidence interval is (1 − 2α) × 100%, and not the usual (1 − α) × 100%, is because the method is tantamount to performing two one-sided tests. Thus, using a 90% interval yields a 0.05 level of significance for testing equivalence. The TOST procedure can be directly extended to testing equivalence in other parameters such as means and odds ratios.

Meta-analysis

Meta-analysis is the statistical analysis of data pooled from several studies to integrate findings22. The technique has its recent origins in psychotherapeutic and educational research (Glass, 1976; Hunt, 1997)23 and has been widely applied in the social sciences, where key texts have been published (Glass et al., 1981; Card, 2011; Schmidt and Hunter, 2015). More recently, it has been used in economics (Van den Bergh et al., 1997), marketing (Farley and Lehmann, 1986), biology (e.g., to investigate parasite-induced behavioural changes: Poulin, 1994) and ecology (Osenberg et al., 1999). In human and veterinary medicine, meta-analysis has been applied in several areas (Stangl and Berry, 2000), including the evaluation of diagnostic tests24 (e.g., Greiner et al., 1997), genetics (Guerra and Goldstein, 2009), observational studies25 (Willeberg, 1993; Fourichon et al., 2000; Stroup et al., 2000), health policy (Stangl and Berry, 2000), cost–benefit analysis of diagnostic techniques and treatments, assessment of the magnitude of health problems (e.g., Chesney, 2001; Dohoo et al., 2003; Trotz-Williams and Trees, 2003) and drug resistance (Falzon et al., 2014). However, it has been used most extensively in the area of clinical trials (e.g., Srinand et al., 1995; Peters et al., 2000; Whitehead, 2002; Steffan et al., 2006) and, for that reason, is introduced in this chapter.

22 The term is derived from the Greek preposition, μετα- (meta-) = 'alongside', 'among', or 'in connection with'. A subsidiary meaning is 'after'. Meta-analysis is therefore either one that is done alongside/in conjunction with the normal analysis, or one that is done after the normal analysis, that is, at a later stage in the process.
23 The first identifiable meta-analysis was undertaken on typhoid vaccination at the beginning of the 20th century by the English mathematician, Karl Pearson (1904).
24 Whiting and others (2003) give guidelines for the quality assessment of studies of diagnostic tests included in systematic reviews.
25 There are formal recommendations for reporting such studies (MOOSE: Meta-analysis Of Observational Studies in Epidemiology: Stroup et al., 2000).

Meta-analyses frequently are statistical components of systematic reviews, which developed in medicine and veterinary medicine towards the end of the last century, and which assemble and critically appraise all relevant studies on a specific topic, using systematic methods to limit bias (Porta, 2014). They therefore mark a shift away from traditional 'authoritative reviews' by experts (Oxman and Guyatt, 1993). Moreover, the quality of their content can be assessed rigorously (Balshem et al., 2011; Guyatt et al., 2011a–h). The key characteristics of a systematic review (Higgins and Green, 2011) are:

• a clearly-stated set of objectives with pre-defined eligibility criteria for studies;
• an explicit, reproducible methodology;
• a systematic search that attempts to identify all studies that would meet the eligibility criteria;
• an assessment of the validity of the findings of the studies that are included (e.g., through
assessment of the risk of bias);
• a systematic presentation, and synthesis, of the characteristics and findings of the studies that are included.

Many, but not all, systematic reviews include meta-analyses. Systematic reviews are discussed in detail in Chapter 19, which also further considers meta-analysis.

Goals of meta-analysis

The aims of meta-analysis (Sacks et al., 1987; Dickersin and Berlin, 1992; Marubini and Valsecchi, 1995) are to:

• increase statistical power for primary end-points;
• resolve uncertainty if there are conflicting results;
• improve estimates of therapeutic effect, and their precision26;
• answer questions not posed at the beginning of individual trials;
• give a 'state-of-the-art' literature review;
• facilitate analysis of sub-groups when the power of individual analyses is low;
• guide researchers in planning new trials;
• offer rigorous support for generalization of a treatment (i.e., external validity);
• balance 'overflow of enthusiasm' which might accompany introduction of a new procedure following a single beneficial report.

26 This includes not only revealing beneficial effects that are not identified in isolated studies, but also identification of false-positive effects in individual studies: meta-analysis is designed to produce accurate results – not necessarily positive ones.

Correctly conducted meta-analyses therefore offer strong evidence for efficacy of treatment (Table 17.6). The statistical procedures used are also applicable to the analysis of multicentre trials. However, there are disadvantages, as well as advantages, to the technique (Table 17.7). Perhaps the major disadvantage is the seductive notion that combination of several small trials is a substitute for a well-designed large one.

Table 17.6 Hierarchy of strength of evidence concerning efficacy of treatment. (From Marubini and Valsecchi, 1995.)
1. Anecdotal case reports
2. Case series without controls
3. Series with literature controls
4. Analysis using computer databases
5. Case-control observational studies
6. Series based on historical control groups
7. Single randomized controlled clinical trials
8. Meta-analyses of randomized controlled clinical trials

The table lists the types of study used in medicine, suggested by Green and Byar (1984). The table can be considered as an eight-tiered pyramid. In the context of clinical trials, the base on which conclusions about efficacy can be built becomes broader as one moves downwards. Some authors place initial assessment of efficacy in laboratory animals at the apex of the pyramid because of weaknesses associated with inter-species extrapolation (see Chapter 9); and the order of single randomized controlled clinical trials relative to meta-analyses is arguable (Berlin and Golub, 2014). Similar hierarchies (levels of evidence) also are available for more general application in evidence-based medicine (Burns et al., 2011).

Table 17.7 Advantages and disadvantages of meta-analysis. (Based on Meinert, 1989.)
Advantages
• Focusses attention on trials as an evaluation tool
• Increases the impact of trials in clinical practice
• Encourages good trial design and reporting

Disadvantages
• Current fashion for meta-analysis may discourage large definitive trials
• Tendency to unwittingly mix different trials and ignore differences
• Potential for tension between meta-analyst and conductors of clinical trials
• Meta-analysis of several small trials is a substitute for a well-designed large one27

27 This may appear particularly attractive in the current academic climate where financial support is in short supply, and there is pressure to generate publications.

In this section, the main issues associated with meta-analysis are outlined. For details of specific statistical procedures, the reader is directed to the standard texts mentioned above, and to Abramson (1991), Dickersin and Berlin (1992) and Deeks and others (2001, 2011).

Components of meta-analysis

There are both qualitative and quantitative components to meta-analysis, listed in a scheme for meta-analysis of clinical trials (Naylor, 1989):

• selection of trials according to inclusion and exclusion criteria;
• evaluation of the quality of the trials;
• abstraction of key trial characteristics and data;
• analysis of similarity in design, execution and analysis, and exploration of differences between trials;
• aggregation of data, testing various combinations and interpretations;
• drawing of careful conclusions.

Note that the conventional, qualitative, review article has traditionally been accepted as the means of summarizing research data – usually by listing the individual results of several studies – and lacks objective rigorous analysis. A properly-designed meta-analysis, in contrast, goes further, and uses quantitative analytical procedures to combine results from several sources, where possible, to produce an overall conclusion.

Sources of data

Data for meta-analyses are usually obtained from published material, most of which is presented in refereed journals. This has the advantage of guaranteeing (at least theoretically) minimum standards with respect to the design, conduct and analysis of the component studies. However, there is a tendency for positive findings (beneficial treatment effects) to be more readily accepted for publication than results that either do not show significant effects or reveal only minor effects; this constitutes publication bias (Rothstein et al., 2005). This is a complex matter, though, and, although published trials can show larger intervention effects than 'grey trials' (i.e., reports that are produced by all levels of government, academics, business and industry, in print and electronic formats, but that are not controlled by commercial publishers: Hopewell et al., 2007), unpublished results also can show larger effects than published ones (Detsky et al., 1987). Assessment of the quality of all potential data therefore is desirable, so that useful material does not escape the analyst. Approaches to correcting for publication bias are detailed by Rothstein and others (2005), Moreno and others (2009), Ahmed and others (2011), Sterne and others (2011), and Simonsohn and others (2014).

Other related biases also can occur during the process of publication. These include time-lag bias, where studies with unfavourable findings take longer to be published than those with favourable results (Ioannidis, 1998); language bias, where non-English-language articles are more likely to be rewritten in English if they report significant results (Egger et al., 1997); and selective-outcome reporting, where non-significant study outcomes are entirely excluded on publication (Kirkham et al., 2010).

Comparability of sources

A key feature of component trials is the variability (heterogeneity) in their results. Trials may seem to measure the same outcome but, nevertheless, may be inconsistent or contradictory; this is statistical heterogeneity (Fletcher, 2012). This is generally due to differences
in the design, conduct or analysis of the studies (Horwitz, 1987); for example, contradiction appears to be more prominent between studies with small sample sizes, and between those with non-randomized designs (Ioannidis, 2005). In contrast, different trials may be looking at different concepts, albeit with the same intention (e.g., management of conditions using advisory leaflets, CD-ROMs or consultations with veterinarians or veterinary nurses); this is clinical heterogeneity (Fletcher, 2012). Additionally, different trials may be measuring different response variables on different scales (e.g., ordinal data or visual analogue measurements). Differences between old and recent studies may be ascribed to underlying health trends unrelated to the therapy in question – somewhat akin to the use of historical controls.

If a meta-analysis intends to address general policy or efficacy of a class of drugs, then incorporation of trials with obvious differences can be condoned. However, a specific question will require selection of a relatively homogeneous set of trials. Differences between the different studies that are included in an analysis, and reflected in statistical heterogeneity, prevent interpretation of pooled estimates as being precise28, and 99% confidence limits therefore may be more prudent than the conventional 95% limits.

Data analysis

Analytical techniques treat each incorporated clinical trial as a stratum. The single treatment effects are estimated within each trial, and are then combined to produce a suitable summary, weighted treatment effect. Methods of weighting, and addressing variability in study results, vary. However, the tendency to simply pool the results of the trials and compute an average effect should be avoided. This could be dangerously misleading; for example, a mean mortality rate computed from a series of separate mortality rates does not address differences in sample size between trials, and therefore the different
precision of each trial's estimate.

28 The confidence limit is strictly a limit on the expected results, based on what was done in the studies, rather than on future trials.
29 The term 'effect size' also is used generally to refer to ways of quantifying the size of differences between two groups. It therefore encompasses parameters such as the odds ratio, relative risk and number needed to treat. See Kelley and Preacher (2012) for a detailed discussion of effect size, including its different and conflicting definitions.

A common approach for categorical data is to provide a weighted estimate, for example, of the odds ratio or relative risk. Standard methods include the Mantel–Haenszel procedure (see Chapter 15); Westwood and others (2003) give an example relating to the effects of monensin treatment on lameness in dairy cattle. More sophisticated procedures allow pooling of parameters that have been adjusted for confounding (Greenland, 1987; McCandless, 2012).

An approach for continuous response variables involves calculation of an effect size29. This may be expressed as the difference between the mean values of the treatment and control groups divided by the standard deviation in the control group, when the standard deviations in the treatment and control groups differ (Glass's delta: Glass et al., 1981). Alternatively, if the two standard deviations are roughly the same then they can be pooled to calculate Cohen's d (Ellis, 2010); and if the groups are of different sizes then each group's standard deviation can be weighted according to its size to calculate Hedges' g (Hedges, 1981). The effect size can be interpreted with reference to tables of probabilities associated with the upper tail of the Normal distribution (see Appendix XV). For example, an effect size of 2.9 means that 99.8% of controls have values below the mean value of treated individuals. Consulting Appendix XV, this percentage is obtained by identifying the one-tailed probability, P, in the body of the table
for which the effect size equals z. The percentage then equals (1 − P) × 100. Thus, if the effect size = 2.9, P = 0.0019, and (1 − P) × 100 = (1 − 0.0019) × 100 = 99.81%. Similarly, for an effect size of 1.0, the corresponding percentage is 84% ({1 − 0.1587} × 100). Effect size has no units, and so allows the combination of results expressed in different units. However, it should be interpreted with caution because it depends not only on differences in the effect itself, but also on differences in standard deviations. The use of effect size is therefore particularly dubious if sample sizes are small. McGough and Faraone (2009) discuss the clinical relevance of the various effect sizes.

Heterogeneity

The statistical heterogeneity between studies must always be addressed. Commonly, tests for heterogeneity are based on χ2 or F statistics for categorical (Schlesselman, 1982) and continuous (Fleiss, 1986) data, respectively (see Chapter 15). These are usually interpreted liberally at the 10% level because of the relatively low power of such tests (Breslow and Day, 1980). A sensitivity analysis (see Chapter 23) also can be conducted to determine if exclusion of one or more trials materially affects the heterogeneity. If the heterogeneity is larger than can be inferred from the results of significance tests, a summary measure is questionable, and the reason for the heterogeneity should be explored30. Note, however, that a high P value does not unequivocally indicate that the results are homogeneous, and the data should be explored by other means, such as graphical representation. Examples include a vertical two-tiered plot of results (e.g., odds ratios, relative risks31 or effect size) with their 95% and 70% confidence intervals, for ease of comparison around the point estimates (Pocock and Hughes, 1990). Higgins and Thompson (2002) derive other statistics to assess heterogeneity, as an alternative to significance testing (see also Chapter 15).

Fixed-effect and random-effects models

Many
of the analytical procedures that have been employed in meta-analyses are based on a fixed-effect (common effect) model (Hedges and Vevea, 1998), which assumes that all trials included in the meta-analysis are estimating the same treatment effect. They therefore assume that the only differences in treatment effect are a result of sampling variation, and ignore any treatment variability between different studies when producing a summary estimate. Therefore, when assigning weights to the different studies, the information in the smaller studies can be largely ignored, because better information is available about the same treatment effect in the larger studies.

30 For example, different dosage levels (analogous to different exposure levels in observational studies: e.g., Frumkin and Berlin, 1988) may induce heterogeneity.
31 Odds ratios and relative risks are best plotted on a logarithmic scale.

An alternative approach, based on a random-effects model (DerSimonian and Laird, 1986), assumes that the treatment effect may be different in each study, and that each study represents a random sample of a (theoretically infinite) number of studies; that is, there is a distribution of true effect sizes. The variability between studies is then an integral part of the analysis (Bailey, 1987), the aim of which is to estimate the mean treatment effect in a range of studies. Since each study provides information about a different treatment effect, small studies cannot be discounted (large studies lose influence, whereas small studies gain influence), the variations in the observed treatment effects resulting from two sources: (1) the sampling variation in each study (the within-study variance); and (2) the variation of the true study effects about their mean (the between-study variance). The net result of such an analysis is that the interval estimate of treatment effect is generally widened relative to the fixed-effect estimate, particularly if there is clear heterogeneity between
studies (Dickersin and Berlin, 1992).

In reality, whether or not the studies are all estimating the same treatment effect is not known. The results of tests for heterogeneity have therefore frequently formed the basis for deciding on the appropriate model: if the result of a test is non-significant, the fixed-effect model is generally employed, whereas significant results prompt a random-effects model. Some authorities, however, argue that tests for heterogeneity should not form the basis of model selection (Borenstein et al., 2009, 2010)32, concluding that the fixed-effect model should be used if: (1) the studies included in the analysis are functionally identical (i.e., any variables that have an impact on outcome are the same in all studies); and (2) the aim is to compute a common treatment effect in a particular population, rather than generalizing it to other populations. In contrast, if data have accumulated from a series of studies conducted independently, then it is unlikely that the studies are functionally identical, and so a random-effects model should be applied. This also is appropriate if the analysis aims to generalize results beyond a narrowly-defined population.

The random-effects model needs to be interpreted with caution (Marubini and Valsecchi, 1995). First, the degree of heterogeneity may be such that a random-effects model may greatly modify the inferences made from a fixed-effect model. This will tend to nullify the summary statistic for both models, and there is then a need to investigate the variability further. Secondly, specific statistical distributions of the random-effects model cannot be justified either empirically or by clinical reasoning. Finally, the random-effects model cannot be interpreted meaningfully at the level of the target population; it is merely the mean of a distribution that generates effects. The random-effects model therefore 'exchanges a questionable homogeneity assumption for a fictitious distribution of effects' (Greenland,
1987).

Debate continues over the relative merits of the fixed-effect and random-effects approaches. Some of the biases can be reduced by excluding poorly-designed trials and including all relevant results (e.g., results from germane unpublished studies). With this goal in mind, Meinert (1989) has suggested that meta-analyses should be planned prospectively, with the component trials enlisted into a meta-analysis when they start, rather than being retrospectively identified. This should promote good individual trial design, and therefore consistent quality. Moreover, cumulative meta-analyses may allow both fixed-effect and random-effects models to demonstrate efficacy in the presence of heterogeneity of estimates.33

32 Application of such tests in these circumstances is an example of 'HARKing' (Hypothesizing After the Results are Known), which is considered to be bad practice (Kerr, 1998).
33 There is, of course, a danger of an increase in Type I error, such as that which can occur in sequential trials. Yusuf and others (1991) suggest methods of significance-level adjustment in this circumstance.

Presentation of results

Any pooled results, and the results of each study, should be reported as point estimates with 95% confidence intervals, and presented graphically next to each other; Figure 17.2 is an example. The results of the individual studies show considerable variability, with some results (Figure 17.2a: Studies 3, 4, 5, 7, 8, 9, 11 and 13) providing no evidence of a treatment effect (the upper 95% confidence limits for the odds ratios being greater than one). This variability is masked in the analysis of the accumulating data (Figure 17.2b), which consistently demonstrates a beneficial effect. The pooled results of the random-effects model and the final analysis of accumulating data, in this example, generate similar point and interval estimates of the
odds ratio, demonstrating a significant treatment effect.

Fig. 17.2 A typical meta-analysis of 15 studies (1990–1996; 2634 individuals in total). (a) Individual results and pooled results using a random-effects model; (b) results of sequential analyses of accumulating data. Point estimates and 95% interval estimates of the odds ratio are plotted on a logarithmic scale, with values below 1 favouring treatment and values above 1 favouring control. (Source: Ioannidis, 2000. Reproduced with permission from Elsevier.)

Although there has always been some controversy about the validity of meta-analysis (e.g., Liberati, 1995; LeLorier et al., 1997; Borenstein et al., 2009), it is becoming increasingly popular as the number of studies with similar protocols has grown. By systematically combining studies, a rational attempt is made to overcome limits of size or scope in individual studies to obtain more reliable information about treatment effects.

Further reading

Brody, T. (2012) Clinical Trials: Study Design, Endpoints and Biomarkers, Drug Safety, and FDA and ICH Guidelines. Academic Press, London.
Bulpitt, C.J. (1996) Randomised Controlled Clinical Trials, 2nd edn. Kluwer Academic Publishers, New York.
Card, N.A. and Casper, D.M. (2013) Meta-analysis and quantitative research synthesis. In: The Oxford Handbook of Quantitative Methods. Volume 2: Statistical Analysis. Ed. Little, T.D., pp. 701–717. Oxford University Press, New York.
Chow, S.-C. and Liu, J.-P. (2014) Design and Analysis of Clinical Trials, 3rd edn. John Wiley, New York.
Cleophas, T.J., Zwinderman, A.H., Cleophas, T.F. and Cleophas, E.P. (2009) Statistics Applied to Clinical Trials, 4th edn. Springer, Dordrecht.
Dent, N. and Visanji, R. (Eds) (2001) Veterinary Clinical Trials from Concept to Completion. CRC Press, Boca Raton.
Donner, A. and Klar, N. (2000) Design
and Analysis of Cluster Randomized Trials in Health Research. Arnold, London/Oxford University Press, New York.
Duncan, J.L., Abbott, E.M., Arundel, J.H., Eysker, M., Klei, T.R. et al. (2002) World Association for the Advancement of Veterinary Parasitology (WAAVP): second edition of guidelines for evaluating the efficacy of equine anthelmintics. Veterinary Parasitology, 103, 1–18.
Edwards, P. (2010) Questionnaires in clinical trials: guidelines for optimal design and administration. Trials, 11, 2; doi 10.1186/1745-6215-11-2.
Elbers, A.R.W. and Schukken, Y.H. (1995) Critical features of veterinary field trials. Veterinary Record, 136, 187–192.
Ellis, P.D. (2010) The Essential Guide to Effect Sizes. Cambridge University Press, Cambridge. (Includes a simple guide to meta-analysis)
Friedman, L.M., Furberg, C.D., DeMets, D.L., Reboussin, D.M. and Granger, C.B. (2015) Fundamentals of Clinical Trials, 5th edn. Springer Cham, Heidelberg.
Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L.M. and Woloshin, S. (2008) Helping doctors and patients make sense of health statistics. Psychological Science in the Public Interest, 8, 53–96. (A comprehensive discussion of the interpretation of the quantitative results of clinical trials and other medical studies)
Gomberg-Maitland, M., Frison, L. and Halperin, J.L. (2003) Active-control clinical trials to establish equivalence or noninferiority: methodological and statistical concepts linked to quality. American Heart Journal, 146, 398–403. (Guidelines for equivalence and non-inferiority trials)
Guidelines for Clinical Trials (1988) Questionnaire 1088/A. International Dairy Federation, Brussels. (Guidelines with specific reference to mastitis)
Guidelines for the Conduct of Bioequivalence Studies for Veterinary Medicinal Products (2012) EMA/CVMP/EWP/81976/2010. European Medicines Agency, London.
Guideline for the Demonstration of Efficacy for Veterinary Medicinal Products Containing Antimicrobial Substances (2013) EMA/CVMP/261180/2012. European Medicines Agency, London.
Guideline for the Testing and Evaluation of the Efficacy of Antiparasitic Substances for the Treatment and Prevention of Tick and Flea Infestation in Dogs and Cats (2016) EMEA/CVMP/EWP/005/2000-Rev.3. European Medicines Agency, London.
Guideline on Statistical Principles for Clinical Trials for Veterinary Medicinal Products (Pharmaceuticals) (2012) EMA/CVMP/EWP/81976/2010. European Medicines Agency, London.
Guidelines: Veterinary Medicinal Products: General, Efficacy, Environmental Risk Assessment (1999) Office for Official Publications of the European Communities, Luxembourg.
Hackshaw, A.K. (2009) A Concise Guide to Clinical Trials. Wiley-Blackwell, Chichester.
Haidich, A.B. (2010) Meta-analysis in medical research. Hippokratia, 14, Suppl. 1, 29–37.
Hennessy, D.R., Bauer, C., Boray, J.C., Conder, G., Daugschies, A. et al. (2006) World Association for the Advancement of Veterinary Parasitology (WAAVP): second edition of guidelines for evaluating the efficacy of anthelmintics in swine. Veterinary Parasitology, 141, 138–149.
Hunt, M. (1997) How Science Takes Stock: The Story of Meta-analysis. Russell Sage Foundation, New York. (Examples of meta-analyses across a range of disciplines)
Jacobs, D.E., Arakawa, A., Courtney, C.H., Gemmell, M.A., McCall, J.W. et al. (1994) World Association for the Advancement of Veterinary Parasitology (WAAVP): guidelines for evaluating the efficacy of anthelmintics for dogs and cats. Veterinary Parasitology, 52, 179–202.
McLean, A. (1998) Good clinical practice for the conduct of clinical trials on veterinary medicine products: a critical look at the E.U. note for guidance. The Quality Assurance Journal, 2, 69–73.
Murad, M.H., Montori, V.M., Ioannidis, J.P.A., Jaeschke, R., Devereaux, P.J. et al. (2014) How to read a systematic review and meta-analysis. Journal of the American Medical Association, 312, 171–179.
Noordhuizen, J.P.T.M., Frankena, K., Ploeger, H. and Nell, T. (Eds) (1993) Field Trial and Error. Proceedings of the international seminar with workshops on the design, conduct and interpretation of field trials, Berg en Dal, Netherlands, 27 and 28 April 1993. Epidecon, Wageningen.
O'Connor, A.M., Sargeant, J.M., Gardner, I.A., Dickson, J.S., Torrence, M.E. et al. (2010) The REFLECT Statement: methods and processes of creating guidelines for randomized controlled trials for livestock and food safety. Journal of Veterinary Internal Medicine, 24, 57–64.
Perino, L.J. and Apley, M.D. (1998) Clinical trial design in feedlots. Veterinary Clinics of North America, Food Animal Practice, 14, 343–365.
Pocock, S.J. (1983) Clinical Trials: A Practical Approach. John Wiley, Chichester and New York.
Russo, M.W. (2007) How to review a meta-analysis. Gastroenterology and Hepatology, 3, 637–642.
Schukken, Y.H. and Deluyker, H. (1995) Design of field trials for the evaluation of antibacterial products for therapy of bovine mastitis. Journal of Veterinary Pharmacology and Therapeutics, 18, 274–283.
Schulz, K.F., Altman, D.G. and Moher, D. (2010) CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. British Medical Journal, 340, 698–702.
Senn, S. (2002) Cross-over Trials in Clinical Research, 2nd edn. John Wiley, Chichester.
Senn, S. (2007) Statistical Issues in Drug Development, 2nd edn. John Wiley, Chichester. (Includes the design and interpretation of clinical trials)
Spriet, A. and Dupin-Spriet, T. (1997) Good Practice of Clinical Drug Trials, 2nd edn. Karger, Basel.
Sutton, A.J. and Higgins, J.P.T. (2010) Recent developments in meta-analysis. Statistics in Medicine, 27, 625–650.
Wood, I.B., Amaral, N.K., Bairden, K., Duncan, J.L., Kassai, T. et al. (1995) World Association for the Advancement of Veterinary Parasitology (WAAVP) second edition of guidelines for evaluating the efficacy of anthelmintics in ruminants (bovine, ovine, caprine). Veterinary Parasitology, 58, 181–213.
Yazwinski, T.A., Chapman, H.D., Davis, T.J., Letonja, R.B., Pote, L. et al. (2003) World Association for the Advancement of Veterinary Parasitology (WAAVP) guidelines for evaluating the efficacy of anthelmintics in chickens and turkeys. Veterinary Parasitology, 116, 159–173.
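The quantitative machinery described in this chapter — the effect-size-to-percentage conversion, Cochran's Q test of heterogeneity with Higgins and Thompson's I², and fixed-effect versus DerSimonian–Laird random-effects pooling — can be illustrated with a short computational sketch. This code is not part of the original text: the trial data are hypothetical, and the formulae follow the standard inverse-variance approach discussed above.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def effect_size_percentage(effect_size):
    """(1 - P) x 100, where P is the upper-tail probability for the
    standard normal deviate equal to the effect size (see text)."""
    p = 1.0 - normal_cdf(effect_size)
    return (1.0 - p) * 100.0

def fixed_effect_pool(effects, ses):
    """Inverse-variance fixed-effect pooling: each study is weighted by
    1/variance, so larger (more precise) studies dominate.
    Returns (pooled effect, standard error of the pooled effect)."""
    w = [1.0 / se ** 2 for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return pooled, math.sqrt(1.0 / sum(w))

def random_effects_pool(effects, ses):
    """DerSimonian-Laird random-effects pooling: the between-study
    variance tau^2 is estimated from Cochran's Q and added to each
    study's within-study variance before re-weighting.
    Returns (pooled effect, SE, Cochran's Q, I-squared %)."""
    w = [1.0 / se ** 2 for se in ses]
    fixed, _ = fixed_effect_pool(effects, ses)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)          # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, math.sqrt(1.0 / sum(w_star)), q, i2

if __name__ == "__main__":
    # Worked values from the text: effect sizes of 2.9 and 1.0
    print(round(effect_size_percentage(2.9), 2))   # 99.81 (P = 0.0019)
    print(round(effect_size_percentage(1.0), 2))   # 84.13 (~84%)

    # Hypothetical trials: log odds ratios and their standard errors
    log_ors = [math.log(0.8), math.log(0.55), math.log(0.9)]
    ses = [0.20, 0.25, 0.15]

    fe, fe_se = fixed_effect_pool(log_ors, ses)
    re, re_se, q, i2 = random_effects_pool(log_ors, ses)
    # The random-effects SE is never smaller than the fixed-effect SE,
    # widening the interval estimate when heterogeneity is present
    print(f"fixed OR {math.exp(fe):.2f} (SE {fe_se:.3f}); "
          f"random OR {math.exp(re):.2f} (SE {re_se:.3f}); "
          f"Q {q:.2f}, I2 {i2:.0f}%")
```

Note that, as in the text, small studies gain relative influence under the random-effects weighting (their weights 1/(SE² + τ²) shrink less than those of large studies), and with homogeneous data (Q ≤ df) τ² is zero and the two models coincide.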
