Russell G. Almond, Robert J. Mislevy, Linda S. Steinberg, Duanli Yan, David M. Williamson. Bayesian Networks in Educational Assessment. Statistics for Social and Behavioral Sciences. Springer, 2015.

Statistics for Social and Behavioral Sciences

Series Editor
Stephen E. Fienberg
Carnegie Mellon University
Dept. Statistics
Pittsburgh, Pennsylvania, USA

Statistics for Social and Behavioral Sciences (SSBS) includes monographs and advanced textbooks relating to education, psychology, sociology, political science, public policy, and law.

More information about this series at http://www.springer.com/series/3463

Russell G. Almond • Robert J. Mislevy
Linda S. Steinberg • Duanli Yan
David M. Williamson

Bayesian Networks in Educational Assessment

Russell G. Almond, Florida State University, Tallahassee, Florida, USA
Duanli Yan, Educational Testing Service, Princeton, New Jersey, USA
Robert J. Mislevy, Educational Testing Service, Princeton, New Jersey, USA
David M. Williamson, Educational Testing Service, Princeton, New Jersey, USA
Linda S. Steinberg, Pennington, New Jersey, USA

Statistics for Social and Behavioral Sciences
ISBN 978-1-4939-2124-9
ISBN 978-1-4939-2125-6 (eBook)
DOI 10.1007/978-1-4939-2125-6
Library of Congress Control Number: 2014958291
Springer New York Heidelberg Dordrecht London
© Springer Science+Business Media New York 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be
true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Dedication

Forward into future times we go
Over boulders standing in our way
Rolling them aside because we know
Others follow in our steps one day
Under deepest earth the gems are found
Reaching skyward ’till we grasp the heights
Climbing up to where the view surrounds
Hidden valleys offer new delights
Inch by inch and yard by yard until
Luck brings us to the hidden vale
Desiring a place to rest yet still
Returning home now to tell the tale
Ever knowing when that day does come
New hands will take up work left undone

Acknowledgements

Bayesian Networks in Educational Assessment (BNinEA) is the direct issue of two projects, and the descendant, cousin, or sibling of many more. We are grateful for all we have learned in these experiences from many collaborators and supporters over the years.

The first direct ancestor is the series of workshops we have presented at the annual meeting of the National Council on Measurement in Education (NCME) almost every year since 2001. We are grateful to NCME for this opportunity and to ETS for support in developing the materials they have granted us permission to use. Workshop participants will recognize many concepts, algorithms, figures, and hands-on exercises from these sessions.

Its second direct ancestor is the Portal project at Educational Testing Service. It was here that Linda, Russell, and Bob fleshed out the evidence-centered design (ECD) assessment framework, implemented it in an object model and design system, and carried out the applications using Bayes nets. We are grateful to Henry Braun and Drew Gitomer, successive vice-presidents of Research, and Len Swanson, head of New Product Development, for
supporting Portal. Our collaborators included Brian Berenbach, Marjorie Biddle, Lou DiBello, Howie Chernik, Eddie Herskovits, Cara Cahallan Laitusis, Jan Lukas, Alexander Matukhin, and Peggy Redman.

Biomass (Chaps. 14 and 15) was a ground-up demonstration of Portal, ECD design, standards-based science assessment, web-delivered interactive testing, with automated scoring of inquiry investigations and Bayes net measurement models. The Biomass team included Andy Baird, Frank Jenkins, our subject matter lead Ann Kindfield, and Deniz Senturk. Our subject matter consultants were Scott Kight, Sue Johnson, Gordon Mendenhall, Cathryn Rubin, and Dirk Vanderklein.

The Networking Performance Skill System (NetPASS) project was an online performance-based assessment activity for designing and troubleshooting computer networks. It was developed in collaboration with the Cisco Networking Academy Program, and led by John Behrens of Cisco. NetPASS featured principled design of proficiency, task, and evidence models and a Bayes net psychometric model using the methods described in BNinEA. Team members included Malcolm Bauer, Sarah DeMark, Michael Faron, Bill Frey, Dennis Frezzo, Tara Jennings, Peggy Redman, Perry Reinert, and Ken Stanley. The NetPASS prototype was foundational for the Packet Tracer simulation system and Aspire game environment that Cisco subsequently developed, and that millions of Cisco Network Academy (CNA) students around the world have used operationally to learn beginning network engineering skills.

The DISC scoring engine was a modular Bayes-net-based evidence accumulation package developed for the Dental Interactive Simulations Corporation (DISC), by the Chauncey Group International, ETS, and the DISC Scoring Team: Barry Wohlgemuth, DISC President and Project Director; Lynn Johnson, Project Manager; Gene Kramer; and five core dental hygienist members, Phyllis Beemsterboer, RDH, Cheryl Cameron, RDH, JD, Ann Eshenaur, RDH, Karen Fulton, RDH, and Lynn Ray,
RDH. Jay Breyer was the Chauncey Group lead, and was instrumental in conducting the expert–novice studies and constructing proficiency, task, and evidence models.

Adaptive Content with Evidence-based Diagnosis (ACED) was the brainchild of Valerie J. Shute. It had a large number of contributors including Larry Casey, Edith Aurora Graf, Eric Hansen, Waverly Hester, Steve Landau, Peggy Redman, Jody Underwood, and Diego Zapata-Rivera. ACED development and data collection were sponsored by National Science Foundation Grant No. 3013202. The complete ACED models and data are available online; see the Appendix for details.

Bob’s initial forays into applying Bayesian networks in educational assessment were supported in part by grants from the Office of Naval Research (ONR) and from the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) at the University of California at Los Angeles. We are grateful to Charles Davis, Project Officer of ONR’s Model-Based Measurement program, and Eva Baker, Director of CRESST, for their support. Much of the work we draw on here appears in ONR and CRESST research reports. The findings and opinions expressed in BNinEA, however, do not reflect the positions or policies of ONR, the National Institute on Student Achievement, Curriculum, and Assessment, the Office of Educational Research and Improvement, or the U.S. Department of Education.

Some of the ideas in this book are based on Russell’s previous work on the Graphical-Belief project. Thanks to Doug Martin at StatSci for sponsoring that project as well as the NASA Small Business Innovation Research (SBIR) program for supporting initial development. David Madigan made a number of contributions to that work, particularly pointing out the importance of the weight of evidence. Graphical-Belief is based on the earlier work on Belief while Russell was still a student at Harvard. Art Dempster and Augustine Kong both provided valuable advice for that work. The work of Glenn Shafer and David
Schum in thinking about the representation of evidence has been very useful as well. Those contributions are documented in Russell’s earlier book.

Along with the NCME training sessions and working on NetPASS and DISC, David completed his doctoral dissertation on model criticism in Bayes nets in assessment at Fordham University under John Walsh, with Russell and Bob as advisors.

Hydrive was an intelligent tutoring system for helping trainees learn to troubleshoot the hydraulics subsystems of the F-15 aircraft. Drew Gitomer was the Principal Investigator and Linda was the Project Manager. The project was supported by Armstrong Laboratories of the US Air Force, under the Project Officer Sherrie Gott. Design approaches developed in Hydrive were extended and formalized in ECD. Bob and Duanli worked with Drew and Linda to create and test an offline Bayes net scoring model for Hydrive.

Russell and Bob used drafts of BNinEA in classes at Florida State University (FSU) and the University of Maryland, respectively. We received much helpful feedback from students to clarify our ideas and sharpen our presentations. Students at Maryland providing editorial and substantive contributions included Younyoung Choi, Roy Levy, Junhui Liu, Michelle Riconscente, and Daisy Wise Rutstein. Students at FSU, providing feedback and advice, included Mengyao Cui, Yuhua Guo, Yoon Jeon Kim, Xinya Liang, Zhongtian Lin, Sicong Liu, Umit Tokac, Gertrudes Velasquez, Haiyan Wu, and Yan Xia.

Kikumi Tatsuoka has been a visionary pioneer in the field of cognitive assessment, whose research is a foundation upon which our work and that of many others in the assessment and psychometric communities builds. We are grateful for her permission to use her mixed-number subtraction data in Chaps. and 11.

Brent Boerlage, of Norsys Software Corp., has supported the book in a number of ways. First and foremost, he has made the student version of Netica available for free, which has been exceedingly useful in our
classes and online training. Second, he has offered general encouragement for the project and offered to add some of our networks to his growing Bayes net library.

Many improvements to a draft of the book resulted from rigorous attention from the ETS review process. We thank Kim Fryer, the manager of editing services in the Research and Development division at ETS, Associate Editors Dan Eignor and Shelby Haberman, and the reviewers of individual chapters: Malcolm Bauer, Jianbin Fu, Aurora Graf, Shelby Haberman, Yue Jia, Feifei Li, Johnny Lin, Ru Lu, Frank Rijmen, Zhan Shu, Sandip Sinharay, Lawrence Smith, Matthias von Davier, and Diego Zapata-Rivera.

We thank ETS for their continuing support for BNinEA and the various projects noted above as well as support through Bob’s position as Frederic M. Lord Chair in Measurement and Statistics under Senior Vice-President for Research, Ida Lawrence. We thank ETS for permission to use the figures and tables they own and their assistance in securing permission for the rest, through Juana Betancourt, Stella Devries, and Katie Faherty.

We are grateful also to colleagues who have provided support in more general and pervasive ways over the years, including John Mark Agosta, Malcolm Bauer, Betsy Becker, John Behrens, Judy Goldsmith, Geneva Haertel, Sidney Irvine, Kathryn Laskey, Roy Levy, Bob Lissitz, John Mazzeo, Ann Nicholson, Val Shute, and Howard Wainer.

It has taken longer than it probably should have to complete Bayesian Networks in Educational Assessment. For their continuing encouragement and support, we are indebted to our editors at Springer: John Kimmel, who brought us in, and Jon Gurstelle and Hannah Bracken, who led us out.
score report, 436, 437 psychometric models, 137 psychometrician, 269 psychometricians, 5, psychometrics, 598 purpose, 23, 417, 426, 461 purposes, 457 Q-Matrix, 270, 336, 345 Q-matrix, 419, 449 Q3, 338 quasi-utility, 213, 215–217 quasiutilities, 454 quiz, 159 race, 486 racial, 347 radical, 440, 488, 547 radicals, 476, 477, 487 random experiment, 42 random sampling, 360 random variable, 290, 297 random variables, 57, 98 Ranked Probability Score, 334 ranked probability score, 364, 365 Rare Disease Problem, 51 rare disease problem, 228 Rasch, 160, 258, 261, 330 Rasch model, 159, 281 raters, 30, 103, 119 reading comprehension, 463 reading passage, 137 reading passages, 166 real–valued random variables, 57 Rebuttal data, 420 recall, 479 receiver operator characteristic (ROC), 480 Recognizing Task Situations, 443 recursive decomposition, 127 recursive models, 100 recursive representation, 48, 126, 137–139 Regression Distribution, 234 regression distribution, 234, 235, 273 SUBJECT INDEX relevance diagram, 99 relevant potential weight of evidence, 208 reliability, 7, 7, 23, 197, 198, 225, 227, 232, 265, 423, 436, 574 Reporting, 433 Reporting rule, 492 reporting rule, 28 reporting rules, 433, 435, 519 reporting variable, 456 reporting variables, 542 reporting variables, 203 repurposing, 507 Resampling Distribution, 61 research observables, 453 response, 470, 478 response patterns, 303 response processing, 35, 470 response time, 453 Results Database, 35 reuse, 511 reversibility, 308 reversible jumping rule, 357 risk, 214 ROC curve, 502 root node, 150 rubric, 30, 445 rubrics, 10, 472 rule of evidence, 452 Rule Space, 178–180 rule space, 377 rule-based system, 198 Rules of Evidence, 530 rules of evidence, 7, 10, 158, 159, 278, 444, 529, 554 running intersection, 118, 131 Running Intersection Property, 131 sample, 150, 561 sample size, 291 Sampling Algorithm, 150 sampling algorithm, 155 saturated model, 147 science, 509 scientific inquiry, 511 663 score, 225, 390, 433, 492 score 
report, 500, 519, 545 score reports, 539 score users, 433 Scores, 489 scores, 36, 225, 397, 424, 488, 500 scoring, 163, 488 scoring engine, 330 scoring model, 29, 351, 433, 471, 472, 489, 491, 491, 492–494, 500, 519, 541 scoring record, 35 scoring rubric, 478 scoring rule, 230 second layer, 242 section, 488 sections, 456 Security, 476 security, 315, 383 selected response, 446 selection bias, 360, 367 self-evaluation, 508 sensitivity, 121, 153, 228, 271, 479 Sensitivity Analysis, 77 sensitivity analysis, 45, 67 separable influences, 254, 275 Separation, 91 separator, 306 Sequential importance sampling, 152 shadow data, 335, 336, 346, 401, 575 shrinkage estimator, 79 shrinkage estimators, 74 signal to noise ratio, 225 SimCityEDU, 11 SimCityEDU, 546 simple graph, 82 simple language test, 198 Simple structure, 223 simple structure, 146, 406 simplified language assessment, 237 simplified language test, 227, 229 Simpson’s paradox, 366 Simulated Annealing, 356 simulation, 21, 472, 521 664 SUBJECT INDEX simulation task, 107 simulations, 15, 459 simulee, 150 simulees, 231 skill profile, 225 skill profiles, 152, 345 skill workaround parameters, 252, 271 slice sampler, 315 slope, 258, 555, 556, 558 slope parameter, 257 slow mixing, 309, 314, 385 Speaking, 455 specification rules, 440 specificity, 121, 153, 228, 480 speededness, 453 stakes, 457 standard deviation, 61 standard error, 290, 320 standard error of estimation, 64 standardized tests, 20 Standards, 509 standards, 508, 516, 597 standards-based, 508 starting values, 384 stationary, 384 stationary distribution, 309–311, 313 statistic, 500 statistical part, 31, 448 statistics, 433, 493, 542 StatShop, 383, 384 stealth assessment, 595 step size, 313 stimulus, 137 stimulus materials, 524 stopping rule, 216, 455 story problem, 259, 277 Story problems, 167 strong prior Bayesian, 72 strong priors, 76 Structural equation models, 99 structural equation models, 6, 355 structural equation models, 81 structural parameters, 299 
Student t, 71 student growth, 596 student model, 221, 584 student record, 455 subgraph, 83 subject matter experts, 5, 267, 269 subjective, 76 Subjective Probability, 77 subpopulations, 347 subscores, 183 Subtest Independence, 77 sufficient statistics, 294, 300, 303 sum of scores, 31 summary feedback, 470, 473, 538 summary scoring, 36, 470 Summary Scoring Process, 76 summative, 595 superobservable, 452 support, 58, 309 target, 38 target hypothesis, 220 target population, 426 target rule, 224, 454 target rules, 221, 221 targets, 33, 456 task, 5, 29, 34–36, 137–139, 143, 145, 146, 150, 158, 167, 170, 171, 180, 193, 210, 262, 282, 317, 344, 447, 478, 486, 491, 494, 522, 523, 538, 540, 570, 587, 595 Task Construction, 440 task design, 145 task ID, 541, 543 task level feedback, 473, 488, 499 Task Model, 32 task model, 19, 30, 34, 39, 147, 222, 246, 273, 438, 440, 447, 458, 486, 497, 523, 525, 529 task model variable, 286, 525 Task model variables, 440 task model variables, 32, 222, 224, 432, 442, 454, 475, 486–488 Task models, 32, 523 SUBJECT INDEX task models, 33, 34, 221, 315, 375, 392, 397, 443, 451, 454, 474, 487, 554, 588 task pool, 442 task response, 491 task selection, 210, 584 task shells, 588 task-based feedback, 470, 538 task-level feedback, 539 Task/Evidence Composite Library, 37 task/evidence composite library, 468, 470, 472–474, 478, 488, 494, 541, 543 task/evidence library, 491 tasks, 11–13, 36, 137, 147, 166, 221– 224, 284, 286, 315, 438, 443, 451, 474, 486, 508, 543, 551, 554, 566, 596 teacher, 519 Tentacles, 89 test forms, 377 Test Length, 79 test security, 453 test statistic, 337, 575 test user, 519 test users, 23 testing program, 230 testing programs, 392 Testlet, 353 testlet, 137, 338, 352, 449, 551 the four-process architecture, 471 The Proficiency Model, 27 Three Gorges Dam, 382 three-way table, 484 threshold, 479 total graphical model, 135, 137, 144, 145, 154 Toulmin diagrams, 459 trace plot, 385 training data, 355 trait theories, 12 treat 
assessment, 594 tree, 84, 116 tree of cliques, 122, 123 665 treewidth, 105, 116, 123, 129, 130, 148, 149, 151, 154, 180, 430 triangulate, 139 Triangulated, 128 triangulated, 85, 129, 142 triangulation, 139, 154 true positive, 479 true score, 225 True Score Test Theory, 79 true-positive, 271 true-positive parameter, 270 truncated normal, 581 truncated normal law, 272 two-way table, 482 Type I Errors, 479 Type II Errors, 479 ubiquitous assessment, 595 undirected graph, 83, 94, 95 unidimensional, 7, 146, 158 Unified Knowledge, 513, 546 uniform distribution, 75 uniform distributions, 139 unit potential, 139, 140, 143, 144 unobtrusive assessment, 595 update algorithm, 151 urn, 58 utilities, 97, 98, 101 utility, 209, 213, 214 utility function, 226 validating, 165, 188 validity, 7, 7, 12, 24, 227, 274, 457, 578, 596 validity study, 437 valuation based system, 90 valuation-based systems, 101 valuations, 149 value of information, 98, 210, 214, 215, 424, 519 valued work, 223, 597 variable, 82, 161, 244, 292 variables, 268, 280 Variance, 61 variance, 313, 327, 567 666 SUBJECT INDEX vat, 454 vertices, 82 virtual evidence, 120, 121, 122, 139, 447, 542 warrant, 419 weak prior, 271 weak prior Bayesian, 72, 74 weather forecasting., 333 Weaver’s Surprise Index, 333 Weaver’s surprise index, 364, 365 web application, 561 Weight of Evidence, 276, 277 weight of evidence, 98, 197, 201, 205, 209, 213, 216, 232, 476, 481 weights of evidence, 451, 472, 473, 487 white noise, 385 WinBUGS, 313–315, 324 window, 384 word problems, 367 work product, 29, 30, 32, 35–37, 40, 107, 438, 440, 444, 446, 451, 458, 470, 472, 472, 478, 479, 488, 491, 529, 531, 538, 540, 554, 595 work products, 6, 10, 29, 32, 40, 444, 447, 498, 499, 509, 523, 524, 540, 544, 565 workaround, 251 Working Knowledge, 512, 513, 524, 526 working knowledge, 547, 563 XML, 540 zero, 270 zone of proximal development, 497 ... 
consistently and in a cost effective way demands an approach to assessment design that supports many kinds of assessments, both familiar selection assessments and new kinds of diagnostic assessments. Furthermore, ... other assessments. Since that time, the authors have participated in many design projects using ECD and Bayesian networks, including DISC (Mislevy, Steinberg, et al. 1999b; Mislevy, Steinberg, Breyer,

Table of Contents

Acknowledgements

Using This Book

Notation

Random Variables

Sets

Probability Distributions and Related Functions

Transcendental Functions

Usual Use of Letters for Indices

Contents

List of Figures

List of Tables

Part I Building Blocks for Bayesian Networks

1 Introduction

1.1 An Example Bayes Network

1.2 Cognitively Diagnostic Assessment

1.3 Cognitive and Psychometric Science

1.4 Ten Reasons for Considering Bayesian Networks

1.5 What Is in This Book

2 An Introduction to Evidence-Centered Design

2.1 Overview

2.2 Assessment as Evidentiary Argument

2.3 The Process of Design

2.4 Basic ECD Structures

2.4.1 The Conceptual Assessment Framework

2.4.2 Four-Process Architecture for Assessment Delivery
