... is objective and detached from the observers, and that this reality can be Tạp chí Khoa học ĐHQGHN, Ngoại ngữ 24 (2008) 1-6 Language program evaluation: Quantitative or qualitative approach? ... approaches: positivistic/quantitative and naturalistic/qualitative. This article will attempt to review these two major paradigms by (i) giving the definition of each paradigm and presenting its logic ... evaluators want to achieve in the evaluation process. However, evaluators have to rely on either a quantitative or a qualitative approach, each of which has its own strengths and weaknesses. The researchers...
... (Wermter and Hahn, 2004) and for ATR (Wermter and Hahn, 2005), which have been shown to outperform several of the statistics-only metrics. 3 Methods and Experiments 3.1 Qualitative Criteria Because ... best-performing statistics-only measure for CE (cf. Evert and Krenn (2001) and Krenn and Evert (2001)) and also for ATR (see Wermter and Hahn (2005)). Concerning more recent linguistically grounded AMs, ... measures, we use the four criteria described in Section 3.1 and qualitatively compare the different rankings given to true positives and true negatives by an AM and by Frequency. For this...
... part-of-speech tags and minimal PPs were identified.5 The PNV triples were selected automatically such that the preposition and the noun are constituents of the same PP, and the PP and the verb co-occur within ... more susceptible to random variation, which illustrates that evaluation based on a small number of n-best candidate pairs cannot be reliable. With respect to the recall curves (Figures 3 and 4), we find: ... log-likelihood, and even precision gained by frequency is better than or at least comparable to log-likelihood. These pairings – log-likelihood and t-test for AdjN, and t-test and frequency for PNV...
... of xylan and glucan for ADC final reached 79.1% and 88.2%, respectively. The overall yield of xylan and glucan for ADC green was 83.3% and 89.1%, respectively, through pretreatment and enzymatic ... Municipal solid waste: A Technical and Economic Evaluation Jian Shi, Mirvat Ebrik, Bin Yang*, and Charles E. Wyman Center for Environmental Research and Technology Bourns College of Engineering ... transportation fuels and chemicals because of its abundance, the need to find uses for this problematic waste, and its low and perhaps negative cost. However, significant heterogeneity and possible...
... Drug Evaluation and Research, Food and Drug Administration, USA; Andrew Stone, Alan Barge, AstraZeneca, United Kingdom; Orhan Suleiman, Centre for Drug Evaluation and Research, Food and Drug ... International Working Party was formed in the mid 1990s to standardise and simplify response criteria. New criteria, known as RECIST (Response Evaluation Criteria in Solid Tumours), were published in 2000.8 Key ... suspected. 4.5. Frequency of tumour re-evaluation Frequency of tumour re-evaluation while on treatment should be protocol specific and adapted to the type and schedule of treatment. However,...
... ICASSP. X. Zhu and G. Penn. 2005. Evaluation of sentence selection for speech summarization. In ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization. X. Zhu and G. ... Informative Coverage (IC): S2 and S9; Informative Relevance (IRV): S3 and S8; and Informative Redundancy (IRD): S4 and S7. 4 Results 4.1 Correlation between Human Evaluation and Original ROUGE Score Similar ... R-SU4 and human evaluation. 5 Conclusion and Future Work In this paper, we have made a first attempt to systematically investigate the correlation of automatic ROUGE scores with human evaluation...
... Japan. DUC and TSC both aim to compile standard training and test collections that can be shared among researchers and to provide common and large scale evaluations in single and multiple ... 2000 and 2001. However, the area is still being fleshed out: most past efforts have focused only on single-document summarization (Mani 2000), and no standard test sets and large scale evaluations ... between most and all, cohesion, some and most, and coherence, some and most. This indicates the strategies employed by NeATS (stigma word filtering, adding lead sentence, and time annotation)...
... APPENDIX A: Commercial Product Evaluation Process; APPENDIX B: Summary of Evaluation Criteria Divisions; APPENDIX C: Summary of Evaluation Criteria Classes ... objectives, rationale, and national policy behind the development of the criteria, and guidelines for developers pertaining to: mandatory access control rules implementation, the covert channel problem, and security ... the evaluation divisions (Appendix B) and classes (Appendix C), and finally a directory of requirements ordered alphabetically. In addition, there is a glossary. Structure of the Criteria The criteria...
... we demonstrate in our bilingual evaluation. 2.3 Evaluation Method Evaluation for hypernymy and synonymy usually uses WordNet (Lin and Pantel, 2002; Widdows and Dorow, 2002; Davidov and Rappoport, 2006). ... meronymy (Berland and Charniak, 1999; Girju et al., 2006), synonymy (Widdows and Dorow, 2002; Davidov and Rappoport, 2006), and verb strength + verb happens-before (Chklovski and Pantel, 2004). ... (Davidov and Rappoport, 2006; Widdows and Dorow, 2002) and meronymy (Berland and Charniak, 1999; Girju et al., 2006). Since named entities are very important in NLP, many studies define and discover...
... these are. Belz and Reiter (2006) and Reiter and Belz (2009) describe comparison experiments between the automatic evaluation of system output and human (expert and non-expert) evaluation of ... 0.03686 Table 4: Correlation between dependency-based evaluation and human judgements the parses of the original strings. We calculate both a weighted and unweighted dependency f-score, as given in ... Short Papers, pages 97–100, Suntec, Singapore, 4 August 2009. © 2009 ACL and AFNLP Correlating Human and Automatic Evaluation of a German Surface Realiser Aoife Cahill Institut für Maschinelle...
... mortality are not available. Tables 7 and 8, and 9 and 10, respectively, provide estimates of discarded catch and discard rates by species, area, gear, and target fishery. Within each area or ... 1.7 and 2.2 million t (Fig. 1 and Table 1). The rapid displacement of the foreign and joint-venture fisheries by the domestic fishery between 1984 and 1991 can be seen by comparing Figures 1 and ... Status NPFMC Economic SAFE STOCK ASSESSMENT AND FISHERY EVALUATION REPORT FOR THE GROUNDFISH FISHERIES OF THE GULF OF ALASKA AND BERING SEA/ALEUTIAN ISLANDS AREA: ECONOMIC STATUS OF THE GROUNDFISH...
... Riezler and J. T. Maxwell III. 2005. On some pitfalls in automatic evaluation and significance testing for MT. In Proc. ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or ... system, and then compared the generated texts to the original corpus texts. Similar evaluations have been used e.g. by Bangalore et al. (2000) and Marciniak and Strube (2004). Such corpus-based evaluations ... (non-repeating column and row entries) experimental design where each combination of date and system is assigned one evaluation. 4 Results Table 2 shows evaluation scores for the five NLG systems and the corpus...
... protocols and tools were developed to collect quantitative and qualitative data. Pre- and post-tests related to HIV/AIDS and nutrition will allow for quantitative comparison of knowledge, attitude and ... scheduled lecture and tutorial hours; (3) opportunities for a variety of learning activities including small group discussion and collaborative projects; and (4) exposure to and a forum for expressing and ... Web-based and classroom learning environments; the design and development of a prototype Web environment to facilitate these learning activities; and, the formative evaluation of learning activities and...