Regression for sentence-level MT evaluation

Document, scientific report: "a Precision-Order-Recall MT Evaluation Metric for Tuning" pdf

Uploaded: 19/02/2014, 19:20
... word alignment information. 3 Experiments 3.1 PORT as an Evaluation Metric We studied PORT as an evaluation metric on WMT data; test sets include WMT 2008, WMT 2009, and WMT 2010 all-to-English, ... and 22.0% ties). 1 Introduction Automatic evaluation metrics for machine translation (MT) quality are a key part of building statistical MT (SMT) systems. They play two 1 PORT: Precision-Order-Recall ... An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of ACL Workshop on Intrinsic & Extrinsic Evaluation Measures for Machine Translation...

Scientific report: "A Re-examination of Machine Learning Approaches for Sentence-Level MT Evaluation" ppt

Uploaded: 08/03/2014, 02:21
... a good MT-evaluation metric. Second, we analyze the resource requirement for regression models for different sizes of feature sets through learning curves. Finally, we show that SVM-regression ... datasets: NIST 2002 Chinese MT Evaluation (3 systems, 2634 sentences total), NIST 2003 Arabic MT Evaluation (2 systems, 1326 sentences total), and NIST 2004 Chinese MT Evaluation (10 systems, ... invaluable resource for measuring the reliability of automatic evaluation metrics. In this paper, we show that they are also informative in developing better metrics. 3 MT Evaluation with Machine...

Scientific report: "A Graphical Interface for MT Evaluation and Error Analysis" doc

Uploaded: 16/03/2014, 20:20
... Association for Computational Linguistics, pages 139–144, Jeju, Republic of Korea, 8-14 July 2012. © 2012 Association for Computational Linguistics A Graphical Interface for MT Evaluation and ... rich set of metrics and meta-metrics for assessing MT quality (Giménez and Màrquez, 2010a). Although automatic MT evaluation is still far from manual evaluation, it is indeed necessary to ... existing evaluation measures and to support the development of further improvements or even totally new evaluation metrics. This information can be gathered both from the experi- 139 Figure 1: MT...

Document, scientific report: "Collecting a Why-question corpus for development and evaluation of an automatic QA-system" pdf

Uploaded: 20/02/2014, 09:20
... seems that there are people willing to work on them even for free. MTurk requesters cannot however rely on this voluntary workforce. From MTurk Forums it is clear that some of the workers rely on ... significant challenge is the evaluation: manual evaluation is a difficult, time-consuming process and not applicable within efficient development of systems. Automatic evaluation requires a corpus ... contains information related to the corpus collection process. We believe this additional information can be used to post-process the data, and to develop an automatic approval system for further...

Document, scientific report: "MT Evaluation: Human-like vs. Human Acceptable" doc

Uploaded: 20/02/2014, 12:20
... were translated (Akiba et al., 2004). For purposes of automatic evaluation, 16 reference translations and outputs by 20 different MT systems are available for each sentence. Moreover, each of ... table for nine o’clock 7: my name is endo and i reserved a table with you for nine o’clock 8: i ’ve booked a table under endo for nine o’clock 9: my name is endo and i have a table reserved for ... reservation for a table at nine o’clock 11: my name is endo and i reserved a table for nine o’clock 12: the name is endo and i have a reservation for nine 13: i have a table reserved for nine under...

Document, scientific report: "Extending the BLEU MT Evaluation Method with Frequency Weightings" pdf

Uploaded: 20/02/2014, 16:20
... used in the vector-space model for Information Retrieval (Salton and Leck, 1968) and the S-score proposed for evaluating MT output corpora for the purposes of Information Extraction (Babych ... for translation: MT systems that have no means for prioritising this information often introduce excessive information noise into the target text by literally translating structural information, ... translation variation for automatic evaluation of MT quality. In: Proceedings of LREC 2004 (forthcoming). Papineni, K., S. Roukos, T. Ward, W.-J. Zhu. 2002. BLEU: a method for automatic evaluation of...

Document, scientific report: "Methods for the Qualitative Evaluation of Lexical Association Measures" doc

Uploaded: 20/02/2014, 18:20
... for evaluation General statistics for the AdjN and PNV base sets are given in Table 1. Manual annotation was performed for AdjN pairs with frequency and PNV triples with only (see section 5 for ... best measure for identifying AdjN collocations, except for - coordinates between 15% and 20% where t-test outperforms log-likelihood. For PNV data, the curves of all measures (except for frequency) ... around the best-performing measure for the AdjN data with (log-likelihood). There is no significant difference between log-likelihood and t-test. And only for -best lists with , frequency performs marginally significantly...

Document, scientific report: "AN INTEGRATED HEURISTIC SCHEME FOR PARTIAL PARSE EVALUATION" docx

Uploaded: 20/02/2014, 21:20
... that are used for disambiguation and parse evaluation. After some experimentation, the evaluation feature weights were set in the following way. As previously described, the penalty for a skipped ... CMU-CMT-88-MEMO, 1988. [Tomita, 1986] M. Tomita. Efficient Parsing for Natural Language. Kluwer Academic Publishers, Hingham, Ma., 1986. 318 AN INTEGRATED HEURISTIC SCHEME FOR PARTIAL ... bad parses. Our results indicate that our full integrated heuristic scheme for selecting the best parse outperforms the simple heuristic, that considers only the number of words skipped....

Document, scientific report: "An Efficient Generation Algorithm for Lexicalist MT" ppt

Uploaded: 20/02/2014, 22:20
... it is well-formed or ill-formed. • maximal iff it is well-formed and its parent (if it has one) is ill-formed. In other words, a maximal TNCB is a largest well-formed component of a TNCB. ... time algorithm for lexicalist MT generation provided that sufficient information can be transferred to ensure more determinism. 1 Introduction Lexicalist approaches to MT, particularly ... dominating the new well-formed node are disrupted. By dominance monotonicity, all nodes which were disrupted by the adjunction must become well-formed after re-evaluation. And nodes dominating...

Scientific report: "MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation" doc

Uploaded: 08/03/2014, 01:20
... Roles (Gimenez and Marquez, 2007) proposed using deeper linguistic information to evaluate MT performance. For evaluation in the ACL-07 MT workshop, the authors used the metric which they termed as ... automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization, ... Correlations on the News Commentary dataset. MT 2003 evaluation exercise. 5.1 ACL-07 MT Workshop The ACL-07 MT workshop evaluated the translation quality of MT systems on various translation tasks,...

Scientific report: "Stochastic Iterative Alignment for Machine Translation Evaluation" doc

Uploaded: 08/03/2014, 02:21
... N SC ORE (mt, M, ref, N) Compute the alignment score of the MT output mt with length M and the reference ref with length N for i = 1; i ≤ M; i = i + 1 do for j = 1; j ≤ N; j = j + 1 do for k = ... means. While sentence-level evaluation is useful if we are interested in a confidence measure on MT outputs, system-level evaluation is more useful for comparing MT systems and guiding their ... (+07.8%) Table 4: 95% significance intervals for sentence-level adequacy evaluation cult for one metric to significantly outperform another metric in sentence-level evaluation. The results show that...

Systems for Research and Evaluation for Translating Genome-Based Discoveries for Health potx

Uploaded: 15/03/2014, 15:20
... Translating Genomic-Based Research for Health Board on Health Sciences Policy Systems for Research and Evaluation for Translating Genome-Based Discoveries for Health Workshop Summ ... horizon, understand what the public is pushing for, and determine at 2 SYSTEMS FOR RESEARCH AND EVALUATION Research for Health identified a need for a workshop to examine existing systems that ... acceptability, and other contextual issues before making a decision. Quantitative Information for Decision Making Quantitative information needed for decision making includes data on effectiveness,...

Scientific report: "Using Leading Text for News Summaries: Evaluation Results and Implications for Commercial Summarization Applications" ppt

Uploaded: 17/03/2014, 07:20
... summary for a given document, something that a key approach to evaluation does not capture. Brandow et al. (1995) found this to be the case for some documents where all ANES-generated and ... news. Leading text extracts such as the LEAD field are appealing for commercial use as summaries for a number of reasons. For general news documents, they are usually acceptable as summaries. ... this investigation, we found that to be the case for most list and newsbrief documents and for many transcripts. Beyond news data, this holds for case law documents, many types of financial...

Scientific report: "CDER: Efficient MT Evaluation Using Block Movements" doc

Uploaded: 17/03/2014, 22:20
... automatic evaluation measures and human judgment. 1 Introduction Research in machine translation (MT) depends heavily on the evaluation of its results. Especially for the development of an MT system, an ... a method for evaluating automatic evaluation metrics for machine translation. COLING 2004, pages 501–507, Geneva, Switzerland, Aug. D. Lopresti and A. Tomkins. 1997. Block edit models for approximate ... tedious and cost-intensive, automatic evaluation measures are used in most MT research tasks. A high correlation between these automatic evaluation measures and human evaluation is thus desirable. State-of-the-art...

KDIGO Clinical Practice Guideline for the Diagnosis, Evaluation, Prevention, and Treatment of Chronic Kidney Disease-Mineral and Bone Disorder (CKD-MBD) pdf

Uploaded: 22/03/2014, 09:20
... 3: for serum calcium and phosphorus, every 6–12 months; and for PTH, based on baseline level and CKD progression. In CKD stage 4: for serum calcium and phosphorus, every 3–6 months; and for ... follow-up, and the quality grade for the respective outcome. Conceptually, information on the left upper corner shows high-quality evidence for outcomes of high importance. Information on the right lower ... low-quality evidence for outcomes of lesser importance. Evidence for AEs was not graded for quality, but still tabulated in one column in the matrices. An evidence matrix was not generated for a systematic review...