... variation, 83.04%. For comparison purposes, we also computed the value of R² for adequacy using the BLEU score formula given in (Papineni et al., 2002), for the 7 systems using the same one ... comparison purposes, we also computed the value of R² for fluency using the BLEU score formula given in (Papineni et al., 2002), for the 7 systems using the same one reference, and we obtained a ... recall scores for increasingly longer n-grams. For test corpora of reasonable size, the metrics are usually well-defined for N ≤ 4. The ROUGE metric proposed by Lin and Hovy (2003) for automatic...
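
To make the notion of precision and recall scores over increasingly longer n-grams concrete, the following is a minimal sketch, assuming whitespace tokenization, a single reference, and clipped n-gram matching. The function names `ngram_counts` and `ngram_precision_recall` and the example sentences are illustrative assumptions, not taken from the paper, and the sketch is not the exact formulation of BLEU or ROUGE. It returns `None` for a score when a text is too short to contain any n-gram of the requested order.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision_recall(candidate, reference, n):
    """Clipped n-gram precision and recall of a candidate against one reference.

    A score is None when the corresponding text has no n-grams of order n.
    """
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else None
    recall = overlap / ref_total if ref_total else None
    return precision, recall


# Hypothetical example sentences, scored for N = 1..4.
candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
scores = {n: ngram_precision_recall(candidate, reference, n) for n in range(1, 5)}
for n, (p, r) in scores.items():
    print(n, p, r)
```

In this toy pair the scores fall as n grows, and no 4-gram is shared at all, which illustrates why longer n-grams require larger test corpora to yield informative scores.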