... the most important. We do not have space here to describe all the details we studied, but we can describe some of them. For example, does the ordering measure v help tuning performance? To answer ... (Papineni et al., 2002), NIST (Doddington, 2002), WER, PER, TER (Snover et al., 2006), and LRscore (Birch and Osborne, 2011) do not use external linguistic information; they are fast to compute ... but not as good as METEOR. This is because we designed PORT to carry out tuning: we optimized it for system tuning performance rather than for its performance as an evaluation metric....
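
To illustrate why metrics such as WER need no external linguistic information and are fast to compute, the following is a minimal sketch of word error rate, assuming whitespace tokenization and a single reference. The function names `word_edit_distance` and `wer` are hypothetical and chosen for illustration only; this is the standard textbook definition of WER (word-level edit distance divided by reference length), not the implementation used in the experiments described here.

```python
from typing import Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance over word tokens.

    Counts the minimum number of substitutions, insertions and deletions
    needed to turn `hyp` into `ref`. Only the tokens themselves are used:
    no stemmer, synonym table or parser is required.
    """
    # prev[j] holds the distance between the hypothesis prefix processed
    # so far and the first j reference tokens.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            substitution_cost = 0 if h == r else 1
            curr.append(
                min(
                    prev[j] + 1,                      # drop a hypothesis token
                    curr[j - 1] + 1,                  # add a reference token
                    prev[j - 1] + substitution_cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


def wer(hypothesis: str, reference: str) -> float:
    """Word error rate: edit distance divided by reference length.

    Assumption (illustrative): tokens are obtained by whitespace splitting.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return word_edit_distance(hyp, ref) / len(ref)
```

The computation is a single dynamic program over the two token sequences, so its cost is proportional to the product of their lengths. This is what makes such surface-level metrics cheap enough to call repeatedly inside a tuning loop.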