... features of RTER are binary. The remaining two models, MTR and MT+RTER, show clearer benefit from more data. With 20% of the total data, they climb to within 5 points of their final performance, ... of automatically-extracted paraphrases. These approaches reduce the risk that a good translation is rated poorly due to lexical deviation, but do not address the problem that a translation may contain many ... main reason is their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates...