... presents a probabilistic framework, QARLA, for the evaluation of text summarisation systems. The input of the framework is a set of manual (reference) summaries, a set of baseline (automatic) ... two automatic summaries a, a′ and a similarity measure x, if a is more distant to all manual summaries than a′, then a cannot be better than a′. Formally: ∀m ∈ M. x(a, m) < x(a′, ... test-bed is reliable (JACK measure).

2 Formal constraints on any evaluation framework based on similarity metrics

We are looking for a framework to evaluate automatic summarisation systems objectively...
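
As an illustration of the constraint stated earlier (a summary that is more distant from every manual summary than another cannot be the better of the two), the following minimal sketch checks only the premise of that constraint. The function names `word_overlap` and `is_dominated`, and the use of a word-set Jaccard overlap as the similarity measure x, are assumptions made for this example and are not taken from the paper.

```python
from typing import Callable, Sequence


def word_overlap(a: str, m: str) -> float:
    """Hypothetical similarity measure x: Jaccard overlap of lowercased word sets."""
    words_a, words_m = set(a.lower().split()), set(m.lower().split())
    if not words_a and not words_m:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_m) / len(words_a | words_m)


def is_dominated(
    a: str,
    a_prime: str,
    manuals: Sequence[str],
    x: Callable[[str, str], float] = word_overlap,
) -> bool:
    """Return True if x(a, m) < x(a_prime, m) for every manual summary m.

    When this holds, the constraint says a cannot be better than a_prime.
    """
    if not manuals:
        raise ValueError("at least one manual summary is required")
    return all(x(a, m) < x(a_prime, m) for m in manuals)


# Toy data, invented for illustration only.
manuals = ["the cat sat on the mat", "a cat sat on a mat"]
a = "dogs bark loudly"
a_prime = "the cat sat"

print(is_dominated(a, a_prime, manuals))
print(is_dominated(a_prime, a, manuals))
```

The check is deliberately one-directional: it only reports when one summary is less similar than the other to all manual summaries, and it says nothing about pairs where neither summary dominates.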