... best model summary for a collection of documents. Human annotators construct a pyramid to capture important Summarization Content Units (SCUs) and their weights, which is used to evaluate machine ... test for readability. ∗∗, ∗, and – indicate significance level >=99%, >=95%, and <95%, respectively. ... incorporated into the metric, we obtain the best results for all correlation scores: ... always correlate better on the initial task than on the update task. This suggests that there is much room for improvement for readability metrics, and metrics need to consider update information...
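
To make the pyramid idea above concrete, here is a minimal sketch of how SCU weights could be turned into a content score. It assumes the commonly used formulation in which a summary's score is the total weight of the SCUs it expresses, divided by the best total attainable with the same number of SCUs; this may differ from the exact variant evaluated here. The function name `pyramid_score`, the SCU labels, and the toy weights are hypothetical, chosen only for illustration.

```python
from typing import Dict, Set


def pyramid_score(pyramid: Dict[str, int], summary_scus: Set[str]) -> float:
    """Ratio of observed SCU weight to the best weight attainable
    with the same number of SCUs (assumed formulation)."""
    # Only SCUs that appear in the pyramid contribute weight.
    matched = [scu for scu in summary_scus if scu in pyramid]
    observed = sum(pyramid[scu] for scu in matched)
    # Ideal summary: the same number of SCUs, drawn from the top of the pyramid.
    top_weights = sorted(pyramid.values(), reverse=True)[: len(matched)]
    ideal = sum(top_weights)
    return observed / ideal if ideal else 0.0


# Toy pyramid with made-up weights (e.g. how many human summaries express each SCU).
toy_pyramid = {"scu_a": 4, "scu_b": 3, "scu_c": 1, "scu_d": 1}

# A machine summary expressing one heavy and one light SCU.
score = pyramid_score(toy_pyramid, {"scu_a", "scu_c"})
```

In this toy case the summary collects weight 4 + 1 = 5, while the best two-SCU summary would collect 4 + 3 = 7, so the score is 5/7. In practice, identifying which SCUs a machine summary expresses is the hard part and is done by annotators or an automatic matcher; the sketch takes that matching as given.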