... Riloff
University of Utah
Salt Lake City, UT
riloff@cs.utah.edu

Abstract

We aim to shed light on the state-of-the-art in NP coreference resolution by teasing apart the differences in the MUC and ... runs using an optimal threshold (box 3) for the experiment as determined by using the test set. In all remaining experiments, we learn the threshold from the training set as in the BASELINE ... including the number of documents, annotated CEs, coreference chains, annotated CEs per chain (average), and number of documents in the train/test split. We use st to indicate a standard train/test...