... exchanging only hypotheses that correspond to the same sentence. Table 2 shows the p-values computed by AR, testing the significance of the differences between the two systems in each pair. The first three ... optimizer outputs from our large pool and ran the AR test to determine the significance; we repeated this procedure 250 times. The p-values reported are the p-values at the edges of the 95% confidence ... the language pair, the portion of the search space visible to the optimizer (e.g., 10-best, 100-best, a lattice, a hypergraph), and the size of the tuning set. Unfortunately, there is no proxy...
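The AR test described here builds its null distribution by randomly exchanging hypotheses between the two systems, but only hypotheses that translate the same sentence. The repeated procedure then samples optimizer outputs, reruns AR, and reports the edges of a 95% interval over the resulting p-values. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The names `approximate_randomization` and `p_value_interval` are hypothetical. The metric is a toy mean of per-sentence scores standing in for a corpus-level metric such as BLEU. The trial count, the add-one correction, and drawing one output per system on each repetition are also assumptions, because the passage elides how outputs were sampled.

```python
import random
from typing import Callable, List, Sequence, Tuple


def approximate_randomization(
    hyps_a: Sequence,
    hyps_b: Sequence,
    metric: Callable[[Sequence], float],
    trials: int = 1000,
    seed: int = 0,
) -> float:
    """Two-sided AR p-value for the metric difference between systems A and B."""
    if len(hyps_a) != len(hyps_b):
        raise ValueError("both systems must translate the same sentences")
    rng = random.Random(seed)
    observed = abs(metric(hyps_a) - metric(hyps_b))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuf_a: List = []
        shuf_b: List = []
        # Swap each sentence's pair of hypotheses with probability 0.5;
        # hypotheses are only ever exchanged within the same sentence.
        for a, b in zip(hyps_a, hyps_b):
            if rng.random() < 0.5:
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        if abs(metric(shuf_a) - metric(shuf_b)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction (an assumed convention) keeps p strictly above 0.
    return (at_least_as_extreme + 1) / (trials + 1)


def p_value_interval(
    pool_a: Sequence[Sequence],
    pool_b: Sequence[Sequence],
    metric: Callable[[Sequence], float],
    repeats: int = 250,
    trials: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Hypothetical repeated-AR procedure: sample one optimizer output per
    system, run AR, repeat, and return the edges of the 95% interval."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(repeats):
        out_a = rng.choice(pool_a)  # assumed: one output drawn per repetition
        out_b = rng.choice(pool_b)
        p_values.append(
            approximate_randomization(
                out_a, out_b, metric, trials, seed=rng.randrange(2**32)
            )
        )
    p_values.sort()
    # Simple index-based empirical 2.5th and 97.5th percentiles.
    lo = p_values[int(0.025 * (repeats - 1))]
    hi = p_values[int(0.975 * (repeats - 1))]
    return lo, hi


def mean_score(scores: Sequence[float]) -> float:
    """Toy stand-in for a corpus-level metric such as BLEU."""
    return sum(scores) / len(scores)
```

As a usage example, two systems that receive identical per-sentence scores give p = 1.0, because every shuffle reproduces the observed difference of zero. A clearly better system, for instance `[0.9] * 50` against `[0.1] * 50`, gives a p-value near the floor of 1 / (trials + 1). Swapping within sentences, rather than pooling all hypotheses, is what keeps the test paired.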