... each of them containing 150 pairs, were constructed randomly and were restricted to words with individual frequencies between 500 and 2500. We term these two sets the occurring and non-occurring ... P(x) and P(y) are the probabilities of the events x and y (occurrences of words, in our case) and P(x, y) is the probability of the joint event (a cooccurrence pair). We estimate mutual information ... generation (Smadja and McKeown, 1990), lexicography (Church and Hanks, 1990), machine translation (Brown et al., ; Sadler, 1989), information retrieval (Maarek and Smadja, 1989) and various disambiguation...
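To make the roles of P(x), P(y) and P(x, y) concrete, the following is a minimal sketch of a mutual information computation for a single cooccurrence pair. It assumes the standard pointwise definition, log2 of P(x, y) / (P(x) P(y)), and plain relative-frequency estimates of the probabilities; the excerpt does not state the estimator actually used, so the function names and the shared normalizing total are illustrative assumptions only.

```python
import math


def pointwise_mi(p_xy: float, p_x: float, p_y: float) -> float:
    """Return log2(P(x, y) / (P(x) * P(y))).

    Assumes the standard pointwise mutual information definition.
    A pair that never cooccurs (p_xy == 0) is mapped to -infinity.
    """
    if p_x <= 0 or p_y <= 0:
        raise ValueError("marginal probabilities must be positive")
    if p_xy == 0:
        return float("-inf")
    return math.log2(p_xy / (p_x * p_y))


def mi_from_counts(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Estimate mutual information from raw counts.

    Assumption: all three probabilities are simple relative frequencies
    over the same (hypothetical) total; other estimators are possible.
    """
    return pointwise_mi(count_xy / total, count_x / total, count_y / total)
```

Under these assumptions, a pair that cooccurs more often than independence would predict receives a positive score, a pair that cooccurs exactly as often as independence predicts scores close to zero, and a pair that never cooccurs scores negative infinity, which is why unobserved pairs need separate treatment when data are sparse.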