Words with simple Burrows-Wheeler Transforms

Jamie Simpson
Department of Mathematics and Statistics, Curtin University of Technology, Perth, WA 6850, Australia
simpson@maths.curtin.edu.au

Simon J. Puglisi
School of Computer Science and Information Technology, RMIT University, Melbourne, Victoria 3001, Australia
sjp@cs.rmit.edu.au

Submitted: Mar 17, 2008; Accepted: Jun 9, 2008; Published: Jun 20, 2008
Mathematics Subject Classifications: 68R15, 68W05

Abstract

Mantaci et al. have shown that if a word x on the alphabet {a, b} has a Burrows-Wheeler Transform of the form b^i a^j then x is a conjugate or a power of a conjugate of a standard word. We give an alternative proof of this result and describe words on the alphabet {a, b, c} whose transforms have the form c^i b^j a^k. These words have some common properties with standard words. We also present some results about words on larger alphabets having similar properties.

1 Introduction

We use the usual notation for combinatorics on words. A word of n elements is x = x[1..n], with x[i] being the ith element and x[i..j] the factor of elements from position i to position j. The letters in x come from some alphabet A. The set of all words with letters from A is A^*. The length of x, written |x|, is the number of letters in x, and the number of occurrences of the letter a in x is |x|_a. A factor of length n is an n-factor. Two or more adjacent identical factors form a power. A word which is not a power is primitive. A word or factor x is periodic with period p if x[i] = x[i + p] for all i such that x[i] and x[i + p] are in the word. Two words x and y are conjugate if there exist words u and v such that x = uv and y = vu. We write C(x) for the set of conjugates of a word x. If x precedes y lexicographically we write x ≺ y, and x ⪯ y means that either x ≺ y or x = y. Often we will use capital letters for sets of words. X ≺ Y means every word in the set X precedes every word in the set Y. If u and v are words then uXv is a set of words each having the prefix u and suffix v. For any non-empty word x, F(x) and L(x) are, respectively, the first and last letters in x. If x = α_1 α_2 . . . α_n then the reverse of x, written x^R, is α_n . . . α_2 α_1. A word x is a palindrome if x = x^R. If m and n are integers we write gcd(m, n) for the greatest common divisor of m and n.

The Burrows-Wheeler Transform (henceforth BW Transform) [2] was introduced in 1994 as part of a data compression scheme, and has since been heavily studied (see, for example, [7] and [9] and references therein). To perform the transform on a word x, first list its conjugates in lexicographic order. The transform is then formed by concatenating the final letters of the conjugates in this order. For example, to transform "hello" we produce the list elloh, hello, llohe, lohel, ohell and obtain the transform hoell. We will write BWT(x) for the BW Transform of x. The advantage of the transform is that for some words, such as English text, it produces transforms with many repeated letters, and these locally skew first-order statistics can be exploited by a compressor. For example, the first two sentences of this paragraph transform to:

no]mnhe.rW)fn4asaxsdstttmcsnmead mser [991 . B- 2t rr rrpw dgiunsi er rohhmchcehhldptrlo ssrseuotTtWc(phx sfl nen rrrreoiiooeoaaaiTr cc icw fffffr nameeTtTgoopeooootru iaoteaaw nnnseirssraar nidjBn o e
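The transform just described can be computed naively by sorting rotations. The following minimal sketch (in Python) is ours rather than anything from the paper; the function name bwt is an assumption reused in the later sketches.

```python
def bwt(x: str) -> str:
    """Burrows-Wheeler Transform: list the conjugates (cyclic rotations)
    of x in lexicographic order and concatenate their final letters."""
    n = len(x)
    conjugates = sorted(x[i:] + x[:i] for i in range(n))
    return "".join(w[-1] for w in conjugates)

assert bwt("hello") == "hoell"
```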
An extreme example of this is when all occurrences of each letter make up a factor in the transform. For example, BWT(bbabbba) = b^5 a^2 and BWT(cacbca) = c^3 b a^2. It is interesting to ask what words have such BW Transforms: they represent the best case for BWT-based compressors. In the case of words on a 2-letter alphabet this question was answered by Mantaci et al. [6], who obtained the remarkable result that if BWT(x) = b^i a^j then x is a conjugate or a power of a conjugate of a standard word (defined below). Standard words are, in a sense, the building blocks of the ubiquitous Sturmian words. It is surprising that they should turn up in connection with BW Transforms. It is not possible to have BWT(x) = a^i b^j with i and j positive and a and b having their usual lexicographic order. In the next section we give a new proof of the Mantaci et al. result, and in the third section obtain a similar result for words on a three letter alphabet. In the final section we present some results about words on larger alphabets having similar properties and compare our words on the three letter alphabet with standard words.

2 Size 2 alphabet

We consider words defined on the alphabet A = {a, b}. We will describe the set of all words on this alphabet which have BW Transforms of the form b^i a^j where i and j are non-negative integers. The main result of this section is Theorem 2.5, in which we describe such words with gcd(i, j) = 1. In Corollary 2.6 we use some results from [6] to obtain the general case. The morphisms φ and φ̃ are defined by

φ(a) = a     φ̃(a) = ba
φ(b) = ab    φ̃(b) = b.

We let S be the smallest set containing a and b which is closed under both φ and φ̃. The set S is the set of standard words on {a, b}. See [5], Chapter 2, where standard words are defined in terms of ordered pairs (u, v) of standard words, each pair giving rise to the two ordered pairs (u, uv) and (vu, v). In our case the ordered pairs have the form (X(a), X(b)) where X ∈ {φ, φ̃}^* and concatenation implies composition. The children of (X(a), X(b)) are (X(a), X(a)X(b)) and (X(b)X(a), X(b)), which equal (X(φ(a)), X(φ(b))) and (X(φ̃(a)), X(φ̃(b))) respectively. From this it is easy to see the equivalence of the definitions.
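A finite portion of S can be generated directly from this definition and compared against Theorem 2.5 below. The sketch is ours, reusing bwt from the earlier sketch; phi, phi_tilde and standard_slice are assumed names.

```python
import re

def phi(w: str) -> str:        # φ: a -> a, b -> ab
    return w.replace("b", "ab")

def phi_tilde(w: str) -> str:  # φ̃: a -> ba, b -> b
    return w.replace("a", "ba")

def standard_slice(depth: int) -> set:
    """Images of a and b under at most `depth` applications of φ and φ̃
    (a finite slice of the set S)."""
    level, words = {"a", "b"}, {"a", "b"}
    for _ in range(depth):
        level = {f(w) for w in level for f in (phi, phi_tilde)}
        words |= level
    return words

# Every word in the slice has a transform of the form b^i a^j (Theorem 2.5).
for w in standard_slice(5):
    assert re.fullmatch(r"b*a*", bwt(w)), w
```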
We will need the following lemmas. The first two are Propositions 2 and 3 from [6].

Lemma 2.1. Two words x and y are conjugate if and only if BWT(x) = BWT(y).

Lemma 2.2. If x = u^d and BWT(u) = α_1 α_2 . . . α_n then BWT(x) = α_1^d α_2^d . . . α_n^d.

Lemma 2.3. If x and y come from {a, b}^*, have the same length and x ≺ y, then φ(x) ≺ φ(y) and φ̃(x) ≺ φ̃(y).

Proof. Write x = pas and y = pbt for possibly empty strings p, s and t. Applying φ we have φ(x) = φ(p)aφ(s) and φ(y) = φ(p)abφ(t). By the definition of φ, φ(s) must begin with an a, so that φ(s) ≺ bφ(t) and so φ(x) ≺ φ(y). The proof of the second part is similar.

Lemma 2.4. If x ∈ {a, b}^* is a conjugate of y then φ(x) is a conjugate of φ(y).

Proof. By observation. We notice that in general {φ(y) : y ∈ C(x)} ≠ C(φ(x)), as φ(x), being longer than x, has more conjugates.

Theorem 2.5. BWT(x) = b^i a^j for some i and j with gcd(i, j) = 1 if and only if x is a conjugate of a word in S.

Proof. We first use induction on the length of x to show that every x in S has BWT(x) of the required form, then show that the members of S are the only words with this property. It is clear that a and b belong to S and have BW Transforms of the appropriate form, so the statement holds for |x| = 1. Suppose that any x ∈ S with |x| < n has a BW Transform of the required form. Each member of S \ {a, b} is the image under φ or φ̃ of some other member. Consider a word y in S with |y| = n. Then there exists x in S such that φ(x) = y or φ̃(x) = y. Without loss of generality suppose y = φ(x). The conjugates of x make up sets aX_1a, aX_2b, bX_3a, bX_4b. Either aX_1a or bX_4b must be empty, else BWT(x) does not have the required form. Suppose that aX_1a is empty. The conjugates are lexicographically ordered thus:

aX_2b ≺ bX_4b ≺ bX_3a.

Applying φ to these and using Lemma 2.3 we get

aφ(X_2)ab ≺ abφ(X_4)ab ≺ abφ(X_3)a.

The set of conjugates of y = φ(x) also includes baφ(X_2)a and babφ(X_4)a. Since each member of φ(X_2) begins with a, the full set of conjugates, ordered lexicographically, is

aφ(X_2)ab ≺ abφ(X_4)ab ≺ abφ(X_3)a ≺ baφ(X_2)a ≺ babφ(X_4)a.

By inspecting the final letters of each set we see that BWT(y) has the form b^i a^j. A similar analysis applies if y = φ̃(x). By assumption BWT(x) = b^{i′} a^{j′} for some i′ and j′ with gcd(i′, j′) = 1. Then i = i′ + j′ and j = j′, so that gcd(i, j) = 1 as required.

We have shown that any member of S has a BW Transform of the required form. We now show that the only words with such BW Transforms are conjugates of words in S. By Lemma 2.1, words are conjugates if and only if they have the same BW Transform, so it is sufficient to show that all words of the form b^i a^j with gcd(i, j) = 1 are transforms of some member of S. This is equivalent to showing that for all such i and j there is a member x of S with |x|_a = i and |x|_b = j. This is proved easily by induction on i + j. It clearly holds when i + j = 1 since a and b are in S. Suppose it holds for all pairs (i, j) with gcd(i, j) = 1 and i + j < k. Consider a pair (i′, j′) with gcd(i′, j′) = 1 and i′ + j′ = k. Suppose i′ > j′ ≥ 1. Then gcd(j′, i′ − j′) = 1, so S contains y, say, with |y|_a = i′ − j′ and |y|_b = j′. But then a appears i′ times in φ(y) and b appears j′ times, as required. If j′ > i′ the same reasoning applies with φ replaced by φ̃. This completes the proof.

Corollary 2.6. A word x has BW Transform b^i a^j if and only if it is a conjugate of a word in S or a conjugate of a power of a word in S.

Proof. We first show that for any i and j some word in S, or some power of a word in S, has BW Transform b^i a^j. Let gcd(i, j) = d. If d = 1 then the statement is equivalent to the theorem. Otherwise write i = pd and j = qd where gcd(p, q) = 1. By the theorem there exists x in S with BWT(x) = b^p a^q, and by Lemma 2.2 BWT(x^d) = b^{pd} a^{qd} as required. The converse follows from Lemma 2.1.

3 Size 3 alphabet

We now describe the set of words x on the alphabet {a, b, c} with the property that

BWT(x) = c^i b^j a^k     (3.1)

for non-negative integers i, j and k. We call a word satisfying (3.1) a Type I word. Examples are given at the beginning of the last section. We will construct a set T of primitive words, each satisfying (3.1) and such that any primitive word satisfying (3.1) is a conjugate of a word in T. Then by Lemma 2.1 and Lemma 2.2 any Type I word is either a conjugate of a word in T or a power of such a conjugate. The words in the set S of the last section satisfy (3.1) with i = 0.
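Condition (3.1) is easy to test directly. The following predicate is our own shorthand for the later empirical checks, reusing bwt from the first sketch; is_type_one is an assumed name.

```python
import re

def is_type_one(x: str) -> bool:
    """A word over {a, b, c} satisfies (3.1) iff its transform is c^i b^j a^k."""
    return re.fullmatch(r"c*b*a*", bwt(x)) is not None

assert is_type_one("cacbca")      # BWT(cacbca) = c^3 b a^2
assert not is_type_one("abc")     # BWT(abc) = cab
```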
Let γ_1 be the morphism defined by γ_1(a) = b, γ_1(b) = c. It is easy to see that if x ∈ S and BWT(x) = b^j a^k then BWT(γ_1(x)) = c^j b^k. Similarly, if γ_2 is defined by γ_2(a) = a, γ_2(b) = c then BWT(γ_2(x)) = c^j a^k, so both γ_1(x) and γ_2(x) are Type I. Let

T_0 = S ∪ {γ_1(x) : x ∈ S} ∪ {γ_2(x) : x ∈ S}.     (3.2)

The conjugates of the words in T_0 are the only primitive words which contain at most 2 distinct letters from {a, b, c} and which satisfy (3.1). We now extend the morphism φ defined in the last section to

φ(a) = a, φ(b) = ab, φ(c) = ac.

Note that this agrees with the earlier definition when applied to a or b. We also need θ defined by

θ(a) = c, θ(b) = b, θ(c) = a.

We will introduce a third mapping ψ below. We define T to be the minimal set of words which includes T_0 and is closed under the mappings θ, φ and ψ. To prove that T has the required properties we need several lemmas.

Let x_1 ≺ x_2 ≺ · · · ≺ x_n be the conjugates of a word x having length n, so that BWT(x) = L(x_1)L(x_2) . . . L(x_n). It is clear that the set of 2-factors occurring in x is precisely {L(x_i)F(x_i) : i = 1 . . . n}. It is also clear that a necessary and sufficient condition for x to be Type I is that

x_i ≺ x_j ⇒ L(x_i) ⪰ L(x_j).     (3.3)

Lemma 3.1. Let x be a Type I word with |x|_a = α, |x|_b = β and |x|_c = γ.
(i) If β + γ > α ≥ γ then the set of 2-factors in x is a subset of {ab, ac, ba, bb, ca}.
(ii) If α ≥ β + γ then the set of 2-factors in x is a subset of {aa, ab, ac, ba, ca}.
(iii) If α + β > γ ≥ α then the set of 2-factors in x is a subset of {ac, bb, bc, ca, cb}.
(iv) If γ ≥ α + β then the set of 2-factors in x is a subset of {ac, bc, ca, cb, cc}.

Proof. Let the conjugates of x be x_1 ≺ · · · ≺ x_n. Since x is Type I, BWT(x) = c^γ b^β a^α, which is the concatenation L(x_1) . . . L(x_n). We also have F(x_1) . . . F(x_n) = a^α b^β c^γ. Consider the case β + γ > α ≥ γ. We see that

F(x_i)L(x_i) = ac for i = 1 . . . γ
F(x_i)L(x_i) = ab for i = γ + 1 . . . α
F(x_i)L(x_i) = bb for i = α + 1 . . . β + γ
F(x_i)L(x_i) = ba for i = β + γ + 1 . . . α + β
F(x_i)L(x_i) = ca for i = α + β + 1 . . . α + β + γ

Since the set of 2-factors in x is precisely the set of L(x_i)F(x_i) values, part (i) of the Lemma follows. The proofs of the other parts are similar.

Lemma 3.2. Let x and y be words on the alphabet {a, b, c}.
(a) If x ≺ y then φ(x) ≺ φ(y) and θ(x) ≻ θ(y).
(b) If x is a conjugate of y then φ(x) is a conjugate of φ(y) and θ(x) is a conjugate of θ(y).

Proof. (a) This is immediate since for any letters α and β from {a, b, c}, α ≺ β implies φ(α) ≺ φ(β) and θ(α) ≻ θ(β). (b) This is also immediate. Note that {φ(y) : y ∈ C(x)} includes all conjugates of φ(x) except those with prefix ba or ca, and that {θ(y) : y ∈ C(x)} includes all conjugates of θ(x).

Lemma 3.3. The word x is Type I if and only if θ(x) is Type I.

Proof. Let the conjugates of x be x_1 ≺ x_2 ≺ · · · ≺ x_n. Then by Lemma 3.2 the conjugates of θ(x) are θ(x_1) ≻ θ(x_2) ≻ · · · ≻ θ(x_n). Also note that L(x) ⪯ L(y) implies L(θ(x)) ⪰ L(θ(y)). By (3.3), x is Type I if and only if x_i ≺ x_j ⇒ L(x_i) ⪰ L(x_j), that is, if and only if θ(x_i) ≻ θ(x_j) ⇒ L(θ(x_i)) ⪯ L(θ(x_j)), that is, by (3.3), if and only if θ(x) is Type I.
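Lemma 3.1 can be confirmed by brute force over short words. The check below is ours, reusing is_type_one from the sketch above; it simply re-verifies the stated containments for every Type I word of length at most 8.

```python
from itertools import product

CASES = [  # (condition on (α, β, γ), allowed 2-factors), as in Lemma 3.1
    (lambda A, B, C: B + C > A >= C, {"ab", "ac", "ba", "bb", "ca"}),
    (lambda A, B, C: A >= B + C,     {"aa", "ab", "ac", "ba", "ca"}),
    (lambda A, B, C: A + B > C >= A, {"ac", "bb", "bc", "ca", "cb"}),
    (lambda A, B, C: C >= A + B,     {"ac", "bc", "ca", "cb", "cc"}),
]

for n in range(1, 9):
    for letters in product("abc", repeat=n):
        x = "".join(letters)
        if not is_type_one(x):
            continue
        A, B, C = x.count("a"), x.count("b"), x.count("c")
        factors = {x[i] + x[(i + 1) % n] for i in range(n)}  # cyclic 2-factors
        assert all(factors <= allowed for cond, allowed in CASES if cond(A, B, C))
```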
Lemma 3.4. The word x is Type I if and only if φ(x) is Type I.

Proof. Suppose x is Type I. Then its 2-factors come from one of the four sets in Lemma 3.1. Suppose they come from {ab, ac, ba, bb, ca}. Then the conjugates of x may be written

aX_1c ≺ aX_2b ≺ bX_3b ≺ bX_4a ≺ cX_5a.

The order is implied by x being Type I. Applying φ and using part (a) of Lemma 3.2 we have, using an obvious notation,

aφ(X_1)ac ≺ aφ(X_2)ab ≺ abφ(X_3)ab ≺ abφ(X_4)a ≺ acφ(X_5)a.

The full set of conjugates of φ(x) also includes baφ(X_2)a, babφ(X_3)a and caφ(X_1)a. Since each word in φ(X_2) begins with a we have baφ(X_2)a ≺ babφ(X_3)a ≺ caφ(X_1)a. By inspecting the final letters of each set of conjugates we see that φ(x) is Type I. A similar argument applies if the 2-factors belong to any of the other sets in Lemma 3.1.

Now suppose that y = φ(x) is Type I. Let the lexicographically ordered conjugates of x be x_1 ≺ x_2 ≺ · · · ≺ x_n. Then by part (a) of Lemma 3.2 we have φ(x_1) ≺ φ(x_2) ≺ · · · ≺ φ(x_n), and by (b) each of these is a conjugate of y. Then (3.3) tells us that L(φ(x_1)) ⪰ L(φ(x_2)) ⪰ · · · ⪰ L(φ(x_n)). However, for any word u, L(φ(u)) = L(u), so L(x_1) ⪰ L(x_2) ⪰ · · · ⪰ L(x_n). This implies, by (3.3), that x is Type I.

We now introduce the mapping ψ. Let x be a word of length n and let i ∈ [1, n].
(a) Suppose x[i] = a. If i < n and x[i + 1] = a, or if i = n and x[1] = a, then ψ′(x[i]) = ab; otherwise ψ′(x[i]) = a.
(b) Suppose x[i] = b. If i < n and x[i + 1] ≠ b, or if i = n and x[1] ≠ b, then ψ′(x[i]) = bb; otherwise ψ′(x[i]) = b.
(c) Suppose x[i] = c. If i < n and x[i + 1] = c, or if i = n and x[1] = c, then ψ′(x[i]) = cb; otherwise ψ′(x[i]) = c.
Then ψ(x) is the concatenation ψ′(x[1])ψ′(x[2]) . . . ψ′(x[n]). A more intuitive explanation of this is to say that we form ψ(x) from x by inserting a b in the middle of each factor aa, cc, ba and bc, regarding L(x)F(x) as a factor. For example,

ψ(abbacaabac) = abbbacababbac
ψ(aabaca) = ababbacab.
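The definition of ψ is easy to implement directly, with the two examples above as test cases. The sketch is ours; psi is an assumed name.

```python
def psi(x: str) -> str:
    """Insert a b in the middle of every factor aa, cc, ba, bc of x,
    regarding L(x)F(x) as a factor (i.e. reading x cyclically)."""
    n = len(x)
    out = []
    for i, ch in enumerate(x):
        nxt = x[(i + 1) % n]
        out.append(ch)
        if (ch == "a" and nxt == "a") or (ch == "c" and nxt == "c") \
                or (ch == "b" and nxt != "b"):
            out.append("b")
    return "".join(out)

assert psi("abbacaabac") == "abbbacababbac"
assert psi("aabaca") == "ababbacab"
```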
We will show that x is Type I if and only if ψ(x) is Type I. This will require two lemmas.

Lemma 3.5. If x is a conjugate of y then ψ(x) is a conjugate of ψ(y).

Proof. This is easily checked. Note that {ψ(y) : y ∈ C(x)} includes all conjugates of ψ(x) except those with prefix ba or bc.

Lemma 3.6. If x and y have the same length, are Type I and x ≺ y, then ψ(x) ≺ ψ(y).

Proof. If x and y have different first letters then it is easy to see that the statement holds. We therefore assume they have a non-empty common prefix. Let x and y have prefixes uαβ and uαγ respectively, where α, β and γ are letters with β ≺ γ. Suppose that α = a. We note that if a word z has prefix uaa, uab or uac then ψ(z) has, respectively, prefix vaba, vabb or vac for some word v. Since vaba ≺ vabb ≺ vac we see that if α = a then ψ(x) ≺ ψ(y). A similar analysis shows this relation also holds when α = b or α = c, and the statement of the lemma follows.

Lemma 3.7. The word x is Type I if and only if ψ(x) is Type I.

Proof. Let x be a Type I word with 2-factors from the set {aa, ab, ac, ba, ca}. The conjugates of x make up sets

aX_1c ≺ aX_2b ≺ aX_3a ≺ bX_4a ≺ cX_5a.

Applying ψ to each of these sets and using Lemma 3.6 gives sets

aY_1c ≺ aY_2bb ≺ aY_3ab ≺ bbY_4a ≺ cY_5a,     (3.4)

where ψ(aX_1c) = aY_1c et cetera. By Lemma 3.5 these are all conjugates of ψ(x). To make up the full set of conjugates we include baY_2b and baY_3a. By (3.4) we have baY_2b ≺ baY_3a, so that

aY_1c ≺ aY_2bb ≺ aY_3ab ≺ baY_2b ≺ baY_3a ≺ bbY_4a ≺ cY_5a,

from which it follows that ψ(x) is Type I.

If instead the set of 2-factors of x is a subset of {ab, ac, ba, bb, ca} then its conjugates make up sets

aX_1c ≺ aX_2b ≺ bX_3b ≺ bX_4a ≺ cX_5a.

Applying ψ to these gives sets

aY_1c ≺ aY_2bb ≺ bbY_3b ≺ bbY_4a ≺ cY_5a     (3.5)

and a set of conjugates baY_2b which slots in lexicographically between the second and third terms. Again ψ(x) is Type I. Similar analyses apply when the set of 2-factors is one of the others in Lemma 3.1. So far we have shown that if x is Type I then so is ψ(x).

We now show the converse. Suppose that ψ(x) is Type I. The definition of ψ means that ψ(x) cannot contain aa or cc as 2-factors, so its set of 2-factors comes from the set {ab, ac, ba, bb, ca} or the set {ac, bb, bc, ca, cb}. Suppose the 2-factors come from the first of these. Then x cannot contain the factor cb, as this would mean ψ(x) also contains this factor, which we have denied. Similarly it cannot contain bc. Neither can it contain cc, as then ψ(x) would contain bc. Let the set of conjugates of x be the union of sets aX_1c, aX_2b, aX_3a, bX_4b, bX_5a, cX_6a. Under ψ these give rise to the following sets of conjugates of ψ(x): aY_1c, aY_2bb, aY_3ab, bbY_4b, bbY_5a, cY_6a, together with baY_2b and baY_3a. The fact that ψ(x) is Type I imposes certain constraints on these sets. We must have aY_1c ≺ aY_2bb and hence by Lemma 2.4 aX_1c ≺ aX_2b, and thus X_1 ≺ X_2. We must also have baY_2b ≺ baY_3a, which implies X_2 ≺ X_3. Combining these observations gives

X_1 ≺ X_2 ≺ X_3.     (3.6)

We also need bbY_4b ≺ bbY_5a, which implies

X_4 ≺ X_5.     (3.7)

At least one of the sets baY_3a and bbY_4b must be empty, otherwise we get a contradiction with (3.3). This means that either X_3 or X_4 is empty. The ordered set of conjugates of x is therefore

aX_1c ≺ aX_2b ≺ bX_4b ≺ bX_5a ≺ cX_6a

or

aX_1c ≺ aX_2b ≺ aX_3a ≺ bX_5a ≺ cX_6a.

By inspecting the last letters we see that x is Type I, as required. Similar arguments show that x is Type I when ψ(x) has 2-factors from {ac, bb, bc, ca, cb}.
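Lemmas 3.3, 3.4 and 3.7 can likewise be confirmed exhaustively over short words. The check is ours, reusing is_type_one and psi from the earlier sketches; phi3 and theta are our names for the extended φ and for θ.

```python
from itertools import product

def phi3(w: str) -> str:   # extended φ: a -> a, b -> ab, c -> ac
    return "".join(ch if ch == "a" else "a" + ch for ch in w)

def theta(w: str) -> str:  # θ: a -> c, b -> b, c -> a
    return w.translate(str.maketrans("abc", "cba"))

for n in range(1, 8):
    for letters in product("abc", repeat=n):
        x = "".join(letters)
        t = is_type_one(x)
        assert t == is_type_one(theta(x))  # Lemma 3.3
        assert t == is_type_one(phi3(x))   # Lemma 3.4
        assert t == is_type_one(psi(x))    # Lemma 3.7
```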
Lemma 3.8. Every Type I word which contains each of a, b and c has a conjugate in the range of φ, θ ◦ φ or ψ.

Proof. Let y be a Type I word. We know from Lemma 3.1 that its set of 2-factors comes from one of the sets {aa, ab, ac, ba, ca}, {ab, ac, ba, bb, ca}, {ac, bb, bc, ca, cb} and {ac, bc, ca, cb, cc}. Suppose it comes from the first. If y does not begin with a then replace it with one of its conjugates that does. Then each occurrence of the letter b is preceded by a: we can replace such a pair with φ(b). Similarly each occurrence of c is preceded by a, and the pair ac can be replaced with φ(c). The remaining occurrences of a can be replaced with φ(a), and we see that y is in the range of φ.

Suppose the factors of y come from the fourth set. If y does not begin with c then replace it with one of its conjugates that does. Then the factors of θ(y) come from the first set, so by the previous case there exists x such that φ(x) = θ(y). But θ is its own inverse, so θ ◦ φ(x) = y and y is in the range of θ ◦ φ.

Now suppose the 2-factors of y come from the set {ab, ac, ba, bb, ca}. If y begins with ba or bc replace it with a conjugate that doesn't. Say that a factor y[i..j] is a b-run if each of its letters equals b, but neither y[i − 1] nor y[j + 1] equals b. Construct a word x by removing a b from each b-run, except in the case where both a prefix and a suffix of y are b-runs. In this case remove a b from the prefix b-run but not from the suffix b-run. A b-run of length 1 in y will be preceded and followed by a's and correspond to a pair of a's in x. It is easy to see that y = ψ(x). If the factors come from the third set a similar argument applies but with c in the role of a.

We have not yet shown that the words in T are primitive. The following theorem does this and will be used later to specify the possible values of i, j and k when BWT(x) = c^i b^j a^k. If x is in {a, b, c}^* then the Parikh vector for x is the vector p(x) = [|x|_a, |x|_b, |x|_c]. If p(x) = [α, β, γ] then it is clear that

p(θ(x)) = [γ, β, α]     (3.8)

and

p(φ(x)) = [α + β + γ, β, γ].     (3.9)

The Parikh vector for ψ(x) is less obvious. Suppose that ψ(x) is Type I and that the set of 2-factors of x comes from either {ab, ac, ba, bb, ca} or {aa, ab, ac, ba, ca}. We write |x|_ab for the number of occurrences of ab in x. If L(x) = a and F(x) = b we regard L(x)F(x) as an occurrence of ab and count it in |x|_ab. We define |x|_aa et cetera in a similar fashion. Since each occurrence of c in x is preceded and succeeded by a, we have

|x|_ac = |x|_ca = |x|_c.     (3.10)

It is clear that |ψ(x)|_a = α and |ψ(x)|_c = γ. Also, from the definition of ψ and (3.10),

|ψ(x)|_b − |x|_b = |x|_aa + |x|_cc + |x|_ba + |x|_bc = |x|_a + |x|_c − |x|_ca − |x|_ac = |x|_a − |x|_c.

If the 2-factors of ψ(x) come from {ac, bb, bc, ca, cb} or {ac, bc, ca, cb, cc} then a similar equality holds with a and c interchanged. In either case we have

p(ψ(x)) = [α, β + |α − γ|, γ].     (3.11)

Theorem 3.9. If x is in T and p(x) = [α, β, γ] then gcd(α, β, γ) = 1 and gcd(α + β, β + γ) = 1.
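Theorem 3.9, together with Lemmas 2.1 and 2.2 and the construction of T, feeds into a characterisation of the possible Parikh vectors of Type I words, namely that [i, j, k] is the Parikh vector of some Type I word exactly when gcd(i, j, k) = gcd(i + j, j + k); the induction establishing this continues in the fragments below. The following brute-force check is ours, reusing bwt and is_type_one from the earlier sketches; type_one_parikh_vectors is an assumed name.

```python
from itertools import product
from math import gcd

def type_one_parikh_vectors(n: int) -> set:
    """Parikh vectors (i, j, k) of Type I words of length n, by brute force."""
    found = set()
    for letters in product("abc", repeat=n):
        x = "".join(letters)
        if is_type_one(x):
            found.add((x.count("a"), x.count("b"), x.count("c")))
    return found

for n in range(1, 9):
    predicted = {(i, j, n - i - j)
                 for i in range(n + 1) for j in range(n + 1 - i)
                 if gcd(gcd(i, j), n - i - j) == gcd(i + j, n - i)}
    assert type_one_parikh_vectors(n) == predicted
```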
[...]

... larger alphabets which have simple Burrows-Wheeler Transforms and can we characterise the Type II and Type III words mentioned above? For larger alphabets can we find results like Theorem 3.12? For binary alphabets standard words can be extended to Sturmian words, which are infinite with complexity c(n) = n + 1 for all n. Can our Type I words be used to construct infinite words with complexity 2n + 1 for ...

... they are called 3-standard words. Like ours they have some properties in common with standard words. For example, standard words are connected with the Fine-Wilf Periodicity Lemma [4], which states that if a word has periods p and q and length at least p + q − gcd(p, q) then it has period gcd(p, q). If a word w has periods p and q with gcd(p, q) = 1, length p + q − 2 and does not have period 1, then wab or ...

... since all words with such Parikh vectors are Type I. Suppose it holds whenever i + j + k < n and consider [i, j, k] with

gcd(i, j, k) = gcd(i + j, j + k) = d     (3.12)

and i + j + k = n. We consider 4 cases.

Case 1. If i ≥ j + k consider the vector [i − j − k, j, k]. By (3.12) gcd(i − j − k, j, k) = gcd(i − k, j + k) = d so, providing at least one of j and k is positive, we can apply the induction hypothesis and conclude there exists a Type I word y with p(y) = [i − j − k, j, k]. But then φ(y) is Type I by Lemma 3.4, and by (3.9) p(φ(y)) = [i, j, k]. So x = φ(y) satisfies the statement of the theorem. If j = k = 0 then the Type I word x = a^i is satisfactory.

Case 2. If j + k ≥ i ≥ k consider [i, j − i + k, k]. By (3.12) gcd(i, j − i + k, k) = gcd(j + k, j − i + 2k) = d. Providing i ≠ k we can apply the induction hypothesis and conclude there exists a Type I word y with p(y) = [i, j − i + k, k]. Now ψ(y) is Type I by Lemma 3.7 and by (3.11) p(ψ(y)) = [i, j, k], so that ψ(y) satisfies the requirements of the theorem. Now consider the case i = k. If i = k and j = 0 the word (ac)^i is Type I with Parikh vector [i, j, k] as required. If i = k = 0 we use b^j. If i = k > 0 and j > 0 then ... then gcd(i + j, j + k) = i + j > gcd(i, j, k), contradicting (3.12).

Cases 3 and 4. If k ≥ i + j we use the vector [k − i − j, j, i] and argue as in Case 1 with θ ◦ φ in the role of φ, and if i + j ≥ k ≥ i we use [k, i + j − k, i] and argue as in Case 2 with θ ◦ ψ in the role of ψ. This completes the proof by induction. Note that the sets in the four cases are not disjoint. This reflects the fact that some ...

... Proof. (⇒) Suppose x is Type I with p(x) = [i, j, k]. If x ∈ T then both greatest common divisors equal 1 by Theorem 3.9. This also applies if x is a conjugate of a word in T. Otherwise x is the dth power of such a conjugate, and then both greatest common divisors equal d, as required.

(⇐) We show that if gcd(i, j, k) = gcd(i + j, j + k) then there exists a Type I word x with p(x) = [i, j, k]. This holds ...

... then x = ψ ◦ θ(y). By (3.9) and (3.11) it follows that p(y) equals [α − β − γ, β, γ] or [α, β − (α − γ), γ]. Suppose p(y) = [α − β − γ, β, γ]. By the induction hypothesis the components have gcd = 1, from which gcd(α, β, γ) = 1. Also by the induction hypothesis we have

gcd((α − β − γ) + β, β + γ) = 1 ⇒ gcd(α − γ, β + γ) = 1 ⇒ gcd(α + β, β + γ) = 1.

Thus the statement holds when x = φ(y). The other case ...

... result and an explanation of how it is used to invert the BW Transform are given in [2] and [6]. Equation (4.1) implies that

F(i) = L(n + 1 − i)     (4.2)

for i = 1, . . . , n. But note that we cannot replace "=" with "≡" here. For x satisfying (4.1) define ω to be the function satisfying F(i) ≡ L(ω(i)). If x satisfies (4.1) then we evaluate ω as follows. Let k be the least integer such that m_1 + m_2 + · · · + m_k ≥ i; then ...

... reported at the conference WORDS 2005 in Montreal, and at the Workshop on Fibonacci Words, Turku, September 2006. They found various other interesting symmetries that apply in this case but do not hold with larger alphabets. The complexity of a word x is a function c(n) equalling the number of distinct factors of length n appearing in x. An important property of standard words is that their complexity satisfies ...
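As a closing aside, the cyclic transform used throughout determines a word only up to conjugacy (Lemma 2.1), and a conjugate can be recovered by the classic prepend-and-sort reconstruction. The sketch below is ours and is not the ω-based procedure referred to above; inverse_bwt is an assumed name, and bwt from the first sketch is reused only in the test line.

```python
def inverse_bwt(t: str) -> str:
    """Recover one conjugate of a word from its transform by repeatedly
    prepending the transform as a column and re-sorting the rows."""
    rows = [""] * len(t)
    for _ in range(len(t)):
        rows = sorted(t[i] + rows[i] for i in range(len(t)))
    return rows[0]  # the lexicographically least conjugate

assert inverse_bwt(bwt("hello")) == "elloh"
```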
