DESIGN AND ANALYSIS OF DISTRIBUTED ALGORITHMS (Part 6)

[...] For example, in most practical applications, the number of sites is 10–100, while the amount of data at each site is ≥ 10^6. What we need is a different strategy to deal with the general case.

Let us think of the set D containing the N elements as a search space in which we need to find d* = D[K]. The item d* is unknown to us; the only thing we know about it is its rank, Rank[d*, D] = K. An effective way to discover d* is to reduce the search space as much as possible, eliminating from consideration as many items as we can, until either we find d* or the search space is small enough (e.g., O(n)) for us to apply the techniques discussed in the previous section.

Suppose that we (somehow) know the rank Rank[d, D] of a data item d in D. If Rank[d, D] = K, then d is the element we were looking for. If Rank[d, D] < K, then d is too small to be d*, and so are all the items smaller than d. Similarly, if Rank[d, D] > K, then d is too large to be d*, and so are all the items larger than d. This fact can be employed to design a simple and, as we will see, rather efficient selection strategy:

Strategy RankSelect:
1. Among the data items under consideration (initially, they all are), choose one, say d.
2. Determine its overall rank k′ = Rank[d, D].
3. If k′ = K, then d = d* and we are done. Else, if k′ < K (respectively, k′ > K), remove from consideration d and all the data items smaller (respectively, larger) than d, and restart the process.

Thus, according to this strategy, the selection process consists of a sequence of iterations, each reducing the search space, performed until d* is found. Notice that we could stop the process as soon as just a few data items (e.g., O(n)) are left under consideration, and then apply protocol Rank.

Most of the operations performed by this strategy are rather simple to implement. We can assume that a spanning tree of the network is available and will be used for all communication, and that an entity is elected to coordinate the overall execution (becoming the root of the tree for this protocol). Any entity can act as coordinator and any spanning tree T of the network will do. However, for efficiency reasons, it is better to choose as coordinator the communication center s of the network, and as tree T the shortest-path spanning tree PT(s) of s.

Let d(i) be the item selected at the beginning of iteration i. Once d(i) is chosen, the determination of its rank consists of a trivial broadcast (to let every entity know d(i)) started by the root s, followed by a convergecast (to collect the partial rank information) ending at s; recall Exercise 2.9.43. Once s has determined the rank of d(i), it notifies all other entities of the result: d(i) = d*, d(i) < d*, or d(i) > d*; each entity then acts accordingly (terminating, or removing some elements from consideration).

The only operation still to be discussed is how we choose d(i). The choice of d(i) is quite important because it affects the number of iterations and thus the overall complexity of the resulting protocol. Let us examine some of the possible choices and their impact.
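To make the iteration structure of RankSelect concrete, here is a minimal, centralized Python sketch (an illustration, not the book's protocol): the broadcast/convergecast pair is abstracted into a direct rank computation over the union of the local sets, and the candidate is drawn uniformly at random, anticipating the Random Choice discussed next. All function names are ours.

```python
import random

def rank_select(local_sets, K):
    """Simulate strategy RankSelect: local_sets[i] holds the items still
    under consideration at entity x_i; K is the target rank in D."""
    spaces = [list(s) for s in local_sets]
    while True:
        # Step 1: choose a candidate d among the items under consideration.
        pool = [d for s in spaces for d in s]
        d = random.choice(pool)
        # Step 2: the convergecast computes k' = Rank[d, D] by summing the
        # local counts of items <= d reported by each entity.
        k = sum(1 for item in pool if item <= d)
        # Step 3: the root broadcasts the outcome; entities prune accordingly.
        if k == K:
            return d                                   # d = d*
        if k < K:
            # d and everything smaller cannot be d*: drop them, adjust rank.
            spaces = [[x for x in s if x > d] for s in spaces]
            K -= k
        else:
            # d and everything larger cannot be d*.
            spaces = [[x for x in s if x < d] for s in spaces]

print(rank_select([[1, 9, 13], [3, 6, 8, 20], [2, 7]], K=4))   # prints 6
```

The adjustment K -= k mirrors the fact that every discarded item in that branch is smaller than d*, so the rank of d* within the reduced search space shrinks by exactly the number of items removed.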
Random Choice   We can choose d(i) uniformly at random, that is, in such a way that each item of the search space has the same probability of being chosen. How can s choose d(i) uniformly at random? In Section 2.6.7 and Exercise 2.9.52 we have discussed how to select, in a tree, an item uniformly at random from the initial distributed set. Clearly that protocol can be used to choose d(i) in the first iteration of our algorithm. However, we cannot immediately use it in the subsequent iterations: after an iteration, some items are removed from consideration, that is, the search space is reduced, so for the next iteration we must ensure that we select an item that is still in the new search space. Fortunately, this can be achieved with simple readjustments to the protocol of Exercise 2.9.52, achieving the same cost in each iteration (Exercise 5.6.10). That is, each iteration costs at most 2(n − 1) + d_T(s, x) messages and 2r(s) + d_T(s, x) ideal time units for the random selection, plus an additional 2(n − 1) messages and 2r(s) time units to determine the rank of the selected element.

Let us call the resulting protocol RandomSelect. To determine its global cost, we need to determine the number of iterations. In the worst case, iteration i removes only d(i) from the search space, so the number of iterations can be as bad as N, for a worst-case cost of

M[RandomSelect] ≤ (4(n − 1) + r(s)) N,   (5.4)
T[RandomSelect] ≤ 5 r(s) N.   (5.5)

However, on the average, the power of making a random choice is evident; in fact (Exercise 5.6.11):

Lemma 5.2.1  The expected number of iterations performed by Protocol RandomSelect until termination is at most 1.387 log N + O(1).

This means that, on the average,

M_average[RandomSelect] = O(n log N),   (5.6)
T_average[RandomSelect] = O(n log N).   (5.7)

As mentioned earlier, we could stop the strategy RankSelect, and thus terminate protocol RandomSelect, as soon as O(n) data items are left under consideration, and then apply protocol Rank; see Exercise 5.6.12.

Random Choice with Reduction   We can improve the average message complexity by exploiting the properties discussed in Section 5.2.1. Let Δ(i) = min{K(i), N(i) − K(i) + 1}. By Property 5.2.2, if at the beginning of iteration i an entity has more than K(i) elements under consideration, it needs to consider only the K(i) smallest and can immediately remove the others from consideration; similarly, if it has more than N(i) − K(i) + 1 items, it needs to consider only the N(i) − K(i) + 1 largest and can immediately remove the others. If every entity does this, the search space is further reduced even before the random selection takes place: the net effect is that each entity has at most Δ(i) = min{K(i), N(i) − K(i) + 1} items still under consideration during iteration i. The root s can then perform the random selection in this reduced space of size n(i) ≤ N(i). Notice that d* will have a new rank k(i) ≤ K(i) in the new search space.

Specifically, our strategy will be to include, in the broadcast started by the root s at the beginning of iteration i, the values N(i) and K(i). Each entity, upon receiving this information, will locally perform the reduction (if any) of its elements and then include in the convergecast the size of its portion of the new search space. At the end of the convergecast, s knows both n(i) and k(i), as well as all the information necessary to perform the random selection in the reduced search space. In other words, the total number of messages per iteration is exactly the same as in Protocol RandomSelect.
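The per-entity reduction step just described can be sketched as follows (a hedged illustration; the function name and signature are ours). Each entity, knowing N(i) and K(i) from the broadcast, trims its local set before the convergecast:

```python
def local_reduction(items, K, N):
    """An entity keeps at most Delta = min(K, N - K + 1) items: only its K
    smallest and, among those, only its N - K + 1 largest can contain d*."""
    items = sorted(items)
    if len(items) > K:
        items = items[:K]              # Property 5.2.2, first half
    if len(items) > N - K + 1:
        items = items[-(N - K + 1):]   # Property 5.2.2, second half
    return items
```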
In the worst case, this change does not make any difference: for the resulting protocol RandomFlipSelect, the number of iterations can still be as bad as N (Exercise 5.6.13), for a worst-case cost of

M[RandomFlipSelect] ≤ (2(n − 1) + r(s)) N,   (5.8)
T[RandomFlipSelect] ≤ 3 r(s) N.   (5.9)

The change does, however, make a difference in the average cost. In fact (Exercise 5.6.14):

Lemma 5.2.2  The expected number of iterations performed by Protocol RandomFlipSelect until termination is less than ln(Δ) + ln(n) + O(1), where ln() denotes the natural logarithm (recall that ln(x) ≈ 0.693 log(x)).

This means that, on the average,

M_average[RandomFlipSelect] = O(n (ln(Δ) + ln(n))),   (5.10)
T_average[RandomFlipSelect] = O(n (ln(Δ) + ln(n))).   (5.11)

Also in this case, we could stop the strategy RankSelect, and thus terminate protocol RandomFlipSelect, as soon as only O(n) data items are left under consideration, and then apply protocol Rank; see Exercise 5.6.15.

Selection in a Random Distribution   So far, we have not made any assumption on the distribution of the data items among the entities. If we know something about how the data are distributed, we can clearly exploit this knowledge to design a more efficient protocol. In this section we consider a very simple and quite reasonable assumption about how the data are distributed. Consider the set D; it is distributed among the entities x_1, ..., x_n; let n[x_j] = |D_{x_j}| be the number of items stored at x_j. The assumption we will make is that all the distributions of D that end up with n[x_j] items at x_j, 1 ≤ j ≤ n, are equally likely.

In this case we can refine the selection of d(i). Let z(i) be the entity where the number of elements still under consideration in iteration i is largest; that is, m(i) = |D_{z(i)}(i)| ≥ |D_x(i)| for every x. (If more than one entity has the same number of items, choose an arbitrary one among them.) In our protocol, which we shall call RandomRandomSelect, we choose d(i) to be the h(i)th smallest item in the set D_{z(i)}(i), where

h(i) = ⌈ K(i) (m(i) + 1)/(N(i) + 1) − 1/2 ⌉,

that is, K(i)(m(i) + 1)/(N(i) + 1) rounded to the nearest integer. We use this choice until there are fewer than n items under consideration; at that point, Protocol RandomRandomSelect switches to Protocol RandomFlipSelect to finish the job and determine d*. Notice that, also in this protocol, each iteration can easily be implemented (Exercise 5.6.16) with at most 4(n − 1) + r(s) messages and 5r(s) ideal time units.

With this choice of d(i), the average number of iterations until there are fewer than n items left under consideration is indeed small. In fact (Exercise 5.6.17):

Lemma 5.2.3  Let the randomness assumption hold. Then the expected number of iterations performed by Protocol RandomRandomSelect until there are fewer than n items under consideration is at most (4/3) log log Δ + 1.

This means that, on the average,

M_average[RandomRandomSelect] = O(n (log log Δ + log n)),   (5.12)
T_average[RandomRandomSelect] = O(n (log log Δ + log n)).   (5.13)
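The candidate choice of RandomRandomSelect can be sketched like this (an illustrative Python fragment; the formula for h is reconstructed from the garbled text above, and the clamping of h to a valid index is our own defensive addition):

```python
import math

def choose_candidate(local_sets, K, N):
    """Pick the h-th smallest item of the largest local set, where
    h = K*(m+1)/(N+1) rounded to the nearest integer (m = size of that set)."""
    z = max(local_sets, key=len)                  # entity z(i) with most items
    m = len(z)
    h = math.ceil(K * (m + 1) / (N + 1) - 0.5)    # nearest-integer rounding
    h = min(max(h, 1), m)                         # keep the index in range
    return sorted(z)[h - 1]
```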
Filtering   The drawback of all the previous protocols rests in their worst-case costs: O(nN) messages and O(r(s)N) time. Notice that this cost is more than that of input collection, that is, of mailing all the items to s. It can be shown that the probability of the worst case occurring is so small that it can be neglected. However, there might be systems where such a cost is not affordable under any circumstances. For these systems, it is necessary to have a selection protocol that, even if less efficient on the average, can guarantee a reasonable cost in the worst case.

The design of such a protocol is fortunately not so difficult; in fact, it can be achieved with the strategy RankSelect and an appropriate choice of d(i). As before, let D_x(i) denote the set of elements still under consideration at x in iteration i, and let n_x(i) = |D_x(i)| denote its size. Consider the (lower) median d_x(i) = D_x(i)[⌈n_x(i)/2⌉] of D_x(i), and let M(i) = {d_x(i)} be the set of these medians. With each element of M(i) associate a weight; the weight associated with d_x(i) is just the size n_x(i) of the corresponding set.

Filter: Choose d(i) to be the weighted (lower) median of M(i).

With this choice, the number of iterations is rather small (Exercise 5.6.18):

Lemma 5.2.4  The number of iterations performed by Protocol Filter until there are no more than n elements left under consideration is at most 2.41 log(N/n).

Once there are at most n elements left under consideration, the problem can be solved using one of the known techniques for small sets, for example, Rank. However, each iteration requires a complex operation: in iteration i we need to find the median of the set M(i). As this set is small (it contains at most n elements), this can be done using, for example, Protocol Rank; in the worst case, it will require O(n^2) messages in each iteration. This means that, in the worst case,

M[Filter] = O(n^2 log(N/n)),   (5.14)
T[Filter] = O(n log(N/n)).   (5.15)
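The Filter choice above can be sketched as follows (illustrative code; ties and empty sets are handled in the simplest way): collect each entity's lower median together with the size of its set, then return the weighted lower median.

```python
def filter_choice(local_sets):
    """Weighted (lower) median of the local (lower) medians; each median
    d_x(i) carries weight n_x(i), the size of its set."""
    weighted = []
    for s in (sorted(s) for s in local_sets if s):
        weighted.append((s[(len(s) - 1) // 2], len(s)))   # (median, weight)
    weighted.sort()
    total = sum(w for _, w in weighted)
    running = 0
    for value, w in weighted:
        running += w
        if 2 * running >= total:    # first median reaching half the weight
            return value
```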
5.2.5 Reducing the Worst Case: ReduceSelect

The worst case we have obtained by using the Filter choice in strategy RankSelect is reasonable, but it can be reduced using a different strategy. This strategy, and the resulting protocol that we shall call ReduceSelect, is obtained mainly by combining and integrating the techniques we have developed so far for reducing the search space with new, original ones.

Reduction Tools   Let us first of all summarize the main basic tool we have used so far.

Reduction Tool 1: Local Contraction   If entity x has more than Δ items under consideration, it can immediately discard any item greater than the local Kth smallest element and any item smaller than the local (N − K + 1)th largest element.

This tool is based on Property 5.2.2. The requirement for its application is that each site must know K and N. The net effect of its application is that, afterwards, each site has at most Δ locally stored items under consideration. Recall that we have already used this reduction tool when dealing with the two-site case, as well as in Protocol RandomFlipSelect.

A different type of reduction is offered by the following tool.

Reduction Tool 2: Sites Reduction   If the number of entities n is greater than K (respectively, N − K + 1), then n − K entities (respectively, n − (N − K + 1) entities) and all their data items can be removed from consideration.

This can be achieved as follows:

1. Consider the set D_min = {D_x[1]} (respectively, D_max = {D_x[|D_x|]}) of the smallest (respectively, largest) item at each entity.
2. Find the Kth smallest (respectively, (N − K + 1)th largest) element of this set; call it w. NOTE: this set has n elements, hence this operation can be performed using protocol Rank.
3. If D_x[1] > w (respectively, D_x[|D_x|] < w), then the entire set D_x can be removed from consideration.

This reduction technique immediately reduces the number of sets involved in the problem to at most Δ. For example, consider the case of searching for the 7th largest item when the N data items of D are distributed among n = 10 entities. Consider the largest element stored at each entity (they form a set of 10 elements), and find the 7th largest of them. The 8th largest element of this set cannot possibly be the 7th largest item of the entire distributed set D; moreover, since it is the largest item stored at the entity from which it originated, none of the other items stored at that entity can be the 7th largest element either, so we can remove from consideration the entire set stored at that entity. Similarly, we can remove the sets from which the 9th and the 10th largest elements came.

These two tools can obviously be used one after the other. Their combined use reduces the problem of selection in a search space of size N distributed among n sites to that of selection among min{n, Δ} sites, each with at most Δ elements. This means that, after the execution of these two tools, the new search space contains at most Δ^2 data items.

Notice that, once the tools have been applied, if the size of the search space and/or the rank of d* in that space have changed, it is possible that the two tools can be successfully applied again. For example, consider the case depicted in Table 5.1, where N = 10,032 items are distributed among n = 5 entities, x_1, ..., x_5, and where we are looking for the Kth smallest element, with K = 4096.

TABLE 5.1: Repeated use of the Reduction Tools

N (size of search space)   K (rank of d* in search space)   x1       x2   x3   x4   x5
10,032                     4,096                            10,000   20   5    5    2
4,128                      4,096                            4,096    20   5    5    2
65                         33                               33       20   5    5    2

First observe that, when we apply the two Reduction Tools, only the first one (Local Contraction) will be successful. The effect is to remove from consideration many elements of x_1, all larger than d*: we have significantly reduced the search space without changing the rank of d* in it. If we apply the two Reduction Tools again to the new configuration, again only the first one (Local Contraction) will be successful; this time, however, it will further drastically reduce both the size of the search space (the variable N) from 4,128 to 65 and the rank of d* in the new search space (the variable K) from 4,096 to 33. (The table's second row must read 4,128: the five entities then hold 4,096 + 20 + 5 + 5 + 2 items.)

This means that we can iterate Local Contraction until there is no longer any change in the search space or in the rank of d* in it. This occurs when, at each site x_i, the number n′_i of items still under consideration is not greater than Δ′ = min{K′, N′ − K′ + 1}, where N′ is the size of the current search space and K′ the rank of d* in it. We then use the Sites Reduction tool. The reduction protocol REDUCE, based on this repeated use of the two Reduction Tools, is shown in Figure 5.5.

REDUCE
begin
   N′ := N; K′ := K; Δ′ := Δ; n′_i := n_i, 1 ≤ i ≤ n;
   while ∃ x_i such that n′_i > Δ′ do
      perform Local Contraction;
      update the values of N′, K′, Δ′, n′_i (1 ≤ i ≤ n);
   endwhile
   if n > Δ′ then
      perform Sites Reduction;
   endif
end

FIGURE 5.5: Protocol REDUCE.

Lemma 5.2.5  After the execution of Protocol REDUCE, the number of items left under consideration is at most Δ min{n, Δ}.

The single execution of Sites Reduction requires a selection in a small set, as discussed in Section 5.2.2. Each execution of Local Contraction required by Protocol REDUCE requires a broadcast and a convergecast, and costs 2(n − 1) messages and 2r(s) time. To determine the total cost we need to find out how many times Local Contraction is executed. Interestingly, this happens only a constant number of times, three times to be precise (Exercise 5.6.19).
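The contraction loop of REDUCE, together with the rank bookkeeping that produces the numbers of Table 5.1, can be sketched as follows (our own simulation; it applies both discards of Local Contraction in a single sweep, so it may converge in fewer sweeps than the table's row-by-row account):

```python
def reduce_contraction(local_sets, K):
    """Iterate Local Contraction until every entity holds at most
    Delta' = min(K', N' - K' + 1) items; returns the reduced sets and K'."""
    sets = [sorted(s) for s in local_sets]
    while True:
        N = sum(len(s) for s in sets)
        delta = min(K, N - K + 1)
        if all(len(s) <= delta for s in sets):
            return sets, K                      # ready for Sites Reduction
        new_sets, dropped_below = [], 0
        for s in sets:
            lo = max(0, len(s) - (N - K + 1))   # items below the (N-K+1)th largest
            hi = min(len(s), K)                 # items above the Kth smallest
            new_sets.append(s[lo:hi])
            dropped_below += lo                 # all of these are smaller than d*
        sets, K = new_sets, K - dropped_below   # rank of d* in the new space
```

On the distribution of Table 5.1 (10,000 + 20 + 5 + 5 + 2 items, K = 4096) this converges to a search space of 65 items with K′ = 33, matching the table's last row.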
Cutting Tools   The new tool we are going to develop is to be used whenever the number n of sets is at most Δ and each entity has at most Δ items; this is, for example, the situation after applying Tools 1 and 2 described above. Thus, the search space contains at most Δ^2 items. For simplicity, and without loss of generality, let K = Δ (the case N − K + 1 = Δ is analogous).

To aid in the design, we can visualize the search space as an array D of size n × Δ, where the rows correspond to the sets of items, each set sorted in increasing order, and the columns specify the rank of an element within its set. So, for example, d_{i,j} is the jth smallest item in the set stored at entity x_i. Notice that there is no relationship among the elements of the same column; in other words, D is a matrix with sorted rows but unsorted columns. Each column corresponds to a set of n elements distributed among the n entities. If an element is removed from consideration, it is represented by +∞ in the corresponding entry of the array.

Consider the set C(2), that is, the set of all the second-smallest items at each site, and focus on the kth smallest element m(2) of this set, where k = ⌈K/2⌉. By definition, m(2) has exactly k − 1 elements smaller than itself in C(2); each of them, as well as m(2) itself, has another item smaller than itself in its own row (because they are the second-smallest in their own sets). This means that, as far as we know, m(2) has at least (k − 1) + k = 2k − 1 ≥ K − 1 items smaller than itself in the global set D; this implies that any item greater than m(2) cannot be the Kth smallest item we are looking for. In other words, if we find m(2), we can remove from consideration any item larger than m(2).

Similarly, we can consider the set C(2^i), where 2^i ≤ K, composed of the 2^i th smallest items of each set. Focus again on the kth smallest element m(2^i) of C(2^i), where k = ⌈K/2^i⌉. By definition, m(2^i) has exactly k − 1 elements smaller than itself in C(2^i); each of them, as well as m(2^i) itself, has another 2^i − 1 items smaller than itself in its own row (because they are the 2^i th smallest in their own sets). This means that m(2^i) has at least

(k − 1) + k (2^i − 1) = k 2^i − 1 ≥ (K/2^i) 2^i − 1 = K − 1

items smaller than itself in the global set D; this implies that any item greater than m(2^i) cannot be the Kth smallest item we are looking for. In other words, if we find m(2^i), we can remove from consideration any item larger than m(2^i). Thus, we have a generic reduction tool using the columns whose index is a power of two:

Cutting Tool   Let l = 2^i ≤ K and k = ⌈K/l⌉. Find the kth smallest element m(l) of C(l), and remove from consideration all the elements greater than m(l).

CUT
begin
   k := ⌈K/2⌉; l := 2;
   while k ≥ log K and the search space is not small do
      if C(l) contains ≥ k items still under consideration then
         use the Cutting Tool:
            find the kth smallest element m(l) of C(l);
            remove from consideration all the elements greater than m(l);
      endif
      k := ⌈k/2⌉; l := 2 l;
   endwhile
end

FIGURE 5.6: Protocol CUT.
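On the array representation just described, Protocol CUT can be sketched as follows (an illustrative simulation: the array is a list of sorted rows, and the "search space is not small" test of Figure 5.6 is replaced here by the l ≤ K bound on the column index):

```python
import math

def cut(rows, K):
    """Apply the Cutting Tool on columns C(2), C(4), C(8), ... of an
    n x Delta array with sorted rows; +inf marks removed entries."""
    INF = float("inf")
    k, l = math.ceil(K / 2), 2
    while l <= K and k >= math.log2(K):
        # C(l): the l-th smallest still-considered item of each row.
        col = [r[l - 1] for r in rows if len(r) >= l and r[l - 1] != INF]
        if len(col) >= k:
            m = sorted(col)[k - 1]             # kth smallest element of C(l)
            # Anything above m has at least K items below it: remove it.
            rows = [[x if x <= m else INF for x in r] for r in rows]
        k, l = math.ceil(k / 2), 2 * l
    return rows
```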
The Cutting Tool can be implemented using any protocol for selection in small sets (recall that each C(l) has at most n elements), such as Rank; a single broadcast will then notify all entities of the outcome and allow each to reduce its own set if needed. On the basis of this tool we can construct a reduction protocol that applies the Cutting Tool sequentially, first to C(2), then C(4), then C(8), and so on. Clearly, if at any time the search space becomes small (i.e., O(n)), we terminate. This reduction algorithm, which we shall call CUT, is shown in Figure 5.6.

Let us examine the reduction power of Procedure CUT. After executing the Cutting Tool on C(2), only one column, C(1), might remain unchanged; all others, including C(2), will have at least half of their entries equal to +∞. In general, after the execution of the Cutting Tool on C(l = 2^i), only the l − 1 columns C(1), C(2), ..., C(l − 1) might remain unchanged; all others, including C(l), will have at least n − ⌈K/l⌉ of their entries equal to +∞ (Exercise 5.6.20). This can be used to show (Exercise 5.6.21) that:

Lemma 5.2.6  After the execution of Protocol CUT, the number of items left under consideration is at most min{n, Δ} log Δ.

Each of the log Δ executions of the Cutting Tool performed by Protocol CUT requires a selection in a set of size at most min{n, Δ}. This can be performed using any of the protocols for selection in a small set, for example, Protocol Rank; in the worst case, it will require O(n^2) messages per iteration. This means that, in the worst case,

M[CUT] = O(n^2 log Δ),   (5.16)
T[CUT] = O(n log Δ).   (5.17)

Putting It All Together   We have examined a set of reduction tools. Summarizing: Protocol REDUCE, composed of the application of Reduction Tools 1 and 2, reduces the search space from N to at most Δ^2; Protocol CUT, composed of a sequence of applications of the Cutting Tool, reduces the search space from Δ^2 to at most min{n, Δ} log Δ. Starting from these reductions, to form a full selection protocol we will first reduce the search space from min{n, Δ} log Δ to O(n) (e.g., using Protocol Filter) and then use a protocol for small sets (e.g., Rank) to determine the sought item. In other words, the resulting algorithm, Protocol ReduceSelect, is as shown in Figure 5.7, where Δ′ is the new value of Δ after the execution of REDUCE.

ReduceSelect
begin
   REDUCE;
   if search space greater than O(Δ′) then CUT;
   if search space greater than O(n) then Filter;
   Rank;
end

FIGURE 5.7: Protocol ReduceSelect.

Let us examine the cost of Protocol ReduceSelect. Protocol REDUCE, as we have seen, requires at most 3 iterations of Local Contraction, each using 2(n − 1) messages and 2r(s) time, and one execution of Sites Reduction, which consists of an execution of Rank. Protocol CUT is used with N ≤ min{n, Δ} Δ and thus requires at most log Δ iterations of the Cutting Tool, each consisting of an execution of Rank. Protocol Filter is used with N ≤ min{n, Δ} log Δ and thus requires at most log log Δ iterations, each costing 2(n − 1) messages and 2r(s) time plus an execution of Rank. Thus, in total, we have

M[ReduceSelect] = (log Δ + 4.5 log log Δ + 2) M[Rank] + (6 + 4.5 log log Δ)(n − 1),   (5.18)
T[ReduceSelect] = (log Δ + 4.5 log log Δ + 2) T[Rank] + (6 + 4.5 log log Δ) 2r(s).   (5.19)
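Putting the sketches together, the skeleton of ReduceSelect (Figure 5.7) looks like this in our simulation (a rough composition of the reduce_contraction and cut sketches above; the Filter stage is folded into the final direct selection, which stands in for Protocol Rank):

```python
def reduce_select(local_sets, K):
    """REDUCE, then CUT, then a direct small-set selection (as 'Rank')."""
    sets, K = reduce_contraction(local_sets, K)   # search space <= Delta^2
    rows = cut([sorted(s) for s in sets], K)      # <= min(n, Delta) log Delta
    INF = float("inf")
    # Only items larger than d* were removed, so d* is still the Kth smallest.
    survivors = sorted(x for r in rows for x in r if x != INF)
    return survivors[K - 1]
```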
5.3 SORTING A DISTRIBUTED SET

5.3.1 Distributed Sorting

Sorting is perhaps the most well known and investigated algorithmic problem. In distributed computing systems, the setting in which this problem takes place, as well as its nature, is very different from both the serial and the parallel ones. In particular, in our setting, sorting must take place in networks of computing entities where no central controller is present and no common clock is available. Not surprisingly, most of the best serial and parallel sorting algorithms do very poorly when applied to a distributed environment.

[...] and all the largest data items are stored at x_1. We are now ready to define the problem of sorting a distributed set.

Sorting Problem   Given a distribution D_1, ..., D_n of D and a permutation π, the distributed sorting problem is the one of moving data items among the entities so that, upon termination, (1) D_1, ..., D_n is a distribution of D, where D_i is the final set of [...]

[FIGURE 5.8: Distribution sorted according to (a) π = 3124 and (b) π = 2431.]

[...] In this section [...] In this case, each entity needs to know both n and N. The results of the execution of OddEven-LineSort with an invariant-sized storage requirement on the sorted line of Figure 5.9 are shown in Table 5.2. The correctness of the protocol, although intuitive, is not immediate (Exercises 5.6.23, 5.6.24, 5.6.25, and 5.6.26). In particular, the so-called "0−1 principle" (employed to prove the correctness of the similar parallel algorithm) can [...] distributions (Exercise 5.6.27).

Property 5.3.1  OddEven-LineSort sorts an equidistributed distribution in n − 1 iterations if the required sorting is (a) invariant-sized, or (b) equidistributed, or (c) compacted.

TABLE 5.2: Execution of OddEven-LineSort on the system of Figure 5.9
(each entry is the set held at the beginning of the iteration; an arrow marks the direction in which it is transmitted)

Iteration   x1               x2               x3              x4               x5
1           {1,9,13,18} →    ← {3,6,8,20}     {2,7,12} →      ← {10,15,16}     {5,11,14}
2           {1,3,6,8}        {9,13,18,20} →   ← {2,7,10}      {12,15,16} →     ← {5,11,14}
3           {1,3,6,8} →      ← {2,7,9,10}     {13,18,20} →    ← {5,11,12}      {14,15,16}
4           {1,2,3,6}        {7,8,9,10} →     ← {5,11,12}     {13,18,20} →     ← {14,15,16}
5           {1,2,3,6} →      ← {5,7,8,9}      {10,11,12} →    ← {13,14,15}     {16,18,20}
6           {1,2,3,5}        {6,7,8,9} →      ← {10,11,12}    {13,14,15} →     ← {16,18,20}
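The merge-split step underlying Table 5.2 can be simulated directly (an illustrative sketch; the pairing convention for odd and even iterations is an assumption consistent with the table):

```python
def merge_split(left, right):
    """Invariant-sized merge-split: the left entity keeps the smallest
    |left| items of the union, the right entity keeps the rest."""
    merged = sorted(left + right)
    return merged[:len(left)], merged[len(left):]

def odd_even_line_sort(sets):
    """Centralized simulation of OddEven-LineSort on a line x_1, ..., x_n."""
    n = len(sets)
    for it in range(n):
        first = 0 if it % 2 == 0 else 1          # alternate the pairing
        for i in range(first, n - 1, 2):
            sets[i], sets[i + 1] = merge_split(sets[i], sets[i + 1])
    return sets

start = [[1, 9, 13, 18], [3, 6, 8, 20], [2, 7, 12], [10, 15, 16], [5, 11, 14]]
print(odd_even_line_sort(start))
# [[1, 2, 3, 5], [6, 7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 18, 20]]
```

Running this on the initial distribution of Figure 5.9 reproduces the final row of Table 5.2.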
[...] Interestingly, the number of iterations can actually be much more than n − 1 if the initial distribution is not equidistributed [...]

[...] To determine the costs of this protocol we need to "unravel" the recursion.

[FIGURE 5.11: Diagram of the operations of OddEven-MergeSort with n = 8, on entities x_1, ..., x_8.]

When we do this, we realize that the protocol is a sequence of 1 + log n iterations (Exercise 5.6.32). In each iteration (except the last), every entity is paired with another entity, and each pair performs a simple merge of [...] sorts, regardless of the storage requirement, if the initial set is equidistributed (Exercise 5.6.33). [...] By contrast, protocol OddEven-MergeSort always has a worst-case cost of O(N log n), and it might not even sort. The determination of the cost of protocol SelectSort in specific topologies, for different storage requirements, is the subject of Exercises 5.6.41–5.6.48.

5.3.6 Unrestricted Sorting

In the previous section we have examined the problem of sorting a distributed set according to a given permutation. This describes [...]

[...] mainly with distributed data, such as distributed database systems, distributed file systems, distributed object systems, and so forth, the queries are much more complex and are typically expressed in terms of primitive operations. In particular, in relational databases, a query will be an expression of join, project, and select operations. These operations are actually operations on sets and can be re-expressed [...] (Exercise 5.6.55). This gives rise to our strategy for evaluating an arbitrary query:

Strategy Global:
1. x decomposes Q into Q(1), Q(2), ..., Q(k) satisfying Property 5.4.6, and informs each y_j of Q(j);
2. y_j locally and directly evaluates Q(j) and sends the result to x; and
3. x computes the union of all the received items.

To understand the advantages of this strategy, let us examine again the implications of [...]

[...] Frederickson in special networks (Exercises 5.6.1–5.6.3) [4], and by Shmuel Zaks [23]. Still for small sets, the cost using bounded messages, and thus the bit complexity, has been studied by Mike Loui [8] in ring networks; by Ornan Gerstel, Yishay Mansour, and Shmuel Zaks in a star [5]; and in trees by Ornan Gerstel and Shmuel Zaks [6], and by Alberto Negro, Nicola Santoro, and Jorge Urrutia [12]. Selection among [...]
