Dynamic scheduling techniques for adaptive applications on real time embedded systems

Dynamic Scheduling Techniques for Adaptive Applications on Real-Time Embedded Systems

Yu Heng (B.Eng., National University of Singapore, Singapore, 2006)

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2011

Acknowledgements

This thesis would not have had the opportunity to progress and present itself without the enduring guidance, cooperation, companionship, and encouragement of my supervisors, colleagues, and family. I wish I could express my gratitude to all of them.

First of all, I would like to sincerely thank my supervisors, Prof. Ha Yajun and Prof. Bharadwaj Veeravalli, for all their devoted support during my doctoral studies. I am grateful that they opened the door to scientific exploration for me, that they provided timely and valuable advice whenever there were obstacles ahead, and that they enlightened me with their insights into life the way a role model does. I will not forget the time they arrived before sunrise to help me revise a paper before its submission. I could not be luckier to have both of them as my supervisors.

I would like to acknowledge the help of Dr. Zhu Guolei and Dr. Akash Kumar in discussing key concepts in the NoC-related work. I could not be more grateful to Dr. Wei Ying for introducing me to the LaTeX world and for the encouragement during the hard times. I appreciate the support of the smiling ladies in the Electronic Design Labs with my GA duties, as well as the mutual assistance of Zhang Wenjuan, Chen Xiaolei, and Ganesh Iyer. I am lucky to have spent my best time in the VLSI Laboratory with all my fellow mates, for the fun and the memories.

I have no way to express my love for my parents. They are where warmth and encouragement originate. To them, this thesis is dedicated.
Abstract

The ability to trade off Quality-of-Service (QoS) against resources on modern embedded platforms makes adaptive applications an interesting value proposition. Applying dynamic scheduling to such applications brings further flexibility for meeting the overall system’s performance goals. However, state-of-the-art dynamic scheduling strategies are, in general, either incapable of QoS optimization or ignore the growing platform-introduced impacts that may substantially deteriorate scheduling performance.

This thesis focuses on the design of dynamic scheduling algorithms for adaptive applications, with the goal of maximizing QoS through runtime slack reclamation and redistribution. For QoS modeling, both the Imprecise-Computation (IC) model [1] and a proposed generic model are validated and studied. The algorithms are built upon increasingly complicated assumptions, namely scheduling (1) IC-modeled tasks on uni-processor systems, (2) dependent IC-modeled tasks on homogeneous multiprocessors, and (3) a generic QoS model on heterogeneous multiprocessors, considering the leakage energy and the QoS deterioration due to inter-processor communications.

First, a dynamic algorithm for scheduling IC tasks mapped on a single processor is presented. We prove that QoS maximization can be achieved by employing intra-task Dynamic Voltage Scaling (DVS). The derived theorem leads to a convenient selection of the slack receiver by comparing the QoS gradients of the IC-modeled receivers. A Gradient Curve Shifting (GCS) approach is proposed to make the theorem applicable to both linear and concave QoS models.

Second, we extend to scheduling IC tasks on homogeneous multiprocessors.
Although it is possible to apply the uni-processor algorithm and dedicate the whole slack to only one receiver, we consider all parallel receivers in multiprocessors and derive the optimal slack distribution strategy, which outperforms the uniprocessor-based algorithm. Beyond that, a heuristic slack receiver selection strategy is presented to select the receiver set that potentially produces the maximal QoS.

Third, we extend the idealized IC model by proposing a more practical generic QoS model, and present a dynamic scheduling algorithm targeting heterogeneous multiprocessors, where each processor has its individual frequency and energy characteristics. We propose a Guided-Search algorithm that efficiently determines the receiver execution speed in order to achieve QoS maximization for the generic model. A receiver selection methodology is also newly designed for the generic model. Moreover, we report an enhancement of the scheduling performance that accounts for slack losses due to inter-processor communications.

Finally, to make our work self-contained, we develop a static scheduling algorithm targeting inter-processor communications on Network-on-Chip (NoC) architectures. While our dynamic approaches can adopt any static scheduling results, the proposed method is a unified approach that optimally achieves the computation element mapping, the communication path decision, and the execution time scheduling.

We support our proposed algorithms by evaluating the performance of scheduling numerous synthesized task sets and realistic adaptive applications. The evaluation software, which employs cycle-accurate architecture and NoC simulators, is also introduced in detail.

Contents

Acknowledgements
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Thesis Contributions
1.3 List of Publications
1.4 Organization of the Thesis
2 Related Work
2.1 Adaptive Applications
2.2 Application Scheduling Techniques
2.2.1 Real-Time Scheduling
2.2.2 Energy-Aware Scheduling
2.2.3 Scheduling for Adaptive Applications
2.3 NoC-Aware Scheduling and Mapping
3 System Modeling and Problem Formulation
3.1 Architectural and Energy Model
3.2 Application Model
3.3 Problem Definition
4 Scheduling Imprecise Computation Tasks on a Single Processor
4.1 Static Scheduling Strategy
4.2 Dynamic Slack Reclamation without DVS
4.2.1 Slack allocation for linear QoS functions
4.2.2 Slack allocation for concave QoS functions
4.3 Dynamic Slack Reclamation under DVS
4.3.1 Deciding maximal optional cycles
4.3.2 Allotting optional cycles
4.4 Results and Discussion
5 Scheduling Imprecise Computation Tasks on Multiprocessors
5.1 Motivational Example
5.2 Slack Distribution Optimality Analysis
5.3 Slack Receiver Selection
5.3.1 Task grouping
5.3.2 Receiver selections in FCS and PCS
5.3.3 Online distribution
5.4 Results and Discussion
6 Scheduling Generic Models on Multiprocessors with Realistic Considerations
6.1 Motivational Example
6.2 Slack Distribution with Frequency Scaling
6.2.1 Optimization
6.2.2 Guided-Search heuristic
6.3 Slack Receiver Selection
6.3.1 Graph decomposition
6.3.2 Receiver selection from FCS
6.3.3 Receiver selection from PCS
6.3.4 Runtime receiver selection
6.3.5 Implication to static scheduling
6.4 Slack Distribution Considering Inter-Processor Communication
6.5 Results and Discussion
6.5.1 Setups
6.5.2 Synthesized task simulation
6.5.3 The JPEG2000 decoder
6.5.4 Considering communication variation
7 Supplement: A Communication-Aware Static Scheduling Approach
7.1 Preliminaries
7.2 Algorithm Description
7.3 Results and Discussion
8 Conclusions and Future Work

CHAPTER 7. Supplement: A Communication-Aware Static Scheduling Approach

in the above combined execution using the three algorithms respectively.
The makespan of an application is defined as $\text{makespan} = \frac{\text{total execution time}}{\text{number of executions}}$. The results are obtained from sets of mesh configurations, namely × 3, × 3, × 4, and × 4, and raise several points worth mentioning. First of all, we notice that all the plots show a trend of decreasing makespan with an increasing number of processors. This is understandable, since contention is less severe when resources are plentiful. We can especially observe that on a mesh structure, both our algorithm and CBM obtain results quite close to their critical paths. This means that transmission time accounts for most of the difference between the makespan and the critical path, and that delay due to task contention is almost entirely avoided by these two algorithms.

On the other hand, BCM results in almost flat curves for different numbers of CPUs in the case of the Encrypter application (and similarly for the other two). Because BCM has no specific rule for independent tasks, tasks are assigned on a per-application basis. The most communication-heavy tasks of the three applications are placed at the center of the mesh. This introduces a large amount of processor contention among the three tasks. Hence, under round-robin execution, a task of the lightweight Encrypter has to wait for the huge JPEG task to finish on the same processor, resulting in an extremely long makespan for Encrypter. ECG shows a similar result under BCM.

We also note that our algorithm yields a shorter makespan than the other two as the number of processors decreases, especially in the case of the × mesh. This reflects the fact that our algorithm does not always bind a task to a specific processor, but maps each task instance according to the instantaneous state of the overall system execution. This results in better performance, since both BCM and CBM implement fixed assignment, where contention among tasks from different applications deteriorates the performance at smaller mesh sizes. As the mesh size shrinks further (2 × 3), our algorithm still delivers better performance, but the gain is limited by the scarce resources. The makespans of the lighter-weight Encrypter and ECG tend to converge to that of the heavyweight JPEG application.

We also measure the average transmission time spent per execution of each application on a × mesh. The transmission time reflects the combined effect of transmission distance and runtime network queueing resulting from the three algorithms. The results are shown in Fig. 7.3, and our algorithm achieves at least 38.3% less transmission time (in the case of JPEG) on a resource-scarce × mesh structure. Note that the JPEG application shows similar transmission delays under the three algorithms, mainly due to its long execution time, so it is less affected by delays from other applications (e.g. the “tiny” Encrypter). Meanwhile, an Encrypter task can be left waiting for an extremely long JPEG task to finish in a round-robin scheme, hence the extremely long transmission time for the Encrypter shown in Fig. 7.3.

Fig. 7.3: Simulation results of average transmission time on a × mesh using algorithms on applications.

CHAPTER 8. Conclusions and Future Work

In this thesis, we have systematically presented a dynamic scheduling framework for adaptive applications on embedded systems, addressing the contemporary scheduling challenges of workload flexibility, multiprocessing, leakage power, and platform-induced overheads. Moreover, an NoC-based static scheduling approach is presented to make our work complete. We describe our methodology in a logical manner, progressing from simple assumptions to more realistic factors in the process of problem definition, formulation, and solution.
To be more specific, we first present a single-processor imprecise-computation algorithm, for which we theoretically prove the optimal way of distributing slack to achieve maximal dynamic quality. The single-processor work is then extended to the multiprocessor scenario, where the single-processor theorem becomes invalid because of slack time duplication. In view of this, we derive a theoretical formulation that optimally utilizes slack time and energy for QoS maximization, and find that the QoS slope is the sole factor determining the slack allocation amount. Having tackled imprecise-computation scheduling on multiple processors, we direct our research to a more challenging level that incorporates realistic factors into the problem formulation, such as a generalized adaptive application representation, leakage power, and platform-introduced overheads. Finally, we also devise a static scheduling approach that takes network-on-chip communication into account in the overall timing decision. By deciding the starting time and the processor mapping of the applications, the static algorithm may serve as the starting point of dynamic approaches, while our dynamic approaches are able to adopt any static scheduling results.

In Chapter 4, we propose a novel low-complexity single-processor dynamic scheduling algorithm, named Gradient Curve Shifting (GCS), for tasks modeled with imprecise computation. Starting from the single-processor scenario, we describe our framework, in which applications have real-time requirements and need to strike a trade-off between available energy and QoS demands. We have shown that our GCS algorithm is able to decide the best allocation of slack cycles and operating voltages to optional tasks, while its complexity remains low compared to other dynamic scheduling solutions.
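The gradient-based choice of slack receivers summarized above can be illustrated with a small sketch. It is not the GCS algorithm itself: it assumes linear QoS functions, a fixed processor speed (no DVS), and that every receiver can consume reclaimed slack without violating its deadline. The names `Receiver`, `distribute_slack`, and `qos_gain`, as well as the numbers in the example, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """A task that can absorb slack as extra optional cycles (illustrative)."""
    name: str
    gradient: float    # assumed QoS gain per optional cycle (linear QoS model)
    max_optional: int  # assumed upper bound on additional optional cycles


def distribute_slack(receivers, slack_cycles):
    """Give slack to the steepest QoS gradient first, up to each receiver's cap."""
    allocation = {}
    remaining = slack_cycles
    for r in sorted(receivers, key=lambda r: r.gradient, reverse=True):
        if remaining <= 0:
            break
        granted = min(r.max_optional, remaining)
        if granted > 0:
            allocation[r.name] = granted
            remaining -= granted
    return allocation


def qos_gain(receivers, allocation):
    """Total QoS improvement of an allocation under the linear model."""
    by_name = {r.name: r for r in receivers}
    return sum(by_name[name].gradient * cycles
               for name, cycles in allocation.items())


# Hypothetical receivers and slack budget, chosen only for the example.
receivers = [
    Receiver("t1", gradient=0.5, max_optional=100),
    Receiver("t2", gradient=2.0, max_optional=40),
    Receiver("t3", gradient=1.0, max_optional=80),
]
allocation = distribute_slack(receivers, slack_cycles=100)
total_gain = qos_gain(receivers, allocation)
```

With linear QoS functions and per-task caps, serving receivers in decreasing gradient order is the standard greedy solution; concave QoS functions and voltage selection, which the thesis handles through GCS, are outside the scope of this sketch.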
The multiprocessor extension, described in Chapter 5, targets dependent tasks rather than relying on the simplified assumption of independent tasks. Unlike most dynamic scheduling algorithms, which are based on rules of thumb, the algorithm optimally calculates the optional cycle increase and dedicates the slack to each task based on global inspection. Simulation results reveal that our approach outperforms contemporary methods with small execution overhead.

An immediate extension of the work reported here would be to include voltage transition delays in the model. One way to model them is to include an additive parameter that captures all possible voltage transition delays that occur during the scheduling process. Capturing such delays depends on the underlying platform, and hence one may adopt an empirical approach, measuring the voltage transition delays and reflecting them in the model. Also, since this is an overhead, one way to compensate is to increase the computation volume of the task node. A voltage transition can take several thousand cycles, hence the granularity of a task can be chosen to be significantly larger than the transition overhead.

Apart from the overheads caused by voltage transitions, which have been extensively studied in the literature, we tackle another type of overhead caused by platforms, which has rarely been studied. In Chapter 6, we propose a novel heuristic for multiprocessor dynamic scheduling of generalized adaptive applications, which incorporates the leakage power model and makes use of runtime slack to enhance the execution quality under timing and energy constraints beyond what is statically scheduled. Our methodology is composed of a heuristic guided-search algorithm that efficiently decides the maximized cycle increase on a given set of receiver candidates, as well as a dedicated receiver candidate selection method that boosts the performance of the guided-search algorithm.
Moreover, we improve the practicality of the algorithm by extending the framework to consider the quality degradation caused by inter-processor communications, and propose a local scaling approach that complements the guided-search algorithm. We use both synthesized and JPEG2000 applications to validate our work, and also test the performance of the local scaling approach under Gaussian-distributed transmission time variation, as well as under various NoC routing schemes. Results show that the guided-search algorithm, aided by slack receiver selection, achieves at least 25% cycle gain improvement, and that local scaling contributes as much as 16.9% more cycle gain compared with not applying local complementary methods.

Our current framework models application adaptiveness with cycle scalability. However, other adaptive application models, such as multi-version tasks and imprecise computation, can also be incorporated into our framework to extend its practicality. On the other hand, the current framework can be further improved in terms of the receiver selection process. More efficient heuristic approaches could be studied, relying on a more detailed study of graph analysis techniques.

For the static scheduling approach described in Chapter 7, we propose an NoC-targeted algorithm that determines transmission routing and scheduling in the process of task mapping and scheduling. We use three real-life applications to evaluate our algorithm, and the results appear significantly better than those of contemporary NoC-targeted mapping algorithms. However, our work achieves predictable performance gain only under the prerequisite that the communication exhibits regular access patterns (so that the data transmission time per instance does not vary widely). In the future, we plan to investigate scheduling algorithms that deal with irregular access patterns and cache influences.

Bibliography

[1] Jane W. S.
Liu et al., “Imprecise Computations,” Proc. of IEEE, vol. 82(1), pp. 83-94, 1994.
[2] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Trans. on Circuits Syst. Video Techn., vol. 17(9), pp. 1103-1120, Sept. 2007.
[3] T. Acharya and P. S. Tsai, JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures, Wiley, 2004.
[4] T. F. Abdelzaher, E. M. Atkins, and K. G. Shin, “QoS Negotiation in Real-Time Systems and Its Application to Automated Flight Control,” IEEE Trans. Computers, vol. 49(11), pp. 1170-1183, 2000.
[5] George Cybenko, “Dynamic Load Balancing for Distributed Memory Multiprocessors,” J. Parallel Distrib. Comput., vol. 7(2), pp. 279-301, 1989.
[6] M. H. Willebeek-LeMair and A. P. Reeves, “Strategies for dynamic load balancing on highly parallel computers,” IEEE Trans. Parallel Distrib. Syst., vol. 4(9), pp. 979-993, 1993.
[7] K. Kennedy et al., “Toward a Framework for Preparing and Executing Adaptive Grid Programs,” Proc. IPDPS’02, pp. 171-175, Aug. 2002.
[8] X. Chen and A. M. K. Cheng, “An imprecise algorithm for real-time compressed image and video transmission,” Int’l Conf. Compt. Comm. and Networks (ICCCN’97), pp. 390-397, Sept. 1997.
[9] B. Smith and R. Oswald, “Meeting Real-Time Traffic Flow Forecasting Requirements with Imprecise Computations,” Computer-Aided Civil and Infrastructure Engineering, vol. 18(3), pp. 201-213, May 2003.
[10] M. Amirijoo, J. Hansson, and S. H. Son, “Specification and Management of QoS in Real-Time Databases Supporting Imprecise Computations,” IEEE Trans. Computers, vol. 55(3), pp. 304-319, 2006.
[11] E. K. P. Chong and W. Zhao, “Task Scheduling for Imprecise Computer Systems with User Controlled Optimization,” Proc. Int’l Conf. on Computers and Information, May 1989.
[12] K. B. Kenny and K.-J. Lin, “Structuring large real-time systems with performance polymorphism,” Proc. IEEE Real-Time Systems Symposium, pp.
238-246, Dec. 1990.
[13] H. Zou and F. Jahanian, “A Real-Time Primary-Backup Replication Service,” IEEE Trans. Parallel Distrib. Syst., vol. 10(6), pp. 533-548, 1999.
[14] G. Buttazzo, G. Lipari, M. Caccamo, and L. Abeni, “Elastic scheduling for flexible workload management,” IEEE Trans. Computers, vol. 51(3), pp. 289-302, Mar. 2002.
[15] M. Hamdaoui and P. Ramanathan, “A dynamic priority assignment technique for streams with (m,k)-firm deadlines,” IEEE Trans. Computers, vol. 44(12), pp. 1443-1451, Dec. 1995.
[16] P. Ramanathan, “Graceful Degradation in Real-Time Control Applications Using (m, k)-Firm Guarantee,” Proc. IEEE 27th Int’l Symp. Fault-Tolerant Computing (FTCS), pp. 132-141, June 1997.
[17] C. L. Liu and J. W. Layland, “Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment,” Journal of the ACM, vol. 20(1), pp. 46-61, Jan. 1973.
[18] J. Y. T. Leung and J. Whitehead, “On the complexity of fixed-priority scheduling of periodic, real-time tasks,” Performance Evaluation, vol. 2(4), pp. 237-250, Dec. 1982.
[19] A. K. Mok, Fundamental Design Problems of Distributed Systems for the Hard Real-Time Environment, Ph.D. Thesis, Massachusetts Institute of Technology, 1983.
[20] R. Rajkumar, Synchronization in Real-Time Systems: A Priority Inheritance Approach, Kluwer Academic, 1991.
[21] L. Sha, R. Rajkumar, and J. P. Lehoczky, “Priority Inheritance Protocols: An Approach to Real-Time Synchronisation,” IEEE Trans. Computers, vol. 39(9), pp. 1175-1185, 1990.
[22] T. P. Baker, “Stack-Based Scheduling of Real-Time Processes,” Real-Time Systems, vol. 3(1), pp. 67-100, Mar. 1991.
[23] F. Cottet et al., Scheduling in Real-Time Systems, Wiley, ISBN: 0-470-84766-2, 2002.
[24] M. G. Harbour, M. H. Klein, and J. P. Lehoczky, “Fixed Priority Scheduling of Periodic Tasks with Varying Execution Priority,” Proc. IEEE Real-Time Systems Symposium, pp. 116-128, Dec. 1991.
[25] S. K. Dhall and C. L. Liu, “On a Real-Time Scheduling Problem,” Oper.
Research, vol. 26(1), pp. 127-140, 1978.
[26] M. Garey and D. Johnson, “Two-Processor Scheduling with Start-Times and Deadlines,” SIAM J. Comput., vol. 6(3), pp. 416-426, 1977.
[27] M. L. Dertouzos and A. K. Mok, “Multiprocessor Online Scheduling of Hard Real-Time Tasks,” IEEE Trans. Software Engineering, vol. 15(12), pp. 1497-1505, 1989.
[28] T. L. Adam, K. M. Chandy, and J. R. Dickson, “A Comparison of List Schedules for Parallel Processing Systems,” Comm. ACM, vol. 17(12), pp. 685-690, Dec. 1974.
[29] T. Yang and A. Gerasoulis, “List Scheduling with and without Communication Delays,” Parallel Computing, vol. 19(12), pp. 1321-1344, Sept. 1993.
[30] C. V. Ramamoorthy, K. M. Chandy, and M. J. Gonzalez, “Optimal Scheduling Strategies in a Multiprocessor System,” IEEE Trans. Computers, vol. 21(2), pp. 137-146, Feb. 1972.
[31] J.-J. Hwang et al., “Scheduling precedence graphs in systems with interprocessor communication times,” SIAM J. Comput., vol. 18(2), pp. 244-257, Apr. 1989.
[32] Y. Kwok and I. Ahmad, “Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors,” IEEE Trans. Parallel Distrib. Syst., vol. 7(5), pp. 506-521, May 1996.
[33] H. Topcuoglu, S. Hariri, and M.-Y. Wu, “Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing,” IEEE Trans. Parallel Distrib. Syst., vol. 13(3), pp. 260-274, Mar. 2002.
[34] T. Yang and A. Gerasoulis, “DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors,” IEEE Trans. Parallel Distrib. Syst., vol. 5(9), pp. 951-967, Sept. 1994.
[35] M. Srivastava, A. Chandrakasan, and R. Brodersen, “Predictive System Shutdown and Other Architectural Techniques for Energy Efficient Programmable Computation,” IEEE Trans. VLSI Syst., vol. 4(1), pp. 42-55, Mar. 1996.
[36] C.-H. Hwang and A. Wu, “A Predictive System Shutdown Method for Energy Saving of Event-driven Computation,” IEEE Int’l. Conf. Computer-Aided Design (ICCAD), pp. 28-32, Nov. 1997.
[37] T. D. Burd, T. Pering, A. Stratakos, and R. Brodersen, “A Dynamic Voltage Scaled Microprocessor System,” IEEE J. Solid-State Circuits, vol. 35(11), pp. 1571-1580, 2000.
[38] T. D. Burd and R. W. Brodersen, “Energy Efficient CMOS Microprocessor Design,” Proc. Hawaii Int’l. Conf. Syst. Sci., pp. 288-297, Jan. 1995.
[39] R. Ernst and W. Ye, “Embedded Program Timing Analysis based on Path Clustering and Architecture Classification,” IEEE Int’l Conf. Computer-Aided Design (ICCAD), pp. 598-604, 1997.
[40] F. Yao, A. Demers, and S. Shenker, “A Scheduling Model for Reduced CPU Energy,” Proc. IEEE Symposium on Foundations of Computer Science, pp. 374-382, Oct. 1995.
[41] F. Gruian, “System-Level Design Methods for Low-energy Architectures Containing Variable Voltage Processors,” Proc. 1st Int’l Workshop on PACS, pp. 1-12, Nov. 2000.
[42] Y. Zhang, X. Hu, and D. Z. Chen, “Task Scheduling and voltage selection for energy minimization,” Proc. Design Automation Conference, pp. 183-188, June 2002.
[43] L. Goh, B. Veeravalli, and S. Viswanathan, “Design of Fast and Efficient Energy-aware Gradient-Based Scheduling Algorithms for Heterogeneous Embedded Multiprocessor Systems,” IEEE Trans. Parallel Distrib. Syst. (TPDS), vol. 20(1), pp. 1-12, Jan. 2009.
[44] R. Mishra, N. Rastogi, D. Zhu, D. Mossé, and R. Melhem, “Energy Aware Scheduling for Distributed Real-Time Systems,” Proc. Int’l Parallel and Distributed Processing Symposium (IPDPS’03), 2003.
[45] M. T. Schmitz and B. M. Al-Hashimi, “Considering Power Variations of DVS Processing Elements for Energy Minimisation in Distributed Systems,” Proc. Int’l Symp. Syst. Synthesis, pp. 250-255, 2001.
[46] D. Mossé, H. Aydin, B. Childers, and R. Melhem, “Compiler-Assisted Dynamic Power-Aware Scheduling for Real-Time Applications,” Workshop on Compiler and OS for Low Power, Philadelphia, Oct. 2000.
[47] D. Zhu, R. Melhem, and B.
Childers, “Scheduling with Dynamic Voltage/Speed Adjustment Using Slack Reclamation in Multi-Processor Real-Time Systems,” IEEE Trans. Parallel Distrib. Syst., vol. 14(7), pp. 686-700, 2003.
[48] J. Luo and N. K. Jha, “Power-Conscious Joint Scheduling of Periodic Task Graphs and Aperiodic Tasks in Distributed Real-time Embedded Systems,” IEEE Int’l Conf. Computer-Aided Design (ICCAD), pp. 357-364, Nov. 2000.
[49] D. Shin, J. Kim, and S. Lee, “Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications,” IEEE Design and Test of Computers, vol. 18(2), pp. 20-30, 2001.
[50] J. Seo, T. Kim, and N. D. Dutt, “Optimal Integration of Inter-Task and Intra-Task Dynamic Voltage Scaling Techniques for Hard Real-Time Applications,” Int’l Conf. Computer-Aided Design (ICCAD), pp. 450-455, 2005.
[51] D. Bergstrom, M. Hattendorf, J. Hicks, J. Jopling, J. Maiz, S. Pae, C. Prasad, and J. Wiedemer, “45nm Transistor Reliability,” Intel Technology J., vol. 12(2), June 2008.
[52] M. Pedram, “Leakage Power Modeling and Minimization,” Tutorial, ICCAD’04, 2004.
[53] S. M. Martin, K. Flautner, T. Mudge, and D. Blaauw, “Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Low Power Microprocessors under Dynamic Work Loads,” Int’l Conf. Computer-Aided Design (ICCAD), pp. 721-725, 2002.
[54] W. Zhang et al., “Exploiting VLIW Schedule Slacks for Dynamic and Leakage Energy Reduction,” IEEE/ACM Int’l Symp. Microarchitecture (MICRO’01), pp. 102-113, 2001.
[55] S. Irani, S. Shukla, and R. Gupta, “Algorithms for Power Savings,” Proc. ACM-SIAM Symp. Discrete Algorithms, pp. 37-46, 2003.
[56] J.-J. Chen and T.-W. Kuo, “Procrastination determination for periodic real-time tasks in leakage-aware dynamic voltage scaling systems,” ICCAD’07, pp. 289-294, 2007.
[57] J.-J. Chen, H.-R. Hsu, and T.-W. Kuo, “Leakage-Aware Energy-Efficient Scheduling of Real-Time Tasks in Multiprocessor Systems,” IEEE Real-time and Embedded Technology and Applications Symposium (RTAS), pp.
408-417, 2006.
[58] A. Andrei, P. Eles, and Z. Peng, “Energy Optimization of Multiprocessor Systems on Chip by Voltage Selection,” IEEE Trans. VLSI Syst., vol. 15(3), pp. 262-275, 2007.
[59] C. Xian, Y.-H. Lu, and Z. Li, “Dynamic Voltage Scaling for Multitasking Real-Time Systems With Uncertain Execution Time,” IEEE Trans. on CAD of Integrated Circuits and Systems (TCAD), vol. 27(8), pp. 1467-1478, 2008.
[60] W.-K. Shih, J. W. S. Liu, and J.-Y. Chung, “Fast Algorithms for Scheduling Imprecise Computations,” Proc. Real-Time Systems Symposium (RTSS), pp. 12-19, 1989.
[61] J. Y. Chung, J. W. S. Liu, and K. J. Lin, “Scheduling Periodic Jobs that Allow Imprecise Results,” IEEE Trans. Computers, vol. 39(9), pp. 1156-1173, Sept. 1990.
[62] W.-K. Shih, J. W. S. Liu, and J.-Y. Chung, “Algorithms for Scheduling Imprecise Computations with Timing Constraints,” SIAM Journal of Computing, 1991.
[63] J. Hu and R. Marculescu, “Energy-Aware Communication and Task Scheduling for Network-on-Chip Architectures under Real-Time Constraints,” Design, Automation and Testing in Europe (DATE), pp. 234-239, 2004.
[64] G. Varatkar and R. Marculescu, “Communication-Aware Task Scheduling and Voltage Selection for Total Systems Energy Minimization,” IEEE Int’l Conf. on Computer-Aided Design (ICCAD), pp. 510-517, 2003.
[65] P. Eles, A. Doboli, P. Pop, and Z. Peng, “Scheduling with bus access optimization for distributed embedded systems,” IEEE Trans. VLSI Syst., vol. 8(5), pp. 472-491, 2000.
[66] J. Y. Chung, J. W. S. Liu, and K. J. Lin, “Scheduling Periodic Jobs that Allow Imprecise Results,” IEEE Trans. Computers, vol. 39(9), pp. 1156-1174, 1990.
[67] L. A. Cortés, P. Eles, and Z. Peng, “Quasi-Static Assignment of Voltages and Optional Cycles in Imprecise-Computation Systems with Energy Considerations,” IEEE Trans. VLSI, vol. 14(10), pp. 1117-1129, 2006.
[68] H. Aydin, R. Melhem, D. Mossé, and P.
Mejia-Alvarez, “Optimal Reward-Based Scheduling for Periodic Real-Time Tasks,” IEEE Trans. Computers, vol. 50(2), pp. 111-130, Feb. 2001.
[69] C. Rusu, R. Melhem, and D. Mossé, “Maximizing Rewards for Real-Time Applications with Energy Constraints,” ACM Transactions on Embedded Computing Systems (TECS), vol. 2(4), pp. 537-559, Nov. 2003.
[70] R. M. Karp, R. E. Miller, and J. W. Thatcher, “Reducibility Among Combinatorial Problems,” The Journal of Symbolic Logic, vol. 40(4), pp. 618-619, 1975.
[71] A. Björklund, T. Husfeldt, and M. Koivisto, “Set partitioning via inclusion-exclusion,” SIAM J. on Computing, vol. 39(2), pp. 546-563, 2009.
[72] R. P. Dick, D. L. Rhodes, and W. Wolf, “TGFF: Task Graphs for Free,” CODES’98, pp. 97-101, 1998.
[73] http://sesc.sourceforge.net
[74] http://www.nirgam.ecs.soton.ac.uk
[75] L. M. Ni and P. K. McKinley, “A survey of wormhole routing techniques in direct networks,” Computer, vol. 26(2), pp. 62-76, Feb. 1993.
[76] M. B. Taylor et al., “The RAW microprocessor: a computational fabric for software circuits and general-purpose programs,” IEEE Micro, vol. 22(2), pp. 25-35, March 2002.
[77] S. Vangal et al., “An 80-Tile 1.28 TFLOPS Network-on-Chip in 65nm CMOS,” IEEE JSSC, vol. 43(1), pp. 29-41, Jan. 2008.
[78] W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004.
[79] G. Chen, F. Li, S. W. Son, and M. Kandemir, “Application mapping for chip multiprocessors,” Proc. Design Automation Conference, pp. 620-625, June 2008.
[80] S. Murali and G. De Micheli, “Bandwidth-constrained mapping of cores onto NoC architectures,” Proc. Design Automation and Test Europe (DATE), pp. 896-901, Feb. 2004.
[81] J. Hu and R. Marculescu, “Energy-aware communication and task scheduling for Network-on-Chip Architectures under Real-Time Constraints,” Proc. Design Automation and Test Europe (DATE), pp. 234-239, Feb. 2004.
[82] G. Varatkar and R.
Marculescu, “Communication-Aware Task Scheduling and Voltage Selection for Total Systems Energy Minimization,” IEEE Int’l Conf. on Computer-Aided Design (ICCAD), pp. 510-517, 2003.
[83] A. Jantsch and H. Tenhunen, Networks on Chip, Kluwer Academic Publishers, 2003.
[84] Z. Lu and A. Jantsch, “Slot Allocation for TDM Virtual-Circuit Configuration for Network-on-Chip,” Int’l Conf. on Computer-Aided Design (ICCAD), pp. 18-25, 2007.
[85] K. Goossens, J. Dielissen, and A. Radulescu, “Æthereal network on chip: Concepts, architectures and implementations,” IEEE Design Test Comput., vol. 22(5), pp. 414-421, 2005.
[86] M. Millberg, E. Nilsson, R. Thid, and A. Jantsch, “Guaranteed bandwidth using looped containers in temporally disjoint networks within the nostrum network-on-chip,” Design, Automation and Testing in Europe (DATE), pp. 890-895, 2004.
[87] J. Duato, S. Yalamanchili, and L. M. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, 2003.
[88] G.-M. Chiu, “The Odd-Even Turn Model for Adaptive Routing,” IEEE Trans. Parallel Distrib. Syst., vol. 11(7), pp. 729-738, 2000.
[89] J. Hu and R. Marculescu, “DyAD: smart routing for networks-on-chip,” Proc. Design Automation Conference, pp. 260-263, 2004.
[90] M. H. Cho et al., “Path-Based, Randomized, Oblivious, Minimal Routing,” Int’l Workshop on Network on Chip Arch., pp. 23-28, 2009.
[91] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, “QNoC: QoS architecture and design process for network-on-chip,” Jnl. Syst. Archit., vol. 50(2-3), pp. 105-128, 2004.
[92] D. Andreasson and S. Kumar, “Slack-time aware routing in NoC systems,” Int’l Symp. on Circuits and Syst. (ISCAS), pp. 2353-2356, 2005.
[93] E. Beigne et al., “An asynchronous NOC architecture providing low latency service and its multi-level design framework,” Int’l Symp. on Async. Circuits and Syst. (ASYNC), pp. 54-63, 2005.
[94] J. Kleinberg and É. Tardos, Algorithm Design, Addison Wesley, 2006.
[95] http://www.virtutech.com/
[96] N.
Agarwal, L.-S. Peh, and N. Jha, “GARNET: A Detailed Interconnection Network Model inside a Full-system Simulation Framework,” Technical Report CE-P08-001, 2008.

[...]

... the literature.

2.2.3 Scheduling for Adaptive Applications

Scheduling techniques for adaptive applications have an additional goal: QoS maximization. Together with the above-mentioned timing and energy constraints, the problem formulation for adaptive applications is complicated by that extra dimension. For QoS measured as a function of computation volume, deciding the execution time of a task is far...

... “Quality-Driven Dynamic Scheduling for Real-time Adaptive Applications on Multiprocessor Systems with Communication Awareness,” submitted to IEEE Trans. on Computers.
2. Heng Yu, Bharadwaj Veeravalli, and Yajun Ha, “Energy/QoS-Aware Dynamic Scheduling for Multiprocessor Real-Time Embedded Systems,” in preparation for journal submission.
3. Heng Yu, Bharadwaj Veeravalli, and Yajun Ha, “Leakage-aware Dynamic Scheduling for Real-time...

... abstraction levels, performance enhancements based on the static design and aim at coping with the execution environment dynamism. In this thesis, we focus on the OS-level runtime optimization techniques, specifically the design of real-time dynamic scheduling algorithms for adaptive applications. Dynamic scheduling algorithms...

Fig. 1.2: Aircraft pitch performance for controller task levels 2 and 4.

... section, previous work related to the topic of this thesis is reviewed, including overviews of existing adaptive application models and scheduling techniques that are aware of real-time, energy, application adaptiveness, and infrastructural requirements.

2.1 Adaptive Applications

Application adaptation ambiguously refers to two aspects: the execution adaptation and the quality adaptation. As a conventional...
... parallel execution

Chapter 1 Introduction

1.1 Motivation

Advancements in silicon processing, IC design, and electronic design automation (EDA) technologies continuously drive drastic performance improvements in embedded computing systems. The complexity of applications that an embedded platform can handle increases as well. Definitions of application execution performance have...

... applications is the GrADS project meant for Grid applications [7].

In contrast to spatial execution-adaptable applications, quality-adaptable applications feature graceful degradation mechanisms that focus on execution quality adjustment and customization, and can be applied in scenarios such as runtime quality improvement and real-time fault tolerance. One of the...

... utilization, energy consumption, and application response time, to the “soft” behaviors of application execution that emphasize the execution Quality-of-Service (QoS). For instance, the problem of “at which quality level the video could be rendered to the viewer” becomes a concern once the transmission reliability is ensured. In view of this, adaptive applications are gaining growing attention owing...

... universally adoptable for various adaptive applications, and use the energy model that includes both leakage and dynamic power consumption. Moreover, we consider the platform impacts on the scheduling algorithm efficiency, and propose a local scaling scheme to compensate for the overheads caused by interconnection fluctuations on Network-on-Chip (NoC) architectures.
• To make our work self-contained, we also...
... QoS-negotiation model is proposed as a methodology for building the QoS spectrum and its associated rewards/penalties [4].

2.2 Application Scheduling Techniques

In this section, scheduling strategies for real-time systems are reviewed. Although it is a traditional topic, scheduling algorithm design evolves with the technology advancements of real-time systems. The following subsections cover several scheduling...

... notion in distributed computing, execution-adaptable applications feature irregular and unpredictable computation and communication runtime loads imposed onto an execution platform. There exist many dynamic load balancing methodologies that exploit task reallocation to alleviate workload “hot-spots” for performance improvement, e.g., [5][6]. A well-known programming framework for those applications...

2.1 Adaptive Applications
2.2 Application Scheduling Techniques
2.2.1 Real-Time Scheduling
2.2.2 Energy-Aware Scheduling
2.2.3 Scheduling for Adaptive Applications
