Extended process scheduler for improving user experience in multi core mobile systems

Extended Process Scheduler for Improving User Experience in Multi-core Mobile Systems Giang Son Tran Thi Phuong Nghiem Tuong Vinh Ho ICTLab, University of Science ∗ and Technology of Hanoi, VAST tran-giang.son@usth.edu.vn ICTLab, University of Science and Technology of Hanoi, VAST* nghiem-thi.phuong@usth.edu.vn Institute Francophone International, Vietnam National University IRD, UMI 209 UMMISCO ho.tuong.vinh@ifi.edu.vn Chi Mai Luong Institute of Information Technology, VAST* lcmai@ioit.ac.vn ABSTRACT Nowadays, advances in computing infrastructure and technology have made mobile phones become a crucial part of our daily life Almost everyone has their own mobile phones to be used for their daily life activities such as organizing events with calendar, browsing web, sending and receiving emails, entertaining, etc In order to meet this enormous use of mobile market, manufacturers are in effort of producing mobile devices with as high capabilities as possible For example, it is not uncommon nowadays to have mobile phones with cores and low power consumption of 0.3W in a System-on-Chip model [1] This effort of manufacturers can be considered as a marketing strategy to improve user satisfaction when using mobile phones In their work, Yong et al., 2006 [2] show that user satisfaction on mobile devices not only depends on technological capabilities of the phones, but also on responsiveness of mobile user interface to user interactions Unfortunately, users often tend to make excessive use of their mobile devices for performing many tasks at the same time For example, one may simultaneously check email, send message to his friends, download data from the internet, listen to music, and read news These concurrent actions commonly result in high background load or unresponsiveness of user interface to user interactions, and consequently reduce user experience on mobile devices Responsiveness is one of many non-functional requirements that affect success of any mobile applications [3] One direction to overcome the problem of unresponsiveness of user interface to user interactions on mobile devices is improving CPU allocation so that CPU can process mobile tasks required by users more efficiently Following this direction, studies focus on a mechanism of operating system kernel called process scheduler [4] In detail, process scheduler is a component of operating system kernel which shares the CPU resources among running tasks according to their types (classes), priorities and CPU usages The main job of process scheduler is to decide which tasks to execute and how long each task will be executed The output decision of process scheduler is one of the crucial criteria which affects CPU computational power as well as overall performance of Mobile phone is being well integrated into people’s daily life Due to a large amount of time spending with them, users expect to have a good experience for their daily tasks The mobile operating system’s scheduler is in charge of distributing CPU computational power among these tasks However, it currently has not yet taken into account dynamic frequencies of CPU cores at runtime This unawareness of the scheduler with CPU frequency increases unresponsiveness of user interface to user interactions, and consequently reduces user experience on using mobile devices In this paper, we propose an extension of process scheduler which takes into account the dynamic CPU frequency when scheduling the tasks Our method increases smoothness of user interface to user interactions by lowering and stabilizing interface frame times Experimental results show that our proposed scheduler reduces amount of frame time peaks up to 40%, which helps greatly in improving user experience on mobile devices CCS Concepts •Software and its engineering → Scheduling; •Humancentered computing → Smartphones; Keywords Process Scheduler, CPU Frequency, User Experience, Mobile System, Operating System INTRODUCTION ∗Vietnam Academy of Science and Technology, 18, Hoang Quoc Viet, Cau Giay, Hanoi, Vietnam Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page Copyrights for components of this work owned by others than ACM must be honored Abstracting with credit is permitted To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee Request permissions from permissions@acm.org SoICT ’16, December 08-09, 2016, Ho Chi Minh City, Viet Nam c 2016 ACM ISBN 978-1-4503-4815-7/16/12 $15.00 DOI: http://dx.doi.org/10.1145/3011077.3011106 417 the mobile system [5] Another important criterion which affects user experience on mobile phones is their energy consumption [6] If mobile phones quickly deplete their battery power, users will easily get annoyed and consequently encounter a negative user experience Due to this important role of energy consumption, operating systems running on mobile devices need to minimize power consumption To reach this goal, one popular method is to dynamically adjust CPU frequency with the demand workload Following this approach, a CPU governor [4] in the operating system kernel is responsible for this task: it increases CPU frequency when the required workload is high to meet this demand and vice versa In this research context, we follow the direction of improving CPU allocation so as to enhance user experience on mobile phones By analyzing the process scheduler, we are aware that it currently does not take into account CPU frequency as a criterion to schedule running tasks As a consequence, this increases unresponsiveness of user interface to user interactions since the same amount of rendering work (determined by the scheduler) has to be done in a longer duration (as CPU frequency is controlled by the governor) Realizing this problem of the process scheduler, in this paper, we propose an extension for the Linux’s default scheduler (named Completely Fair Scheduler, or CFS) Our extended scheduler takes into account CPU frequency into scheduler decision when selecting appropriate running tasks We will show that our proposal helps in improving the smoothness of user interface to user interactions on mobile systems in comparison with the Linux’s default scheduler CFS The remainder of this paper is organized as follows Section briefly reviews related works about CPU allocation In Section 3, we present the concept of CFS scheduler and point out its current limitation Section is devoted to introducing the principle and algorithm of our proposed frequency-aware scheduler In Section 5, we describe our experiments and an analysis of our results The paper ends with Section 6, which includes a general conclusion and possible future works tems with heavy workloads rather than mobile systems with much less active threads per process Another noticeable work, namely GRACE-OS, was proposed by Yuan et al., 2003 [5] in order to reduce CPU energy consumption on mobile devices using soft real-time scheduling In detail, the method enhances CPU scheduler by performing scheduling and speed scaling at the same time Although applicable on mobile devices, this approach mainly targets multimedia applications, which require statistical performance guarantees (for example, 96% of deadlines is met [5]), and has not yet taken into account user interaction latency, that is, one important aspect for ensuring user experience on mobile applications Concerning the limitations of kernel scheduler without taken into account CPU frequency, operating system researchers have raised a research question of developing new Linux kernel for connecting Linux scheduler and governor [10] Some researchers discussed that it is possible to merge these two components into a single entity [11], proposing an optimization in using CPU power for scheduling tasks Valente et al., [12] proposes a discussion for future research works about improving responsiveness of user interface to user interactions This is of importance for mobile devices since responsiveness of user interface and power consumption saving are two major criteria for ensuring mobile user experience In this work, we propose an extension for Linux kernel scheduler (CFS) which takes into account the current frequency of CPU cores when making scheduling decision for mobile systems Our work focuses on requirement of low latency for user interaction Unlike the aforementioned works, we focus on improving user experience when interacting with mobile devices than saving energy consumption By lowering and stabilizing interface frame times, our work helps in increasing responsiveness of user interface to user interactions on mobile devices COMPLETELY FAIR SCHEDULER In this section, we present internal concepts and algorithms of CFS, the standard scheduler of the Linux kernel We then show a scenario in which CFS shows inefficiency in CPU allocation when its frequency is not taken into account RELATED WORK There exist various works in the literature to improve efficiency of CPU allocation Yang et al., 2001 [7] proposed a divide-and-conquer algorithm for improving runtime flexibility and reducing computational complexity The algorithm is divided into two scheduling phases: the design-time scheduling and runtime scheduling Besides, the algorithm proves that energy is an important criterion in scheduling embedded multiprocessor System-on-Chips Another work about energy-aware scheduler is done by Rizvandi et al., 2010 [8] In detail, the authors proposed a slack reclamation algorithm in the scheduler using a linear combination of the processor’s maximum and minimum frequencies The method helps in saving energy while still providing enough computational power for applications Similarly, Mostafa et al., 2016 [9] proposed an energy-saving scheduler for high performance computing systems The authors use a relocation of thread weights for each active process so as to decrease number of context switches Although these methods [8, 9] reduce energy consumption of the scheduler, they currently target desktop or server sys- 3.1 CFS Model CFS is the Linux kernel scheduler which uses time slice estimation for selecting running tasks [13] CFS was developed based on Earliest Eligible Virtual Deadline First (EEVDF) scheduler [14] To achieve high responsiveness for all tasks, CFS tries to divide a certain amount of time (called period, usually a small value with a minimum of 20ms) to all runnable tasks Time slice for task Ti is given by the following equation: Si = ωi × P, Ωr (1) where • Si is time slice length for the task Ti at the current decision time; • ωi is calculated weight for Ti ; 418 Unresponsive User Interface CPU Load Frame time 16.6ms a b c d e f g h i j k Interface Frame Time High CPU Load No CPU speed limit Frame time reduces Low CPU Load Reduce CPU speed Frame time increases CPU Load Workload reduces m Time Figure 1: Unresponsiveness of user interface (UI) to user interactions in CFS scheduler • Ωr is total weight of the whole run queue of the current CPU (each weight represents a given process’s priority); and user interaction in CFS scheduler The visualization of this scenario is illustrated in figure Grey bars represent CPU load needed and red solid line represents the corresponding user interface’s frame time at the same moment Frame time is the duration which differentiates one fully rendered user interface’s frame from another A horizontal dotted line indicates the 16.6ms limit for each frame time, equivalent to the ability of rendering 60 frames per second (fps) If frame time is below the limit, user eyes could not perceive real differences between two consecutive frames [15] As such, animations being shown on the user interface appears as smooth and fluid Indeed, Claypool et al., 2006 [16] showed that user’s perception performance improved sevenfold when increasing frame rate from to 60 fps Figure indicates two possibilities of having interface frame times higher than the optimal 16.6ms limit: overload and underload At overload time a and b, with high CPU load, the UI thread is not provided enough CPU power to maintain the drawing process below 16.6ms System load reduces at time c and d, leaving more CPU to the rendering thread This load reduction results in a lower interface frame time (in other words, increases interface frame rate) As system load continues to decrease (to an underload point), the governor decides that CPU frequency should be reduced to lower power consumption (between time d and e) Lower CPU frequency also leads to a reduction of CPU power provided to the UI rendering thread As a result, the UI thread struggles in maintaining a good frame rate for the user interface since an optimal user experience must have at least 60fps, or 16.6ms per frame Furthermore, the governor works with a larger interval than the scheduler Not until time k does the governor notice a high CPU load is present and bump CPU frequency up This results in a drop of interface frame time (from time k to time m), keeping it back to under the 16.6ms limit As a result, the user interface is unresponsive in a duration between time e to k It is caused by the unawareness of the scheduler with the lowered CPU frequency If the scheduler had been aware of this change, it would have re-prioritized the UI thread by increasing time slice length for it and reducing time slice length of other running background threads By reconsidering time slices of all threads, the scheduler can • P is the target period that the scheduler tries to execute all tasks When the number of tasks in the run queue is increased, P will be lengthened to reduce performance overheads caused by too many context switches in a short amount of time CFS uses an important term called vruntime (virtual runtime) to track performance and scheduling status of each active thread in its whole lifecycle Virtual runtime υi of task Ti is added after each calculated time slice: υi = υ i + ti × N0 , ωi (2) where ti is the execution time of task Ti in the last execution period, N0 is a constant (N0 = 1024) Nice is a parameter of each task to representing its priority These vruntime values and other scheduling informations of all tasks are stored in CFS using a self-balancing binary tree named “Red-Black tree” CFS tries to put the task Ti with lowest υi to the left-most node of the tree, so that it can be retrieved instantly in the next scheduling period By looking into the internal CFS scheduling algorithm, we can see that all calculations of time slice Si and υi in equations (1) and (2) not take into account target frequency fj of the target CPU core cj in a multi-core or multi-processor system When a running task at the runtime is migrated from one core cj to another ck with different frequency fj = fk , or when the governor reduces core frequency, CPU power may be greatly lost and consequently the system would produce a very bad user experience One example of this consequence is the case when the migrated task is responsible for rendering user interface (UI) and the system becomes very laggy when responding to user interaction on mobile devices 3.2 CFS Limitation In order to demonstrate the current limitation of CFS, we present a scenario where user interface is unresponsive to 419 On the other hand, the governor ’s sampling time is usually configured as a multiple of CFS scheduler’s time slice in the Linux kernel: τi = π × Si In other words, CFS is working with a smaller (and finer) granularity of time than the governor ’s counterpart Thus, we have: potentially provide more CPU power and ensure its fairness for frequency changes IMPROVEMENT TO CFS In this section, we propose a frequency-aware scheduler (hereinafter called FA-CFS) as an extension of CFS scheduler The main idea of our proposed scheduler is to optimize task weight ωi and time slice Si of target task Ti according to frequency changes We propose a scheduler to balance workload and difference in frequencies In detail, we model a workload with its parameters in a multi-core CPU with dynamic frequency managed by the governor and thread scheduling tasks managed by the scheduler This workload is executed in a multitasking, time-sharing and preemptive operating system Let W be a workload that performed in a single thread and can be considered as a number of CPU cycles required to perform a task A workload is measured as a multiplication of speed and time In the simplest case, if this workload is scheduled on a single core CPU with constant frequency f (approximately proportional to number of instructions per second), we have: W = f × T, ωi = fij × (π × Si + ζi ) Due to a large difference between the governor ’s sampling time and scheduler ’s time slice, when the running thread of the workload W is migrated from one core cj to another ck with frequencies fij ≥ fik , performance penalty δi (in terms of work) of a single sampling time slot τi for a lowered CPU speed can be estimated as: δi = ωi − ωi ≤ (fij − fik ) × π × Si + fij × ζi − fik × ζi (10) On the other hand, CFS has scheduling complexity of O(logN ) (N is number of active tasks) [17] N is often unchanged unless there is a new creation of thread or process Therefore, amount of work (frequency × time) for accounting and scheduling is generally a constant between sampling intervals In other words, fij × ζi = fik × ζi Inequality 10 can be simplified as: (3) where T is the total time (in seconds) of execution Generally, the scheduler spends a little CPU time (ζi ) for accounting and selecting the next scheduled thread after each sampling time τi [4] This time can be considered as performance overhead of the process scheduler Therefore, T in equation (3) becomes: δi ≤ (fij − fik ) × π × Si m n (τi + ζi ), (4) (τi + ζi ) (5) To reach this goal, in each single time slice Si , it is possible to counteract with the changes of frequency (i.e minimizing performance penalty) by providing more CPU computational power to this particular workload The extra CPU power can be allocated to this task on core ck by increasing time slice length to Si (the previously allocated time slice is Si on core cj ) When applying this counterbalance, performance penalty δi in inequality (11) becomes n (6) i=1 where fi is the CPU frequency at sampling time ti Since we have a multi-core processor, equation (6) becomes: δi ≤ fij × π × Si − fik × π × Si n fij × (τi + ζi ), (7) i=1 fij is frequency of CPU core cj at sampling time τi where Consider that our global workload W is split into n microworkloads ωi performed during n sampling time: W = n i=1 ωi Each micro-workload at sampling time τi is therefore calculated as: ωi = × (τi + ζi ) (13) p=1 As previously discussed, since the CPU frequency is managed by the governor (in order to minimize power consumption), it fluctuates at runtime based on the total workload of the whole system As a result, CPU frequency f is not a constant: fi × (τi + ζi ), ((fpj − fpk ) × π × Sp ) Minimize i=1 fij (12) m n W = ((fpj − fpk ) × π × Sp ) p=1 After defining total performance penalties because of frequency changes in equation (12), we can state the main objective of our improvement in FA-CFS as: where n is the total number of sampling times during the whole execution duration When taking ζi into account, our global workload in equation (3) becomes: W = m δp ≤ p=1 i=1 W =f× (11) During the workload duration, with m migrations or frequency changes, the total performance penalty (in terms of amount of work) of inequality (11) becomes: ∆= T = (9) In an ideal situation, this performance penalty can be surpressed (i.e we completely counterbalance this frequency difference), δi = 0, therefore fij × π × Si − fik × π × Si ≥ 420 (15) Thus, we can proportionally resize the time slice scale: Si ≥ (8) (14) fij × Si fik (16) Like in aforementioned equations (1) and (2) in section 3.1, time slice estimation of CFS is also proportional to various task weights and run queue weight: fj ωi ωi × P ≥ ik × ×P Ωr Ωr fi JavaScript with Chronium’s V8 JavaScript engine) In order to avoid preloaded images, we clear the browser cache before starting each experiment session We involved a total of users in our experiment With each user, we asked them to perform 16 browsing sessions (8 on each of the two Android devices, described later) On each device, users performed sessions with CFS scheduler and sessions with our FA-CFS scheduler With each scheduler, governors with different characteristics were used in order to manage rising and declining system load with frequency ramp up and ramp down The governors included in our experiments are interactive (default, fastest ramp up with intermediate frequencies, best latency), conservative (slow ramp up), ondemand (fast ramp up, fast ramp down, almost between minimum and maximum frequencies), and performance (keep highest frequency, waste energy) [18] Technical Choices: On the hardware side, our experiments are performed on two categories of Android devices: one LG Nexus and one Asus Nexus Wifi (2012), representing phone and tablet, respectively LG Nexus has a better hardware configuration (RAM is doubled and 30% better CPU core frequency) than the Nexus Wifi 2012 On the software side, we build from source an aftermarket open-source operating system called CyanogenMod, based on Android Open Source Project (AOSP) We use the latest version of CyanogenMod with their supported Linux kernel to implement our model We decided to build CyanogenMod from source because of the ability to customize Linux kernel and flash (or install) the kernel along with the whole operating system into our devices We use an Android’s developer option called “Profile GPU rendering” to monitor and gather interface frame times during the experiments We then use Android’s integrated “dumpsys” tool on the mobile devices to collect through an USB cable various statistic informations, including the monitored interface frame times (17) As a result, our scheduler can counteract with frequency changes by proportionally distribute these weights as follows: ωi ≥ fij × ωi fik (18) We implement our proposed frequency-aware scheduler in Linux environment where our model acts as a frequencyaware extension to the CFS scheduler We use the CPUFreq interface to call the governor for collecting CPU frequencies [18] Having extracted frequencies, we implement our proposed algorithm to balance time slice in Linux’s CFS We use the CPUFreq’s userspace sysfs interface in order to gather statistical information EVALUATION The goal of this section is to present the improvement of responsiveness of mobile user interface to user interaction provided by our FA-CFS scheduler in comparison with CFS scheduler We first present the setting of our experiments, and then provide an analysis of our experimental results 5.1 Experimental Setup Interface Frame Time Measurement: In order to evaluate our proposed FA-CFS scheduler, we use interface frame time as the main metric to measure the improvement of responsiveness of mobile user interface to user interactions We chose to measure interface frame time since it plays an important role in ensuring user experience A fully rendered frame is passed through a set of steps in Android rendering pipelines: execute the issued layout commands, process the swapping buffers, prepare the texture and finally draw the content to the screen Evaluation Scenario: In our experiment, we implement a popular scenario where users browse an online news website using smartphones and tablets, which are installed CFS and FA-CFS Since we want to compare the efficiency of our FA-CFS with CFS, we divide our scenario into two main steps where in the first step, users were asked to browse the online news website (http: //bbc.com in our experiments) with smartphones which was installed CFS; and in the second step, users were asked to browse the same online news website but with FA-CFS installed In both two steps, we recorded interface frame times created by user interactions during their browsing sessions A browsing session in our experiment includes: (1) User starts the stock browser, (2) he types the URL http://bbc com, (3) he waits for page load, and finally (4) he scrolls up and down as soon as one or more parts of the page content appears In this scenario, there are three different types of workload created by the UI thread, background network threads (to fetch data from remote server) and the browser engine (in charge of parsing HTML and processing 5.2 Experimental Results Interface Frame Time Peaks: Figure shows a set of captured frame times from one user session on the LG Nexus with CFS and interactive governor It can be seen from this figure that frame times during this session are not stabilized, but generally are smaller than the optimal 16.6ms In the first part of this session (frame - 100), frame times were relatively high because the web browser needs to perform tasks at the same time: fetching web content, parsing partial HTML contents as they arrive, and rendering them on the screen Rendering thread is not provided with enough computational power because the background threads are overloading the CPU, thus the UI thread struggles to maintain an optimal frame time Since frame 125, page fetching and HTML parsing tasks are finished, but there exist very high frame times, some exceeded 40ms These peaks (or spikes) cause “micro stuttering”, a term used to indicate irregular delays between frames being rendered [19] Micro stuttering decreases user experience, even though the average frame rate is high enough These high frame time peaks can be explained as a consequence of CPU core frequency changes and the UI thread suffers from these 421 50 Draw Prepare Process Execute 16.6ms (60fps) limit Choppy Frames with Frame Time Peaks High frame times, Unresponsive User Interface Time(ms) 40 30 20 10 0 50 100 150 Frame 200 250 Figure 2: Interface frame time peaks on the LG Nexus with CFS scheduler and Interactive governor Table 1: Average frame time percentile (ms) of CFS vs FA-CFS with governors on Nexus Wifi Interactive Ondemand Conservative Performance % CFS FACFS ± CFS FACFS ± CFS FACFS ± CFS FACFS ± 90 91 92 93 94 95 96 97 98 99 100 17.18 17.33 17.76 18.22 18.89 20.44 23.13 27.25 31.29 38.94 48.58 14.27 14.46 14.68 14.93 15.45 15.98 16.08 17.21 23.16 29.12 37.3 -16.9% -16.6% -17.3% -18.1% -18.2% -21.8% -30.5% -36.8% -26.0% -25.2% -23.2% 17.94 18.11 18.45 18.72 19.96 22.89 23.49 29.83 32.83 41.77 55.12 15.02 15.24 15.45 15.65 15.88 16.25 16.73 21.29 25.57 35.1 44.05 -16.3% -15.8% -16.3% -16.4% -20.4% -29.0% -28.8% -28.6% -22.1% -16.0% -20.1% 22.26 22.64 22.83 23.34 23.87 27.6 30.03 34.53 45.68 60.34 83.1 21.19 21.55 22.77 23.19 23.56 28.02 31.62 33.24 43.92 62.17 77.38 -4.8% -4.8% -0.3% -0.6% -1.3% 1.5% 5.3% -3.7% -3.9% 3.0% -6.9% 11.93 12.36 12.75 13.29 13.98 14.73 15.62 16.79 19.03 23.68 31.21 11.81 12.21 12.6 13.42 14.05 14.61 15.88 16.47 19.49 25.23 30.85 -1.0% -1.2% -1.2% 1.0% 0.5% -0.8% 1.7% -1.9% 2.4% 6.5% -1.2% Table 2: Average frame time percentile (ms) of CFS vs FA-CFS with governors on Nexus Interactive Ondemand Conservative Performance % CFS FACFS ± CFS FACFS ± CFS FACFS ± CFS FACFS ± 90 91 92 93 94 95 96 97 98 99 100 14.54 14.78 14.82 15.48 15.63 16.03 16.49 17.58 18.24 24.52 35.36 14.03 14.18 14.25 14.81 15.27 15.42 15.86 16.56 17.14 20.36 30.31 -3.5% -4.1% -3.8% -4.3% -2.3% -3.8% -3.8% -5.8% -6.0% -17.0% -14.3% 15.41 15.62 16.03 16.19 16.49 16.91 17.5 18.04 19.27 23.83 36.65 13.14 13.45 13.87 14.32 14.67 15.29 15.83 16.75 19.41 22.01 33.41 -14.7% -13.9% -13.5% -11.6% -11.0% -9.6% -9.5% -7.2% 0.7% -7.6% -8.8% 20.75 21.07 22.37 23.86 25.91 27.77 31.46 36.17 40.34 50.48 73.55 21.25 22.32 22.81 23.99 26.46 27.29 31.86 35.82 41.19 52.16 71.81 2.4% 5.9% 2.0% 0.5% 2.1% -1.7% 1.3% -1.0% 2.1% 3.3% -2.4% 11.58 11.86 12.15 12.51 12.99 13.34 14.02 15.16 16.42 19.32 24.98 12.32 12.53 12.85 13.28 13.75 14.06 14.5 16.42 17.31 19.38 25.21 6.4% 5.6% 5.8% 6.2% 5.9% 5.4% 3.4% 8.3% 5.4% 0.3% 0.9% 422 differences, similar to the scenario that we previously discussed in section 3, figure Frame Time Percentile: In order to analyze the effectiveness of our FA-CFS scheduler, we use the statistical metric frame time percentile The metric is described as follows: an xth frame percentile at y milliseconds shows that during the experiment, x% of all frame times are less than y milliseconds Frame time percentile represents the stability of frame time and thus, the “quality” of user experience in interactions In this part of evaluation, we focus on analyzing average frame time percentile of all user sessions Tables and show average frame time percentiles of all user sessions on both devices, the LG Nexus and the Asus Nexus Wifi 2012, with different governors It is expected that frame time percentiles of the Nexus are larger than the Nexus 4’s one, because the Nexus has lower hardware configuration yet higher screen resolution It is worth reminding that interactive is the default governor on most mobile phones With the two highly dynamic governors, interactive and ondemand, these tables show a general observation that FACFS achieves better frame time reduction with the Nexus than the Nexus The Nexus benefits greatly from our time slice optimization, with average 21.8% and 29% frame time decreased (with interactive and ondemand, respectively) for 95% amount of total rendered frames While showing less improvement regarding frame time percentiles, FA-CFS still achieves 3.8% and 9.6% enhancement These differences between the Nexus and Nexus can be interpreted as difference in hardware configuration (30% faster CPU and 4% less screen pixels on Nexus than Nexus 7) Not only does our frequency-aware FA-CFS scheduler reduces average frame times but also it provides better frame time stabilization than traditional CFS: 97th , 98th and 99th frame time percentiles provide big improvements on both devices Especially, with better 99th percentile (25.2% and 16% reduction for interactive and ondemand on Nexus 7), user has smoother and more responsive interface as well as experiences less micro stuttering frames during their interactions Furthermore, it can be seen from table that FA-CFS achieves considerably lower average frame time than CFS with interactive The lowest gain (from the lowest level 90th to 95th ) is 16.9% The difference starts increasing at 96th percentile (30%), reaches its peak at 97th (36.8%) and still keeps a wide margin until 100th (maximum frame time) Additionally, we can see an improvement in terms of frame time stabilization of FA-CFS with its ability to keep 97% number of frames under 16.6ms limit (instead of under 90% with the mainline CFS) on Asus Nexus Wifi On the other hand, the right halves of these tables exhibit less improvements for both devices with conservative and performance governors These are less dynamic governors than the previously discussed interactive and ondemand counterparts We observed that with the completely static governor performance, our FA-CFS barely achieved improvements throughout all user sessions This can be explained that performance always provides maximum CPU computational power to all possible threads without fref Number of Frames 1000 100 10 0 10 15 20 25 Frame Time (ms) 30 35 40 Figure 3: Frame time distribution of CFS on Motorola Moto X 2nd edition Number of Frames 1000 100 10 0 10 15 20 25 Frame Time (ms) 30 35 40 Figure 4: Frame time distribution of FA-CFS on Motorola Moto X 2nd edition as little improvement as 6.9% (on Nexus 7) and 2.4% (on Nexus 4) at 100th percentile General frame time did not earned much reduction because this governor tries to minimize CPU frequency as much as possible without many frequency changes The analyses of tables and above show that our FACFS enhances frame time stabilization, increases average frame rate and reduces frame time peaks (or spikes) with widely used governors (interactive and ondemand ) Due to this, our FA-CFS scheduler proves its efficiency in improving user experiences while interacting with mobile devices Frame Time Distribution: In order to further analyze the effectiveness of our FACFS scheduler, we use another statistical metric frame time distribution For this, we setup our experiment on an additional user session with a higher end mobile phone, Motorola Moto X (2nd edition) with a quad-core CPU, each running at 2.5GHz During user interactions, we gather 1189 frame times (in approximately 19 seconds of browsing BBC homepage) of CFS and FA-CFS with interactive governor, and represent them in histograms in figures and 4, respectively It can be seen from figure that even on a high end phone, CFS causes micro stuttering with frames longer than 16.6ms Some frames take even more than 32ms (doubles the 16.6ms limit) These peaks cause choppy during web content scrolling in the browser Applying our FA-CFS into Cyanogenmod greatly reduces these peaks (figure 4) Maximum frame time for FA-CFS is 24ms, when compared to 37ms on CFS In total of 1189 frames, FA-CFS produced only 14 frames longer than the 16.6ms limit In contrast, this number of the CFS counterpart is 24 From these re- j quency changes ( f ik = 1) Its conservative sibling achieves i 423 sults, our FA-CFS achieves a reduction of 40% frame time peaks (24 frames down to 14 frames) Additionally, frame times are better packed in the mean 8ms range These two figures clearly illustrate the benefit to improve user experience of our FA-CFS, even on high end mobile device [8] CONCLUSION AND PERSPECTIVES This paper proposed a new frequency-aware process scheduler for improving user experience on multi-core mobile systems We built a model which acts as an extension of the Linux default scheduler (Completely Fair Scheduler - CFS) for taking into account the dynamic CPU frequency when scheduling the tasks Our model helps in increasing responsiveness of mobile user interface to user interactions by lowering and stabilizing interface frame times The experiments showed that our proposed FA-CFS scheduler reduces the amount of frame time peaks up to 40%, which greatly brings benefits to multi-core mobile systems where user experience relies largely on responsiveness of user interface Several research directions can be taken into account to continue this work First and foremost, since our work helps in improving user experience on mobile systems, it worths investigating our model on various workloads to see if it can bring benefits on larger multi-core and multi-CPU platforms, i.e desktops and virtualized servers [20] Secondly, combining our frequency-aware scheduler with performanceoriented scheduler (e.g BFS scheduler) is also an interesting research direction Our FA-CFS scheduler can take into account BFS scheduler’s advancements to improve UI responsiveness and save CPU power Last but not least, we wonder if our frequency-aware improvement can be applied on the Red-Black tree by restructuring it based on core frequencies at runtime [9] [10] [11] [12] [13] [14] [15] REFERENCES [1] C Van Berkel Multi-core for mobile phones In Proceedings of the Conference on Design, Automation and Test in Europe, pages 1260–1265 European Design and Automation Association, 2009 [2] Y G Ji, J H Park, C Lee, and M H Yun A usability checklist for the usability evaluation of mobile phone user interface International Journal of Human-Computer Interaction, 20(3):207–231, 2006 [3] A I Wasserman Software engineering issues for mobile application development In Proceedings of the FSE/SDP workshop on Future of software engineering research, pages 397–400 ACM, 2010 [4] A Silberschatz, P Galvin, and G Gagne Applied operating system concepts John Wiley & Sons, Inc., 2001 [5] W Yuan and K Nahrstedt Energy-efficient soft real-time cpu scheduling for mobile multimedia systems In ACM SIGOPS Operating Systems Review, volume 37, pages 149–163 ACM, 2003 [6] D Ferreira, A K Dey, and V Kostakos Understanding human-smartphone concerns: a study of battery life In Pervasive computing, pages 19–33 Springer, 2011 [7] P Yang, C Wong, P Marchal, F Catthoor, D Desmet, D Verkest, and R Lauwereins [16] [17] [18] [19] [20] 424 Energy-aware runtime scheduling for embedded-multiprocessor socs IEEE Design & Test of Computers, (5):46–58, 2001 N B Rizvandi, J Taheri, A Y Zomaya, and Y C Lee Linear combinations of dvfs-enabled processor frequencies to modify the energy-aware scheduling algorithms In Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, pages 388–397 IEEE, 2010 S M Mostafa and S Kusakabe Towards reducing energy consumption using inter-process scheduling in preemptive multitasking os In 2016 International Conference on Platform Technology and Service (PlatCon), pages 1–6, Feb 2016 V Pallipadi and S B Siddha Processor power management features and process scheduler: Do we need to tie them together? LinuxConf Europe, pages 18, 2007 J H Schă onherr, J Richling, M Werner, and G Mă uhl Event-driven processor power management In Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking, pages 61–70 ACM, 2010 P Valente and M Andreolini Improving application responsiveness with the bfq disk i/o scheduler In Proceedings of the 5th Annual International Systems and Storage Conference, page ACM, 2012 C S Pabla Completely fair scheduler Linux Journal, 2009(184):4, 2009 I Stoica, H Abdel-Wahab, K Jeffay, S K Baruah, J E Gehrke, and C G Plaxton A proportional share resource allocation algorithm for real-time, time-shared systems In Real-Time Systems Symposium, 1996., 17th IEEE IEEE, 1996 C McAnlis, P Lubbers, B Jones, D Tebbs, A Manzur, S Bennett, F d’Erfurth, B Garcia, S Lin, I Popelyshev, et al Applying old-school video game techniques in modern web games In HTML5 Game Development Insights Springer, 2014 M Claypool, K Claypool, and F Damaa The effects of frame rate and resolution on users playing first person shooter games In Electronic Imaging 2006, pages 607101–607101 International Society for Optics and Photonics, 2006 P Pawar, S Dhotre, and S Patil Cfs for addressing cpu resources in multi-core processors with aa tree International Journal of Computer Science and Information Technologies, 2014 V Pallipadi and A Starikovskiy The ondemand governor In Proceedings of the Linux Symposium, volume 2, pages 215–230 sn, 2006 J.-M Arnau, J.-M Parcerisa, and P Xekalakis Parallel frame rendering: trading responsiveness for energy on a mobile gpu In Proceedings of the 22nd international conference on Parallel architectures and compilation techniques IEEE Press, 2013 G Von Laszewski, L Wang, A J Younge, and X He Power-aware scheduling of virtual machines in dvfs-enabled clusters In Cluster Computing and Workshops, 2009 CLUSTER’09 IEEE International Conference on, pages 1–10 IEEE, 2009 ... frequency-aware process scheduler for improving user experience on multi- core mobile systems We built a model which acts as an extension of the Linux default scheduler (Completely Fair Scheduler - CFS) for. .. importance for mobile devices since responsiveness of user interface and power consumption saving are two major criteria for ensuring mobile user experience In this work, we propose an extension for Linux... taking into account the dynamic CPU frequency when scheduling the tasks Our model helps in increasing responsiveness of mobile user interface to user interactions by lowering and stabilizing interface

Extended process scheduler for improving user experience in multi core mobile systems

Thông tin tài liệu

Từ khóa liên quan

Tài liệu cùng người dùng

Tài liệu liên quan