Applied reconfigurable computing architectures, tools, and applications 2018

LNCS 10824

Nikolaos Voros · Michael Huebner · Georgios Keramidas · Diana Goehringer · Christos Antonopoulos · Pedro C. Diniz (Eds.)

Applied Reconfigurable Computing
Architectures, Tools, and Applications

14th International Symposium, ARC 2018
Santorini, Greece, May 2–4, 2018
Proceedings

Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zurich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7407
Editors
Nikolaos Voros, Technological Educational Institute of Western Greece, Antirrio, Greece
Michael Huebner, Ruhr-Universität Bochum, Bochum, Germany
Georgios Keramidas, Technological Educational Institute of Western Greece, Antirrio, Greece
Diana Goehringer, Technische Universität Dresden, Dresden, Germany
Christos Antonopoulos, Technological Educational Institute of Western Greece, Antirrio, Greece
Pedro C. Diniz, INESC-ID, Lisbon, Portugal

ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-78889-0 ISBN 978-3-319-78890-6 (eBook)
https://doi.org/10.1007/978-3-319-78890-6

Library of Congress Control Number: 2018937393
LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© Springer International Publishing AG, part of Springer Nature 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG part of Springer Nature.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Reconfigurable computing platforms offer increased performance gains and energy efficiency through coarse-grained and fine-grained parallelism coupled with their ability to implement custom functional, storage, and interconnect structures. As such, they have been gaining wide acceptance in recent years, spanning the spectrum from highly specialized custom controllers to general-purpose high-end programmable computing systems. The flexibility and configurability of these platforms, coupled with increasing technology integration, have enabled sophisticated platforms that facilitate both static and dynamic reconfiguration, rapid system prototyping, and early design verification. Configurability is emerging as a key technology for substantial product life-cycle savings in the presence of evolving product requirements, standards, and interface specifications.

The growth of the capacity of reconfigurable devices, such as FPGAs, has created a wealth of new research opportunities and intricate engineering challenges. Within the past decade, reconfigurable architectures have evolved from a uniform sea of programmable logic elements to fully reconfigurable systems-on-chip (SoCs) with integrated multipliers, memory elements, processors, and standard I/O interfaces. One of the foremost challenges facing reconfigurable application developers today is how to best exploit these novel and innovative resources to achieve the highest possible performance and energy efficiency; additional challenges include the design and implementation of next-generation architectures, along with
languages, compilers, synthesis technologies, and physical design tools to enable highly productive design methodologies.

The International Applied Reconfigurable Computing (ARC) symposium series provides a forum for dissemination and discussion of ongoing research efforts in this transformative research area. The series started in 2005 in Algarve, Portugal. The second edition of the symposium (ARC 2006) took place in Delft, The Netherlands, and was the first edition of the symposium to have selected papers published as a Springer LNCS (Lecture Notes in Computer Science) volume. Subsequent editions of the symposium have been held in Rio de Janeiro, Brazil (ARC 2007), London, UK (ARC 2008), Karlsruhe, Germany (ARC 2009), Bangkok, Thailand (ARC 2010), Belfast, UK (ARC 2011), Hong Kong, SAR China (ARC 2012), California, USA (ARC 2013), Algarve, Portugal (ARC 2014), Bochum, Germany (ARC 2015), Rio de Janeiro, Brazil (ARC 2016), and Delft, The Netherlands (ARC 2017).

This LNCS volume includes the papers selected for the 14th edition of the symposium (ARC 2018), held in Santorini, Greece, during May 2–4, 2018. The symposium attracted a large number of very good papers, describing interesting work on reconfigurable computing-related subjects. A total of 78 papers were submitted to the symposium from 28 countries. In particular, the authors of the submitted papers are from the following countries: Australia (3), Belgium (5), Bosnia and Herzegovina (4), Brazil (24), China (22), Colombia (1), France (3), Germany (40), Greece (44), India (10), Iran (4), Ireland (4), Italy (5), Japan (22), Malaysia (2), The Netherlands (5), New Zealand (1), Norway (2), Poland (3), Portugal (3), Russia (8), Singapore (7), South Korea (2), Spain (4), Sweden (3), Switzerland (1), UK (18), and USA (11).

Submitted papers were evaluated by at least three members of the Program Committee. The average number of reviews per submission was 3.7. After careful selection, 29 papers were
accepted as full papers (acceptance rate of 37.2%) and 22 as short papers. These accepted papers led to a very interesting symposium program, which we consider to constitute a representative overview of ongoing research efforts in reconfigurable computing, a rapidly evolving and maturing field. In addition, the symposium included a special session dedicated to funded research projects. The purpose of this session was to present the recent accomplishments, preliminary ideas, or work-in-progress scenarios of ongoing research projects. Nine EU- and national-funded projects were selected for presentation in this session.

Several people contributed to the success of the 2018 edition of the symposium. We would like to acknowledge the support of all the members of this year's symposium Steering and Program Committees in reviewing papers, in helping the paper selection, and in giving valuable suggestions. Special thanks also to the additional researchers who contributed to the reviewing process, to all the authors who submitted papers to the symposium, and to all the symposium attendees. In addition, special thanks to Dr. Christos Antonopoulos from the Technological Educational Institute of Western Greece for organizing the research project special session.

Last but not least, we are especially indebted to Anna Kramer from Springer for her support and work in publishing this book, and to Pedro C. Diniz from INESC-ID, Lisbon, Portugal, for his strong support regarding the publication of the proceedings as part of the LNCS series.

February 2018
Nikolaos Voros
Michael Huebner
Georgios Keramidas
Diana Goehringer

Organization

The 2018 Applied Reconfigurable Computing Symposium (ARC 2018) was organized by the Technological Educational Institute of Western Greece, by the Ruhr-Universität, Germany, and by the Technische Universität Dresden, Germany. The symposium took place at the Bellonio Conference Center in Fira, the capital of Santorini in Greece.

General Chairs
Nikolaos Voros, Technological Educational Institute of Western Greece
Michael Huebner, Ruhr-Universität, Bochum, Germany

Program Chairs
Georgios Keramidas, Technological Educational Institute of Western Greece
Diana Goehringer, TU Dresden, Germany

Publicity Chairs
Luigi Carro, UFRGS, Brazil
Chao Wang, USTC, China
Dimitrios Soudris, NTUA, Greece
Stephan Wong, TU Delft, The Netherlands

EU Projects Track Chair
Christos Antonopoulos, Technological Educational Institute of Western Greece

Proceedings Chair
Pedro C. Diniz, INESC-ID, Lisbon, Portugal

Web Chair
Christos Antonopoulos, Technological Educational Institute of Western Greece

Steering Committee
Hideharu Amano, Keio University, Japan
Jürgen Becker, Universität Karlsruhe (TH), Germany
Mladen Berekovic, Braunschweig University of Technology, Germany
Koen Bertels, Delft University of Technology, The Netherlands
João M. P. Cardoso, University of Porto, Portugal
Katherine (Compton) Morrow, University of Wisconsin-Madison, USA
George Constantinides, Imperial College of Science, UK
Pedro C. Diniz, INESC-ID, Portugal
Philip H. W. Leong, University of Sydney, Australia
Walid Najjar, University of California Riverside, USA
Roger Woods, The Queen's University of Belfast, UK

Program Committee
Hideharu Amano, Keio University, Japan
Zachary Baker, Los Alamos National Laboratory, USA
Jürgen Becker, Karlsruhe Institute of Technology, Germany
Mladen Berekovic, C3E, TU Braunschweig, Germany
Nikolaos Bellas, University of Thessaly, Greece
Neil Bergmann, University of Queensland, Australia
Alessandro Biondi, Scuola Superiore Sant'Anna, Italy
João Bispo, FEUP/Universidade Porto, Portugal
Michaela Blott, Xilinx, Ireland
Vanderlei Bonato, University of São Paulo, Brazil
Christos Bouganis, Imperial College, UK
João Cardoso, FEUP/Universidade Porto, Portugal
Luigi Carro, Instituto de Informática/UFRGS, Brazil
Ray Cheung, City University of Hong Kong, SAR China
Daniel Chillet, AIRN - IRISA/ENSSAT, France
Steven Derrien, Université de Rennes 1, France
Giorgos Dimitrakopoulos, Democritus University of Thrace, Greece
Pedro C. Diniz, INESC-ID, Portugal
António Ferrari, Universidade de Aveiro, Portugal
João Canas Ferreira, INESC TEC/University of Porto, Portugal
Ricardo Ferreira, Universidade Federal de Viçosa, Brazil
Apostolos Fournaris, Technological Educational Institute of Western Greece, Greece
Carlo Galuzzi, TU Delft, The Netherlands
Roberto Giorgi, University of Siena, Italy
Marek Gorgon, AGH University of Science and Technology, Poland
Frank Hannig, Friedrich-Alexander University Erlangen-Nürnberg, Germany
Jim Harkin, University of Ulster, UK
Christian Hochberger, TU Darmstadt, Germany
Christoforos Kachris, ICCS, Greece
Kimon Karras, Think Silicon S.A., Greece
Fernanda Kastensmidt, Universidade Federal Rio Grande Sul - UFRGS, Brazil
Chrysovalantis Kavousianos, University of Ioannina, Greece
Tomasz Kryjak, AGH University of Science and Technology, Poland
Krzysztof Kepa, GE Global Research, USA
Andreas Koch, TU Darmstadt, Germany
Stavros Koubias, University of Patras, Greece
Dimitrios Kritharidis, Intracom Telecom, Greece
Vianney Lapotre, Université de Bretagne-Sud - Lab-STICC, France
Eduardo Marques, University of São Paulo, Brazil
Konstantinos Masselos, University of Peloponnese, Greece
Cathal Mccabe, Xilinx, Ireland
Antonio Miele, Politecnico di Milano, Italy
Takefumi Miyoshi, e-trees.Japan, Inc., Japan
Walid Najjar, University of California Riverside, USA
Horácio Neto, INESC-ID/IST/U Lisboa, Portugal
Dimitris Nikolos, University of Patras, Greece
Roman Obermeisser, University of Siegen, Germany
Kyprianos Papadimitriou, Technical University of Crete, Greece
Monica Pereira, Universidade Federal Rio Grande Norte, Brazil
Thilo Pionteck, Otto-von-Guericke Universität Magdeburg, Germany
Marco Platzner, University of Paderborn, Germany
Mihalis Psarakis, University of Piraeus, Greece
Kyle Rupnow, Advanced Digital Sciences Center, USA
Marco Domenico Santambrogio, Politecnico di Milano, Italy
Kentaro Sano, Tohoku University, Japan
Yukinori Sato, Tokyo Institute of Technology, Japan
António Beck Filho, Universidade Federal Rio Grande Sul, Brazil
Yuichiro Shibata, Nagasaki University, Japan
Cristina Silvano, Politecnico di Milano, Italy
Dimitrios Soudris, NTUA, Greece
Theocharis Theocharides, University of Cyprus, Cyprus
George Theodoridis, University of Patras, Greece
David Thomas, Imperial College, UK
Chao Wang, USTC, China
Markus Weinhardt, Osnabrück University of Applied Sciences, Germany
Theerayod Wiangtong, KMITL, Thailand
Roger Woods, Queens University Belfast, UK
Yoshiki Yamaguchi, University of Tsukuba, Japan

Additional Reviewers
Dimitris Bakalis, University of Patras, Greece
Guilherme Bileki, University of São Paulo, Brazil
Ahmet Erdem, Politecnico di Milano, Italy
Panagiotis Georgiou, University of Ioannina, Greece
Adele Maleki, University of Siegen, Germany
Farnam Khalili Maybodi, University of Siena, Italy
André B. Perina, University of São Paulo, Brazil

A. Sadek et al.

Fig. 1. The generic development process.

… complex and diverse application domain. Due to the large data volumes, high performance is needed to analyze images in systems with real-time constraints. Furthermore, image processing systems are often deployed in scenarios where power, energy, weight, cost and physical size are first-order constraints. The result is an overwhelming challenge for developers.

The overall objective of the Towards Ubiquitous Low-power Image Processing Platforms (TULIPP) project is to reduce the magnitude of this challenge by providing a complete image processing system package that developers can leverage towards their specific embedded application [9]. We refer to this package as the TULIPP Starter Kit (TSK), which consists of a reference handbook, project applications and a platform instance. The reference handbook is a high-level best-practice introduction to embedded low-power image processing, which is complemented by a collection of concrete, validated guidelines for embedded image processing system design. The project applications are industry-grade examples taken from
the medical, automotive and unmanned aerial vehicle domains. Finally, the platform instance consists of a hardware platform, a real-time operating system and a collection of design and analysis tools.

In this paper, we focus on the design and analysis tools that we have developed during the first year of the TULIPP project. The main objective of the TULIPP tools is to help substantially reduce the effort required to implement an image processing solution on selected heterogeneous platforms. The tools guide the developer through stepwise improvements to an image processing implementation and are designed to use the hardware technology and the operating system services developed in TULIPP. The overall development process is based on software optimization best practices: it iteratively guides the developer through successive changes to the code with the aim of achieving real-time performance with maximum energy efficiency. To maximize impact, we leverage existing tools where these are available.

Overview of STHEM

Fig. 2. The TULIPP-PI1 platform instance held together by STHEM to enable the generic development process [8].

We connect all components of the platform instance (hardware, RTOS, and tools) using an abstraction called the generic development process, which is shown in Fig. 1. The generic development process is an iterative process for programmers to implement image processing applications that meet low-power requirements while leveraging the heterogeneous processing resources available on the platform instance. The starting point of the generic development process is the baseline application that executes with correct sequential behaviour on a modern machine with a general-purpose processor. High-level partitioning decisions determine which baseline functions should be accelerated and how. Partitioning splits off into accelerator-specific development stages that later join to produce an integrated application with the same correct behaviour as the baseline. The
performance of the integrated application is checked against requirements. If it is found lacking, the partitioning and development stages are restarted. In this manner, programmers iteratively refine the baseline application to approach the required low-power and high-performance goals.

A platform instance can be created using any combination of hardware, RTOS, and development tools. However, support for the generic development process in each platform instance is unlikely to be readily available. For example, all the components of the TULIPP reference platform have independent workflows that partially overlap with the generic development process, and at a more basic level have poor to non-existent support for each other. We build utilities to resolve limitations of components of platform instances to ensure simplified support for the generic development process. Our utilities are collectively called Supporting uTilities for Heterogeneous EMbedded image processing platforms (STHEM). STHEM is designed to be as vendor-independent as possible to simplify implementation for arbitrary platform instances. STHEM includes connecting glue that interfaces independent components together, and standalone tools that extend individual components to provide complementary features. The TULIPP toolchain is the combination of STHEM and the existing components of a given platform instance, which together simplify the generic development process for programmers.

Figure 2 shows the TULIPP-PI1 platform instance, which is the platform instance that the TULIPP consortium is currently focusing most attention on. The reason for this focus is the consortium's familiarity with the components that make up the platform instance. TULIPP-PI1 consists of the Sundance EMC2-ZU3 carrier board with the Xilinx Zynq UltraScale+ MPSoC processor, arranged in a two-board configuration to expose a high degree of parallelism to applications. The hardware is operated seamlessly by the HIPPEROS RTOS. Application development tools in the platform
instance are custom adaptations of Xilinx SDSoC and HIPPEROS tools to support multi-board acceleration and real-time requirements.

Table 1. Limitations of TULIPP-PI1 components
– Power Measurement Utility (PMU): the EMC2-ZU3 has no power measurement hardware; HIPPEROS does not quantify task power consumption; SDSoC cannot correlate power consumption with application phases.
– HIPPEROS & SDSoC (HSCL): HIPPEROS cannot accelerate tasks on the FPGA; SDSoC has no support for HIPPEROS.
– HW/SW Image Processing Library (IPL): few optimized image processing functions are available.

Table 1 lists the limitations of the main TULIPP-PI1 components and how our utilities alleviate these limitations. The current version of STHEM includes the following utilities, which provide a minimal end-to-end image processing system for the TULIPP-PI1 platform:
– The Power Measurement Utility (PMU) provides hardware support for measuring power on the EMC2-ZU3 and enables programmers to correlate instantaneous power samples with concurrent HIPPEROS application tasks and SDSoC's HW/SW traces.
– The HW/SW Image Processing Library (IPL) enables high performance and productivity for commonly used image processing operations.
– The Dynamic Partial Reconfiguration Utility (DPRU) enables runtime reconfiguration of the FPGA fabric, which can be used both within a single image processing algorithm and by the OS to switch accelerators at runtime.
– The HIPPEROS SDSoC Compatibility Layer (HSCL) adds HIPPEROS support to SDSoC, enabling programmers to accelerate HIPPEROS application tasks on FPGA accelerators.

The rest of the paper is organized as follows. Section 2 describes the implementation of the PMU, IPL, DPRU and HSCL utilities, which is the main contribution of the paper. We conclude the paper and indicate further work in Sect. 3.

2 STHEM Utilities

The STHEM utilities are a set of components that facilitate the development of low-power image processing systems, shown in Fig. 2. In the current phase of the project, they integrate different tools and
components in a single suite to make them easier to use for developers.

2.1 Power Measurement Utility (PMU)

Improving the power efficiency of embedded applications begins with prudent device selection, for example, choosing FPGAs made with the latest FinFET technology [1]. Once the device is fixed, power efficiency is refined to the desired levels in successive stages of profiling and optimization. Profiling uses power models during early design phases and shifts to real-hardware measurements post-implementation. While standard power profiling methods are available for HPC-like systems [16], power profiling in embedded systems remains largely ad hoc [2,4,13,17,26].

The Xilinx Zynq-based embedded platform that we have chosen for the applications of the TULIPP project has poor support for power profiling. Neither the hardware nor the vendor tools support measuring power consumption at runtime. This complicates the selection of application phases to direct power optimizations and makes it difficult to judge whether low-power requirements are met. Ultimately, a key contribution of the project, design guidelines for low-power embedded vision, cannot be demonstrated.

To solve this problem, we first looked towards solutions recommended by vendors. Xilinx recommends adding on-board current sensors such as precision shunt resistors to provide current measurements to the XADC [27], a hard-IP block in the FPGA substrate of the Zynq. A better solution, also recommended by Xilinx, is to replace the voltage regulators on the hardware platform with digital power controllers from Texas Instruments (TI) to measure the currents and voltages supplied to all power planes [24]. Another option is to use special measurement-friendly variants of the embedded platform built by third parties [23]. While useful, these recommended hardware modifications were prohibitively expensive for us. We also considered a model-only approach, i.e., using the Xilinx-provided power model called the XPE [26]
to refine power efficiency as much as possible during early design phases. However, XPE can at best provide coarse-grained estimates and cannot correlate power problems with application phases. We decided in the end to build external, cost-effective measurement hardware, complemented by specialized profiling software, to diagnose power problems of TULIPP applications at runtime and, in general, to advance the state of the art in power profiling of embedded vision applications.

Power profiling implementation: Our power profiling approach is packaged as the PMU. It essentially consists of an external measurement board that communicates power measurements to profiling tools on the host computer, as shown in Fig. 3. The external measurement board is custom-designed and has multiple current sensors that measure the power consumed by individual units of interest on the embedded platform, i.e., the EMC2-DP board in TULIPP-PI1. Profiling tools collect additional profiling data from the EMC2-DP and analyze it together with the power measurements to diagnose problems. Problems are shown on various visualization widgets, some of which are part of existing vendor tools.

We developed an external measurement board which we call Lynsyn. Lynsyn uses two INA169 current-shunt monitors [22] from TI to measure and amplify currents across 0.1 Ω shunt resistors connected in series with the high-side current wires/PCB tracks that drive the units of interest on the EMC2-DP. Measurements from the current-shunt monitors are sampled by a Teensy 3.6 microcontroller [21] with 13-bit resolution and transmitted over USB to the host computer at approximately 12K samples per second. This rate supports measurements of application tasks with a runtime longer than 83 µs, i.e., one sample period (1/12,000 s). Synchronization signals are sent over JTAG and LVTTL GPIO ports on the EMC2-DP to the Teensy to control measurements. At present, we consider two units of interest: the Zynq SoM and the FMC port that connects to the camera.

Fig. 3. Overview of power profiling.
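The path from raw Lynsyn samples to per-task energy can be illustrated with a short sketch. It assumes the INA169 datasheet transfer function V_out = I_s · R_s · R_L / 1 kΩ. The shunt value, the 13-bit resolution, and the 12K samples/s rate come from the text. The load resistor, ADC reference voltage, and supply voltage are hypothetical values chosen for illustration. The function names are also hypothetical and are not part of the PMU's actual software. The sketch converts samples to power under the constant-supply-voltage assumption described for the PMU, then attributes energy to one application task by its start and end timestamps.

```python
from bisect import bisect_left, bisect_right

# Values stated in the text
ADC_BITS = 13           # sample resolution of the Teensy 3.6 ADC
R_SHUNT = 0.1           # shunt resistance [ohm]
SAMPLE_RATE = 12_000    # approximate samples per second

# Hypothetical values (NOT from the Lynsyn design; chosen for illustration)
ADC_VREF = 3.3          # assumed ADC reference voltage [V]
R_LOAD = 10_000.0       # assumed INA169 output (load) resistor [ohm]
SUPPLY_VOLTAGE = 12.0   # assumed constant supply voltage of the rail [V]

# Shortest task that still spans one full sample period (about 83.3 us)
MIN_TASK_DURATION = 1.0 / SAMPLE_RATE


def adc_to_current(code: int) -> float:
    """Convert a raw ADC code to the current through the shunt [A].

    Uses the INA169 transfer function V_out = I_s * R_s * R_L / 1 kOhm,
    solved for I_s.
    """
    v_out = code * ADC_VREF / (2**ADC_BITS - 1)
    return v_out * 1_000.0 / (R_SHUNT * R_LOAD)


def task_energy(times: list[float], codes: list[int],
                start: float, end: float) -> float:
    """Energy [J] drawn while a task ran, i.e. over samples in [start, end].

    Power is P = V_supply * I (constant-voltage assumption), integrated
    with the trapezoidal rule. `times` must be sorted in ascending order.
    """
    lo = bisect_left(times, start)
    hi = bisect_right(times, end)
    t = times[lo:hi]
    p = [SUPPLY_VOLTAGE * adc_to_current(c) for c in codes[lo:hi]]
    return sum((t[i + 1] - t[i]) * (p[i] + p[i + 1]) / 2.0
               for i in range(len(t) - 1))


if __name__ == "__main__":
    # 13 samples of a constant full-scale reading, i.e. 1 ms of data
    ts = [i / SAMPLE_RATE for i in range(13)]
    codes = [2**ADC_BITS - 1] * len(ts)
    print(f"minimum task duration: {MIN_TASK_DURATION * 1e6:.1f} us")
    print(f"task energy: {task_energy(ts, codes, 0.0, ts[-1]):.4f} J")
```

With these assumed values, a full-scale code corresponds to 3.3 A, so the 1 ms example above integrates a constant 39.6 W. A real setup would substitute Lynsyn's actual resistor and reference values. Trapezoidal integration is only one simple choice; at 12K samples per second, summing P·Δt per sample would give nearly the same result.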
Lynsyn can sense currents between 250 mA and … A. Measuring currents lower than 250 mA is possible by using larger shunt resistors. The BoM cost of Lynsyn is less than 50 US dollars. A protoboard version of Lynsyn connected to the EMC2 (non-stacked) is shown in Fig. 4. We validated the current measurements from Lynsyn against a Uni-T UT139C true-RMS digital multimeter as a reference over two hours of continuous operation. The current measurements showed negligible differences compared to the reference. Rigorous testing with a constant current load is planned as part of future work.

Lynsyn's design assumes that it is possible to insert shunt resistors in all current-carrying lines of interest. However, not all current-carrying lines are accessible. For example, the rails that supply power to the FPGA substrate of the Zynq SoM are buried due to dense packaging constraints. Potential workarounds include using current mirrors to avoid inserting shunt resistors [13], or using special test fixtures that expose current-carrying lines on top layers [23].

Power visualizations: Current samples sent from Lynsyn to the host computer are converted to power readings by a profiling tool, assuming a constant supply voltage, and stored in the Common Trace Format (CTF) [6]. CTF is a flexible, high-throughput, binary trace format developed by the Multicore Association. The power traces can be visualized using Trace Compass, an open-source, standalone viewer popularized by the Linux Trace Toolkit: next generation (LTTng) project [14]. Trace Compass enables correlation and filtering of power traces. An example is provided in Fig. 5.

Fig. 4. Lynsyn, the power measurement board, connected to the EMC2-DP. The current supply wire connects to a current sensor. Synchronization signals are used to start and stop power profiling.

Fig. 5. Inspecting a power trace on Trace Compass.

We visualize instantaneous power computed from the current samples on a running line graph, as shown in Fig. 6. This helps users understand power trends in real time as the
application executes. Abrupt, large changes in power values are flagged on the visualization to alert users.

SDSoC enables users to understand the timing of application events in a timeline visualization called the AXI Trace Viewer [25]. We extend the AXI Trace Viewer to visualize power traces correlated with application phases, as shown in Fig. 7. This enables programmers to conveniently attribute power consumption to concurrent application events and isolate power problems. However, we are not able to refine user interaction in this mode since SDSoC is closed-source software.

Improving the PMU: As part of future work, we intend to profile application-specific data such as the program counter and parallelization events via the JTAG port while collecting power samples. The idea is to analyze this data to pinpoint power problems on high-level semantic visualizations such as control flow graphs and grain graphs [15].

Fig. 6. Power monitor visualization tracks instantaneous power consumption during application execution.

Fig. 7. Attributing power consumption to application events on SDSoC's AXI Trace Viewer.

2.2 HW/SW Image Processing Libraries (IPL)

The HW/SW Image Processing Library helps programmers implement accelerated image processing applications. A template-based software library (C++) for streaming-based applications has been implemented. FPGAs can outperform other hardware architectures, such as CPUs and GPUs, for streaming-based applications, as shown in [11]. The provided functions have been optimized to be accelerated on FPGAs using SDSoC. Furthermore, the library has been optimized for latency, memory throughput and resource usage. The functions follow the OpenVX specification [12] in order to address a large group of users. OpenVX is an open, royalty-free standard for cross-platform acceleration of computer vision applications. Additionally, more data types and auto-vectorization are supported for most functions. Normally, an image processing function processes one pixel per clock
cycle. Using vectorization, it can process one, two, four or even eight pixels per clock cycle. The maximum bit-width of the complete vector is set to 64 bits. Therefore, the maximum vectorization depends on the bit-width of the image data: 8-bit data allows up to eight pixels per clock cycle, 16-bit data up to four, and 32-bit data up to two. One advantage of vectorization is the possibility to process higher image resolutions. Another advantage is that the clock frequency of the design can be reduced for the same throughput, which decreases the power consumption of applications.

The library contains several compile-time optimizations to reduce the inputs required from users. For example, the Gaussian kernel coefficients are computed at compile time from the standard deviation and kernel size. They are computed using double-precision floating-point numbers, then normalized and converted to fixed-point numbers. This computation does not consume extra resources of the FPGA logic. To increase usability, the developer also gets compile-time errors if unsupported data types, or unsupported combinations of them, are used. Image data for functions can be in 8-bit, 16-bit or 32-bit fixed-point representation (unsigned/signed).

There are three groups of library functions. The first group consists of all windowed functions. This includes × Scharr, × Sobel, × Median, Box, Gaussian Convolution and Custom Convolution filters. All functions are normalized to avoid overflow (below 1.0 for unsigned and between 0.5 and −0.5 for signed) and optimized in their structure to reduce resource usage. The windowed operations support replicated, constant and undefined border handling.

The second group consists of all pixel-wise functions, which are basically bitwise and arithmetic operations. This includes: Absolute Difference, Arithmetic Addition, Arithmetic Subtraction, Gradient Magnitude, Pixel-wise Multiplication, Bitwise And, Bitwise Xor, Bitwise Or and Bitwise Not. The arithmetic operations support conversion policies against overflow and different rounding policies if needed.

The last group contains all remaining functions. This
includes the Convert Bit Depth, Convert Color, Scale Down, Integral Image, Histogram and Table Lookup functions. The Color Conversion function can convert between the RGB, RGBX and grayscale formats. The Scale Down function supports nearest-neighbor and bilinear interpolation.

2.3 Dynamic Partial Reconfiguration Utility (DPRU)

SoCs such as the Xilinx Zynq combine hardened processors with programmable logic, which can be used to accelerate application hot spots. The programmable logic can be partitioned into static and dynamic regions. The dynamic regions can be reconfigured at runtime, while the logic in the static region is fixed. The procedure of reconfiguring a dynamic region is called Dynamic Partial Reconfiguration (DPR) [3,5]. DPR allows upgrading the design without the need to erase the whole FPGA, and it saves programming time. It also allows more applications to be time-multiplexed onto the same FPGA. The dynamic partial reconfiguration feature is used in TULIPP to:
– Reduce the need for FPGA resources by fitting more functionality on the same set of programmable hardware.
– Reduce power consumption by disabling dynamic regions of the FPGA that are not used and re-enabling them when they are needed.
– Enable runtime upgrades, which allow more implementation techniques to be deployed at runtime.

DPR is being integrated into the STHEM utilities to allow the TULIPP platform user to update the design freely. Concretely, a set of TCL scripts has been developed to enable DPR within the Xilinx SDSoC high-level workflow [10]. These scripts extend SDSoC functionality for embedded application development and add more options to the software-hardware partitioning task. In future work, we plan to add optimization logic that analyzes the design to help decide which parts of the application are implemented in static and dynamic regions.

2.4 HIPPEROS SDSoC Compatibility Layer (HSCL)

Xilinx SDSoC is a tool for developing applications for Xilinx System-on-Chip (SoC)
architectures and enables programmers to target their C/C++ code to one of the CPU cores or, through high-level synthesis, to the FPGA fabric. HIPPEROS [7] is a multi-core real-time operating system (RTOS) tailored to high-performance and safety-critical embedded applications [8]. The HIPPEROS SDSoC Compatibility Layer is added to Xilinx SDSoC to enable SDSoC to compile applications for HIPPEROS. Previous research work involving the HIPPEROS RTOS includes multi-core micro-kernel design [18], power-aware real-time scheduling [20] and mixed-criticality scheduling [19].

SDSoC is not easily extendable and supports only bare-metal, FreeRTOS and Linux applications. Because SDSoC is closed-source, a third party cannot add another OS to it. The approach chosen in this project was therefore to add HIPPEROS support by disguising it as FreeRTOS. To achieve this, the following components were necessary:

– SDSoC platform description: SDSoC uses the platform description to target a specific hardware platform. In addition to information about the available hardware, it contains the configuration files and libraries needed to compile for each of the three supported operating systems. HSCL extends the platform description by modifying the FreeRTOS configuration files so that HIPPEROS binaries are used instead of FreeRTOS.
– C library: Both SDSoC and HIPPEROS need to be initialized correctly when the developed application boots. Since the SDSoC libraries cannot be modified, the solution was to put all initialization code into a special C library that the SDSoC toolchain links automatically. Additionally, SDSoC requires specific ABI compilation flags to be set for every object file linked into the accelerated program. Therefore, we created a HIPPEROS distribution dedicated to the TULIPP platform and to compatibility with SDSoC.
– Scripts: Unlike FreeRTOS, HIPPEROS needs an additional step after compilation to
package the resulting ELF file into an executable binary. The script for this step is provided and presented to the user in the SDSoC SD card generation step.
– Bootloader: A bootloader is necessary to boot a HIPPEROS application correctly. This also applies when starting the application from the Xilinx debugger: it is not sufficient to upload the binary files to memory and jump to the entry address. Therefore, a small bootloader is provided so that debugging and tracing from the SDSoC GUI or command line are possible.

Conclusion and Further Work

In this paper, we have presented the underlying philosophy of the analysis and development tools that will be developed during the TULIPP project. Further, we have described the implementation of our first four utilities: a novel power measurement and analysis utility (the PMU), a platform-optimized image processing library (the IPL), a dynamic partial reconfiguration utility (the DPRU), and a utility providing support for using the HIPPEROS RTOS within Xilinx SDSoC (the HSCL).

The work achieved so far forms the basis of the research that will be carried out during the second half of the TULIPP project. We will leverage the developed utilities to provide novel performance analysis and design space exploration tools that specifically focus on embedded image processing systems. In addition, we aim to quantitatively compare our image processing library to other libraries and to full-custom FPGA implementations. Finally, we will use the utilities to improve the TULIPP use case applications, which are industry-grade applications in the medical, automotive and unmanned aerial vehicle domains.

Acknowledgement. The work is funded by the European Commission under the H2020 Framework Program for Research and Innovation under grant agreement number 688403.

References

1. Abusultan, M., Khatri, S.P.: A comparison of FinFET based FPGA LUT designs. In: Proceedings of the 24th Edition of the Great Lakes
Symposium on VLSI, GLSVLSI 2014, pp. 353–358. ACM, New York (2014)
2. Buschhoff, M., Günter, C., Spinczyk, O.: MIMOSA, a highly sensitive and accurate power measurement technique for low-power systems. In: Langendoen, K., Hu, W., Ferrari, F., Zimmerling, M., Mottola, L. (eds.) Real-World Wireless Sensor Networks. LNEE, vol. 281, pp. 139–151. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-03071-5_16
3. Cornil, M., Paolillo, A., Goossens, J., Rodriguez, B.: Research and implementation challenges of RTOS support for heterogeneous computing platforms. In: Heterogeneous Architectures and Real-Time Systems Seminar, May 2017
4. Di Nisio, A., Di Noia, T., Carducci, C.G.C., Spadavecchia, M.: High dynamic range power consumption measurement in microcontroller-based applications. IEEE Trans. Instrum. Meas. 65(9), 1968–1976 (2016)
5. Dye, D.: Partial reconfiguration of Xilinx FPGAs using ISE design suite, WP374 (v1.2) (2012)
6. EfficiOS: Common Trace Format (CTF) (2017). http://www.efficios.com/ctf
7. HIPPEROS (2017). http://hipperos.com/
8. Jahre, M., Djupdal, A., Kalms, L., Muddukrishna, A.: D4.1: basic tool chain. Technical report, TULIPP Project (2017)
9. Kalb, T., Kalms, L., Göhringer, D., Pons, C., Marty, F., Muddukrishna, A., Jahre, M., Kjeldsberg, P.G., Ruf, B., Schuchert, T., Tchouchenkov, I., Ehrenstrahle, C., Christensen, F., Paolillo, A., Lemer, C., Bernard, G., Duhem, F., Millet, P.: TULIPP: towards ubiquitous low-power image processing platforms. In: 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), pp. 306–311, July 2016
10. Kalb, T., Göhringer, D.: Enabling dynamic and partial reconfiguration in Xilinx SDSoC. In: ReConFigurable Computing and FPGAs (ReConFig). IEEE (2016)
11. Kalms, L., Göhringer, D.: Exploration of OpenCL for FPGAs using SDAccel and comparison to GPUs and multicore CPUs. In: 2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1–4, September 2017
12. Khronos Vision Working Group: The OpenVX Specification (2017). https://www.khronos.org/registry/OpenVX/specs/1.2/OpenVX Specification 2.pdf
13. Konstantakos, V., Chatzigeorgiou, A., Nikolaidis, S., Laopoulos, T.: Energy consumption estimation in embedded systems. IEEE Trans. Instrum. Meas. 57(4), 797–804 (2008)
14. LTTng: Linux Tracing Toolkit Next Generation (2017). http://www.lttng.org
15. Muddukrishna, A., Jonsson, P.A., Podobas, A., Brorsson, M.: Grain graphs: OpenMP performance analysis made easy. In: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2016)
16. Mukhanov, L., Petoumenos, P., Wang, Z., Parasyris, N., Nikolopoulos, D.S., De Supinski, B.R., Leather, H.: ALEA: a fine-grained energy profiling tool. ACM Trans. Archit. Code Optim. (TACO) 14(1) (2017)
17. Nakutis, Z.: Embedded systems power consumption measurement methods overview. MATAVIMAI 2(44), 29–35 (2009)
18. Paolillo, A., Desenfans, O., Svoboda, V., Goossens, J., Rodriguez, B.: A new configurable and parallel embedded real-time micro-kernel for multi-core platforms. In: Proceedings of the ECRTS Workshop on Operating Systems Platforms for Embedded Real-Time Applications, July 2015
19. Paolillo, A., Rodriguez, P., Svoboda, V., Desenfans, O., Goossens, J., Rodriguez, B., Girbal, S., Faugère, M., Bonnot, P.: Porting a safety-critical industrial application on a mixed-criticality enabled real-time operating system. In: Proceedings of the 5th Workshop on Mixed-Criticality Systems, December 2017
20. Paolillo, A., Rodriguez, P., Veshchikov, N., Goossens, J., Rodriguez, B.: Quantifying energy consumption for practical fork-join parallelism on an embedded real-time operating system. In: Proceedings of the 24th International Conference on Real-Time Networks and Systems, RTNS 2016, pp. 329–338. ACM (2016)
21. PJRC: Teensy 3.6 (2017). https://www.pjrc.com/store/teensy36.html
22. Texas Instruments: INA169-Q1: Automotive Grade, 60-V, High-Side, High-Speed, Current Output Current Shunt Monitor (2017). http://www.ti.com/product/INA169-Q1/datasheet/detailed_description#SGLS1854308
23. Trenz-Electronic: Test fixture for Zynq UltraScale+ MPSoC (2017). https://wiki.trenz-electronic.de/display/PD/TEBT0808+TRM
24. Xilinx: Measuring ZC702 Power using TI Fusion Power Designer Tech Tip (2014). http://www.wiki.xilinx.com/Zynq-7000+AP+SoC+Low+Power+Techniques+part+2+-+Measuring+ZC702+Power+using+TI+Fusion+Power+Designer+Tech+Tip
25. Xilinx: Xilinx Environment Tutorial (UG1028) (2016)
26. Xilinx: Xilinx Power Estimator (2017). https://www.xilinx.com/products/technology/power/xpe.html
27. Xilinx: Xilinx XADC User Guide (2017). https://www.xilinx.com/support/documentation/user_guides/ug480_7Series_XADC.pdf

Author Index

Abdoalnasir, Almabrok 166
Afsharmazayejani, Raheel 304
Agyeman, Michael Opoku 217
Alaei, Mohammad 304
Alefragis, Panayiotis 700
Amano, Hideharu 43, 142
Andrews, David 153
Anlauf, Joachim K 81
Antonopoulos, Christos P 269
Antonopoulos, Christos 712
Antonopoulos, Konstantinos 269
Anuchan, H V 564
Appiah, Kofi 204
Bähr, Steffen 615
Bapp, Falco K 685
Beck, Antonio C 499
Beck, Antonio Carlos Schneider 367
Becker, Juergen 700
Becker, Jürgen 485, 615, 685
Bednara, Marcus 700
Benevenuti, Fabio 243
Bhowmik, Deepayan 204, 523
Birbas, Alexios 640
Birbas, Michael 640
Blott, Michaela 29
Bosio, Alberto 647
Bouganis, Christos-Savvas
Bozzoli, Ludovica 319
Braeken, An 281
Brandalero, Marcelo 499
Buttazzo, Giorgio 392
Caba, Julián 446
Cardoso, João M P 446
Carro, Luigi 367, 499
Chattopadhyay, Anupam 119
Cheung, Peter Y K 16, 29
Chrysos, Grigorios 459
da Silva, Bruno 281
Dagioglou, Maria 712
Daneshtalab, Masoud 304
de Moura, Rafael Fão 355
de Oliveira, Ádria Barros 647
Dhar, Anindya Sundar 537
Djupdal, Asbjørn 737
Doan, Ng Anh Vu 142
Dollas, Apostolos 459
Dondo, Julio 446
Dounis, Anastasios 166
Durak, Umut 700
Durelli, Gianluca 29
Erichsen, Augusto G 231
Exenberger Becker, Pedro H 231, 355, 499
Fan, Baoyu 578
Faraone, Julian 16
Ferreira, João Canas 511
Ferreira, Mário Lopes 511
Figuli, Shalina Percy Delicia 615
Fraser, Nicholas J 29
Fricke, Florian 661
Fukuda, Masahiro 192
Gambardella, Giulio 29
Garcia, Paulo 523
Georgopoulos, Konstantinos 459, 724
Goehringer, Diana 407, 433, 712, 737
Gogos, Christos 700
Goulas, George 700
Guo, Zhenhua 578
Hansmeier, Tim 153
Heid, Kris 471
Herath, Kalindu 105
Hironaka, Kazuei 142
Hochberger, Christian 93, 471
Hoppe, Augusto W 485
Huebner, Michael 331, 343, 511, 661, 712
Inoguchi, Yasushi 192
Ioannou, Aggelos 724
Jahre, Magnus 737
Janßen, Benedikt 331
Janus, Piotr 379
Jetly, Darshan 255
Jordan, Michael Guilherme 355
Jost, Tiago Trevisan 499
Jung, Lukas Johannes 93
Kachris, Christoforos 67, 673
Kalaitzakis, Kostas 392
Kalms, Lester 737
Kamal, Ahmed 433
Karakonstantis, George 551
Karkaletsis, Vangelis 712
Kasnakli, Koray 700
Kastensmidt, Fernanda Lima 243, 485, 647
Kästner, Florian 331
Katsantonis, Konstantinos 67
Katsimpris, Merkourios 700
Keramidas, Georgios 712
Khan, Habib ul Hasan 433
Khan, Sikandar 392
Kim, Junsik 132
Kitsos, Paris 294
Koch, Andreas 420
Konstantopoulos, Stasinos 712
Koromilas, Elias 673
Kouris, Alexandros
KrishnaKumar, N 178, 564
Kryjak, Tomasz 379
Kudoh, Tomohiro 43
Lavagno, Luciano 724
Leong, Philip H W 16, 29
Li, Long 578
Li, Xuelei 578
Liebig, Björn 420
Littlewood, Peter 627
Liu, Junyi 16
López, Juan Carlos 446
Malakonakis, Pavlos 459, 724
Mavroidis, Iakovos 724
Merchant, Farhad 119
Michaelson, Greg 523
Minhas, Umar Ibrahim 551
Mirzaei, Shahnam 603, 627
Mousouliotis, Panagiotis G 55
Muddukrishna, Ananya 737
Müller, David 700
Musha, Kazusa 43
Nakada, Takashi 590
Nakashima, Yasuhiko 590
Nandy, S K 119, 178, 564
Narayan, Ranjani 119
Natarajan, Santhi 178, 564
Navarro, Osvaldo 343
Nikitakis, Antonis 459
Ofori-Attah, Emmanuel 217
Oppermann, Julian 420
Pal, Debnath 178, 564
Palchaudhuri, Ayan 537
Panagiotou, Christos 269
Paolillo, Antonio 737
Papadimitriou, Kyprianos 392
Papaefstathiou, Ioannis 459, 724
Parelkar, Milind 255
Park, Jaehyun 132
Pereira, Monica M 231
Petrou, Loukas P 55
Pfau, Johannes 615
Piszczek, Kamil 379
Platzner, Marco 153
Pnevmatikatos, Dionysios 459
Podlubne, Ariel 737
Prakash, Alok 105
Proiskos, Grigorios 640
Psarakis, Mihalis 166
Pyrgas, Lampros 294
Raha, Soumyendu 119
Ramamoorthy, Krishna Murthy Kattiyan 627
Reder, Simon 700
Rettkowski, Jens 407
Rezaei, Amin 304
Rincón, Fernando 446
Rizakis, Michalis
Rodrigues, Gennaro S 647
Rutzig, Mateus Beck 355, 367
Sadek, Ahmad 737
Sartor, Anderson L 231, 367, 499
Schüller, Sebastian 81
Schwiegelshohn, Fynn 712
Segers, Laurent 281
Sezenlik, Oğuzhan 81
Shahin, Keyvan 661
Sharif, Uzaif 603
Sinnen, Oliver 420
Soudris, Dimitrios 67, 673
Souza, Jeckson Dellagostin 231, 367
Srikanthan, Thambipillai 105
Stamelos, Ioannis 673
Stavrinos, Georgios 712
Steenhaut, Kris 281
Sterpone, Luca 319
Stewart, Robert 523
Su, Jiang 16, 29
Tampouratzis, Nikolaos 459
Theodoridis, George 700
Thomas, David B 16, 29
Touhafi, Abdellah 281
Tzanis, Nikolaos 640
Valouxis, Christos 700
Vatwani, Tarun 119
Venieris, Stylianos I
Voros, Nikolaos S 269, 712
Voros, Nikolaos 700
Vu, Hoang-Gia 590
Wallace, Andrew 523
Wang, Xiaohang 217
Wei, Shixin 578
Wenzel, Jakob 471
Werner, André 661
Wingender, Tim 331
Wong, Stephan 231, 367, 499
Woods, Roger 551
Yazdanpanah, Fahimeh 304
Zhao, Yaqian 578
Zhao, Yiren 16
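The vectorization rule in the IPL description of the STHEM overview (a 64-bit vector filled with 8-, 16- or 32-bit pixels) can be illustrated with a minimal Python sketch. It is not the IPL's own implementation: the names `pixels_per_cycle` and `required_clock_hz` are hypothetical, the sketch always picks the maximum factor (a design may use a smaller one), and the clock estimate assumes one vector per clock cycle with blanking intervals and pipeline latency ignored.

```python
VECTOR_BITS = 64                    # maximum bit-width of the complete vector (from the text)
SUPPORTED_PIXEL_BITS = (8, 16, 32)  # IPL image data widths (from the text)


def pixels_per_cycle(pixel_bits: int) -> int:
    """Maximum number of pixels that fit into one 64-bit vector (hypothetical helper)."""
    if pixel_bits not in SUPPORTED_PIXEL_BITS:
        raise ValueError("image data must be 8, 16 or 32 bits wide")
    return VECTOR_BITS // pixel_bits  # 8 -> 8, 16 -> 4, 32 -> 2


def required_clock_hz(width: int, height: int, fps: float, pixel_bits: int) -> float:
    """Clock frequency needed to stream width x height frames at `fps`.

    Simplifying assumption: one vector is consumed per clock cycle;
    blanking intervals and pipeline fill latency are ignored.
    """
    pixels_per_second = width * height * fps
    return pixels_per_second / pixels_per_cycle(pixel_bits)
```

Under these assumptions, a 1920 × 1080 stream at 60 frames/s needs about 124.4 MHz at one 8-bit pixel per cycle but only about 15.6 MHz at eight pixels per cycle, which is the frequency (and hence power) reduction the text attributes to vectorization.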

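The IPL's compile-time Gaussian coefficient computation (double precision, then normalization, then conversion to fixed point) can be sketched as follows. This Python version only mirrors the arithmetic; in the IPL the work is done at compile time, so no FPGA logic is spent on it. The function name `gaussian_kernel_fixed`, the `frac_bits` parameter and the round-to-nearest quantization are assumptions for illustration.

```python
import math
from typing import List


def gaussian_kernel_fixed(size: int, sigma: float, frac_bits: int = 16) -> List[List[int]]:
    """Return a size x size Gaussian kernel as fixed-point integers.

    Hypothetical helper: weights are computed in double precision,
    normalized so they sum to 1.0, then scaled by 2**frac_bits and rounded.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("kernel size must be a positive odd number")
    if sigma <= 0.0:
        raise ValueError("standard deviation must be positive")
    r = size // 2
    # Step 1: unnormalized Gaussian weights in double precision.
    weights = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in range(-r, r + 1)]
        for y in range(-r, r + 1)
    ]
    # Step 2: normalize so that the weights sum to 1.0.
    total = sum(sum(row) for row in weights)
    # Step 3: convert to fixed point with `frac_bits` fractional bits.
    scale = 1 << frac_bits
    return [[round(w / total * scale) for w in row] for row in weights]
```

For example, `gaussian_kernel_fixed(3, 1.0, frac_bits=8)` yields `[[19, 32, 19], [32, 52, 32], [19, 32, 19]]`, whose entries sum to 256 (1.0 in this format). With other parameters, rounding can leave the sum slightly off 1.0, which a real implementation may need to compensate for.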

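The IPL description only states that the pixel-wise arithmetic functions support conversion policies against overflow and different rounding policies. The sketch below assumes semantics modeled on the OpenVX wrap/saturate conversion policies and round-to-zero/round-to-nearest-even rounding policies; the paper does not spell out the IPL's exact behavior, and all names here (`ConvertPolicy`, `RoundPolicy`, `convert`, `add`, `multiply`) are hypothetical.

```python
import math
from enum import Enum


class ConvertPolicy(Enum):
    WRAP = "wrap"          # keep only the low-order bits (modular arithmetic)
    SATURATE = "saturate"  # clamp to the representable range


class RoundPolicy(Enum):
    TO_ZERO = "to_zero"                  # truncate toward zero
    TO_NEAREST_EVEN = "to_nearest_even"  # ties go to the even neighbour


def convert(value: int, bits: int, signed: bool, policy: ConvertPolicy) -> int:
    """Fit an exact integer result into a `bits`-wide pixel."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if policy is ConvertPolicy.SATURATE:
        return max(lo, min(hi, value))
    wrapped = value & ((1 << bits) - 1)  # low `bits` bits of the result
    if signed and wrapped > hi:          # reinterpret as two's complement
        wrapped -= 1 << bits
    return wrapped


def add(a: int, b: int, bits: int = 8, signed: bool = False,
        policy: ConvertPolicy = ConvertPolicy.SATURATE) -> int:
    """Pixel-wise addition with an overflow (conversion) policy."""
    return convert(a + b, bits, signed, policy)


def multiply(a: int, b: int, scale: float, bits: int = 8, signed: bool = False,
             policy: ConvertPolicy = ConvertPolicy.SATURATE,
             rounding: RoundPolicy = RoundPolicy.TO_ZERO) -> int:
    """Pixel-wise a * b * scale: round first, then apply the conversion policy."""
    exact = a * b * scale
    if rounding is RoundPolicy.TO_ZERO:
        result = math.trunc(exact)
    else:
        result = round(exact)  # Python's round() resolves .5 ties to even
    return convert(result, bits, signed, policy)
```

For example, adding the 8-bit unsigned pixels 200 and 100 gives 255 with saturation but 44 with wrap-around. Rounding before conversion keeps the two policies independent, so any combination of them can be selected per operation.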

Table of Contents

  • Preface
  • Organization
  • Contents
  • Machine Learning and Neural Networks
    • Approximate FPGA-Based LSTMs Under Computation Time Constraints
      • 1 Introduction
      • 2 Background
        • 2.1 LSTM Networks
      • 3 Related Work
      • 4 Methodology
        • 4.1 Approximations for LSTMs
        • 4.2 Architecture
      • 5 Design Space Exploration
        • 5.1 Roofline Model
        • 5.2 Evaluating the Impact of Approximations on the Application
      • 6 Evaluation
        • 6.1 Comparisons at Constrained Computation Time
      • 7 Conclusion
      • References
    • Redundancy-Reduced MobileNet Acceleration on Reconfigurable Logic for ImageNet Classification
      • 1 Introduction
      • 2 Accelerating Redundancy-Reduced Neural Networks on FPGA
        • 2.1 MobileNet Complexity Analysis
        • 2.2 Model-Level Redundancy Analysis
        • 2.3 Data-Level Redundancy Analysis
      • 3 RR-MobileNet FPGA Acceleration System Design
        • 3.1 System Architecture
        • 3.2 Memory Usage
