ARM Architecture Reference Manual - P20


Part C Vector Floating-point Architecture

ARM DDI 0100E Copyright © 1996-2000 ARM Limited. All rights reserved.

Chapter C1 Introduction to the Vector Floating-point Architecture

This chapter gives an introduction to the Vector Floating-Point (VFP) architecture, and its compliance with the IEEE 754 standard. It contains the following sections:
• About the Vector Floating-point architecture
• Overview of the VFP architecture
• Compliance with the IEEE 754 standard
• IEEE 754 implementation choices.

1.1 About the Vector Floating-point architecture

The Vector Floating-Point (VFP) architecture is a coprocessor extension to the ARM architecture. It provides single-precision and double-precision floating-point arithmetic, as defined by ANSI/IEEE Std. 754-1985 IEEE Standard for Binary Floating-Point Arithmetic. This document is referred to as the IEEE 754 standard in the following text.

Short vectors of up to 8 single-precision or 4 double-precision numbers are handled particularly efficiently by the VFP architecture. Most arithmetic instructions can be used on these vectors, allowing single-instruction, multiple-data (SIMD) parallelism. Furthermore, the floating-point load and store instructions have multiple register forms, allowing vectors to be transferred to and from memory efficiently.

To date, there has only been one major version of the VFP architecture (Version 1, or VFPv1). Double-precision support is optional, with its presence being indicated by the variant letter D.
So the VFPv1D variant has both single precision and double precision, while VFPv1xD supports single precision only. By default, double-precision support is present.

1.2 Overview of the VFP architecture

This section provides a brief overview of the VFP architecture. More extensive and detailed information on the architecture is given in Chapter C2 VFP Programmer’s Model.

1.2.1 Registers

VFP has 32 general-purpose registers, each capable of holding a single-precision floating-point number or a 32-bit integer. In D variants of the architecture, these registers can also be used in pairs to hold up to 16 double-precision floating-point numbers.

There are also three or more system registers:

FPSID Is read-only. It can be read to determine which implementation of the VFP architecture is being used.

FPSCR Supplies all user-level status and control. Status bits hold comparison results and cumulative flags for floating-point exceptions. Control bits are provided to select rounding options and vector length/stride, and to enable floating-point exception traps.

FPEXC Contains a few bits for system-level status and control. The remaining bits of the FPEXC register and any further system registers are IMPLEMENTATION DEFINED, and are typically used for internal communication between the hardware and software components of a VFP implementation (see 1.2.4 Hardware and software implementations).

1.2.2 Instructions

Instructions are provided to:
• Load floating-point values into registers from memory, and store floating-point values in registers to memory. Some of these instructions allow multiple register values to be transferred, providing floating-point equivalents to ARM LDM and STM instructions.
Among other purposes, such instructions can be used to load and store short vectors of floating-point values.
• Transfer 32-bit values directly between VFP and ARM general-purpose registers.
• Transfer 32-bit values directly between VFP system registers and ARM general-purpose registers.
• Add, subtract, multiply, divide, and take the square root of floating-point register values. These instructions can be used on short vectors as well as on individual floating-point values.
• Copy floating-point values between registers. In the process, the sign bit can be inverted or cleared (or left unchanged), providing negation and absolute value instructions as well as straightforward copies. All of these instructions can also be used on short vectors.
• Perform combined multiply-accumulate operations on floating-point values and short vectors, providing space-efficient equivalents for common sequences of multiply, negate, add, and subtract.
• Perform conversions between single-precision values, double-precision values, unsigned 32-bit integers and two's complement signed 32-bit integers.
• Compare floating-point values in registers with each other or with zero.

1.2.3 Floating-point exceptions

The VFP architecture supports all five of the floating-point exceptions defined in the IEEE 754 standard:
• Invalid Operation
• Division by Zero
• Overflow
• Underflow
• Inexact.

These are supported in both untrapped and trapped forms:

Untrapped handling of an exception
This causes the appropriate cumulative flag in the FPSCR to be set to 1, and any result registers of the exception-generating instruction to be set to the result values specified by the standard. Execution of the program containing the exception-generating instruction then continues.
Trapped handling of an exception
This is selected by setting the appropriate control bit in the FPSCR. When the exception occurs, a trap handler software routine is called. Details of how trap handler routines are called and of the facilities available to them are IMPLEMENTATION DEFINED.

1.2.4 Hardware and software implementations

Because of the existence of trapped floating-point exceptions, any implementation of the VFP architecture must include a software component. This is typically installed on the ARM undefined instruction vector, and has the job of catching a trapped exception and converting it into a trap handler call.

The software component of a VFP implementation can perform other tasks in addition to trap handler calls. The division of labour between the hardware and software components of a VFP implementation is IMPLEMENTATION DEFINED.

VFP implementations can be classified according to whether they also include a hardware component:

Software implementation
This implementation consists of software only, with all floating-point arithmetic being emulated by ARM routines. A software implementation is also sometimes called a VFP emulator.

Hardware implementation
This implementation contains both hardware and software components. Typically, the hardware is designed to handle all common cases, to optimize performance. When the hardware encounters a case that it cannot handle on its own, the software component (also known as support code for the hardware) is called to deal with it. Details of how the hardware and its support code interact are IMPLEMENTATION DEPENDENT.

1.2.5 Interactions with the ARM architecture

The VFP architecture has been designed to conform fully with the ARM coprocessor architecture.
All VFP instructions are special cases of the ARM’s generic coprocessor instructions (CDP, LDC, MCR, MRC, and STC), using coprocessor numbers 10 and 11. As a general rule, coprocessor 10 is used for single-precision instructions and coprocessor 11 for double-precision instructions.

All coprocessor 10 and 11 instructions that have not been allocated meanings as VFP instructions are reserved for future expansion of the VFP architecture, and must be treated as UNDEFINED. Hardware coprocessor implementations of the VFP architecture will fail to respond to these instructions, causing the ARM’s Undefined Instruction exception to occur. For more details, see Undefined Instruction exception on page A2-15.

The recommended way for a VFP coprocessor to invoke its support code uses the same mechanism:
1. Before the VFP hardware is enabled, the support code is installed on the ARM’s undefined instruction vector.
2. When the hardware needs assistance from the support code, it fails to respond to a VFP instruction.
3. This results in an Undefined Instruction exception, causing the support code to be executed.

In such a system, the support code is responsible for distinguishing these Undefined Instruction exceptions from those caused by the reserved instructions and taking different actions accordingly.

The ARM tests whether a coprocessor instruction satisfies its condition (as described in The condition field on page A3-5), using the CPSR flags, and treats it as a NOP if the condition fails. If this happens, the ARM signals coprocessors not to execute the instruction, so they also treat the instruction as a NOP. This implies that all VFP instructions are treated as NOPs if their condition check fails.

The condition code check is based on the ARM processor’s CPSR flags, not on the similarly named flags in the VFP FPSCR register. To use the FPSCR flags for conditional execution, they must first be transferred to the CPSR by an FMSTAT instruction.
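
The coprocessor-number rule in this section can be illustrated with a small C sketch. It is not taken from the manual: the function name vfp_coprocessor_number is hypothetical, and the sketch assumes the standard ARM instruction layout (condition in bits[31:28], LDC/STC identified by bits[27:25] = 0b110, CDP/MCR/MRC by bits[27:24] = 0b1110, coprocessor number in bits[11:8]). It only reports whether a word lies in the coprocessor 10 or 11 instruction space. It does not decide whether the word is an allocated VFP instruction or one of the reserved, UNDEFINED encodings.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Returns 10 or 11 if the instruction word is a generic coprocessor
 * instruction (CDP, LDC, MCR, MRC, or STC form) addressed to
 * coprocessor 10 or 11, and -1 otherwise. */
int vfp_coprocessor_number(uint32_t insn)
{
    uint32_t cond   = insn >> 28;           /* bits[31:28]: condition field    */
    uint32_t top3   = (insn >> 25) & 0x7u;  /* bits[27:25]: 0b110 for LDC/STC  */
    uint32_t top4   = (insn >> 24) & 0xFu;  /* bits[27:24]: 0b1110 for CDP,    */
                                            /*              MCR, and MRC       */
    uint32_t cp_num = (insn >> 8) & 0xFu;   /* bits[11:8]: coprocessor number  */

    /* Assumption of this sketch: condition 0b1111 is not an ordinary
     * conditional encoding, so such words are rejected here. */
    if (cond == 0xFu) {
        return -1;
    }

    if (top3 != 0x6u && top4 != 0xEu) {
        return -1;                          /* not a coprocessor instruction   */
    }

    if (cp_num == 10u || cp_num == 11u) {
        return (int)cp_num;                 /* 10: single, 11: double, as a    */
                                            /* general rule                    */
    }
    return -1;                              /* some other coprocessor          */
}
```

For example, the word 0xEE300A81 has the CDP form with coprocessor number 10, 0xEE300B81 differs only in using coprocessor number 11, and 0xEE010F10 is addressed to coprocessor 15 and is therefore rejected. A support-code dispatcher might use a check of this kind as a first filter before examining the remaining opcode bits.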
VFP load and store instructions are allowed to produce data aborts, and so VFP implementations must be able to cope with a data abort on any memory access caused by such instructions.

Interrupts

As described above, hardware VFP implementations typically use the Undefined Instruction exception to communicate between their hardware and software components. Software VFP implementations also use the Undefined Instruction exception, since all coprocessor instructions that are not claimed by a hardware coprocessor are treated as undefined instructions.

Entry to the Undefined Instruction exception causes IRQs to be disabled (see Undefined Instruction exception on page A2-15), and they will not normally be re-enabled until the exception handler returns. Straightforward use of VFP in a system therefore increases worst case IRQ latency considerably.

It is possible to reduce this IRQ latency penalty considerably by explicitly re-enabling interrupts soon after entry to the Undefined Instruction handler. This requires careful integration of the Undefined Instruction handler into the rest of the operating system. Details of how this should be done are highly system-specific and go beyond the scope of this manual.

In a hardware implementation, if the IRQ handler is going to use the VFP coprocessor itself, there is a second potential cause of increased IRQ latency: a long latency VFP operation initiated by the interrupted program will deny the use of the VFP hardware to the IRQ handler for a significant number of cycles.

If a system contains IRQ handlers which require both low interrupt latency and the use of VFP instructions, it is therefore recommended that the use of the highest latency VFP instructions is avoided.
In particular, the use of vector division instructions and vector square root instructions is not recommended in such systems, because these instructions typically have very long latencies.

Note
FIQs are not disabled by entry to the Undefined Instruction handler, and so FIQ latency is not affected by the way that a VFP implementation uses the Undefined Instruction exception. However, this also means that an FIQ can occur at any point during the execution of a VFP implementation’s software component, including during the entry and exit sequences of the Undefined Instruction handler. If an FIQ handler is going to do anything other than leave the VFP implementation’s state entirely unchanged, great care must be taken to ensure that it handles every case correctly. This is usually incompatible with the intention that FIQs should provide fast interrupt processing, and so it is recommended that FIQ handlers should not use VFP.

1.3 Compliance with the IEEE 754 standard

The VFP architecture supplies a subset of IEEE 754 functionality. The following operations are mandatory under the standard, but not supplied by the VFP architecture:
• the remainder operation
• the binary ↔ decimal conversions
• the Round Floating-Point Number to Integer Value operation
• in D variants of the VFP architecture, comparisons directly between single-precision and double-precision values without first converting the single-precision value to double precision.

To obtain a fully compliant implementation of the standard, the VFP architecture must be augmented with these operations (typically in the form of software library routines).

Note
In some environments, not all of these operations are required.
For example, the C language specifies that if a float and a double are compared, the float must be converted to a double by the usual arithmetic conversions before the comparison is performed. So, C code never specifies a direct comparison of a single-precision value and a double-precision value.

Also, when the Flush to Zero (FZ) bit in the FPSCR is set to 1, the way the VFP architecture handles denormalized numbers and underflow exceptions does not comply with the standard. To obtain fully compliant behavior from the VFP architecture, the FZ bit must be set to 0 (see Flush-to-zero mode on page C2-13 for more details).

1.4 IEEE 754 implementation choices

Many design choices about a compliant floating-point system are left as an implementation option by the IEEE 754 standard. For many of these choices, the VFP architecture specifies how they are to be made. The rest of this section briefly describes these implementation choices.

1.4.1 Supported formats

The VFP architecture supports the basic single floating-point format from the standard, and D variants also support the basic double floating-point format. These are known as single precision and double precision in this manual. The standard’s extended formats are not supported.

Supported integer formats are unsigned 32-bit integers and two’s complement signed 32-bit integers.

1.4.2 NaNs

The IEEE 754 standard only specifies that there must be at least one signaling NaN and at least one quiet NaN, and partly specifies what the representation of NaNs should be (for any NaN, the exponent field should be maximum, and the fraction field non-zero). The VFP architecture specifies its NaNs more fully:
• In each format, all values with the exponent field maximum and the fraction field non-zero are valid NaNs.
Two such values represent distinct NaNs if their sign bits and/or fraction fields are different.
• Copying a signaling NaN without a change of format does not generate an Invalid Operation exception.
• Signaling NaNs are distinguished from quiet NaNs by the most significant fraction bit. The NaN is signaling if this bit is 0, and quiet if it is 1.
• There are precise rules in the VFP architecture about which NaN is produced for each operation with a NaN result. These rules are described in NaNs on page C2-5.

Note
The fact that NaNs whose sign or fraction bits differ are treated as distinct NaNs in the VFP architecture does not mean that the floating-point comparison instructions can be used to distinguish them from each other. The IEEE 754 standard requires all NaNs to compare as unordered with every value, including themselves. What it does mean is that the distinct NaNs can be distinguished by using ARM code that looks at their precise bit patterns, and that the NaN handling rules are designed not to change bits in NaN values except where this is required by the standard.

[...]

...handling are IMPLEMENTATION DEFINED.

Chapter C2 VFP Programmer’s...
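
The NaN encoding rules listed in section 1.4.2 (exponent field maximum, fraction field non-zero, most significant fraction bit 0 for signaling and 1 for quiet) can be sketched in C as follows. This is only an illustration that operates on raw bit patterns, in the spirit of the note about ARM code inspecting NaN bit patterns. The type and function names are hypothetical and do not come from the manual.

```c
#include <stdint.h>

/* Hypothetical classification result, for illustration only. */
typedef enum {
    VFP_NOT_A_NAN,
    VFP_SIGNALING_NAN,
    VFP_QUIET_NAN
} vfp_nan_kind;

/* Classify a single-precision bit pattern:
 * 1 sign bit, 8 exponent bits, 23 fraction bits. */
vfp_nan_kind vfp_classify_single(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x007FFFFFu;

    if (exponent != 0xFFu || fraction == 0u) {
        return VFP_NOT_A_NAN;            /* number, zero, or infinity */
    }
    /* Most significant fraction bit: 1 means quiet, 0 means signaling. */
    return (fraction & 0x00400000u) ? VFP_QUIET_NAN : VFP_SIGNALING_NAN;
}

/* Classify a double-precision bit pattern:
 * 1 sign bit, 11 exponent bits, 52 fraction bits. */
vfp_nan_kind vfp_classify_double(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7FFu;
    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFull;

    if (exponent != 0x7FFu || fraction == 0u) {
        return VFP_NOT_A_NAN;
    }
    return ((fraction >> 51) & 1u) ? VFP_QUIET_NAN : VFP_SIGNALING_NAN;
}
```

Under these rules 0x7FC00000 classifies as a quiet NaN, 0x7F800001 as a signaling NaN, and 0x7F800000 (infinity) as not a NaN. The sign bit plays no part in the classification, although the text above treats NaNs with different sign bits as distinct values.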
as the ARM implementation it is attached to. If the ARM implementation has configurable endianness, double-precision values must not be loaded or stored before the ARM processor endianness has been set to match that of the memory system (see Endianness on page A2-23 for more details).

Note
The word order defined here for the VFP architecture differs from that of the earlier FPA floating-point architecture...

...transferred from the ARM), and convert them back when stored (or transferred to the ARM), as long as the overall results are correct. As a result, software must not rely on any particular type of overlap between the registers. More formally, the rules for this are:...

...initialization on page C2-26

VFP Programmer’s Model

2.1 Floating-point formats

This section outlines the basic single-precision and double-precision floating-point formats, as defined by the IEEE 754 standard and used by the VFP architecture. In addition,...

...trap handler

The IEEE 754 standard does not specify how the two types of NaN are distinguished or how many different NaNs of each type can exist in a floating-point system. However, these details are specified by the VFP architecture,...
by the VFP architecture

Instructions with non floating-point results

The VFP architecture contains instructions to convert floating-point values to integers. In accordance with the IEEE 754 standard, these instructions always generate an Invalid Operation exception if their operand is a NaN, regardless of whether it is a signaling NaN or a quiet NaN. If this exception is untrapped, the VFP architecture...

...architecture use the same format for their operands and results. For these, the VFP architecture specifies that the correct quiet NaN result in either of the above cases is determined as follows:
1. For instructions acting on vector operands,...

...is FSQRT. In each case, if the exception is untrapped, the result must be a quiet NaN. The VFP architecture specifies that the quiet NaN produced in these cases must have sign bit equal to 0, most significant fraction bit equal to 1, and all remaining fraction bits equal to 0...

...terminology of the standard, the above description means that the VFP architecture requires these choices to be:
• the before rounding form of tininess
• the inexact result form of loss of accuracy

Inexact exceptions Occur whenever...
versions and variants of the VFP architecture support the single-precision format. D variants also support the double-precision format. The VFP architecture does not support either of the extended formats described in the IEEE 754 standard.

This section is only intended as an introduction to these formats and to the various types of value they can contain, not as comprehensive reference material on them. For...

Date posted: 26/01/2014, 19:20
