Adaptive Techniques for Dynamic Processor Optimization: Theory and Practice (Episode 2, Part 6)

Chapter 11 Dynamic and Adaptive Techniques in SRAM Design
John J. Wuu

Figure 11.4 Charge sharing for supply reduction [14]. (© 2007 IEEE)

Since extra supplies are not always available in product design, another example [14] uses charge sharing to lower the supply to the columns being written to. As shown in Figure 11.4, “downvdd” is precharged to VSS. For a write operation, supplies to the selected columns are disconnected from VDD and shorted to “downvdd”. The charge sharing lowers the supply’s voltage to a level determined by the ratio of the capacitances, allowing writes to occur easily.

Figure 11.5 Write column supply switch off [21]. (© IEEE 2006)

Yet another example [21] uses a power-line-floating write technique to assist write operations. Instead of switching in a separate supply or charge sharing the supply, as in previous examples, the supply to the write columns is simply switched off, floating the column supply lines at VDD (Figure 11.5). As the cells are written to, the floating supply line (Vddm) discharges through the “0” bitline, as shown in Figure 11.6a. The decreased supply voltage allows easy writing to the cells. As soon as the cell flips to its intended state, the floating supply line’s discharge path is cut off, preventing the floating supply line from fully discharging (Figure 11.6b).

Figure 11.6 Power-line-floating write [21]. (© IEEE 2006)

In all column voltage manipulation schemes, nonselected cells must retain state with the lowered supply.
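The dependence of the lowered supply on the capacitance ratio can be illustrated with a minimal sketch. It assumes ideal charge conservation between two lumped capacitances, with “downvdd” starting at VSS = 0 V. The function name, the parameter names `c_column` and `c_downvdd`, and the example values are hypothetical and are not taken from [14].

```python
def charge_shared_supply(vdd, c_column, c_downvdd, vss=0.0):
    """Voltage after shorting a column supply (at vdd) to downvdd (at vss).

    Assumes ideal charge sharing between two lumped capacitors:
    total charge is conserved and both nodes settle to one voltage.
    """
    if c_column <= 0 or c_downvdd <= 0:
        raise ValueError("capacitances must be positive")
    total_charge = c_column * vdd + c_downvdd * vss
    return total_charge / (c_column + c_downvdd)


if __name__ == "__main__":
    # Hypothetical values: 30 fF column supply line, 10 fF downvdd line.
    v = charge_shared_supply(vdd=1.2, c_column=30e-15, c_downvdd=10e-15)
    print(f"column supply during write: {v:.3f} V")
```

Under these assumptions, a larger “downvdd” capacitance relative to the column supply capacitance pulls the write-column supply lower, which is the design knob implied by "the ratio of the capacitances".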
11.2.1.2 Row Voltage Optimization

Similar to the previous section, designers can apply voltage manipulation in the row direction as well. However, unlike column-based voltage optimization, row-based voltage optimization generally cannot simultaneously optimize for both read and write margins in the same operation, as needed in a column-multiplexed design. Therefore, row-based voltage manipulation tends to be more suitable for non-column-multiplexed designs where all the columns are written to in a write operation.

The most obvious method to apply row-based voltage optimization is to raise the supply for the row of accessed cells in a read operation, or to lower the supply for the row of cells being written to. In addition, the following are some other examples of row-based voltage optimization.

Figure 11.7 Raised source line write [20]. (© IEEE 2004)

In [20], the SRAM cells’ source line (SL) (i.e., the source terminals of the M_N transistors in Figure 11.1) is disconnected from VSS during write operations. The SL is allowed to float until it is clamped by an NFET diode (Figure 11.7). The raised SL (Vss_mem in Figure 11.7) decreases the drive of the PFETs, which allows easy overwriting of the cell. (In this specific example, the floating SL is shared among all the cells in the array, not just the cells in a row. However, designers can apply the same technique on a row-by-row basis at the cost of area overhead.) A variation of this technique would disconnect the SL during both write and standby operations to achieve power savings, and connect the SL to VSS only during read operations when the extra stability margin is needed. The drawback to this variation is the additional delay needed to restore SL to VSS before a read operation can begin. A similar example [13] also floats SL during write operations.
In addition, the SL is driven to a negative voltage during read operations. This allows for faster bitline development, as well as more stable cells during read operations.

Figure 11.8 Supply line coupling [3]. (© IEEE 2004)

If a separate supply is not available, another way to boost the internal supply of SRAM cells during a read access to achieve higher stability is through coupling. In [3], wordline wires are routed next to the row’s supply lines. As seen in Figure 11.8, as the wordline rises, it disconnects the supply lines from VDD, and couples the voltages of the supply lines higher than VDD. Assuming insignificant current is sourced from the supply line during a read access, the bootstrapped supply increases the drive on the M_N transistors and improves the cell’s stability. However, for cell designs with low M_N/M_A ratios, the “0” storage node may rise higher than M_N’s threshold voltage, causing the floating supply lines to discharge.

Figure 11.9 Wordline driver using RATs [14]. (© IEEE 2007)

In [14], instead of increasing the SRAM cell’s supply to improve stability, the WL voltage is reduced slightly. Reduced wordline voltage degrades the drive of M_A, which essentially improves the M_N/M_A ratio. This implementation makes additional efforts to account for global threshold voltage variations. Figure 11.9 illustrates the scheme, using “replica access transistors” (RATs) that have almost the same physical topology as M_A to lower the WL voltage. In general, lower V_TN causes SRAM cells to be less stable. Therefore, the RATs lower WL more when V_TN is low, and less when V_TN is high, to achieve balance between read margin and read speed.
11.2.2 Timing Control

Aside from voltage manipulation, designers can also improve cell stability by decreasing the amount of time the cell is under stress during a read operation. For example, in a design that uses differential sensing, a small bitline voltage drop could be sufficient for sensing the bitcell value. Leaving the wordline on longer than necessary would allow the bitlines to continue to disturb the “0” storage node, leading marginal SRAM cells to flip their values.

In typical designs, the wordline shutoff is triggered on phase or cycle boundaries. If the optimal wordline shutoff time does not align with phase or cycle boundaries, or if the designer prefers to have the wordline high time independent of the frequency, then the designer could employ a pulsed wordline scheme, such as the one used in [11]. The challenge is to design the appropriate pulse width that is just long enough for reads to complete successfully across different process corners and operating conditions.

Figure 11.10 Read and write replica circuits [21]. (© IEEE 2006)

In [15], a read replica path, which uses 12 dummy SRAM cells, was used for generating the shutoff edge for wordlines. The dummy SRAM cells, which resemble real SRAM cells but have internal values hardwired, help the replica path to track the variation in normal read paths. In [21], a write replica circuit was added in addition to the read replica circuits. In general, read operations take more time to complete than write operations. Therefore, it is advantageous to shut off the wordline during a write operation as soon as the write is completed successfully. This prevents unselected columns in a column-multiplexed design from continuing to discharge the bitlines and wasting power. Figure 11.10 is an example illustrating the read and write replica paths together.
The replica bitline (RB) is precharged to VDD through MPC before read or write operations begin. For a read operation, REN activates to “0”, causing the read-replica wordline (RW) to turn on the read dummy cells’ (RC) wordline. The RCs discharge RB, which turns off the wordlines through the WOFF signal. In a write operation, RB is discharged through MWR, which also triggers WOFF. In general, higher V_TN requires the write time to be longer. Therefore, dies with higher V_TN would have a slower discharge through MWR, providing the write operation more time to complete.

The above illustration is just one example of designs using replica circuits. The danger of replica circuits, of course, is that no replica can perfectly track real paths through all process and operating corners. For example, the write replica circuit above does not track PFET variations, which also impact write margin. However, tracking some variation can usually yield more optimal designs than no tracking at all.

11.3 Array Power Reduction

With power-per-performance becoming an important parameter, engineers pay increasing attention to reducing the power of embedded SRAM arrays, which often occupy a large percentage of the total die area. Since activity factor is generally low for large caches, leakage power represents a significant, if not the dominant, portion of the overall cache power. Devices in an SRAM cell typically have channel lengths much greater than the process minimum for variation control; thus, subthreshold leakage has traditionally been limited. However, subthreshold leakage has worsened with recent technology nodes and, more importantly, gate leakage (and in some cases, junction leakage) is getting significantly worse with oxide scaling. As a result, SRAM leakage power now requires careful attention.
Because leakage power has a strong dependence on voltage, many designers have experimented with or implemented “sleeping” the cache’s supply.

11.3.1 Sleep Types

In general, cache “sleep” involves providing inactive SRAM cells, which do not experience read-disturb, with a lowered supply to achieve power savings. The lowered supply must be high enough to allow the inactive cells to maintain their data. Then, before the cells are accessed, they are “woken up” by providing a higher supply that can fulfill both read-disturb and access speed requirements.

The most straightforward implementation of cache sleep involves providing the cache with two separate, external supplies. However, a second supply is an expensive solution, so realistic implementations often choose to generate and regulate the second supply locally. In general, these implementations fall into two categories – active and passive. “Active sleep” schemes try to actively maintain the reduced voltage at a certain level, while “passive sleep” schemes rely on voltage division or threshold voltage to determine the reduced voltage.

11.3.1.1 Active Sleep

Khellah et al. [10] used an op-amp to help control the reduced supply; Figure 11.11 illustrates its general concept. When the arrays are active, “wake” causes SramVSS to be connected to VSS through the strong NFET. During idle mode, the strong NFET is turned off, allowing SramVSS to float. SramVSS will rise due to array leakage, but the op-amp will prevent SramVSS from rising above VREF. Of course, VDD – VREF must be greater than the SRAM cells’ standby VccMin, which is the minimum voltage at which cells are stable, to maintain cell data. In this implementation, VREF is externally supplied for ease of controllability. Also, an “early wake” signal is provided ahead of “wake” to reduce the ground-bounce noise due to the sudden discharge of SramVSS. Jumel et al. [8] used a concept similar to the previous example, but took it a step further.
As shown in Figure 11.12, an on-chip bandgap reference generates a reference voltage that is stable across PVT. In addition, the voltage regulator is designed to track VDD, so a higher VDD would also allow SramVSS to rise, maintaining VDD – SramVSS close to VccMin. Finally, the output of this regulator is trimmed on a die-by-die basis at wafer probe to account for process variations.

Figure 11.11 Active sleep control [10]. (© IEEE 2006)

Figure 11.12 Active sleep control with bandgap reference and VDD tracking [8]. (© IEEE 2006) Courtesy of Philippe Royannez: Texas Instruments, Inc.

11.3.1.2 Passive Sleep

One straightforward way to generate a reduced supply is to use a diode, such as in [1] and illustrated in Figure 11.13. When SramVSS rises to the diode’s threshold voltage, the diode would clamp SramVSS. The downside to this scheme is its inflexibility, as the clamping voltage is determined primarily by just the threshold voltage, and cannot be optimized for different supply voltages.

Figure 11.13 Diode clamping sleep voltage.

Figure 11.14 Bias generator with replica transistors [18]. (© IEEE 2006)

The example shown in Figure 11.14 aims to remove the SRAM supply’s dependency on VDD [18]. Rather than setting the array supply to VDD – V_T, which can vary depending on VDD, the array supply depends only on transistor threshold voltages, as specified in Equation (11.1).

$$\text{Array supply} = 2 \cdot \max\bigl(V_T(M_N),\, V_T(M_P)\bigr) \tag{11.1}$$

In this implementation, the array supply voltage specified in Equation (11.1) is assumed to be sufficient for satisfying VccMin requirements. To adapt to different PVT conditions, the bias generator is built using replica transistors.
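The contrast between the diode clamp and the replica-based bias of Equation (11.1) can be sketched numerically. This is a minimal illustration, not the circuit of [18]: the function names and the threshold-voltage values are hypothetical, and the diode clamp is modeled simply as holding SramVSS one threshold voltage above VSS.

```python
def diode_clamp_array_supply(vdd, vt_diode):
    """Array supply (VDD - SramVSS) when a diode clamps SramVSS at one V_T."""
    return vdd - vt_diode


def replica_bias_array_supply(vt_n, vt_p):
    """Array supply per Equation (11.1): 2 * max(V_T(M_N), V_T(M_P))."""
    return 2.0 * max(vt_n, vt_p)


if __name__ == "__main__":
    vt_n, vt_p = 0.35, 0.40  # hypothetical threshold voltages, in volts
    for vdd in (1.0, 1.1, 1.2):
        diode = diode_clamp_array_supply(vdd, vt_n)
        replica = replica_bias_array_supply(vt_n, vt_p)
        print(f"VDD={vdd:.2f} V  diode clamp: {diode:.2f} V  replica bias: {replica:.2f} V")
```

With these assumed numbers, the diode-clamped array supply moves with VDD, while the replica-biased array supply stays at twice the larger threshold voltage regardless of VDD.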
The two replica load PFETs drop A1’s voltage to

$$\mathrm{A1} = \mathrm{VDD} - 2 \cdot V_T(M_P) \tag{11.2}$$

Similarly, the two replica driver NFETs drop A2’s voltage to

$$\mathrm{A2} = \mathrm{VDD} - 2 \cdot V_T(M_N) \tag{11.3}$$

Finally, the matching P1 and P1’ FETs clamp SramVSS at A1, while the matching P2 and P2’ FETs clamp SramVSS at A2. The resulting SramVSS is the lower of A1 and A2, producing Equation (11.1).

[...] Hamzaoglu F, Pandya G, Farhang A, Zhang K, De V (2006) A 4.2GHz 0.3mm2 256kb Dual-Vcc SRAM Building Block in 65nm CMOS. ISSCC Dig Tech Papers, pp 2572–2573
[11] Khellah M, Ye Y, Kim NS, Somasekhar D, Pandya G, Farhang A, Zhang K, Webb C, De V (2006) Wordline & Bitline Pulsing Schemes for Improving SRAM Cell Stability in Low-Vcc 65nm CMOS Designs. Symp VLSI Circuits Dig Tech Papers, pp 9–10
[12] Kim C, Kim...
[...] Ciroux J, Raibaut C, Ko U (2006) A Leakage Management System Based on Clock Gating Infrastructure for a 65-nm Digital Base-Band Modem Chip. Symp VLSI Circuits Dig Tech Papers, pp 214–215
[9] Kaxiras S, Hu Z (2001) Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power. Int Symp Comput Architect., pp 240 25
[10] Khellah M, Kim...
[...] Logic VDD and Dynamic Power Rails. Symp VLSI Circuits Dig Tech Papers, pp 292–293
Chang J, Huang M, Shoemaker J, Benoit J, Chen SL, Chen W, Chiu S, Ganesan R, Leong G, Lukka V, Rusu S, Srivastava D (2007) The 65-nm 16MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series. IEEE J Solid-State Circuits vol 42 no 4, pp 846–852
Cheng W, Pedram M (2001) Memory Bus Encoding for Low Power:...
[...] chapter surveyed dynamic and adaptive techniques in the area of SRAM design that seek to improve read and write margins, reduce power, and improve reliability. Dynamic voltage optimization, especially column-based techniques that can independently improve both read and write margins in a column-multiplexed design, can be very effective. Silicon results from [22] demonstrated 10x reduction in random single-bit...
[...] H, Ishibashi K, Shinohara H (2007) A 65nm SoC Embedded 6T-SRAM Designed for Manufacturability With Read and Write Operation Stabilizing Circuits. IEEE J Solid-State Circuits vol 42 no 4, pp 820–829
[15] Osada K, Shin JL, Khan M, Liou Y, Wang K, Shoji K, Kuroda K, Ikeda S, Ishibashi K (2001) Universal-Vdd 0.65-2.0-V 32-kB Cache Using a Voltage-Adapted Timing-Generation Scheme and a Lithographically Symmetrical...
[...] N-diffusion leakage. Therefore, the proper choice between P and N sleep should be evaluated based on the specific process and SRAM cell design.

11.3.3 Entering and Exiting Sleep

The goal for sleep mode is to reduce power consumption. However, each time the cache enters or exits sleep mode, some active power is dissipated. For example, the “wake”...
[...] are generally sufficient for SRAM arrays under typical use.

11.4.2 Hard Errors

Hard errors such as latent defects and test escapes are not detected during silicon testing, but can surface in the field. Rather than accepting these failures as true defects, dynamic techniques exist for arrays to tolerate such failures.

11.4.2.1 Cache Line Disable

One dynamic technique dynamically disables cache...
[...] vol 36 no 11, pp 1738–1744
[16] Sakran N, Yuffe M, Mehalel M, Doweck J, Knoll E, Kovacs A (2007) The Implementation of the 65nm Dual-Core 64b Merom Processor. ISSCC Dig Tech Papers, pp 106–107
[17] Seevinck E, List FJ, Lohstroh J (1987) Static-Noise Margin Analysis of MOS SRAM Cells. IEEE J Solid-State Circuits vol 22 no 5, pp 748–754
[18] Takeyama Y, Otake H, Hirabayashi O, Kushida K, Otsuka N (2006) ...
[...] 4, pp 815–822
[19] Wuu J, Weiss D, Morganti C, Dreesen M (2005) The Asynchronous 24MB On-chip Level-3 Cache for a Dual-core Itanium Family Processor. ISSCC Dig Tech Papers, pp 488–489
[20] Yamaoka M, Shinozaki Y, Maeda N, Shimazaki Y, Kato K, Shimada S, Yanagisawa K, Osada K (2004) A 300MHz 25uA/Mb Leakage On-Chip SRAM Module Featuring Process-Variation Immunity and Low-Leakage-Active Mode for Mobile-Phone-Application...
[...] inverted before being sent to the long wires. In addition, the inversion bit is set to keep track of the data’s polarity. Such a scheme reduces the worst case number of transitions to half of the total number of bits, thus saving worst case power and improving worst case di/dt. However, savings for typical, random data would be lower, as the percentage...
[22] Zhang K, Bhattacharya...
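The bus-inversion excerpt above is truncated before it states when the data is inverted. The following is a minimal sketch that assumes the common rule of inverting whenever more than half of the data wires would otherwise toggle; that rule, the function names, and the 4-bit width are assumptions for illustration, not details recovered from the chapter.

```python
def bus_invert_encode(prev_bus, data, width):
    """Return (bus_value, invert_bit) for sending `data` after `prev_bus`.

    Assumed rule: if more than half of the `width` wires would toggle,
    send the bitwise complement and set the inversion bit instead.
    """
    mask = (1 << width) - 1
    toggles = bin((prev_bus ^ data) & mask).count("1")
    if 2 * toggles > width:
        return (~data) & mask, 1
    return data & mask, 0


def bus_invert_decode(bus, invert_bit, width):
    """Recover the original data word from the bus value and inversion bit."""
    mask = (1 << width) - 1
    return (bus ^ mask) if invert_bit else (bus & mask)


if __name__ == "__main__":
    width = 4
    prev = 0b0000
    for word in (0b1111, 0b0001, 0b1110):
        bus, inv = bus_invert_encode(prev, word, width)
        toggled = bin(prev ^ bus).count("1")
        print(f"data={word:04b} bus={bus:04b} invert={inv} data-wire toggles={toggled}")
        prev = bus
```

Under this assumed rule, the number of data-wire transitions per transfer never exceeds half the bus width, which matches the worst-case bound quoted in the excerpt. The inversion bit itself is an extra wire that may also toggle.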
