Handbook of algorithms for physical design automation part 31 doc

Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Finals Page 282 24-9-2008 #7 282 Handbook of Algorithms for Physical Design Automation the design unroutable. A strategy of simple uniform spreading of the design may work well for dense designs, but can unnecessarily hurt timing for sparse designs. Thus, white space management is absolutely required in modern placement algorithms to achieve better timing and routability. 3. Mixed-size placement: For the past few decades, standard cell placement is considered as the norm. A standard cell based design consists of circuit elements called standard cells whose heights remain the same, and the placement problem is to place these circuit elements within regular circuit row boundaries (Figure14.5a).In today’s design methodology, designers are encouraged to take hierarchical or semihierarchical design approaches with reusable internal/third-party IPs to reduce design turnaround time. As a result, a wider distribution of circuit element sizes is observed during placement. In some sense, mixed-size placement (Figure 14.5b), as opposed to uniform-height placement of standard cells, is a more complicated problem because these large macro blocks can cause a serious chal- lenge during placement legalization. Also, these chunky blocks play an important role in determining final timing performance. Hence, the early placement of large macros (during floorplanning or flat placement) is an important problem in a modern timing closure flow. To provide further flexibility in floorplanning and placement, sometimes all the standard cells as well as large macros are considered as movable objects and are placed simultaneously. This new problem is called f loorplacement [8]. In general, today’s placement algorithms must be able to handle a wide range of object sizes, because the trend indicates that more IP blocks are included in a design. 4. Region constraints (movebounds): In a hierarchical design methodology, the functionality of a design is logically partitioned first and floorplanning is executed on the set of logically partitione d blocks to determine the approximate locations of those logical blocks. Circuit elements belonging to the same logical partition are groupedtogether and need to be placed in the vicinity of the layout region. This is in contrast to the top-down flat physical design process where lo gical partitioning is flatten ed and circuit elements are freely placed and routed at the leaf level of the logical hierarchy. In flat physical synthesis, physical partitions do not necessarily correlate with logical partitions. Although flat physical synthesis usually produces a better quality of result, the tight turnaround time requirement makes hierarchical synthesis a more viable solution in today’s environment. In a hierarchical synthesis flow, (a) (b) FIGURE 14.5 Standard cell placement versus mixed-size placement. (a) Standard cell placement and (b) mixed-size placement. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Finals Page 283 24-9-2008 #8 Placement: Introduction/Problem Formulation 283 a region constraint (also known as a movebound) is usually employed to force circuit elements belonging to the same logic partition to be placed within a predetermined layout area. Essentially, a region constraint is a p redetermined boundary where a set of circuit elements has to be placed and routed. Multiple region constraints may exist in a design, but n ot all the circuit elements are constrained b y a region constraint either. Some circuit elements can be placed anywhere in layout area while others must be placed within the corresponding boundary defined by region constraints. Another motivation f or region constraints is multiple clock/voltage domains in a desig n. In modern circuit design, multiple clock domains or voltage domains are frequently used due to performance and power dissipation trade-offs. For example, computationally unimportant parts in a chip can be slowed down to lower clock frequency while critical path computations have to be executed at the highest clock frequency. The lowered clock frequency results in the saving of unnecessary power consumption. The clock network typically consumes a significant portion of the overall chip power [9] and the size of the clock network can serve as a first-order approximation of the clock network power consumption.Hence, a smaller clock domain area is always preferred, if possible, to reduce clock network power dissipation. These clock domains are similar to logical partitions and region constraints can facilitate to define and reduce the size of clock domain. However, region constraints can make the p lacement algorithm complicated and modern placement algorithms must be capable of handling these unforeseen constraints without affecting the quality of results. 5. Clock-awareplacement: Clocknets typically have muchmore sink pins to drive than normal data signalnets. This isbecause anysequential circuit elements(latches or flip-flops) require global clock signals to syn c hronize with each other. Sometimes, there are multiple clock domains in a design due to performance and power consumption trade-offs. Because of the high fan-out nature of clock nets, in a typical placement process, clock nets are ignored during the optimization as they tend to degrade the quality of placement so lu tion. Moreover, the higher frequency design constraint forces a design to include more sequential circuit elements resulting in larger clock domain and higher clock power consumption. Recent studies show that the power budget of clock nets and networks amounts to more than 40 percent of overall chip power[9]. Even worse,each sinkof a clocknet needsto havethe same signal propagation delay from theclock source.In reality, eachsink hasdifferentclock signal propagation delay and the maximum delay difference of pairsof two sink pins iscalled clock skew. One of the objectives of clock network construction is to minimize the clock skew. Because clock nets are sensitive to technology variations due to their large network size and high frequency constraint, the placement of sequential circuit elementsaffects the clock network performance significantly. Postplacement clock network construction tends to fail more frequentlydue to the tighter skew, latency, andpower constraints. Recent research [10] also shows that by considering these clock network constraints during quadratic placement, higher clock network performance (less skew, clock latency, and power consumption) can be achieved without almost any loss of data signal performance. Essentially, they tried to reduce clock network wirelengths by navigating potential locations of sequential circuit element during placement via register contraction techniques. More advanced technology andcorrespondingdesignparadigmswillrequirehigherclockfrequencyandmorerobustness to technology variations. Thus, the combined placement and clock network construction optimization has great promises for better layout solutions. 6. Scalability: As the complexity of a design grows exponentially, the corresponding number of gates in a design is expected to grow at a steep rate. Hierarchical design and the design reuse paradigm can help to manage the siz e of a design. Yet, a multimillion gate count is considered a norm in modern IC design [11,12]. Owing to large design sizes, placement runtime tends to be the bottleneck of the overall timing closure flow and the reduction of placement runtime, i.e., scalable placement algorithms, arises as a critical problem. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Finals Page 284 24-9-2008 #9 284 Handbook of Algorithms for Physical Design Automation Recently, the multilevel paradigm shows great promise to make placement algorithms more scalable with design size [13]. The multilevel method consists of three processes: coarsened abstract structure generation (coarsening or aggregation), optimization on the coarsened structure (relaxation), and transforming optimized solutions to an uncoarsened structure (interpolation). The simplest method of applying multilevel optimization in placement is a so-called “V-cycle” [13] approach where consecutivecoarsenings are executed to obtain the most reduced design, then iterative relaxation, interpolation, and uncoarsening are applied to convert it to an optimized, flattened level. Combined with netlist clustering algorithms, multilevel optimization was demonstrated to provide a significant runtime reduction without almost any degradation of (sometimes even improved) quality of solutions [12]. Chapter 19 addresses multilevel techniques and their application in placement. 7. Stability: To achieve timing closure and design convergence, typically several instances of placement have to be run. Placement can help to identify needed changes in the logic, required buffering, gate sizing, routing congestion, etc. Once these problems are fixed, placement may have to be run again. Ideally, after each subsequent placement run, the problems that were fixed during the last iteration stay fixed and new problems do not crop up. However, if a placement algorithm returns a dramatically different solution from the last solution, entirely new problems could emerge. In other words, a placement algorithm needs to produce similar solutions when almost the same input instance (albeit with a slightly different netlist or constr aints) is given. This stability is a particularly important issue in a timing closure flow where multiple invocations of placement are necessary. The quantification of the degree of stability of placement algorithms is also an important topic for further research [14,15]. 8. Macroblock (random logic module) placement versus ASIC placement: In microprocessor design, blocks with tight timing constraints such as data-paths, floating point units, etc. are still custom designed and layouts are produced by a human being. However, significant portion of macroblocks, for example, control logic modules, consist of randomlogic. These random logicmodules are typically designedin HDL(high-level description language), syn- thesized and laid out via design automation tools. The characteristics of these random logic modules are quite different from those of ASIC designs. Owing to the high-performance nature of microprocessor design, the target timing constraint is extremely tight compared with that of ASIC design.The latchesand leaflevelclock distribution buffersare much larger than standard cells and the locations of these objects tend to affect final timing performance considerably. The number o f circuit elements to be placed is order of magnitude smaller than ASIC placement prob lems. Thus, scalability is not an issue in macroblock placement. Rather more accurate modeling of timing with enhanced optimization techniques (at the cost of runtime, of course) is more important in this placement problem domain. 9. Three-dimensional placement: Traditionally, placement is formulated as a two-dimensional problem that places circuit elements in a 2D plane. Recent advances in packagetechnology, however, enable chips to be piled up a nd interconn ected together so that more functiona l blocks and logics can be inserted to chip designs [16]. Including more logics is not only a predominant advantage of 3D chip integration but also introduces new technical problems. The primary concern of 3D integration is a thermal issue. Because multiple circuit planes are stacked up tightly, it is more difficult to dissipate heat, particularly in the middle planes. Hotspots of integrated circuitsin modern technology have adverseimpacts oncircuit performance because temperature directly affects the subthreshold voltage of transistors, resulting in slower responses and signal propagations. Heat dissipation can be addressed via thermalgradient considerationduring floorplanning/placement[17] or inserting thermal vias to facilitate heat flow [18]. Another principal concern of 3D integration is the signal propagation among different circuit planes. The signal delay from one plane to another is an order-of-magnitude higher than that of the same plane. Because the circuit performance, Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Finals Page 285 24-9-2008 #10 Placement: Introduction/Problem Formulation 285 i.e., maximum frequency of a design, is limited by the maximum delay of timing critical paths in a design, reducing plane-to-plane signal delay is an important issue in 3D placement. Therefore, partition ing and p lacement of logics for planes h ave consequential effects on the final performance of design and plane-to-plane interconnect needs to be considered during placement. These unforeseen concerns of 3D integration require a new formulation of 3D placement and open new opportunities for placement research. In addition to these issues, more and more technology constraints are being considered during placement, such as power/thermal constraints, power/ground network, IR drop constraints, etc. [17, 19–21]. This is because a placement solution can directly affect the final q uality of solution of design closure. This trend will carry on as long as technology scaling continues, and it confirms that placement remains the most important problem in design closure. 14.4 GENERAL APPROACHES TO PLACEMENT Placement algorithms are typically based on a simulated annealing, top-down cut-based partitioning, or analytical paradigm (or some combination thereof). Simulated annealing is an iterative optimization method that mimics the physical metal cooling process. With the given objective function, the process tries to achieve a better solution via a set of predefined moves. If a move improves the objective function, it is always accepted. If a move produces a worse solution, it is accepted b ased on some probability function. At early stages (with high temperature), a bad move has higher chance to get accepted while at later stages of placement (with lower temperature), the probability goes down exponentially. These worse-yet-accepted moves are essential for a simulated annealing placement algorithm to overcome a local optimum solution in which a placement might be stuck. A greedy move-based placement tool cannot escape from this local optimum once it steps into one.The typical set of moves in a simulated annealing placement algorithm are (1) relocation of a circuit element into new position, (2) exchanges of two circuit elements’ locations or (3) mirroring/rotation of a circuit element at the same location, etc. As mentioned earlier, the typical objective function is total wirelength. Recently, other factors such as routability, power, area, and even signal integrity metrics are directly modeled in the objective function of simulated annealing placement tools, because an iterative optimization approach is very flexible to model these nonconventional multidimensional objective functions. However, simulated annealing placement is regarded as a rather slow method compared with other placement algorithms as the design size grows. Chapter 16 provides more detailed discussion of simulated annealing placement algorithms. The advent of flat/multilevel partitioning as afast and effectivealgorithm for min-cut partitioning has helped to spawn off a new generation of top-down cut-based placement tools. A p lacer in this class partitions circuit elements into either two (bisection) or four (quadrisection) regions of the chip, then recursively (following breadth-firstsearch order) partitions each region until a good coarse placement solution is achieved [22]. When each region is partitioned, circuit elements outside the region are assumed to be fixed at the current locations and pseudopins are created around the region under consideration. This is called a terminal propagation [23]. Because the basic algorithm is based on partitioning, the typical objective function is the number of netcuts between subregions. Finding a good partitioning indicates that good logical clustering of circuit elements are found with less communications among them that can lead to a better total wirelength. To speed up the algorithm, multilevel clustering can be combined with a partitioning-based placement. In general, cut-based multilevel partitioning placement can be performed quite well particularly when designs are dense. Also, partitioning-based placement is a relatively fast placement algorithm. Partitioning- based placement algorithms are discussed in Chapter 15, and the fundamental partitioning concept and algorithm itself can be found in Chapter 7. Analytical placementalgorithms typically solve a relaxed placementformulation (e.g., minimum total squared wirelength) optimally, allowing cells to temporarily overlap. Legalization is achieved Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Finals Page 286 24-9-2008 #11 286 Handbook of Algorithms for Physical Design Automation by removing overlaps via either partitioning or by introducing additional forces or constraints to gen- erate a new optimization problem. The formulation of these methods models the mechanical spring network.Each net represents a spring that attracts circuit elements connected to the net. The optimum solution represents the equilibrium state of the given spring network. Analytic placers can perform poorly when the data is naturally degenerate (which occurs when no fixed object exists) b ecause it becomes difficult to legalize a placement where thousands of circuit elements are placed virtually at the same location. Also, analytical methods may have difficulties in dense designs where legalization is forced to significantly alter the analytic solution. The new breed of analytic placement algorithm, dubbed as forced-based placement showed great promise recently. Force-based placement adds additional forces to the formulation that pull circuit elements from high-density regions to low ones. The key point is to achieve a b etter distributed placement solution by integrating these spreading forces into a formulation, instead of relying on explicit partitionin g or other techniques. There are a variety of techniques for cell spreading in f orce-based analytic placement techniques. During placement, some form of density analysis is performed to calculate spreading forces. Once the spreading forces are determined, these forces can be applied to each circuit element via constant forces, or explicit fixed-point methods. Sometimes, a density (overlapping) penalty function is included in the placement objective f unction explicitly so that nonlinear optimization can minimize total wirelengths and overlappingsimultaneously. In this nonlinear optimizationframework, linear wirelength approximation can also be included to minimize half-perimeter bounding box wirelength d irectly. To speed up the convergence of optimization, circuit clustering techniques can be combin ed with these an alytic placement algorithms. These clustering techniques can not only reduce the runtime of placement but also improve the quality of placement solutions. T he gene ral analytical placement algorithm is presented in Chapter 17 and the new force-directed methods are discussed in Chapter 18. Recently, the ISPD (International Symposium on Physical Design) Conference hostedtwo placement contests in 2005 and 2006 for the academic placement research community. The contests provided a common platform where various placement algorithms can be evaluated on the same set of realistic large-scale ASIC designs. Particularly, the new placement benchmark circuits that were released during the contests have set a new bar for requirements of modern placement capability. By providing a common basis for quantitative measurements of contempo r ary placement algorithms, researchers were able to publicize their placement tools and results and d iscuss the pros and cons of different breeds of placement algorithms. For more serious placement r esearchers, ISPD placement contests can serve as a good starting point for further in-depth discussions of placement algorithms and implementations [11,24]. REFERENCES 1. P. G. Villarrubia, Important placement considerations for modern VLSI chips, invited talk at International Symposium on Physical Design, San Diego, CA, 2000. 2. S. M. Sait and H. Youssef, VLSI Physical Design Automation: Theory and Practice, World Scientific, River Edge, NJ, 1999. 3. M. Sarrafzadeh and C. K. Wong, An Introduction to VLSI Physical Design, McGraw-Hill, NY, 1996. 4. N. A. Sherwani, Algorithms for VLSI Physical Design Automation, Kluwer Academic Publishers, Norwell, MA, 1999. 5. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and C ompany, NY, 1979. 6. A. B. Kahng, Classical floorplanning harmful? in Proceedings of International Symposium on Physical Design, 2000, San Diego, CA, pp. 207–213. 7. C. Alpert, G. -J. Nam, and P. G. Villarrubia, Effective free space management for cut-based placement via analytical constraint generation, IEEE Transactions on Computer-Aided Design, 22(10): 1343–1353, 2003 (ICCAD 2002). 8. S. N. Adya and I. L. Markov, Combinatorial techniques for mixed-size placement, ACM Transactions on Design Automation of Electronic Systems, 10(5): 2005. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Finals Page 287 24-9-2008 #12 Placement: Introduction/Problem Formulation 287 9. D. E. Duate, N. Vijaykrishnan, and M. J. Irwin, A clock power model to evaluate impact of architectural and technology optimization, IEEE Transactions on VLSI, 10(6): 844–855, December 2002. 10. Y. Lu, C. -N. Sze, X. Hong, Q. Zhou, Y. Cai, L. Huang, and J. Hu, Navigating registers in placement for clock network minimization, in Proceedings of Design Automation Conference, Anaheim, CA, 2005, pp. 176–181. 11. G. -J. N am and J. Cong (Eds.), Modern Circuit Placement: Best Practices and Results, Springer Ver lag, NY, 2007. 12. G J. Nam, S. Reda, C. J. Alpert, P. G. Villarrubia, and A. B. Kahng, A fast hierarchical quadratic placement algorithm, IEEE Transactions on Computer-Aided Design of Circuits and Systems, 25(4): 678–691, April, 2006 (ISPD 2005). 13. J. Cong and J. Shinnerl (Eds.), Multilevel Optimization in VLSI CAD, Kluwer Academic Publishers, AA Dordrecht, the Netherlands, 2003. 14. S.N. Adya, I.L. Markov, andP.G. Villarrubia, On whitespace andstabilityin physical synthesis, Integration: The VLSI Journal, 39(4): 340–362, 2006 (ICCAD 2003). 15. C. Alpert, G. -J. Nam, P. G. Villarrubia, and M. Yildiz, Placement stability metrics, in Proceedings of A sia South Pacific Design Automation Conference, Shanghai, China, 2005, pp. 1144–1147. 16. R. Montoye, The four degrees of 3D, invited talk at International Symposium on Physical Design, Phoenix, AZ, 2004. 17. B. Goplen and S. Sapatnekar, Efficient thermal placement of standard cells in 3D ICs using a force directed approach, in Proceedings of International Conference on Computer-Aided Design, 2003, pp. 86–90. 18. J. Cong and Y. Zhang, T hermal via planning for 3-D ICs, in P roceedings of International Conference on Computer-Aided Design, San Jose, CA, 2005, pp. 745–752. 19. Y. Cheon, P. -H. Ho, A. B. Kahng, S. Reda, and Q. Wang, Po wer-aware placement, in Proceedings of Design Automation Conference, Anaheim, CA, 2005, pp. 795–800. 20. A. B. Kahng, B. Liu, and Q. Wang, Supply voltage degradation aware analytical pl acement, in Proceedings of International Confer ence on Computer Design, San Jose, CA, 2005, pp. 437–443. 21. J. Lou and W. Chen, Crosstalk-aware placement, IEEE Design and Test of Computers, 21(1): 24–32, 2004. 22. A. E. Caldwell, A. B. Kahng, and I. L. Markov, Can recursiv e bisection a lone produce routable placement? in Proceedings of Design Automation Conference, Los Angeles, CA, 2000, pp. 477–482. 23. A. E. Dunl op and B . W. Kernighan, A procedure for placement of standard cell VLSI circuits, IEEE Transactions on Computer-Aided Design of Integrated Circuits, 4(1): 92–98, 1985. 24. Available at http://www.ispd.cc/contests. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Finals Page 288 24-9-2008 #13 Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 Finals Page 289 29-9-2008 #2 15 Partitioning-Based Methods Jarrod A. Roy and Igor L. Markov CONTENTS 15.1 Top-Down Partitioning-Based Placement Framework 290 15.1.1 Terminal Propagation and Inessential Nets 291 15.1.2 Bipartitioning versus Multiway Partitioning 291 15.1.3 Cutline Selection and Shifting 291 15.1.4 Whitespace Allocation 292 15.1.4.1 Free Cell Addition 292 15.2 Enhancements to the Mincut Framework 293 15.2.1 Better Results through Additional Partitioning 293 15.2.2 Fractional Cut 294 15.2.3 Analytical Constraint Generation 294 15.2.4 Better Modeling of H PWL by Partitioning 295 15.3 Mixed-Size Placement 296 15.3.1 Floorplacement 296 15.3.2 PATOMA and PolarBear 298 15.3.3 Fractional Cut for Mixed-Size Placement 298 15.3.4 Mixed-Size Placement in Dragon2006 299 15.4 Advantages of Mincut Placement 299 15.4.1 Flexible Whitespace Allocation 299 15.4.2 Solving Difficult Instances of Floorplacement 300 15.4.2.1 Selective Floorplanning with Macro Clustering 301 15.4.2.2 Ad Hoc Look-Ahead Floorplanning 303 15.4.3 Optimizing Steiner Wirelength 303 15.4.3.1 Pointsets with Multiplicities 304 15.4.3.2 Performance 305 15.4.4 Incremental Placement 305 15.4.4.1 General Framework 305 15.4.4.2 Handling Macros and Obstacles 307 15.5 State-of-the-Art Mincut Placers 307 15.5.1 Dragon 307 15.5.2 FengShui 307 15.5.3 NTUPlace2 307 15.5.4 Capo 308 References 308 Over the years, partitioning-based placement has seen many revisions and enhan cements, but the underlying framework illustrated in Figures15.1 and 15.2 remains much the same. Top-down partitioning- based placement algorithms seek to decompose a given placement instance into smaller instances by subdividing the placement region, assigning modules to subregions, and cutting the 289 Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 Finals Page 290 29-9-2008 #3 290 Handbook of Algorithms for Physical Design Automation Variables: queue of placement bins Initialize queue with top-level placement bin 1 While (queue not empty) 2 Dequeue a bin 3 If (bin small enough) 4 Process bin with end-case placer 5 Else 6 Choose a cut-line for the bin 7 Build partitioning hypergraph from netlist and cells contained in the bin 8 Partition the bin into smaller bins (generally via min-cut bisection or quadrisection) 9 Enqueue each child bin FIGURE 15.1 Top-down partitioning-based placement. netlist hypergraph [7,19]. The top-down placement process can be viewed as a sequence of passes where each pass examines all bins and divides some of them into smaller bins. The division step is commonly accomplished with balanced mincut partitioning, which minimizes the number of signal nets connecting modules in multiple regions [7]. These techniques leverage well-understood and scalable algorithms for hypergraph partitioning and typically lead to routable placements [9]. Recent work offers extensions to block placement, large-scale mixed-size placement [15,18,31] and robust incremental placement [33]. 15.1 TOP-DOWN PARTITIONING-BASED PLACEMENT FRAMEWORK Using mincut p artitioning in placement was presented by Breuer in 1977 [7]. The underlying framework remains mostly the same and is illustrated in Figures 15.1 and 15.2. The core area is comprised of a series of placement bins which represent (1) a placement region with allowed module locations (sites), (2) a collection of circuit modules to be placed in this region, (3) all signal nets incident to the modules in the region, and (4) fixed cells and pins outside the region that are connected to modules in the region (terminals). End-case placement Placement bin 4 3 2 1 etc. FIGURE 15.2 The overall process of top-down placement is shown on the left. The placement area and netlist are successively divided into placement bins until the bins are small enough for end-case placement. One important enhancement to top-down placement is terminal propagation, shown on the right. The net in question has five fixed terminals: four above and one below the cutline. It also has movable cells, which are represented by the cell with a dashed outline. The four fixed terminals above the cutline are propagated to the black circle at the top of the bin while the one fixed terminal below the cutline is propagated to the black circle below the cutline. The movable cells remain unpropagated. Note that the net is inessential because terminals are propagated to both sides of the cutline. (From Roy, J. A. and Markov, I. L., IEEE Trans. CAD, 26, 632, 2007. ) Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 Finals Page 291 29-9-2008 #4 Partitioning-Based Methods 291 Mincut partitioning-basedplacers pro ceed by dividing the netlist and placemen t area into successively smaller pieces until the pieces are small enough to be handled efficiently by optimal end-case placers [11]. State-of-the-art p lacers generally use awide range ofhy pergraph p artitioning techniques to best fit partitioning problem size—optimal (branch-and-bound [11]), middle-range (Fiduccia– Mattheyses [20]), and large-scale (multilevel Fiduccia–Mattheyses [10, 26]). Mincut placement is highly scalable (due in large part to algorithmic advances in mincut partitioning [10,20,26]) and typically produces routable placements. In this section, we introduce topics relevant to top-down partitioning-based placement that must be addressed by all modern mincut placers. Specifically these include terminal propagation, bipartitioning versus multiway partitioning, cutline selection, and whitespace (or free space) allocation. 15.1.1 TERMINAL PROPAGATION AND INESSENTIAL NETS Proper handling of terminals is essential to the success of top-down placement approaches [11,19, 21,37]. When a placement bin is split into multiple subregions, some of the cells inside may be tightly connected to cells outside of the bin. Ignoring such connections can adversely affect the quality of a placement because they can account for significant amounts o f wirelength. On the other hand, these terminals are irrelevant to the classic partitioning formulation as they cannot be freely assigned to partitions. A compromise is possible by using an extended formulation of partitioning with fixed terminals, where the terminals are considered to be fixed in (propagated to) one or more partitions, and assigned zero areas (original areas are ignored). Nets propagated to both partitions in bipartitioning are considered inessential because they will always be cut and can be safely removed from the partitioning instance to improve runtime [11]. Terminal propagation is typically driven by geometric proximity of terminals to subregions/partitions. Figure 15.2 (right) depicts terminal propagation for a net with several fixed terminals. This particular net is inessential for bipartitioning as it ha s terminals prop a gated to both sid es of the cutline. 15.1.2 BIPARTITIONING VERSUS MULTIWAY PARTITIONING In his seminal work on mincut placement, Breuer introduced two forms of recursive mincut placement: slice/bisection and quadrature [7]. The style of mincut placement most commonly u sed today has grown from the quadrature technique, which advocated the use of horizontal and vertical cuts; the slice/bisection technique used only horizontal cuts and exhibited worse performance than quadrature [7]. Since that time, horizontal and vertical cutlines have been standard in all placement techniques, but there has been debate as to whether there should be an ordering to the cuts (i.e., horizontally bisect a bin then vertically bisect its children as in quadrature [7]) or both cuts should be done simultaneously as in quadrisection [37]. Quadrisection has been sh own to allow f or the optimization of techniquesother than mincut(such as minimalspanning tree length [21]),butterminal propagation is more complex when splitting a bin into four child bins instead of two. Also, bisection can simulate quadrisection with added flexibility in cutline selection and shifting (see Section 15.1.3) [31]. There are currently no known implemen tations that use greater than four-way partitioning and the majority of partitionin g-based placement techniques involve mincut bipartitioning. 15.1.3 CUTLINE SELECTION AND SHIFTING Breuer studied two types of cutline direction selection techniques and found that alternating cutline directions from layer to layer produced better half-perimeter wirelength (HPWL) than using only horizontal cuts [7]. The authors of Ref. [40] studied this phenomenon further by testing 64 cutlin e direction sequences. Their experiments did not find that the two cut-sequences that alternate at each layer were the best, but did find that long sequences of cuts in the same dir ection during placement . Alpert /Handbook of Algorithms for Physical Design Automation AU7242_C014 Finals Page 282 24-9-2008 #7 282 Handbook of Algorithms for Physical Design Automation the design unroutable. A strategy of. the 289 Alpert /Handbook of Algorithms for Physical Design Automation AU7242_C015 Finals Page 290 29-9-2008 #3 290 Handbook of Algorithms for Physical Design Automation Variables: queue of placement bins Initialize. Combinatorial techniques for mixed-size placement, ACM Transactions on Design Automation of Electronic Systems, 10(5): 2005. Alpert /Handbook of Algorithms for Physical Design Automation AU7242_C014