Microsoft SQL Server 2008 R2 Unleashed- P121 ppsx

CHAPTER 34 Data Structures, Indexes, and Performance

Deleting Rows

What happens when rows are deleted from a table? How, and when, does SQL Server reclaim the space when data is removed from a table?

Deleting Rows from a Heap

In a heap table, SQL Server does not automatically compress the space on a page when a row is removed; that is, the rows are not all moved up to the beginning of the page to keep all free space at the end, as SQL Server did in versions prior to 7.0. To optimize performance, SQL Server holds off on compacting the rows until the page needs contiguous space for storing a new row.

Deleting Rows from an Index

Because the data pages of a clustered table are actually the leaf pages of the clustered index, the behavior of data row deletes on a clustered table is the same as row deletions from an index page.

When rows are deleted from the leaf level of an index, they are not actually deleted but are marked as ghost records. Keeping the row as a ghost record makes it easier for SQL Server to perform key-range locking (key-range locking is discussed in Chapter 37, “Locking and Performance”). If ghost records were not used, SQL Server would have to lock the entire range surrounding the deleted record. With the ghost record still present and visible internally to SQL Server (it is not visible in query result sets), SQL Server can use the ghost record as an endpoint for the key-range lock to prevent “phantom” records with the same key value from being inserted, while allowing inserts of other values to proceed.

Ghost records do not stay around forever, though. SQL Server has a special internal housekeeping process that periodically examines the leaf level of B-trees for ghost records and removes them. This is the same thread that performs the autoshrink process for databases.

Whenever you delete a row, all nonclustered indexes need to be updated to remove the pointers to the deleted row. Nonleaf index rows are not ghosted when deleted.
As with heap tables, however, the space is not compressed on the nonleaf index page until space is needed for a new row.

Reclaiming Space

Only when the last row is deleted from a data page is the page deallocated from the table. The only exception is if it is the last page remaining; all tables must have at least one page allocated, even if it’s empty. When a deletion of an index row leaves only one row remaining on the page, the remaining row is moved to a neighboring page, and the now-empty index page is deallocated.

If the page to be deallocated is the last remaining used page in a uniform extent allocated to the table, the extent is deallocated from the table as well.

Updating Rows

SQL Server 2008 performs row updates by evaluating the number of rows affected, whether the rows are being accessed via a scan or index retrieval, and whether any index keys are being modified, and automatically chooses the appropriate and most efficient update strategy for the rows affected. SQL Server can perform two types of update strategies:

- In-place updates
- Not-in-place updates

In-Place Updates

In SQL Server 2008, in-place updates are performed as often as possible to minimize the overhead of an update. An in-place update means that the row is modified where it is on the page, and only the affected bytes are changed.

When an in-place update is performed, in addition to the reduced overhead in the table itself, only a single modify record is written to the log. However, if the table has a trigger on it or is marked for replication, the update is still done in place but is recorded in the log as a delete followed by an insert (this provides the before-and-after image for the trigger that is referenced in the inserted and deleted tables).
In-place updates are performed whenever a heap is being updated and the row still fits on the same page, or when a clustered table is updated and the clustered key itself is not changed. You can get an in-place update if the clustered key changes but the row does not have to move; that is, the sorting of the rows wouldn’t change.

Not-In-Place Updates

If the change to a clustered key prevents an in-place update from being performed, or if the modification to a row increases its size such that it can no longer fit on its current page, the update is performed as a delete followed by an insert; this is referred to as a not-in-place update.

When performing an update that affects multiple index keys, SQL Server keeps a list of the rows that need to be updated in memory, if it’s small enough; otherwise, it is stored in tempdb. SQL Server then sorts the list by index key and type of operation (delete or insert). This list of operations, called the input stream, consists of both the old and new values for every column in the affected rows as well as the unique row identifier for each row.

SQL Server then examines the input stream to determine whether any of the updates conflict or would generate duplicate key values while processing (if they were to generate a duplicate key after processing, the update cannot proceed). It then rearranges the operations in the input stream in a manner to prevent any intermediate violations of the unique key.

For example, consider the following update to a table with a unique key on a sequential primary key:

    update table1 set pkey = pkey + 1

Even though all values would still be unique when the update finished, if the update were performed internally one row at a time in sequential order, it would generate duplicates during the intermediate processing as the pkey value was incremented and matched the next pkey value.
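The intermediate duplicates described above can be illustrated outside the database with a small Python sketch. It is only a model under stated assumptions: a set stands in for the unique index, `apply_in_order` is a hypothetical helper, and visiting the rows in descending key order is simply one ordering that happens to avoid collisions for this particular update. It is not a description of how SQL Server actually rearranges its input stream.

```python
def apply_in_order(keys, order):
    """Apply pkey = pkey + 1 one row at a time, visiting rows in `order`.

    `keys` models the unique index as a set of current key values.
    Returns the first key whose new value collided, or None if the
    whole update went through without an intermediate duplicate.
    """
    index = set(keys)
    for old in order:
        new = old + 1
        if new in index:          # unique-key violation mid-update
            return old
        index.remove(old)
        index.add(new)
    return None


keys = [1, 2, 3, 4, 5]

# Ascending order: row 1 becomes 2 while the old row 2 still exists.
print(apply_in_order(keys, sorted(keys)))                # 1

# Descending order: each new value has already been vacated.
print(apply_in_order(keys, sorted(keys, reverse=True)))  # None
```

The final set of keys is identical in both cases; only the processing order decides whether a duplicate appears along the way.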
SQL Server would rearrange and rework the updates in the input stream to process them in a manner that would avoid the duplicates and then process them a row at a time. If possible, deletes and inserts on the same key value in the input stream are collapsed into a single update. In some cases, you might still get some rows that can be updated in place.

Forward Pointers

As mentioned earlier, when page splits on a clustered table occur, the nonclustered indexes do not need to be updated to reflect the new location of the rows because the row locator for the row is the clustered index key rather than the page and row ID. When an update operation on a heap table causes rows to move, the row locators in the nonclustered index would need to be updated to reflect the new location of the rows. This could be expensive if there were a large number of nonclustered indexes on the heap.

SQL Server 2008 addresses this performance issue through the use of forward pointers. When a row in a heap moves, it leaves a forward pointer in the original location of the row. The forward pointer avoids having to update the nonclustered index row locator. When SQL Server is searching for the row via the nonclustered index, the index pointer directs it to the original location, where the forward pointer redirects it to the new row location.

A row never has more than one forward pointer. If the row moves again from its forwarded location, the forward pointer stored at the original row location is updated to the row’s new location. There is never a forward pointer that points to another forward pointer. If the row ever shrinks enough to fit back into its original location, the forward pointer is removed, and the row is put back where it originated.

When a forward pointer is created, it remains unless the row moves back to its original location. The only other circumstance that results in forward pointers being deleted occurs when the entire database is shrunk.
When a database file is shrunk and the data reorganized, all row locators are reassigned because the rows are moved to new pages.

Index Utilization

Now that you have an understanding of table and index structures and the overhead required to maintain your data and indexes, you might want to put things into practice to actually come up with an index design for your database, defining the appropriate indexes to support your queries. To effectively determine the appropriate indexes that should be created, you need to determine whether they’ll actually be used by the SQL Server Query Optimizer. If an index is not being used effectively, it’s just wasting space and creating unnecessary overhead during updates.

The main criterion to remember is that SQL Server does not use an index for the more efficient row locator lookup unless at least the first column of the index is included in a valid search argument (SARG) or join clause. You should keep this point in mind when choosing the column order for composite indexes. For example, consider the following index on the stores table in the bigpubs2008 database:

    create index nc1_stores on stores (city, state, zip)

NOTE
Unless stated otherwise, all sample queries from this point on in this chapter are run in the bigpubs2008 database, which is available on the included CD or via download from this book’s website at www.samspublishing.com. Instructions on installing this database are provided in the Introduction.
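The leading-column rule described above can be pictured with a sorted list standing in for the leaf level of nc1_stores. The sketch below is a simplified model, not SQL Server's implementation: the store rows are made up for illustration, and `seek_by_city` and `scan_by_zip` are hypothetical helpers. Because the entries are ordered by (city, state, zip), a binary search can jump straight to a given city, whereas a search on zip alone has to examine every entry.

```python
from bisect import bisect_left

# Index entries sorted by (city, state, zip), each paired with a row locator.
# The values are invented sample data, not rows from bigpubs2008.
entries = sorted([
    (("Albany", "NY", "12201"), 4),
    (("Boston", "MA", "02101"), 1),
    (("Frederick", "MD", "21702"), 2),
    (("Frederick", "MD", "21703"), 5),
    (("Seattle", "WA", "98101"), 3),
])
keys = [key for key, _ in entries]


def seek_by_city(city):
    """Binary-search to the first entry for `city`, then read forward."""
    pos = bisect_left(keys, (city,))
    hits = []
    while pos < len(keys) and keys[pos][0] == city:
        hits.append(entries[pos][1])
        pos += 1
    return hits


def scan_by_zip(zip_code):
    """No leading column to search on: every entry must be examined."""
    return [loc for key, loc in entries if key[2] == zip_code]


print(seek_by_city("Frederick"))  # [2, 5]
print(scan_by_zip("21702"))       # [2]
```

Both functions return correct row locators; the difference is that the seek touches only the matching entries while the scan reads the whole list, which is the cost gap that matters on a large table.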
Each of the following queries could use the index because they include the first column, city, of the index as part of the SARG:

    select stor_name from stores
    where city = 'Frederick'
      and state = 'MD'
      and zip = '21702'

    select stor_name from stores
    where city = 'Frederick'
      and state = 'MD'

    select stor_name from stores
    where city = 'Frederick'
      and zip = '21702'

However, the following queries do not use the index for a row locator lookup because they don’t specify the city column as a SARG:

    select stor_name from stores
    where state = 'MD'
      and zip = '21702'

    select stor_name from stores
    where zip = '21702'

For the index nc1_stores to be used for a row locator lookup in the last query, you would have to reorder the columns so that zip is first, but then the index wouldn’t be useful for any queries specifying only city and/or state. Satisfying all the preceding queries in this case would require additional indexes on the stores table.

NOTE
For the two preceding queries, if you were to display the execution plan information (as described in Chapter 36, “Query Analysis”), you might see that the queries actually use the nc1_stores index to retrieve the result set. However, if you look closely, you can see the queries are not using the index in the most efficient manner; the index is being used to perform an index scan rather than an index seek. An index seek is what we are really after. (Alternative query access methods are discussed in more detail in Chapter 35.) In an index seek, SQL Server searches for the specific SARG by walking the index tree from the root level down to the specific row(s) with matching index key values and then uses the row locator value stored in the index key to directly retrieve the matching row(s) from the data page(s); the row locator is either a specific row identifier or the clustered key value for the row.
For an index scan, SQL Server searches all the rows in the leaf level of the index, looking for possible matches. If any are found, it then uses the row locator to retrieve the data row. Although both seeks and scans use an index, the index scan is still more expensive in terms of I/O than an index seek but slightly less expensive than a table scan, which is why it is used. However, in this chapter you learn to design indexes that result in index seeks, and when this chapter talks about queries using an index, index seeks are what it refers to (except for the section on index covering, but that’s a horse of a slightly different color).

You might think that the easy solution to get row locator lookups on all possible columns is to index all the columns on a table so that any type of search criteria specified for a query can be helped by an index. This strategy might be somewhat appropriate in a read-only decision support system (DSS) environment that supports ad hoc queries, but it is not likely because many of the indexes probably still wouldn’t even be used. As you see in the section “Index Selection,” later in this chapter, just because an index is defined on a column doesn’t mean that the Query Optimizer is necessarily always going to use it if the search criteria are not selective enough. Also, creating that many indexes on a large table could take up a significant amount of space in the database, increasing the time required to back up and run DBCC checks on the database. As mentioned earlier, too many indexes on a table in an OLTP environment can generate a significant amount of overhead during inserts, updates, and deletes and have a detrimental impact on performance.

TIP
A common design mistake is defining too many indexes on tables in OLTP environments. In many cases, some of the indexes are redundant or are never even considered by the SQL Server Query Optimizer to process the queries used by the applications.
These indexes end up simply wasting space and adding unnecessary overhead to data updates.

A case in point was one client who had eight indexes defined on a table, four of which had the same column, which was a unique key, as the first column in the index. That column was included in the WHERE clauses for all queries and updates performed on the table. Only one of those four indexes was ever used.

It is hoped that, by the end of this chapter, you understand why all these indexes were unnecessary and are able to recognize and determine which columns benefit from having indexes defined on them and which indexes to avoid.

Index Selection

To determine which indexes to define on a table, you need to perform a detailed query analysis. This process involves examining the search clauses to see what columns are referenced, knowing the bias of the data to determine the usefulness of the index, and ranking the queries in order of importance and frequency of execution. You have to be careful not to examine individual queries and develop indexes to support one query, without considering the other queries that are executed on the table as well. You need to come up with a set of indexes that work for the best cross-section of your queries.

TIP
A useful tool to help you identify your frequently executed and critical queries is SQL Server Profiler. I’ve found SQL Server Profiler to be invaluable when going into a new client site and having to identify the problem queries that need tuning. SQL Server Profiler allows you to trace the procedures and queries being executed in SQL Server and capture the runtime, reads and writes, execution plans, and other processing information. This information can help you identify which queries are providing substandard performance, which ones are being executed most often, which indexes are being used by the queries, and so on.
You can analyze this information yourself manually or save a trace to analyze with the Database Engine Tuning Advisor. The features of SQL Server Profiler are covered in more detail in Chapter 6, “SQL Server Profiler.” The Database Engine Tuning Advisor is discussed in more detail in Chapter 55, “Configuring, Tuning, and Optimizing SQL Server Options.”

Because it’s usually not possible to index for everything, you should index first for the queries most critical to your applications or those run frequently by many users. If you have a query that’s run only once a month, is it worth creating an index to support only that query and having to maintain it throughout the rest of the month? The sum of the additional processing time throughout the month could conceivably exceed the time required to perform a table scan to satisfy that one query.

TIP
If, due to query response time requirements, you must have an index in place when a query is run, consider creating the index only when you run the query and then dropping the index for the remainder of the month. This approach is feasible as long as the time it takes to create the index and run the query that uses the index doesn’t exceed the time it takes to simply run the query without the index in place.

Evaluating Index Usefulness

SQL Server provides indexes for two primary reasons: as a method to enforce the uniqueness of the data in the database tables and to provide faster access to data in the tables. Creating the appropriate indexes for a database is one of the most important aspects of physical database design. Because you can’t have an unlimited number of indexes on a table, and it wouldn’t be feasible anyway, you should create indexes on columns that have high selectivity so that your queries will use the indexes.
The selectivity of an index can be defined as follows:

    Selectivity ratio = Number of unique index values / Number of rows in table

If the selectivity ratio is high (that is, if a large number of rows can be uniquely identified by the key), the index is highly selective and useful to the Query Optimizer. The optimum selectivity would be 1, meaning that there is a unique value for each row. A low selectivity means that there are many duplicate values and the index would be less useful. The SQL Server Query Optimizer decides whether to use any indexes for a query based on the selectivity of the index. The higher the selectivity, the faster and more efficiently SQL Server can retrieve the result set.

For example, say that you are evaluating useful indexes on the authors table in the bigpubs2008 database. Assume that most of the queries access the table either by author’s last name or by state. Because a large number of concurrent users modify data in this table, you are allowed to choose only one index: author’s last name or state. Which one should you choose? Let’s perform some analysis to see which one is a more useful, or selective, index.

First, you need to determine the selectivity based on the author’s last name with a query on the authors table in the bigpubs2008 database:

    select count(distinct au_lname) as '# unique',
           count(*) as '# rows',
           str(count(distinct au_lname) / cast (count(*) as real),4,2) as 'selectivity'
    from authors
    go

    # unique    # rows    selectivity
    160         172       0.93

The selectivity ratio calculated for the au_lname column on the authors table, 0.93, indicates that an index on au_lname would be highly selective and a good candidate for an index. With 160 distinct values across 172 rows, only 12 rows repeat a last name that already appears in another row.
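The same ratio can be computed for any list of key values. The following is a minimal sketch for checking the arithmetic by hand; `selectivity_ratio` is a hypothetical helper and the sample last names are invented for illustration, not taken from the authors table.

```python
def selectivity_ratio(values):
    """Number of unique key values divided by the number of rows."""
    if not values:
        raise ValueError("selectivity is undefined for an empty table")
    return len(set(values)) / len(values)


# Invented sample: 4 distinct last names across 5 rows.
last_names = ["Smith", "Jones", "Smith", "Nguyen", "Garcia"]
print(round(selectivity_ratio(last_names), 2))  # 0.8

# The figures reported above for au_lname: 160 unique values in 172 rows.
print(round(160 / 172, 2))                      # 0.93
```

A result of 1.0 corresponds to a unique key; the closer the value falls toward 0, the more rows share each key value.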
Now, look at the selectivity of the state column:

    select count(distinct state) as '# unique',
           count(*) as '# rows',
           str(count(distinct state) / cast (count(*) as real),4,2) as 'selectivity'
    from authors
    go

    # unique    # rows    selectivity
    38          172       0.22

As you can see, an index on the state column would be much less selective (0.22) than an index on the au_lname column and possibly not as useful.

One of the questions to ask at this point is whether a few values in the state column that have a high number of duplicates are skewing the selectivity or whether there are just a few unique values in the table. You can determine this with a query similar to the following:

    select state, count(*) as numrows,
           count(*)/b.totalrows * 100 as percentage
    from authors a,
         (select convert(numeric(6,2), count(*)) as totalrows
          from authors) as b
    group by state, b.totalrows
    having count(*) > 1
    order by 2 desc
    go

    state  numrows  percentage
    CA     37       21.5116200
    NY     18       10.4651100
    TX     15        8.7209300
    OH      9        5.2325500
    FL      8        4.6511600
    IL      7        4.0697600
    NJ      7        4.0697600
    WA      6        3.4883700
    PA      6        3.4883700
    CO      5        2.9069700
    LA      5        2.9069700
    MI      5        2.9069700
    MN      3        1.7441800
    MO      3        1.7441800
    OK      3        1.7441800
    AZ      3        1.7441800
    AK      2        1.1627900
    IN      2        1.1627900
    GA      2        1.1627900
    MA      2        1.1627900
    NC      2        1.1627900
    NE      2        1.1627900
    SD      2        1.1627900
    VA      2        1.1627900
    WI      2        1.1627900
    WV      2        1.1627900

As you can see, most of the state values are relatively unique, except for one value, 'CA', which accounts for more than 20% of the values in the table. Therefore, state is probably not a good candidate for an indexed column, especially if most of the time you are searching for authors from the state of California. SQL Server would generally find it more efficient to scan the whole table rather than search via the index.
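A quick way to see how far one value dominates is to compute each value's share of the rows. The sketch below reuses the row counts from the result set above for the three largest states, together with the table total of 172 rows; `value_shares` is a hypothetical helper and not part of any SQL Server tooling.

```python
def value_shares(counts, total_rows):
    """Return each value's percentage of `total_rows`, largest first."""
    shares = {value: n / total_rows * 100 for value, n in counts.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Row counts taken from the query output above (top three states only).
state_counts = {"CA": 37, "NY": 18, "TX": 15}

for state, pct in value_shares(state_counts, total_rows=172):
    print(f"{state} {pct:.2f}%")
# CA 21.51%
# NY 10.47%
# TX 8.72%
```

Seeing one value above 20% while the rest trail off quickly is the pattern that suggests the low overall selectivity comes from skew rather than from a uniformly small set of values.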
NOTE
When a single value skews the selectivity of an index, as in this example with the state column, this type of column might be a candidate for a filtered index, a new feature in SQL Server 2008. See the section “Filtered Indexes and Statistics,” later in this chapter.

As a general rule of thumb, if the selectivity ratio for a nonclustered index key is less than 0.85 (in other words, if the Query Optimizer cannot discard at least 85% of the rows based on the key value), the Query Optimizer generally chooses a table scan to process the query rather than a nonclustered index. In such cases, performing a table scan to find all the qualifying rows is more efficient than seeking through the B-tree to locate a large number of data rows.

NOTE
You can relate the concept of selectivity to a hypothetical example. Say that you need to find every instance of the word SQL in this book. Would it be easier to do it by using the index and going back and forth from the index to all the pages that contain the word, or would it be easier just to scan each page from beginning to end to locate every occurrence? What if you had to find all references to the word squonk, if any? Squonk would definitely be easier to find via the index (actually, the index would help you determine that it doesn’t even exist). Therefore, the selectivity for Squonk would be high, and the selectivity for SQL would be much lower.

How does SQL Server determine whether an index is selective and which index, if it has more than one to choose from, would be the most efficient to use? For example, how would SQL Server know how many rows the following query might return?

    select * from table
    where key between 1000000 and 2000000

If the table contains 10,000,000 rows with values ranging between 0 and 20,000,000, how does the Query Optimizer know whether to use an index or a table scan? There could be 10 rows in the range, or 900,000.
How does SQL Server estimate how many rows are between 1,000,000 and 2,000,000? The Query Optimizer gets this information from the index statistics, as described in the next section.

Index Statistics

As mentioned earlier, the selectivity of a key is an important factor that determines whether an index will be used to retrieve the data rows that satisfy a query. SQL Server stores the selectivity and a histogram of sample values of the key; based on the statistics stored for the key columns for the index and the SARGs specified for the query, the Query Optimizer decides which index to use.

To see the statistical information stored for an index, use the DBCC SHOW_STATISTICS command, which returns the following pieces of information:

- A histogram that contains an even sampling of the values for the first column in the index key. SQL Server stores up to 200 sample values in the histogram.
- Index densities for the combination of columns in the index. Index density indicates the uniqueness of the index key(s) and is discussed later in this section.
- The number of rows in the table at the time the statistics were computed.
- The number of rows sampled to generate the statistics.
- The number of sample values (steps) stored in the histogram.
- The average key length.
- Whether the index is defined on a string column.
- The date and time the statistics were generated.

The syntax for DBCC SHOW_STATISTICS is as follows:

    DBCC SHOW_STATISTICS (tablename, index)

Listing 34.4 displays the abbreviated output from DBCC SHOW_STATISTICS, showing the statistical information for the aunmind nonclustered index on the au_lname and au_fname columns of the authors table.