Microsoft SQL Server 2008 Study Guide, Part 154

Nielsen c71.tex V4 - 07/21/2009 3:53pm Page 1492 Part X Business Intelligence

Service view, exposing only what makes sense to them. From the designer’s perspective, limiting the number of cubes and databases keeps the number of linked dimensions and measures to a minimum.

Using the Cube Wizard has been covered in earlier sections, both from the top-down approach (see “Analysis Services Quick Start”) using the cube design to generate corresponding relational and Integration Services packages, and from the bottom-up approach (see “Creating a Cube”). Once the cube structure has been created, it is refined using the Cube Designer.

Open any cube from the Solution Explorer to use the Cube Designer, shown in Figure 71-7. The Cube Designer presents information in several tabbed views described in the remainder of this section.

FIGURE 71-7 Cube Designer

Cube structure

The cube structure view is the primary design surface for defining a cube. Along with the ever-present Solution Explorer and Properties panes, three panes present the cube’s structure:

■ Data Source View: This pane, located in the center of the view, shows a chosen portion of the data source view on which the cube is built. Each table is color-coded: yellow for fact tables, blue for dimensions, and white for neither. The tables available can be changed by right-clicking on the design surface and choosing an option from the context menu. Right-clicking a table presents options to hide that table or to show related tables. Diagrams defined within the data source view can be used as well by selecting the Copy Diagram From option on the context menu. Additionally, the toolbar can be used to toggle between diagram and tree views of the table and relationship data; the tree view can be very useful for answering questions about complex diagrams.
■ Measures: This pane, located in the upper-left section of the view, lists all of the cube’s measures organized by measure group. Both the toolbar and the context menu toggle between the tree and grid view of measures.
■ Dimensions: This pane, located in the lower-left section of the view, lists all dimensions associated with the cube. This list may be a subset of the defined dimensions from the Solution Explorer if not every dimension is in the current cube. Each dimension in the Dimensions pane shows the user hierarchies and attributes, and has a link to edit that dimension in the Dimension Designer.

Because the order in which measures and dimensions appear in their respective lists determines the order in which users see them presented, the lists can be reordered using either the right-click Move Up/Move Down options or drag-and-drop while in tree view.

As with the Dimension Designer, changes to a cube must be deployed before they can be browsed.

Measures

Each measure is based on a column from the data source view and an aggregate function. The aggregate function determines how data is processed from the fact table and how it is summarized. For example, consider a simple fact table with columns of day, store, and sales amount being read into a cube with a sales amount measure, a stores dimension, and a time dimension with year, month, and day attributes. If the aggregate function for the sales amount measure is Sum, then rows are read into the cube’s leaf level by summing the sales amount for any rows with the same store/day combination. Higher levels, such as the store/month level, are determined by adding up the individual days in that month. However, if the aggregate function is Min, then the smallest value is saved from all the sales on a given day, and the store/month level is determined as the smallest of all the days in that month.
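The Sum versus Min behavior described above can be sketched in a few lines of Python. This is only an illustration of the roll-up logic, not how Analysis Services stores or processes data; the fact rows, store name, and function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, store, sales_amount). Illustrative values only.
fact_rows = [
    ("2008-01-01", "Store A", 100.0),
    ("2008-01-01", "Store A", 40.0),
    ("2008-01-02", "Store A", 25.0),
    ("2008-02-01", "Store A", 70.0),
]

def load_leaf_level(rows, aggregate):
    """Combine fact rows sharing a store/day into a single leaf cell."""
    grouped = defaultdict(list)
    for day, store, amount in rows:
        grouped[(store, day)].append(amount)
    return {key: aggregate(values) for key, values in grouped.items()}

def roll_up_to_month(leaf_cells, aggregate):
    """Derive store/month cells from the store/day leaf cells."""
    grouped = defaultdict(list)
    for (store, day), value in leaf_cells.items():
        month = day[:7]  # "YYYY-MM" prefix of the ISO date string
        grouped[(store, month)].append(value)
    return {key: aggregate(values) for key, values in grouped.items()}

# Sum: days are summed at the leaf, then months add up their days.
sum_month = roll_up_to_month(load_leaf_level(fact_rows, sum), sum)
# Min: the smallest sale per day is kept, then the smallest day per month.
min_month = roll_up_to_month(load_leaf_level(fact_rows, min), min)

print(sum_month)
print(min_month)
```

With these sample rows, the January cell for Store A is 165.0 under Sum (140 + 25) and 25.0 under Min, which shows how the same fact data yields different results at every level depending on the aggregate function chosen.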
Available aggregate functions include the following:

■ Sum: Adds the values of all children
■ Min: Minimum value of children
■ Max: Maximum value of children
■ Count: Count of the corresponding rows in the fact table
■ Distinct Count: Counts unique occurrences of the column value (e.g., Unique Customer Count)
■ None: No aggregation performed. Any value not read directly from the fact table will be null.
■ AverageOfChildren: Averages non-empty children
■ FirstChild: Value of the first child member as evaluated along the time dimension.
■ FirstNonEmpty: Value of the first non-empty child member as evaluated along the time dimension.
■ LastChild: Value of the last child member as evaluated along the time dimension.
■ LastNonEmpty: Value of the last non-empty child member as evaluated along the time dimension.
■ ByAccount: Aggregation varies based on the values in the Account Dimension. The dimension’s Type property must be Accounts, and one of the dimension’s attributes must have the Type property set to AccountType. The column corresponding to AccountType contains defined strings that identify the type of account, and thus the aggregation method, to Analysis Services.

The best way to add a new measure is to right-click in the Measures pane and choose New Measure. Specify the aggregation function and table/column combination in the New Measure dialog. The new measure will automatically be added to the appropriate measure group. Measure groups are created for each fact table plus any distinct count measure defined. These groups correspond to different SQL queries that are run to retrieve the cube’s data.

Beyond measures derived directly from fact tables, calculated measures can be added by the Business Intelligence Wizard and directly via the calculations view.
For more information about calculated measures, see Chapter 72, “Programming MDX Queries.”

Measures can be presented to the user grouped in folders by setting the DisplayFolder property to the name of the folder in which the measure should appear. It is also good practice to assign each measure a default format by setting the FormatString property, either by choosing one of the common formats from the list or by directly entering a custom format.

Each cube can have a default measure specified if desired, which provides a measure for queries when no measure is explicitly requested. To set the default measure, select the cube name at the top of the Measures pane tree view, and set the DefaultMeasure property by selecting a measure from the list.

Cube dimensions

The hierarchies and attributes for each dimension can be either disabled (Enabled and AttributeHierarchyEnabled properties, respectively) or made invisible (Visible and AttributeHierarchyVisible properties, respectively) if appropriate for a particular cube context (see “Visibility and Organization” earlier in the chapter, for example scenarios). Access these settings in the Dimensions pane and then adjust the associated properties. These properties are specific to a dimension’s role in the cube and do not change the underlying dimension design.

Dimensions can be added to the cube by right-clicking the Dimensions pane and choosing New Dimension. Once the dimension has been added to the cube, review the dimension usage view to ensure that the dimension is appropriately related to all measure groups.

Dimension usage

The dimension usage view displays a table showing how each dimension is related to each measure group. With dimensions and measure groups as row and column headers, respectively, each cell of the table defines the relationship between the corresponding dimension/measure group pair. Drop-down lists in the upper-left corner enable rows and columns to be hidden to simplify large views.
The Cube Designer creates default relationships based on the data source view relationships, which are accurate in most cases, although any linked objects require special review because they are not derived from the data source view. Click the ellipsis in any table cell to launch the Define Relationship dialog and choose the relationship type. Different relationship types require different mapping information, as described in the following sections.

No relationship

For a database with more than one fact table, there will likely be dimensions that don’t relate to some measure groups. Signified by gray table cells with no annotation, this setting is expected for measure group/dimension pairs that don’t share a meaningful relationship. When a query specifies dimension information unrelated to a given measure, that dimension information is ignored by default.

Regular

The regular relationship is a fact table relating directly to a dimension table, as in a star schema. Within the Define Relationship dialog, choose the Granularity attribute as the dimension attribute that relates directly to the measure group, usually the dimension’s key attribute. Once the granularity attribute has been chosen, specify the fact table column names that match the granularity attribute’s key columns in the relationships grid at the bottom of the dialog.

Choosing to relate a dimension to a measure group via a non-key attribute does work, but it must be considered in the context of the dimension’s natural hierarchy (see “Attribute Relationships,” earlier in the chapter). Think of the natural hierarchy as a tree with the key attribute at the bottom. Any attribute at or above the related attribute will be related to the measure group and behave as expected.
Any attribute below or on a different branch from the related attribute will have “no relationship,” as described in the preceding section.

Fact

Fact dimensions are those derived directly from the fact table when a fact table contains both fact and dimension data. No settings are required beyond the relationship type. Only one dimension can have a fact relationship with a given measure group, effectively requiring a single fact dimension per fact table containing all dimension data in that fact table.

Referenced

When dimension tables are connected to a fact table in a snowflake schema, the dimension could be implemented as a single dimension that has a regular relationship with the measure group, or the dimension could be implemented as a regular dimension plus one or more referenced dimensions. A referenced dimension is indirectly related to the measure group through another dimension. The single dimension with a regular relationship is certainly simpler, but if a referenced dimension can be created and used with multiple chains of different regular dimensions (e.g., a Geography dimension used with both Store and Customer dimensions), then the referenced option will be more efficient in both storage and processing. Referenced relationships can chain together dimensions to any depth.

Create the referenced relationship in the Define Relationship dialog by selecting an intermediate dimension by which the referenced dimension relates to the measure group. Then choose the attributes by which the referenced and intermediate dimensions relate. Normally, the Materialize option should be selected for best performance.

Many-to-Many

Relationships discussed so far have all been one-to-many: One store has many sales transactions, one country has many customers.
For an example of a many-to-many relationship, consider tracking book sales by book and author, whereby each book can have many authors and each author can create many books. The many-to-many relationship can be modeled in Analysis Services, but it requires a specific configuration beginning with the data source view (see Figure 71-8). The many-to-many relationship is implemented via an intermediate fact table that lists each pairing of the regular and many-to-many dimensions. For other slightly simpler applications, the regular dimension can be omitted and the intermediate fact table related directly to the fact table.

FIGURE 71-8 Example of a many-to-many relationship
[Diagram tables: FactSalesItem (Fact Table; SalesItemID PK, BookID FK1); FactBookAuthor (Intermediate Fact Table; BookID PK/FK1, AuthorID PK/FK2); dimBook (Regular Dimension; BookID PK, Title); dimAuthor (Many-to-Many Dimension; AuthorID PK, FirstName, LastName)]

The Define Relationship dialog only requires the name of a measure group created on the intermediate fact table to configure the many-to-many relationship. Other configuration is derived from the data source view.

Many-to-many relationships have the query side effect of generating result sets that don’t total in an intuitive way. Using the book sales example, assume that many of the books sold have multiple authors. A query showing books by author will display a list of numbers whose arithmetic total is greater than the total number of books sold. Often, this will be expected and understood behavior, although some applications will require MDX scripting to gain the desired behavior in all views of the cube.

Calculations

The Calculations tab enables the definition of calculated measures, sets of dimension members, and dynamic control over cube properties. While the Calculations tab offers forms to view many of the objects defined here, the underlying language is MDX (Multidimensional Expressions), so details on how to manipulate calculations are covered in the next chapter.
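As a concrete illustration of the many-to-many totals described earlier in this section for the book sales example, the following Python sketch counts books sold per author through an intermediate table. The rows and identifiers are hypothetical and merely mirror the table names in Figure 71-8; the code imitates the counting logic only and is not how Analysis Services evaluates the relationship.

```python
# Hypothetical rows shaped like the tables in Figure 71-8.
fact_sales_item = [      # (SalesItemID, BookID)
    (1, "B1"),
    (2, "B1"),
    (3, "B2"),
]
fact_book_author = [     # (BookID, AuthorID): one row per book/author pairing
    ("B1", "A1"),
    ("B1", "A2"),        # book B1 has two authors
    ("B2", "A1"),
]

def books_sold_by_author(sales, book_author):
    """Credit every sale to each author of the book that was sold."""
    counts = {}
    for _sales_item_id, book_id in sales:
        for bridge_book_id, author_id in book_author:
            if bridge_book_id == book_id:
                counts[author_id] = counts.get(author_id, 0) + 1
    return counts

by_author = books_sold_by_author(fact_sales_item, fact_book_author)
total_sold = len(fact_sales_item)

print(by_author)                 # per-author counts
print(sum(by_author.values()))   # arithmetic total of the per-author rows
print(total_sold)                # actual number of books sold
```

Because each sale of a co-authored book is credited to every one of its authors, the per-author rows add up to more than the number of books actually sold, while the true grand total is unchanged.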
For more information about defining scripting, see Chapter 72, “Programming MDX Queries.”

KPIs

A Key Performance Indicator (KPI) is a server-side calculation meant to define an organization’s most important metrics. These metrics, such as net profit, client utilization, or funnel conversion rate, are frequently used in dashboards or other reporting tools for distribution at all levels throughout the organization. Using a KPI to host such a metric helps ensure consistent calculation and presentation.

Within the KPIs view, an individual KPI consists of several components:

■ The actual value of the metric, entered as an MDX expression that calculates the metric
■ The goal for the metric (for example, what the budget says net profit should be). The goal is entered as an MDX expression that calculates the metric’s goal value.
■ The status for the metric, comparing the actual and goal values. This is entered as an MDX expression that returns values from -1 (very bad) to +1 (very good). A graphic can also be chosen as a suggestion to applications that present KPI data, helping to keep the presentation consistent across applications.
■ The trend for the metric, showing which direction the metric is headed. Like status, trend is entered as an MDX expression that returns values between -1 and +1, with a suggested graphic.

As KPI definitions are entered, use the toolbar to switch between form (definition) and browser mode to view results. The Calculation Tools pane (lower left) provides cube metadata and the MDX functions list for drag-and-drop creation of MDX expressions. The Templates tab provides templates for some common KPIs.

Actions

The Actions tab of the Cube Designer provides a way to define actions that a client can perform for a given context.
For example, a drillthrough action can show detailed rows behind a total, or a reporting action can launch a report based on a dimension attribute’s value. Actions can be specific to any displayed data, including individual cells and dimension members, resulting in more detailed analysis or even integration of the analysis application into a larger data management framework.

New in 2008
Drillthrough actions now use cube data to display their results. Prior versions required access to the underlying relational data to provide the display of detail data.

Partitions

Partitions are the unit of storage in Analysis Services, storing the data of a measure group. Initially, the Cube Designer creates a single MOLAP partition for each measure group. MOLAP is the preferred storage mode for most scenarios, but setting partition sizes and aggregations is key to both effective processing and efficient query execution.

Partition sizing

Cube development normally begins by using a small but representative slice of the data, yet production volumes are frequently quite large, with cubes summarizing a billion rows per quarter and more. A partitioning strategy is needed to manage data through both the relational and Analysis Services databases, beginning with the amount of data to be kept online and the size of the partitions that will hold that data.

The amount of data to be kept online is a trade-off between the desire for access to historical data and the cost of storing that data. Once the retention policy has been determined, there are many possible ways to partition that data into manageable chunks, but a time-based approach is widely used, usually keeping either a year’s or a month’s worth of data in a single partition.
For partitions being populated on the front end, the size of the partition is important for the time it takes to process; processing time should be kept to a few hours at most. For partitions being deleted at the back end, the size of the partition is important for the amount of data it removes at one time.

Matching the partition size and retention between the relational database and Analysis Services is a simple and effective approach. As the number of rows imported each day grows, smaller partition sizes (such as week or day) may be required to expedite initial processing. As long as the aggregation design is consistent across partitions, Analysis Services will allow smaller partitions to be merged, keeping the overall count at a manageable level.

Best Practice
Take time to consider retention, processing, and partitioning strategies before an application goes into production. Once in place, changes may be very expensive given the large quantities of data involved.

Creating partitions

The key to accurate partitions is including every data row exactly once. Because it is the combination of all partitions that is reported by the cube, including rows multiple times will inflate the results. A common mistake is to add new partitions while forgetting to delete the default partition created by the Designer; because the new partitions contain one copy of all the source data, and the default partition contains another, cube results are exactly double the true values.

The partition view consists of one collapsible pane for each measure group, each pane containing a grid listing the currently defined partitions for that measure group. Highlighting a grid row will select that partition and display its associated properties in the Properties pane.
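The exactly-once rule from the start of this section can be checked mechanically before a cube goes into production. The sketch below is a minimal illustration in Python, assuming each fact row has some unique identifier and that the rows selected by each partition can be listed; the partition names and row ids are hypothetical.

```python
from collections import Counter

def coverage_problems(source_row_ids, partitions):
    """Return (missing, duplicated) row ids across all partitions.

    partitions maps a partition name to the row ids it would load.
    """
    seen = Counter()
    for row_ids in partitions.values():
        seen.update(row_ids)
    missing = sorted(set(source_row_ids) - set(seen))
    duplicated = sorted(row_id for row_id, n in seen.items() if n > 1)
    return missing, duplicated

source = [1, 2, 3, 4]

# The common mistake: new yearly partitions added, default partition not deleted.
with_default = {
    "Default": [1, 2, 3, 4],
    "Sales_2007": [1, 2],
    "Sales_2008": [3, 4],
}
print(coverage_problems(source, with_default))

# After deleting the default partition, every row is covered exactly once.
without_default = {name: rows for name, rows in with_default.items() if name != "Default"}
print(coverage_problems(source, without_default))
```

In the first case every row is reported as duplicated, which corresponds to the doubled cube results described above; in the second case both lists are empty.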
Start the process of adding a partition by clicking the New Partition link, which launches a series of Partition Wizard dialogs:

■ Specify Source Information: Choose the appropriate Measure group (the default is the measure group selected when the wizard is launched). If the source table is included as part of the data source view, then it will appear in the Available tables list and can be selected there. If the source table is not part of the data source view, then choose the appropriate data source from the Look in list and press the Find Tables button to list other tables with the same structure. Optionally, enter a portion of the source table’s name in the Filter Tables text box to limit the list of tables returned.
■ Restrict Rows: If the source table contains exactly the rows to be included in the partition, then skip this page. If the source table contains more rows than should be included in the partition, then select the “Specify query to restrict rows” option, and the Query box will be populated with a complete SELECT query missing only the WHERE clause. Supply the missing constraint(s) in the Query window and press the Check button to validate syntax.
■ Processing and Storage Locations: The defaults will suffice for most situations. If necessary, choose options to balance load across disks and servers.
■ Completing the Wizard: Supply a name for the partition, generally the same name as the measure group suffixed with the partition slice (e.g., Internet_Orders_2004). If aggregations have not been defined, define them now. If aggregations have already been defined for another partition, then copy these existing aggregations from that partition to ensure consistency across partitions.

Once a partition has been added, the name and source can be edited by clicking in the appropriate cell in the partition grid.
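For time-based partitions, the WHERE clause supplied on the Restrict Rows page and the partition name given on the final page follow a regular pattern, so they can be generated rather than typed by hand. The following Python sketch assumes yearly partitions and a date column named OrderDate; the column name and the helper function are hypothetical, and only the Internet_Orders_2004 naming pattern comes from the text above.

```python
from datetime import date

def yearly_partition(measure_group, date_column, year):
    """Return (partition_name, where_clause) for one calendar year."""
    start = date(year, 1, 1)
    end = date(year + 1, 1, 1)
    # Half-open range [start, end): adjacent years never share a row,
    # which keeps every fact row in exactly one partition.
    # Unseparated yyyymmdd literals are used because SQL Server interprets
    # them the same way regardless of language or date-format settings.
    where = (
        f"WHERE {date_column} >= '{start:%Y%m%d}' "
        f"AND {date_column} < '{end:%Y%m%d}'"
    )
    return f"{measure_group}_{year}", where

for year in (2003, 2004):
    name, where = yearly_partition("Internet_Orders", "OrderDate", year)
    print(name, "->", where)
```

Using a half-open date range rather than BETWEEN avoids the boundary row being picked up by two neighboring partitions, which would inflate cube results.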
Aggregation design

The best trade-off between processing time, partition storage, and query performance is defining only aggregations that help answer queries commonly run against a cube. Analysis Services’ usage-based optimization tracks queries run against the cube and then designs aggregations to meet that query load. However, representative query history usually requires a period of production use, so the aggregations can also be based on intelligent guesses.

New in 2008
The Cube Designer now includes an Aggregations tab that allows summary and detailed views of aggregations for each partition. It introduces the concept of named aggregation designs, which are groups of aggregations specific to a measure group that can be assigned to its associated partitions.

A good approach is to first create a modest number of aggregations using the Aggregation Design Wizard and assign that design to all active partitions. Then deploy the cube for use to collect a realistic query history by enabling query logging (see Analysis Server “Log” properties by right-clicking on the server in SQL Server Management Studio). Finally, use the query log to generate a more efficient aggregation design based on usage-based optimization.

Aggregation Design Wizard

The Aggregation Design Wizard will create aggregations based on intelligent guesses. Invoke the wizard from the toolbar on the Aggregations tab of the Cube Designer. The wizard steps through several pages:

■ Select Partitions to Modify: Each run of the wizard is specific to the measure group selected when the wizard is invoked. Check all the partitions to be updated with the new aggregation design. At least one partition must be selected, and designs can also be moved to other partitions later.
■ Review Aggregation Usage: All the attributes for every dimension related to the measure group are presented with their usage settings. The default generally suffices, but options include the following:
    ■ Full: Include this attribute in every aggregation.
    ■ None: Don’t include this attribute in any aggregation.
    ■ Unrestricted: Considers this attribute for inclusion in the design without restrictions.
■ Specify Object Counts: Accurate row counts for each partition and dimension table drive how aggregations are calculated. Pressing the Count button will provide current row counts, with the Estimated Count reflecting the total number of rows currently in the database, and the Partition Count reflecting the number of rows that will be included in the first partition. Numbers can be manually entered if the current data source is different from the target design (e.g., a small development data set).
■ Set Aggregation Options: This page actually designs the aggregations. Options on the left tell the designer when to stop creating new aggregations, while the graph on the right provides estimated storage versus performance gain. Press the Continue button to create an aggregation design before pressing the Next button. There are no strict rules, but some general guidelines may help:
    ■ Unless storage is the primary constraint, target an initial performance gain of 10–20 percent. On the most complex cubes this will be difficult to obtain with a reasonable number of aggregations (and associated processing time). On simpler cubes more aggregations can be afforded, but they are already so fast that the additional aggregations don’t buy much.
    ■ Keep the total number of aggregations under 200 (aggregation count is shown at the bottom, just above the progress bar).
    ■ Look for an obvious knee (flattening of the curve) in the storage/performance graph and stop there.
■ Completing the Wizard: Give the new aggregation design a name. Choose either to save the design or to save and process it.

Best Practice
The best aggregations are usage-based: Collect usage history in the query log and use it to optimize each partition’s aggregation design periodically. Query logging must be enabled in Analysis Server’s Server properties, in the Log\QueryLog section: Set CreateQueryLogTable to true, define a QueryLogConnectionString, and specify a QueryLogTableName.

Aggregations tab

The toolbar of this tab can launch the wizard as described earlier and the usage-based optimization wizard as well. The pane itself toggles between standard and advanced views. Standard view lists all the measure groups and summarizes which aggregation designs are assigned to which partitions. Right-click a design’s name to assign partitions to it.

The advanced view allows detailed exploration and manual modification of an aggregation design. Choose the measure group and design name in the header, and a table of dimensions vs. individual aggregations appears. Any check that appears in the table indicates that the aggregation (such as A5) includes summaries by the indicated dimension attributes (such as Product Line and Quarter).

Manual updates to a design are generally not effective because usage-based optimization tends to be more accurate than individual judgment, but cases do arise in which problem queries can be addressed by a well-placed aggregation. Use the toolbar to copy an existing design to a new name, and then modify as needed. New columns (aggregations) can be copied/added to the table using the toolbar as well.

Perspectives

A perspective is a view of a cube that hides items and functionality not relevant to a specific purpose.
Perspectives appear as additional cubes to the end user, so each group within the company can have its own “cube,” each just a targeted view of the same data.

Add a perspective by either right-clicking or using the toolbar, and a new column will appear. Overwrite the default name at the top of the column with a meaningful handle, and then uncheck the items not relevant to the perspective. A default measure can be chosen for the perspective as well; look for the DefaultMeasure object type in the second row of the grid.

Data Storage

The data storage strategy chosen for a cube and its components determines not only how the cube will be stored, but also how it can be processed. Storage settings can be set at three different levels, with parent settings determining defaults for the children:

■ Cube: Begin by establishing storage settings at the cube level to set defaults for the entire cube (dimensions, measure groups, and partitions). Access the Cube Storage Settings dialog by choosing a cube in the Cube Designer and then clicking the ellipsis on the Proactive Caching property of the cube.
■ Measure Group: Used in the unlikely case that storage settings for a particular measure group differ from cube defaults. Access the Measure Group Storage Settings dialog by either clicking the ellipsis on the measure group’s Proactive Caching property in the Cube Designer or by choosing the Storage Settings link in partition view without highlighting a specific partition.
■ Object level (specific partition or dimension): Sets the storage options for a single object. Access the Dimension Storage Settings dialog by clicking the ellipsis on the dimension’s Proactive Caching property in the Dimension Designer. Access the Partition Storage Settings dialog by selecting a partition in the partition view and clicking the Storage Settings link.

Each of the storage settings dialogs is essentially the same, differing only in the scope of the setting’s effect.
The main page of the dialog contains a slider that selects preconfigured option settings, from the most real-time (far left) to the least real-time (far right). Each “stop” on the slider displays a summary of the options available. Alternatively, position the slider and click the Options button to examine the options associated with a particular position. Beyond these few presets, the Storage Options dialog enables a wide range of behaviors.