Microsoft SQL Server 2008 R2 Unleashed- P64 pdf

CHAPTER 19 Replication

- Snapshot replication: A complete copy of the publication is sent out to all subscribers. This includes both changed and unchanged data.
- Merge replication: All sites make changes to local data independently and then update the publisher. It is possible for conflicts to occur, but they can be resolved.

SQL Server Replication Types

Microsoft has narrowed the field to three major types of data replication approaches within SQL Server: snapshot, transactional, and merge. Each publication uses exactly one replication type. However, a single database can contain publications of different types.

Snapshot Replication

Snapshot replication makes an image of all the tables in a publication at a single moment in time and then moves that entire image to the subscribers. Little overhead on the server is incurred because snapshot replication does not track data modifications as the other forms of replication do. It is possible, however, for snapshot replication to require large amounts of network bandwidth, especially if the articles being replicated are large. Snapshot replication is the easiest form of replication to set up and is used primarily with smaller tables for which subscribers do not have to perform updates. An example of this might be a phone list that is to be replicated to many subscribers. This phone list is not considered to be critical data, and its refresh frequency is more than enough to satisfy all its users.

The primary agents used for snapshot replication are the snapshot agent and distribution agent.
- The snapshot agent creates files that contain the schema of the publication and the data. The files are temporarily stored in the snapshot folder of the distribution server, and then the distribution jobs are recorded in the distribution database.
- The distribution agent is responsible for moving the schema and data from the distributor to the subscribers.
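The snapshot behavior described above can be illustrated with a minimal Python sketch. It is a toy in-memory model and not how SQL Server implements the agents: the names `take_snapshot` and `apply_snapshot`, and the dictionary layout of the publication, are assumptions made for illustration only.

```python
import copy


def take_snapshot(publication):
    """Return a point-in-time image of every table in the publication."""
    return copy.deepcopy(publication)


def apply_snapshot(snapshot, subscriber):
    """Replace the subscriber's copy wholesale with the snapshot image.

    Changed and unchanged rows alike are resent; nothing is tracked.
    """
    subscriber.clear()
    subscriber.update(copy.deepcopy(snapshot))


# Hypothetical publication: one small table keyed by table name.
publication = {"phone_list": [("Ann", "555-0100"), ("Bob", "555-0101")]}
subscribers = [{}, {"phone_list": [("Old", "555-0000")]}]

snapshot = take_snapshot(publication)

# A change made after the snapshot is not part of this delivery.
publication["phone_list"].append(("Cy", "555-0102"))

for subscriber in subscribers:
    apply_snapshot(snapshot, subscriber)
```

In this model every subscriber ends up with the two-row image, and the later third row reaches them only when the next snapshot is taken. That is the trade-off the text describes: little tracking overhead, but the whole image travels each time.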
A few other agents are also used; they handle other replication tasks, such as cleanup of files and history. In snapshot replication, after the snapshot has been delivered to all the subscribers, these agents delete the associated .bcp and .sch files from the distributor’s working directory.

Transactional Replication

Transactional replication is the process of capturing transactions from the transaction log of the published database and applying them to the subscription databases. With SQL Server transactional replication, you can publish all or part of a table, views, or one or more stored procedures as an article. All data updates are then stored in the distribution database and sent and applied to any number of subscribing servers. Obtaining these updates from the publishing database’s transaction log is extremely efficient. No direct reading of tables is required except during the initial snapshot, and only a minimal amount of traffic is generated over the network. This has made transactional replication the most often used method.

As data changes are made, they are propagated to the other sites in near real time; you determine the frequency of this propagation. Because changes are usually made only at the publishing server, data conflicts are avoided for the most part. As an example, push subscribers usually receive updates from the publisher in a minute or less, depending on the speed and availability of the network. Subscribers also can be set up for pull subscriptions. This capability is useful for disconnected users who are not connected to the network at all times.

The primary agents used for transactional replication are the snapshot agent, log reader agent, and distribution agent:
- The snapshot agent creates files that contain the schema of the publication and the data.
  The files are stored in the snapshot folder of the distribution server, and the distribution jobs are recorded in the distribution database.
- The log reader agent monitors the transaction log of the database that it is set up to service. Each published database has its own log reader agent, which copies the transactions from the transaction log of that published database into the distribution database.
- The distribution agent is responsible for moving the schema and data from the distributor to the subscribers for the initial synchronization and then moving all the subsequent transactions from the published database to each subscriber as they come in. These transactions are stored in the distribution database for a certain length of time and are eventually purged.

A few other agents deal with the other housekeeping issues surrounding data replication, such as schema files cleanup, history cleanup, and transaction cleanup.

Merge Replication

Merge replication involves initializing the publisher and all subscribers and then allowing data to be changed at every site involved: at the publisher and at all subscribers. All these changes to the data are subsequently merged at certain intervals so that, again, all copies of the database have identical data. Occasionally, data conflicts have to be resolved. The publisher does not always win in a conflict resolution. Instead, the winner is determined by whatever criteria you establish.

The primary agents used for merge replication are the snapshot agent and merge agent:
- The snapshot agent creates files that contain the schema of the publication and the data. The files are stored in the snapshot folder of the distribution server, and the distribution jobs are recorded in the distribution database. This is essentially the same behavior as with all other types of replication methods.
- The merge agent takes the initial snapshot and applies it to all the subscribers. It then reconciles all changes made on all the servers, based on the rules you configure.

Preparing for Merge Replication

When you set up a table for merge replication, SQL Server performs three schema changes to the database.

First, it must either identify or create a unique column for each row that will be replicated. This column is used to identify the different rows across all the different copies of the table. If the table already contains a column with the ROWGUIDCOL property, SQL Server automatically uses that column for the row identifier. If not, SQL Server adds a column called rowguid to the table. SQL Server also places an index on this rowguid column.

Next, SQL Server adds triggers to the table to track changes that occur to the data in the table and record them in the merge system tables. The triggers can track changes at either the row or column level, depending on how you configure tracking. SQL Server supports multiple triggers of the same type on a table, so merge triggers do not interfere with user-defined triggers on the table.

Finally, SQL Server adds new system tables to the database that contains the replicated tables. The MSMerge_contents and MSMerge_tombstone tables track the updates, inserts, and deletes. These tables rely on rowguid to track which rows have actually been changed.

The merge agent is responsible for moving changed data from the site where it was changed to all other sites in the replication scenario. When a row is updated, the triggers added by SQL Server fire and update the new system tables, setting the generation column equal to 0 for the corresponding rowguid. When the merge agent runs, it collects the data from the rows where the generation column is 0 and then resets the generation values to values higher than the previous generation numbers.
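The generation bookkeeping just described can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual layout of MSMerge_contents: the class `MergeTracking` and its methods `on_row_changed` and `collect_changes` are hypothetical names, and a single counter stands in for the real generation numbering.

```python
import uuid


class MergeTracking:
    """Toy stand-in for merge change tracking (not SQL Server's real schema)."""

    def __init__(self):
        self.generation = {}      # rowguid -> generation number
        self.last_generation = 0  # highest generation assigned so far

    def on_row_changed(self, rowguid):
        # Plays the role of the merge trigger: flag the row as not yet sent.
        self.generation[rowguid] = 0

    def collect_changes(self):
        # Plays the role of the merge agent: gather the rows flagged 0, then
        # stamp them with a generation higher than any used before.
        pending = [guid for guid, gen in self.generation.items() if gen == 0]
        if pending:
            self.last_generation += 1
            for guid in pending:
                self.generation[guid] = self.last_generation
        return pending


tracking = MergeTracking()
row_a, row_b = uuid.uuid4(), uuid.uuid4()

tracking.on_row_changed(row_a)
tracking.on_row_changed(row_b)
first_batch = tracking.collect_changes()   # both rows are pending

tracking.on_row_changed(row_a)
second_batch = tracking.collect_changes()  # only row_a changed again
```

The point of the design is that each run only has to look for rows whose generation is 0, so rows sent in an earlier run are skipped without comparing any data.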
These generation numbers let the merge agent tell which data has already been shared with other sites without having to look through all the data. The merge agent then sends the changed data to the other sites.

When the data reaches the other sites, it is merged with existing data according to rules you have defined. These rules are flexible and highly extensible. The merge agent evaluates existing and new data and resolves conflicts based on priorities or which data was changed first. Alternatively, you can create custom resolution strategies using the Component Object Model (COM) and custom stored procedures. After conflicts have been handled, synchronization occurs to ensure that all sites have the same data.

The merge agent identifies conflicts using the MSMerge_contents table. In this table, a column called lineage is used to track the history of changes to a row. The agent updates the lineage value whenever a user makes changes to the data in a row. The entry in this column is a combination of a site identifier and the last version of the row created at the site. As the merge agent is merging all the changes that have occurred, it examines each site’s information to see whether a conflict has occurred. If a conflict has occurred, the agent initiates conflict resolution based on the criteria mentioned earlier.

Basing the Replication Design on User Requirements

As mentioned earlier, business requirements drive your replication configuration and method. In addition, nailing down all the details of the business requirements is the hardest part of a data replication design process. After you have completed the requirements gathering, the replication design usually falls into place easily.
It is highly recommended that you get a prototype up and running as quickly as possible to measure the effectiveness of one approach over another.

You must understand several key aspects to make the right design decisions, including the following:
- What is the number of sites, and what is the site autonomy in the scope (location)?
- Which sites have the master data (data ownership)?
- What is the data latency requirement (by site)?
- What types of data accesses are being made (by site)?
  - Reads
  - Writes
  - Updates
  - Deletes
  This information needs to identify exactly which data, and which data subsets that drive filtering, are needed for the data accesses (by site).
- What is the volume of activity/transactions, including the number of users (by site)?
- How many machines do you have to work with (by site)?
- What are the available processing power (CPU and memory) and disk space on each of these machines (by site)?
- What are the stability, speed, and saturation level of the network connections between machines (by site)?
- What is the dial-in, Internet, or other access mechanism requirement for the data?
- What potential subscriber or publisher database engines are involved?

Figure 19.23 shows the factors that contribute to replication designs and the possible data replication configuration that would best be used. It is only a partial table because of the numerous factors and many replication configuration options available. However, it gives a good idea of the general design approach described here. Perhaps 95% of user requirements can be classified fairly easily. The other 5% might take some imagination in determining the best overall solution.

FIGURE 19.23 Replication design factors.

Depending on the requirements that need to be supported, you might even end up with a solution using something like database mirroring or other distribution techniques.
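As a rough illustration of how two of these requirements narrow the choice, the following Python sketch maps them to one of the three replication types described in this chapter. It is a deliberately coarse rule of thumb built only from those descriptions, not a reproduction of Figure 19.23. The function name `suggest_replication_type` and its two parameters are assumptions for illustration, and a real design weighs every factor in the list above.

```python
def suggest_replication_type(subscribers_update_data, needs_near_real_time):
    """Rule-of-thumb mapping from two requirements to a replication type.

    Assumption: only data ownership and latency are considered here.
    """
    if subscribers_update_data:
        # All sites change data independently and conflicts get resolved.
        return "merge"
    if needs_near_real_time:
        # Changes flow from the publisher's log to subscribers quickly.
        return "transactional"
    # Read-only subscribers that tolerate periodic full refreshes.
    return "snapshot"


# Hypothetical sites and their answers to the two questions.
phone_list_site = suggest_replication_type(False, False)
reporting_site = suggest_replication_type(False, True)
field_sales_site = suggest_replication_type(True, False)
```

A prototype remains the real test: a sketch like this only helps decide which configuration to prototype first.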
Data Characteristics

You need to analyze the underlying data types and characteristics thoroughly. Issues such as collation or character set and data sorting come into play. You must be aware of what they are set to on all nodes of your replication configuration. SQL Server 2008 does not convert the replicated data and might even mistranslate the data as it is replicated because it is impossible to map all characters between character sets. It is best to look up the character set “mapping chart” for SQL Server replication to all other data target environments. Most are covered well, but problems arise with certain data types, such as image, timestamp, and identity. Sometimes, using the Unicode data types at all sites is best for consistency.

Following is a general list of issues to watch out for in this regard:
- Collation consistency across all nodes of replication.
- timestamp column data in replication. It might not be what you think.
- identity, uniqueidentifier, and guid column behavior with data replication.
- text or image data types to heterogeneous subscribers.
- Missing or unsupported data types because of prior versions of SQL Server or heterogeneous subscribers as part of the replication configuration.
- Maximum row size limitations between merge replication and transactional replication.

Figure 19.24 lists further SQL Server 2008 replication object limitations.

FIGURE 19.24 SQL Server 2008 replication object limitations.

NOTE
If you have triggers on your tables and you want them to be replicated along with your table, you might want to add the NOT FOR REPLICATION clause to the trigger definition so that the trigger code isn’t executed redundantly on the subscriber side.

Setting Up Replication

In general, SQL Server 2008 data replication is exceptionally easy to set up via SQL Server Management Studio wizards.
However, if you use the wizards, you need to be sure to generate SQL scripts for every phase of replication configuration. In a production environment, you are likely to rely heavily on scripts and not have the luxury of having much time to set up and break down production replication configurations via wizards. Generating SQL scripts also eases the setup/breakdown process in development, test, and user acceptance environments.

You always have to define any data replication configuration in the following order:
1. Create or enable a distributor to enable publishing.
2. Enable publishing. (A distributor must be designated for a publisher.)
3. Create a publication and define articles within the publication.
4. Define subscribers and subscribe to a publication.

Figure 19.25 shows SQL Server Management Studio Object Explorer with three separate server connections. These three servers represent a possible replication topology.

FIGURE 19.25 Three servers to be used in the replication topology (central publisher, remote distributor, and subscriber).

The following section takes you through the process of building up a typical central publisher/remote distribution data replication configuration. The following SQL Server named instances are used for different purposes (as shown in Figure 19.25):
- Publisher: the SQL08DE01 named instance
- Distributor: the SQL08DE02 named instance (REMOTE distributor)
- Subscriber: the SQL08DE03 named instance

The following section highlights the different areas in SQL Server Management Studio that are needed to create this replication configuration.

Creating a Distributor and Enabling Publishing

Before setting up a publisher, you have to designate a distribution server to be used by that publisher.
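The ordering rule (a distributor before a publisher, a publication before a subscription) can be expressed as a small prerequisite check. This Python sketch only models the dependency order of the four phases listed above. The step names in `SETUP_ORDER` and the function `validate_setup` are hypothetical labels and do not correspond to any SQL Server command or wizard page.

```python
# The four configuration phases, in the order they must be performed.
SETUP_ORDER = [
    "create_distributor",
    "enable_publishing",
    "create_publication",
    "create_subscription",
]


def validate_setup(steps):
    """Return True if every step comes after all steps it depends on.

    Raises ValueError for an unknown step or a missing prerequisite.
    """
    done = set()
    for step in steps:
        index = SETUP_ORDER.index(step)  # ValueError if the step is unknown
        missing = [s for s in SETUP_ORDER[:index] if s not in done]
        if missing:
            raise ValueError(f"{step} requires {missing[0]} first")
        done.add(step)
    return True


plan_ok = validate_setup(SETUP_ORDER)

try:
    validate_setup(["enable_publishing", "create_distributor"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

A check of this kind could guard a scripted setup in development or test environments, where configurations are built up and broken down repeatedly.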
As discussed earlier, you can either configure the local server as the distribution server or choose a remote server as the distributor (not on the same machine as the publication server). You can configure the server as a distributor and publisher at the same time, or you can configure the server as a dedicated distributor on the remote server separately. In the sample topology described here, you start by creating a remote distributor separately so you can orient yourself to what is happening on each server in the topology as it is being built up. You are also able to enable a specific SQL Server instance as the publisher that will use this distributor (all in one wizard sequence). This method is very efficient.

Before you can configure replication, you must be a member of the sysadmin server role, so verify that membership now. Then you use the following steps to configure a server as a distributor (remote distributor):

1. In SQL Server Management Studio, locate the Replication node under the server that will be the distributor (under the SQL08DE02 named instance node). Right-click the Replication node and choose Configure Distribution. This starts you through the wizard, which provides three options:
   - Configure this server to be a distributor.
   - Configure this server to be both a publisher and distributor.
   - Configure this server to be a publisher that uses another server as its distributor.
2. When the wizard starts, click past the initial Configure Distribution Wizard splash page. Then choose the first radio button, which should say ’DBARCH-LT2\SQL08DE02’ Will Act as Its Own Distributor (as shown in Figure 19.26). This designates this server as a distributor for one or more publishers. The distribution database and log are created here as well (and not on the publication server).

FIGURE 19.26 Configuring a separate distributor (REMOTE) wizard.
3. You are then asked how you want the replication agents to be started. Select the agents to be started automatically (the Yes option).
4. Next comes the location for the snapshot folder. Give it the proper full network pathname. Remember that potentially a large amount of data will be coming here, so it should be on a drive with enough space to hold the snapshots without filling up.
5. When you are asked to configure the distribution database, select the default settings. Figure 19.27 shows all the distribution database name and location information.

FIGURE 19.27 Specification of the distribution database name and location.

6. Identify the publisher if you know which SQL Server instance will be publishing the data that this distributor will distribute. To do this, click the Add button at the bottom-left corner of the Publishers page to enable servers to use this distributor when they become publishers. You are prompted for the server name and authentication method for the distributor to reach this publisher. Specify DBARCH-LT2\SQL08DE01 as a publisher that will use this distributor. The end result, as shown in Figure 19.28, is DBARCH-LT2\SQL08DE01 designated (checked) as a publisher that will use this distribution database (distributor). Remember to uncheck the SQL Server named instance of the distribution server if you don’t want to publish from that server (the SQL08DE02 named instance).

FIGURE 19.28 Designate the publisher that will use this remote distributor.

7. Specify a distributor password. This is the password that will be used by publishers to connect to the distributor. You will be able to administer this password through SQL Server Management Studio directly.

The wizard then asks which actions you want to take, such as configuring the distribution server or generating a script file with steps to configure distribution. Choose both. It’s always good to have the scripts created now so you can start script-based configurations immediately. A Complete the Wizard page is displayed, describing all the tasks that are about to happen, along with their configuration specifications. Figure 19.29 shows this summary.

FIGURE 19.29 Completing the configuration of the distributor and enabling the publisher.

When you click Finish, several things begin to occur. First, a configuring dialog page comes up and works through each step you have requested (as shown in Figure 19.30). A summary of steps, errors, and warnings is displayed on this page. When it completes, you can explore any issues (errors or warnings) by drilling down in the Report