SQL Server 2012 Data Integration Recipes


Posted: 11/07/2018, 16:58

www.it-ebooks.info

For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them.

Contents at a Glance

About the Author
About the Technical Reviewers
Acknowledgments
Introduction
Chapter 1: Sourcing Data from MS Office Applications
Chapter 2: Flat File Data Sources
Chapter 3: XML Data Sources
Chapter 4: SQL Databases
Chapter 5: SQL Server Sources
Chapter 6: Miscellaneous Data Sources
Chapter 7: Exporting Data from SQL Server
Chapter 8: Metadata
Chapter 9: Data Transformation
Chapter 10: Data Profiling
Chapter 11: Delta Data Management
Chapter 12: Change Tracking and Change Data Capture
Chapter 13: Organising and Optimizing Data Loads
Chapter 14: ETL Process Acceleration
Chapter 15: Logging and Auditing
Appendix A: Data Types
Appendix B: Sample Databases and Scripts
Index

Introduction

Microsoft SQL Server 2012 is a vast subject. One part of the ecosystem of this powerful and comprehensive database, which has evolved considerably over many years, is data integration, or ETL if you want to use another virtually synonymous term. Long gone are the days when BCP was the only available tool to load or export data. Even DTS is now a distant memory. Today the user is spoilt for choice when it comes to the plethora of tools and options available to get data into and out of the Microsoft RDBMS.

This book is an attempt to shed some light on many of the ways in which data can be both loaded into SQL Server and sent from it into the outside world. I also try to give some ideas as to which techniques are the most appropriate to use when faced with different challenges and
situations.

This book is not, however, just an SSIS manual. I have a profound respect for this excellent product, but do not believe that it is the “one stop shop” that some developers take it to be. I wanted to show readers that there are frequently alternative technologies which can be applied fruitfully in many ETL scenarios. Indeed, my philosophy is that when dealing with data you should always apply the right solution, and never believe that there is only one answer. Consequently, this book includes recipes on many of the other tools in the SQL Server universe. Sometimes I have deliberately shown varied ways of dealing with essentially the same challenge. By doing this I hope to arouse your curiosity and also to provide some practical examples of ways to get data from myriad sources into SQL Server databases cleanly and efficiently.

Although this book specifically targets users of SQL Server 2012, I try, wherever feasible, to say if a recipe can be applied to previous versions of the database. I also try to highlight any new features and differences between SQL Server 2012 and older versions. This is because users are unlikely to deal only with the latest version of this RDBMS; most sites have multiple versions in production. I only ever go back to SQL Server 2005 when pointing out how the database has evolved, as this was the version that introduced SSIS, the major turning point in SQL Server-based ETL.

As the book is focused on SQL Server, nearly all the code used is T-SQL. Some of the samples given are extremely simple; others are more complex. All of it is concentrated on ETL requirements. Consequently, you will find no OLTP or DBA-based examples in this book. You will find a few touches of MDX where handling Analysis Services data is concerned, and some VB.NET where SSIS script tasks are used. I have chosen to use VB.NET in nearly all the SSIS script tasks described in this book as it is, in my experience, the .NET language that many
T-SQL programmers are most familiar with. Nonetheless, I have added one or two snippets of C# (particularly where CLR assemblies are used) to avoid accusations of neglecting this particular language.

Data integration is a vast subject. Consequently, in an attempt to apply a little structure to a potentially enormous and disparate domain, this book is divided into two main parts. The first part (Chapters 1 through 7) deals with the mechanics of getting data into and out of SQL Server. Here you will find the essential details of how to connect to various data sources, and then ingurgitate the data. As many potential pitfalls and traps as possible are brought to your attention for each data source. The second part (Chapters 8 through 15) deals with the wider ETL environment. Here we progress from the nuts and bolts to the coordinated whole of extracting, transforming, and (efficiently) loading data. These chapters take the reader on a trip through the process of metadata analysis, data transformation, profiling source data, logging data processes, and some of the ways of optimizing data loads.

For this book I decided to avoid the ubiquitous AdventureWorks and use my own sample database. There are a few reasons for this. Firstly, I thought that AdventureWorks was so large and complex that it could divert attention from some of the techniques that I wanted to explain. I prefer to use an extremely simplistic data structure so that the reader is free to focus on the essence of what is being explained, and not the data itself. Secondly, I wished to avoid the added complexity of the multiple interrelated tables and foreign keys present in AdventureWorks. Finally, I did not want to be using data that took time to load. This way, once again, you can concentrate on process and principle, and not develop “ETL-stare” while you watch a clock ticking as thousands of records churn into a table, accompanied by whirling on-screen images or the blinking of a bleary-eyed
hard disk indicator. Consequently, I have preferred to use an extremely uncluttered set of source data. A full description of the source database(s) is given in Appendix B.

Please also note that this book is not intended to be a progressive self-tuition manual. You are strongly advised to drop and recreate the sample databases between recipes to ensure a clean environment in which to test the examples given. Indeed, the whole philosophy of the recipe-based approach is that you can dip in anywhere to find help, except in the rare cases where a recipe specifically indicates that it requires prior reading or builds on a previous explanation.

The recipes in this book cover a wide variety of needs, from the extremely simple to the relatively complex, in an attempt to cover as wide a range of subjects as possible. The consequence is that some recipes may seem far too simplistic for certain readers, while others may wonder if the more advanced solutions are relevant to their work. I can only hope that SQL Server beginners will find easy answers and that advanced users will nonetheless find tweaks and suggestions that add to their knowledge. In all cases, I sincerely hope that you will find this book useful.

Inevitably, not every question can be answered and not every issue resolved in one book. I truly hope that I have covered many of the essential ETL tasks that you will face, and have provided ways of solving a reasonable number of the problems that you may encounter. My apologies, then, to any reader who does not find the answer to their specific issue, but writing an encyclopaedia was not an option. In any case, I can only encourage you to read recipes other than those that cover the precise subject that interests you, as you may find potential solutions elsewhere in this book.

I wish you good luck in using SQL Server to extract, transform, and load data. And I sincerely hope that you have as much fun with it as I had writing this book.

Adam Aspin
Chapter 1: Sourcing Data from MS Office Applications

I suspect that many industrial-strength SQL Server applications have begun life as a much smaller MS Office-based idea, which has then grown and been extended until it has finished as a robust SQL Server application. In any case, two Microsoft Office programs, Excel and Access, are among the most frequently used sources of data for eventual loading into SQL Server. There are many reasons for this, from their sheer ubiquity to the ease with which users can enter data into Access databases and Excel spreadsheets. So it is no wonder that we developers and DBAs spend so much of our time loading data from these sources into SQL Server.

There are a number of ways in which data can be pushed or pulled from MS Office sources into SQL Server. These include:

• Using T-SQL (OPENDATASOURCE and OPENROWSET)
• Linked Servers (yes, an Access database or even an Excel spreadsheet can be a linked server)
• SSIS
• The SQL Server Import Wizard
• The SQL Server Migration Assistant for Access

This chapter examines all these techniques and tries to give you some guidelines on their optimal uses (and inevitable limitations). Any sample files used in this chapter are found in the C:\SQL2012DIRecipes\CH01 directory, assuming that you have downloaded the samples from the book’s companion web site and installed them as described in Appendix B.

1-1 Ensuring Connectivity to Access and Excel

Problem

You want to be able to import data from all versions of Excel and Access (including the latest file formats) in both 32-bit and 64-bit environments.

Solution

You need to install the Microsoft Access Connectivity Engine (ACE) driver. Here are the steps to follow:

1. Click Download on the requisite web page. This will download the executable file to your selected directory.

Note: The ACE driver can be found at www.microsoft.com/en-us/download/details.aspx?id=13255. This location could change over
time, but a quick Internet search should point you to the current source fast enough.

2. Double-click the AccessDatabaseEngine.exe file that you have downloaded. This will be AccessDatabaseEngine_x64.exe for the 64-bit version.
3. Follow the instructions.
4. In SSMS, expand Server Objects ➤ Linked Servers ➤ Providers.
5. Assuming that the driver installation was successful, you should see the Microsoft.ACE.OLEDB.12.0 provider.
6. Double-click the provider and check Allow InProcess and Dynamic Parameter.

As an alternative to steps 4-6, if you prefer a command-line approach, run the following T-SQL snippet (C:\SQL2012DIRecipes\CH01\SetACEProperties.Sql in the samples for this book):

    EXECUTE master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0'
            , N'AllowInProcess', 1;
    GO
    EXECUTE master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0'
            , N'DynamicParameters', 1;
    GO

You now have the driver installed and ready to use.

How It Works

Before attempting to read data from Excel or Access, it is vital to ensure that the drivers that allow the files to be read are installed on your server. Only the “old” 32-bit Jet driver is currently installed with an SQL Server installation, and that driver has severe limitations: principally, it cannot read the latest versions of Access and Excel, and it will not function in a 64-bit environment. Using the latest ACE driver generally makes your life much easier, as the newest versions have all the capabilities of the older versions as well as extra functionality. Despite being called the “AccessDatabaseEngine,” this driver also reads and writes data to Excel files, as well as to text files.

Confusingly, the 2007 Office System Driver and the Microsoft Access Engine 2010 redistributable both appear as “Microsoft.ACE.OLEDB.12.0” in the list of linked server providers in SSMS. 64-bit SQL Server applications can access 32-bit Jet and 2007 Office System files by using 32-bit SQL Server Integration Services (SSIS) on 64-bit Windows.
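Once the provider is registered, a quick smoke test is to read a workbook ad hoc with OPENROWSET, one of the T-SQL techniques listed at the start of this chapter. The snippet below is a minimal sketch rather than one of the book's sample scripts: it assumes the CarSales.xlsx sample file sits in C:\SQL2012DIRecipes\CH01, that it contains a worksheet named Sheet1 (a hypothetical name; substitute your own), and that you are permitted to enable the Ad Hoc Distributed Queries server option, which ad hoc OPENROWSET access requires.

```sql
-- Minimal sketch: check that the ACE provider can read an Excel workbook.
-- Assumptions (not from the book): the workbook path below exists and
-- contains a worksheet named Sheet1 (hypothetical; use your own sheet name).

-- Ad hoc OPENROWSET/OPENDATASOURCE access requires this advanced option.
EXECUTE sp_configure 'show advanced options', 1;
RECONFIGURE;
EXECUTE sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
GO

-- The provider string names the workbook; HDR=YES treats the first row as column names.
-- The trailing $ in [Sheet1$] tells the provider to read a whole worksheet.
SELECT  *
FROM    OPENROWSET(
            'Microsoft.ACE.OLEDB.12.0',
            'Excel 12.0;Database=C:\SQL2012DIRecipes\CH01\CarSales.xlsx;HDR=YES',
            'SELECT * FROM [Sheet1$]'
        );
GO
```

If the SELECT fails because the provider cannot be loaded, the usual suspects are an unsuccessful installation or a mismatch between the bitness of the ACE driver and that of the SQL Server instance, as described in the Hints, Tips, and Traps for this recipe.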
The versions of the Office drivers currently available are listed in Table 1-1.

Table 1-1. MS Office Drivers

| Driver Title | Driver Name | Source | Comments |
|---|---|---|---|
| OLEDB Provider for Microsoft Jet | Microsoft.Jet.OLEDB.4.0 | SQL Server installation (installed with the client tools) | 32-bit only. Reads and writes Excel & Access 97-2003. Accepts .xls and .mdb formats. |
| 2007 Office System Driver | Microsoft.ACE.OLEDB.12.0 | www.microsoft.com/downloads/thankyou.aspx?familyId=7554f536-8c28-4598-9b72-ef94e038c891&displayLang=en | 32-bit only. Reads and writes Excel & Access 97-2007. Accepts .xls/.xlsx/.xlsm/.xlsb and .mdb/.accdb formats. |
| Microsoft Access Engine 2010 redistributable | Microsoft.ACE.OLEDB.12.0 | www.microsoft.com/downloads/en/details.aspx?familyid=C06B8369-60DD-4B64-A44B-84B371EDE16D&displaylang=en#Instructions | 32-bit or 64-bit versions available. Reads and writes Excel & Access 97-2010. Accepts .xls/.xlsx/.xlsm/.xlsb and .mdb/.accdb formats. |

Hints, Tips, and Traps

• If you still want to use the old 32-bit Jet driver, then you can do so provided that you save the Excel source in Excel 97–2003 format and are working in a 32-bit environment.
• The ACE drivers are supported by Windows 7; Windows Server 2003 R2, 32-bit x86; Windows Server 2003 R2, x64 editions; Windows Server 2008 R2; Windows Server 2008 with Service Pack 2; Windows Vista with Service Pack 1; and Windows XP with Service Pack.
• You can install only one version of the ACE driver, either 64-bit or 32-bit, on a given server. This means that you cannot develop in Business Intelligence Development Studio (BIDS) or SQL Server Data Tools (SSDT) with the 64-bit ACE driver installed, as BIDS/SSDT is a 32-bit environment. However, if you install the 32-bit ACE driver instead, then you cannot run a 64-bit package, and have to use one of the 32-bit workarounds. Ideally, you should develop in a 32-bit environment with the 32-bit ACE driver installed (or on a
64-bit machine, but not expect to run the package normally), and deploy to a 64-bit environment where the 64-bit driver is ready and waiting.

1-2 Importing Data from Excel

Problem

You want to import data from an Excel spreadsheet as fast and as simply as possible.

Solution

Run the SQL Server Import and Export Wizard and use it to guide you through the import process. Here is the process to follow:

1. In SQL Server Management Studio, right-click a database (preferably the one into which you want the data imported), then click Tasks ➤ Import Data (see Figure 1-1).

Figure 1-1. Launching the Import/Export Wizard from SSMS

2. Skip the splash screen. The Choose a Data Source screen appears.
3. Select Microsoft Excel as the data source, and enter or browse for the file to import. Be sure to select the Excel version that corresponds to the type of source file from the pop-up list, and specify if your data includes headers (see Figure 1-2).

Contents

10-9 Reading Profile Data: Solution; How It Works; Hints, Tips, and Traps
10-10 Storing SSIS Profile Data in a Database: Solution; How It Works; Hints, Tips, and Traps
10-11 Tailoring Specific Source Data Profiles in SSIS: Solution; How It Works; Hints, Tips, and Traps
10-12 Domain Analysis in SSIS: Solution; Hints, Tips, and Traps
10-13 Performing Multiple Domain Analyses: Solution; How It Works; Hints, Tips, and Traps
10-14 Pattern Profiling in a Data Flow: Solution; How It Works; Hints, Tips, and Traps
10-15 Pattern Profiling Using T-SQL: Solution; How It Works
10-16 Profiling Data Types: Solution; How It Works; Hints, Tips, and Traps
10-17 Controlling Data Flow via Profile Metadata: Solution; How It Works; Hints, Tips, and Traps
Summary

Chapter 11: Delta Data Management
Preamble: Why Bother with Delta Data?
Delta Data Approaches
Detecting Changes at Source
Detecting Changes During the ETL Load
11-1 Loading Delta Data as Part of a Structured ETL Process: Solution; How It Works; Hints, Tips, and Traps
11-2 Loading Data Changes Using a Linked Server: Solution; How It Works; Hints, Tips, and Traps
11-3 Loading Data Changes From a Small Source Table as Part of a Structured ETL Process: Solution; How It Works; Hints, Tips, and Traps
11-4 Detecting and Loading Delta Data Only: Solution; How It Works; Hints, Tips, and Traps
11-5 Performing Delta Data Upserts with Other SQL Databases: Solution; How It Works; Hints, Tips, and Traps
11-6 Handling Data Changes Without Writing to the Source Server: Solution; How It Works; Hints, Tips, and Traps
11-7 Detecting Data Changes with Limited Source Database Access: Solution; How It Works; Hints, Tips, and Traps
11-8 Detecting and Loading Delta Data Using T-SQL and a Linked Server When MERGE Is Not Practical: Solution; How It Works
11-9 Detecting, Logging, and Loading Delta Data: Solution; How It Works; Hints, Tips, and Traps
11-10 Detecting Differences in Rowcounts, Metadata, and Column Data: Solution; How It Works; Hints, Tips, and Traps
Summary

Chapter 12: Change Tracking and Change Data Capture
12-1 Detecting Source Table Changes with Little Overhead and No Custom Framework: Solution; How It Works; Hints, Tips, and Traps
12-2 Pulling Changes into a Destination Table with Change Tracking: Solution; How It Works; Hints, Tips, and Traps
12-3 Using Change Tracking as Part of a Structured ETL Process: Solution; How It Works
12-4 Detecting Changes to Source Data Using the SQL Server Transaction Log: Solution; How It Works; Hints, Tips, and Traps
12-5 Applying Change Data Capture with SSIS: Solution; How It Works; Hints, Tips, and Traps
12-6 Using Change Data Capture with Oracle Source Data: Solution; How It Works; Hints, Tips, and Traps
Summary

Chapter 13: Organising and Optimizing Data Loads
13-1 Loading Multiple Files: Solution; How It Works; Hints, Tips, and Traps
13-2 Selecting Multiple Text Files to Import: Solution; How It Works; Hints, Tips, and Traps
13-3 Loading Multiple Files Using Complex Selection Criteria: Solution; How It Works; Hints, Tips, and Traps
13-4 Ordering and Filtering File Loads: Solution; How It Works; Hints, Tips, and Traps
13-5 Loading Multiple Flat Files in Parallel: Solution; How It Works; Hints, Tips, and Traps
13-6 Loading Source Files with Load Balancing: Solution; How It Works; Hints, Tips, and Traps
13-7 Loading Data to Parallel Destinations: Solution; How It Works; Hints, Tips, and Traps
13-8 Using a Single Data File As a Multiple Data Source for Parallel Destination Loads: Solution; How It Works; Hints, Tips, and Traps
13-9 Reading and Writing Data from a Database Source in Parallel: Solution; How It Works; Hints, Tips, and Traps
13-10 Inserting Records in Parallel and in Bulk: Solution; How It Works; Hints, Tips, and Traps
13-11 Creating Self-Optimizing Parallel Bulk Inserts: Solution; How It Works
13-12 Loading Files in Controlled Batches: Solution; How It Works; Hints, Tips, and Traps
13-13 Executing SQL Statements and Procedures in Parallel Using SSIS: Solution; How It Works
13-14 Executing SQL Statements and Procedures in Parallel Without SSIS: Solution; How It Works; Hints, Tips, and Traps
13-15 Executing SQL Statements and Procedures in Parallel Using SQL Server Agent: Solution; Hints, Tips, and Traps
Summary

Chapter 14: ETL Process Acceleration
14-1 Accelerating SSIS Lookups: Solution; How It Works; Hints, Tips, and Traps
14-2 Disabling and Rebuilding Nonclustered Indexes in a Destination Table: Solution; How It Works; Hints, Tips, and Traps
14-3 Persisting Destination Database Index Metadata: Solution; How It Works; Hints, Tips, and Traps
14-4 Scripting and Executing DROP Statements for Destination Database Indexes: Solution; How It Works; Hints, Tips, and Traps
14-5 Scripting and Executing CREATE Statements for Destination Database Indexes: Solution; How It Works; Hints, Tips, and Traps
14-6 Storing Metadata, and Then Scripting and Executing DROP and CREATE Statements for Destination Database XML Indexes: Solution; How It Works; Hints, Tips, and Traps
14-7 Finding Missing Indexes: Solution; How It Works; Hints, Tips, and Traps
14-8 Managing Check Constraints: Solution; How It Works
14-9 Managing Foreign Key Constraints: Solution; How It Works; Hints, Tips, and Traps
14-10 Optimizing Bulk Loads: Solution; How It Works; Hints, Tips, and Traps
Summary

Chapter 15: Logging and Auditing
15-1 Logging Events from T-SQL: Solution; How It Works; Hints, Tips, and Traps
15-2 Logging Data from SSIS: Solution; How It Works; Hints, Tips, and Traps
15-3 Customizing SSIS Logging: Solution; How It Works
15-4 Saving and Applying Complex SSIS Logging Details: Solution; How It Works; Hints, Tips, and Traps
15-5 Extending SSIS Logging to an SQL Server Destination: Solution; How It Works; Hints, Tips, and Traps
15-6 Logging Information from an SSIS Script Task: Solution; How It Works
15-7 Logging from T-SQL to the SSIS Log Table: Solution; How It Works; Hints, Tips, and Traps
15-8 Handling Errors in T-SQL: Solution; How It Works
15-9 Handling Errors in SSIS: Solution; How It Works; Hints, Tips, and Traps
15-10 Creating a Centralized Logging Framework: Solution; How It Works; Hints, Tips, and Traps
15-11 Logging to a Centralized Framework When Using SSIS Containers: Solution; How It Works
15-12 Logging to a Centralized Framework When Using SSIS Script Tasks and Components: Solution; How It Works
15-13 Logging to a Text or XML File from T-SQL: Solution; How It Works; Hints, Tips, and Traps
15-14 Logging Counters in T-SQL: Solution; How It Works; Hints, Tips, and Traps
15-15 Logging Counters from SSIS: Solution; How It Works; Hints, Tips, and Traps
15-16 Creating an SSIS Catalog: Solution; How It Works; Hints, Tips, and Traps
15-17 Reading Logged Events and Counters from the SSIS Catalog: Solution; How It Works; Hints, Tips, and Traps
15-18 Analyzing Events and Counters In-Depth via the SSIS Catalog: Solution; How It Works; Hints, Tips, and Traps
15-19 Creating a Process Control Framework: Solution; How It Works
15-20 Linking the SSIS Catalog to T-SQL Logging: Solution; How It Works
15-21 Baselining ETL Processes: Solution; How It Works
15-22 Auditing an ETL Process: Solution; How It Works; Hints, Tips, and Traps
15-23 Logging Audit Data: Solution; How It Works; Hints, Tips, and Traps
Summary

Appendix A: Data Types
SQL Server Data Types
SSIS Data Types
Default Data Mapping in the Import/Export Wizard
MSSQL9 to MSSQL8
MSSQL to DB2
MSSQL to IBMDB2
MSSQL to Jet4
MSSQL to SSIS11
OracleClient to MSSQL
OracleClient to MSSQL11
OracleClient to SSIS11
Oracle to MSSQL
Oracle to MSSQL11
Oracle to SSIS11
SQLClient9 to MSSQL8
SQLClient to DB2
SQLClient to IBMDB2
SQLClient to MSSQL11
SQLClient to Oracle
SQLClient to SSIS
SSIS11 to DB2
SSIS11 to IBMDB2
SSIS11 to MSSQL
DB2 to MSSQL
SSIS to Jet
DB2 to MSSQL11
SSIS to Oracle
DB2 to SSIS11
IBMDB2 to MSSQL
IBMDB2 to MSSQL11
IBMDB2 to SSIS11
Jet to MSSQL8
Jet to SSIS
ACE to SSIS
Excel to SQL Server and SSIS Data Mapping
Access to SQL Server and SSIS Data Mapping
Oracle to SQL Server and SSIS Data Mapping
Oracle to SQL Server Replication Data Type Mapping
MySQL Data Types
Sybase to SQL Server Data Type Conversion

Appendix B: Sample Databases and Scripts
Sample Databases and Files
CarSales
Creating the CarSales Database
CarSales_Staging
CarSales_DW
The CarSales SSAS Cube
Restoring the CarSales_Cube SSAS Database
CarSales_Logging
Directory Structure for the Sample Files

Index

About the Author

Adam Aspin is an independent Business Intelligence consultant based in the United Kingdom. He has worked with SQL Server for seventeen years. During this time, he has developed several dozen reporting and analytical systems based on the Microsoft BI product suite.

A graduate of Oxford University, Adam began his career in publishing before moving into IT. Databases soon became a passion, and his experience in this arena ranges from dBase to
Oracle, and Access to MySQL, with occasional sorties into the world of DB2. He is, however, most at home in the Microsoft universe when using SQL Server Analysis Services, SQL Server Reporting Services, and above all, SQL Server Integration Services.

Business Intelligence has been his principal focus for the last ten years. He has applied his skills for a range of clients, including J.P. Morgan, The Organisation for Economic Co-operation and Development (OECD), Tesco, Centrica, Harrods, Vodafone, Crédit Agricole, Cartier, and EMC Conchango.

Adam has been a frequent contributor to SQLServerCentral.com for several years. He has written numerous articles for various French IT publications. A fluent French speaker, Adam has worked in France and Switzerland for many years.

Contact him at adam@calidra.co.uk.

About the Technical Reviewers

Ben Eaton is an independent consultant based in the Midland counties of England, specialising in business intelligence, software architecture, and application development with the Microsoft stack. Ben began professional development with Microsoft Access in the late 1990s, until he discovered that providing the same end-to-end data management features on an enterprise level required a whole new toolset. Access databases with SQL Server back ends were soon followed by .NET SOA applications and early adoption of Reporting Services. Apart from the odd dabble in SharePoint, he now works with the SQL Server stack (SSIS, SSAS, SSRS) and most of the .NET Framework (WCF, WPF, ASP.NET) for a broad range of private and public sector clients.

Jason Brimhall is first and foremost a family man. He has 15+ years of experience in IT and has worked with SQL Server starting with SQL Server 6.5. He has worked for both large and small companies in varied industries. He has experience in performance tuning, high transaction environments, large environments, and VLDBs. He is currently a DB Architect and an MCITP for SQL 2008. Jason regularly volunteers for PASS
and is the VP of the Las Vegas User Group (SSSOLV). You can read more from Jason on his blog at http://jasonbrimhall.info.

Acknowledgments

Writing a technical book can be a lonely occupation. So I am all the more grateful for the help and encouragement that I have received from so many fabulous friends and colleagues.

Firstly, my considerable thanks go to Jonathan Gennick, the commissioning editor of this book. From my initial contact with Apress through to final publication, Jonathan has been both a tower of strength and an exemplary mentor. He shared his vast experience selflessly and courteously. It is thanks to him that this book has seen the light of day.

Heartfelt thanks go to Brigid Duffy, the Apress coordinating editor, for managing this book through the publication process. She succeeded in the near-impossible task of making a potentially stress-filled agenda into a journey filled with light and humor. Her team also deserve much praise for their calm under pressure. So thanks to Kimberly Burton for her tireless and subtle work editing and polishing the prose, and also to Dhaneesh Kumar for the hours spent formatting (and reformatting) the text.

When lost in the depths of technical questions, it is easy to lose sight of what should be one’s main objectives. Fortunately, the team of technical reviewers (Robin Dewson, Ben Eaton, and Jason Brimhall) have all worked unstintingly to remind me of where the focus should be. All three have placed their considerable experience at my disposal and have enriched the subject matter enormously with their suggestions and comments. They have been always patient and endlessly helpful, and I owe them a deep debt of gratitude. Thanks, guys!
A penultimate thanks goes to my old friend and colleague Steven Wilbur for his helpful comments on the initial manuscript and his encouragement to persevere in the path to publication when this book was in its initial phase of gestation. Thanks also to Steve Jones at SQLServerCentral.com for encouraging me to write over the years, and for publishing my articles.

However, my deepest gratitude must be reserved for the two people who have given the most to this book: my wife and son. Timothy has put up with a mentally absent father for months, while providing continual encouragement to persevere. Karine has given me not only the support and encouragement to continue, but also the love without which nothing would be worth it. I am a very lucky man to have both of them.

... (C:\SQL2012DIRecipes\CH01\AddExcelLinkedServer.Sql):

    EXECUTE master.dbo.sp_addlinkedserver
            @SERVER = 'Excel'
           ,@SRVPRODUCT = 'ACE 12.0'
           ,@PROVIDER = 'Microsoft.ACE.OLEDB.12.0'
           ,@DATASRC = 'C:\SQL2012DIRecipes\CH01\CarSales.xlsx'...

... the source data, only using the linked server name and worksheet (or range) name in four-part notation using a T-SQL snippet like (C:\SQL2012DIRecipes\CH01\SelectEXcelLinkedServer.Sql):

    SELECT
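The excerpt above breaks off before its statements are complete. For orientation only, a minimal self-contained sketch of the same linked-server approach might look like the following; it is not the book's script. The linked server name and file path follow the excerpt, while the @provstr value and the worksheet name Sheet1 are assumptions (the sheet name is hypothetical; substitute your own).

```sql
-- Minimal sketch: an Excel linked server over the ACE provider.
-- Assumptions: @provstr = 'Excel 12.0' and the worksheet [Sheet1$] are
-- illustrative choices, not taken from the book's sample scripts.
EXECUTE master.dbo.sp_addlinkedserver
        @server     = N'Excel',
        @srvproduct = N'ACE 12.0',
        @provider   = N'Microsoft.ACE.OLEDB.12.0',
        @datasrc    = N'C:\SQL2012DIRecipes\CH01\CarSales.xlsx',
        @provstr    = N'Excel 12.0';
GO

-- Map all local logins to the linked server without a remote user or password.
EXECUTE master.dbo.sp_addlinkedsrvlogin
        @rmtsrvname  = N'Excel',
        @useself     = 'FALSE',
        @locallogin  = NULL,
        @rmtuser     = NULL,
        @rmtpassword = NULL;
GO

-- Four-part notation: server.catalog.schema.object, with catalog and schema left empty.
SELECT  *
FROM    Excel...[Sheet1$];
GO

-- Optional clean-up between recipes (uncomment to drop the linked server and its login mappings):
-- EXECUTE master.dbo.sp_dropserver @server = N'Excel', @droplogins = 'droplogins';
```

Because the catalog and schema parts are empty, only the linked server name and the worksheet name matter; a named range in the workbook can be queried the same way by putting the range name in place of [Sheet1$].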
Detecting Changes to Source Data Using the SQL Server Transaction Log, 12-5. Applying Change Data Capture with SSIS, 12-6. Using Change Data Capture with Oracle Source Data, 13-3. Loading Multiple Files Using Complex Selection Criteria, 13-4. Ordering and Filtering File Loads, 13-5. Loading Multiple Flat Files in Parallel, 13-6. Loading Source Files with Load Balancing, 13-7. Loading Data to Parallel Destinations, 13-8. Using a Single Data File As a Multiple Data Source for Parallel Destination Loads, 13-10. Inserting Records in Parallel and in Bulk, 13-11. Creating Self-Optimizing Parallel Bulk Inserts, 13-12. Loading Files in Controlled Batches, 13-15. Executing SQL Statements and Procedures in Parallel Using SQL Server Agent, 14-2. Disabling and Rebuilding Nonclustered Indexes in a Destination Table, 14-3. Persisting Destination Database Index Metadata, 14-5. Scripting and Executing CREATE Statements for Destination Database Indexes, 14-6. Storing Metadata, and Then Scripting and Executing DROP and CREATE Statements for Destination Database XML Indexes, 14-9. Managing Foreign Key Constraints, 15-1. Logging Events from T-SQL, 15-2. Logging Data from SSIS, 15-5. Extending SSIS Logging to an SQL Server Destination, 15-9. Handling Errors in SSIS, 15-10. Creating a Centralized Logging Framework, 15-12. Logging to a Centralized Framework When Using SSIS Script Tasks and Components, 15-15. Logging Counters from SSIS, 15-16. Creating an SSIS Catalog, 15-18. Analyzing Events and Counters In-Depth via the SSIS Catalog, 15-19. Creating a Process Control Framework, 15-20. Linking the SSIS Catalog to T-SQL Logging, 15-22. Auditing an ETL Process