Tuesday, January 17, 2012

Loading Big Data into Federated SQL Azure Tables with the SQL Azure Federation Data Migration Wizard v1.2

Updated 1/17/2012 with change from v1.0 to v1.2 with the following release notes:

Release Notes
v1.2

  1. Added FederationInfo.txt to output directory containing Federation member range, member id, and physical database name.
  2. Added FederationMember into to BCP upload status so that you know which federation member data is being uploaded to.
v1.1
  1. SQLAzureMWUtils library 3.8.1
  2. Added error check for 08S01 in app.config file (for BCP retry).

See end of post for new v1.2 features. Although earlier screen captures show v0.3.0, this post was written with v1.0.


Most demonstrations, workshops and hands-on labs for SQL Azure Federations use simple T-SQL INSERT … VALUES() statements executed by copying and pasting into query editing windows of the SQL Azure Management Portal or SQL Server Management Studio 2008 R2 SP1. Obviously that approach won’t work for production applications.

imageGeorge Huey (@gihuey), a Microsoft Data Architect, is the author of the SQL Azure Migration Wizard (SQLAzureMW), which migrates entire SQL Server databases to SQL Azure. I described Using the SQL Azure Migration Wizard v3.3.3 with the AdventureWorksLT2008R2 Sample Database in a detailed 7/18/2010 post. SQLAzureMW was at v3.8 when I wrote this post.

You can use SQLAzureMW to upload data to existing federated SQL Azure tables, but George’s new (as of 12/12/2011) SQL Azure Federation Data Migration Wizard (SQLAzureFedMW) v1.2 is simpler and more straightforward for uploading data, especially Big Data.

For more information about SQL Azure Federations, see MSDN’s Federations in SQL Azure (SQL Azure Database) topic and its subtopics.

This tutorial explains how to use SQLAzureFedMW to load data from a local SQL Server 2008 R2 SP1 WADPerfCounters table into a federated WADPerfCounters table in the AzureDiagnostics1 root member of the WADFederation. Here’s the SQL Azure Portal’s query editing page displaying the first 5 columns of 10 rows from 398,000 rows uploaded in an initial test:

imageClick image for full-size (1024x768) screen captures.

Following is SQLAzureFedMW’s Target Server Response window (edited) displaying the results from the initial upload test:

image 

The 398,000 rows uploaded in 1,956 seconds (203.5 rows/sec) over a 384-kbps (upload) AT&T commercial DSL connection to Microsoft’s South Central US (San Antonio) data center.

About the SQL Azure Federation Data Migration Wizard

SQLAzureFedMW’s CodePlex project page begins as follows:

ProjectDescription
SQL Azure Federation Data Migration Wizard simplifies the process of migrating data from a single database to multiple federation members in SQL Azure Federation.

SQL Azure Federation Data Migration Wizard (SQLAzureFedMW) is an open source application that will help you move your data from a SQL database to (1 to many) federation members in SQL Azure Federation. SQLAzureFedMW is a user interactive wizard that walks a person through the data migration process.

The SQLAzureFedMW tool greatly simplifies the data migration process. …

SQLAzureFedMW Project Details
The SQL Azure Federation Data Migration Wizard (SQLAzureFedMW ) allows you to select a SQL Server database and specify which tables (Data Only) to migrate. The data will be extracted (via BCP) and then uploaded to SQL Azure Federation. The BCP data upload process can be done in a sequential process or parallel process (where you specify the number of parallel threads). See Documentation for more detail.

Note
SQLAzureFedMW expects that your database schema has already been migrated to SQL Azure Federation and that they tables on the source database match the tables in the federated member databases.

SQLAzureFedMW is [from] the SQLAzureMWUtils library found in the codeplex project SQL Azure Migration Wizard

Prerequisites

A Windows Azure account offering at least one 1-GB SQL Azure Web edition database, such as the new 3 Month Free Trial Account with a US$0 Spending Limit. Splitting the federation into six databases, which is optional, requires five more 1-GB databases costing about US$0.33/day each.

  • SQL Server 2008 R2 SP1 Express edition or higher
  • SQL Server Management Studio (SSMS) 2008 R2 SP1 Express edition or higher
  • SQLAzureFedMW v1.0
  • Windows 7 or Windows Server 2008 R2. George says it “should work fine on XP or Vista.”
  • .NET Framework 3.5 SP1 or later
  • The WADPerformanceCountersTable-1000rows.txt source data file from SkyDrive*

SSMS 2008 R2 SP1 is included with SQLEXPRADV_x64_ENU.exe, SQLEXPRADV_x86ENU.exe, SQLEXPRWT_x64_ENU.exe and SQLEXPRWT_x86_ENU.exe. SSMS is required to create the local data file that you upload to SQL Azure.

*This file is 258 KB in size and contains 1,000 rows of data with primary key values greater than those of the other WADPerformanceCountersTable*.txt files, which are 102+ MB and contain 398,000 rows. 1,000 rows reduces upload time to a few seconds with a reasonably fast DSL connection and minimizes time to split the federated tables by CounterId. You don’t need to delete the existing rows to prevent primary key violations when uploading other WADPerformanceCountersTable*.txt files in sequence, if you choose to do so.

Background

imageMy (@rogerjenn) Generating Big Data for Use with SQL Azure Federations and Apache Hadoop on Windows Azure Clusters post of 1/6/2012 describes how to create large data files containing historical Windows Azure Diagnostics (WAD) performance counter data. You can download from my SkyDrive account the text files of WAD data generated by Cerebrata Software’s Cloud Storage Studio and scripts to create the required WADPerfCounts table in a local SQL Server 2008 R2 SP1 [Express] instance, import the data into the table, and add/populate the CounterId federation key column.

Warning: Don’t use WADPerformanceCountersTable-Page-3.txt from WADPerformanceCountersTable-Page-3.zip; it throws data truncation errors, as described in the “Attempting to Deal with CounterValue Truncation Problems” section near the end of the above post.

My Creating a SQL Azure Federation in the Windows Azure Platform Portal post of 1/10/2012 shows you how create a new federation (WADFederation) and the required federation root database (AzureDiagnostics1) into which you import the data.

Installing SQLAzureFedMW

Download SQLAzureFedMW v1.0 Release Binary, open SQLAzureFedMW v1.0 Release Binary.zip, and extract its files to a \Program Files (x86)\SQLAzureFedMW folder or a folder of your choice.

Editing SQLAzureFedMW.exe Arguments in SQLAzureFedMW.exe.config to Prevent String Truncation Errors

Installing SQLAzureFedMW adds a SQLAzureFedMW.exe.config file in the directory containing SQLAzureFedMW.exe, the program’s executable. You must open SQLAzureFedMW.exe.config in Visual Studio (by default) or a text editor, such as Notepad, and change two Bulk Copy Protocol (BCP) -n arguments to -c as shown in the two red circles near the end of the <configuration> section:

image

Note: If you don’t change the -n arguments, you incur multiple string truncation errors when you attempt to upload data to the Federation and no data migrates. See MSDN’s bcp Utility topic for more information on bcp and its arguments, as well as Using Native Format to Import or Export Data (-n) and Using Character Format to Import or Export Data (-c).

Optionally, add your server name (after you create it) to the TargetServerName  value, administrative logon ID to the TargetUserName element, administrative password to the TargetPassword  element and AzureDiagostics1 to the TargetDatabase element after you create it to replace the default values in the Connect to Server dialogs.

Create a Local AzureDiagnostics Database with a WADPerfCounters Table

Refer to the “Creating the WADPerfCounters Table and Indexes” and “Testing the T-SQL BULK INSERT Command with a 1,000-row Sample File” sections of Generating Big Data for Use with SQL Azure Federations and Apache Hadoop on Windows Azure Clusters for instructions.

Change the C:\Users\Administrator\My Documents\Cerebrata\WADPerformanceCountersTable-Page-1.txt folder file path to where you stored the downloaded file and specify

SET DATEFORMAT ymd
BULK INSERT dbo.WADPerfCounters
FROM “C:\YourFilePath\WADPerformanceCountersTable-1000rows.txt”
WITH
(
FIRSTROW = 2,
LASTROW = 1001,
BATCHSIZE = 1000,
FIELDTERMINATOR = '\t'
)

as the command to load the data.

In the “Dealing with Non-Numeric Columns as Federation Keys” section, add the CounterId column with the int data type after the CounterValue column and change Network Adapter _2 to Network Adapter _3 in the two places emphasized below:

UPDATE dbo.WADPerfCounters SET CounterId =
    (CASE
        WHEN CounterName = '\Network Interface(Microsoft Virtual Machine Bus Network Adapter _3)\Bytes Sent/sec' THEN 1
        WHEN CounterName = '\Network Interface(Microsoft Virtual Machine Bus Network Adapter _3)\Bytes Received/sec' THEN 2
        WHEN CounterName = '\ASP.NET Applications(__Total__)\Requests/Sec' THEN 3
        WHEN CounterName = '\TCPv4\Connections Established' THEN 4
        WHEN CounterName = '\Memory\Available Mbytes' THEN 5
        WHEN CounterName = '\Processor(_Total)\% Processor Time' THEN 6
        ELSE 0
    END)
FROM dbo.WADPerfCounters

The number of virtualized Network Adapters changed while the diagnostics were collected.

Right-click the table and choose Select Top 1000 rows. Verify that the CounterId column values range from 1 to 6:

image

Create the AzureDiagnostics1 Database and the WADFederation

Follow all 19 steps in the Creating a SQL Azure Federation in the Windows Azure Platform Portal post to create the WADFederation and its AzureDiagnostics1 root member.

Connect to the Local Database and Create the BCP Data (*.dat) File

1. Launch SQLAzureFedMW.exe and, with the default Data Source tab displayed, click the Connect to Server button to open the eponymous dialog:

image

2. Accept the defaults and click Connect to open the Select Servers page. Select [AzureDiagnostics] and [dbo].[WADPerfCounters]:

image

3. Click the SQL Azure Federation Target tab and the Connect to Server dialog. Type your server name, administrative user name, @ and the server name, your password, and AzureDiagnostics1 in the Connect to Server dialog’s text boxes:

image

4. Click Connect and accept the default WADFederation and Id (Min to Max) settings:

image

5. Click Next and click Yes when asked “Are you ready to download data?” (from the local database to a BCP *.dat file) to display BCP Output Results:

image

6. Click Next and click Yes when asked “Are you ready to upload data?” (from the BCP *.dat file to the federated WADPerfCounters table in the AzureDiagnostics1 root database) to display Target Server Response Results:

image

7. Click Exit to close SQLAzureFedMW.

8. Return to the Windows Azure Management Portal and, if necessary, select the WADFederation1 item, and the right-arrow key to open the 1 Federation Member page, click the root federation box to open the WADFederation(LOW..HIGH) context menu, and select Query:

image 

9. Click the New Query link to open the query editing page in the context of the federation root member.

10. Type SELECT TOP(10) * FROM WADPerfCounters in the editing pane, click the Run button, and scroll to display the CounterId column:

image

11. Verify that CounterId values range between 1 and 6 by typing SELECT MIN(CounterId) FROM WADPerfCounters, clicking Run and SELECT MAX(CounterId)FROM WADPerfCounters, and clicking Run.

Split CounterId Values >  1 to Five More Member Databases

You must split the the WADFederation into a total of six members to gain the maximum benefit of horizontal partitioning afforded by Windows Azure Federations for the design imposed when you selected CounterID as the federation key early in the Generating Big Data for Use with SQL Azure Federations and Apache Hadoop on Windows Azure Clusters process.

Warning: If you have a 90 Day Trial Subscription, this will cost you ~US$1.65 per day until you delete the added member databases. If your subscription derives from a Visual Studio Ultimate MSDN subscription, you have five free 1-GB Web databases (if you haven’t used them), so the cost will be only ~US$0.33 per day.

1. Click the X and OK buttons to close the page without saving changes, open the context menu, select Split, and type 2 as the value on which to split:

image

2. Click the Split button and wait for completion. If you receive an error message regarding inability to connect to context, return to the main SQL Azure Portal page, click Manage and log on again:

image

Notice that the WADFederation count has increased to 2 from 1.

3. Click the right-arrow button to display the 2 Federations page which has gained a new box:

image

4. Click the LOW box, and repeat the preceding section’s step 10 to verify that only rows with a CounterId value of 1 appear:

image

5. Click the X and OK buttons to close the query editing window.

6. Repeat steps 1 through 4 four more times, starting with the newly added federation members box, incrementing the Split at value in step 2, and incrementing the verification value in step 4. Click the Refresh button to display the progress of the split operation.

image

Notice that the [Id] = # expression’s value increases by 1 for each operation when you run the SELECT TOP(10) * FROM WADPerfCounters query.

7. Your 6 Federation Members page appears as follows when you complete the operation:

image

Displaying Federation Metadata

MSDN’s Managing Database Federations (SQL Azure Database) topic offers links to five tables that provide federation metadata:

Federation Metadata Table Description
sys.federations (SQL Azure Database) Returns the federations within a database.
sys.federation_members (SQL Azure Database) Returns the federation members within a federation.
sys.federation_distributions (SQL Azure Database) Returns the distribution type and data types used by a federation.
sys.federation_member_distributions (SQL Azure Database) Returns the distribution name and range covered by a federation member.
sys.federated_table_columns (SQL Azure Database) Returns federation specific information on federated tables.

The most interesting of these tables is sys.federation_member_distributions. Querying federation metadata tables requires executing

USE FEDERATION ROOT WITH RESET
GO

before the SELECT statement:

image

You can specify a single range value explicitly by updating the -2147483648 range_low value for the first member with 1, and updating all numeric range_high values with null.

Auto-sharding Larger Data Batches

After you partition your federation, uploading additional rows with all six CounterId values automatically distributes the rows to one of the six federation members. The SkyDrive folder includes a WADPerformanceCountersTable-Page-79.zip file that contains a *.txt file with the 398,000 rows preceding the 1,000 you uploaded earlier. You can use this field to test autosharding without encountering primary key violations.

1. Download and extract WADPerformanceCountersTable-Page-79.txt from WADPerformanceCountersTable-Page-79.zip to the same folder you created earlier in this tutorial.

2. Launch SSMS, connect to your server, delete all rows from the WADPerfCounters table, and delete the CounterId column.

3. Repeat the process of the “Create a Local AzureDiagnostics Database with a WADPerfCounters Table” but replace 1000rows.txt with Page-79.txt.

4. Repeat steps 1 through 7 of the earlier “Connect to the Local Database and Create the BCP Data (*.dat) File” section. In step 4, select all six federation members:

image


5. When you click Yes to upload data in step 6, the Target Server Response window generates a tab for each of the six shards and uploads them simultaneously on six different threads:

image

Note: The number of threads supported is set by the NumberOfBCPThreads key of the SQLAzureFedMW.exe.config file.

6. During upload, the following error messages appeared in the rightmost page:

image

Note: Error code 08S01 indicates a network failure. Several early SQL Azure adopters have reported similar communication link failures with conventional (not federated) tables. See the comments to Wayne Walter Berry’s BCP and SQL Azure post of 5/21/2011 to the SQL Azure blog.

7. Clicking the Results tab displayed the following page:

image

66,333 * 6 = 397,998, which is within 2 of the total row count of 398,000.

36.47 rows/sec * 6 = 218.82 rows/sec which is close to that of the first example at the beginning of this post, 203.5 rows/sec.

8. Clicking Retry on the page with errors resumed uploading rows, but the upload failed with a primary key constraint violation because some :

image

9. Clicking Skip stopped the attempted upload and reported Done:

image 

10. According to George, you can trap and retry on the 08S01 error by adding it to the SQLAzureFedMW.exe.config file’s BCPSQLAzureErrorCodesRetry element, as shown here:

image

Update 1/17/2012: As of v1.1, retry for error 08S01 is included in the BCPSQLAzaureErrorCodesRetry list.

A new post about recovering from errors such as this has been added as Adding Missing Rows to a SQL Azure Federation with the SQL Azure Federation Data Migration Wizard v1 updated 1/15/2012. In most cases, you should add missing rows before you attempt to add more rows to federation members.

Update 1/17/2012: Following is the range Id value added to the Target Server Response delay:

image

 

0 comments: