So imagine (if you can!) my excitement last September with the announcement that it had became possible to add Azure SQL DB as a linked server. As well as allowing queries which joined tables across multiple servers, I also had high hopes that it would make migration of data from one to another much easier. Between two linked SQL Servers, I've often used the nice and simple SELECT INTO command to copy both the data and schema of a table in one easy step.
On my veritable roller coaster of emotions that day, imagine now the devastation caused by the following response from Azure:
Msg 40510, Level 16, State 1, Line 1
Statement 'SELECT INTO' is not supported in this version of SQL Server.
...due, it seems, to something about tables without clustered indices not being supported :-(
After my initial disappointment, I figured that I could still move data from one place to another using INSERT INTO ... SELECT, providing the schema was already in place on the destination. After lots of playing, I finally came up with the procedure defined below for copying an entire database from a classic SQL Server into a linked Azure database - whilst also maintaining referential integrity. Various other restrictions encountered along the way means that it gets a bit ugly in places (especially due to all the dynamic SQL), but it seems to work pretty well and I've found it especially useful in scenarios where I've needed to re-run the data push several times over during development.
1. Build the schema on Azure
The apps I'm writing at the moment are based on Entity Framework (code-first), so I must admit that building the actual schema of databases is something I get for free these days. If you're in a similar situation to me, just plug into your Web.config the connection string which the Azure portal provides for your server, and fire up your app to have Entity Framework build the empty database.
Otherwise, you could use Management Studio's "Script Database as" menu item to give you the SQL to create the schema.
2. Define the linked server
EXEC sp_addlinkedserver 'azure', N'SQL Server', 'serverref.database.windows.net' GO EXEC sp_addlinkedsrvlogin @rmtsrvname = 'azure', @useself = 'FALSE', @rmtuser = 'username@serverref', @rmtpassword = 'password' GO
3. Ready the Azure schema for migration
You'll need two new columns on each table to facilitate the data migration: DataImport_OriginalId (to contain the original primary key value of the data that will be transferred - needed to keep foreign keys valid across tables), and DataImport_IsClone for some of the jiggery-pokery that takes place in (5). I'm assuming that each table has one PK which is an identity and an int.
You'll also need to disable FK constraint checking in readiness for the data to be bulk loaded (because it's quite likely that referenced rows won't yet exist at the time of loading).
This generates some SQL to add the columns to each table, and disable the constraints. If you're brave, replace the PRINT commands with EXEC, but I prefer to view the SQL it generates and then copy and paste to run it manually.
DECLARE @tableName nvarchar(max) DECLARE @sql nvarchar(max) DECLARE tableCursor CURSOR FOR SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES OPEN tableCursor FETCH NEXT FROM tableCursor INTO @tableName WHILE @@FETCH_STATUS = 0 BEGIN SELECT @sql = 'ALTER TABLE ' + @tablename + ' ADD DataImport_OriginalId int, DataImport_IsClone bit' PRINT @sql SET @sql = 'ALTER TABLE ' + @tableName + ' NOCHECK CONSTRAINT ALL' PRINT @sql FETCH NEXT FROM tableCursor INTO @tableName END CLOSE tableCursor DEALLOCATE tableCursor
4. Push the data
This will generate some SQL to copy over the data from your originating server to Azure, also populating the new columns we added in (3) as appropriate.
DECLARE @tableName nvarchar(max) DECLARE @identityColumn nvarchar(max) DECLARE @otherColumns nvarchar(max) DECLARE @sql nvarchar(max) DECLARE tableCursor CURSOR FOR SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME NOT IN ('EdmMetadata') OPEN tableCursor FETCH NEXT FROM tableCursor INTO @tableName WHILE @@FETCH_STATUS = 0 BEGIN SELECT @identityColumn = COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @tableName AND COLUMNPROPERTY(object_id(TABLE_NAME), COLUMN_NAME, 'IsIdentity') = 1 SET @otherColumns = null SELECT @otherColumns = COALESCE(@othercolumns + ',','') + ('[' + COLUMN_NAME + ']') FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @tableName AND COLUMN_NAME <> @identityColumn ORDER BY ORDINAL_POSITION SET @Sql = 'INSERT INTO azure.remotedatabase.dbo.' + @tableName + ' ([DataImport_OriginalId],[DataImport_IsClone],' + @otherColumns + ') SELECT [' + @identityColumn + '],0,' + @otherColumns + ' FROM ' + @tableName PRINT @sql FETCH NEXT FROM tableCursor INTO @tableName END CLOSE tableCursor DEALLOCATE tableCursor
5. Tidy up!
We now have all of the data sitting in our Azure database, but the primary keys will almost certainly be different from their originals, hence any foreign key values will reference non-existent or incorrect rows. That's because we can't SET IDENTITY_INSERT ON across successive connections, and with a linked server you have no control over which connection each SQL statement runs on. Instead, we have the original IDs in that new DataImport_OriginalId column we defined earlier.
Because we can neither modify the newly assigned PKs, nor even reseed identity columns (DBCC isn't supported in Azure SQL DB), the only way I've found to get the rows back with their original PK values is as follows:
- Switch on IDENTITY_INSERT
- Find a suitably distant PK value from anything that currently exists or will need to exist... lazily, I've used MAX(PK) + MAX(DataImport_OriginalId)
- Insert (and then delete) one row to set the identity seed to this value
- Switch off IDENTITY_INSERT
- Clone all rows, flagging them as cloned using the DataImport_IsClone column
- Delete all original rows (where DataImport_IsClone = 0)
- Switch on IDENTITY_INSERT
- Clone all cloned rows, with PK now explicitly set to DataImport_OriginalId
- Switch off IDENTITY_INSERT
- Delete all rows from the first clone!
- Repeat for all tables; then (and only then) re-enable FK constraint checking.
DECLARE @tableName nvarchar(max) DECLARE @identityColumn nvarchar(max) DECLARE @otherColumns nvarchar(max) DECLARE @maxIdentity int DECLARE @sql nvarchar(max) DECLARE tableCursor CURSOR FOR SELECT TABLE_NAME FROM Information_Schema.Tables WHERE TABLE_NAME NOT IN ('EdmMetadata') OPEN tableCursor FETCH NEXT FROM tableCursor INTO @tableName WHILE @@FETCH_STATUS = 0 BEGIN SELECT @identityColumn = COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @tableName AND COLUMNPROPERTY(object_id(TABLE_NAME), COLUMN_NAME, 'IsIdentity') = 1 SET @otherColumns = null SELECT @otherColumns = COALESCE(@othercolumns + ',','') + ('[' + COLUMN_NAME + ']') FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @tableName AND COLUMN_NAME <> @identityColumn AND COLUMN_NAME NOT LIKE 'DataImport_%' ORDER BY ORDINAL_POSITION SET @sql = 'SET IDENTITY_INSERT ' + @tableName + ' ON' PRINT @sql SET @sql = 'INSERT INTO ' + @tableName + ' ([' + @identityColumn + '],[DataImport_IsClone],' + @otherColumns + ') SELECT TOP 1 (SELECT MAX(DataImport_OriginalId) + MAX(' + @identityColumn + ') FROM ' + @tableName + ')+1,1,' + @otherColumns + ' FROM ' + @tableName PRINT @sql SET @sql = 'SET IDENTITY_INSERT ' + @tableName + ' OFF' PRINT @sql SET @sql = 'DELETE FROM ' + @tablename + ' WHERE DataImport_IsClone = 1' PRINT @sql SET @sql = 'INSERT INTO ' + @tableName + ' ([DataImport_OriginalId],[DataImport_IsClone],' + @otherColumns + ') SELECT [DataImport_OriginalId],1,' + @otherColumns + ' FROM ' + @tableName + ' WHERE DataImport_OriginalId IS NOT NULL' PRINT @sql SET @sql = 'DELETE FROM ' + @tablename + ' WHERE DataImport_IsClone = 0 OR DataImport_IsClone IS NULL' PRINT @sql SET @sql = 'SET IDENTITY_INSERT ' + @tableName + ' ON' PRINT @sql SET @sql = 'INSERT INTO ' + @tableName + ' ([' + @identityColumn + '],[DataImport_IsClone],' + @otherColumns + ') SELECT DataImport_OriginalId,0,' + @otherColumns + ' FROM ' + @tableName PRINT @sql SET @sql = 'SET IDENTITY_INSERT ' + @tableName + ' OFF' PRINT @sql SET @sql = 'DELETE FROM ' + @tablename + ' WHERE DataImport_IsClone = 1' PRINT @sql FETCH NEXT FROM tableCursor INTO @tableName END CLOSE tableCursor OPEN tableCursor FETCH NEXT FROM tableCursor INTO @tableName WHILE @@FETCH_STATUS = 0 BEGIN SET @sql = 'ALTER TABLE ' + @tableName + ' CHECK CONSTRAINT ALL' PRINT @sql FETCH NEXT FROM tableCursor INTO @tableName END CLOSE tableCursor DEALLOCATE tableCursor
Note that I've deliberately left behind the two columns I created in (3), because they have no negative impact on my application, and they allow me to repeat this process as many times as I need by emptying the database, and re-running (4) and (5).
Improvements?
I find this technique for migrating data into the cloud useful, but it would be great to find a way to remove the need for all the data shuffling that goes on in (5) just to restore the original identities. Something like the ability to switch on IDENTITY_INSERT for all connections would make things a whole lot nicer.
If you have any thoughts or suggestions to share for making this cleaner, I'd love to hear from you!
Improvements?
I find this technique for migrating data into the cloud useful, but it would be great to find a way to remove the need for all the data shuffling that goes on in (5) just to restore the original identities. Something like the ability to switch on IDENTITY_INSERT for all connections would make things a whole lot nicer.
If you have any thoughts or suggestions to share for making this cleaner, I'd love to hear from you!
Excelent post, thanks!
ReplyDelete