How I mitigated the impossible SELECT INTO on Windows Azure

I'm the kind of developer that likes things under control by writing some stuff myself and keeping it as optimized as possible. This is especially true for database scripts. Sure we have these ORM tools these days (Entity Framework, NHibernate and such) but the calls that end up on the database side are many times unoptimized. Why? They have to work for all cases so sacrifices are made and performance usually suffers. That's why I like micro ORMs that make it super simple to translate database data to code (via object mapers) but still let you write optimized queries suited completely to your database.

The way that I upgrade my database requires me to backup existing data, drop the whole model, recreate it and restore data back to appropriate tables. Especially the backup strategy is problematic as it uses select into statements. Windows Azure requires all tables to have clustered indices prior to inserting data which makes it impossible to use select into statement as it has no control over clustered indices. How can we then automate backup scripts without creating those backup tables manually? With a bit of work, everything can be done.

In case you've already read this blog post and just want the code, you can easily head over to GitHub Gists and download all scripts below + two batch files used to recreate or upgrade database.

Managing database model and functionality

On the projects where I'm the only developer I normally don't write database version migration scripts but rather keep my scripts in specific files that represent each step of (re)creating or upgrading the database. And I version these files on CVS. The major requirement is they have to be idempotent. These are the self-descriptive names of these scripts:

00 Backup Data.sql - saves data of all existing tables into backup tables
01 Drop Model.sql
02 Create Model.sql - creates all tables, their keys, relations and indices in appropriate order
03 Functions.sql
04 Procedures.sql
05 Static Data.sql - any vital data that needs to be predefined i.e. permission levels, other type lookup tables data, predefined users (superuser) etc.; also deployable to production
06 Test Data.sql - any data that's used strictly for development purposes and must not be deployed to production
07 Restore Data.sql

The purpose of filenames having numbered prefix is purely for file ordering in file explorers (and Solution Explorer in Visual Studio as well). They're always in alphabetic order (or should I say ASCIIbetic order). Some of these files are regularly edited when I change or add my database model or functionality. I usually edit 02, 03 and 04. Sometimes 05 and 06. But if all goes according to plan I never touch 00, 01 and 07. These should adopt to my ever evolving database model and functionality.

I then have three different batch files that deploy a subset of these files to my target database:

Recreate Development.bat - uses files 01, 02, 03, 04, 05 and 06
Upgrade Development.bat - uses files 00, 01, 02, 03, 04 and 07
Upgrade Production.bat - same as development upgrade but connecting to a different database

The usual backup strategy

Normally I was just using sys.objects to get my table names and produce a script that first drops existing backup tables (when present from previous script run) and create new ones. All my backup tables use the same name as original tables except for the added prefix Backup_. This is the code I'm normally using on normal SQL Server.

   1:  -- Drop any existing Backup tables

2:

   3:  set nocount on;

4:

   5:  declare @sql varchar(max) = '';

6:

   7:  select @sql = @sql + 'print ''Dropping backup table dbo.[' + o.name + ']'';drop table dbo.[' + o.name + '];'

   8:  from sys.objects o

   9:  where o.type = 'U' and left(o.name, 7) = 'Backup_'

  10:  order by o.create_date asc;

11:

  12:  exec(@sql);

13:

  14:  -- Backup all existing data into Backup tables

15:

  16:  set @sql = '';

17:

  18:  select @sql = @sql + 'print ''Backing up table dbo.[' + o.name + ']'';select * into dbo.[Backup_' + o.name + '] from dbo.[' + o.name + '];'

  19:  from sys.objects o

  20:  where o.type = 'U' and left(o.name, 7) != 'Backup_'

  21:  order by o.create_date asc;

22:

  23:  exec(@sql);

24:

  25:  set nocount off;

So if you use everyday SQL Server you can easily put this code in the 00 Backup Data.sql script and it will work as expected.

How about Windows Azure backup script

As you can see in the second part of the script, when I'm dynamically creating a script to backup all my tables' data I'm using select into. As this doesn't work on Azure we have to take a different and much more complicated path to the same goal. We actually have to create all tables, add clustered indices to them and then replace select into statements with insert select statements. As you will see this makes backup script much more complex than the original one, but it works in the same way. We just have to put extra effort to add appropriate clustered indices on backup tables and transpose/serialize/concatenate column names.

So here's the code. It still has two main parts: dropping existing backup tables and creating new ones. But the second part of creating new ones is split into 3 subparts:

creating backup tables
creating clustered indices
backing up data

Especially the first couple are more complex using two CTEs and similar stuff. Although they may seem a bit overwhelming, they do their job just fine. Run individual parts yourself, manually, and observe what they do in case you have hard time understanding the code. The following is the 00 Backup Data.sql script file:

   1:  -- Drop any existing Backup tables

2:

   3:  set nocount on;

4:

   5:  declare @sql varchar(max) = '';

6:

   7:  select @sql = @sql + 'print ''Dropping backup table dbo.[' + o.name + ']'';drop table dbo.[' + o.name + '];'

   8:  from sys.objects o

   9:  where o.type = 'U' and left(o.name, 7) = 'Backup_'

  10:  order by o.create_date asc;

11:

  12:  exec(@sql);

13:

14:

  15:  -- BACKUP SCRIPT ADJUSTED FOR AZURE

16:

17:

  18:      -- Create backup tables

19:

  20:      print '';

  21:      set @sql = '';

22:

  23:      with

  24:      TableColumns (TableName, ColumnName, ColumnType, Nullable, order1, order2)

  25:      as (

  26:          select

  27:              o.name,

  28:              c.name,

  29:              case when c.max_length = t.max_length then t.name else t.name + '(' + iif(c.max_length = -1,'max',cast(c.max_length as varchar)) + ')' end as Type,

  30:              c.is_nullable,

  31:              o.create_date,

  32:              c.column_id

  33:          from sys.objects o

  34:              join sys.columns c

  35:                  join sys.types t

  36:                  on (t.user_type_id = c.user_type_id)

  37:              on (c.object_id = o.object_id)

  38:          where

  39:              o.type = 'U' and

  40:              left(o.name, 7) != 'Backup_'

  41:      ),

  42:      Concatenated (TableName, Columns, order1)

  43:      as (

  44:          select

  45:              tcO.TableName,

  46:              stuff((

  47:                  select ',' + ColumnName + ' ' + ColumnType + (case Nullable when 0 then ' not ' else ' ' end) + 'null'

  48:                  from TableColumns tcI

  49:                  where tcI.TableName = tcO.TableName

  50:                  order by order1, order2

  51:                  for xml path('')

  52:              ), 1, 1, ''),

  53:              tcO.order1

  54:          from TableColumns tcO

  55:          group by tcO.TableName, tcO.order1

  56:      )

  57:      select @sql = @sql + 'print ''Creating backup table dbo.[Backup_' + TableName + ']'';create table dbo.[Backup_' + TableName +'] (' + Columns + ');'

  58:      from Concatenated

  59:      order by order1;

60:

  61:      exec(@sql);

62:

  63:      -- Create clustered indices

64:

  65:      print '';

  66:      set @sql = '';

67:

  68:      with

  69:      IndexColumns (TableName, ColumnName, order1, order2)

  70:      as (

  71:          select o.name, c.name, o.create_date, ic.index_column_id

  72:          from sys.indexes i

  73:              join sys.objects o

  74:              on (o.object_id = i.object_id)

  75:              join sys.columns c

  76:              on (c.object_id = o.object_id)

  77:              join sys.index_columns ic

  78:              on ((ic.column_id = c.column_id) and (ic.object_id = o.object_id) and (ic.index_id = i.index_id))

  79:              join sys.types t

  80:              on (t.user_type_id = c.user_type_id)

  81:          where

  82:              o.type = 'U' and

  83:              left(o.name, 7) != 'Backup_' and

  84:              i.type = 1 -- CLUSTERED

  85:      ),

  86:      Concatenated (TableName, Columns, order1)

  87:      as (

  88:          select

  89:              icO.TableName,

  90:              stuff((

  91:                  select ',' + icI.ColumnName

  92:                  from IndexColumns icI

  93:                  where icI.TableName = icO.TableName

  94:                  order by order1, order2

  95:                  for xml path('')

  96:              ), 1, 1, ''),

  97:              icO.order1

  98:          from IndexColumns icO

  99:          group by icO.TableName, icO.order1

 100:      )

 101:      select @sql = @sql + 'print ''Creating clustered index on dbo.[Backup_' + TableName + ']'';create clustered index IDX_Backup_' + TableName + '_Clustered on dbo.[Backup_' + TableName + '] (' + Columns + ');'

 102:      from Concatenated

 103:      order by order1;

 104:

 105:      exec(@sql);

 106:

 107:      -- Backup data

 108:

 109:      print '';

 110:      set @sql = '';

 111:

 112:      select @sql = @sql + 'print ''Backing up data for dbo.[' + o.name + ']'';insert dbo.[Backup_' + o.name + '] select * from dbo.[' + o.name + '];'

 113:      from sys.objects o

 114:      where

 115:          o.type = 'U' and

 116:          left(o.name, 7) != 'Backup_'

 117:      order by o.create_date;

 118:

 119:      exec(@sql);

 120:

 121:  set nocount off;

Other automated script files

In order to make this post even more useful to you I have to provide other automated script files as well. As mentioned initially three of the database script files are automated and are usually not edited at all (except for when they don't work as happened with Windows Azure). So here are the other two script files that are fully automated.

01 Drop Model.sql

   1:  -- Drop all user defined objects in reverse order they were created

2:

   3:  print '';

   4:  declare @sql varchar(max);

5:

   6:  with types (type, name) as

   7:  (

   8:      select 'FN', 'function' union all    -- scalar function

   9:      select 'IF', 'function' union all    -- inline table function

  10:      select 'TF', 'function' union all    -- table function

  11:      select 'P', 'procedure' union all    -- stored procedure

  12:      select 'TR', 'trigger' union all     -- SQL DML trigger

  13:      select 'U', 'table' union all        -- user table

  14:      select 'V', 'view'                   -- view

  15:  )

  16:  select @sql = isnull(@sql, '') + 'print ''Dropping ' + t.name + ' dbo.[' + o.name + ']'';drop ' + t.name + ' dbo.[' + o.name + '];'

  17:  from sys.objects o

  18:      join types t

  19:      on (t.type = o.type)

  20:  where left(o.name, 7) != 'Backup_'

  21:  order by o.create_date desc

22:

  23:  -- drop all

  24:  exec (@sql);

07 Restore Data.sql

   1:  -- Restore backup data

2:

   3:  set nocount on;

4:

   5:  print '';

   6:  declare @sql varchar(max) = '';

   7:  declare @error bit = 0;

8:

   9:  -- check that no columns have been removed from existing tables

  10:  with

  11:  OldColumns (B, TableName, ColumnName)

  12:  as (

  13:      select 1, substring(o.name, 8, 100), c.name

  14:      from sys.objects o

  15:          join sys.columns c

  16:          on c.object_id = o.object_id

  17:      where o.type = 'U' and left(o.name, 7) = 'Backup_'

  18:  ),

  19:  NewColumns (TableName, ColumnName)

  20:  as (

  21:      select o.name, c.name

  22:      from sys.objects o

  23:          join sys.columns c

  24:          on c.object_id = o.object_id

  25:      where o.type = 'U' and left(o.name, 7) != 'Backup_'

  26:  )

  27:  select @sql = @sql + 'print ''Columns removed in table dbo.[' + o.TableName + ']'';'

  28:  from OldColumns o

  29:      left join NewColumns n

  30:      on ((n.TableName = o.TableName) and (n.ColumnName = o.ColumnName))

  31:  where n.ColumnName is null

32:

  33:  select @error = iif(@@rowcount > 0, 1, 0);

34:

  35:  select @sql = @sql + iif(@error = 1, 'throw 50000, ''Data not restored due to table differences.'', 0;', 'print ''Starting data restore...'';');

  36:  exec(@sql);

37:

  38:  print '';

  39:  select @sql = iif(@error = 1, null, '');

40:

  41:  -- restore data

  42:  with

  43:  TableColumns (TableName, ColumnName, TableOrder, ColumnOrder)

  44:  as (

  45:      select substring(o.name, 8, 100), c.name, o.create_date, c.column_id

  46:      from sys.objects o

  47:          join sys.columns c

  48:          on (c.object_id = o.object_id)

  49:      where o.type = 'U' and left(o.name, 7) = 'Backup_'

  50:  ),

  51:  ConcatenatedColumns (TableName, AllColumns)

  52:  as (

  53:      select

  54:          tc.TableName,

  55:          stuff((

  56:              select ',' + tc1.ColumnName

  57:              from TableColumns tc1

  58:              where tc1.TableName = tc.TableName

  59:              order by tc1.TableOrder asc, tc1.ColumnOrder asc

  60:              for xml path('')

  61:          ), 1, 1, '')

  62:      from TableColumns tc

  63:      group by tc.TableName

  64:  )

  65:  select @sql = @sql + 'print ''Restoring data for table dbo.[' + o.name + ']'';begin try set identity_insert dbo.[' + o.name + '] on end try begin catch end catch;insert dbo.[' + o.name + '] (' + cc.AllColumns + ') select * from dbo.[Backup_' + o.name + '];begin try set identity_insert dbo.[' + o.name + '] off end try begin catch end catch;'

  66:  from sys.objects o

  67:      join ConcatenatedColumns cc

  68:      on cc.TableName = o.name

  69:  where o.type = 'U'

  70:  order by o.create_date asc;

71:

  72:  set @sql = isnull(@sql, '');

  73:  exec(@sql);

74:

  75:  set nocount off;

Wrapup

These are the automated database script files I'm using to manage my database during development. You can find (and donwload them) also as a Gist on GitHub. If you have any suggestions or revisions, let me know.

Erratic software development

Google analytics script

Latest jQuery CDN with code tiggling.

Thursday, 3 April 2014