Google analytics script

Latest jQuery CDN with code tiggling.

Thursday, 3 April 2014

How I mitigated the impossible SELECT INTO on Windows Azure

I'm the kind of developer that likes things under control by writing some stuff myself and keeping it as optimized as possible. This is especially true for database scripts. Sure we have these ORM tools these days (Entity Framework, NHibernate and such) but the calls that end up on the database side are many times unoptimized. Why? They have to work for all cases so sacrifices are made and performance usually suffers. That's why I like micro ORMs that make it super simple to translate database data to code (via object mapers) but still let you write optimized queries suited completely to your database.

The way that I upgrade my database requires me to backup existing data, drop the whole model, recreate it and restore data back to appropriate tables. Especially the backup strategy is problematic as it uses select into statements. Windows Azure requires all tables to have clustered indices prior to inserting data which makes it impossible to use select into statement as it has no control over clustered indices. How can we then automate backup scripts without creating those backup tables manually? With a bit of work, everything can be done.

In case you've already read this blog post and just want the code, you can easily head over to GitHub Gists and download all scripts below + two batch files used to recreate or upgrade database.

Managing database model and functionality

On the projects where I'm the only developer I normally don't write database version migration scripts but rather keep my scripts in specific files that represent each step of (re)creating or upgrading the database. And I version these files on CVS. The major requirement is they have to be idempotent. These are the self-descriptive names of these scripts:

  1. 00 Backup Data.sql - saves data of all existing tables into backup tables
  2. 01 Drop Model.sql
  3. 02 Create Model.sql - creates all tables, their keys, relations and indices in appropriate order
  4. 03 Functions.sql
  5. 04 Procedures.sql
  6. 05 Static Data.sql - any vital data that needs to be predefined i.e. permission levels, other type lookup tables data, predefined users (superuser) etc.; also deployable to production
  7. 06 Test Data.sql - any data that's used strictly for development purposes and must not be deployed to production
  8. 07 Restore Data.sql

The purpose of filenames having numbered prefix is purely for file ordering in file explorers (and Solution Explorer in Visual Studio as well). They're always in alphabetic order (or should I say ASCIIbetic order). Some of these files are regularly edited when I change or add my database model or functionality. I usually edit 02, 03 and 04. Sometimes 05 and 06. But if all goes according to plan I never touch 00, 01 and 07. These should adopt to my ever evolving database model and functionality.

I then have three different batch files that deploy a subset of these files to my target database:

  • Recreate Development.bat - uses files 01, 02, 03, 04, 05 and 06
  • Upgrade Development.bat - uses files 00, 01, 02, 03, 04 and 07
  • Upgrade Production.bat - same as development upgrade but connecting to a different database

The usual backup strategy

Normally I was just using sys.objects to get my table names and produce a script that first drops existing backup tables (when present from previous script run) and create new ones. All my backup tables use the same name as original tables except for the added prefix Backup_. This is the code I'm normally using on normal SQL Server.

   1:  -- Drop any existing Backup tables
   2:   
   3:  set nocount on;
   4:   
   5:  declare @sql varchar(max) = '';
   6:   
   7:  select @sql = @sql + 'print ''Dropping backup table dbo.[' + o.name + ']'';drop table dbo.[' + o.name + '];'
   8:  from sys.objects o
   9:  where o.type = 'U' and left(o.name, 7) = 'Backup_'
  10:  order by o.create_date asc;
  11:   
  12:  exec(@sql);
  13:   
  14:  -- Backup all existing data into Backup tables
  15:   
  16:  set @sql = '';
  17:   
  18:  select @sql = @sql + 'print ''Backing up table dbo.[' + o.name + ']'';select * into dbo.[Backup_' + o.name + '] from dbo.[' + o.name + '];'
  19:  from sys.objects o
  20:  where o.type = 'U' and left(o.name, 7) != 'Backup_'
  21:  order by o.create_date asc;
  22:   
  23:  exec(@sql);
  24:   
  25:  set nocount off;

So if you use everyday SQL Server you can easily put this code in the 00 Backup Data.sql script and it will work as expected.

How about Windows Azure backup script

As you can see in the second part of the script, when I'm dynamically creating a script to backup all my tables' data I'm using select into. As this doesn't work on Azure we have to take a different and much more complicated path to the same goal. We actually have to create all tables, add clustered indices to them and then replace select into statements with insert select statements. As you will see this makes backup script much more complex than the original one, but it works in the same way. We just have to put extra effort to add appropriate clustered indices on backup tables and transpose/serialize/concatenate column names.

So here's the code. It still has two main parts: dropping existing backup tables and creating new ones. But the second part of creating new ones is split into 3 subparts:

  • creating backup tables
  • creating clustered indices
  • backing up data
Especially the first couple are more complex using two CTEs and similar stuff. Although they may seem a bit overwhelming, they do their job just fine. Run individual parts yourself, manually, and observe what they do in case you have hard time understanding the code. The following is the 00 Backup Data.sql script file:
   1:  -- Drop any existing Backup tables
   2:   
   3:  set nocount on;
   4:   
   5:  declare @sql varchar(max) = '';
   6:   
   7:  select @sql = @sql + 'print ''Dropping backup table dbo.[' + o.name + ']'';drop table dbo.[' + o.name + '];'
   8:  from sys.objects o
   9:  where o.type = 'U' and left(o.name, 7) = 'Backup_'
  10:  order by o.create_date asc;
  11:   
  12:  exec(@sql);
  13:   
  14:   
  15:  -- BACKUP SCRIPT ADJUSTED FOR AZURE
  16:   
  17:   
  18:      -- Create backup tables
  19:   
  20:      print '';
  21:      set @sql = '';
  22:   
  23:      with
  24:      TableColumns (TableName, ColumnName, ColumnType, Nullable, order1, order2)
  25:      as (
  26:          select
  27:              o.name,
  28:              c.name,
  29:              case when c.max_length = t.max_length then t.name else t.name + '(' + iif(c.max_length = -1,'max',cast(c.max_length as varchar)) + ')' end as Type,
  30:              c.is_nullable,
  31:              o.create_date,
  32:              c.column_id
  33:          from sys.objects o
  34:              join sys.columns c
  35:                  join sys.types t
  36:                  on (t.user_type_id = c.user_type_id)
  37:              on (c.object_id = o.object_id)
  38:          where
  39:              o.type = 'U' and
  40:              left(o.name, 7) != 'Backup_'
  41:      ),
  42:      Concatenated (TableName, Columns, order1)
  43:      as (
  44:          select
  45:              tcO.TableName,
  46:              stuff((
  47:                  select ',' + ColumnName + ' ' + ColumnType + (case Nullable when 0 then ' not ' else ' ' end) + 'null'
  48:                  from TableColumns tcI
  49:                  where tcI.TableName = tcO.TableName
  50:                  order by order1, order2
  51:                  for xml path('')
  52:              ), 1, 1, ''),
  53:              tcO.order1
  54:          from TableColumns tcO
  55:          group by tcO.TableName, tcO.order1
  56:      )
  57:      select @sql = @sql + 'print ''Creating backup table dbo.[Backup_' + TableName + ']'';create table dbo.[Backup_' + TableName +'] (' + Columns + ');'
  58:      from Concatenated
  59:      order by order1;
  60:   
  61:      exec(@sql);
  62:   
  63:      -- Create clustered indices
  64:   
  65:      print '';
  66:      set @sql = '';
  67:   
  68:      with
  69:      IndexColumns (TableName, ColumnName, order1, order2)
  70:      as (
  71:          select o.name, c.name, o.create_date, ic.index_column_id
  72:          from sys.indexes i
  73:              join sys.objects o
  74:              on (o.object_id = i.object_id)
  75:              join sys.columns c
  76:              on (c.object_id = o.object_id)
  77:              join sys.index_columns ic
  78:              on ((ic.column_id = c.column_id) and (ic.object_id = o.object_id) and (ic.index_id = i.index_id))
  79:              join sys.types t
  80:              on (t.user_type_id = c.user_type_id)
  81:          where
  82:              o.type = 'U' and
  83:              left(o.name, 7) != 'Backup_' and
  84:              i.type = 1 -- CLUSTERED
  85:      ),
  86:      Concatenated (TableName, Columns, order1)
  87:      as (
  88:          select
  89:              icO.TableName,
  90:              stuff((
  91:                  select ',' + icI.ColumnName
  92:                  from IndexColumns icI
  93:                  where icI.TableName = icO.TableName
  94:                  order by order1, order2
  95:                  for xml path('')
  96:              ), 1, 1, ''),
  97:              icO.order1
  98:          from IndexColumns icO
  99:          group by icO.TableName, icO.order1
 100:      )
 101:      select @sql = @sql + 'print ''Creating clustered index on dbo.[Backup_' + TableName + ']'';create clustered index IDX_Backup_' + TableName + '_Clustered on dbo.[Backup_' + TableName + '] (' + Columns + ');'
 102:      from Concatenated
 103:      order by order1;
 104:   
 105:      exec(@sql);
 106:   
 107:      -- Backup data
 108:   
 109:      print '';
 110:      set @sql = '';
 111:   
 112:      select @sql = @sql + 'print ''Backing up data for dbo.[' + o.name + ']'';insert dbo.[Backup_' + o.name + '] select * from dbo.[' + o.name + '];'
 113:      from sys.objects o
 114:      where
 115:          o.type = 'U' and
 116:          left(o.name, 7) != 'Backup_'
 117:      order by o.create_date;
 118:   
 119:      exec(@sql);
 120:   
 121:  set nocount off;

Other automated script files

In order to make this post even more useful to you I have to provide other automated script files as well. As mentioned initially three of the database script files are automated and are usually not edited at all (except for when they don't work as happened with Windows Azure). So here are the other two script files that are fully automated.

01 Drop Model.sql

   1:  -- Drop all user defined objects in reverse order they were created
   2:   
   3:  print '';
   4:  declare @sql varchar(max);
   5:   
   6:  with types (type, name) as
   7:  (
   8:      select 'FN', 'function' union all    -- scalar function
   9:      select 'IF', 'function' union all    -- inline table function
  10:      select 'TF', 'function' union all    -- table function
  11:      select 'P', 'procedure' union all    -- stored procedure
  12:      select 'TR', 'trigger' union all     -- SQL DML trigger
  13:      select 'U', 'table' union all        -- user table
  14:      select 'V', 'view'                   -- view
  15:  )
  16:  select @sql = isnull(@sql, '') + 'print ''Dropping ' + t.name + ' dbo.[' + o.name + ']'';drop ' + t.name + ' dbo.[' + o.name + '];'
  17:  from sys.objects o
  18:      join types t
  19:      on (t.type = o.type)
  20:  where left(o.name, 7) != 'Backup_'
  21:  order by o.create_date desc
  22:   
  23:  -- drop all
  24:  exec (@sql);

07 Restore Data.sql

   1:  -- Restore backup data
   2:   
   3:  set nocount on;
   4:   
   5:  print '';
   6:  declare @sql varchar(max) = '';
   7:  declare @error bit = 0;
   8:   
   9:  -- check that no columns have been removed from existing tables
  10:  with
  11:  OldColumns (B, TableName, ColumnName)
  12:  as (
  13:      select 1, substring(o.name, 8, 100), c.name
  14:      from sys.objects o
  15:          join sys.columns c
  16:          on c.object_id = o.object_id
  17:      where o.type = 'U' and left(o.name, 7) = 'Backup_'
  18:  ),
  19:  NewColumns (TableName, ColumnName)
  20:  as (
  21:      select o.name, c.name
  22:      from sys.objects o
  23:          join sys.columns c
  24:          on c.object_id = o.object_id
  25:      where o.type = 'U' and left(o.name, 7) != 'Backup_'
  26:  )
  27:  select @sql = @sql + 'print ''Columns removed in table dbo.[' + o.TableName + ']'';'
  28:  from OldColumns o
  29:      left join NewColumns n
  30:      on ((n.TableName = o.TableName) and (n.ColumnName = o.ColumnName))
  31:  where n.ColumnName is null
  32:   
  33:  select @error = iif(@@rowcount > 0, 1, 0);
  34:   
  35:  select @sql = @sql + iif(@error = 1, 'throw 50000, ''Data not restored due to table differences.'', 0;', 'print ''Starting data restore...'';');
  36:  exec(@sql);
  37:   
  38:  print '';
  39:  select @sql = iif(@error = 1, null, '');
  40:   
  41:  -- restore data
  42:  with
  43:  TableColumns (TableName, ColumnName, TableOrder, ColumnOrder)
  44:  as (
  45:      select substring(o.name, 8, 100), c.name, o.create_date, c.column_id
  46:      from sys.objects o
  47:          join sys.columns c
  48:          on (c.object_id = o.object_id)
  49:      where o.type = 'U' and left(o.name, 7) = 'Backup_'
  50:  ),
  51:  ConcatenatedColumns (TableName, AllColumns)
  52:  as (
  53:      select
  54:          tc.TableName,
  55:          stuff((
  56:              select ',' + tc1.ColumnName
  57:              from TableColumns tc1
  58:              where tc1.TableName = tc.TableName
  59:              order by tc1.TableOrder asc, tc1.ColumnOrder asc
  60:              for xml path('')
  61:          ), 1, 1, '')
  62:      from TableColumns tc
  63:      group by tc.TableName
  64:  )
  65:  select @sql = @sql + 'print ''Restoring data for table dbo.[' + o.name + ']'';begin try set identity_insert dbo.[' + o.name + '] on end try begin catch end catch;insert dbo.[' + o.name + '] (' + cc.AllColumns + ') select * from dbo.[Backup_' + o.name + '];begin try set identity_insert dbo.[' + o.name + '] off end try begin catch end catch;'
  66:  from sys.objects o
  67:      join ConcatenatedColumns cc
  68:      on cc.TableName = o.name
  69:  where o.type = 'U'
  70:  order by o.create_date asc;
  71:   
  72:  set @sql = isnull(@sql, '');
  73:  exec(@sql);
  74:   
  75:  set nocount off;

Wrapup

These are the automated database script files I'm using to manage my database during development. You can find (and donwload them) also as a Gist on GitHub. If you have any suggestions or revisions, let me know.

No comments:

Post a Comment