Sunday, March 31, 2013

Data migration to Azure SQL Database

I've long been a fan of SQL Azure (under its various names), mainly because it's usually provided the easiest first step for migrating existing .NET apps into the cloud.  That said, there have always been hundreds of features it's lacking compared to "real" SQL Server, and I've never been particularly satisfied with any of the official means for data migration.

So imagine (if you can!) my excitement last September with the announcement that it had become possible to add Azure SQL DB as a linked server.  As well as allowing queries which join tables across multiple servers, I also had high hopes that it would make migration of data from one server to another much easier.  Between two linked SQL Servers, I've often used the nice and simple SELECT INTO command to copy both the data and schema of a table in one easy step.
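
For example, with hypothetical server, database and table names, a single statement like this copies both the schema and the data of a table from a linked server in one go:

SELECT *
INTO dbo.Students
FROM otherServer.SourceDatabase.dbo.Students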

On my veritable roller coaster of emotions that day, imagine now the devastation caused by the following response from Azure:

Msg 40510, Level 16, State 1, Line 1
Statement 'SELECT INTO' is not supported in this version of SQL Server.

...due, it seems, to Azure SQL Database not supporting tables without a clustered index - and SELECT INTO creates exactly such a heap :-(

After my initial disappointment, I figured that I could still move data from one place to another using INSERT INTO ... SELECT, provided the schema was already in place on the destination.  After lots of playing, I finally came up with the procedure defined below for copying an entire database from a classic SQL Server into a linked Azure database - whilst also maintaining referential integrity.  Various other restrictions encountered along the way mean that it gets a bit ugly in places (especially due to all the dynamic SQL), but it seems to work pretty well, and I've found it especially useful in scenarios where I've needed to re-run the data push several times over during development.


1. Build the schema on Azure

The apps I'm writing at the moment are based on Entity Framework (code-first), so I must admit that building the actual schema of databases is something I get for free these days.  If you're in a similar situation to me, just plug into your Web.config the connection string which the Azure portal provides for your server, and fire up your app to have Entity Framework build the empty database.
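
For reference, the connection string ends up looking something like this in Web.config (the server, database, context name and credentials below are placeholders - copy the real values from the portal):

<connectionStrings>
  <add name="MyContext"
       connectionString="Server=tcp:serverref.database.windows.net;Database=remotedatabase;User ID=username@serverref;Password=password;Encrypt=True;"
       providerName="System.Data.SqlClient" />
</connectionStrings>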

Otherwise, you could use Management Studio's "Script Database as" menu item to give you the SQL to create the schema.


2. Define the linked server

EXEC sp_addlinkedserver 'azure',
 N'SQL Server',
 'serverref.database.windows.net'
GO

EXEC sp_addlinkedsrvlogin
 @rmtsrvname = 'azure',
 @useself = 'FALSE',
 @rmtuser = 'username@serverref', 
 @rmtpassword = 'password'
GO

3. Ready the Azure schema for migration

You'll need two new columns on each table to facilitate the data migration: DataImport_OriginalId (to hold the original primary key value of each transferred row - needed to keep foreign keys valid across tables), and DataImport_IsClone for some of the jiggery-pokery that takes place in (5).  I'm assuming that each table has a single primary key column which is an int identity.

You'll also need to disable FK constraint checking in readiness for the data to be bulk loaded (because it's quite likely that referenced rows won't yet exist at the time of loading).

The script below generates the SQL to add the columns to each table and to disable the constraints.  If you're brave, replace the PRINT commands with EXEC; I prefer to view the generated SQL, then copy and paste to run it manually.

DECLARE @tableName nvarchar(max)
DECLARE @sql nvarchar(max)

DECLARE tableCursor CURSOR FOR  
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' -- exclude views, which can't be ALTERed like this

OPEN tableCursor
FETCH NEXT FROM tableCursor INTO @tableName

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = 'ALTER TABLE ' + @tableName 
    + ' ADD DataImport_OriginalId int, DataImport_IsClone bit'
    PRINT @sql
        
    SET @sql = 'ALTER TABLE ' + @tableName 
    + ' NOCHECK CONSTRAINT ALL'
    PRINT @sql
    
    FETCH NEXT FROM tableCursor INTO @tableName
END   

CLOSE tableCursor 
DEALLOCATE tableCursor

4. Push the data

This will generate some SQL to copy over the data from your originating server to Azure, also populating the new columns we added in (3) as appropriate.

DECLARE @tableName nvarchar(max)
DECLARE @identityColumn nvarchar(max)
DECLARE @otherColumns nvarchar(max)
DECLARE @sql nvarchar(max)

DECLARE tableCursor CURSOR FOR  
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' -- exclude views
 AND TABLE_NAME NOT IN ('EdmMetadata')

OPEN tableCursor
FETCH NEXT FROM tableCursor INTO @tableName

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @identityColumn = COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @tableName
     AND COLUMNPROPERTY(object_id(TABLE_NAME), 
      COLUMN_NAME, 'IsIdentity') = 1
    
    SET @otherColumns = null
    SELECT @otherColumns = COALESCE(@otherColumns + ',','') 
    + ('[' + COLUMN_NAME + ']')
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @tableName
     AND COLUMN_NAME <> @identityColumn
    ORDER BY ORDINAL_POSITION
    
    SET @sql = 'INSERT INTO azure.remotedatabase.dbo.' + @tableName 
    + ' ([DataImport_OriginalId],[DataImport_IsClone],' + @otherColumns 
    + ') SELECT [' + @identityColumn + '],0,' + @otherColumns 
    + ' FROM ' + @tableName
    PRINT @sql
    
    FETCH NEXT FROM tableCursor INTO @tableName
END   

CLOSE tableCursor 
DEALLOCATE tableCursor

5. Tidy up!

We now have all of the data sitting in our Azure database, but the primary keys will almost certainly be different from their originals, so any foreign key values will reference non-existent or incorrect rows.  That's because we can't rely on SET IDENTITY_INSERT ON persisting across successive statements - with a linked server you have no control over which connection each SQL statement runs on.  Instead, we have the original IDs in that new DataImport_OriginalId column we defined earlier.

Because we can neither modify the newly assigned PKs, nor even reseed identity columns (DBCC isn't supported in Azure SQL DB), the only way I've found to get the rows back with their original PK values is as follows:
  1. Switch on IDENTITY_INSERT
  2. Find a suitably distant PK value from anything that currently exists or will need to exist... lazily, I've used MAX(PK) + MAX(DataImport_OriginalId)
  3. Insert (and then delete) one row to set the identity seed to this value
  4. Switch off IDENTITY_INSERT
  5. Clone all rows, flagging them as cloned using the DataImport_IsClone column
  6. Delete all original rows (where DataImport_IsClone = 0)
  7. Switch on IDENTITY_INSERT
  8. Clone all cloned rows, with PK now explicitly set to DataImport_OriginalId
  9. Switch off IDENTITY_INSERT
  10. Delete the remaining clones from the first pass (where DataImport_IsClone = 1)!
  11. Repeat for all tables; then (and only then) re-enable FK constraint checking.
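To make that concrete, here's roughly what the generator emits for a hypothetical Students table with an int identity PK Id and a single other column Name (run it, like all the SQL generated in this step, directly against the Azure database - as noted above, IDENTITY_INSERT can't be relied upon over the link):

SET IDENTITY_INSERT Students ON
INSERT INTO Students ([Id],[DataImport_IsClone],[Name])
 SELECT TOP 1 (SELECT MAX(DataImport_OriginalId) + MAX(Id) FROM Students)+1,1,[Name] FROM Students
SET IDENTITY_INSERT Students OFF
DELETE FROM Students WHERE DataImport_IsClone = 1
INSERT INTO Students ([DataImport_OriginalId],[DataImport_IsClone],[Name])
 SELECT [DataImport_OriginalId],1,[Name] FROM Students WHERE DataImport_OriginalId IS NOT NULL
DELETE FROM Students WHERE DataImport_IsClone = 0 OR DataImport_IsClone IS NULL
SET IDENTITY_INSERT Students ON
INSERT INTO Students ([Id],[DataImport_IsClone],[Name])
 SELECT DataImport_OriginalId,0,[Name] FROM Students
SET IDENTITY_INSERT Students OFF
DELETE FROM Students WHERE DataImport_IsClone = 1
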
Here comes the humdinger of a query to generate the SQL for all tables:

DECLARE @tableName nvarchar(max)
DECLARE @identityColumn nvarchar(max)
DECLARE @otherColumns nvarchar(max)
DECLARE @maxIdentity int
DECLARE @sql nvarchar(max)

DECLARE tableCursor CURSOR FOR  
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' -- exclude views
 AND TABLE_NAME NOT IN ('EdmMetadata')

OPEN tableCursor
FETCH NEXT FROM tableCursor INTO @tableName

WHILE @@FETCH_STATUS = 0
BEGIN

    SELECT @identityColumn = COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @tableName
     AND COLUMNPROPERTY(object_id(TABLE_NAME),
      COLUMN_NAME, 'IsIdentity') = 1

    SET @otherColumns = null
    SELECT @otherColumns = COALESCE(@otherColumns + ',','') 
    + ('[' + COLUMN_NAME + ']')
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @tableName
     AND COLUMN_NAME <> @identityColumn
     AND COLUMN_NAME NOT LIKE 'DataImport_%'
    ORDER BY ORDINAL_POSITION

    SET @sql = 'SET IDENTITY_INSERT ' + @tableName + ' ON'
    PRINT @sql

    SET @sql = 'INSERT INTO ' + @tableName 
    + ' ([' + @identityColumn + '],[DataImport_IsClone],' + @otherColumns 
    + ') SELECT TOP 1 (SELECT MAX(DataImport_OriginalId) + MAX(' 
    + @identityColumn 
    + ') FROM ' + @tableName + ')+1,1,' + @otherColumns 
    + ' FROM ' + @tableName
    PRINT @sql

    SET @sql = 'SET IDENTITY_INSERT ' + @tableName + ' OFF'
    PRINT @sql
    
    SET @sql = 'DELETE FROM ' + @tableName + ' WHERE DataImport_IsClone = 1'
    PRINT @sql
    
    SET @sql = 'INSERT INTO ' + @tableName 
    + ' ([DataImport_OriginalId],[DataImport_IsClone],' + @otherColumns 
    + ') SELECT [DataImport_OriginalId],1,' + @otherColumns 
    + ' FROM ' + @tableName 
    + ' WHERE DataImport_OriginalId IS NOT NULL'
    PRINT @sql
    
    SET @sql = 'DELETE FROM ' + @tableName 
    + ' WHERE DataImport_IsClone = 0 OR DataImport_IsClone IS NULL'
    PRINT @sql

    SET @sql = 'SET IDENTITY_INSERT ' + @tableName + ' ON'
    PRINT @sql

    SET @sql = 'INSERT INTO ' + @tableName 
    + ' ([' + @identityColumn + '],[DataImport_IsClone],' + @otherColumns 
    + ') SELECT DataImport_OriginalId,0,' + @otherColumns 
    + ' FROM ' + @tableName
    PRINT @sql

    SET @sql = 'SET IDENTITY_INSERT ' + @tableName + ' OFF'
    PRINT @sql
    
    SET @sql = 'DELETE FROM ' + @tableName + ' WHERE DataImport_IsClone = 1'
    PRINT @sql
        
    FETCH NEXT FROM tableCursor INTO @tableName

END   

CLOSE tableCursor 

OPEN tableCursor
FETCH NEXT FROM tableCursor INTO @tableName

WHILE @@FETCH_STATUS = 0
BEGIN

    SET @sql = 'ALTER TABLE ' + @tableName + ' CHECK CONSTRAINT ALL'
    PRINT @sql

    FETCH NEXT FROM tableCursor INTO @tableName

END   

CLOSE tableCursor 
DEALLOCATE tableCursor
Note that I've deliberately left behind the two columns I created in (3), because they have no negative impact on my application, and they allow me to repeat this process as many times as I need by emptying the database and re-running (4) and (5).  A quick script for the emptying part is sketched below.
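
This sketch (in the same PRINT-then-run-manually style as the scripts above) generates a DELETE for every table, first disabling constraint checking since rows are removed in no particular order - which conveniently also leaves the database in the state a re-run of (4) needs:

DECLARE @tableName nvarchar(max)
DECLARE @sql nvarchar(max)

DECLARE tableCursor CURSOR FOR
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'

OPEN tableCursor
FETCH NEXT FROM tableCursor INTO @tableName

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = 'ALTER TABLE ' + @tableName + ' NOCHECK CONSTRAINT ALL'
    PRINT @sql

    SET @sql = 'DELETE FROM ' + @tableName
    PRINT @sql

    FETCH NEXT FROM tableCursor INTO @tableName
END

CLOSE tableCursor
DEALLOCATE tableCursor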


Improvements?

I find this technique for migrating data into the cloud useful, but it would be great to find a way to remove the need for all the data shuffling that goes on in (5) just to restore the original identities.  Something like the ability to switch on IDENTITY_INSERT for all connections would make things a whole lot nicer.

If you have any thoughts or suggestions to share for making this cleaner, I'd love to hear from you!

Tuesday, March 19, 2013

Comparing SQL Server Schemas

If you're using Entity Framework code-first without database migrations, you'll be only too aware of the pain of updating the live database schema whenever you make changes to the model.

Assuming you're using a DropCreateDatabaseIfModelChanges initializer on your debug build, your development database will be rebuilt automatically whenever the model changes; but (assuming you don't want to lose data) any changes will need to be made manually on the live server before deployment.
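
For reference, wiring up that initializer is a one-liner - something like this in Application_Start, with MyContext standing in for your own DbContext subclass:

using System.Data.Entity;
...
#if DEBUG
    // Drop and rebuild the development database whenever the model changes
    Database.SetInitializer(
        new DropCreateDatabaseIfModelChanges<MyContext>());
#endif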

The following scripts are what I use to identify the differences between development and live schemas.

1. Create a linked server for Live on your development machine
EXEC sp_addlinkedserver 'myLiveServer', N'SQL Server', 'www.myserver.com'
GO

EXEC sp_addlinkedsrvlogin
 @rmtsrvname = 'myLiveServer',
 @useself = 'FALSE',
 @locallogin = null,
 @rmtuser = 'sa', 
 @rmtpassword = 'myPassword'
GO

2. Compare the two schemas with reference to INFORMATION_SCHEMA
SELECT
 Local.TABLE_NAME,
 Local.COLUMN_NAME,
 Local.DATA_TYPE,
 Local.CHARACTER_MAXIMUM_LENGTH,
 Local.IS_NULLABLE,
 Remote.TABLE_NAME [Remote Table],
 Remote.COLUMN_NAME [Remote Column]
FROM myDatabase.INFORMATION_SCHEMA.COLUMNS [Local]
 FULL OUTER JOIN myLiveServer.myDatabase.INFORMATION_SCHEMA.COLUMNS [Remote]
  ON Remote.TABLE_NAME = Local.TABLE_NAME
  AND Remote.COLUMN_NAME = Local.COLUMN_NAME
  AND Remote.DATA_TYPE = Local.DATA_TYPE
  AND (
   Remote.CHARACTER_MAXIMUM_LENGTH = Local.CHARACTER_MAXIMUM_LENGTH
   OR (
    Remote.CHARACTER_MAXIMUM_LENGTH IS NULL
    AND Local.CHARACTER_MAXIMUM_LENGTH IS NULL
   )
  )
  AND Remote.IS_NULLABLE = Local.IS_NULLABLE
WHERE
 Local.TABLE_NAME IS NULL
 OR Local.COLUMN_NAME IS NULL
 OR Remote.TABLE_NAME IS NULL
 OR Remote.COLUMN_NAME IS NULL
ORDER BY
 Remote.TABLE_NAME,
 Remote.COLUMN_NAME,
 Local.TABLE_NAME,
 Local.COLUMN_NAME

3. Make the manual modifications

Running script (2) will return a set of modifications to be made on the live server.

Items that appear with 'Remote Table' and 'Remote Column' as NULL need to be added to the live server, according to the details given in the other columns.  Items that have entries for 'Remote Table' and 'Remote Column' but NULLs elsewhere need to be removed from the live server.

So, for example, a result row listing HomeAddress with type nvarchar(max) against the Students table, but with NULLs in the 'Remote' columns, means a new column HomeAddress of type nvarchar(max) needs to be added to the Students table on the live server.  Conversely, a Telephone2 row populated only in the 'Remote' columns means that Telephone2 needs to be removed from the Students table.


4. Update the hash

Script (2) can be run as many times as needed.  When it returns no rows, you know that the schemas on both development and live servers are the same.  In order to let Entity Framework work with the new version of the schema, you'll also need to update the hash manually - copying the hash that was generated automatically on the development server.
UPDATE myLiveServer.myDatabase.dbo.EdmMetadata
SET ModelHash = (SELECT ModelHash FROM myDatabase.dbo.EdmMetadata)

5. Copy over the code

Now you can deploy the new version of the code onto the live server, safe in the knowledge that the required schema is ready and waiting.

Wednesday, June 29, 2011

Google Contacts Synchronisation

Having recently migrated to Google Apps at the organisation in which I work, one of the glaring omissions we discovered was the concept of a Global Address List to which I'd personally become accustomed in MS Exchange. Despite this persistently being at the top of the feature request list with Google, there still seems to be no easy way of sharing contacts between users, and so I decided to write some code to implement it programmatically.

The application simply copies all contacts from one account (which might be designated the "master") to another, flagging them as it does so in order that they can be deleted/updated when the program is scheduled to run regularly. As such, the destination account's existing contacts are never modified. This works within Google Apps, and also as a means of syncing contacts between regular Gmail accounts.

To obtain the reference assemblies, you'll need to get the GData .NET Client Library from http://code.google.com/p/google-gdata/, or I've included the relevant DLLs in the source download at www.cslacey.co.uk/blog/downloads/CSL.GoogleContactsSync.zip.

The latter link also contains the compiled application - to start using it straight away, simply enter the relevant credentials into the .config file within /bin/Release, and run CSL.GoogleContactsSync.ConsoleApp.exe.
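
For reference, the .config file just needs the four appSettings keys that the code below reads - the account values here are placeholders:

<configuration>
  <appSettings>
    <add key="SourceGoogleUsername" value="master@example.com" />
    <add key="SourceGooglePassword" value="password" />
    <add key="DestinationGoogleUsername" value="user@example.com" />
    <add key="DestinationGooglePassword" value="password" />
  </appSettings>
</configuration>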

(Assembly references: System.Configuration, Google.GData.Client, Google.GData.Contacts, Google.GData.Extensions)

using System;
using System.Collections.Generic;
using System.Configuration;
using System.Linq;
using Google.Contacts;
using Google.GData.Client;
using Google.GData.Contacts;

namespace CSL.GoogleContactsSync.ConsoleApp
{
    class Program
    {
        const int MAX_BATCH_SIZE = 100;
        const string GOOGLE_APP_NAME = "CSL.GoogleContactsSync.ConsoleApp";
        const string GAL_FIELD_KEY = "Origin";
        const string GAL_FIELD_VALUE = "Global Address List";

        static void Main(string[] args)
        {
            List<Contact> sourceContacts = GetSourceContacts(
                ConfigurationManager.AppSettings["SourceGoogleUsername"],
                ConfigurationManager.AppSettings["SourceGooglePassword"]);
            DeleteExistingGlobalContacts(
                ConfigurationManager.AppSettings["DestinationGoogleUsername"],
                ConfigurationManager.AppSettings["DestinationGooglePassword"]);
            InsertContacts(sourceContacts,
                ConfigurationManager.AppSettings["DestinationGoogleUsername"],
                ConfigurationManager.AppSettings["DestinationGooglePassword"]);
        }

        /// <summary>
        /// Retrieve all contacts from the specified account in a form
        /// suitable for insertion into other destination accounts
        /// </summary>
        private static List<Contact> GetSourceContacts(
            string username, string password)
        {
            List<Contact> result = new List<Contact>();

            RequestSettings requestSettings = new RequestSettings(
                GOOGLE_APP_NAME, username, password);
            ContactsRequest contactsRequest =
                new ContactsRequest(requestSettings);
            Feed<Contact> feed = contactsRequest.GetContacts();
            feed.AutoPaging = true;

            foreach (Contact contact in feed.Entries)
            {
                // Remove all group membership, as groups may not exist in
                // destination account
                while (contact.GroupMembership.Count > 0)
                {
                    contact.GroupMembership.RemoveAt(0);
                }

                // Remove all user defined fields, as they may not exist in
                // destination account
                while (contact.ContactEntry.UserDefinedFields.Count > 0)
                {
                    contact.ContactEntry.UserDefinedFields.RemoveAt(0);
                }

                // Flag retrieved contact as originating from the GAL
                contact.ContactEntry.UserDefinedFields.Add(
                    new UserDefinedField(GAL_FIELD_VALUE, GAL_FIELD_KEY));

                Console.WriteLine(String.Format("RETRIEVING CONTACT: {0}",
                    contact.Name.FullName));
                result.Add(contact);
            }

            return result;
        }

        /// <summary>
        /// Delete all contacts which have been flagged
        /// as originating from the GAL
        /// </summary>
        private static void DeleteExistingGlobalContacts(
            string username, string password)
        {
            RequestSettings requestSettings = new RequestSettings(
                GOOGLE_APP_NAME, username, password);
            ContactsRequest contactsRequest =
                new ContactsRequest(requestSettings);
            Feed<Contact> feed = contactsRequest.GetContacts();
            feed.AutoPaging = false;

            List<Contact> contactsToDelete;

            do
            {
                contactsToDelete = new List<Contact>();

                foreach (Contact contact in feed.Entries)
                {
                    for (int i = 0;
                        i < contact.ContactEntry.UserDefinedFields.Count; i++)
                    {
                        if (contact.ContactEntry.UserDefinedFields[i].Key
                                == GAL_FIELD_KEY
                            && contact.ContactEntry.UserDefinedFields[i].Value
                                == GAL_FIELD_VALUE)
                        {
                            // Contact originated from GAL, so add it to the
                            // delete batch
                            Console.WriteLine(String.Format("PENDING DELETE: {0}",
                                contact.Name.FullName));
                            contactsToDelete.Add(contact);
                        }
                    }
                }

                Console.WriteLine("BATCH DELETING {0} Contacts",
                    contactsToDelete.Count);
                contactsRequest.Batch(contactsToDelete, feed,
                    GDataBatchOperationType.delete);

                // Refresh the feed for the next pass; the loop repeats while
                // the previous pass still found contacts to delete
                feed = contactsRequest.GetContacts();
            }
            while (feed.Entries.Count() > 0 && contactsToDelete.Count > 0);
        }

        /// <summary>
        /// Insert the specified contacts
        /// </summary>
        private static void InsertContacts(
            List<Contact> contacts, string username, string password)
        {
            RequestSettings requestSettings = new RequestSettings(
                GOOGLE_APP_NAME, username, password);
            ContactsRequest contactsRequest =
                new ContactsRequest(requestSettings);
            Feed<Contact> feed = contactsRequest.GetContacts();
            List<Contact> contactsToInsert = new List<Contact>();

            foreach (Contact contact in contacts)
            {
                Console.WriteLine(String.Format("PENDING INSERT: {0}",
                    contact.Name.FullName));
                contactsToInsert.Add(contact);

                if (contactsToInsert.Count == MAX_BATCH_SIZE)
                {
                    Console.WriteLine("BATCH INSERTING {0} Contacts",
                        contactsToInsert.Count);
                    contactsRequest.Batch(contactsToInsert, feed,
                        GDataBatchOperationType.insert);

                    // Re-initialise batch for next execution
                    contactsToInsert = new List<Contact>();
                }
            }

            // Insert any remaining contacts in a final, partial batch
            Console.WriteLine("BATCH INSERTING {0} Contacts",
                contactsToInsert.Count);
            contactsRequest.Batch(contactsToInsert, feed,
                GDataBatchOperationType.insert);
        }
    }
}

Monday, November 2, 2009

Migrating Classic WCF to the Service Bus

The Service Bus (within Microsoft's new .NET Services product) is a nice bit of technology that allows two-way communication between nodes on the Internet, regardless of the presence of firewalls, NAT gateways or other complicated network topologies. It makes possible all sorts of useful tools, such as Rich's and Rob's SocketShifter.

What's really nice about the Service Bus's implementation, however (and often overlooked), is that it's nothing more than a new set of WCF bindings which result in the creation of publicly reachable and discoverable endpoints.

With that in mind, I wondered whether I could take my previous simple WCF example, and change it to use the Service Bus by doing nothing more than modifying the config. In theory, it seems that this should be possible once the Service Bus becomes a standard part of the .NET framework - but for now it proves difficult. This is because installing the .NET Services CTP will modify your machine.config file such that netTcpRelayBinding cannot be added again as an extension within an app.config, but HAS to be added in that way for machines without the CTP installed (i.e. all of your customers' servers, or indeed an Azure VM).

The only sensible way to achieve this, therefore, is by explicitly setting options within the code itself - but this proved pretty easy.


Messaging Model

Let's start by modifying the code of the previous example slightly so that we emphasise the delivery of a message and a reply. We'll make the GetTimestamp method take a string, which gets printed at the server and copied back within the reply:

SharedTypes.IHealthCheck
...
[OperationContract]
string GetTimestamp(string serviceName);

Server.HealthCheck
...
public string GetTimestamp(string serviceName)
{
    Console.WriteLine("Request received from {0}", serviceName);
    return String.Format("{0}: {1}", serviceName, DateTime.Now.ToString());
}

Client.Program
...
while (true)
{
    Console.WriteLine(healthCheck.GetTimestamp("My service"));

    // Wait for a second
    System.Threading.Thread.Sleep(1000);
}



Migrating to Service Bus

In both the Server and the Client, we now need to configure the endpoint as follows (If you don't yet have a .NET Services account, you can create one at http://www.microsoft.com/windowsazure/developers/dotnetservices/):

  • Specify a Service Bus address (sb://yourAccount.servicebus.windows.net/...)
  • Specify an appropriate Service Bus binding (here, we'll use NetTcpRelayBinding)
  • Specify a new behaviour which contains your Service Bus credentials

As mentioned before, the only safe way of doing this at the moment is programmatically:

Server.Program

Configure the endpoint of the ServiceHost, just before opening it:

(Assembly reference: Microsoft.ServiceBus)

using System.ServiceModel.Description;
using Microsoft.ServiceBus;
...
static void Main(string[] args)
{
    // Construct EndpointBehaviour for SB username and password
    TransportClientEndpointBehavior serviceBusCredential =
        new TransportClientEndpointBehavior();
    serviceBusCredential.CredentialType =
        TransportClientCredentialType.UserNamePassword;
    serviceBusCredential.Credentials.UserName.UserName = "yourAccount";
    serviceBusCredential.Credentials.UserName.Password = "yourPassword";
 
    ServiceHost serviceHost = new ServiceHost(typeof(HealthCheck));
 
    // Set the binding manually (overriding the app.config)
    ServiceEndpoint serviceEndpoint = serviceHost.Description.Endpoints[0];
    serviceEndpoint.Address = new EndpointAddress(
        "sb://yourAccount.servicebus.windows.net/test/");
    serviceEndpoint.Binding = new NetTcpRelayBinding();
    serviceEndpoint.Behaviors.Add(serviceBusCredential);
 
    serviceHost.Open();
 
    // Keep the process running (hence the service open) for an hour
    System.Threading.Thread.Sleep(3600000);
}


Client.HealthCheck

Configure the endpoint of the ClientBase class we're inheriting from on construction, to ensure we only do it once:

(Assembly reference: Microsoft.ServiceBus)

using Microsoft.ServiceBus;
...
public HealthCheck()
{
    // Construct EndpointBehaviour for SB username and password
    TransportClientEndpointBehavior serviceBusCredential =
        new TransportClientEndpointBehavior();
    serviceBusCredential.CredentialType =
        TransportClientCredentialType.UserNamePassword;
    serviceBusCredential.Credentials.UserName.UserName = "yourAccount";
    serviceBusCredential.Credentials.UserName.Password = "yourPassword";
 
    // Set the binding manually (overriding the app.config)
    this.Endpoint.Address = new EndpointAddress(
        "sb://yourAccount.servicebus.windows.net/test/");
    this.Endpoint.Binding = new NetTcpRelayBinding();
    this.Endpoint.Behaviors.Add(serviceBusCredential);
}


That's it! If you fire up the server and the client, you should now see them working exactly as before - except you can now have the server running on your home machine behind a NAT gateway, and the client running on your office machine behind a corporate firewall!

Monday, June 29, 2009

Triple Boot MacBook

Having bought a shiny new MacBook a few months ago, I got a little bored of OS X and wanted to make the most of its Intel chipset - letting me boot into Windows, OS X and Linux all on the same box.

There are quite a few long-winded, complicated explanations out there that describe how to achieve this, but actually it's quite simple once you know what to avoid. The following procedure has been tested using OS X Leopard (I'm assuming it's already installed), Windows XP Professional SP2 and Ubuntu 9.04.


1. Install the rEFIt Bootloader

Boot Camp seems to have problems working with more than two parallel OS's. So inside OS X, download and install rEFIt from http://refit.sourceforge.net, which is an alternative bootloader providing support for the Extensible Firmware Interface that Macs use.

Reboot after installing to check that you get the new rEFIt bootup screen, and choose to boot into OS X again.


2. Partition your drive

You'll need at least three partitions (one for OS X, one for XP, and one for Ubuntu). Ideally, though, you might choose to have an additional swap partition for Linux, and perhaps a shared data partition which you can access from all three OS's. You'll also probably already have your first partition defined as an EFI system partition - it's best not to fiddle with that.

Limitations to bear in mind are that the Ubuntu GRUB bootloader must be installed on the third partition, and XP on the fourth (due to their dependence on PC BIOS and the MBR, both seemingly have to be within the first four partitions, and XP must be as "low down" as possible). XP also seems to be unable to see any more than the first four partitions (as it doesn't support the GUID partition scheme), so make sure that your shared data is within the first four.

I found my ideal partition map to be as follows:

  1. EFI - 200MB (untouched, as from the factory)
  2. OS X - 50GB (the original OS X partition, shrunk but otherwise not modified)
  3. Shared Data (plus GRUB bootloader for Ubuntu) - 20GB
  4. XP - 50GB
  5. Ubuntu - 25GB
  6. Linux Swap - 2.5GB

In OS X, enter the command:

diskutil list

...and you should see that your existing disk structure looks something like this:
/dev/disk0
   #:  type                    name           size       identifier
   0:  GUID_partition_scheme                  *149.1 GB  disk0
   1:  EFI                                    200.0 MB   disk0s1
   2:  Apple_HFS               Macintosh HD   148.9 GB   disk0s2

Given that you want to resize the large partition and make lots of other smaller ones, you should then issue the following command (all one line - wrapped here only for readability - and assuming my desired partition map):

diskutil resizeVolume disk0s2 50G
"HFS+" "Data" 20G
"MS-DOS FAT32" "Windows" 50G
"HFS+" "Linux" 25G
"HFS+" "Linux Swap" 2.5G

In actual fact, the formats selected here don't matter all that much, because both Windows and Linux will reformat the relevant partitions upon installation. Make sure, however, that for the Data partition you initially pick a format that Windows can't understand - this will ensure that the Windows partition ends up as drive C upon installation! You can then reformat Data later (as MS-DOS FAT32, being the only format all three OS's can read and write) using Terminal or Disk Utility in OS X.
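
If memory serves, that later reformat is a one-liner too - something along these lines, assuming my partition map (where Data is the third slice); do check the identifier against diskutil list, and man diskutil for the exact format name, first:

diskutil eraseVolume "MS-DOS FAT32" "Data" /dev/disk0s3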


3. Install XP

Insert your XP boot CD, and restart the machine while holding down the Option (alt) key. Choose the option to boot from CD.

Install Windows as normal, selecting the fourth partition (which should be labelled C:). I also chose to reformat the drive using NTFS, as this provides better performance than FAT32.

As Windows needs to restart the machine a couple of times during this process, ensure that each time you select the "Windows" option which should appear in the rEFIt menu, to allow the installation to carry on.

Once this is complete, install the relevant Windows drivers for the Mac hardware using the "Boot Camp" installer for Windows (on the Mac OS X Install DVD).


4. Install Ubuntu

Insert a bootable Ubuntu CD, and again restart the machine while holding down the Option (alt) key. Choose the option to boot from CD.

Install Ubuntu as normal, selecting manual configuration of partitions when prompted. On the subsequent screen, you should select /dev/sda5 to be formatted with ext3 (my preference), and mounted as "/". Assuming my partition map, you should also select /dev/sda6 to be formatted as a swap partition.

Important: On the final screen of the installer, select 'Advanced...' and change the location of the GRUB bootloader to be /dev/sda3. This is important, as otherwise Ubuntu's bootloader will attempt to take over the whole disk, which can have some odd results working alongside rEFIt.


5. Celebrate

That's it! The only thing that doesn't work for me (bizarrely) is restarting from within Ubuntu... I have to make sure I only ever choose "Shut Down"!

Thursday, January 22, 2009

Obfuscating QueryString Parameters

I need to provide a link to a URL which contains some parameters I'd like to obfuscate.  Specifically, within an automatically generated email, instead of embedding a link like /sendMessage.aspx?address=foo@bar.com&expiry=2009-01-22Z13:30, I'd prefer /sendMessage.aspx?token=s0MEth1ngUnfath0mable!

Broken down, my requirements for obfuscation are:
  • Serialize an object which contains my parameters (address and expiry), so that it can be transmitted to a different server (i.e. the web server)
  • Encrypt the serialized bytes to prevent tampering
  • Encode the encrypted bytes as a quotable string, so that they can be included within a URL querystring (albeit after a suitable UrlEncode)


Key Generation

My security requirements are pretty basic, as I'm not trying to protect anything desperately important here - just to deter casual tampering by anyone spotting patterns in the quotable string.  Therefore, I'll keep things as simple as possible and take some shortcuts, for example using a basic encryption algorithm (DES), and embedding the encryption key within my source code.

First, we'll need to generate a key to use for encryption and decryption:

using System;
using System.IO;
using System.Security.Cryptography;
 
namespace ObfuscationTesting
{
    class Program
    {
        static void Main(string[] args)
        {
            // Generate a suitable key
            DESCryptoServiceProvider desCryptoServiceProvider =
                new DESCryptoServiceProvider();
            desCryptoServiceProvider.GenerateKey();
 
            // Represent key as a quotable string
            string base64EncodedKey =
                Convert.ToBase64String(desCryptoServiceProvider.Key);
 
            // Save to file
            File.WriteAllText(@"C:\key.txt", base64EncodedKey);
        }
    }
}


Obfuscator Class

Now, we'll create a static class that takes any serializable object and returns us a string which represents that object (and, naturally, allows us to re-create the original object from such a string):

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using System.Security.Cryptography;
 
namespace ObfuscationTesting
{
    public static class Obfuscator
    {
        // The key previously generated
        private const string KEY = "MXvUAmUobjA=";
 
        public static string Obfuscate(object targetObject)
        {
            // Binary serialize
            MemoryStream memoryStream = new MemoryStream();
            BinaryFormatter binaryFormatter = new BinaryFormatter();
            binaryFormatter.Serialize(memoryStream, targetObject);
            // ToArray (unlike GetBuffer) returns only the bytes actually written
            byte[] serialized = memoryStream.ToArray();
 
            // Encrypt
            DESCryptoServiceProvider desCryptoServiceProvider =
                new DESCryptoServiceProvider();
            desCryptoServiceProvider.Key = Convert.FromBase64String(KEY);
            desCryptoServiceProvider.IV = Convert.FromBase64String(KEY);
            byte[] encrypted = desCryptoServiceProvider
                .CreateEncryptor()
                .TransformFinalBlock(serialized, 0, serialized.Length);
 
            // Render as string
            string quotable = Convert.ToBase64String(encrypted);
 
            return quotable;
        }
 
        public static object Deobfuscate(string quotable)
        {
            // Retrieve the bytes from the quotable string
            byte[] encrypted = Convert.FromBase64String(quotable);
 
            // Decrypt
            DESCryptoServiceProvider desCryptoServiceProvider =
                new DESCryptoServiceProvider();
            desCryptoServiceProvider.Key = Convert.FromBase64String(KEY);
            desCryptoServiceProvider.IV = Convert.FromBase64String(KEY);
            byte[] serialized = desCryptoServiceProvider
                .CreateDecryptor()
                .TransformFinalBlock(encrypted, 0, encrypted.Length);
 
            // Deserialize
            MemoryStream memoryStream = new MemoryStream(serialized);
            BinaryFormatter binaryFormatter = new BinaryFormatter();
            object deserialized = binaryFormatter.Deserialize(memoryStream);
 
            return deserialized;
        }
    }
}


Usage

We'll create a class which contains the parameters I'd like to represent, remembering to mark it as serializable:

using System;
 
namespace ObfuscationTesting
{
    [Serializable]
    public class Token
    {
        public string EmailAddress { get; set; }
        public DateTime Expiry { get; set; }
    }
}

To construct the URL with the obfuscated parameters, all we need to have is:

(Assembly reference: System.Web)

using System;
using System.Web;

...
            Token token = new Token()
            {
                EmailAddress = "foo@bar.com",
                Expiry = DateTime.Now.AddHours(1)
            };
            string url = String.Format(@"/sendMessage.aspx?token={0}",
                 HttpUtility.UrlEncode(Obfuscator.Obfuscate(token)));

And, on our web page that receives the URL, we just need to de-obfuscate again:

            string tokenString = Request.QueryString["token"];
            Token token = (Token)Obfuscator.Deobfuscate(tokenString);
            if (token.Expiry < DateTime.Now)
            {
                // Token has expired
                Response.End();
            }
            else
            {
                string toAddress = token.EmailAddress;
                // etc
            }

Wednesday, January 14, 2009

Self-Hosting WCF

I'd like to have two separate processes (both under my control) communicate over WCF. What's the easiest way to achieve this? As an example, I'd like Process 1 (a client) to be able to check the health or presence of Process 2 (the server) by periodically requesting the server's local time.

To permit communication, we need to define a contract between the two processes. All I'd like this contract (let's call it HealthCheck) to define is an operation GetTimestamp. Given that my two separate processes will be two separate projects in Visual Studio, and that they both need to access this contract, we'll create a third class library called SharedTypes that will contain nothing but the contract.


SharedTypes (contains the contract)

Assembly references: System.ServiceModel

We'll define the contract as an interface, decorated by System.ServiceModel.ServiceContract. The interface's one method signature (GetTimestamp) should be decorated by System.ServiceModel.OperationContract.

using System;
using System.ServiceModel;
 
namespace SharedTypes
{
    [ServiceContract]
    public interface IHealthCheck
    {
        [OperationContract]
        DateTime GetTimestamp();
    }
}


Server

Assembly references: SharedTypes, System.ServiceModel

The server process, perhaps unsurprisingly, will need to implement this contract interface as follows:

using System;
using SharedTypes;
 
namespace Server
{
    class HealthCheck : IHealthCheck
    {
        public DateTime GetTimestamp()
        {
            return DateTime.Now;
        }
    }
}

The program's main method will simply instantiate and open a ServiceHost to expose the service, and hang around in the background:

using System;
using System.ServiceModel;
 
namespace Server
{
    class Program
    {
        static void Main(string[] args)
        {
            ServiceHost serviceHost = new ServiceHost(typeof(HealthCheck));
            serviceHost.Open();
 
            // Keep the process running (hence the service open) for an hour
            System.Threading.Thread.Sleep(3600000);
        }
    }
}

Finally, we'll need to specify the address and protocol by which the service can be called. Part of the joy of WCF is allowing this to be configurable by an end-user or sysadmin, so we'll specify a minimal app.config to contain the core settings we require:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
   <system.serviceModel>
      <services>
         <service name="Server.HealthCheck">
            <endpoint address="basic" binding="basicHttpBinding"
                      contract="SharedTypes.IHealthCheck" />
            <host>
               <baseAddresses>
                  <add baseAddress="http://localhost:8080/HealthCheck" />
               </baseAddresses>
            </host>
         </service>
      </services>
   </system.serviceModel>
</configuration>


Client

Assembly references: SharedTypes, System.ServiceModel

The client will also need to have a class that implements the service contract, but we'll make it such that calls to its methods actually result in a remote call over WCF:

using System;
using System.ServiceModel;
using SharedTypes;
 
namespace Client
{
    public class HealthCheck : ClientBase<IHealthCheck>, IHealthCheck
    {
        public DateTime GetTimestamp()
        {
            return base.Channel.GetTimestamp();
        }
    }
}

The program's main method will just sit in a loop calling and echoing the result from GetTimestamp:

using System;
 
namespace Client
{
    class Program
    {
        static void Main(string[] args)
        {
            HealthCheck healthCheck = new HealthCheck();
            while (true)
            {
                Console.WriteLine(healthCheck.GetTimestamp());
 
                // Wait for a second
                System.Threading.Thread.Sleep(1000);
            }
        }
    }
}

Again, we'll need to specify in app.config the location of the server, and the protocol we want to use:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
   <system.serviceModel>
      <client>
         <endpoint address="http://localhost:8080/HealthCheck/basic"
                   binding="basicHttpBinding"
                   contract="SharedTypes.IHealthCheck" />
      </client>
   </system.serviceModel>
</configuration>