Parallel Data Warehouse (SQL Server's Big Brother) Presentation

Faster is better. That's the rule that applies to pretty much everything: cars, broadband, bar service, data warehouses… I went along to Microsoft last week with this in mind, as I already knew that their Parallel Data Warehouse is all about speed and they were keen to demo it.
It may be a little late to the party, but four years after going on sale in other Western markets, Microsoft's Parallel Data Warehouse (PDW) has entered the Australian market. It sits at the extreme end of the Microsoft data warehouse product range.
Presented by ex-Teradata employee Matthew Winter, the session was a truly impressive fact fest showing what can be achieved performance-wise when you let the vendor configure everything and then lock the box so no pesky client types can tinker with it. The key points:
  • Sold as an appliance (pre-configured hardware & software all in one big box)
  • Contains control nodes, compute nodes & storage nodes (several nodes per rack)
  • Compute nodes are virtual machines with their own SQL Server instances
  • Up to 9 compute nodes per rack
  • Virtual machines run in Hyper-V
  • Hyper-V is limited to 64 hosts, which is the only thing that limits PDW size
  • Virtual machines are monitored by Microsoft System Center; performance is 99.8% of the same setup running non-virtualised
  • Uses MPP (massively parallel processing) instead of the SMP (symmetric multiprocessing) architecture of standard SQL Server
  • Storage: JBOD (just a bunch of disks), not RAID
  • Max storage: 6 petabytes
  • Cheap data drives: 70 disks, 32 per compute node, plus hot-swappable
  • Labelled SQL Server 2012 but actually based on 2014
  • Base unit is half rack
  • Full rack: 2 TB processed per hour
  • A shell database on a control node records the metadata
  • Compute nodes do work
  • SQL execution plans are injected with special parallel instructions
  • Clustered columnstore indexes are used
  • Don't forget batch mode for columnstore
  • Queries won't run in batch mode if you rely only on auto-created statistics, so collect your own stats
  • PolyBase lets you write SQL queries that combine data from Hadoop and PDW
  • PDW is claimed to be 1,000 times faster than Hadoop, so it pays to load data from Hadoop into PDW first
  • Migration is simple: the schema needs distribution keys added and non-clustered indexes removed
  • Identity columns are not supported
  • Pricing is based on a 32-core SQL Server license
  • Microsoft currently training 5 core partners
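To make the MPP idea and the need for distribution keys a little more concrete, here is a toy Python sketch. It is not how PDW works internally: it simply assumes a hash-modulo scheme where each row is assigned to a compute node by hashing its distribution key, each node aggregates only its own slice, and a control-node step merges the partial results. The CRC32 hash, the function names (`node_for`, `distribute`, `partial_sum`, `gather`) and the figure of 9 nodes (borrowed from "up to 9 compute nodes per rack") are all illustrative assumptions; the talk didn't cover PDW's actual hash function or data layout.

```python
from collections import defaultdict
from zlib import crc32

# Assumption for illustration: one full rack's worth of compute nodes.
NUM_COMPUTE_NODES = 9


def node_for(key: str, num_nodes: int = NUM_COMPUTE_NODES) -> int:
    """Toy hash-modulo placement: the same key always maps to the same node."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, key_column, num_nodes=NUM_COMPUTE_NODES):
    """Split rows across compute nodes by hashing the distribution key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key_column], num_nodes)].append(row)
    return nodes


def partial_sum(node_rows, group_column, value_column):
    """Work one compute node does on its own slice: a local GROUP BY ... SUM."""
    totals = defaultdict(float)
    for row in node_rows:
        totals[row[group_column]] += row[value_column]
    return dict(totals)


def gather(partials):
    """Control-node step: merge the per-node partial aggregates."""
    result = defaultdict(float)
    for part in partials:
        for key, value in part.items():
            result[key] += value
    return dict(result)


if __name__ == "__main__":
    sales = [
        {"customer": "alice", "amount": 10.0},
        {"customer": "bob", "amount": 5.0},
        {"customer": "alice", "amount": 2.5},
        {"customer": "carol", "amount": 7.0},
    ]
    nodes = distribute(sales, "customer")
    # On real MPP hardware these partial aggregates would run in parallel.
    partials = [partial_sum(n, "customer", "amount") for n in nodes]
    print(sorted(gather(partials).items()))
    # → [('alice', 12.5), ('bob', 5.0), ('carol', 7.0)]
```

In this toy model, rows sharing a key always land on the same node, so grouping on the distribution key needs no shuffling of data between nodes. That gives some intuition for why picking distribution keys is the main schema change when migrating.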


If your pockets are deep enough and you need to process data at the kind of speed that is likely to set your hair on fire, then PDW is for you. It is ultra appealing to have everything pre-configured and not have to worry about it.

There are only a handful of companies using it in Australia so far, but that will change: there is a clear market for it, as shown by the competitors already at the party (Oracle, Teradata, IBM etc.).