Saturday, August 12, 2006

Table Partitioning

Table portioning can be coined as distributing a table and its associated indexes in different file groups. Table partitions will be quite handy if you have a large OLTP database where data keeps comes in and goes out.

The important thing before implementing the table partition is planning and thinking. We got to plan well before implementing and designing partition. I assure your life is going to become hell if you try to modify the partition after implementing it in your production. Even though the steps seems like its a cake walk its not. So think 100 times about all the possibilities before implementing this and plan a head on this.

The first step in implementing the table partitions is to decide the number of partitions and how we are going to spread the partitions across the file groups. In this particular scenario since im having 24 partitions we decided to have 24 file groups spread across the 8 disks. There is no restriction like we have to have only one partition in one file group we can have more than one partition in one file groups also but for the performance reasons we decided to have 24 file groups. Lets dig more into the partition stuff.

Table partition consists of three things

1. Partition Function.

2. Partition Schema

3. The real partitioned table.

1. Partition Function.

Partition function is just a function which determines which data should go to which partition. Hope you should have confused reading this (if some one does). Let us consider the below scenario we have a crazy requirement (requirements are always crazy that’s another issue).

The requirement is like we have a system that will keep on inserting its log data in the sqlserver. The data in that table becomes worth less after a day. A table should be designed that will hold 24 hours worth of data, the data that is older than 23 hours should be deleted. There will be a couple of applications that will access the data that is older than an hour to collect statistics.

We decided to implement the table partition the reason is there will be several application servers that will be keep on inserting data and there will be applications that will be selecting older data. If all these things are kept in a normal table then it will end up in blocking, locking etc. So we decided to implement table partitions.

There will be 24 partitions representing one an hour. The data inserted say at 12 pm in the afternoon should go to the partition number 12; the data inserted at 1 pm should go to the partition number 13 and so on. The table structure is given below:

svrID int
PartitionNo int -- Partition number
svrDate datetime
svrMsg varchar(500)
logOrigin varchar(100)
userId varchar(20)

Ogay now we got our table designed the next thing is the partition function,

Syntax:

CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type )
AS RANGE [ LEFT | RIGHT ]
FOR VALUES ( [ boundary_value [ ,...n ] ] )
[ ; ]

Example:

CREATE PARTITION FUNCTION [HourRangePF1](int) AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)

The above is the simplest of the simplest partition function. Guess who is the author its meeeeeeeeeee(who else can write these kinds of stupid codes, this is too much of show off isnt?). okey it says insert all the data which is less than 1 in the partition one and so on.

Next we got to design the partition scheme

Syntax:

CREATE PARTITION SCHEME partition_scheme_name
AS PARTITION partition_function_name
[ ALL ] TO ( { file_group_name | [ PRIMARY ] } [ ,...n ] )
[ ; ]

Example:

CREATE PARTITION SCHEME [HoursPS1] AS PARTITION [HourRangePF1] TO ([SECONDARY01], [SECONDARY02], [SECONDARY03], [SECONDARY04], [SECONDARY05], [SECONDARY06], [SECONDARY07], [SECONDARY08], [SECONDARY09], [SECONDARY10], [SECONDARY11], [SECONDARY12], [SECONDARY13], [SECONDARY14], [SECONDARY15], [SECONDARY16], [SECONDARY17], [SECONDARY18], [SECONDARY19], [SECONDARY20], [SECONDARY21], [SECONDARY22], [SECONDARY23], [SECONDARY24])

The above partition scheme gives the skeleton for the table. Basically it defines which partition should go to which file group using the partition function.

CREATE TABLE [dbo].[srvLogs](
svrID int Not Null,
svrDate datetime Null,
svrMsg varchar(500) Null,
logOrigin varchar(100) Null,
userId varchar(20) Null,
PartitionNo int NotNull,
CONSTRAINT [PK_ srvLogs_1] PRIMARY KEY NONCLUSTERED
(
svrId ASC
PartitionNo ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [HoursPS1]([PartitionNo])
) ON [HoursPS1]([ PartitionNo])

The table is ready now. It is created across 24 file groups. It is upto you to place the file groups in as many disks you want. Here we go we have created a partitioned table. The table is partitioned based upon the column partition Number.
So if the partition number is equal to 1 then it will go to the first partition, else if it is 2 then it will go to the partition 2 and so on. This part of segregating the data is taken care by the partition function. There are still more things to come on this partition table stuff let me continue it in couple of other posts.

2 comments:

  1. Anonymous11:57 PM

    Ever compared this with Oracle Partitioning? Oracle is light years ahead of MS in database technology.

    ReplyDelete
  2. yeah anony you are absolutely right. Oracle is very much a head of SQL Server. But SQL Server is not a matured database as oracle it is just targetting the medium business.

    ReplyDelete