Question : Complex SQL Partition and Index creation ON very large tables

I'm trying to find the best way to index a table partitioned by date. The table will hold up to 2 billion rows. There are 2 partition schemes, both partitioned by month: one for indexes and one for data (48 filegroups per partition scheme). Also, if you know a very in-depth book on indexes and partitioning, I wouldn't mind knowing about it.

The rows are NOT inserted sequentially by date; the dates going in are fairly random, so I don't know whether I should avoid clustered indexes for that reason.
Should I use a non-aligned filegroup just for the primary key, and use only my unique index to partition the data, so that I avoid aligning the primary key?
There will be NO switch statements used, but I may want to do splits and merges in the future.
There are updates that specify only ID = somenumber in the WHERE clause. Almost all SELECT statements are narrowed by date.

Does anyone have any suggestions on how to make this table fast while still allowing for index maintenance?

Code Snippet:

         
           
CREATE TABLE [dbo].[sometable](
      [ID] [bigint] IDENTITY(1,1) NOT NULL,
      DATE_TIME  smalldatetime NOT NULL,
      DATA1 varchar(10),
      DATA2  int,
      -- ... a bunch more columns,
 CONSTRAINT [PK] PRIMARY KEY NONCLUSTERED
(
      [ID] ASC,
      [DATE_TIME] ASC
) ON [INDEX_PARTITION_SCHEME]([DATE_TIME]),
 CONSTRAINT [ix_UNIQUE] UNIQUE NONCLUSTERED
(
      [DATE_TIME] ASC,
      Data2 ASC,
      Data3 ASC
) ON [INDEX_SMALLDATE_PARTITION_SCHEME]([DATE_TIME])
) ON [DATA_SMALLDATE_PARTITION_SCHEME]([DATE_TIME])

           
         


Answer : Complex SQL Partition and Index creation ON very large tables

Sorry if I wasn't clear enough.

I didn't say to keep a unique ID across the 50+ tables, nor to partition by the ID column. The only role of the ID, if it is in there anyway, would be to serve as a clustered primary key. You won't be using it to join or anything, only for the sake of data order. Remember, the clustered index/key determines the physical row order on disk, and that's why it is recommended that it always grow with every insert. If you can't guarantee that with your actual data columns, an identity column set as the clustered primary key would take care of the problem. A table HAS to have a unique key in order to work properly and efficiently. However, if you can come up with another unique and always-growing key than the identity column (it can be a combination of columns), which is unlikely, you could use that as the clustered primary key and you wouldn't need the ID column.

In the code you posted, you created a unique, nonclustered composite index as the PK on ID+DATE_TIME. You only used the ID identity column here to ensure the uniqueness of your PK. You could have made it clustered, as it would always increase regardless of the date column. The problem is that to achieve this you don't need the DATE_TIME column, as ID is already unique AND always growing. If a key is unique and always growing, then it is the perfect candidate for a clustered PK. The point is that the DATE_TIME column in your PK is superfluous.
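As a rough illustration of the clustered primary key on ID alone, here is a minimal sketch. The table name dbo.sometable_example and the constraint name are made up for the example, and only the columns shown in the question are included; filegroup placement is left out on purpose.

```sql
-- Hypothetical table: the name and constraint name are assumptions,
-- the columns are the ones shown in the question.
CREATE TABLE dbo.sometable_example (
    ID        bigint IDENTITY(1,1) NOT NULL,
    DATE_TIME smalldatetime        NOT NULL,
    DATA1     varchar(10)          NULL,
    DATA2     int                  NULL,
    -- ... a bunch more columns,
    -- Clustered PK on the identity column only: new rows always get a
    -- higher ID, so they go to the end of the clustered index no matter
    -- how random the DATE_TIME values are.
    CONSTRAINT PK_sometable_example PRIMARY KEY CLUSTERED (ID ASC)
);
```

An update with WHERE ID = somenumber can then seek directly on the clustered key.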

On the other hand, you will HAVE TO create another index on your tables, nonclustered and non-unique, on the DATE_TIME column, which I assume would be used in the partitioning check constraint. Depending on the value of this column, a row will belong to one of the 50+ tables. You will then build the partitioned view based on the DATE_TIME column as well.

In conclusion, the primary key issue has nothing to do with the partitioning. If you want to solve the problem smoothly, without having to write convoluted code to handle data partitioned across different tables, then the partitioned view is the way to go: it takes care of all the underlying work, and it will look to you like a regular table.
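To make the layout concrete, here is a minimal sketch with just two monthly member tables instead of 50+. All object names (sometable_2008_01, sometable_2008_02, sometable_all) and the month boundaries are assumptions for illustration only. The sketch shows reads through the view; whether inserts and updates can also go through the view depends on SQL Server's rules for updatable partitioned views, which this example does not try to satisfy.

```sql
-- Hypothetical member table for January 2008.
CREATE TABLE dbo.sometable_2008_01 (
    ID        bigint IDENTITY(1,1) NOT NULL,
    DATE_TIME smalldatetime NOT NULL
        -- Partitioning check constraint: only January rows allowed here.
        CONSTRAINT CK_sometable_2008_01_DATE_TIME
        CHECK (DATE_TIME >= '20080101' AND DATE_TIME < '20080201'),
    DATA1     varchar(10) NULL,
    DATA2     int NULL,
    CONSTRAINT PK_sometable_2008_01 PRIMARY KEY CLUSTERED (ID ASC)
);
-- Nonclustered, non-unique index for the date-narrowed SELECTs.
CREATE NONCLUSTERED INDEX IX_sometable_2008_01_DATE_TIME
    ON dbo.sometable_2008_01 (DATE_TIME ASC);
GO

-- Hypothetical member table for February 2008, same shape.
CREATE TABLE dbo.sometable_2008_02 (
    ID        bigint IDENTITY(1,1) NOT NULL,
    DATE_TIME smalldatetime NOT NULL
        CONSTRAINT CK_sometable_2008_02_DATE_TIME
        CHECK (DATE_TIME >= '20080201' AND DATE_TIME < '20080301'),
    DATA1     varchar(10) NULL,
    DATA2     int NULL,
    CONSTRAINT PK_sometable_2008_02 PRIMARY KEY CLUSTERED (ID ASC)
);
CREATE NONCLUSTERED INDEX IX_sometable_2008_02_DATE_TIME
    ON dbo.sometable_2008_02 (DATE_TIME ASC);
GO

-- The partitioned view: a UNION ALL over the member tables.
-- CREATE VIEW has to be the first statement in its batch, hence the GO.
CREATE VIEW dbo.sometable_all
AS
SELECT ID, DATE_TIME, DATA1, DATA2 FROM dbo.sometable_2008_01
UNION ALL
SELECT ID, DATE_TIME, DATA1, DATA2 FROM dbo.sometable_2008_02;
GO

-- A date-narrowed query against the view. With trusted CHECK constraints
-- in place, the optimizer can skip member tables whose range cannot match.
SELECT ID, DATE_TIME, DATA1, DATA2
FROM dbo.sometable_all
WHERE DATE_TIME >= '20080115' AND DATE_TIME < '20080116';
```

GO is a batch separator understood by client tools such as SSMS and sqlcmd, not a T-SQL statement. Note also that each member table here has its own identity sequence, so ID values are unique only within a member table, which matches the point above about not keeping a unique ID across all the tables.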