Why table can have only one clustered index? No it is not what you think.

Recently, I have been on an interview spree to get a DBA on board in our team. And this was one of the question I asked to nearly everyone. No, not only because this is my favorite question but also because it is a one of the litmus test that help me estimate how clear a DBA is about the core concepts of RDBMS.


Problem

How many clustered indexes can table have and why? Or. Why a table can have only one clustered index?

Below is the most common answer I got for these question:

“Because we can have only one primary key on the table”.

Well, that answer is a big NO. Yes, NO THAT IS NOT THE CORRECT ANSWER.

Surprised, well I was too when I use to hear that answer. Yes, it is correct we can’t have more than one primary key on a table but that is due to its definition and functional requirement. And that’s why by default SQL Server provides functionality of creating a clustered index along with it. My obvious next question to them was “Can we create a primary key without a clustered index?”. Again, unfortunately most answered that as “No, we can’t”, because they didn’t know that these two are separate entities because most never got chance to explore beyond the default SQL Server behavior.

Another most common answer was that because in clustered index, data is physically sorted by column values but not in non-clustered index. Well that is a close one but here too, most were unaware or unable to explain what physical sorting exactly meant.

Due to this I decided to put up an article to clearly show why we can’t have more than one clustered index. And before we move please note that for both clustered and non-clustered indexes, we have B-tree for sorting values. So let us begin.


Solution

For this test I’m going to create a similar table which I did in this video too, where I also showed that we can create a primary key without a clustered index. Anyways below is the table definition.


CREATE TABLE [dbo].[Test_Table_1]

(

       [id] [int] NOT NULL,

       [info] [char](100) NOT NULL

) ON [PRIMARY]

GO


Now let us insert 100K records into this table.


SET NOCOUNT ON;

GO 

DECLARE @i int = 1; 

WHILE @i <= 100000

BEGIN 

       INSERT INTO Test_Table_1 VALUES

       (@i,'abc'+CONVERT(char(997),@i)); 

       SET @i = @i + 1; 

END

 

Now, let us see how many pages get allocated to this table.


Heap

So, as you can see we have only one row with index id as 0, which is for HEAP and we have 1450 data pages allocated to it. Now let us first create a primary key with non-clustered index. Yes, you read that right, with a non-clustered index.


ALTER TABLE [dbo].[Test_Table_1] ADD CONSTRAINT [PK_Test_Table_1] PRIMARY KEY NONCLUSTERED

(

       [id] ASC

)

Now, let us see how allocation of pages for this table looks like:

Heap with PK non-clustered

So, as you can see, we can create a primary key without a clustered index by explicitly using keyword “NONCLUSTERED” while creating the primary key. This will add a primary key with a (no wins for guesses) a non-clustered index. And as you can see that all of our data pages (1450) are still allocated with heap itself and we got 227 more pages with our primary key which is holding its B-tree structure.

Ok, now let us create clustered index on the same “id” column of this table. Please note that for this testing and show purpose only I’m creating a clustered index on the same primary key column, however I strongly recommend not to do this in your production system please.

CREATE CLUSTERED INDEX [CIX_Test_Table_1] on [dbo].[Test_Table_1]

(

       [id] ASC

)

Now, let us again see how the allocation of pages for this table looks now:

Clustered Index

As you can see the entry for heap (index id 0) now is gone and we’ll see 1412 pages allocated with clustered index instead. This is what we meant by physical sorting.

What exactly happened here is that all of the data pages related to this table gets deallocated and each row now gets sorted and aligned at leaf node of the clustered index. It means that your clustered index not only sorted the data as per the “id” column’s values but the clustered index basically is your table itself now (in a sorted order that is).

So, if the clustered index is the table itself than obviously we can’t have two (or more) clustered indexes because that wouldn’t be physically and/or logically possible unless any system can keep creating copy of the whole table, which as you can see doesn’t make sense.

This is it guys, the mystery is finally solved. One should strive to keep exploring their prime technology, which eventually makes us a better engineers and such conceptual clarity eventually help us solving many complex problems in future.



Comments

Popular posts from this blog

SQL Server: SQL Server services not starting. TDSSNIClient initialization failed with error 0x139f, status code 0x80. Reason: Unable to initialize SSL support.

SQL Server 2014 SP\CU installation getting stuck at “MsiTimingAction”

SQL Server: Cluster Installation failed with error “Wait on the Database Engine recovery handle failed.”