Why table can have only one clustered index? No it is not what you think.
Recently, I have been on an interview spree to get a DBA on board in our team. And this was one of the question I asked to nearly everyone. No, not only because this is my favorite question but also because it is a one of the litmus test that help me estimate how clear a DBA is about the core concepts of RDBMS.
Problem
How many clustered indexes can
table have and why? Or. Why a table can have only one clustered index?
Below is the most common answer I
got for these question:
“Because we can have only one
primary key on the table”.
Well, that answer is a big NO. Yes,
NO THAT IS NOT THE CORRECT ANSWER.
Surprised, well I was too when I
use to hear that answer. Yes, it is correct we can’t have more than one primary
key on a table but that is due to its definition and functional requirement.
And that’s why by default SQL Server
provides functionality of creating a clustered index along with it. My obvious
next question to them was “Can we create a primary key without a clustered
index?”. Again, unfortunately most answered that as “No, we can’t”, because they
didn’t know that these two are separate entities because most never got chance
to explore beyond the default SQL Server behavior.
Another most common answer was
that because in clustered index, data is physically sorted by column values but
not in non-clustered index. Well that is a close one but here too, most were
unaware or unable to explain what physical sorting exactly meant.
Due to this I decided to put up
an article to clearly show why we can’t have more than one clustered index. And
before we move please note that for both clustered and non-clustered indexes, we
have B-tree for sorting values. So let us begin.
Solution
For this test I’m going to create
a similar table which I did in this video too, where I also showed that we can
create a primary key without a clustered index. Anyways below is the table
definition.
CREATE TABLE [dbo].[Test_Table_1]
(
[id] [int] NOT NULL,
[info] [char](100) NOT NULL
) ON [PRIMARY]
GO
Now let us insert 100K records
into this table.
SET NOCOUNT ON;
GO
DECLARE @i int = 1;
WHILE @i <= 100000
BEGIN
INSERT INTO
Test_Table_1 VALUES
(@i,'abc'+CONVERT(char(997),@i));
SET @i = @i + 1;
END
Now, let us see how many pages get allocated to this table.
So, as you can see we have only one row with index id as 0, which is for HEAP and we have 1450 data pages allocated to it. Now let us first create a primary key with non-clustered index. Yes, you read that right, with a non-clustered index.
ALTER TABLE [dbo].[Test_Table_1] ADD CONSTRAINT [PK_Test_Table_1] PRIMARY KEY NONCLUSTERED
(
[id] ASC
)
Now, let us see how allocation of pages for this table looks like:
Ok, now let us create clustered
index on the same “id” column of this table. Please note that for this testing
and show purpose only I’m creating a clustered index on the same primary key column,
however I strongly recommend not to do this in your production system
please.
CREATE CLUSTERED INDEX
[CIX_Test_Table_1] on [dbo].[Test_Table_1]
(
[id] ASC
)
Now, let us again see how the allocation of pages for this table looks now:
What exactly happened here is
that all of the data pages related to this table gets deallocated and each row
now gets sorted and aligned at leaf node of the clustered index. It means that your
clustered index not only sorted the data as per the “id” column’s values but the clustered index basically is your table itself now (in a sorted order that is).
So, if the clustered index is the
table itself than obviously we can’t have two (or more) clustered indexes because that
wouldn’t be physically and/or logically possible unless any system can keep
creating copy of the whole table, which as you can see doesn’t make sense.
This is it guys, the mystery
is finally solved. One should strive to keep exploring their prime technology,
which eventually makes us a better engineers and such conceptual clarity
eventually help us solving many complex problems in future.
Comments
Post a Comment