increasing the performance of count(*)
I got a very strange request: We have to constantly count the rows in a table can we increase the performance of count(*)
So lets analyze the problem step by step.
The table is a typical table the actual columns have no effect so I replaced them with just one column in the example to make things easier; Let's start by creating a heap
CREATE TABLE MyTable
(
ID intNOTNULLIDENTITY(1,1),
Payload varchar(300)NOTNULL
)
and fill it with random data
DECLARE @i ASint= 1;
WHILE (@i < 1000)
BEGIN
INSERTINTO MyTable(Payload) VALUES (REPLICATE('ABC', 20 + @i % 80));
SET @i += 1;
END
Now lets see where we are on a heap
STATISTICSIOON
SELECT COUNT(*)FROM MyTable
Table 'MyTable'. Scan count 1, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
To count the rows SQL Server must do a full table scan and reads each page exactly once.
Let's see if an index will help us…
CREATE CLUSTEREDINDEX CI_MyTable ON MyTable(ID); SELECT COUNT(*)FROM MyTable;
Table 'MyTable'. Scan count 1, logical reads 27, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server now performs a clustered index scan which is in fact a table scan plus the overhead to find the first page. Actually a little bit more work than the table scan.
The trick to get it faster is to create an index that uses less space that the table (SQL Server will always select the smalest index to execute a COUNT(*))
The Smallest INDEX you can create on a table that has a clustered index is a secondary index that is composed just out of the clustered key:
CREATE INDEX IX_ID ON MyTable(ID);
SELECT COUNT(*)FROM MyTable;
Table 'MyTable'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Now it just has to Scan the leaf nodes of the secondary index
Careful this kind of index has very limited use besides the example above.
Other uses would be for example: SELECT xyz WHERE uvw IN (SELECT ID FROM MyTable)
| Print article | This entry was posted by Thomas Grohser on 2011-01-26 at 13:14:00 . Follow any responses to this post through RSS 2.0. |

2011-01-26 @ 14:22:49
If I were to perform a COUNT(*) constantly on a large table I would maybe revise this strategy and question the requirement at all. Even with a tailored index just to support that query, the actual work still has to be carried out by SQL Server and I wouldn't be surprised when the query still runs like a dog.
However, if accuracy of the COUNT(*) isn't important at all, say for example, if you want to use this for some kind of paging, or track growth over time, or other stuff where you can live with a more or less "good approximation", it might be an option to get the row_count from the system tables such as sys.partitions.
Of course, with the usual caveat, that system tables can change over time...
2011-01-26 @ 15:53:15
The actual problem was in the size of about 0 to 2000 rows, so not a huge table, the exact count was not 100% relevant, we thought about query in the system tables too but we found out that the actual work for sql server in this size of table was smaller letting him count than finding the right object, the coresponding partitions and then reading the result.
Sure there is overhead for the extra index but the counting was done much more often then inserts.
for large tables I totaly agree the system tables are the much better way to go
2011-01-26 @ 20:33:36
LOL.
Agreed, in that table dimensions I guess it is really not worth it departing from the "standard" way of doing things. :-)
2011-02-24 @ 19:25:46
An indexed view is another choice.
With 1100000 rows in MyTable a select makes 957 logical reads on my system using the IX_ID index.
A select on MyView (code follows) makes 2 logical reads
create view dbo.MyView
with schemabinding
as
select
count_big(*) as cnt
from dbo.MyTable
go
create unique clustered index cuidx on MyView(Cnt)
go
select cnt from MyView with (noexpand)
It is interessting the noexpand hint is necessary with more then 1000000 rows in MyTable on my system... (with Express Editions you need this hint)
And, the inserts are faster without the IX_ID Index. The update of the indexed view costs almost nothing. To check this I used the number function from Steve Kass (http://stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/) and this statement:
insert into MyTable(Payload)
select replicate('ABC', 100)
from dbo.numbers(1, 100000)
Please check the plan. Maybe I have overseen something.