increasing the performance of count(*)

 

I got a very strange request: We have to constantly count the rows in a table can we increase the performance of count(*)

So lets analyze the problem step by step.

The table is a typical table the actual columns have no effect so I replaced them with just one column in the example to make things easier; Let's start by creating a heap

 

CREATE TABLE  MyTable

(

ID intNOTNULLIDENTITY(1,1),

Payload varchar(300)NOTNULL

)

 

and fill it with random data

 

DECLARE @i ASint= 1;

WHILE (@i < 1000)

BEGIN

INSERTINTO MyTable(Payload) VALUES (REPLICATE('ABC', 20 + @i % 80));

SET @i += 1;

END

 

Now lets see where we are on a heap

 

STATISTICSIOON

 

SELECT COUNT(*)FROM  MyTable

 

Table 'MyTable'. Scan count 1, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

To count the rows SQL Server must do a full table scan and reads each page exactly once.

Let's see if an index will help us…

 

CREATE  CLUSTEREDINDEX CI_MyTable ON MyTable(ID); SELECT COUNT(*)FROM MyTable;

 

Table 'MyTable'. Scan count 1, logical reads 27, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

SQL Server now performs a clustered index scan which is in fact a table scan plus the overhead to find the first page. Actually a little bit more work than the table scan.

The trick to get it faster is to create an index that uses less space that the table (SQL Server will always select the smalest index to execute a COUNT(*))

 

The Smallest INDEX you can create on a table that has a clustered index is a secondary index that is composed just out of  the clustered key:

 

CREATE INDEX IX_ID ON MyTable(ID);

 

SELECT COUNT(*)FROM MyTable;

 

 

Table 'MyTable'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

Now it just has to Scan the leaf nodes of the secondary index

 

Careful this kind of index has very limited use besides the example above.

Other uses would be for example: SELECT xyz WHERE uvw  IN (SELECT ID FROM MyTable)

  • 5 stars
    Comment from: Frank Kalis
    2011-01-26 @ 14:22:49

    If I were to perform a COUNT(*) constantly on a large table I would maybe revise this strategy and question the requirement at all. Even with a tailored index just to support that query, the actual work still has to be carried out by SQL Server and I wouldn't be surprised when the query still runs like a dog.

    However, if accuracy of the COUNT(*) isn't important at all, say for example, if you want to use this for some kind of paging, or track growth over time, or other stuff where you can live with a more or less "good approximation", it might be an option to get the row_count from the system tables such as sys.partitions.
    Of course, with the usual caveat, that system tables can change over time...

  • 5 stars
    Comment from: =tg=
    2011-01-26 @ 15:53:15

    The actual problem was in the size of about 0 to 2000 rows, so not a huge table, the exact count was not 100% relevant, we thought about query in the system tables too but we found out that the actual work for sql server in this size of table was smaller letting him count than finding the right object, the coresponding partitions and then reading the result.
    Sure there is overhead for the extra index but the counting was done much more often then inserts.

    for large tables I totaly agree the system tables are the much better way to go

  • 5 stars
    Comment from: Frank Kalis
    2011-01-26 @ 20:33:36

    LOL.

    Agreed, in that table dimensions I guess it is really not worth it departing from the "standard" way of doing things. :-)

  • Christoph Ingenhaag
    Comment from: Christoph Ingenhaag
    2011-02-24 @ 19:25:46

    An indexed view is another choice.

    With 1100000 rows in MyTable a select makes 957 logical reads on my system using the IX_ID index.
    A select on MyView (code follows) makes 2 logical reads

    create view dbo.MyView
    with schemabinding
    as
    select
    count_big(*) as cnt
    from dbo.MyTable
    go

    create unique clustered index cuidx on MyView(Cnt)
    go

    select cnt from MyView with (noexpand)

    It is interessting the noexpand hint is necessary with more then 1000000 rows in MyTable on my system... (with Express Editions you need this hint)

    And, the inserts are faster without the IX_ID Index. The update of the indexed view costs almost nothing. To check this I used the number function from Steve Kass (http://stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/) and this statement:

    insert into MyTable(Payload)
    select replicate('ABC', 100)
    from dbo.numbers(1, 100000)

    Please check the plan. Maybe I have overseen something.

Leave a comment

You must be logged in to leave a comment. Log in now!

If you have no account yet, you can register now...
(It only takes a few seconds!)