SQL Server Storage Engine under the hood: How SQL Server performs I/O

I will present this session on September 6th 2011 in Dublin Ireland at 6:30pm (None Irish audience is welcome too, I promise to finnish in time for a pint of Guinnes :-)

How does SQL Server really access the data and log files on the disk? What's done asynchronous and what's synchronous and what impact does that have on our queries and DML statements? How do you specify IO requirement and what are they to begin with?
What I/O characteristics do different access methods, workloads and maintenance task have?
What impact do features like database mirroring and replication have? What changes have occurred to the SQL Server startup?
Which I/O Subsystems (including SSD) are best for which type of data and what's the impact if you choose a different one? Which files can be placed on the same volume and which should not?
Thomas H. Grohser is a SQL Server MVP (SQL Server Performance Engineering). In the last 17 of his more than 24 years as an IT Professional, he spent the majority of the time exploring the deeper inner workings of SQL Server and its features while working for entertainment , pharmaceutical and other industry leading companies. The main focus of his work is to architect, plan and provide a reliable, highly available, secure and scalable infrastructure for and with the features of SQL Server. This covers all angles from hardware, network, storage systems, Operating systems, SQL Server configuration and the Data Access Layer of applications;
Currently, he works at one of the largest SQL Server installations in the world processing hundreds of millions of transactions daily while moving peta bytes of data in a 24x7 environment ;
Now its time to share all the findings and the gathered know how with the SQL Server community therefore he started speaking regularly at international conferences in North America and Europe and at local events and work closely with SQLCAT to make sure everyone interested can access the information.
Location :
Ground Floor, Microsoft Atrium Building Block B, Carmenhall Road, Sandyford Industrial Estate, Dublin 18 (close to the Sandyford Luas stop). Limited underground parking is available at the back of the building

Registration :
The event is free to all. However, registration is mandatory for safety and security reasons.
When: Tuesday, Sept 6th at 18:30
Where: Auditorium, Microsoft Atrium
How: http://www.mtug.ie/Home/tabid/38/ctl/Details/Mid/369/ItemID/91/Default.aspx?ContainerSrc=[G]Containers/_default/No+Container

Found prove that SQL Server is the Devils Engine - ERROR 666

Msg 666, Level 16, State 2, Line 1
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID 1234567890.
Dropping and re-creating the index may resolve this; otherwise, use another clustering key.

It all started a few hours ago with the discussion on how Sql Server uses uniqueifiers to handle none unique clustered index keys.
After consulting SQL Server Internals books the answer was that SQL Server adds 4 extra bytes per row. So naturally the next question came up: signed or unsigned integer in the 4 bytes... No answer to that question so a test was mandadory

I will not show the source because nobody should do that. But i will dicribe the process: Create a table with one tinyint column
create a clusterd index on the column (do not specify unique). Insert a row with the value 1. Repeat the last step 214783647 times and enjoy the error message above and it's a signed int...

The uniqueifier is per value so even after you could no longer insert a value of 1 into the table you could insert 2 billion and change records with the values 2 and so on.

Filtered Index with WHERE col IN (...) clause

I was asked if it is possible to create an filtered index with a WHERE col IN (…) clause and then use WHERE col = n with n being one of the values of the (…) list

So let’s create a small sample

create table idx
(
ID int NOT NULL IDENTITY(1,1) primary key clustered,
Data int
)
insert into idx (Data) values (1)
go 100
insert into idx (Data) values (2)
go 2
insert into idx (Data) values (3)
go 2

create index idx1 on idx(ID)
where Data in (2,3)

select ID from idx where Data in (2, 3) -- Index is used

select ID from idx where Data = 2 -- Table scan

select ID from idx where Data in (2, 3) AND Data = 2 -- Table scan

-- But that’s not because SQL is not smart we created the index wrong

drop index idx1 on idx

create index idx1 on idx(ID)
INCLUDE (Data) -- if the data we filter on is not in the index SQL has to do a lookup on the table
where Data in (2,3) -- and therefore chooses to make an full table scan

select ID from idx where Data in (2, 3) -- Index is used

select ID from idx where Data = 2 -- Index is used !!!! THATS what we wanted

select ID from idx where Data = 1 -- Table scan as expected

Just deployed my first SQL Azure database.

Quiet an adventure as with all IT related things: SQL Azure is almost like SQL Server. Emphasis on the almost. But after a little fighting I got it running. Time will tell us how well it is going to run.

My tip to all is to have a deployment script that only contains what you realy need and not a generated ones that specifies each and every default option. Why: SQL Azure does not know many of the default options and therefore the script fails in that case.

 

How to assign a text containing a single quote to a char, varchar, nchar or nvarchar valiable or colum

Sometimes the need to assign a text like "She's the boss" to a text column or variable arises.

DECLARE @MyText AS NVARCHAR(100) = 'She's the boss';

You can already see on the syntax coloring that something is wrong. To put a single quote inside a string literal you must precede it with another single quote

DECLARE @MyText AS NVARCHAR(100) = 'She''s the boss';

 If you run the whole batch below

DECLARE @MyText AS NVARCHAR(100) = 'She''s the boss';
SELECT @MyText;

You will see it produces the desired result:  She's the boss

To give you another example the text ‘I am between single quotes’ would look like this

DECLARE @MyText AS NVARCHAR(100) = '''I am between single quotes''';
SELECT @MyText;

increasing the performance of count(*)

 

I got a very strange request: We have to constantly count the rows in a table can we increase the performance of count(*)

So lets analyze the problem step by step.

The table is a typical table the actual columns have no effect so I replaced them with just one column in the example to make things easier; Let's start by creating a heap

 

CREATE TABLE  MyTable

(

ID intNOTNULLIDENTITY(1,1),

Payload varchar(300)NOTNULL

)

 

and fill it with random data

 

DECLARE @i ASint= 1;

WHILE (@i < 1000)

BEGIN

INSERTINTO MyTable(Payload) VALUES (REPLICATE('ABC', 20 + @i % 80));

SET @i += 1;

END

 

Now lets see where we are on a heap

 

STATISTICSIOON

 

SELECT COUNT(*)FROM  MyTable

 

Table 'MyTable'. Scan count 1, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

To count the rows SQL Server must do a full table scan and reads each page exactly once.

Let's see if an index will help us…

 

CREATE  CLUSTEREDINDEX CI_MyTable ON MyTable(ID); SELECT COUNT(*)FROM MyTable;

 

Table 'MyTable'. Scan count 1, logical reads 27, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

SQL Server now performs a clustered index scan which is in fact a table scan plus the overhead to find the first page. Actually a little bit more work than the table scan.

The trick to get it faster is to create an index that uses less space that the table (SQL Server will always select the smalest index to execute a COUNT(*))

 

The Smallest INDEX you can create on a table that has a clustered index is a secondary index that is composed just out of  the clustered key:

 

CREATE INDEX IX_ID ON MyTable(ID);

 

SELECT COUNT(*)FROM MyTable;

 

 

Table 'MyTable'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

Now it just has to Scan the leaf nodes of the secondary index

 

Careful this kind of index has very limited use besides the example above.

Other uses would be for example: SELECT xyz WHERE uvw  IN (SELECT ID FROM MyTable)

Useless for most people

Hi fellow SQL Server users, I created a new category that I will use to mark posts that are useless for the greater part of the population.

This are either tips and tricks that define a very uncommon special case or are just for very very high end solutions and the practical application is limited. So watch out for the category and don't be surprised when you read a post marked like this when the content does not apply to you.

But on the other hand there will be a bunch of  SQL Server users that are just looking for that information

 

A little bit more information on multi location backups

On of the advantages of backing up to multiple locations is that the data is read on the SQL Server just once, this is an advantage because the IO on the database files is kept to a minimum (in comparison on doing two backups one after the other). Then the data is sent to all destenations in parallel this results in the fact that the backup will take as long as is takes to write to the slowest target location.

You can backup full, differential and log backups to multtiple locations the MIRROR TO option is build in into all of them

I do not backup full backups to two locations (you can alsways take the backup before the last to start your restore if you lose the last one) but I backup my logs to two locations to reduce the risk of a broken log chain by a missing or corrupt file.

If you are really paranoid you should write the data trough two different network cards from different manufactuers with different drivers to avoid data corruption by a faulty driver and then have a constant verify by restore on all destinations.

Backup to multiple locations at the same time (multiple copies of the backup files)

Sometimes it can be usefull to have two ore more copies of the same backup (e.g. you have two locations/disks and want a copy in/on both)

 

So lets explain how to backup to two locations at once. First the bad news this is an Enterprise (SQL Server 2005, 2008 and 2008R2) and Datacenter Edition (SQL Server 2008R2) only feature.

 

If the backup to one location looks like this

BACKUP DATABASE MyDatabase
     
TO DISK = 'D:\BACKUP\MyDatabase.bak'

 

Backing up to two locations  looks like this

 

BACKUP DATABASE tg
      TO DISK = 'D:\BACKUP\MyDatabase.bak'
      MIRROR
      TO DISK = 'E:\BACKUP\MyDBBackup2.bak'
      WITH FORMAT

The WITH FORMAT is required even if the backup files do not exist

 

You can go wild and backup to up to 8 locations at the same time.

BACKUP DATABASE tg
      TO DISK = 'D:\BACKUP\MyDatabase.bak
      MIRROR
      TO DISK = 'E:\BACKUP\MyDBBackup2.bak'
      MIRROR
      TO DISK = 'E:\BACKUP\MyDBBackup3.bak'
      MIRROR
      TO DISK = 'E:\BACKUP\MyDBBackup4.bak'
      MIRROR
      TO DISK = 'E:\BACKUP\MyDBBackup5.bak'
      MIRROR
      TO DISK = 'E:\BACKUP\MyDBBackup6.bak'
      MIRROR
      TO DISK = 'E:\BACKUP\MyDBBackup7.bak'
      MIRROR
     
TO DISK = 'E:\BACKUP\MyDBBackup8.bak'
      WITH FORMAT

 

I would not recommend to backup to 8 locations thats because the feature does not work as you might expect. It works like any other transaction in the database all or nothing. This means if one of your locations is unavailable the backup does not start and if one location fails during the backup the whole backup operation fails.

Hello World

Hello fellow SQL Server users.

I have spent half my life working with SQL Server and accumulated a lot of knowlege in this past 20 years. I started presenting on several conferences over the last few years and more and more people asked me to start blogging. This is what I am going to do here.

My focus is on building reliable, avalable and scaleable infrastructures for SQL Server and this is also the main focus of this blog but you will see the ocasional general tip entry as well.

If someone has specific questions feel free to ask them I will do my best to answer them

=tg=