When I'm using profiler to isolate performance issues, I try to configure server side traces, whenever possible. Fortunately, SQL Server Profiler will help you creating a script for a server side trace (File/Export/Script Trace Definition), so you don't have to figure out all the event- and column-codes. Very good!
As I was doing the same configuration again and again, I decided to separate the TSQL code for the configuration inside a stored procedure.
And here comes dbo.configureServerSideTrace:
if object_id('dbo.configureServerSideTrace', 'P') is not null |
Some annotations:
declare @traceID int |
Cheers.
]]>Whether compression is worth or not isn't an easy question to answer. One aspect that has to be taken into account is certainly the amount of storage that may be saved by storing a distinct table or index in any of the two compressed formats. SSMS offers a Data Compression Wizard than can provide storage-saving estimates for row or page level compression. From the context menu for a table or index just open Storage/Manage Compression. In the Combo box at the top select the compression type (Row or Page) and press the Calculate Button at the bottom. Here's a sample of a calculated saving for an index:
Unfortunately, SSMS does not offer an option for calculating estimated savings for more than one table or index at once. If you, let's say, would like to know the estimated storage savings of page level compression for you largest 10 tables, there's no GUI support in SSMS that will assist you in finding an answer. This is, where the stored procedure sp_estimate_data_compression_savings comes in handy. This procedure - as you may have guessed from its name - provides estimated savings for row or page level compression for any table or index. You have to provide the table or index as a parameter to the procedure. In other words: The procedure will only calculate the estimations for one table or index at a time. If you want to retrieve the calculations of more than one table or index as a result set, there's some more work to do, since the procedure has to be invoked multiple times. Here's a script that calculates the estimated savings of page level compression for the database in context.
-- Determine the estimated impact of compression |
The script calculates the impact of Page level compression but may easily be adapted to consider Row level compression instead. Please read the comments inside the script. Also, please notice that the script will only run on SQL Server Enterprise and Developer edition. All other editions don't provide the opportunity for data compression.
Here's a partial result retrieved from running the script against the AdventureWorksDW2008R2 database.
If you execute the script, please be aware that it may produce some extensive I/O. Running the script against your production database at business hours wouldn't be a very good idea therefore.
]]>
select create_date |
Another answer to the question, I've seen sometimes on the internet queries the login time for any of the system processes:
select login_time |
This was my preferred way - until yesterday, when I discovered the following simple method:
select sqlserver_start_time |
Easy, isn't it? Interestingly though, every of the above three queries yields a different result. Here's a query with a sample output:
select (select sqlserver_start_time |
Result:
It seems the SQL Server service must be started first. Only after the service is running, tempdb is created followed by a subsequent start of all sysprocesses. I can't imagine that the diverse three times make any difference in practice, e.g. if you try finding out for how many hours your SQL Server instance is running. But out there may be existing applications that have to be aware of the difference.
]]>You may download the presentation as well as the corresponding scripts here (German only).
]]>You probably agree with this well known word of wisdom, don't you?
This week I had to learn another aspect - the hard way: There are many cases that may require a restore. Only one of those cases is recovering a database from a state of failure. Another situation may require access to legacy data of a meanwhile deactivated database, a database that doesn't exist anymore. In this case only a successful restore from an older backup, although necessary, may not be sufficient. Here's the story, why.
I was called by a customer who requested me to pull out two documents from a legacy (SQL Server 2000) database. We found an 8 year old backup we could rely on and restore worked well. Great! But then we discovered the documents had been stored in an IMAGE column and nobody had an idea how it had been encoded at the time it was stored. We soon recognized that we need the original application to get access to the documents, only to recognize that nobody had an idea where to find the installation package. Eventually I could find an 8 year old backup which included the old VB6 code. I could have used this code to rebuild the application and also an installation package only with considerable difficulties, because:
I recognized that all those steps require a big effort and will take some days to accomplish. If only we had built a virtual machine of at least one legacy client system before replacing all client PCs by newer machines.
So, I'd modify the introductory statement like this:
A backup is worth nothing, if you can't utilize the data that is contained inside this backup.
]]>
-- create test table with 3 columns |
Before invoking the following SELECT command, allow the actual execution plan being displayed (Strg-M or selecting Query/Include Actual Execution Plan from the menu).
-- get some data |
If you look at the execution plan, you see that the optimizer complains about a missing index on column c2. The prognosticated improvement is about 99%, so adding the index it's certainly a good idea. Let's do it:
-- add missing index |
Perfect! The query runs much faster now and needs a lot fewer resources. But have a look at the table's statistics:
You see three statistics, one for the primary key, a second one for our created index ix_1, and a third one that was automatically created during execution plan generation for the first SELECT statement. This is the statistics, named _WA_Sys.. If the AUTO CREATE STATISTICS option is set to ON, the optimizer will add missing statistics automatically. In our little experiment, the optimizer had to generate this column statistics on column c2 in order to make some assumptions about the number of rows that had to be processed.
And here's the problem: When creating the index on column c2, a statistics on this column is also created, since every index has a corresponding linked statistics. That's just the way it works. At the time the index was added, the column statistics on c2 (that _WA_Sys. statistics) already existed. If you don't remove it manually, this statistics will remain there forever, although it is needless now. All it's good for is to increase maintenance efforts during statistics updates. You can safely remove this statistics by executing:
drop statistics t1._WA_Sys_... |
If you didn't think about this before, there's a chance that you'll find some of those superfluous statistics duplicates inside your database(s). Here's a query that finds index-related and column-statistics that match on the first column. Looking for matches on the first column is sufficient here, since the optimizer only automatically adds missing single-column statistics.
with all_stats as |
With that query at hand, you may easily find redundant statistics. Here's a sample output:
If you'd like to find out more about SQL Server statistics, you may want to check out my series of two articles, published recently on the Simple-Talk platform.
Part 1: Queries, Damned Queries and Statistics
Part 2: SQL Server Statistics: Problems and Solutions
It's free, so you might want to give it a try. Feel free to vote, if you like it!
]]>Delving into it, I could isolate the problem. It was a single SELECT statement that, when invoked, returned different results. The numbers differed by up to 20% in value without any data changes being performed!
Have a look at the following sample. We create a test table to demonstrate what I'm talking about:
use tempdb |
The table has three columns, where the third column only serves the purpose of filling up the row, so the table contains more data pages.
Let's now insert 600000 rows into our table:
declare @x float |
The first INSERT statement adds 300000 rows with positive values for the two columns floatVal and decimalVal. After that, we insert another 300000 rows, this time with inverse signs. So in total, values for each of the two columns should add up to zero. Let's check this by invoking the summation over all rows a few times:
select sum(floatVal) as SumFloatVal |
And here's the result:
As for the DECIMAL column, the outcome is as expected. But look at the totals for the FLOAT column. It's perfectly understandable, the sum will reveal some rounding errors. What really puzzled me is the difference between the numbers. Why isn't the rounding error the same for all executions?
I was pretty sure that I discovered a bug in SQL Server and posted a regarding item on MSFT's connect platform (see here).
Unfortunately nobody cared about my problem, and so I took the opportunity of talking to some fellows of the SQL Server CAT team on the occasion of the 2009 PASS Summit. After a while, I received an explanation which I'd like to repeat here.
The query is executed in parallel, as the plan reveals:
When summing up values, usually the summation sequence doesn't matter. (If you remember some mathematics from school that's what the commutative law of addition is about). Therefore, reading values in multiple threads and adding up the values in any arbitrary order is perfect, as the order doesn't have any influence on the result. Well, at least theoretically. When adding float values, there's floating point arithmetic rounding errors with every addition. These added-up rounding errors are the reason for the non-zero values of the float totals in our example. So that's ok, but why different results with almost every execution? The reason for this is parallel execution. Added-up rounding errors depend on the sequence, so the commutative law does not really apply to these errors. There's a chance that the sequence of rows changes with every execution, if the query is executed in parallel. And that's why the results change, dependent only on some butterfly wing movements at the other side of the world.
If we add the MAXDOP 1 query hint, only one thread is utilized and the results are the same for every execution, although rounding errors still remain present. So this query:
select sum(floatVal) as SumFloatVal |
will be executed by using the following (single thread) execution plan:
This time the result (and also the rounding error) is always the same.
Pretty soon after delivering the explanation, the bug was closed. Reason: the observed behavior is "by design".
I can understand that the problem originates from computer resp. processor architecture and MSFT has no chance of control therefore.
Although.
When using SSAS' write back functionality, SSAS will always create numeric columns of FLOAT data types. There's no chance of manipulating the data type; it's always float!
Additionally, SSAS more often than not inserts rows into write back tables with vastly large resp. small values. When looking an these rows, it appeared that they are created solely with the intention of summing up to zero. We discovered plenty of these rows containing inverse values that usually should nullify in total, but apparently don't. By the way that's why closing the bug with the "By Design" explanation makes me somewhat sad.
So, probably avoiding FLOATs is a good idea! Unfortunately, this is simply not possible in all cases and sometimes out of our control.
]]>I've seen the advice of taking a snapshot from a mirrored database for reporting purposes many times. Even books online contains a chapter Database Mirroring and Database Snapshots, where the scenario is explained.
The intention is to decouple your resource intensive reporting queries from the OLTP system which is generally a good idea. But keep in mind that:
So there's more than only a slightly a chance that query performance will suffer from missing statistics in your reporting snapshot, since the underlying OLTP database simply does not contain all statistics (or indexes) that your reporting applications could take advantage of. These missing statistics can't be added in the snapshot, because it's a read-only database. Additionally, you won't experience any automatic updates of stale statistics in the snapshot. And moreover, any added or updated statistics in the source database are not transferred into the snapshot, of course. This can affect your query performance significantly!
Have a look at the following example:
use master |
The script above creates a test database with one table and adds some rows to the table before creating a snapshot of this database. As we have never used column c2 in any query (besides the INSERT), there won't be any statistics for column t1.c2.
Now let's query the snapshot and have a look at the actual execution plan. Here's the query:
-- Be sure to show the actual execution plan |
Here's the actual execution plan:
Clearly evident that the optimizer detects a missing statistics, although the options AUTO CREATE STATISTICS and AUTO UPDATE STATISTICS have both been set to ON. The plan reveals a noticeable difference between the actual and estimated number of rows and also a warning regarding the missing statistics.
So, keep that in mind when using snapshots for reporting applications.
If you'd like to find out more about SQL Server statistics, you may want to check out my series of two articles, published recently on the Simple-Talk platform.
Part 1: Queries, Damned Queries and Statistics
Part 2: SQL Server Statistics: Problems and Solutions
It's free, so you might want to give it a try. Feel free to vote, if you like it!
]]>