How to Think like SQL Server Engine

Iniziamo. È gratuito!
o registrati con il tuo indirizzo email
How to Think like SQL Server Engine da Mind Map: How to Think like SQL Server Engine

1. Storage

1.1. Generally, SQL Server stores all tables as 8K pages

1.1.1. think of a table as a piece of A4 paper with data printed on it representing the 8K page

1.1.2. think of a table as a stack of A4 pages one on top of the other in sequence

1.1.3. note that clustered column store is a bit different

1.1.4. SET STATISTICS IO ON number of logical reads is the number of 8K pages read to complete query

1.1.5. performance tuning generally involves reducing number of page reads to bare minimum

1.2. Clustered index

1.2.1. based on column, or sequence of columns determines the sequence of the 8K pages

1.2.2. index doesn't just include the indexed columns, it includes all the other columns of the table exception is large type columns, such as nvarchar(max), which can be too big to store on same 8K page in this case, the clustered index will hold a pointer to another 8K page for retrieving the large type column(s) this is known as off-road data

1.2.3. Query hint WITH (INDEX = 1) on FROM (or JOIN) clause after the table name, forces SQL Server to use the clustered index Index 1 is always the clustered index (assuming table has one) useful when comparing performance of clustered index against another nonclustered index designed to be more optimal for a given query

1.3. Nonclustered index

1.3.1. stores only the columns included in the index far more rows can be stored on a single 8K page greatly reduces logical reads

1.3.2. every insert, update and delete on table becomes more expensive as relevant data must be written to every affected index

1.3.3. is covering index for queries that include columns wholly included in index

1.4. Indexation general rule of thumb

1.4.1. Brent Ozar recommends 5 and 5 general rule each table should have 5 or less indexes each index should have 5 or less columns

2. Execution plans

2.1. read these from right to left

2.1.1. every icon in plan is known as an operator each operator is self contained application that takes inputs and produces outputs when you hove mouse over it, you see stats about the operation

2.1.2. think of arrows as data pipelines when you select and hover mouse over, it displays info about data that moved through pipeline

2.1.3. to see how expensive a query was, hover mouse over top left icon (e.g. SELECT operator) Estimated subtree cost value is given in "query bucks" just an estimate made by engine before the query started when you hover over an operator, you will see estimated I/O cost and estimated CPU cost, where combination of both is represented as estimated operator cost

2.1.4. when estimated operator cost exceeds a configured threshold, SQL Server will enlist "helpers" to complete the operation work is divided up between multiple threads to run in parallel right click operator and choose Properties Actual I/O Statistics | Actual Logical Reads when engine has no indexes that help reduce logical reads and cost threshold for parallelism is met, this can result in some extra reads Parallelism (Gather Streams) icon represents combining of multiple outputs from multiple threads into a single operator with a single output Server Properties | Advanced Set Cost Threshold for Parallelism

2.2. Lots of factors affect the shape of execution plans, including available hardware config, SQL Server build no, trace flag settings, etc.

2.2.1. Expect to see different plans for same query on same copy of database across different environments

2.3. when query includes ORDER BY, plan includes Sort operation

2.3.1. Sort operations have high CPU cost, low I/O cost think of this as writing down a bunch of values in memory and then performing a shuffle sort operation on these SELECT * with ORDER BY can have a spectacular increase in query cost due to need to "write down" (in memory) all the row data, including fetching off-page data for large type columns

2.4. Index Seek operation takes advantage of index sort order to jump directly to specific pages containing the queried rows

2.5. when there are no covering indexes available for a query, engine must decide which index(es) to use for optimal query resolution

2.5.1. Index Seek (NonClustered) + Key Lookup (Clustered) feeding into Nested Loops operation only chosen by engine when the number of rows estimated for return via the Index Seek is a sufficiently low proportion of total table rows tipping point is low, engine will often choose Clustered Index Scan as a more optimal approach

2.5.2. Engine must make these decisions without reading the table(s) DBCC FREEPROCCACHE tells engine to forget what it knows about previous query execution plans be cautious about doing this in Production for every index, SQL Server creates a statistics object with same name engine queries statistics to gain estimates based on histograms (stats grouped in distribution buckets) DBCC SHOW_STATISTICS('schema.table', 'index_name') the estimates of an execution plan are fundamentally based on the statistics

2.6. using scalar functions in WHERE (or JOIN) clause can have a big impact

2.6.1. example: two approaches to get rows for a date WHERE LastAccessDate BETWEEN '2018-08-31' AND '2018-09-01' spoon feeds engine what it needs to know and row estimates are accurate (assuming statistics are up to date) WHERE CAST(LastAccessDate AS DATE) = '2018-08-31' engine typically under estimates rows that satisfy predicate, which can dramatically alter execution plan

3. Repeated queries

3.1. when running same query repeatedly in short period of time, SQL Server caches pages for logical reads but does not cache the products of CPU heavy operations

3.1.1. example being Sort operator expensive writing of data into memory points, combined with shuffle sort operations is repeated every time

3.1.2. only solution to this at the moment is 3rd party caching solutions