SQL Server 2016 Performance And Scalability Improvements

Transcription

SQL Server 2016 Performance and ScalabilityImprovementsMarko HottiSr. Technical Product Manager / SQL ServerMicrosoft Corporationmarko.hotti@microsoft.com

Record-breaking performanceLeader in OLTP price/performanceLeader in DW performance & price/performance#1 in TPC-E price/perf (44 cores)#1 in TPC-H 30TB (144 cores) Windows Server 2016Windows Server 2012 R238% faster TPC-H 3TB (72 cores) Windows Server 2012 R2#2 in TPC-E price/perf (44 cores)#1 and 40% faster in TPC-H 1TB (44 cores) Windows Server 2012 R2Windows Server 2012 R2Performance gains just by upgradingFaster queriesFaster operationsFaster throughput3.6x faster19x faster10x faster7-10x faster7x faster15x faster3x faster20x faster2.7x fasterqueries Tablevalued parametersSpatial index/queries up to20% faster271x faster queries2.6x faster10x faster100x fasterDW queries onAlwaysOn replicasSpatial line string34x faster queriesSpatial nativeFunctionsDW querieswith new CEIn-memory ColumnstoreTPC results as of April 2016 http://www.tpc.org/default.aspIn-Memorytemp objectsAuto NUMApartitioningXEvent readerDBCC ghputAlwaysOnBatch requestper secGame-changingapp performancePROSGlobal ERP1000%700%faster scoringfaster queriesTableauKPMG190%250%faster queriesfaster execution

SQL Server 2016 & Windows Server 2016 Better TogetherUnparalleled scalabilitywith Windows Server 2016Massive scale for inmemory performanceSimple, flexible HAand DRNo domain join neededUnparalleled security12 TB of memoryScalabilityWS 2016 maxcoresFine-grained security controlsBuilt-in anti-malware

Microsoft Storage Spaces DirectHyper-V cluster with local storageWhat is Storage Spaces Direct?Evolution of Storage SpacesServers with local storageHighly available and scalableStorage for Hyper-V virtualization and privatecloudScalabilityWhy Storage Spaces Direct?New device typesLower-cost flash storage with SATA SSDsBetter flash performance with NVMe SSDsSimplicityEthernet/RDMA network as storage fabricNo need for complex multi-initiator fabricSeamless capacity and performance expansion

Support for Windows Server CoreWindows Server edition with smallest footprintReduced memory and disk requirementsFewer running processes and services: greater stabilitySimplified managementRequires less maintenance and fewer OS patches,greatly reduced downtime50‒60 percent less patching and fewer OS rebootsScalability

In-Memory OLTPenhancements

In-Memory OLTPSQL Server engineIn-Memory OLTPcompilerNative compiled SPs& schemaNew high-performance, memory-optimized onlinetransaction processing (OLTP) engine integrated intoSQL Server and architected for modern hardwaretrendsIn-Memory OLTPengine: Memoryoptimized tables &indexesMemory-optimizedtable file groupTransaction logData file group7

Performance and Scaling ImprovementsSupports up to 2 TB of user data in durable memory optimizedtables in a single database.Multiple threads to persist memory-optimized tablesParallel Support Parallel scan for memory-optimized tables and HASH indexes Parallel plan support for accessing memory-optimized tables

Improved scalingOther enhancements include:7xPerformanceIn-Memory OLTPengine has beenenhanced to scalelinearly on servers up to4 sockets

New Transaction Performance Analysis Overview reportNew report replaces theneed to use theManagement DataWarehouse to analyzewhich tables and storedprocedures are candidatesfor in-memoryoptimizationPerformance

Operational analytics:disk-based and inmemory tables

Traditional operational/analytics architectureBI analystsKey issuesComplex implementationIIS ServerRequires two servers (capitalexpenditures and operationalexpenditures)Data latency in analyticsHigh demand;requires real-time analyticsPerformance

Minimizing data latency for analyticsBI analystsChallengesAnalytics queries are resource intensive and cancause blockingMinimizing impact on operational workloadsIIS ServerSub-optimal execution of analytics on relationalschemaBenefitsNo data latencyNo ETLNo separate data warehousePerformance

Operational analytics with columnstore indexKey pointsB-tree indexCreate an updateable NCCI for analyticsqueriesDrop all other indexes that were createdfor analyticsDelta row groupsNo application changesColumnstore index is maintained just likeany other indexNonclustered columnstore index (NCCI)PerformanceQuery optimizer will choose columnstoreindex where needed

Operational analytics: columnstore on in-memory tablesHash indexNo explicit delta row groupRows (tail) not in columnstore stay in In-Memory OLTP tableNo columnstore index overhead when operating on tailnonclustered indexDeleted Rows TableIn-Memory OLTP tabletailBackground task migrates rows from tail to columnstore inchunks of 1 million rowsDeleted Rows Table (DRT) – Tracks deleted rowsColumnstore data fully resident in memoryPersisted together with operational dataNo application changes requiredColumnstore IndexPerformance

Enhanced AlwaysOn Availability GroupsGreater scalabilityUnified HA solutionLoad-balancing readable secondariesIncreased number of automatic failover targetsLog transport performanceAG ListenerSupports Clustered ColumnStoreAGImproved manageabilityHong KongAsynchronous data (Secondary)movementDTC supportAGAGNew York(Primary)AvailabilitySynchronous datamovementNew Jersey(Secondary)Database-level health monitoringGroup Managed Service AccountDomain-independent Availability Groups

Solution Pattern: Leveraging Columnstore AG for performance and scaleETLReportPrimarySecondaryAlwaysOn Availability GroupsSQL Server 2016ColumnstoreAlwaysOn Availability Groups:ETL loading data into primaryQueries / Cube processing against secondaryHA Read Scale OutCube ProcessingROLAPCubeReportReport New Hardware: 16 cores/768GB RAM SSD Database Size: Uncompressed: 12.5 TB Compressed using Columnstore: 4.5 TB Dramatic performance improvement

StretchDatabase

Ever-growing data, ever-shrinking ITMassive tables (hundreds ofmillions/billions of rows, TBs size)What to do?Users want/need to retain dataindefinitelyExpand server and storageCold data infrequently accessedbut must be onlineDatacenter consolidationMaintenance challengesBusiness SLAs at riskHybrid solutionsMove data elsewhereDelete

Stretch SQL Server into AzureSecurely stretch cold tables to Azure with remote query processingCapabilityAzureSQLSERVER2016Hybrid solutionsStretch large operational tablesfrom on-premises to Azure withthe ability to queryBenefits

Stretch Database architectureLocaldataEligibledataInternet boundaryLocaldatabaseLinked serversHow it worksCreates a secure linked server definitionin the on-premises SQL ServerRemoteendpointRemotedataAzureHybrid solutionsTargets remote endpoint with linkedserver definitionProvisions remote resources and beginsto migrate eligible data, if migrationis enabledQueries against tables run against bothlocal database and remote endpoint

Queries continue workingBusiness applications continueworking without disruptionDBA scripts and tools work asbefore (all controls still held in localSQL Server)Developers continue building orenhancing applications with existingtools and methodsHybrid solutions

Advanced security features supportedData in motion always via securechannels (TLS 1.1/1.2)Always Encrypted supported ifenabled by user (encryption keyremains on-premises)Row-Level Security already workingSQL Server and SQL Azure auditingalready workingHybrid solutions

Improvements in SQL Server 2016 that make it run fasterwithout enabling new features“It Just Works Faster”

Query Optimizer ImprovementsThere are two key changes1. SQL Server now leverages parallelism when sample statistics arecreated either explicitly or as part of autostats2. SQL Server now uses a sub linear threshold to triggerAUTO UPDATE STATISTICS computation to address the autostatistics updates especially for large tables. Until now, SQL Serverused a fixed % of number of changes (i.e. delete, insert or updates)to a column irrespective of the size of the table. With this change,stats computation will get triggered at much lower % for largertables.

Encryption enhancementsHardware accelerated encryption/decryption for TDEImplements next generation of Microsoft cryptographyTakes advantage of specialized microprocessor instructionsImproves performance as much as 3x to 10xParallelizable decryptionDecryption now supported as parallelizable (used to be sequential only)Dramatically improved response times for queries with encrypted data columnsScalability

Automatic TEMPDB Configuration Number of files will default to the lower of 8or number of logical cores as detected bysetupInitial size & AutogrowthTF 1117 is enabled by default for TEMPDBEnable Instant File Initializationif you specifya very large initial size or autogrowth valueSpecify multiple folders/drives to spread thedatafiles across several volumes. Each filewill be placed in a round-robin mannerFor Log File Autogrowth default value of64MB is provided to so that the number ofVirtual Log Files (VLFs) during initial creationis a small and manageable number and withappropriate size so that the unused logspace can be reclaimed easily

-T1117 and –T1118 changes for TEMPDB and userdatabasesTEMPDBOne of these changes is TEMPDB always assumes -T1117 and -T1118 behavior.-T1117 - When growing a data file grow all files at the same time so they remain the same size.Reducing allocation contention points.-T1118 - When doing allocations for user tables always allocate full extents. Reducing contention ofmixed extent allocationsIn summary, SQL Server 2016 no longer requires one to turn on TF 1117 or 1118.

Trace Flags 1117 and 1118: User Databases andTempDBUser DatabaseFor User Databases, trace flags 1117 and 1118 have been replaced with new extensions inALTER DATABASE commands. Use the ALTER DATABASE syntax to enable or disable thedesired trace flag behavior at a database level.-- Trace Flag 1118Trace flag 1118 for user databases is replaced by a new ALTER DATABASE setting MIXED PAGE ALLOCATION.Default value of the MIXED PAGE ALLOCATION is OFF meaning allocations in the databasewill use uniform extents.The setting is opposite in behavior of the trace flag (i.e. TF 1118 OFF andMIXED PAGE ALLOCATION ON provide the same behavior and vice-versa).Syntax:ALTER DATABASE dbname SET MIXED PAGE ALLOCATION { ON OFF }For more information see aspx

-T1117 and –T1118 changes for TEMPDB and userdatabasesExample:--Default value is OFF so all allocations in AdventureWorks will use uniform extents. To disable anduse mixed extents turn the setting to on.ALTER DATABASE AdventureWorks SET MIXED PAGE ALLOCATION ON;Catalog changes:A new column is mixed page allocation on is added to DMV sys.databases that shows whichallocation type (uniform or mixed) is being used. For more information 534.aspx

Instant File InitializationDatabase Instant File Initialization was added several SQLServer releases ago. The instant file initialization featurescales the creation and expansion (growth) of database,DATA files. The 'Manage Volume Privilege' option is offby default preventing many SQL Server installations fromtaking advantage of the feature.SQL Server 2016 Setup provides the option to enable'Perform Volume Maintenance Task' privilege to the SQLServer Service SID. This privilege enables instant fileinitialization by avoiding zeroing of data pages. Forsecurity and performance considerations see DatabaseInstant File Initialization topic.For Failover Cluster instance, each node will be configuredindividually for this option since the privilege belongs tolocal security policy. The option will show and can beenabled when adding each node.If you are installing SQL Server usingcommand line or a configuration file, setthe SQLSVCINSTANTFILEINIT parameterto True to enable instant file initializationfor SQL Server service account.

Core engine scalabilityDynamic partitioning of thread-safe memory objects bynon-uniform memory access (NUMA) node or by CPUEnables greater scalability of high-concurrency workloads running on NUMA hardwareDynamically promotes CMemThread to be partitioned by NUMA node or by CPU based onworkload characteristics and contention factorsEliminates need for trace flag, but also dynamically determines partition based on contentionScalability

SQL Server 2016 Runs Faster On Same Hardware A bold statement that any SQL Server professional can stand behind withconfidence. No application changes needed, just worksColumnstore 34X fasterAlwaysOn 7X fasterQuery execution time Batch vs RowThroughputMB/sAvg CPU%(secondary)SQL 20148217SQL 201654036900800Execution Time tch24.1x8Row1634.2x32

TPC-H queries with new cardinality estimation994,861ms/loopSQL Server 2016 (old CE)SQL Server 2016 (new CE)375,666ms/loop2.6xfasterTests running the 22 queries that make up the TPC-H benchmark test using a 3,000 warehouse workload(900 million rows in order line table). Used traceflag 9481 to force the old cardinality estimation algorithmand compared the results using the new cardinality estimation.Tests performed on Intel Xeon CPU E7-8890v3 @ 2.50GHz, 1.5 TB RAM, and Tegile Flash Storage Array.

DBCC CheckDBSQL Server 2014SQL Server 201612,880 ms1,676 ms7xfasterMachine32GB RAM, 4 Core Hyper-threadedenabled 2.8Ghz, SSD StorageSQL ServerOut of the box, default es-7x-better/

DBCC CheckDBInternally DBCC CHECK* uses a page scanning coordinator design (MultiObjectScanner).SQL Server 2016 changes the internal design (CheckScanner), applying no lock semantics and design similar tothose used with In-Memory Optimized (Hekaton) objects, allowing DBCC operations to scale far better thanprevious releases.The following chart shows the same 1TB database testing. MultiObjectScanner Older design CheckScanner New design

DBCC CHECKDB extended logical checksStarting with SQL Server 2016, additional checks on filtered indexes, persisted computed columns, and UDTcolumns will not be run by default to avoid the expensive expression evaluation. This will greatly reduce thetime to run CHECKDB on databases that have these objects.Physical consistency checks of these objects are still done . This means only whenEXTENDED LOGICAL CHECKS option is specified, the expression evaluation is performed. This is inaddition to other logical checks that are only performed (indexed view, XML indexes, and spatial indexes)when EXTENDED LOGICAL CHECKS option is specified.For filtered indexes, CHECKDB has also been improved to skip any data record that is not qualified forbeing indexed by target NC checks/

Spatial Line String query improvementsSQL Server 2014SQL Server 201626,600 ms87 ms271xfaster

Spatial Native Function query improvementsSQL Server 2014SQL Server 20162,830 ms144 ms19xfasterMachine32GB RAM, 4 Core Hyper-threadedenabled 2.8Ghz, SSD StorageSQL ServerOut of the box, default installation

Table valued parameters using spatial columnsSQL Server 2016SQL Server 2014120,000 rows/sec8000 s-with-spatialcolumns/

Table valued parameters using spatial columnsTable Valued Parameters (TVPs) can be used as input parameter(s) to storedprocedures. A problem with TVP parameters, containing spatial columns, limitsscalability. When a TVP parameter arrives at the SQL Server the rows are stored inTEMPDB. The problem caused the Spatial assembly to be reloaded as each spatial rowand column was processed.SQL Server 2016 corrects the scalability problem, using native spatial validation(s),increasing performance by 15 times or more.TVP Before theFix:8000rows/secTVP After r-tvps-with-spatial-columns/

Spatial Index Builds FasterIndex creation and tessellation can be intensive, spatial activities.Along with the native and TVP enhancements additional work tooptimize index creation and tessellation was completed.Testing reveals that building a spatial index on SQL Server 2016, withthe improved design, can be more than 2 times faster than SQL Server2012 or 2014 on the same data and hardware. It is common place forspatial tables to be 300 million or more rows. Reducing the indexbuild time by a factor of 2x or more greatly reduces the needmaintenance spatial-columns/

Automatic Soft NUMASoft NUMA can be used to divide a physical node into multiple logical nodes presenting a differentlayout to the entire SQL Server and adjusting the partitioning to optimize scalability andperformance. Microsoft recommends use of Soft NUMA on the newer, large CPU NUMA systemdeployments to increase performance.During startup, SQL Server 2016 interrogates the hardware layout and automatically configures SoftNUMA on systems reporting 8 or more CPUs per NUMA node. The partitioning triggers variousadjustments throughout the database engine improving scalability and performance. TheAutomatic Soft NUMA logic considers logical CPU ratios, total CPU counts and other factors,attempting to create soft, logical nodes containing 8 or fewer CPUs each.Your mileage may vary but, here is a testing results from the SQL Server 2016 test harness:"With HT aware auto soft-NUMA, we get up-to 30% gain in query performance when DOP is set tothe number of physical cores on a socket (12 in this case) using Automatic Soft NUMA."

Updated Scheduling AlgorithmsSQL Server 2016 gets a scalability boost from scheduling updates. Testing uncovered issues with the percentile scheduling based algorithms in SQL Server 2012 and 2014. A large, CPU quantumworker and a short, CPU quantum worker can receive unbalanced access to the scheduling resources.Take the following example. Worker 1 is a large, read query using read ahead and in-memory database pages and Worker 2 is doing shorter activities. Worker 1 finds information already inbuffer pool and does not have to yield for I/O operations. Worker 1 can consume its full CPU quantum.On the other hand, Worker 2 is performing operations that require it to yield. For discussion let's say Worker 2 yields at 1/20th of its CPU, quantum target. Taking resource governance and otheractivities out of the picture the scheduling pattern looks like the following.Worker 1 is getting 5 times more CPU cycles than Worker 2. In testing we found issues with various workloads and system tasks. If Worker 2 is the log writer it takes longer to harden logrecords, which holds locks, which can lead to blocking and throughput issues.SQL Server 2016 and Windows Azure SQL Database (WASD) monitors the quantum usage patterns allowing all workers to get fair treatment. The same pattern described above looks like thefollowing on SQL Server 2016. In this simplistic example Worker 2 is allowed to consume repeated quantum's preventing Worker 1 from monopolizing the scheduler in an unfriendly pattern.Note: The scheduler changes were deployed to Windows Azure SQL Server Database in March of 2014.

And many more 3/sql-2016-itjust-runs-faster-announcement/

Query StoreYour flight data recorderfor your database

Problems with query performanceWebsiteIs downTemporaryperf issuesDBupgradedPerformanceDatabaseis notworkingImpossibleto predict /root causeRegressioncaused bynew bitsFixing query plan choice regressions is difficult Query plan cache is not well-suited for performance troubleshootingLong time to detect the issue (TTD) Which query is slow? Why is it slow?What was the previous plan?Long time to mitigate (TTM) Can I modify the query?How to use plan guide?

The solution: Query StoreDedicated store for query workload performance dataCaptures the history of plans for each queryCaptures the performance of each plan over timePersists the data to disk (works across restarts, upgrades, and recompiles)Significantly reduces TTD/TTMFind regressions and other issues in secondsAllows you to force previous plans from historyDBA is now in controlPerformance

Query data storeCollects query texts (plus all relevant properties)Stores all plan choices and performance metricsCompileExecutePlan storeRuntimestatsWorks across restarts / upgrades / recompilesQueryStoreschemaDramatically lowers the bar for performancetroubleshootingNew ViewsIntuitive and easy plan forcingDurability latency controlled by DB optionDATA FLUSH INTERNAL SECONDSPerformance

Query Store write architectureCompileQuery text and planQuery andPlan StoreasyncExecuteQuery ExecutionPerformanceQuery exec. statsRuntime statsstoreQuery StoreInternaltables

Query Store read architectureViews merge in-memory and on-disk contentUsers always see ‘latest’ dataCompileQuery text and planQuery Store viewsQuery andPlan StoreasyncExecuteQuery ExecutionPerformanceQuery exec. statsRuntime statsstoreQuery StoreInternaltables

Query Store schema explainedInternal tables1-nQuery textQueryContextsettingsOne row per query text perplan affecting option(example: ANSI NULLS on/off)PerformanceExposed views1-nPlanRuntimestatsCompile stats:query store query textquery context settingsquery store queryquery store planRuntimestatsintervalOne row per plan(for each query)sys.One row per planper time interval(example: 5 min)Runtime stats:query store runtime stats intervalquery store runtime stats

Keeping stability while upgrading to SQL Sever 2016SQL Server 2016QO enhancements tied to database compatibility levelInstall bitsKeepexistingcompat.levelPerformanceRun QueryStore(create abaseline)Move tovNextCompatLevelFixregressionswith planforcing

Monitoring performance by using the Query StoreThe Query Storefeature provides DBAswith insight on queryplan choice andperformancePerformance

/* (1)(6) Performanceanalysisusing Query Store views*/Turn ON QueryStore */SELECT q.query id, qt.query text id, qt.query sql text,SUM(rs.count executions)AS total execution countALTER DATABASE MyDB SET QUERY STORE ON;FROMsys.query store query textJOINparameters *//* (2) Review current QueryqtStoresys.query store queryq ON qt.query text id SELECT * FROM sys.database query store optionsq.query text id JOINsys.query store planp ONvaluesq.query id p.query id JOIN/* (3) Set new parameter*/sys.query store runtime statsrs ON p.plan id rs.plan idALTER DATABASE MyDBGROUPBY q.query id,qt.query text id, qt.query sql textSET QUERY STORE(ORDERBY total execution countOPERATION MODE READ WRITE, DESCCLEANUP POLICY (/* (7)Force plan for a given querySTALE QUERY THRESHOLD DAYS 30 */exec), sp query store force plan12DATA FLUSH INTERVAL SECONDS/*@query id*/, 14 /*@plan id*/ 3000,MAX SIZE MB 500,INTERVAL LENGTH MINUTES 15);/* (4) Clear all Query Store data */ALTER DATABASE MyDB SET QUERY STORE CLEAR;/* (5) Turn OFF Query Store */ALTER DATABASE MyDB SET QUERY STORE OFF;PerformanceDB-level feature exposedthrough T-SQL extensionsALTER DATABASECatalog views (settings, compile, and runtime stats)Stored Procs (plan forcing, query/plan/stats cleanup)

Live query statisticsView CPU/memory usage, execution time, queryprogress, and moreEnables rapid identification of potentialbottlenecks for troubleshooting queryperformance issuesAllows drill down to live operator level statistics:Number of generated rowsElapsed timeOperator progressLive warningsPerformance

Summary: Query StoreCapabilityQuery Store helps customers quickly find and fix query performance issuesQuery Store is a ‘flight data recorder’ for database workloadsBenefitsGreatly simplifies query performance troubleshootingProvides performance stability across SQL Server upgradesAllows deeper insight into workload performancePerformance

app performance Performance gains just by upgrading Record-breaking performance PROS 1000% faster scoring KPMG 250% faster execution Global ERP 700% faster queries Tableau 190% faster queries Leader in OLTP price/performance Leader in DW performance & price/performance #1 in TPC-E price/perf (44 cores) Windows Server 2012 R2 #2 in TPC-E price .