SQL Friday #17: Jeff Moden on “The ‘Secret’ to High Performance Code – High Performance Test Data”

Online Microsoft Data Platform Lunch Sessions

SQL Friday #17: Jeff Moden on “The ‘Secret’ to High Performance Code – High Performance Test Data”

2020-08-17 Past SQL Friday Sessions 0
Date:2020-09-25
Time:12:00 CEST
How to attend:Already finished
Accompanying material:Download zip-file from here.
If you can’t attend this one but want to attend future sessions, join the Meetup group and be the first to know when registration comes online: https://www.meetup.com/SQL-Friday

Jeff Moden on “The ‘Secret’ to High Performance Code – High Performance Test Data”

“A Developer must not guess… a Developer must KNOW!” — Sergiy Floka – circa June 2007

The only way to know for sure if the code that we’ve written or the index that we’ve built will perform and scale well is to test it against large amounts of data. On new projects in particular, such quantities of data aren’t always available. One way to build such data is with commercial and freeware data generators.

But what if we don’t have such a tool, can’t afford it, or simply aren’t allowed to install such third party tools even on our Dev servers?

This presentation explains and demonstrates how to quickly and easily build millions of rows of either assigned or constrained random test data of various datatypes. And it’s all “nasty fast” because we won’t use even a lick of RBAR.

We’ll learn what a “Pseudo Cursor” is and how to use it (you actually use them every day and might not know it). We’ll learn some basic formulas that can be expanded upon in so many different ways. We’ll learn how to generate and control the domains of large quantities of random integers and floating point numbers (the foundations of many types of test data) in a very high speed fashion and then how to convert those to other types of test data such as dates, currency amounts, alphanumeric department numbers, and even random-length simulated names.

Interestingly enough, the techniques learned can be used for other things. For example, we’ll also solve the classic problem of having several tens of thousands of rows of start and end dates and needing to generate 1 row for every date defined by the start and end dates for every row.

Last but not least, we’ll learn a bit about some awesome tricks with minimal logging to make our repeat testing/retesting go even faster including doing what everyone says is impossible… having data follow the Fill Factor as we’re inserting it into a table with a clustered index already in place so that you don’t have to do it separately after the initial data is inserted.

We’ll also see the huge amount of trouble so many have gotten themselves into when they make “Grooved” test data.

About Jeff Moden

Jeff Moden is the fellow that coined the term “RBAR”, which is pronounced as “ree-bar” (like the steel rods stuck in cement and not going anywhere fast) and stands for “Row By Agonizing Row”.

Jeff is a retired SQL Server MVP veteran of 9 years, winner of the 2011 RedGate Exceptional DBA award, has written some very high impact articles on SQLServerCentral.com, and has presented at the PASS Summit and many PASS Chapters and SQLSaturdays.

His passions are the “Black Arts” of high performance SQL code and teaching about it.