DEV Community

Andy Mallon for AM2.co

Posted on • Originally published at am2.co on

How to create date buckets in T-SQL



I was fiddling around with some data, looking at making a quick aging report. I probably should have reached for Power BI Desktop, but sometimes I can be stubborn, and I wanted to get my T-SQL to work–if nothing else to figure out how to make it work.

My Data

My data was essentially a really long list of widgets, with dates–really simple & straightforward:

Widget Name Widget Date
Widget A 2019-06-19
Widget B 2019-07-14
Widget C 2018-11-04
Widget D 2015-11-30

My reporting requirements

I wanted to count widgets based on how old they were. The data spanned several years, so I wanted to be able to put them into 3 or 6 month buckets. Essentially, I wanted to use N-month sized buckets to make my report to look something like this:

Widget Age Widget Count
0 – 6 Months Old 8752
6 – 12 Months Old 9836
12 – 18 Months old 7852
60 – 66 Months old 5872

Let’s write the report

First, we need some test data we can fool around with. I’m going to create a temp table with one row per day, using the Numbers table from my DBA Database. For this test data, that will make the numbers per bucket pretty similar, but that makes for easy validation!

SELECT  WidgetName = 'Widget ' + LEFT(NEWID(),3),
        WidgetDate = CONVERT(DATE,DATEADD(DAY,-1*[Number],GETDATE()))
INTO #Widgets
FROM DBA.dbo.Numbers
WHERE [Number] < 75000;

Now, it’s pretty easy to calculate the age of these rows–we just use DATEDIFF to compute the number of days/weeks/months/years between the WidgetDate and today. But that doesn’t help me with my report. I could do a GROUP BY MonthsAgo to count per month… but I really wanted to put these into N-month buckets.

SELECT  WidgetDate,
        DaysOld     = DATEDIFF(DAY,WidgetDate,GETDATE()),
        MonthsOld   = DATEDIFF(MONTH,WidgetDate,GETDATE())
FROM #Widgets;

Next, I want to put those “Months Ago” numbers into buckets–and I want to be able to easily adjust the size of those buckets as I evaluate the data & work on how I want to present it. Each bucket has a “bottom end” and “top end” value–referred to as a floor & ceiling.

Here’s where my math nerd-iness comes out… There’s a little trick for “bucketizing” numbers (in this case, turning “Months” into “Month Buckets”):

  • Take a number
  • Divide it by your bucket size
  • Round that number down to a whole number–We’ll call this the “divided number”
  • Multiply the “divided number” by the bucket size–This is your bucket floor
  • Add one to the “divided number” and then multiply that by your bucket size–This is your bucket ceiling

Converting that into T-SQL, we can even save a step: Our MonthsOld column is always a whole number (ie, an int). If our bucket size is also an int, then dividing the two will always result in an int, which has been rounded down–this saves me from having to manually do any rounding! (Though, I could totally throw in a call to the FLOOR() function if I wanted to be explicit.)

Now I can convert that into T-SQL and continue to add on to our last query, and spot-check the results:

DECLARE @BucketSize int = 6;

SELECT  WidgetDate,
        DaysOld            = DATEDIFF(DAY,WidgetDate,GETDATE()),
        MonthsOld          = DATEDIFF(MONTH,WidgetDate,GETDATE()),
        MonthBucketFloor   = DATEDIFF(MONTH,WidgetDate,GETDATE()) / @BucketSize * @BucketSize,
        MonthBucketCeiling = (DATEDIFF(MONTH,WidgetDate,GETDATE()) / @BucketSize + 1) * @BucketSize
FROM #Widgets;

I know there’s someone else out there shouting “No, Andy! That formula is too complicated! There’s another way to do that with fewer steps!” Yes, there is another way. This option might be easier to understand why it works, but I find the code harder to read.

  • Take a number
  • Perform a modulo with the bucket size as the divisor to get the remainder
  • Subtract the remainder from the original number–This is your bucket floor
  • Take your bucket floor and add your bucket size–This is your bucket ceiling

OK, so technically, that’s only 4 steps. But I don’t think that makes it any simpler. Here’s what this alternate version looks like when you convert it into T-SQL:

--Alternate to my prior query
DECLARE @BucketSize int = 6;

SELECT  WidgetDate,
        DaysOld            = DATEDIFF(DAY,WidgetDate,GETDATE()),
        MonthsOld          = DATEDIFF(MONTH,WidgetDate,GETDATE()),
        MonthBucketFloor   = DATEDIFF(MONTH,WidgetDate,GETDATE()) - (DATEDIFF(MONTH,WidgetDate,GETDATE()) % @BucketSize),
        MonthBucketCeiling = DATEDIFF(MONTH,WidgetDate,GETDATE()) - (DATEDIFF(MONTH,WidgetDate,GETDATE()) % @BucketSize) +  @BucketSize
FROM #Widgets;

I’m not going to use this version, but know that it’s 100% logically equivalent to the prior version. Maybe some of my fellow math nerds will slide into the comments with the mathematical proof that these two queries are logically equivalent.

My last step is simply to do some counts grouped by the buckets–and to convert the floor/ceiling numbers into a pretty label:

DECLARE @BucketSize int = 6;

WITH WidgetsWithAge AS (
    SELECT  WidgetDate,
            DaysOld            = DATEDIFF(DAY,WidgetDate,GETDATE()),
            MonthsOld          = DATEDIFF(MONTH,WidgetDate,GETDATE()),
            MonthBucketFloor   = FLOOR(DATEDIFF(MONTH,WidgetDate,GETDATE())/@BucketSize) * @BucketSize,
            MonthBucketCeiling = (FLOOR(DATEDIFF(MONTH,WidgetDate,GETDATE())/@BucketSize) + 1) * @BucketSize
    FROM #Widgets
)
SELECT  WidgetAge   = CONVERT(varchar(100), MonthBucketFloor) + ' - ' + CONVERT(varchar(100), MonthBucketCeiling) + ' Months Old',
        WidgetCount = COUNT(*)
FROM WidgetsWithAge
GROUP BY MonthBucketFloor, MonthBucketCeiling;

Boom! That’s it. That’s my report. And since I sneaked in that @BucketSize as a parameter, it’s pretty easy for me to adjust my bucket size for different data. I can easily switch that to a 36-month bucket with the flick of a keystroke:

DECLARE @BucketSize int = 36;

WITH WidgetsWithAge AS (
    SELECT  WidgetDate,
            DaysOld            = DATEDIFF(DAY,WidgetDate,GETDATE()),
            MonthsOld          = DATEDIFF(MONTH,WidgetDate,GETDATE()),
            MonthBucketFloor   = FLOOR(DATEDIFF(MONTH,WidgetDate,GETDATE())/@BucketSize) * @BucketSize,
            MonthBucketCeiling = (FLOOR(DATEDIFF(MONTH,WidgetDate,GETDATE())/@BucketSize) + 1) * @BucketSize
    FROM #Widgets
)
SELECT  WidgetAge   = CONVERT(varchar(100), MonthBucketFloor) + ' - ' + CONVERT(varchar(100), MonthBucketCeiling) + ' Months Old',
        WidgetCount = COUNT(*)
FROM WidgetsWithAge
GROUP BY MonthBucketFloor, MonthBucketCeiling;

The post How to create date buckets in T-SQL appeared first on Andy M Mallon - AM².

Top comments (0)