I’m enjoying the book and I’m happy to see that the discussion of GROUP BY in chapter 15 also includes a callout box titled “GROUP BY and DISTINCT”.
I’d like to suggest that the callout box nudge readers in the direction of DISTINCT over GROUP BY for the situation it describes.
The callout box compares two different queries:
SELECT DISTINCT date_reported, reported_by FROM Bugs;
SELECT date_reported, reported_by FROM Bugs GROUP BY date_reported, reported_by;
The callout box concludes with:
Both queries produce the same result and should be optimized and executed similarly, so the difference in this example is only a matter of preference.
Comparing the two queries is indeed valuable, but I suggest adding a bit more language, like this:
Both queries produce the same result and should be optimized and executed similarly, so the difference in this example is only a matter of preference. With all else being equal, using DISTINCT has the advantage of communicating the intention more clearly.
My experience might be skewed, but I immediately know what I’m looking at when I see
On the other hand, encountering a
GROUP BY without any aggregation would make me wonder if I just found a bug.
This small way of increasing code clarity seems like another opportunity to avoid pitfalls.