5 SQL GROUP BY Mistakes That Make You Look Like a Beginner (And How to Fix Them)

type

status

date

slug

summary

Mistake #1: Selecting Columns That Aren't Grouped

This is the #1 mistake that trips up SQL beginners. You write what seems like a logical query, and you're met with an error message that makes no sense.

The error looks like this:

Why it happens: When you're thinking about your data, this seems intuitive: "Show me departments, employee names, and how many people are in each department." Your brain processes this as three separate pieces of information you want to see.

But here's what SQL sees: You're asking it to collapse multiple rows into one per department. The Sales department might have 10 employees—John, Sarah, Mike, etc. Which first_name should SQL show in that single row? It can't pick one randomly.

The core rule you need to memorize:

Every column in your SELECT statement must be either:

In the GROUP BY clause, OR

Inside an aggregate function (COUNT, SUM, AVG, MIN, MAX)

The correct version:

If you truly need individual names alongside counts, you probably don't want GROUP BY at all—you want a window function (a topic for another post) or a JOIN with a grouped subquery.

When you'll see this: Most commonly when you're trying to show detail rows alongside summary statistics. Remember: GROUP BY collapses rows. If you want details, don't collapse.

Mistake #2: Using WHERE Instead of HAVING

This mistake is so common because WHERE is usually the first filtering keyword you learn. It feels natural to reach for it every time you need to filter something.

The error:

Why it happens: WHERE is your go-to filtering tool, so when you want "departments with more than 5 employees," your instinct is to use WHERE.

But here's the critical difference that explains everything:

WHERE filters individual rows before grouping happens

HAVING filters groups after aggregation is complete

Think about it this way: You can't filter by COUNT(*) before you've actually counted anything. SQL needs to group all the employees by department first, count them, and then filter out departments with 5 or fewer people.

The correct version:

You can use both in the same query:

In this example, WHERE first throws out anyone hired before 2020. Then GROUP BY groups the remaining employees. Then HAVING throws out departments where the average salary isn't above $60,000.

Memory trick: WHERE = "Which rows?" | HAVING = "How many/much in each group?"

Mistake #3: Forgetting That COUNT(*) and COUNT(column) Are Different

I'll be honest—I used COUNT(*) and COUNT(column_name) interchangeably for months before I realized they return different results.

The confusing moment:

Why it happens: COUNT() feels like it should always count everything. The subtle difference between COUNT(*) and COUNT(column) seems like unnecessary complexity—until it bites you.

The critical difference:

COUNT(*) counts all rows (including rows where some columns are NULL)

COUNT(column) counts only non-NULL values in that specific column

COUNT(DISTINCT column) counts unique non-NULL values

Real example:

Customer #2 didn't provide an email. COUNT(*) includes them because the row exists. COUNT(email) skips them because the email column is NULL.

When this matters:

Data quality analysis: "How many customers provided phone numbers?" Use COUNT(phone_number).

Avoiding overcounting: "How many unique products were sold?" Use COUNT(DISTINCT product_id).

Accuracy: "Total customers" (use COUNT(*)) vs "Customers with email addresses" (use COUNT(email)).

The fix: Be intentional about which COUNT you need. Ask yourself: "Do NULL values matter for what I'm counting?"

Mistake #4: Not Handling NULLs in Aggregate Functions

This mistake is sneaky because your query runs successfully—it just gives you the wrong answer.

The silent error:

Why it happens: You assumed NULL values would be treated as 0 in the calculation. They're not. They're ignored completely.

The surprise behavior:

NULL values are excluded from both the sum and the count. So if NULL means "no discount applied" (essentially 0), your average is wrong. You're calculating "average discount when a discount was given" not "average discount per order."

The fix—Option 1 (treat NULL as 0):

The fix—Option 2 (show both perspectives):

All aggregate functions ignore NULLs:

SUM(column) - NULLs excluded from total

MIN(column), MAX(column) - NULLs ignored

COUNT(column) - NULLs not counted

The only exception is COUNT(*), which counts all rows regardless of NULL values.

When you'll see this: Financial calculations, optional form fields, data with missing values, conversion rate calculations.

Mistake #5: Repeating Complex Expressions in GROUP BY

This mistake doesn't cause errors—it just makes your code ugly, hard to read, and prone to typos.

The awkward query:

Why it happens: You expect column aliases to work in GROUP BY like they do in ORDER BY. Unfortunately, in standard SQL, they don't.

The reality: You must repeat the entire expression because GROUP BY is evaluated before SELECT (where aliases are created). SQL doesn't know what age_group means yet.

Better approach—Use a subquery:

Now the expression is written once in the subquery, and the outer query uses the clean alias.

Note: PostgreSQL and some other databases allow GROUP BY 1 (positional) or GROUP BY age_group (alias), but this isn't standard SQL. If you're writing queries that might run on different databases, use the subquery approach.

Bonus: Why These Mistakes Happen (The Execution Order Secret)

Most GROUP BY mistakes come from one misunderstanding: SQL doesn't execute your query in the order you write it.

You write:

SQL executes:

This execution order explains everything:

Why WHERE can't use aggregates: Aggregates don't exist yet at step 2.

Why you can't use SELECT aliases in GROUP BY: SELECT happens after GROUP BY.

Why HAVING can filter on aggregates: Aggregates are calculated during GROUP BY at step 3.

Once you internalize this order, GROUP BY queries become intuitive.

Memory trick: "Fred Where Group Having Select Order Limit"

Conclusion

The five mistakes that make you look like a SQL beginner:

Selecting non-grouped columns without aggregating them

Using WHERE to filter aggregates instead of HAVING

Confusing COUNT(*) and COUNT(column)

Ignoring NULL behavior in aggregate functions

Repeating complex expressions instead of using subqueries

Here's the insight that ties everything together: Most GROUP BY errors come from not understanding SQL's execution order. Once you grasp that WHERE filters before grouping and HAVING filters after—and that SELECT comes last—everything clicks into place.

Your action step: Next time you write a GROUP BY query, pause before running it and ask yourself: "Is every column in my SELECT either in my GROUP BY or inside an aggregate function?" That one question will prevent 80% of GROUP BY errors.

And remember: these mistakes are completely normal. Every SQL developer has made them. The difference between a beginner and a professional isn't avoiding mistakes—it's recognizing and fixing them in 30 seconds instead of 30 minutes.

Practice Challenge

Before you go, try fixing this query that contains multiple mistakes:

Answer: Three mistakes!

first_name isn't in GROUP BY or inside an aggregate function

WHERE uses an aggregate (should be HAVING)

AVG(bonus) ignores NULLs—if some employees have NULL bonuses, decide whether that's intended

The corrected query:

Now you're writing GROUP BY queries like a pro. Go forth and aggregate with confidence!