Aggregating data is the process of grouping data based on certain criteria and performing calculations on those groups. SQL provides a number of built-in aggregate functions that can be used to perform calculations on groups of data. Some of the most common aggregate functions are:
- COUNT(): counts the number of rows in a group.
- SUM(): calculates the sum of a column in a group.
- AVG(): calculates the average of a column in a group.
- MAX(): returns the maximum value in a column in a group.
- MIN(): returns the minimum value in a column in a group.
Here’s an example that shows how to use the COUNT() function to count the number of employees in each department:
SELECT department, COUNT(*) as count
FROM employees
GROUP BY department;
This will produce a result set that shows the department name and the number of employees in that department:
+------------+-------+
| department | count |
+------------+-------+
| HR | 3 |
| IT | 5 |
| Sales | 4 |
+------------+-------+
Note that we used the GROUP BY clause to group the data by department, and we used the COUNT() function to count the number of rows in each group. We also used the alias count to rename the result column.
You can also use aggregate functions in combination with other clauses, such as WHERE and HAVING, to filter the data that’s being aggregated. For example:
SELECT department, AVG(salary) as avg_salary
FROM employees
WHERE gender = 'F'
GROUP BY department
HAVING AVG(salary) > 50000;
This will produce a result set that shows the department name and the average salary for female employees in that department, but only for departments where the average salary is greater than 50000:
+------------+------------+
| department | avg_salary |
+------------+------------+
| IT | 60000.00 |
| Sales | 55000.00 |
+------------+------------+
Note that we used the WHERE clause to filter the data so that we only get female employees, and we used the HAVING clause to filter the results based on the average salary of each department.
Group By
The GROUP BY clause is used to group the result set based on one or more columns. It is often used with an aggregate function to group the result set by one or more columns and apply the aggregate function to each group. The syntax for GROUP BY is as follows:
SELECT column1, column2, ..., aggregate_function(column)
FROM table_name
WHERE condition
GROUP BY column1, column2, ...;
In this syntax, column1, column2, ... are the columns that you want to group the result set by, and aggregate_function(column) is the aggregate function that you want to apply to each group.
For example, suppose we have a table named orders with the following data:
+---------+------------+-------+
| order_id| order_date | total |
+---------+------------+-------+
| 1 | 2022-02-01 | 100 |
| 2 | 2022-02-02 | 200 |
| 3 | 2022-02-02 | 150 |
| 4 | 2022-02-03 | 300 |
| 5 | 2022-02-03 | 250 |
+---------+------------+-------+
To find the total sales for each day, we can use the GROUP BY clause as follows:
SELECT order_date, SUM(total) AS total_sales
FROM orders
GROUP BY order_date;
The result set will be:
+------------+-------------+
| order_date | total_sales |
+------------+-------------+
| 2022-02-01 | 100 |
| 2022-02-02 | 350 |
| 2022-02-03 | 550 |
+------------+-------------+
In this example, we have grouped the result set by order_date column and applied the SUM function to the total column for each group.
Note that when using GROUP BY, the columns in the SELECT statement must either be in the GROUP BY clause or be used with an aggregate function. Otherwise, the query will result in an error.
HAVING
The HAVING clause is used in conjunction with the GROUP BY clause to filter the result set based on a condition that applies to the groups.
The syntax for the HAVING clause is as follows:
SELECT column1, column2, ...
FROM table
GROUP BY column1, column2, ...
HAVING condition;
The HAVING clause is similar to the WHERE clause, but it operates on groups rather than individual rows. The condition specified in the HAVING clause is applied to each group, and only the groups that meet the condition are included in the result set.
Here’s an example:
Suppose you have a table orders that contains information about customer orders, and you want to find the total number of orders for each customer who has placed more than 10 orders. You could use the following query:
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 10;
This query groups the orders by customer_id, counts the number of orders for each customer, and then filters the result set to only include customers who have placed more than 10 orders. The result set will contain two columns: customer_id and num_orders.
Note that the HAVING clause must come after the GROUP BY clause in the query. Also, any column that is not an aggregate function (e.g. COUNT, SUM) must be included in the GROUP BY clause.
Exercise:
Write SELECT statements with aggregate functions and GROUP BY clause to perform calculations on data
Use the HAVING clause to filter groups based on aggregate values