Window functions have many uses and it's easy to get confused when you start working with them. That's why I'm going to explain the difference between Row_Number(), Rank(), Dense_Rank() and NTILE() to you now.
Window Functions
Window functions work on a series of rows and return a unique aggregate value in every row. The term Window refers to a set of rows which will be used for the function in the database.
There are these types of window functions:
- Ranking Window Functions:
DENSE_RANK()
,NTILE(),
RANK()
,ROW_NUMBER()
- Value Window Functions:
LEAD()
,LAG()
,LAST_VALUE()
,FIRST_VALUE()
- Aggregate Window Functions:
AVG()
,SUM()
,COUNT()
,MAX()
,MIN()
And those Window Functions are Supported in the Following SQL Dialects:
- SQream DB
- SQL Anywhere
- PostgreSQL
- Oracle
- MonetDB
- Microsoft SQL Server
- SQLite
- MariaDB
- Linter SQL RDBMS
- Informix Dynamic Server
- H2
- Firebird
- EXASolution
- DB2
- CUBRID
- ClustrixDB
- MySQL
ROW_NUMBER()
The name is self-explanatory. This function simply returns the unique row number of each record starting with 1.
Let's see how ROW_NUMBER()
works:
SELECT Value, ROW_NUMBER() OVER() FROM Table
Remember that some of the SQL dialects require an explicit ORDER BY
in the OVER()
clause:
SELECT Value, ROW_NUMBER() OVER(ORDER BY Value) FROM Table
Value | ROW_NUMBER |
a | 1 |
a | 2 |
b | 3 |
b | 4 |
c | 5 |
c | 6 |
c | 7 |
d | 8 |
e | 9 |
e | 10 |
RANK()
RANK()
almost behaves like ROW_NUMBER()
, except that it is used to give each record a unique rank based on the specified value.
When two records are of the same value, the RANK()
function will assign the same rank to both records by ignoring the very next rank.
So let's look at an example where we replace ROW_NUMBER()
with RANK()
in our original query:
SELECT Value, RANK() OVER(ORDER BY Value)FROM Table
Value | RANK |
a | 1 |
a | 1 |
b | 3 |
b | 3 |
c | 5 |
c | 5 |
c | 5 |
d | 8 |
e | 9 |
e | 9 |
Yet as you can see, there are gaps between the different ranks. By using DENSE_RANK()
, we can avoid those gaps.
DENSE_RANK()
The DENSE RANK()
function is similar to the RANK()
function except that it is "dense" and does not skip any rank. This means that if there are two equal records, DENSE_RANK()
will assign the same rank to both records but will not skip the next rank.
SELECT Value, DENSE_RANK() OVER(ORDER BY Value) FROM Table
Value | DENSE_RANK |
a | 1 |
a | 1 |
b | 2 |
b | 2 |
c | 3 |
c | 3 |
c | 3 |
d | 4 |
e | 5 |
e | 5 |
As you can obviously see, the very same rank is given to two equal records and the next rank number is given to its next record without skipping the rank value.
NTILE()
The NTILE()
Function disperses the rows into a specified number of groups. But if the amount of rows is not divisible by the amount of groups, you can get groups with different sizes. Also, don't forget that larger groups will be shown before smaller groups.
So let's look at an example where we want to divide a Table into two.
SELECT Value, NTILE(2) OVER(ORDER BY Value) FROM Table
Value | NTILE |
a | 1 |
a | 1 |
b | 1 |
b | 1 |
c | 1 |
c | 2 |
c | 2 |
d | 2 |
e | 2 |
e | 2 |
But what happens when we want to divide the Table into three?
SELECT Value, NTILE(3) OVER(ORDER BY Value) FROM Table
Value | NTILE |
a | 1 |
a | 1 |
b | 1 |
b | 1 |
c | 2 |
c | 2 |
c | 2 |
d | 3 |
e | 3 |
e | 3 |
As you can see we got one group which has four values and another two groups that got three Values because 10 Rows is not divisible by 3.