How to rank rows in SQL?

Mari Galdina
4 min readJan 31, 2021

How to choose the right answer for the SQL interview question?

Every time you come into a data science interview, you can hear questions related to databases. Usually, most workflows involve quick slicing and dicing of data. Companies ask to write basic queries which have relevant in real-life settings.

What are the main ideas for questions about databases on data science interviews?

  1. First of all, the interviewer wants to know how you understand the main concepts of working with a given database. Could you have any idea how to take the necessary data?
  2. Then they check how you understand the data. Do you know how different types of data works?
  3. Last, language features. Do you know special functions for better and fast results?

I want to go through an interview question where we can see all three items above.

Question

You have the revenue transactions table, write a query that finds the third purchase of every user.

How do you should start to answer?

  1. Ask about the structure of the revenue transactions table. How many columns it has and the name of columns you should show in the answer. Don’t work with the table you imagine. You should know what defines a purchase.
  2. Listen carefully about the types of data for each column and the values they can take. It is useful to know because sometimes datatype makes decisions about what functions we can use.
  3. Add some values into this table for a better understanding of how this table works.
After few questions, we can draw a scheme for our answer.

In few words, we should answer three questions: what, how, where.

  • What? After all clarifying questions, we know how many columns represent purchases for the customer.
  • How? We know about values and datatypes for each column. Are they enough for creating a solution? Or we need to transform the table with sub-queries.
  • Where? Do we have any conditions and restrictions for selecting the right data?

Now we ready to create a solution. Here we have one tricky question. How to found each third purchase? How can we count rows for each customer?

Ranking functions came to save us! We can use a set of window functions where rows from the result table have ranked according to a certain scheme. There are three ranking functions:

  • ROW_NUMBER()
  • RANK()
  • DENSE_RANK()

But we should remember that ranking functions are non-deterministic. What does that mean? A non-deterministic function may return different results every time it is called, even when the same input values are provided.

Let’s see the difference between these three function for our table:

SELECT id, user_id, created_at, product_id, quantity, 
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at ASC),
RANK() OVER (PARTITION BY user_id ORDER BY created_at ASC),
DENSE_RANK() OVER (PARTITION BY user_id ORDER BY created_at ASC)
FROM transaction
ORDER BY user_id

How we can see:

  • row_number — assigns unique numbers to each row within the PARTITION given the ORDER BY clause.
  • rank — behaves like ROW_NUMBER(), except that equal rows are ranked the same.
  • dense_rank — Trivially, is a rank with no gaps.

And RANK() function return the similar values for difference id purchase with the same date. Therefore it is important to know how accurately time is recorded in this example, adding hours and minutes to the time format can help to identify range correctly.

For the first step to creating a query, we should select all rows user_id, created_at, product_id, quantity and add the range for each. It can be a subquery that returns a table with the additional rank_value column.

SELECT user_id, created_at, product_id, quantity
FROM (SELECT user_id,
created_at,
product_id,
quantity,
ROW_NUMBER() OVER (PARTITION BY user_id
ORDER BY created_at ASC) AS rank_value
FROM transactions) AS t
WHERE rank_value = 3;

In real-life situations, we need detailed information about types of data and how this data record in the database.

Summary

It’s long to talk about SQL queries, but very fast to do it! In this blog, I wanted to show you how one little nuance can work for real-life solutions.

--

--