Data Engineering Interview Series : SQL – Commands and Order of Execution

Data Engineering Interview Series : SQL – Commands and Order of Execution

Written by Anju Mercian

Python

Read here for Part 1, where I go over SQL Fundamentals and Concepts. These articles are meant to accompany my recent Data Engineer Interview series on Women Who Code YouTube channel- Python playlist.

The industry standard programming language used for querying is Structured Query Language (SQL). SQL has now become a very powerful tool in the day to day life of a data engineer.

Thus, for a data engineer knowing SQL thoroughly becomes imperative. Hence, any interview for a data engineer or data science or a data analyst position will include probing questions on your knowledge of SQL. 

In this series, I will try to provide a cheatsheet not only for data interviews on SQL but also for your everyday work. The flow of the series will go over the SQL commands and interject theory with few questions and solutions. For practice, I will also link the question so if you would like to try out the problem and warm up your SQL knowledge before looking at the solution.

Note: The SQL commands I share are the ones that have helped me in my interview process and my day to day life as a data engineer. 

SQL Commands

To create the database

  • CREATE DATABASE <data base name>;

To drop database

  • DROP DATABASE <data base name>;

Basic DDL and DML commands

  • CREATE TABLE: Used to create a new table in a database
  • CREATE TABLE table_name(

 col1 datatype,

 col2 datatype,

 col3 datatype,

);

The column parameter specifies the column name and the datatype parameter specifies the datatype like varchar, integer etc.

  • ALTER TABLE: To add, delete or modify columns in an existing table.
  • ALTER TABLE table_name
  • ADD col_name data type
  • INSERT INTO TABLE: To insert new records.
  • INSERT INTO table_name (column1, column2, column3, …)
  • VALUES (value1, value2, value3, …);

SQL Commands

Order of Execution

Also known as Order of Operations – the steps you take in order to accomplish your goal. 

  • Picking the right SQL order of operations is important if you want to run efficient, effective queries. 
  • The SQL order of execution defines the order in which the clauses of a query are evaluated. 

Some of the most common query challenges people run into could be easily avoided with a clear understanding of the SQL order of execution, sometimes called the SQL order of operations. Understanding SQL query order can help diagnose why a query won’t run, and even more frequently will help you optimize your queries to run faster.

The SQL order of operations refers to the sequence in which different clauses of a SQL query are executed. Each clause has a specific purpose and performs a specific task, and the order in which these clauses are executed can affect the final result of the query.

Order of operations

The order of operations is as such:

  • The table in the From is first scanned

  • Next if there is a join then ON is scanned 

  • Then JOIN of the tables on the column specified happens

  • Next step the filters are scanned and executed with the Where 

  • Then Group BY is executed 

  • After which the Having clause will be executed.

  • Only after that, the columns are selected using Select 

  • Then ends execution with Distinct, order by, limit clauses.

Note: if you are going to use columns in a way that prevents pre-filtering, the database will have to sort and join both full tables. 

SQL Question Time

Link

Solution:

with rank_company as(

select company, profits,

rank() OVER (order by profits desc) as ranking

from forbes_global_2010_2014

)

select company, profits from rank_company

where ranking < 4

order by profits desc