How to use Python and SQL together.

5 min readJan 24, 2021

Python is an interpreted and high-level programming language with advantages. How we know, open-source libraries with variants amount of tools make it a useful tool for Data Scientist, Data Engineers, and Data Analysts. Python supports various databases like SQLite, MySQL, Oracle, MongoDB, PostgreSQL, etc. A lot of enterprise businesses stored their data in a relational database. Relation databases are intuitively understandable for users and have an efficient and powerful way to create, read, update and delete data of all kinds.

So, if we marry Python, and SQL it will give us some advantages working with data for projects. Usually, each relational database management system (RDBMS) uses a slightly different SQL. But the concepts, structures, and operations are almost identical. Your code written for one database can work with some modifications for another.

I want to show you how to works with SQL in Python on the example of the SQLite library. I use this library when I have several files in different formats about one theme, and I need to connect them. I think it’s a good practice to use Pandas Dataframe to explore your data, but when each table has a different number of columns and all columns completely differents, our code can be slow and confusing.

I use Jupyter notebook to represent my code.

Beginning of work

First, to use SQLite, we must install and import library sqlite3.

# import libraries
import sqlite3

Here I want to add one note for installing additional libraries into the Jupyter notebook environment. We should set up a pip package in the current Jupyter kernel.

# install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install sqlite3

What is SQLite? SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.

Prepare files for work

Second, collect all your files for the project in one directory. Сertainly you can save your files whatever you want. But then you will need to do more work with the files’ path and os library. Maybe next time?

All my projects have a folder called data for storing all pieces of information I download, scrub, or create.

# for work with path/dir
import os 
from glob import glob# use glob to get all of the csv files
files = glob("./data/*.*")
# check what we have 
print("Files for project: ")
for file in files:
    print(file)

This is the result of the command. I collect all files in the folder and check what I have for work. Each file has a unique collection of columns. For example:

tmdb_movies has [index|genre_ids|id|original_language|original_title|popularity|release_date|title|vote_average|vote_count] columns
imdb_title_crew has [tconst|directors|writers] columns

Create Database

Third, create a database for files. It can look a little weird, but we start work with the database from the connection line. Why? Some API and pythons’ libraries magic, and if the database is existing, it returns a connection object. Otherwise, Python will create a file with the given name. It is not good practice because one little typo in the code can give your hundred databases, but it still simplifies your code and work.

# this will create a sqlite file in your directory
# add name for sqlite file
db = 'movies_database.sqlite'
conn = sqlite3.connect(db)

Here, I want to make only one note: the database file, with the name from the connection line, will be created in the same folder as pythons’ files. If we wish to change it, we need to change the path name while opening the file.

How we execute command?

Now we have files, database and ready for the next step: put our files into the base. How can we do that? For database programming with Python, we should use a control structure — Cursor. The cursor is needed to execute SQL commands, traverse and fetch the records of the database.

# code for creating cursor after we have connected database
cur = conn.cursor()

Every cursor instance has a method called execute() to run a single SQL statement.

To create a table in the database, create an object and write the SQL command in it with being commented. Usually, the command to create a table looks like this.

# SQL command to create a table in the databaseCREATE TABLE employees (
staff_number INTEGER PRIMARY KEY, 
first_name VARCHAR(20), 
last_name VARCHAR(30), 
gender CHAR(10), 
joining DATE)

When we need to use this code in Python, we add more command:

sql_command = """CREATE TABLE employees (
staff_number INTEGER PRIMARY KEY, 
first_name VARCHAR(20), 
last_name VARCHAR(30), 
gender CHAR(10), 
joining DATE)"""# and execute the statementcur.execute(sql_command)# This code creates a table called employees with columns staff_number, first_name, last_name, gender, joining in the database we connected.

But I’m a little cheating and sent all files through data frames into the database. I put all my files into the dictionary, where filenames are keys and values are information from files.

files_dict = {}
# working with csv files
for filename in files:
    # cleaning the file names
    filename_cleaned = os.path.basename(filename).replace(".", "_") 
    # read file to DataFrame
    filename_df = pd.read_csv(filename, index_col=0)
    # use cleanfilename as a key and the df as its value
    files_dict[filename_cleaned] = filename_df

And in the end, we have all files into one.

# Write a function that converts the dataframe to a sqlite table
def create_sql_table_from_df(df, name, conn):
    # Use try except
    # it will try to make a table
    # if a table exists the function will execute whatever you put in the except part
    try:
        df.to_sql(name, conn)
        print(f"Created table {name}")
    
    # if the table exists it will tell you, and won't cause an error
    except Exception as e:
        print(f"could not make table {name}")
        print(e)
        
for name, table in files_dict.items():
    create_sql_table_from_df(table, name, conn)

You can always check how many tables in the database sqlite_master.

# check tables which we have for work
for row in cur.execute(“SELECT name FROM sqlite_master WHERE type=’table’;”):
    print(row)

Result:

Now I can connect all these tables by the key columns and analyze information by parts.

For example:

Summary

In my blog, I try to give simple instructions for creating the database using Python and SQLite. SQLite is a useful tool when we work with several files, which have the same key, but different amount of information inside.