Stored procedures are not very commonly used. I worked for a while before i had to write my first stored procedure. They weren’t exactly hard but they weren’t easy too.
So what are stored procedures? The technical definition is
A procedure (often called a stored procedure) is a subroutine like a subprogram in a regular computing language, stored in database. A procedure has a name, a parameter list, and SQL statement(s)
Think of them as stored MySQL queries but not to be confused with MySQL views. Views are mostly used to generate commonly called queries and they mostly only display data. Procedures are a bit complex as they can be used to do simple to complex processing of data and later do an insert or update into another table.
So what exactly are the advantages or benefits of using stored procedures? Why not just run normal SQL queries?
- Stored procedures are fast. MySQL server takes some advantage of caching, just as prepared statements do. The main speed gain comes from reduction of network traffic. If you have a repetitive task that requires checking, looping, multiple statements, and no user interaction, do it with a single call to a procedure that’s stored on the server.
- Stored procedures are portable. When you write your stored procedure in SQL, you know that it will run on every platform that MySQL runs on, without obliging you to install an additional runtime-environment package, or set permissions for program execution in the operating system, or deploy different packages if you have different computer types. That’s the advantage of writing in SQL rather than in an external language like Java or C or PHP.
- Stored procedures are always available as ‘source code’ in the database itself. And it makes sense to link the data with the processes that operate on the data.
- Stored procedures are migratory! MySQL adheres fairly closely to the SQL:2003 standard. Others (DB2, Mimer) also adhere.
Alright. Enough of talking, lets code
Imagine a scenario where you are storing a person’s data i.e their DOB(date of birth), Height and weight. With this data you are required to generate the age and bmi of the person for reporting purposes.
So what do you do? You could write an SQL query that calculates that data and shows the required data. Something like
id as person_id,
ROUND(((weight)/POWER(2,height/100)),0) as bmi,
TIMESTAMPDIFF(YEAR,DOB,CURDATE()) as age
But think if you had to do this for thousands of people and even need to order the list based on something like age, bmi and even make assessments whether the person is overweight, underweight or just normal. Now imagine if you were still using the same tables for recording data as well as reading data for thousands of patients. Chances are your database will begin to get overwhelmed with all the connection and queries
That’s where things like ETL come in. ET what?
What is ETL? ETL stands for Extract, Transform and Load, which is a process used to collect data from various sources, transform the data depending on business rules/needs and load the data into a destination database
Essentially what we do is get data from the source table, process it and store it into a destination table from which we can be able to read easily. Lets do an example
We’ll use MySQL database. You can install it or if you are familiar with Docker you can spin a MySQL container from an image. One installed, we’ll create two tables. Lets call one
person_data and the other
flat_data . person_data will hold the raw person data while flat_data will hold the processed data
We’ll create 2 databases. One for holding the raw data. We’ll call it
db and the other for warehousing our processed data. We’ll call it
id INT NOT NULL AUTO_INCREMENT,
DOB DATETIME NOT NULL,
height INT NOT NULL,
weight INT NOT NULL,
PRIMARY KEY (
person_id INT NULL,
bmi INT NULL,
age INT NULL);
Add any number of records to the person_data table. In our case we’ll add 2 rows
INSERT INTO `db`.`person_data` (`DOB`, `height`, `weight`) VALUES ('1990-01-01', '164', '73'), (‘1990-01-01’, ‘184’, ’74’);
Nice! Now we should have 2 rows in our table. You can confirm this by running the query
SELECT * FROM db.person_data;
Now we write the stored procedure
insert into etl.flat_data(
You might be wondering what that “DELIMITER $$” stands for. Usually a delimiter shows the end of the code which is usually a “;” for SQL. By stating “DELIMITER $$” we are telling MySQL to instead look at “$$” as indication where the code ends. This is because a procedure tends to have SQL statements inside which will have a “;”. That will cause confusion as MySQL will assume that’s the end of the query. By changing the delimiter we can now freely use semicolon in the queries and dollar sign to indicate the end of the stored procedure
We are creating a stored procedure called
generate_flat_data which doesn’t take any parameters otherwise it would have been something like
generate_flat_data(parameter). . This procedure can now be found under the stored procedures folder as shown below
So what is happening inside the stored procedure?
We are doing the following
- Fetching data from person_data table
- Processing it i.e converting the data to age and bmi which are calculated values
- Storing the processed values into another table called flat_data
So now that we have created the stored procedure, how do we call it?
Simple just run
If it was successful and no errors were returned then you can confirm if the data was actually successfully transformed and saved
Wala! The data should be stored after having been transformed. Congratulations, you have successfully done your first ETL using MySQL stored procedure . Hooray!
There are plenty of tools used to do ETL and that is up to you. So don’t be limited to stored procedures