Coding and Implementing a Relational Database using MySQL – Towards Data Science
Sign up
Sign in
Sign up
Sign in
Craig Dickson
Follow
Towards Data Science
--
1
Listen
Share
This is part 2 of a 3-part series taking you through the process of designing, coding, implementing and querying a relational database, starting from zero. See part 1 (Designing a Relational Database and Creating an Entity Relationship Diagram) here, and part 3 (Data Analysis in MySQL — Operators, Joins and More in Relational Databases) here.
All the code and information for this tutorial can be found on the associated GitHub repository. I used Lucidchart to make the diagrams shown in the article.
In part 1 of this series, we covered going from conception of the database to making a plan, and finally creating an Entity Relationship Diagram spelling out the relationships that we will need to model using our SQL code. We covered the basics of Relational Database theory, and talked about primary keys, foreign keys, cardinality and a lot of other really useful material.
It’s a good idea to read that before starting this article, but you can also read this one on its own if you prefer to dive right into the code.
Alright! Now we’re about to get to the real fun stuff, actually creating and populating our database using SQL.
Before we dive into this, we will need to get MySQL Community Server set up on our system, if it isn’t already. The installation is a little more complex than installing a typical application (at least on Windows!), but we can handle it.
First go to the download page and get the installer for your system. If you are using Windows, this guide will help you get set up. Here are guides for Mac and Linux users too (although it may vary by Linux distribution). A full walk-through is beyond the scope of this article, but you can do it. I believe in you!
Once you get this set up, we can do everything that follows in the MySQL command line client.
The command line client is great and powerful, and essential to learn. Sometimes it’s nice to make our lives easier by using a GUI client, however.
PopSQL is a good GUI application for SQL, which makes things look a little nicer, as well as providing very useful features like the ability to look over your database schema in the left-hand navigation window, alongside some basic data visualisation and convenient export features.
For professional use there is a fee, but there is also a free tier that will definitely be enough for learning and playing around with SQL. There are lots of other GUI options out there as well, so give them a try if an alternative sounds appealing. We’ll be using PopSQL for the rest of this article however, so that’s where the screenshots will come from.
To answer any further questions about the SQL statements we use here, the best step is to look at the MySQL documentation. This is the official resource produced by the developers and contains detailed and comprehensive information. While it might seem intimidating to begin with, reading the documentation is a great habit to get into. Just search for the statement or topic of interest and see what comes out. As with all things code-related, google (or a search engine of your choice) is your friend!
Once we have our environment set up, we need to create a database on our MySQL server. This is not too complicated, we just have to use the CREATE DATABASE statement. Note that all SQL statements are closed with a semicolon ‘;’. This lets the client know that this is the end of our statement, which is especially useful when writing longer and more complex statements, like nested queries, but is essential for all statements (except the USE command, which we will see next).
Nice! We just created a database with the name
school
. Incredible work.Next time we log into MySQL command line client, we will need to enter our password and then choose which database we want to use. To do this we use the use command:
As simple as that. If we are using GUI software we don’t normally have to take this step every time.
Note that by convention we put the keywords (like SELECT, DELETE and CREATE DATABASE) in upper case, and the rest of the code in lower case. SQL is actually case insensitive so the code works perfectly well all in lower case or all in upper case (or in mOcKiNg SpOnGeBoB meme style if you really have to). This convention is for the humans who will read your code later, and it’s strongly recommended to follow this practice when writing any flavour of SQL.
Building our database is going to be hugely simplified by the work we have already done creating our ERD and defining the entities (which will be created in our database as tables) and their relationships. The final step we need to take before creating the tables in SQL is plan which data type each attribute will need to have. This has to be declared when creating the table, so we need to think about it before we take that step.
MySQL supports a wide range of data types, from simple integers and strings to BLOBs and JSONs. We will just be using a small subset of these in our database.
We’re going to use the following;
With this in mind, we can start defining appropriate data types for our tables.
Here we have defined our teacher_id as an INT, first_name and last_name as VARCHAR(40), language_1 and language_2 as VARCHAR(3), dob as a DATE, tax_id as INT and phone_no as VARCHAR(20). It’s perfectly possible to choose different values or even different data types (perhaps the tax_id in your country includes text characters? Then an INT won’t work for you), this is all part of the art of database design.
To create this table in our database, we will use the following code:
The SQL statement we use here is simply CREATE TABLE. This is followed by the name we want to give the table, then parentheses and the attribute names and associated data types.
We have also added some constraints to some of the attributes in our table. Contraints specify rules for data in a table, and will constrain what the RDBMS will allow us to do to that particular attribute.
We have added NOT NULL to first_name, last_name and language_1 — this means that the table will not accept a record where any of those attributes are set to NULL. Those attributes require a non-NULL value for every record. This makes sense, as our teachers will require a first name and last name, and to teach in a language school they need to be able to offer at least one language.
We have also set tax_id to be UNIQUE. This means that every record must have a different value for this attribute, which makes sense for a tax ID number, but would not make sense for our language_1 field, for example — we will likely have multiple teachers who offer the same language!
We have also set the teacher_id field as our PRIMARY KEY. In practice this is really just a combination of NOT NULL and UNIQUE, but it is important to define one primary key (which again, can be a single attribute or a combination of different attributes) for each table.
If we now try to use a
SELECT * FROM table
statement to see what data is in our table (we will go over this statement in more detail in part 3), we should receive ‘Empty set’ as our response (in MySQL command line client) or ‘No results found’ (in PopSQL). This shows us our table exists (otherwise we would receive an error), but is still empty. Exactly as we would expect!Now that we have created our first table, let’s delete it!
The passion for destruction is a creative passion, too. A good way to understand how easy it is to delete something in MySQL (and why I was so concerned about doing that accidentally in the past) is to go ahead and do it. This is also the best way to overcome those concerns, as it is in fact relatively difficult to do by accident.
The relevant SQL statement is DROP TABLE.
Go ahead, do it! It’s liberating!
Notice that there is no helpful ‘are you sure you want to do this?’ dialog box which pops up. There is also no undo button, just a simple ‘Query OK’ or ‘Success’ message, and the table is gone. This is also the case when the table contains data. So SQL is a very powerful language, but it can be unforgiving. Make sure you’ve thought through what you’re doing when you’re deleting or updating tables!
Don’t forget to reinstate the Teacher table before moving on to the next sections.
Here we’re using the same data types again, the only new one is BOOLEAN for in_school. This is going to be TRUE if the class takes place on the premises of the International Language School, and FALSE if it take place elsewhere, i.e. at the offices of the client.
The code to create these tables is as follows:
Now we have a table for each of our entities, great!
The next step is to establish the relationships between these by setting up our foreign keys. Happily, we thought about this and planned these way back when creating our ERD, so now we just need to put them into the correct syntax and update our tables using the ALTER TABLE statement.
Note that these can be added during the CREATE TABLE step, but that calls for greater planning in the order of our statements, as we can’t create a relation between tables in MySQL until both tables are created. Using ALTER TABLE after the initial table creation helps us to keep these steps separated and can be a little more forgiving, but both methods are useful to know.
Let’s take a closer look at the first of these commands to see what we’re doing here:
Here, we are updating the participant table, and creating a relationship (
ADD FOREIGN KEY
) where the attribute/column client on the participant table references the client_id attribute/column on the client table, exactly as we planned. This establishes the relationship between these tables and makes our database into a relational database. Hooray!We also have a new constraint here — ON DELETE SET NULL. This tells MySQL what we want to do when a record in the client table is deleted — in this case the value of the client attribute for that participant will be set to NULL, but the record will remain in the participants table. The other option is ON DELETE CASCADE. If we used this here then when a record is deleted from the client table all participants linked to that client via this foreign key relationship would also be deleted.
ON DELETE SET NULL is the more conservative option here, but there are good reasons to go with ON DELETE CASCADE in many circumstances. Again, this is part of database design, considering the implications of this type of decision on your database. When we’re dealing with bigger databases then these decisions will also have effects on the performance of the database, which we also need to factor in to our decision-making.
If you remember, way back in part 1 when we were designing our database and creating our ERD, the last point we looked at was the many-to-many relationship between participants and courses. This relationship is different from the 1-to-N relationships we have encountered so far, and as such needs to be handled differently in our database.
In a 1-to-N relationship such as that between our Teacher and Course tables, it is enough to add a foreign key to the entity on the N side of the relationship. We did this above where we referenced the teacher column of our course table to the teacher_id column on our teacher table.
This means that any course can only be taught by one teacher, but that a teacher may teach many courses. The foreign key perfectly enables us to capture this relationship in our database.
In the case of participants and courses, however, this won’t work. A course may be taken by multiple participants, and a participant may take multiple courses. Adding a foreign key to one, or even both, of these tables wouldn’t capture every way in which these are related without creating multiple copies of our records.
To capture an N-to-M relationship in a relational database we must create another table which connects the primary keys of the entities involved in the relationship, along with any other useful information that it makes sense to store here.
For our case, we will create a new table called takes_course:
This is our first composite key! We have set the two columns participant_id and course_id together to be the primary key for this table, as only this combination will be unique. Note that each of these is the primary key in their respective table. It is necessary that the attributes we use in this table uniquely identify the records in the respective tables, so the primary key column (or columns) is usually the best choice for this.
We are only using these two columns, as we don’t have any additional data to store here. That may not always be the case — for example we may have a table for customers and a table for salespeople, and we might like to store the total sales by each salesperson to each customer on the table that captures their relationship. That would be entirely valid and possible, but in our example all we are interested in is the fact that the participant takes the course, all the other details regarding the participant and the course are more logically stored in their respective tables.
Let’s create the table! This time we will add the constraints at the same time, as all our other tables already exist.
Here we have set the two columns
participant_id
and course_id
as the primary key by including them in parentheses, separated by a comma. This is how we create a composite key in MySQL.Also notice that this time we have used ON DELETE CASCADE. This makes sense here as if a participant is deleted from the database, we don’t need to continue to store the courses they are taking, and likewise if a course is deleted we no longer need to store the participants taking that course.
Now we have all of our tables created in a fully functioning relational database. Really great work.
Now we have our database ready, the relationships are defined and it’s waiting to get into action. We just need to add the teachers, courses, clients and participants and we have ourselves a fully-functioning relational database, which we have built from the ground up!
To populate our tables with data, we need to use the INSERT INTO statement, followed by the data we wish to insert in parentheses. For example, to populate the first row of our Teachers table, we would use the following statement:
Which gives the following output:
The syntax here is INSERT INTO table VALUES (data we want to insert, separated by commas). It is essential that we have a value (or a NULL value) for every attribute of the record (every column of the table), or MySQL will give us an error. They also need to be in the same order as the columns in the table, and follow any constraints we have placed on the columns (i.e. no NULL values in columns set as NOT NULL, no duplicates in columns set as UNIQUE).
We can insert multiple records by including the values in a list, separated by commas, as so:
It is also possible to perform a partial insert by specifying the columns we wish to populate, as below:
Running this statement gives us:
Notice we’re using the SELECT statement to retrieve the contents of our table. This is one of the most important statements in SQL, and we will use it all the time, any time we want to retrieve data from our database.
The format here is:
In this case
SELECT *
means ‘select all columns’. We could also enterSELECT teacher_id, last_name, tax_id
, for example, if we were only interested in retrieving those columns.Before we populate the rest of our data, let’s see how we can delete records from our table. We do this using the DELETE FROM statement. We need to be very careful here, as if we don’t include a condition clause (using WHERE) with our DELETE FROM statement we can delete all the data in our table.
This will delete all records where the condition ‘teacher_id = 2’ is satisfied, so just the record we created for Stefanie Martin a moment ago. Sure enough:
This is our first time seeing the WHERE clause. As we use SQL, we will find more and more uses for this, in SELECT statements, as well as DELETE and UPDATE statements. We will explore this more thoroughly in part 3 of this guide.
We can also make changes to records using the UPDATE clause. This uses syntax UPDATE table SET column_1 = value_1, column_2 = value_2, WHERE condition.
In our table, perhaps James Smith gets married and wants to take his wife’s last name. We can use the following statement in MySQL to make that change:
Remember that WHERE clause, or we’ll change every last name in the table to Jones! Running this statement will give us:
Now we have covered each of the CRUD functions — Creating, Reading, Updating and Deleting. We are well on our way to being able to do whatever we need to with databases!
Now that we’ve seen how to INSERT, DELETE and UPDATE records, let’s go ahead and put the rest of our data into our database. You can see the data set we will be importing in Excel format in this GitHub repository, along with all the SQL code we have used (plus a little that we haven’t) in a .SQL file there.
Here is the necessary MySQL code to populate the database with the values (remember to delete the entry for James Jones in the teacher table first!):
Nice! Now all of our tables have been populated with the data, all the relationships between the data have been defined and implemented, all the attributes have appropriate data types and constraints and the database is ready for use. Fantastic work!
In this article, we have gone from having a plan for a database, with ERD ready to go, to having a fully implemented and populated database in MySQL. We have chosen our data types, defined all our fields, created our tables and defined the relationships between them using MySQL code.
The next step is to start analysing that data. Stick around for part 3 of this guide — Data Analysis in MySQL — Operators, Joins and More in Relational Databases — to see a good selection of the powerful tools available in MySQL for extracting, manipulating and updating the data in our newly-created database.
Thanks very much for taking the time to come with me on this journey. I always welcome your feedback — please get in touch with me via my website and let me know how I could have done this more effectively! I am always open to constructive criticism, or any of your comments.
Until next time!
--
--
1
Towards Data Science
Data Scientist, based in Berlin, Germany. Python, SQL, R, Tableau. He / him. I’m the one on the left in the photo. http://www.craigdoesdata.de
Help
Status
About
Careers
Blog
Privacy
Terms
Text to speech
Teams
source