Here is a toy database which models an online music store. In this database, we have falling into three categories: information about the company, information about customers and their orders, and information about the products being sold.
You'll be editing the queries.sql file, which consists of a series of "plain English" sentences. Your job is to translate those sentences into a single SQL SELECT query. Don't worry if you can't do it — take your best shot!
We've set up our own PostgreSQL server that you'll be connecting to.
- Contents
- Exercises
- Installing PostgreSQL
- Using PostgreSQL
- The Database Tables
- One Key Idea: Normalization
- Useful Resources
The exercises are contained in individual .sql files. They are meant to be completed in the following order:
queries.sql— Basic SQL queries on a single tablegroup_by.sql— Single-table queries usingGROUP BYjoin.sql— Two-table queries usingJOINleft_join.sql— Two-table queries usingLEFT JOINmulti_join.sql— Queries against more than two tables simultaneously
You can install PostgreSQL on a Mac with brew:
brew install postgresqlOnce installed, start the database server with
brew services start postgresqlYou can install PostgreSQL on Windows using [Chocolatey's][url-chocolatey] choco command:
choco install postgresqlSee the [Chocolatey PostgreSQL package page][url-chocolatey-postgresql] for more details.
Once installed, you have to add PostgreSQL's bin directory to your PATH environment variable. Search for Edit system environment variable to open System Properties. From there, click the Environment Variables button.
Under User variables, click the row labeled Path and then click the Edit... button. Click New and add the following directory to the PATH environment variable:
C:\Program Files\PostgreSQL\12\bin
Connect to database by running this in your terminal:
psql -h sql-exercises.20bits.com -U adjacent_student adjacent_sql_exercisesYou will be prompted for a password. Ask your instructor for this information.
Once inside, type the following at the PostgreSQL prompt
adjacent_sql_exercises> \dtto list all the tables in the database. To see the schema for a specific table, e.g., the invoices table, you can use PostgreSQL's \d command:
adjacent_sql_exercises> \d invoices
If you see a colon : at the bottom of the screen, that means there's more to see. You can scroll up and down using the arrow keys. You can exit the scrolling interface by pressing the q key.
Don't be afraid to explore the data in the tables to get a better feeling for how it's formatted. It's impossible to damage your database as long as you're only running SELECT queries. For example, what does the data in the invoices table look like? Well, let's look at a 5 rows
SELECT * FROM invoices LIMIT 5;That's what we mean by "explore."
When you're ready to exit the PostgreSQL shell you can press Ctrl+D or run the following command:
\qRemember: relational databases are organized into tables. Each table has a "schema", which dictates the fields (aka columns) that the data in the table can or must contain. The data itself is stored as records (aka rows).
Here are the tables, organized by high-level purpose.
albumsare the albums our store is sellingartistsare all the artiststracksare all the tracks, containing a column that tells us which album the track belongs to.media_typestell us what format a track is in, e.g., MP3, Apple Audio, etc.genresare a list of genres.playlistsare, well, playlists. A play list has many tracks and a single track can belong to many playlists.playlist_trackstells us which tracks belong to which playlists
customersare our customer records and they may or may not have an assigned "customer support representative"invoicesare a list of specific customer ordersinvoice_linesare the line-items on specific invoices
employeescontains information about our company's employees. This is mostly used to assign support representatives to customers.
One of the key ideas in how relational databases like PostgreSQL and MySQL organize data is that we try to minimize redundancy by using references to other data rather than duplicating that data between different tables.
For example, every track belongs to one (and only one) album. An album has its own associated information, like album title, publication date, and so on. Each track also has its own associated information, like track title, track position, and duration. We want to be able to ask questions like "What is the title of the album on which track X appears?"
If you imagine a spreadsheet with a bunch of track listings, one way to achieve this would be to have an "Album Title" column and to answer this question we would just look at the value in the "Album Title" column for track X. Every track on the same album would have the same value in the "Album Title" column, although heaven help us if there are two separate albums with the same title!
This is not how we store information in a relational database. Rather than storing album-related information in the same table as track-related information, we store album-related information in an "albums" table and track-related information in a "tracks" table. We assign each track and album a unique ID. In the "tracks" table we would when have an album_id column containing the unique album ID as a reference or pointer to the relevant row in the "albums" table.
Excel can do this, but it is too cumbersome for the most common tasks. In a relational database like PostgreSQL or MySQL, however, it is much easier to deal with. In fact, SQL (the language) is counting on you storing your data this way.
