Postgres - exercises

Kamila Chyla

2021-10-04 1538 words 8 minutes

Contents

SQL Exercises

As part of my SQL training and in order to know postres better I am (slowly) doing exercises from pgexercises.com page.

I like that page very much:

it does not want my email address
it does not require me being logged in
it remembers my progress
ine can solve many SQL-related mini-tasks in different categories (from simple selects to joins, data modifications, to aggregate functions)

Although database schema for those exercises is very simple, I’m struggling a bit when things get complicated…

🧗‍♀️ 😅 👏

Therefore I think that page is perfect as an SQL refresher for someone like me who was using SQL on almost daily basis and now needs to bring all those information back from the depths of my brain, recall specific syntactic constructs and re-learn how to apply them in practice.

schema for database used in exercises from pgexercises.com

My solutions with comments will be available as GitHub gist soon.

Progress with postgres!

Right now I’m going through the aggregate functions section. I can see my progress on the page as a list with all completed exercises marked green:

My discoveries

if a column is of timestamp type and I need to select rows within specific date, I can:
- use comparison operators with a literal date in WHERE clause, for example: if I want to search for events on 10th of May 2017, I can write event < '2017-05-11' and event >= '2017-05-10'
- or I can use a function date_trunc which, given timestamp value and position name, would clean all positions with “lower resolution”, e.g. cleans time part of a timestamp: where date_trunc('day', event) = '2017-05-10'

when I want to insert calculated data, I cannot use this syntax:

1
2


insert into cd.facilities(facid, name, membercost, guestcost, initialoutlay, monthlymaintenance)
values (select max(f.facid) + 1 as facid from cd.facilities f, 'Spa', 20, 30, 100000,800);

And have to use this instead:

1
2


insert into cd.facilities (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance)
select (select max(facid) from cd.facilities)+1, 'Spa', 20, 30, 100000, 800;

and rigth now I have no idea why my solution wasn’t accepted.

Date arythmetics and date/time functions

Postgres has plenty of interesting functions which one can use in SQL queries, so that it is no longer necessary to calculate them in progrmamming language which calls dbms queries. For example:

date operations may use intuitive date/time arythmetics
there are additional functions which allow to, for example, create caluies of interval, date or timestamp types

Plans

After I finish SQL exercises, I’m going to dive into intersting articles about postgres types: Po zakończeniu ćwiczeń z SQL-a zamierzam przyjrzeć się bliżej interesującym mnie obszarom:

json, jsonb and jsonpath which allows to search inside json types values
geometric types which allow to represent points, lines, circles, paths or polygons
array types
range types

Beside those, postgres handles many other types: (network types, object identifiers, domain types or complex types), of which my knowledge is very limited right now 🤔 .

Functions documentation is impressive; I’d like to look at those (even if my understanding would still be superficial); in particular, I’d like to use them practically, for example during real datasets analysis.

We’ll see - the naps of my little daughter are shorter, so time is even scarcier resource than it’s ever been before, and there are so many interesting things out there waiting to be discovered or learned about 😄.

Update 2021-10-05

You cannot use column alias in HAVING clause

Postgres disallows using column alias (e.g. for a column value calculated using complex query with aggregate function) in having clause; one has to either:

repeat column definition in having clause or
wrap the query in outer select statement and use a condition in where clause.

This is what I learned trying to solve revenue task. I stumbled upon following error:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


exercises=# select f.name, 
        sum(b.slots * (case when b.memid=0 then f.guestcost else f.membercost end)) as revenue 
from cd.bookings b join cd.facilities f on b.facid = f.facid 
group by f.facid 
having revenue > 1000 
order by revenue;
;
ERROR:  column "revenue" does not exist
LINE 5: having revenue > 1000 
               ^
exercises=#

and had to use second approach:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


select name, revenue from (
select f.name as name, 
        sum(b.slots * (
            case when b.memid = 0 
            then f.guestcost 
            else f.membercost 
            end)) as revenue 
from cd.bookings b join cd.facilities f 
on b.facid = f.facid 
group by f.facid  
order by revenue
) as x  where x.revenue < 1000;

extract function

Syntax: extract (field from source)

As a source you can provide any value of type timestamp, date (will be cast to timestamp), time and interval.

A field can be one of: century, day, decade, dow, doy, epoch, hour, isodow, isoyear etc. (full list).

Instead of date_part('month', eventstamp) one can use extract(month from eventstamp) function: it’s easier to input since you don’t have to put apostrophes around the literal :)

For example:

1
2
3
4
5
6
7


select 
	extract(year from a.date) as year, 
	extract(day from a.date) as day, 
	extract(doy from a.date) day_of_year,  
	extract(isodow from a.date) day_of_week_monday_is_1,  
 	extract(timezone_hour from a.date) tz 
from (select now() date) a;

ROLLUP

Oh, this one is great: one can easily calculate partial (prefix) sums and totals in case of hierarhical data as in (example task):

1
2
3
4
5


select facid,  extract(month from starttime) as month, sum(slots) as slots 
from cd.bookings 
where extract(year from starttime) = 2012 
group by rollup(facid, month) 
order by facid, month;

ROUND

Two argument function that does rounding of first argument to the number of decimal digits (after dot) given by second argument. I used this function automatically without even checking with the documentation that such funciton actually exists. It should, right? I probably used some variant of it before.

Since it is not as clear as it sounds what rounding realy means, let’s check with documentation:

Rounds to nearest integer. For numeric, ties are broken by rounding away from zero. For double precision, the tie-breaking behavior is platform dependent, but “round to nearest even” is the most common rule.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10



select 
	round(123.456, 3) a,
	round(123.456, 2) b,
	round(123.456, 1) c, 
	round(123.456, 0) d,
	round(123.45, -1) e, 
	round(159.50, 0) f,
	round(159.49, 0) g
	;

Windowing functions

I used them for the first time in count members task - this time I’ve read the explenation and then the following:

tutorial that makes quite nice an introduction
syntax of windowing functions
windowing functions chapter which requires a lot of thinking - and my intuition is rather poor about when to use which syntax; probably that kind of ease comes with experience

Correlated subqueries

Correlated subquery appeared at rolling average task.

I find it interesting: a query is using generate_series function (it is one of so called set generating functions) to generate a list of dates from given month (August 2012). Then for each generated date (a set of generated dates is in FROM clause) the SELECT statement chooses two columns: a date and an average calculated “in place” using correlated query.

This is the recommended solution for rolling average which I put here to remember this pattern better:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22


select 	dategen.date,
	(
		-- correlated subquery that, for each day fed into it,
		-- finds the average revenue for the last 15 days
		select sum(case
			when memid = 0 then slots * facs.guestcost
			else slots * membercost
		end) as rev

		from cd.bookings bks
		inner join cd.facilities facs
			on bks.facid = facs.facid
		where bks.starttime > dategen.date - interval '14 days'
			and bks.starttime < dategen.date + interval '1 day'
	)/15 as revenue
	from
	(
		-- generates a list of days in august
		select 	cast(generate_series(timestamp '2012-08-01',
			'2012-08-31','1 day') as date) as date
	)  as dategen
order by dategen.date;

My own trials were all wrong. Below is one of my convoluted ways of calculating the average:

I incorrectly assumed that I have data for each day in August 1012 and ALSO I have data for 16 days preceding 1st of August
I tied to first calcucale the per-day revenues (sums using per-day grouping) in all_revs subquery
and then tried to calculate the average using window function for 16 preceding rows excluding current row (so that 15 days before current date is taken)
… and as you can see I had some issues with date arithmentics:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


with all_revs as (
	select date(starttime) d, 
					sum(case when b.memid = 0 
						then slots * f.guestcost 
						else slots * f.membercost 
						end) rev
	from cd.facilities f join cd.bookings b ON b.facid = f.facid
	where date(starttime) >= ('2012-08-01'::DATE - '16 days'::INTERVAL) and date(starttime) < '2012-09-01'
	group by d
	order by d),
 all_avgs as (
select d, rev, avg(rev) over (rows 16 preceding exclude current row) aa
from all_revs)
select d, aa from all_avgs 
where extract(year from d) = 2012 and extract(month from d) = 8; 

Conclusion: it is time to dive into next chapter of pgexercises, namely dates and have a closer look at related functions.

Gist

My solutions so far are documented here: gist

This text is part of the series sql...

All entries in this series:

2021-04-11 - Postgres - dates
2021-04-10 - Postgres - JSON
2021-04-10 - Postgres - exercises
2021-30-09 - Postgres - installation