A single query against a 1974 pop charts database — demonstrating 15 rarely used but powerful SQL features.
-- ───────────────────────────────────────────────────────────── -- 1974 Pop Charts: The SQL Features You Never Write -- PostgreSQL · BFH Database Course -- ───────────────────────────────────────────────────────────── WITH RECURSIVE -- ① WITH RECURSIVE — generate every chart week in 1974 -- without a calendar table. The CTE calls itself, -- adding 7 days each iteration until year end. weeks_1974 AS ( SELECT DATE '1974-01-01' AS week_start UNION ALL SELECT week_start + INTERVAL '7 days' FROM weeks_1974 WHERE week_start < DATE '1974-12-24' ), -- ② Main CTE: enrich raw chart data with window functions chart_analysis AS ( SELECT c.week_date, c.artist, c.song_title, c.chart_position, c.market, c.genre, -- ③ LAG — what was the position the previous week? -- NULL on the first appearance of a song. LAG(c.chart_position, 1) OVER ( PARTITION BY c.song_title ORDER BY c.week_date ) AS previous_position, -- ④ LEAD — what will the position be next week? -- Lets you see if a song is climbing or falling. LEAD(c.chart_position, 1) OVER ( PARTITION BY c.song_title ORDER BY c.week_date ) AS next_week_position, -- ⑤ FIRST_VALUE with ROWS BETWEEN UNBOUNDED PRECEDING -- AND UNBOUNDED FOLLOWING — scan the entire -- partition to find the all-time peak position. FIRST_VALUE(c.chart_position) OVER ( PARTITION BY c.song_title ORDER BY c.chart_position ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS peak_position, -- ⑥ DENSE_RANK — rank artists by how many weeks -- they held #1. DENSE_RANK means no gaps in -- numbering when there are ties (unlike RANK). DENSE_RANK() OVER ( ORDER BY SUM(CASE WHEN c.chart_position = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY c.artist) DESC ) AS artist_dominance_rank, -- ⑦ NTILE(4) — divide all artists into 4 equal -- buckets by average chart position. -- Bucket 1 = top performers. NTILE(4) OVER ( ORDER BY AVG(c.chart_position) OVER (PARTITION BY c.artist) ) AS performance_quartile, -- ⑧ Running count of top-10 weeks so far — -- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- is the classic "running total" frame. SUM(CASE WHEN c.chart_position <= 10 THEN 1 ELSE 0 END) OVER ( PARTITION BY c.song_title ORDER BY c.week_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_top10_weeks FROM charts c WHERE EXTRACT(YEAR FROM c.week_date) = 1974 ), -- ⑨ INTERSECT — artists who charted in BOTH markets. -- Only rows present in both result sets survive. crossover_artists AS ( SELECT artist FROM charts WHERE market = 'UK' AND EXTRACT(YEAR FROM week_date) = 1974 INTERSECT SELECT artist FROM charts WHERE market = 'US' AND EXTRACT(YEAR FROM week_date) = 1974 ), -- ⑩ EXCEPT — artists who charted in UK but NOT in US. -- The mirror image of INTERSECT. uk_only_artists AS ( SELECT artist FROM charts WHERE market = 'UK' AND EXTRACT(YEAR FROM week_date) = 1974 EXCEPT SELECT artist FROM charts WHERE market = 'US' AND EXTRACT(YEAR FROM week_date) = 1974 ) -- ── FINAL SELECT ──────────────────────────────────────────── SELECT ca.artist, ca.song_title, -- ⑪ FILTER (WHERE …) — conditional aggregation without -- a nested CASE WHEN inside every COUNT. -- Much cleaner, and equally fast. COUNT(*) FILTER (WHERE ca.chart_position = 1) AS weeks_at_number_one, COUNT(*) FILTER (WHERE ca.chart_position <= 10) AS weeks_in_top_ten, COUNT(*) FILTER (WHERE ca.chart_position <= 40) AS weeks_in_top_forty, -- ⑫ NULLIF — returns NULL if both arguments are equal. -- Here it prevents division by zero gracefully: -- if COUNT(*) = 0, NULLIF returns NULL and the -- whole division becomes NULL instead of crashing. ROUND( COUNT(*) FILTER (WHERE ca.chart_position <= 10)::NUMERIC / NULLIF(COUNT(*), 0) * 100, 1 ) AS pct_time_in_top_ten, -- ⑬ PERCENTILE_CONT … WITHIN GROUP (ORDER BY) — -- an ordered-set aggregate. It interpolates to find -- the true median (0.5 = 50th percentile) across -- all chart positions for this song. PERCENTILE_CONT(0.5) WITHIN GROUP ( ORDER BY ca.chart_position ) AS median_position, ca.peak_position, ca.artist_dominance_rank, CASE ca.performance_quartile WHEN 1 THEN 'Dominant' WHEN 2 THEN 'Strong' WHEN 3 THEN 'Moderate' WHEN 4 THEN 'Struggling' END AS market_presence, CASE WHEN ca.artist IN (SELECT artist FROM crossover_artists) THEN 'UK + US crossover' WHEN ca.artist IN (SELECT artist FROM uk_only_artists) THEN 'UK only' ELSE 'US only' END AS market_reach, -- ⑭ STRING_AGG — concatenate all song titles for an -- artist into a single comma-separated string. -- DISTINCT removes duplicate titles; ORDER BY -- inside the aggregate sorts them alphabetically. STRING_AGG(DISTINCT ca.song_title, ', ' ORDER BY ca.song_title) AS all_charting_songs, -- ⑮ COALESCE — returns the first non-NULL argument. -- If genre was never filled in, show 'Unclassified' -- rather than an ugly NULL in the result set. COALESCE(ca.genre, 'Unclassified') AS genre_display, peak_week.best_week FROM chart_analysis ca -- ⑯ CROSS JOIN LATERAL — for each row in chart_analysis, -- run this correlated subquery. LATERAL means the -- subquery can reference columns from the outer query -- (ca.song_title). Without LATERAL that would be -- a syntax error. Think of it as a per-row function call. CROSS JOIN LATERAL ( SELECT week_date AS best_week FROM charts WHERE song_title = ca.song_title ORDER BY chart_position ASC LIMIT 1 ) AS peak_week GROUP BY ca.artist, ca.song_title, ca.genre, ca.peak_position, ca.artist_dominance_rank, ca.performance_quartile, peak_week.best_week -- HAVING filters on aggregated results — here: only songs -- that actually made the top 40 at some point. HAVING MAX(ca.chart_position) <= 40 ORDER BY weeks_at_number_one DESC, median_position ASC;
A CTE that references itself. Each iteration builds on the previous result. Used here to generate 52 weekly dates without a calendar table. Stops when the WHERE clause fails.
Look backwards or forwards within a window partition. LAG(position, 1) gives you last week's position for the same song. LEAD gives you next week's. NULL at the boundaries.
FIRST_VALUE picks the first value in the window frame. UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING means the frame is the entire partition, so you always get the global best.
RANK leaves gaps after ties: 1, 2, 2, 4. DENSE_RANK does not: 1, 2, 2, 3. Use DENSE_RANK when you want to know "how many distinct positions above me" not "how many rows above me."
Divides the result set into n roughly equal buckets. NTILE(4) gives quartiles. Bucket 1 = top 25%. Useful for labelling performance tiers without hard-coding thresholds.
Conditional aggregation without CASE WHEN inside the aggregate. COUNT(*) FILTER (WHERE position = 1) counts only the rows where position is 1. Cleaner and more readable.
An ordered-set aggregate. Unlike normal aggregates it needs an ORDER BY inside it. PERCENTILE_CONT(0.5) interpolates the true median. PERCENTILE_DISC would return an actual row value.
Concatenates values across rows into a single string with a separator. Equivalent to GROUP_CONCAT in MySQL. Accepts DISTINCT and its own ORDER BY clause inside the aggregate.
Returns only rows that appear in both result sets. Like a set intersection in mathematics. Automatically removes duplicates. INTERSECT ALL keeps them.
Returns rows from the first query that do not appear in the second. The set difference. "Give me everything in A that is not in B." Called MINUS in Oracle.
Returns NULL if a equals b, otherwise returns a. The canonical use is NULLIF(denominator, 0) to prevent division-by-zero errors gracefully rather than with a CASE WHEN wrapper.
Returns the first non-NULL argument from a list. COALESCE(genre, 'Unclassified') is equivalent to CASE WHEN genre IS NOT NULL THEN genre ELSE 'Unclassified' END. Just much shorter.
Allows a subquery in the FROM clause to reference columns from tables listed earlier in the same FROM clause. Without LATERAL, subqueries are isolated. Think of it as a per-row correlated subquery that can appear in a JOIN.