BFH · Database Course · SQL Deep Cuts

The SQL You Never Write
But Always Need

A single query against a 1974 pop charts database — demonstrating 15 rarely used but powerful SQL features.

charts

  • chart_id INTEGER PK
  • week_date DATE
  • artist TEXT
  • song_title TEXT
  • chart_positionINTEGER
  • weeks_on_chartINTEGER
  • market TEXT UK | US
  • genre TEXT

artists

  • artist TEXT PK
  • origin_countryTEXT
  • label TEXT
  • formed_year INTEGER

sample data

  • '1974-04-06' ABBA
  • 'Waterloo' pos 1
  • 'UK' Glam Pop
  • '1974-07-20' Stevie Wonder
  • 'You Haven't Done...' pos 1
  • 'US' Soul
WITH RECURSIVE CTE chaining LAG / LEAD FIRST_VALUE DENSE_RANK NTILE PARTITION BY ROWS BETWEEN FILTER (WHERE) PERCENTILE_CONT WITHIN GROUP STRING_AGG INTERSECT EXCEPT NULLIF COALESCE LATERAL
-- ─────────────────────────────────────────────────────────────
-- 1974 Pop Charts: The SQL Features You Never Write
-- PostgreSQL · BFH Database Course
-- ─────────────────────────────────────────────────────────────

WITH RECURSIVE

-- ① WITH RECURSIVE — generate every chart week in 1974
--    without a calendar table. The CTE calls itself,
--    adding 7 days each iteration until year end.
weeks_1974 AS (
    SELECT DATE '1974-01-01' AS week_start
    UNION ALL
    SELECT week_start + INTERVAL '7 days'
    FROM   weeks_1974
    WHERE  week_start < DATE '1974-12-24'
),

-- ② Main CTE: enrich raw chart data with window functions
chart_analysis AS (
    SELECT
        c.week_date,
        c.artist,
        c.song_title,
        c.chart_position,
        c.market,
        c.genre,

        -- ③ LAG — what was the position the previous week?
        --    NULL on the first appearance of a song.
        LAG(c.chart_position, 1) OVER (
            PARTITION BY c.song_title
            ORDER BY     c.week_date
        ) AS previous_position,

        -- ④ LEAD — what will the position be next week?
        --    Lets you see if a song is climbing or falling.
        LEAD(c.chart_position, 1) OVER (
            PARTITION BY c.song_title
            ORDER BY     c.week_date
        ) AS next_week_position,

        -- ⑤ FIRST_VALUE with ROWS BETWEEN UNBOUNDED PRECEDING
        --    AND UNBOUNDED FOLLOWING — scan the entire
        --    partition to find the all-time peak position.
        FIRST_VALUE(c.chart_position) OVER (
            PARTITION BY c.song_title
            ORDER BY     c.chart_position ASC
            ROWS BETWEEN UNBOUNDED PRECEDING
                     AND UNBOUNDED FOLLOWING
        ) AS peak_position,

        -- ⑥ DENSE_RANK — rank artists by how many weeks
        --    they held #1. DENSE_RANK means no gaps in
        --    numbering when there are ties (unlike RANK).
        DENSE_RANK() OVER (
            ORDER BY
                SUM(CASE WHEN c.chart_position = 1
                         THEN 1 ELSE 0 END)
                OVER (PARTITION BY c.artist) DESC
        ) AS artist_dominance_rank,

        -- ⑦ NTILE(4) — divide all artists into 4 equal
        --    buckets by average chart position.
        --    Bucket 1 = top performers.
        NTILE(4) OVER (
            ORDER BY
                AVG(c.chart_position)
                OVER (PARTITION BY c.artist)
        ) AS performance_quartile,

        -- ⑧ Running count of top-10 weeks so far —
        --    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        --    is the classic "running total" frame.
        SUM(CASE WHEN c.chart_position <= 10
                 THEN 1 ELSE 0 END) OVER (
            PARTITION BY c.song_title
            ORDER BY     c.week_date
            ROWS BETWEEN UNBOUNDED PRECEDING
                     AND CURRENT ROW
        ) AS running_top10_weeks

    FROM   charts c
    WHERE  EXTRACT(YEAR FROM c.week_date) = 1974
),

-- ⑨ INTERSECT — artists who charted in BOTH markets.
--    Only rows present in both result sets survive.
crossover_artists AS (
    SELECT artist FROM charts
    WHERE  market = 'UK' AND EXTRACT(YEAR FROM week_date) = 1974
    INTERSECT
    SELECT artist FROM charts
    WHERE  market = 'US' AND EXTRACT(YEAR FROM week_date) = 1974
),

-- ⑩ EXCEPT — artists who charted in UK but NOT in US.
--    The mirror image of INTERSECT.
uk_only_artists AS (
    SELECT artist FROM charts
    WHERE  market = 'UK' AND EXTRACT(YEAR FROM week_date) = 1974
    EXCEPT
    SELECT artist FROM charts
    WHERE  market = 'US' AND EXTRACT(YEAR FROM week_date) = 1974
)

-- ── FINAL SELECT ────────────────────────────────────────────
SELECT
    ca.artist,
    ca.song_title,

    -- ⑪ FILTER (WHERE …) — conditional aggregation without
    --    a nested CASE WHEN inside every COUNT.
    --    Much cleaner, and equally fast.
    COUNT(*) FILTER (WHERE ca.chart_position =  1)  AS weeks_at_number_one,
    COUNT(*) FILTER (WHERE ca.chart_position <= 10) AS weeks_in_top_ten,
    COUNT(*) FILTER (WHERE ca.chart_position <= 40) AS weeks_in_top_forty,

    -- ⑫ NULLIF — returns NULL if both arguments are equal.
    --    Here it prevents division by zero gracefully:
    --    if COUNT(*) = 0, NULLIF returns NULL and the
    --    whole division becomes NULL instead of crashing.
    ROUND(
        COUNT(*) FILTER (WHERE ca.chart_position <= 10)::NUMERIC
        / NULLIF(COUNT(*), 0) * 100, 1
    ) AS pct_time_in_top_ten,

    -- ⑬ PERCENTILE_CONT … WITHIN GROUP (ORDER BY) —
    --    an ordered-set aggregate. It interpolates to find
    --    the true median (0.5 = 50th percentile) across
    --    all chart positions for this song.
    PERCENTILE_CONT(0.5) WITHIN GROUP (
        ORDER BY ca.chart_position
    ) AS median_position,

    ca.peak_position,
    ca.artist_dominance_rank,

    CASE ca.performance_quartile
        WHEN 1 THEN 'Dominant'
        WHEN 2 THEN 'Strong'
        WHEN 3 THEN 'Moderate'
        WHEN 4 THEN 'Struggling'
    END AS market_presence,

    CASE
        WHEN ca.artist IN (SELECT artist FROM crossover_artists)
             THEN 'UK + US crossover'
        WHEN ca.artist IN (SELECT artist FROM uk_only_artists)
             THEN 'UK only'
        ELSE      'US only'
    END AS market_reach,

    -- ⑭ STRING_AGG — concatenate all song titles for an
    --    artist into a single comma-separated string.
    --    DISTINCT removes duplicate titles; ORDER BY
    --    inside the aggregate sorts them alphabetically.
    STRING_AGG(DISTINCT ca.song_title, ', '
        ORDER BY ca.song_title) AS all_charting_songs,

    -- ⑮ COALESCE — returns the first non-NULL argument.
    --    If genre was never filled in, show 'Unclassified'
    --    rather than an ugly NULL in the result set.
    COALESCE(ca.genre, 'Unclassified') AS genre_display,

    peak_week.best_week

FROM chart_analysis ca

-- ⑯ CROSS JOIN LATERAL — for each row in chart_analysis,
--    run this correlated subquery. LATERAL means the
--    subquery can reference columns from the outer query
--    (ca.song_title). Without LATERAL that would be
--    a syntax error. Think of it as a per-row function call.
CROSS JOIN LATERAL (
    SELECT week_date AS best_week
    FROM   charts
    WHERE  song_title = ca.song_title
    ORDER BY chart_position ASC
    LIMIT  1
) AS peak_week

GROUP BY
    ca.artist, ca.song_title, ca.genre,
    ca.peak_position, ca.artist_dominance_rank,
    ca.performance_quartile, peak_week.best_week

-- HAVING filters on aggregated results — here: only songs
-- that actually made the top 40 at some point.
HAVING MAX(ca.chart_position) <= 40

ORDER BY
    weeks_at_number_one DESC,
    median_position     ASC;

WITH RECURSIVE

A CTE that references itself. Each iteration builds on the previous result. Used here to generate 52 weekly dates without a calendar table. Stops when the WHERE clause fails.

LAG / LEAD

Look backwards or forwards within a window partition. LAG(position, 1) gives you last week's position for the same song. LEAD gives you next week's. NULL at the boundaries.

FIRST_VALUE + ROWS BETWEEN UNBOUNDED

FIRST_VALUE picks the first value in the window frame. UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING means the frame is the entire partition, so you always get the global best.

DENSE_RANK vs RANK

RANK leaves gaps after ties: 1, 2, 2, 4. DENSE_RANK does not: 1, 2, 2, 3. Use DENSE_RANK when you want to know "how many distinct positions above me" not "how many rows above me."

NTILE(n)

Divides the result set into n roughly equal buckets. NTILE(4) gives quartiles. Bucket 1 = top 25%. Useful for labelling performance tiers without hard-coding thresholds.

FILTER (WHERE …)

Conditional aggregation without CASE WHEN inside the aggregate. COUNT(*) FILTER (WHERE position = 1) counts only the rows where position is 1. Cleaner and more readable.

PERCENTILE_CONT WITHIN GROUP

An ordered-set aggregate. Unlike normal aggregates it needs an ORDER BY inside it. PERCENTILE_CONT(0.5) interpolates the true median. PERCENTILE_DISC would return an actual row value.

STRING_AGG

Concatenates values across rows into a single string with a separator. Equivalent to GROUP_CONCAT in MySQL. Accepts DISTINCT and its own ORDER BY clause inside the aggregate.

INTERSECT

Returns only rows that appear in both result sets. Like a set intersection in mathematics. Automatically removes duplicates. INTERSECT ALL keeps them.

EXCEPT

Returns rows from the first query that do not appear in the second. The set difference. "Give me everything in A that is not in B." Called MINUS in Oracle.

NULLIF(a, b)

Returns NULL if a equals b, otherwise returns a. The canonical use is NULLIF(denominator, 0) to prevent division-by-zero errors gracefully rather than with a CASE WHEN wrapper.

COALESCE

Returns the first non-NULL argument from a list. COALESCE(genre, 'Unclassified') is equivalent to CASE WHEN genre IS NOT NULL THEN genre ELSE 'Unclassified' END. Just much shorter.

LATERAL

Allows a subquery in the FROM clause to reference columns from tables listed earlier in the same FROM clause. Without LATERAL, subqueries are isolated. Think of it as a per-row correlated subquery that can appear in a JOIN.