Optimizing Relative Read Frequency Queries in SQL

My goal was simply to create, test and use a relatively simple SQL query. The only complexity comes from aggregate functions and self joins.
I had two tables, article and view; the latter stores the view count per article. I wanted to calculate a relative measure of read frequency instead of absolute view counts, so that recently published content is comparable to older material.
To normalize this, we can calculate a views-per-day ratio and then compare every article against the one with the highest ratio.

In this article, we’ll walk through:

  1. Showcasing the schema and the goal to achieve.
  2. Building the query that calculates relative read frequency.
  3. Investigating indexing strategies based on execution plans.

Schema Setup

We start with two simple tables:

-- Article table
CREATE TABLE `article` (
  `article_id` int unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(105) DEFAULT NULL,
  `content` text,
  `publish_date` date DEFAULT NULL,
  PRIMARY KEY (`article_id`)
) ENGINE=InnoDB;

-- Views table
CREATE TABLE `view` (
  `article_id` int unsigned NOT NULL,
  `view_count` int unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`article_id`)
) ENGINE=InnoDB;

  • article stores the article data, including the key field for this calculation: publish_date.
  • view holds the total view count for each article.

Relative Read Frequency

To compute comparable relative popularity, we need:

  • Views per day:
    view_count / GREATEST(DATEDIFF(CURDATE(), publish_date), 1)

Using GREATEST() clamps the day count to at least 1, so same-day publications do not cause a division by zero.

  • Relative percentage:
    Each article’s frequency is divided by the maximum daily frequency across all articles, giving us a normalized read percentage.
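
As a small sanity check of the two formulas above, the expressions can be evaluated on literals before touching real tables. This is only a sketch: the dates and the numbers 120, 30 and 60 are made-up stand-ins for CURDATE(), publish_date and the view counts.

```sql
-- Literal dates replace CURDATE() and publish_date so the result does not change from day to day.
SELECT
  DATEDIFF('2025-08-27', '2025-08-27')                    AS raw_days,      -- 0 for a same-day publication
  GREATEST(DATEDIFF('2025-08-27', '2025-08-27'), 1)       AS safe_days,     -- clamped to 1
  120 / GREATEST(DATEDIFF('2025-08-27', '2025-08-25'), 1) AS views_per_day; -- 120 views over 2 days

-- Relative percentage: an article at 30 views/day against a maximum of 60 views/day.
SELECT ROUND((30 / 60) * 100, 2) AS read_percentage;      -- half of the maximum
```

Note that MySQL does not raise an error for division by zero in a plain SELECT; the expression evaluates to NULL, so the GREATEST() guard is mainly about getting a usable number for articles published today.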

Final Query

WITH daily_frequencies AS (
  SELECT a.article_id, a.title, a.publish_date,
    COALESCE(v.view_count, 0) as view_count,    
    GREATEST(DATEDIFF(CURDATE(), a.publish_date), 1) as days_since_publish,
    COALESCE(v.view_count, 0) / GREATEST(DATEDIFF(CURDATE(), a.publish_date), 1) as daily_read_frequency
  FROM article a
  LEFT JOIN view v ON a.article_id = v.article_id
  WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE()
),
max_frequency AS (
  SELECT MAX(daily_read_frequency) as max_daily_frequency
  FROM daily_frequencies
)
SELECT df.article_id, df.title, df.view_count, 
  df.publish_date, df.days_since_publish, df.daily_read_frequency,
  CASE 
    WHEN mf.max_daily_frequency > 0 
    THEN ROUND((df.daily_read_frequency / mf.max_daily_frequency) * 100, 2)
    ELSE 0
  END as read_percentage
FROM daily_frequencies df
CROSS JOIN max_frequency mf
ORDER BY read_percentage DESC;

Result:

  • The most frequently read article gets 100% and the others are scaled accordingly.
  • Unpublished articles (publish_date is NULL or in the future) are excluded.
  • Articles with no views appear with 0%.
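
For comparison, the same normalization can also be written with a window function instead of the max_frequency CTE and the CROSS JOIN. This is an alternative formulation, not the query used in this article, and it is only a sketch: it assumes MySQL 8.0 or later (which the CTE syntax requires anyway) and uses NULLIF/COALESCE in place of the CASE expression.

```sql
WITH daily_frequencies AS (
  SELECT a.article_id, a.title,
    COALESCE(v.view_count, 0) / GREATEST(DATEDIFF(CURDATE(), a.publish_date), 1) AS daily_read_frequency
  FROM article a
  LEFT JOIN `view` v ON a.article_id = v.article_id
  WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE()
)
SELECT article_id, title, daily_read_frequency,
  -- MAX(...) OVER () is the maximum over all rows; NULLIF turns a zero maximum into NULL,
  -- and COALESCE maps the resulting NULL percentage back to 0.
  COALESCE(
    ROUND(daily_read_frequency / NULLIF(MAX(daily_read_frequency) OVER (), 0) * 100, 2),
    0
  ) AS read_percentage
FROM daily_frequencies
ORDER BY read_percentage DESC;
```

Whether this form is faster or slower than the CROSS JOIN version depends on the data and the server version, so it is worth comparing the execution plans rather than assuming either way.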

Example Output

article Table

article_id  title               publish_date
----------  ------------------  ------------
1           Intro to SQL        2025-08-01
2           Index Optimization  2025-08-10
3           Daily Metrics       2025-08-25

view Table

article_id  view_count
----------  ----------
1           900
2           200
3           120
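
To reproduce the example, the sample rows can be loaded with plain INSERT statements. A minimal sketch, assuming the two tables from the schema section exist and are empty; content is left NULL because the query never reads it.

```sql
-- Sample articles; article_id is set explicitly so it matches the view rows.
INSERT INTO article (article_id, title, publish_date) VALUES
  (1, 'Intro to SQL',       '2025-08-01'),
  (2, 'Index Optimization', '2025-08-10'),
  (3, 'Daily Metrics',      '2025-08-25');

-- Total view counts per article.
INSERT INTO `view` (article_id, view_count) VALUES
  (1, 900),
  (2, 200),
  (3, 120);
```

Because the query uses CURDATE(), the numbers change with the day it is run. To get the same figures on another date, CURDATE() can be temporarily replaced with the literal '2025-08-27', the date implied by the days_since_publish values in the output.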

Query Output

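Run on 2025-08-27 (the date implied by days_since_publish), ordered by read_percentage:

article_id  title               view_count  publish_date  days_since_publish  daily_read_frequency  read_percentage
----------  ------------------  ----------  ------------  ------------------  --------------------  ---------------
3           Daily Metrics       120         2025-08-25    2                   60.00                 100.00
1           Intro to SQL        900         2025-08-01    26                  34.62                 57.69
2           Index Optimization  200         2025-08-10    17                  11.76                 19.61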

Interpretation:

  • “Daily Metrics” dominates because it has the highest views/day, even with the fewest total views.
  • “Intro to SQL” has the most views overall, but a lower daily frequency because it has been published for much longer.
  • “Index Optimization” is the lowest with only ~20% relative read frequency.

Indexing Considerations

At first glance, you might expect an index on publish_date to speed up the query, since the column is used multiple times in the WHERE conditions. However, this can sometimes increase execution cost:

  1. High match rate – If nearly all articles satisfy the publish_date filter, a full table scan may be faster than index lookups.
  2. Covering vs. partial index – An index on publish_date alone forces additional lookups in the clustered index for title (InnoDB secondary indexes already contain the primary key, article_id).
  3. JOIN strategy bias – MySQL often defaults to the primary key (PK) for LEFT JOINs, even when a covering index exists.
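
These effects can be checked directly with EXPLAIN. The sketch below adds a single-column index under a hypothetical name (idx_article_publish_date) and asks MySQL how it would execute the inner SELECT of the query; the actual plan depends on table size and statistics, so no output is shown.

```sql
-- Hypothetical single-column index on the filter column.
CREATE INDEX idx_article_publish_date ON article (publish_date);

-- In the result, `key` shows the index actually chosen for each table,
-- `type` the access method (ALL = full table scan) and `rows` the estimated rows examined.
EXPLAIN
SELECT a.article_id, a.title, a.publish_date,
  COALESCE(v.view_count, 0) AS view_count
FROM article a
LEFT JOIN `view` v ON a.article_id = v.article_id
WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE();

-- Remove the experimental index again if it does not help.
DROP INDEX idx_article_publish_date ON article;
```

On MySQL 8.0.18 and later, EXPLAIN ANALYZE additionally runs the statement and reports measured timings, which helps when the estimates alone are not convincing.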

Improving the Index

Aligning the index with the JOIN pattern seemed like a better approach. A covering index such as

CREATE INDEX idx_article_publish_covering 
ON article (publish_date, article_id, title);

looked promising, but the execution plan showed that MySQL still favored the PK.
Since the optimizer ignored my index, I tried an index hint to force it:

SELECT ...
FROM article a FORCE INDEX (idx_article_publish_covering)
LEFT JOIN view v ON a.article_id = v.article_id
...
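
One way to confirm whether the hint takes effect is to run EXPLAIN on the hinted and unhinted forms and compare the `key` column for table a. The select list below is a reduced stand-in for illustration, not the full query from this article, and it assumes idx_article_publish_covering has already been created.

```sql
-- Without a hint: the optimizer picks the access path on its own.
EXPLAIN
SELECT a.article_id, a.title, a.publish_date
FROM article a
LEFT JOIN `view` v ON a.article_id = v.article_id
WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE();

-- With FORCE INDEX: a table scan is treated as very expensive,
-- so the named index is used whenever it can be used at all.
EXPLAIN
SELECT a.article_id, a.title, a.publish_date
FROM article a FORCE INDEX (idx_article_publish_covering)
LEFT JOIN `view` v ON a.article_id = v.article_id
WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE();
```

A forced index that appears in the plan is not necessarily a faster one, so the comparison should be backed by measured run times as well.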

The result was disappointing again.
In practice, reordering the columns eventually gave the better result:

CREATE INDEX idx_article_publish_covering
ON article (article_id, publish_date, title);
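
Since this definition reuses the name idx_article_publish_covering, the earlier index has to be dropped first; otherwise MySQL rejects the statement with a duplicate key name error. A sketch of the full swap, followed by a check of what ended up on the table:

```sql
-- Drop the (publish_date, article_id, title) version created earlier.
DROP INDEX idx_article_publish_covering ON article;

-- Recreate it under the same name with the JOIN key first.
CREATE INDEX idx_article_publish_covering
ON article (article_id, publish_date, title);

-- Lists every index on the table with its column order (Seq_in_index) and cardinality estimates.
SHOW INDEX FROM article;
```

Using a different name for the second index would avoid the drop and allow both variants to be compared side by side, at the cost of extra write overhead while both exist.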

Why did this work better?

  • JOIN-first strategy: MySQL prefers starting with article_id, since it’s the JOIN key.
  • Cardinality advantage: article_id is unique, making index navigation more efficient.
  • Execution alignment: MySQL can JOIN on article_id, then filter by publish_date, matching the optimizer’s nested-loop approach.

Key Takeaways

  • Indexes don’t always guarantee better performance—especially with JOIN-heavy queries.
  • Even with good intentions, the engine can ignore your indexes (sometimes rightly so 😊).
  • Index order matters: matching the execution pattern (JOIN keys first, filters second) often outperforms intuitive column orders.

This demonstrates an important SQL optimization lesson: understanding how the optimizer thinks is just as important as writing the correct indexes and queries.