My goal was simply to create, test, and use a relatively simple SQL query. The only complexity comes from aggregate functions and a self-join.
I had two tables, article and view; the latter stores the view count per article. I wanted to calculate a relative measure of read frequency instead of absolute view counts, so that recently published content can be compared fairly with older material.
To normalize this, we can calculate a views-per-day ratio for each article and then compare every article against the one with the highest ratio.
In this article, we’ll walk through:
- Showcasing the schema and the goal to achieve.
- Building the query that calculates relative read frequency.
- Investigating indexing strategies based on execution plans.
Schema Setup
We start with two simple tables:
-- Article table
CREATE TABLE `article` (
  `article_id` int unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(105) DEFAULT NULL,
  `content` text,
  `publish_date` date DEFAULT NULL,
  PRIMARY KEY (`article_id`)
) ENGINE=InnoDB;

-- Views table
CREATE TABLE `view` (
  `article_id` int unsigned NOT NULL,
  `view_count` int unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`article_id`)
) ENGINE=InnoDB;
- The article table stores article data, including the key field publish_date.
- The view table holds the total view count for each article.
Relative Read Frequency
To compute comparable relative popularity, we need:
- Views per day:
view_count / GREATEST(DATEDIFF(CURDATE(), publish_date), 1)
Using GREATEST() with a minimum of 1 day avoids division by zero for articles published today.
- Relative percentage:
Each article’s frequency is divided by the maximum daily frequency across all articles, giving us a normalized read percentage.
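As a quick sanity check, the views-per-day and relative-percentage calculations can be tried with literal values before touching real tables. This is only a sketch: the numbers (300 views, 10 days, a maximum of 60 views per day) and the dates are made up for illustration.

```sql
-- Hypothetical numbers: 300 views, published 10 days ago,
-- and an assumed maximum of 60 views/day across all articles.
SELECT
    300 / GREATEST(10, 1)                         AS daily_read_frequency,
    ROUND((300 / GREATEST(10, 1)) / 60 * 100, 2)  AS read_percentage;
-- daily_read_frequency = 30.0000, read_percentage = 50.00

-- Same-day publication: DATEDIFF() returns 0, so GREATEST() raises the divisor to 1.
SELECT GREATEST(DATEDIFF('2025-08-27', '2025-08-27'), 1) AS days_since_publish;
-- days_since_publish = 1
```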
Final Query
WITH daily_frequencies AS (
SELECT a.article_id, a.title, a.publish_date,
COALESCE(v.view_count, 0) as view_count,
GREATEST(DATEDIFF(CURDATE(), a.publish_date), 1) as days_since_publish,
COALESCE(v.view_count, 0) / GREATEST(DATEDIFF(CURDATE(), a.publish_date), 1) as daily_read_frequency
FROM article a
LEFT JOIN view v ON a.article_id = v.article_id
WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE()
),
max_frequency AS (
SELECT MAX(daily_read_frequency) as max_daily_frequency
FROM daily_frequencies
)
SELECT df.article_id, df.title, df.view_count,
df.publish_date, df.days_since_publish, df.daily_read_frequency,
CASE
WHEN mf.max_daily_frequency > 0
THEN ROUND((df.daily_read_frequency / mf.max_daily_frequency) * 100, 2)
ELSE 0
END as read_percentage
FROM daily_frequencies df
CROSS JOIN max_frequency mf
ORDER BY read_percentage DESC;
Result:
- The most frequently read article gets 100% and others are scaled accordingly.
- Articles that are not yet published are excluded.
- Articles with no views appear with 0%.
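These rules can be exercised with a few extra rows. The rows below are hypothetical (the ids 101 to 103 and their titles are not part of the original data) and assume the schema shown earlier.

```sql
-- Hypothetical edge-case rows
INSERT INTO article (article_id, title, content, publish_date) VALUES
    (101, 'Draft without date', NULL, NULL),                                -- unpublished: removed by IS NOT NULL
    (102, 'Scheduled post',     NULL, DATE_ADD(CURDATE(), INTERVAL 7 DAY)), -- future date: removed by <= CURDATE()
    (103, 'Published, unread',  NULL, CURDATE());                           -- no row in view: LEFT JOIN yields NULL

-- Of the three, only article 103 passes the filter;
-- COALESCE() turns its missing view_count into 0.
SELECT a.article_id, COALESCE(v.view_count, 0) AS view_count
FROM article a
LEFT JOIN view v ON a.article_id = v.article_id
WHERE a.publish_date IS NOT NULL
  AND a.publish_date <= CURDATE()
  AND a.article_id IN (101, 102, 103);
```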
Example Output
article Table
| article_id | title | publish_date |
|---|---|---|
| 1 | Intro to SQL | 2025-08-01 |
| 2 | Index Optimization | 2025-08-10 |
| 3 | Daily Metrics | 2025-08-25 |
view Table
| article_id | view_count |
|---|---|
| 1 | 900 |
| 2 | 200 |
| 3 | 120 |
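To reproduce the example, the sample rows could be loaded roughly like this. It is a minimal sketch that assumes empty tables; the content column is left out because it is nullable and not used by the query.

```sql
-- Sample rows taken from the two tables above
INSERT INTO article (article_id, title, publish_date) VALUES
    (1, 'Intro to SQL',       '2025-08-01'),
    (2, 'Index Optimization', '2025-08-10'),
    (3, 'Daily Metrics',      '2025-08-25');

INSERT INTO view (article_id, view_count) VALUES
    (1, 900),
    (2, 200),
    (3, 120);
```

Because the query uses CURDATE(), days_since_publish and everything derived from it depend on the day the query is run.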
Query Output (with CURDATE() = 2025-08-27)
| article_id | title | view_count | publish_date | days_since_publish | daily_read_frequency | read_percentage |
|---|---|---|---|---|---|---|
| 3 | Daily Metrics | 120 | 2025-08-25 | 2 | 60.00 | 100.00 |
| 1 | Intro to SQL | 900 | 2025-08-01 | 26 | 34.62 | 57.69 |
| 2 | Index Optimization | 200 | 2025-08-10 | 17 | 11.76 | 19.61 |
Interpretation:
- “Daily Metrics” dominates, because it has the highest views/day, even with fewer total views.
- “Intro to SQL” has many views overall, but a lower daily frequency because it was published many more days ago.
- “Index Optimization” is the lowest with only ~20% relative read frequency.
Indexing Considerations
At first glance, you might expect an index on publish_date to speed up the query, since the column is used twice in the WHERE clause. However, it can sometimes increase execution cost:
- High match rate – If nearly all articles satisfy the publish_date filter, a full table scan may be faster than index lookups.
- Covering vs. partial index – An index on publish_date alone forces additional lookups for title (article_id is already part of every InnoDB secondary index).
- JOIN strategy bias – MySQL often defaults to the primary key (PK) for LEFT JOINs, even when a covering index exists.
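Whether any of these effects applies to your data can be checked in the execution plan. The following is a sketch only: the index name idx_article_publish_date is a hypothetical one chosen for this experiment, and the query is a trimmed version of the first CTE rather than the full statement.

```sql
-- Hypothetical single-column index, used only for this experiment
CREATE INDEX idx_article_publish_date ON article (publish_date);

-- Classic EXPLAIN: the "key" column shows which index was chosen,
-- and "Using index" in "Extra" indicates index-only (covering) access.
EXPLAIN
SELECT a.article_id, a.title, a.publish_date,
       COALESCE(v.view_count, 0) AS view_count
FROM article a
LEFT JOIN view v ON a.article_id = v.article_id
WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE();

-- On MySQL 8.0.18 or later, EXPLAIN ANALYZE also executes the query
-- and reports actual row counts and timings per step.
EXPLAIN ANALYZE
SELECT a.article_id, a.title, a.publish_date,
       COALESCE(v.view_count, 0) AS view_count
FROM article a
LEFT JOIN view v ON a.article_id = v.article_id
WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE();
```

With only a handful of rows the optimizer will almost always prefer a full scan, so conclusions are only meaningful on a realistically sized table.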
Improving the Index
So it seemed like a better approach to align the index with the JOIN pattern. The following covering index looked promising:
CREATE INDEX idx_article_publish_covering
ON article (publish_date, article_id, title);
However, the execution plan showed that MySQL still favored the PK.
If the optimizer keeps ignoring my index, I can use a hint to help:
SELECT ...
FROM article a FORCE INDEX (idx_article_publish_covering)
LEFT JOIN view v ON a.article_id = v.article_id
...
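That result was disappointing as well. In practice, reordering the columns eventually gave the better result:
-- Drop the earlier index first, because the name is reused
DROP INDEX idx_article_publish_covering ON article;
CREATE INDEX idx_article_publish_covering
ON article (article_id, publish_date, title);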
Why did this work better?
- JOIN-first strategy: MySQL prefers starting with article_id, since it’s the JOIN key.
- Cardinality advantage: article_id is unique, making index navigation more efficient.
- Execution alignment: MySQL can JOIN on article_id, then filter by publish_date, matching the optimizer’s nested-loop approach.
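One way to verify these points on your own data is to look at the index statistics and to compare plans with and without the index. This is a hedged sketch: it assumes idx_article_publish_covering exists as created above, and it again uses a trimmed version of the query.

```sql
-- Lists every index on the table; the Cardinality column is an estimate
-- of the number of distinct values in the index.
SHOW INDEX FROM article;

-- Plan the optimizer picks on its own
EXPLAIN
SELECT a.article_id, a.title, a.publish_date, v.view_count
FROM article a
LEFT JOIN view v ON a.article_id = v.article_id
WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE();

-- Plan when the secondary index is taken out of consideration
EXPLAIN
SELECT a.article_id, a.title, a.publish_date, v.view_count
FROM article a IGNORE INDEX (idx_article_publish_covering)
LEFT JOIN view v ON a.article_id = v.article_id
WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE();
```

If the "key" column and the estimated row counts are the same in both plans, the index is not contributing to this query.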
Key Takeaways
- Indexes don’t always guarantee better performance, especially with JOIN-heavy queries.
- Even with good intentions, the engine can ignore your indexes (sometimes rightly 😊).
- Index order matters: matching the execution pattern (JOIN keys first, filters second) often outperforms intuitive column orders.
This demonstrates an important SQL optimization lesson: understanding how the optimizer thinks is just as important as writing the correct indexes and queries.