Optimizing Relative Read Frequency Queries in SQL

My goal was just to create, test and use a relatively simple SQL query. Some complexity comes from aggregate functions and table self joins only.
I had 2 tables, article and view, where I store view count. I wanted to calculate a relative measure of read frequency, instead of absolute view counts. So recently published content is more comparable to older material.
To normalize this, we can calculate a views-per-day ratio and then compare all articles against the highest ratio one.

In this article, we’ll walk through:

  1. Showcasing the schema and the goal to achieve.
  2. Building the query that calculates relative read frequency.
  3. Investigating indexing strategies based on execution plans.

Schema Setup

We start with two simple tables:

-- Article table
CREATE TABLE `article` (
  `article_id` int unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(105) DEFAULT NULL,
  `content` text,
  `publish_date` date DEFAULT NULL,
  PRIMARY KEY (`article_id`)
) ENGINE=InnoDB;

-- Views table
CREATE TABLE `view` (
  `article_id` int unsigned NOT NULL,
  `view_count` int unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`article_id`)
) ENGINE=InnoDB;
  • article stores article data, including key field: publish_date.
  • view holds the total view counts for each article.

Relative Read Frequency

To compute comparable relative popularity, we need:

  • Views per day:

Using GREATEST() ensures we avoid division by zero for same-day publications.

  • Relative percentage:
    Each article’s frequency is divided by the maximum daily frequency across all articles, giving us a normalized read percentage.

Final Query

WITH daily_frequencies AS (
  SELECT a.article_id, a.title, a.publish_date,
    COALESCE(v.view_count, 0) as view_count,    
    GREATEST(DATEDIFF(CURDATE(), a.publish_date), 1) as days_since_publish,
    COALESCE(v.view_count, 0) / GREATEST(DATEDIFF(CURDATE(), a.publish_date), 1) as daily_read_frequency
  FROM article a
  LEFT JOIN view v ON a.article_id = v.article_id
  WHERE a.publish_date IS NOT NULL AND a.publish_date <= CURDATE()
),
max_frequency AS (
  SELECT MAX(daily_read_frequency) as max_daily_frequency
  FROM daily_frequencies
)
SELECT df.article_id, df.title, df.view_count, 
  df.publish_date, df.days_since_publish, df.daily_read_frequency,
  CASE 
    WHEN mf.max_daily_frequency > 0 
    THEN ROUND((df.daily_read_frequency / mf.max_daily_frequency) * 100, 2)
    ELSE 0
  END as read_percentage
FROM daily_frequencies df
CROSS JOIN max_frequency mf
ORDER BY read_percentage DESC;

Result:

  • The most frequently read article gets 100% and others are scaled accordingly.
  • Exclude not published articles
  • Articles with no views appear with 0%.

Example Output

article Table

article_idtitlepublish_date
1Intro to SQL2025-08-01
2Index Optimization2025-08-10
3Daily Metrics2025-08-25

view Table

article_idview_count
1900
2200
3120

Query Output

article_idtitleview_countpublish_datedays_since_publishdaily_read_frequencyread_percentage
3Daily Metrics1202025-08-25260.00100.00
2Index Optimization2002025-08-101711.7619.60
1Intro to SQL9002025-08-012634.6257.70

Interpretation:

  • Daily Metrics” dominates, because it has the highest views/day, even with fewer total views.
  • Intro to SQL” has many views overall, but a lower daily frequency because of day since published.
  • “Index Optimization” is the lowest with only ~20% relative read frequency.

Indexing Considerations

At first glance, you might expect adding an index on publish_date to speed up the query, since it used multiple times in WHERE conditions. However, this can sometimes increase execution cost:

  1. High match rate – If nearly all articles satisfy the publish_date filter, a full table scan may be faster than index lookups.
  2. Covering vs. partial index – Indexing only publish_date forces additional lookups for title and article_id.
  3. JOIN strategy bias – MySQL often defaults to the primary key (PK) for LEFT JOINs, even when a covering index exists.

Improving the Index

So seemed like a better approach to align the index with the JOIN pattern. While

CREATE INDEX idx_article_publish_covering 
ON article (publish_date, article_id, title);

looked promising, however investigating the exec. plan showed: MySQL still favors the PK.
So if the optimizer still ignores my index, I’ll use hints to help:

SELECT ...
FROM article a FORCE INDEX (idx_article_publish_covering)
LEFT JOIN view v ON a.article_id = v.article_id
...

However after disappointing result again,
eventually in practice, reordering the columns gave the better result:

CREATE INDEX idx_article_publish_covering
ON article (article_id, publish_date, title);

Why does this worked better?

  • JOIN-first strategy: MySQL prefers starting with article_id, since it’s the JOIN key.
  • Cardinality advantage: article_id is unique, making index navigation more efficient.
  • Execution alignment: MySQL can JOIN on article_id, then filter by publish_date, matching the optimizer’s nested-loop approach.

Key Takeaways

  • Indexes don’t always guarantee better performance—especially with JOIN-heavy queries.
  • Even with good intentions the engine can ignore your indexes (sometimes righteously 😊)
  • Index order matters: matching the execution pattern (JOIN keys first, filters second) often outperforms intuitive column orders.

This demonstrates an important SQL optimization lesson: understanding how the optimizer thinks is just as important as writing the correct indexes and queries.

Expanding on Property Injection with Spring Boot Auto-Configuration

In a previous post, we discovered how auto-configuration in Spring Boot enables bean and configuration creation.

In Spring, managing settings, configuring beans, or handling application constants, properly injecting properties into the environment is crucial. Here we’ll explore different methods of injecting properties in a Spring Boot auto-configuration setup.

Continue reading Expanding on Property Injection with Spring Boot Auto-Configuration

Discover the Different Auto-Configuration Options in Spring Boot

The Spring Boot Auto-Configuration feature simplifies: automatic configuration of beans and components. When Spring Boot detects specific libraries or components in your project (like database dependencies or web modules), it automatically configures those components for you without needing explicit manual configuration.

Key Concepts:

  1. Auto-Configuration Conditions: Spring Boot applies auto-configuration on conditional and profile driven way to be classes available on the classpath or specific properties set in application.properties or application.yml.
  2. EnableAutoConfiguration: This annotation enables auto-configuration, it can be fine-tuned by excluding specific auto-configurations using exclude or excludeName attributes.
  3. @Conditional Annotations: Auto-configuration relies heavily on Spring’s @Conditional annotations, which control whether certain configurations should be applied based on the environment or classpath.
  4. Diagnostics: spring-boot-actuator provides insight into which configurations have been applied, helping with debugging and understanding the auto-configured components.

Documentation here.

Continue reading Discover the Different Auto-Configuration Options in Spring Boot

Mastering Lambda Expressions with Generic Types in Java

In Java, SAM (Single Abstract Method) interfaces are a key feature that enable the use of lambda expressions and functional programming concepts. A SAM is an interface that contains exactly one abstract method, common examples of SAM interfaces in Java include Runnable, Callable, and Comparator.
Any custom interface with a single abstract method can be treated as a functional interface, to be able to costumize code.

Continue reading Mastering Lambda Expressions with Generic Types in Java

Agile Challenges in Scale

Agile

Agile is about co-located smaller highly effective teams, which can focus and provide values in fast pace, driven by ownership and team spirit.
In Agile practices, team ceremonies (such as standups, sprint reviews, and retrospectives) are esential to improve communication, alignment, and efficiency. However, when Agile teams grow too large, these ceremonies can lose their effectiveness.
I like to highlight how large team sizes and poorly structured meetings can hinder productivity, communication, and decision-making, advocating for smaller team structures to preserve Agile’s core values.

Agile methodologies are built around small, cross-functional teams designed to work closely together, rapidly iterate, and adjust to changes. However, as teams grow in size, they often encounter communication bottlenecks and coordination challenges.

Continue reading Agile Challenges in Scale

Correlation Identifier: Ensuring Traceability in Distributed Systems

Correlation identifier of requests-responses is an essential feature of microservice platforms for monitoring, reporting, debugging and diagnostics.
Allows tracing a single request inside the application flow, when it can often be dealt with by multiple downstream services.

Problem statement

The most trending direction in software development is distributing processing and systems. In software architectual system designs there are multiple service layouts: single monolithic systems, SOA distribution, and nowadays microservice level grouping of services, applications.
In case of multiple services, we should consider, even multiple running entities from one specific service.

In case of looking into the logs, we could have a harder time to track down the chain of calls and locate the specific instances, which was hit by the request, through serving a user request.

Continue reading Correlation Identifier: Ensuring Traceability in Distributed Systems

Leveraging Server-Sent Events (SSE) with Spring WebFlux for Real-Time Data Streaming

Introduction

What options we have to gather long processing data from the backend to send out to the UI?
Usually the followings:

  • open frequent connections – polling
  • keep an open connection – websocket

Server Sent Events

With Server Sent Events you’re able to send continuous updates from the server.
Imagine as a client subscription to a server for a stream.
Sometimes we don’t need a full bi-directional protocol (sending data back from client side), just a non-blocking data flow with the ability to process data row by row, while still receiving from the connection.
These are updates, time splitted data chunks from the server side to provide a fluent UI experience.

Continue reading Leveraging Server-Sent Events (SSE) with Spring WebFlux for Real-Time Data Streaming

Enhancing Unit Testing with various Test Doubles

The topic tries to cover high level of Test Doubles, Mocks. Here we have many named concepts around, Fake, Mock, Spy, Stub, etc…

Testing with mocks, stubs is primarily happens with unit, integration testing.
In object oriented world, bigger or smaller units of codes are tested, and while we having SRP objects, we still have to deal with dependencies and other object structures while we processing code.
To clean cut and provide a stable state we’re should fix those parameters/dependencies, which are not in the question from the testing perspective.

Continue reading Enhancing Unit Testing with various Test Doubles

Multilevel-KeySearch – Solution for Complex Data Lookup Strategies

Why? What’s this all about?

This is a library to handle (store-search) multi level key attributes.
The implementation was driven by the need of

  • handle multiple level keys
  • flexible search
  • ranking/ordering

on the matches .

Lets consider data with multiple levels: category / subcategory / group / item levels for example.
If you want to find specific let’s say: subcategory, that’s an easy task. You can filter/map and you get it. But what if you have multiple attribute to match, multiple key lengths? How will you sort them? This task can be achieved, but requires more and more code and you lose fluency.

Continue reading Multilevel-KeySearch – Solution for Complex Data Lookup Strategies