Token Efficiency for LLM assisted Development

How seven load-bearing principles, applied across chat sessions and agentic pipelines, keep LLM dev costs manageable without degrading what the tools produce.

Occasionally I ran into problems that most people working with LLMs eventually hit. Long sessions forgot their own constraints. Multi-file investigations dumped thousands of tokens into the main context and never gave them back. Pipelines paid full price for content that should have been cached. None of it was the model’s fault; all of it could be fixed just by changing how I worked with the tools. The patterns below come from experience and research on scaling LLM-assisted development. They are design choices about where to spend tokens and where not to.

One-line thesis: token efficiency is a design discipline: it is about effectiveness, not about being cheap.



Building an Agentic Dev Pipeline — From Ad-Hoc Prompting to a Repeatable Protocol

How eleven design decisions, a structured interview technique, and two effectiveness axes turned a slash command into a self-managing dev loop.

I had a problem that most people using LLMs for development eventually hit: inconsistency. Sometimes I’d get well-structured code with tests. Sometimes I’d get a half-finished implementation with no tests and no explanation of what changed. Sometimes I’d ask the same question twice and get architecturally different answers. The issue wasn’t the model — it was me. Every session started from scratch. No shared protocol. No handoffs. Just vibes.

The pipeline I built this week replaces that. It’s not a framework or a library — it’s nine files that define a protocol: who does what, in what order, with what information, and when to ask me before proceeding. The output is repeatable. The quality is auditable. And the design decisions that shaped it contain, I think, some generally useful lessons about building agentic systems.

One-line thesis: a well-designed agentic pipeline is a protocol, not a prompt.


Tuning RAG Retrieval Quality with the Autoresearch Pattern

Applying Karpathy’s autoresearch loop to measure and systematically improve RAG retrieval — from gut-feel tuning to +68% MRR across 8 eval runs.

A retrieval pipeline has a lot of knobs. Dense vs. sparse. Hybrid on or off. Time-decay reranking on or off. Per-collection fusion weights for every source feeding the index. I had built one with all of them, and I had gut feelings about every setting. Hybrid probably helps. Time decay probably helps for recency-sensitive queries. The fusion weights were whatever felt reasonable when I first wrote them.

What I didn’t have was a way to tell whether a change made things better or just different. Tweak a weight, run a few questions, eyeball the results — that’s not measurement, that’s superstition.

This article is the story of how I replaced the superstition with a small, fast evaluation harness, and how an automated loop borrowed from a recent Karpathy project ended up finding a +68% MRR improvement over the dense-only baseline I had been quietly running for months.
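
For readers unfamiliar with the metric: MRR (mean reciprocal rank) averages, over all evaluation queries, the reciprocal of the position at which the first relevant result appears, counting a query with no relevant result as 0. The excerpt does not show the harness itself, so the following is only a minimal Java sketch of the standard definition; the class, record, and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

final class MrrSketch {

    // Hypothetical eval case: the ranked ids a retriever returned, plus the ids judged relevant.
    record EvalCase(List<String> rankedIds, Set<String> relevantIds) {}

    // Reciprocal rank for one query: 1 / (1-based position of the first relevant hit), or 0 if none.
    static double reciprocalRank(List<String> rankedIds, Set<String> relevantIds) {
        for (int i = 0; i < rankedIds.size(); i++) {
            if (relevantIds.contains(rankedIds.get(i))) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }

    // Mean of the per-query reciprocal ranks; an empty eval set yields 0.
    static double meanReciprocalRank(List<EvalCase> cases) {
        return cases.stream()
                .mapToDouble(c -> reciprocalRank(c.rankedIds(), c.relevantIds()))
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        List<EvalCase> cases = List.of(
                new EvalCase(List.of("d1", "d2"), Set.of("d1")),  // first hit at rank 1
                new EvalCase(List.of("d3", "d4"), Set.of("d4")),  // first hit at rank 2
                new EvalCase(List.of("d5"), Set.of("d9")));       // no relevant hit
        System.out.println(meanReciprocalRank(cases));
    }
}
```

Because the score is a single number per eval run, two retrieval configurations can be compared directly instead of by eyeballing result lists.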


My Always-On Dev Environment pt. 2 – More Than One Stack

In part 1 I set up a Raspberry Pi 5 as an always-on dev box: code-server running as a systemd service, and a single PHP/MySQL/Vite project running inside Docker Compose. The whole point of keeping the host clean — no PHP, no Node, no MySQL installed directly on the Pi — was that the Pi was never going to stay a single-project machine. Sooner or later I’d want to run something else next to it without the two stacks fighting each other. This post is what happens when “sooner or later” arrives.

Running More Than One Stack on the Same Pi

The PHP/MySQL/Vite project is the one I’ve been using as an example, but the whole point of keeping the host clean is that it’s never just one project. The Pi doesn’t care what language a project is written in — it just needs a Compose file and a free range of ports.

The pattern I follow is simple and boring, which is exactly what you want:

  • One folder per project, each with its own docker-compose.yml.
  • One Docker network per project (Compose creates this automatically from the folder name), so services in different projects can’t accidentally see each other.
  • A port allocation note in my own head — or in a text file — so projects don’t collide on the host ports.

My Always-On Dev Environment on a Raspberry Pi 5

There’s a particular kind of friction every developer knows: you sit down at a different machine, your laptop instead of your desktop, or your phone while waiting for a train, and suddenly your project feels far away. The repo isn’t cloned. The Node version is wrong. The database is empty. The ten minutes you had to jot down an idea evaporate into setup.

I wanted to get rid of that friction entirely. The result is a small, always-on development server running on a Raspberry Pi 5 in the corner of my room, and a setup where every device I own (desktop, laptop, Android phone) is just a different window into the same live project.

This article is the story of how that environment is built, why it’s shaped the way it is, and where its limits are.


Optimizing Relative Read Frequency Queries in SQL

My goal was just to create, test, and use a relatively simple SQL query. The only complexity comes from aggregate functions and table self-joins.
I had two tables, article and view, where view stores the view count. I wanted to calculate a relative measure of read frequency instead of absolute view counts, so that recently published content is comparable to older material.
To normalize this, we can calculate a views-per-day ratio for each article and then compare all articles against the one with the highest ratio.

In this article, we’ll walk through:

  1. Showcasing the schema and the goal to achieve.
  2. Building the query that calculates relative read frequency.
  3. Investigating indexing strategies based on execution plans.
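
The views-per-day normalization described above can also be sketched outside the database. The following Java snippet is only an in-memory illustration of the arithmetic, not the article's SQL query: the record and method names are hypothetical, and it assumes each article comes with a single total view count and a publication date.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReadFrequencySketch {

    // Hypothetical stand-in for an article row joined with its total view count.
    record ArticleViews(String title, LocalDate publishedOn, long views) {}

    // Views per day since publication. Anything younger than one day counts as
    // one day old (an assumption), so the division never hits zero.
    static double viewsPerDay(ArticleViews a, LocalDate today) {
        long days = Math.max(1, ChronoUnit.DAYS.between(a.publishedOn(), today));
        return (double) a.views() / days;
    }

    // Relative read frequency: each ratio divided by the highest ratio,
    // so the article read most often per day scores 1.0.
    static Map<String, Double> relativeFrequency(List<ArticleViews> articles, LocalDate today) {
        double max = articles.stream()
                .mapToDouble(a -> viewsPerDay(a, today))
                .max()
                .orElse(0.0);
        Map<String, Double> result = new LinkedHashMap<>();
        for (ArticleViews a : articles) {
            result.put(a.title(), max == 0.0 ? 0.0 : viewsPerDay(a, today) / max);
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2024, 1, 31);
        List<ArticleViews> articles = List.of(
                new ArticleViews("old", LocalDate.of(2023, 1, 31), 730), // 365 days -> 2 views/day
                new ArticleViews("new", LocalDate.of(2024, 1, 21), 40)); // 10 days  -> 4 views/day
        System.out.println(relativeFrequency(articles, today)); // {old=0.5, new=1.0}
    }
}
```

The older article has far more total views, yet the newer one ranks higher once both are measured per day, which is the effect the relative measure is meant to capture.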


Expanding on Property Injection with Spring Boot Auto-Configuration

In a previous post, we discovered how auto-configuration in Spring Boot enables bean and configuration creation.

Whether you are managing settings, configuring beans, or handling application constants in Spring, injecting properties into the environment properly is crucial. Here we’ll explore different methods of injecting properties in a Spring Boot auto-configuration setup.


Discover the Different Auto-Configuration Options in Spring Boot

Spring Boot’s auto-configuration feature simplifies application setup by configuring beans and components automatically. When Spring Boot detects specific libraries or components in your project (like database dependencies or web modules), it configures those components for you without explicit manual configuration.

Key Concepts:

  1. Auto-Configuration Conditions: Spring Boot applies auto-configuration in a conditional, profile-driven way, based on classes available on the classpath or specific properties set in application.properties or application.yml.
  2. @EnableAutoConfiguration: This annotation enables auto-configuration; it can be fine-tuned by excluding specific auto-configurations using the exclude or excludeName attributes.
  3. @Conditional Annotations: Auto-configuration relies heavily on Spring’s @Conditional annotations, which control whether certain configurations should be applied based on the environment or classpath.
  4. Diagnostics: spring-boot-actuator provides insight into which configurations have been applied, helping with debugging and understanding the auto-configured components.

Documentation here.


Mastering Lambda Expressions with Generic Types in Java

In Java, SAM (Single Abstract Method) interfaces are a key feature that enables lambda expressions and functional programming concepts. A SAM interface contains exactly one abstract method; common examples in Java include Runnable, Callable, and Comparator.
Any custom interface with a single abstract method can be treated as a functional interface, which lets callers customize behavior by passing in a lambda.
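
As a minimal sketch of that idea, the snippet below defines a custom generic SAM interface and implements it with a lambda and a method reference. The names Transformer and mapAll are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class GenericLambdaSketch {

    // A custom SAM interface: exactly one abstract method, so a lambda can implement it.
    // @FunctionalInterface is optional; it makes the compiler enforce the single-method rule.
    @FunctionalInterface
    interface Transformer<T, R> {
        R apply(T input);
    }

    // Generic helper that applies a Transformer to every element of a list.
    static <T, R> List<R> mapAll(List<T> items, Transformer<? super T, ? extends R> transformer) {
        List<R> out = new ArrayList<>(items.size());
        for (T item : items) {
            out.add(transformer.apply(item));
        }
        return out;
    }

    public static void main(String[] args) {
        // The compiler infers T = String and R = Integer from the arguments and the target type.
        List<Integer> lengths = mapAll(List.of("a", "bb", "ccc"), String::length);
        System.out.println(lengths); // [1, 2, 3]

        // The same interface, implemented with a lambda for different type arguments.
        Transformer<Integer, String> label = n -> "#" + n;
        System.out.println(label.apply(7)); // #7

        // Built-in SAM interfaces such as Runnable work the same way.
        Runnable task = () -> System.out.println("ran");
        task.run();
    }
}
```

The type parameters let one interface serve many input and output types, while the lambda supplies only the behavior that differs.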


Agile Challenges in Scale

Agile

Agile is about small, co-located, highly effective teams that can focus and deliver value at a fast pace, driven by ownership and team spirit.
In Agile practices, team ceremonies (such as standups, sprint reviews, and retrospectives) are essential to improve communication, alignment, and efficiency. However, when Agile teams grow too large, these ceremonies can lose their effectiveness.
I would like to highlight how large team sizes and poorly structured meetings can hinder productivity, communication, and decision-making, and to advocate for smaller team structures that preserve Agile’s core values.

Agile methodologies are built around small, cross-functional teams designed to work closely together, rapidly iterate, and adjust to changes. However, as teams grow in size, they often encounter communication bottlenecks and coordination challenges.
