Correlation Identifier

A correlation identifier for requests and responses is an essential feature of microservice platforms for monitoring, reporting, debugging, and diagnostics.
It allows tracing a single request through the application flow, even when the request is handled by multiple downstream services.

Problem statement

The trending direction in software development is distributed processing and distributed systems. In software architecture there are multiple service layouts: single monolithic systems, SOA distribution, and nowadays microservice-level grouping of services and applications.
With multiple services we should also consider that one specific service may have multiple running instances.

When looking into the logs, it becomes harder to track down the chain of calls and locate the specific instances that were hit while serving a user request.

A request passes through the GraphQL service and then the User, Order, Payment, and Shipping services on different instances

You will need to centralize the log messages from multiple services and multiple instances, order the log entries chronologically, and then somehow group the relevant ones.
This is where the correlation identifier comes into the picture. It uniquely identifies the request: it is generated once, passed through, and persisted along with the log entries of the systems.

With propagated identifiers instead of various unrelated ones, you'll have consistent log ids through a specific request life-cycle across every single service.

Solutions

You have multiple options for how to generate, transport, and store this identifier.

How?
Generally you want to generate it at the first entry point and pass it down.
In the example above, the GraphQL proxy would serve as the entry point, which makes it easy to handle generation and addition in one place.
In other layouts I would suggest the following:

  • if you don't have an id, generate one and add it to the request
  • if you received an id, use it and pass it along with the outgoing requests

You have to define your HTTP interface and pass this data on requests. Similar options are applicable to non-HTTP requests.


You can extend the request/response body (POST requests, responses with payload); however, this way your model interfaces will be distorted by a non-product-specific attribute.
A better option is to use a custom header or a cookie (Header: x-correlation-id).
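
As a rough sketch of the entry-point logic, assuming a servlet-based service and the x-correlation-id header from above (the filter is illustrative, not a specific library's API):

import java.io.IOException;
import java.util.UUID;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative entry-point filter: reuse the incoming id or generate a new one.
public class CorrelationIdFilter implements Filter {
    private static final String HEADER = "x-correlation-id";

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        String correlationId = request.getHeader(HEADER);
        if (correlationId == null || correlationId.isEmpty()) {
            correlationId = UUID.randomUUID().toString(); // no id received: generate one
        }
        // make the id available to the request handling and echo it back to the caller
        request.setAttribute(HEADER, correlationId);
        ((HttpServletResponse) res).setHeader(HEADER, correlationId);
        chain.doFilter(req, res);
    }
}

Outgoing calls then simply copy the same header onto their requests.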

What value?
The correlation id value should be a unique identifier, so a UUID is a good candidate (while there is a theoretical chance of collision, the probability is extremely low).

The correlation id could also contain business-level identifiers. This way the id becomes more informative and even gains extra query-grouping ability. In simple systems it could make sense to use a basic product id: it's simpler and smaller.
You can also combine the two approaches, using one or two business identifiers together with a timestamp or a generated value, e.g. UniqueId-productType-UUIDsmallHash.
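
A minimal sketch of such a combined value (orderId and productType are assumed business fields, made up for illustration):

String orderId = "12345";     // business-level unique id (assumed)
String productType = "BOOK";  // business-level type info (assumed)
// e.g. "12345-BOOK-3f2a1b7c", following the UniqueId-productType-UUIDsmallHash idea
String correlationId = orderId + "-" + productType + "-"
        + java.util.UUID.randomUUID().toString().substring(0, 8);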

Result

Once all these decisions are made, we should put the identifier into the application logs.
From then on, you only have to use an aggregated log solution to take advantage of the correlation id
and track the request along the services with simple querying:
either with a simple grep, if the logs are accessible from a central place, or with log aggregation applications such as Loki, Loggly, Splunk, etc.
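
One common way to persist the id along the log entries is SLF4J's MDC; a sketch, assuming the key name correlationId and a log pattern containing %X{correlationId}:

import org.slf4j.MDC;

// bind the id to the current thread; every log line of this request will carry it
MDC.put("correlationId", correlationId);
try {
    // ... handle the request; log statements now include the id via %X{correlationId}
} finally {
    MDC.remove("correlationId"); // don't leak the id to the next request on this thread
}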

In the second article, Correlation Identifier - In Practice, I'll present in more detail examples of how you can effectively generate and process the identifier, and how to add the information to the logs.

Mentions:

Amazon uses X-Amzn-Trace-Id https://aws.amazon.com/premiumsupport/knowledge-center/trace-elb-x-amzn-trace-id/

Spring Cloud Sleuth – uses a trace-id and provides span-ids as work units

correlation-id is used by messaging systems, e.g. ActiveMQ, RabbitMQ

Spring WebFlux + Server-Sent Events (SSE)

Introduction

What options do we have to deliver data from a long-running backend process to the UI?
Usually the following:

  • open frequent connections – polling
  • keep an open connection – WebSocket

Server-Sent Events

With Server-Sent Events you're able to send continuous updates from the server.
Imagine it as a client subscribing to a stream from the server.
Sometimes we don't need a full bi-directional protocol (sending data back from the client side), just a non-blocking data flow with the ability to process data row by row while still receiving from the connection.
These are updates: time-split data chunks from the server side that provide a fluent UI experience.

advantages:

  • non-blocking data flow over a single connection
  • row-by-row processing while data is still arriving

disadvantages:

  • unidirectional protocol
  • simultaneous request limit (browser default: 6)

WebFlux

With reactive programming, the purpose of this type of data processing is fluency and a non-blocking nature.

There are further conceptual advantages, such as backpressure and the split, delayed, and rate-limited processing of data.

It supports asynchronous and event-driven programming with a fluent, stream-like API.

The reactive API has been present in Spring since version 5.

It can be used, with the appropriate content types, between:

  • backend and backend (SOA/microservice calls)
  • backend and UI (server-browser calls)

Example Project

I picked some sample data from a movie database.
Our sample application is very simple: we stream the file line by line, convert each line into a data model, and produce the result as a stream on the endpoint.
The client side subscribes to the backend endpoint and processes the data line by line.
This way we have a continuous stream from the data source all the way to the client visualization.

Backend stream logs

UI streaming processing and visualization

Besides the standard Flux stream, we only needed to use the TEXT_EVENT_STREAM media type, while on the UI client side the EventSource interface is required to capture the stream.
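
A minimal sketch of such an endpoint (not the actual project code; the Movie model, file name, and parsing are placeholders):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.stream.Stream;

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class MovieStreamController {

    @GetMapping(value = "/movies", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<Movie> streamMovies() {
        // open the file lazily and tie the stream's lifecycle to the Flux
        return Flux.using(
                () -> Files.lines(Paths.get("movies.csv")),
                Flux::fromStream,
                Stream::close)
            .map(Movie::fromCsvLine)               // convert each line to the data model
            .delayElements(Duration.ofMillis(50)); // time-split the chunks for the UI
    }

    // placeholder data model (assumed; the real project parses more fields)
    static class Movie {
        private final String title;
        Movie(String title) { this.title = title; }
        static Movie fromCsvLine(String line) { return new Movie(line.split(",")[0]); }
        public String getTitle() { return title; }
    }
}

On the client side, new EventSource('/movies') subscribes to this endpoint and receives every movie as a separate event.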

For more details see the Webflux + SSE example project: https://github.com/PazsitZ/webflux_sse

Multilevel-KeySearch

Why? What’s this all about?

This is a library to handle (store and search) multi-level key attributes.
The implementation was driven by the need to:

  • handle multi-level keys
  • search flexibly
  • rank/order the matches

Let's consider data with multiple levels, for example: category / subcategory / group / item.
If you want to find a specific, let's say, subcategory, that's an easy task: you can filter/map and you get it. But what if you have multiple attributes to match, or multiple key lengths? How will you sort the results? This can still be achieved, but it requires more and more code and you lose fluency, as the sketch below illustrates.
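
With plain streams a single-attribute match is a one-liner, but every extra attribute or key length adds hand-written conditions (keys are assumed to be String arrays here; getKeys() and the filter values are made up):

import java.util.List;
import java.util.stream.Collectors;

List<String[]> keys = getKeys(); // hypothetical source of { category, subcategory, ... } keys

// one attribute: easy
List<String[]> bySubcategory = keys.stream()
        .filter(k -> "human".equals(k[1]))
        .collect(Collectors.toList());

// several attributes and varying key lengths: conditions and sorting pile up
List<String[]> multiMatch = keys.stream()
        .filter(k -> k.length > 1 && "human".equals(k[1]))
        .filter(k -> k.length < 3 || "worker".equals(k[2]))
        .sorted((a, b) -> Integer.compare(a.length, b.length))
        .collect(Collectors.toList());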

The details

The idea was that you can easily instantiate keys and then define a collection where the filter and sort algorithms are pre-defined on the collection itself, so you can use it almost the regular way.
The key has the capability to define a wildcard, which is customizable.

As an example, we define a flexible matching key length and a comparison which uses null as the wildcard:

new CustomMultiKey(
    KeyLengthRule.matchSmaller(),            // flexible key length: a shorter key can match too
    KeyComparisonRelation.equalAndAny(null), // elements match on equality; null acts as the wildcard
    null
);

I chose a map to store the keys; this way each key can carry a value, or the key itself can serve as the value.

Let's see an example:

MultiKeyMap<String, String> map = getMap();
// wildcard on the first level, match HUMAN on the second; return the first (ranked) match
String aHuman = map.getIfAbsentFirst(map.newKey().addAny().add(HUMAN));

This way you match the whole human subcategory; according to the map's sorting rule the results come back in sorted order, and you get the first one.

On the key and map you can define:

  • key length rule
  • comparison rule
  • sorting rule

All of the above already have implementations provided, but through the interfaces you can easily add further custom rules.

What features are ahead?

My future ideas revolve around a more query-like syntax, parallel search, and multi-rule definitions.

The source is available: https://github.com/PazsitZ/multilevel-keysearch
The jars are available: http://pazsitz.hu/repo/hu/pazsitz/keysearch/

Class-Method Caching

Caching is quite a common topic, and when you need it you can find many out-of-the-box solutions.

But what if you want to cache a whole class, with all its methods, maybe across your whole application?
My solution is based on a proxy class, which this way applies caching to all method invocations. The cache configuration can of course be customized, but the basics can be seen in the example:

ClassInterface instance = new ClassImpl(); // ClassImpl: any implementation of ClassInterface
ClassInterface cachedClass = CacheClass.localCache(ClassInterface.class, instance, CacheClass.CacheType.ACCESS, 30);

So in the above example we need an interface, a class instance, and some configuration options for caching.
In return we get a proxy instance which caches the method calls wherever we use the cachedClass instance.

This is all good, but what if we want caching across the whole application?

@Bean("classCached")
public ClassInterface getCachedClass() {
    ClassInterface class = new Class();
    ClassInterface cachedGloballyClass = CacheClass.globalCache(ClassInterface.class, class );
    return cachedGloballyClass;
}

This way, wherever we inject the cached bean, all invocations will go through the same instance and thus share the cache pool.

My implementation uses the Guava caching library, hardwired for now.
But any caching library could serve inside the proxy solution.
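
The core of the idea looks roughly like this: a simplified sketch using java.lang.reflect.Proxy and a Guava cache keyed by method name and arguments (not the library's actual code; CacheClass wraps this kind of logic behind its configuration options):

import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public final class CachingProxySketch {

    @SuppressWarnings("unchecked")
    public static <T> T wrap(Class<T> iface, T target, long expireSeconds) {
        // expire-after-access cache, presumably what CacheType.ACCESS, 30 configures
        Cache<List<Object>, Object> cache = CacheBuilder.newBuilder()
                .expireAfterAccess(expireSeconds, TimeUnit.SECONDS)
                .build();
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                (proxy, method, args) -> {
                    // cache key: method name plus the actual arguments
                    List<Object> key = Arrays.asList(
                            method.getName(),
                            args == null ? Arrays.asList() : Arrays.asList(args));
                    // note: Guava caches cannot hold null values, so null returns need extra care
                    return cache.get(key, () -> method.invoke(target, args));
                });
    }
}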

The source is available: https://github.com/PazsitZ/methodcaching
The jars are available: http://pazsitz.hu/repo/hu/pazsitz/methodcaching/