Correlation Identifier: Ensuring Traceability in Distributed Systems

A correlation identifier for requests and responses is an essential feature of microservice platforms for monitoring, reporting, debugging, and diagnostics.
It allows a single request to be traced through the application flow, where it is often handled by multiple downstream services.

Problem statement

A leading trend in software development is distributing processing across multiple systems. Architectural designs come in several service layouts: single monolithic systems, SOA-style distribution, and nowadays microservice-level grouping of services and applications.
With multiple services, we should also consider that a single service may run as multiple instances.

When looking into the logs, it can be hard to track down the chain of calls and to locate the specific instances that were hit while serving a user request.

A request passes through the GraphQL service and then the User, Order, Payment, and Shipping services, each running on different instances.

You need to centralize the log messages from multiple services and instances, order the entries chronologically, and then somehow group the relevant ones.
This is where the correlation identifier comes into the picture. It uniquely identifies the request, and it is generated, passed along, and persisted with the log entries of every system.
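To make the centralize, order, and group steps concrete, here is a minimal Python sketch. It assumes the log entries have already been collected into one place and parsed into records; the LogEntry type, the service and instance names, and the req-1/req-2 ids are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    timestamp: datetime
    service: str
    instance: str
    correlation_id: str
    message: str


def group_by_correlation_id(entries):
    """Order centralized log entries by time, then group them per request."""
    groups = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e.timestamp):
        groups[entry.correlation_id].append(entry)
    return dict(groups)


# Hypothetical entries collected from several services and instances.
entries = [
    LogEntry(datetime(2024, 1, 1, 12, 0, 2), "Payment", "payment-2", "req-1", "charged card"),
    LogEntry(datetime(2024, 1, 1, 12, 0, 0), "GraphQL", "gql-1", "req-1", "received order mutation"),
    LogEntry(datetime(2024, 1, 1, 12, 0, 1), "User", "user-3", "req-2", "loaded profile"),
    LogEntry(datetime(2024, 1, 1, 12, 0, 1), "Order", "order-1", "req-1", "created order"),
]

for correlation_id, chain in group_by_correlation_id(entries).items():
    print(correlation_id, [f"{e.service}@{e.instance}" for e in chain])
```

Sorting before grouping keeps each request's chain in time order, so every group reads as the path one request took through the instances.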

With a propagated identifier instead of various per-service ids, you'll have a consistent log id throughout a specific request's life cycle across every service.

Solutions

You have multiple options for how to generate, transport, and store this identifier.

How?
Generally, you want to generate it at the first entry point and pass it down.
In the layout above, the GraphQL proxy would serve as the entry point, which makes it easy to handle generating and adding the id in one place.
In other layouts I would suggest the following:
if you don't have an id, generate one and add it to the request
if you received an id, use it and pass it along on the outgoing requests
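A minimal Python sketch of the two rules above, assuming requests carry plain header dictionaries. The functions resolve_correlation_id and outgoing_headers are hypothetical helpers, not part of any framework; the header name is the x-correlation-id used in this article.

```python
import uuid

# Header name used in this article; HTTP header names are case-insensitive.
CORRELATION_HEADER = "x-correlation-id"


def resolve_correlation_id(incoming_headers):
    """Reuse the caller's correlation id, or generate one at the entry point."""
    # Normalize keys because header casing differs between clients.
    normalized = {k.lower(): v for k, v in incoming_headers.items()}
    existing = normalized.get(CORRELATION_HEADER)
    if existing:
        return existing
    # No id received: this service is the entry point, so create one.
    return str(uuid.uuid4())


def outgoing_headers(correlation_id, extra=None):
    """Build headers for a downstream call, always carrying the same id."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = correlation_id
    return headers


cid = resolve_correlation_id({"X-Correlation-Id": "abc-123"})
print(outgoing_headers(cid, {"accept": "application/json"}))
```

Because every service runs the same two rules, it does not matter which one happens to be the entry point for a given request.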

You have to define this in your HTTP interface and pass the data along with the requests. Similar options apply to non-HTTP requests.

You can extend the request/response body (POST requests, responses with a payload). However, this way your model interfaces will be polluted with an attribute that is not product-specific.
A better option is to use a custom header or a cookie (header: x-correlation-id).
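The header-based option can be sketched with Python's standard urllib.request module. The endpoint URL and the order payload are hypothetical, and the request is only built, not sent; the point is that the correlation id sits in the headers while the JSON body keeps only product data.

```python
import json
import urllib.request


def build_order_request(url, order, correlation_id):
    """Carry the correlation id in a header so the JSON body stays product-only."""
    body = json.dumps(order).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    # The id travels as metadata, not as a field of the order model.
    request.add_header("x-correlation-id", correlation_id)
    return request


# Hypothetical downstream endpoint; nothing is sent until urlopen() is called.
req = build_order_request("http://orders.internal/orders", {"productId": 42}, "abc-123")
print(req.header_items())
```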

What value?
The correlation id value should be a unique identifier, so a UUID is a good candidate (while there is a theoretical chance of collision, the probability is extremely low).

The correlation id could also contain business-level identifiers. This makes the id more informative and even adds extra grouping ability when querying. In simple systems it could make sense to use a basic product id; it's simpler and smaller.
You can also combine the two: use one or two unique identifiers with a timestamp or a simple generated value, for example UniqueId-productType-UUIDsmallHash.
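One possible reading of the combined format, sketched in Python. Here the "UUIDsmallHash" part is assumed to be the first few hex characters of a random UUID, and the user42/book inputs are hypothetical; truncating the UUID keeps the id short but raises the collision probability compared with a full UUID.

```python
import uuid


def composite_correlation_id(unique_id, product_type, suffix_length=8):
    """Build an id shaped like UniqueId-productType-UUIDsmallHash.

    Assumption: the short suffix is the first `suffix_length` hex characters
    of a random UUID, which is more compact but less collision-safe than a
    full UUID.
    """
    suffix = uuid.uuid4().hex[:suffix_length]
    return f"{unique_id}-{product_type}-{suffix}"


cid = composite_correlation_id("user42", "book")
print(cid)  # "user42-book-" followed by 8 random hex characters
```

The business prefix makes log queries by user or product type trivial, while the random suffix still separates individual requests.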

Result

Once all the decisions are made, we should put the identifiers into the application logs.
From then on, you only need an aggregated logging solution to query on the correlation id and track the request across the services with simple queries:
either with plain grep, if the logs are accessible from a central place, or with log aggregation tools such as Loki, Loggly, Splunk, etc.
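As a sketch of putting the id into the logs, the following uses Python's standard logging and contextvars modules: a filter stamps every record with the current request's correlation id, and a simple line filter plays the role of grep. The logger name order-service and the ids are hypothetical; in a real web framework the context variable would typically be set by request middleware.

```python
import contextvars
import io
import logging

# Holds the correlation id of the request currently being handled.
correlation_id_var = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation id to every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id_var.get()
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(CorrelationIdFilter())
handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))

logger = logging.getLogger("order-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False


def handle_request(correlation_id):
    # Set the id for the duration of this request, then restore the default.
    token = correlation_id_var.set(correlation_id)
    try:
        logger.info("order received")
        logger.info("order stored")
    finally:
        correlation_id_var.reset(token)


handle_request("abc-123")
handle_request("def-456")

# The grep-style query: keep only the lines belonging to one request.
lines = [line for line in stream.getvalue().splitlines() if line.startswith("abc-123 ")]
print(lines)
```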

In the second article, Correlation Identifier - In Practice, I'll present more detailed examples of how to effectively generate and process the identifier, and how to add it to the logs.

Mentions:

Amazon uses the X-Amzn-Trace-Id header: https://aws.amazon.com/premiumsupport/knowledge-center/trace-elb-x-amzn-trace-id/

Spring Cloud Sleuth uses a trace-id and provides span-ids as units of work

correlation-id is used by JMS and messaging systems, e.g. ActiveMQ, RabbitMQ
