F*** Microservices

This is a rant. If you have some experience with microservice architecture then you will probably find nothing new here. I focus on an architectural choice where a single team works on multiple deployment units, especially where there are more deployment units than developers on the team. For me a team consists of up to 6 developers. If you have more, then your organization has other, more pressing, problems.

First of all, if you do not know what your business process is going to look like, especially if your product is not in production yet, do not make it a distributed process. I am not talking about orchestration vs choreography here. I am all for orchestration and a clearly visible process that is defined in one place, but that discussion is for another post. I take issue with a business process that has to communicate with multiple deployment units in order to perform its task, regardless of the communication protocol (REST, event driven, message driven, etc). Apparently some believe that developing a process that changes weekly (or even daily) is easier when it spans multiple services/repositories.

– We are not in production, we can break our platform for a small period of time to save time. No worries. 

– Sure, but then why are you applying microservices architecture right now in the first place? 

– So that the team can learn how to work with microservices before we are in production.

– That is great, but if you are deploying services with breaking changes then how are they supposed to learn?

– Ok, let’s do it properly then. With backward compatibility and versioning.

And then you fail to meet deadlines arbitrarily set by the business.

Here is a simple calculation. We have two aggregates in two different services and a business process that interacts with them. New requirements force us to change a single field in each of the aggregates. If the process and aggregates were in a single deployment unit, this would probably result in a single 20 – 30 line PR (including business logic, storage changes, unit and integration tests, etc). Now consider the case where the aggregates and the process are in separate deployment units. If we were to introduce breaking changes, we would need 3 PRs of 20 – 30 lines of code each: at least 3 times as much work, plus a higher chance of getting it wrong on the service boundaries. If we were to adhere to backward compatibility, this would result in at least 5 PRs. As to why, I will leave that to the reader.
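For anyone who would rather see it than work it out, here is a minimal sketch of one plausible way to arrive at five PRs. It assumes an expand-and-contract rollout, which the calculation above does not spell out, so treat it as one reading rather than the only one. The field names (`name`, `full_name`) and all function names are hypothetical.

```python
from typing import Any, Dict, Mapping


def serialize_expanded(full_name: str) -> Dict[str, Any]:
    """Expand step (one PR per aggregate): emit the old AND the new field."""
    return {"name": full_name, "full_name": full_name}


def read_name(payload: Mapping[str, Any]) -> str:
    """Process step (one PR): prefer the new field, tolerate the old one."""
    if "full_name" in payload:
        return payload["full_name"]
    return payload["name"]


def serialize_contracted(full_name: str) -> Dict[str, Any]:
    """Contract step (one PR per aggregate): drop the old field once unused."""
    return {"full_name": full_name}


def pull_requests(aggregates: int, backward_compatible: bool) -> int:
    """Count PRs under the assumptions above (hypothetical model)."""
    process_prs = 1
    if backward_compatible:
        # Each aggregate needs an expand PR and, later, a contract PR.
        return aggregates * 2 + process_prs
    # Breaking change: one PR per aggregate plus one for the process.
    return aggregates + process_prs
```

Under this model, `pull_requests(2, backward_compatible=False)` gives 3 and `pull_requests(2, backward_compatible=True)` gives 5, and the process keeps working against the old, expanded and contracted payloads alike.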

Alternatively, although some consider it heresy, you could keep all your code in a single repository. This would limit the amount of work needed, with the additional benefit of keeping the code in a consistent state. The trade-off is the swelling of the repository (think indexing in the IDE).

I once heard that no breaking changes would be introduced, since only additive changes would be made to the integration model. This was for a system that was not even live yet. I have no comment. Even for a live system this seems highly irresponsible. Will it result in a huge mess? Probably.

Next, continuous delivery. By continuous delivery I mean that, after a PR is merged to master, it gets built (e.g. as a docker image), goes through automated testing, and ends up in a state where it could be deployed to production (e.g. as a canary deployment) at any time, without the need for further testing. If your organisation does not have a reliable system for continuous delivery, you should not do microservices. This guideline is not something I came up with, but I wholeheartedly support it. To be honest, I would go as far as to say that if you have more than one deployment unit per team then you should have continuous deployment. Continuous integration is not enough. If for some reason you can’t, be it due to some weird regulations or whatever, then do not do microservices.

Continuing, the size of the microservices. This has been debated to death. If you follow the DDD approach then this should be straightforward, and you should not go wrong with a bounded context as the boundary. However, if you think that DDD is not for you or that your product does not have a domain (do not ask me how), you might end up with one service per database table, or smaller. I draw the line where half of the microservice code is infrastructure. My preference is one service per team of 6 developers (this includes QAs, since I treat them as developers). If the service is too big for them to handle, split it. If there is not enough work to go around, reduce the team.

Microservices are not an excuse for poor design, neither strategic (i.e. spanning the whole application, multiple services) nor tactical (within the bounds of a single service). I once heard Greg Young talk about treating services as small classes that could easily be written in a week. If you can write it in a week then it shouldn’t be a problem to rewrite it from scratch if needed. This is an awesome concept, but it does not mean that the services are supposed to be unreadable. Small classes are still subject to clean code principles. When the time comes to rewrite it in 6 months’ time, you have to be able to understand what this service does. After all, the code is the documentation, isn’t it?

Monitoring. This is another thing that has been talked about multiple times. When applying microservice architecture your monitoring has to be top class. You will not be able to test all the integration variations that happen between services, neither in an automated nor in a manual manner. If your organisation cannot supply metrics in an easy-to-access way (and searching through individual AWS ECS instances does not count), then get this straightened out first, before diving into multiple deployment units.

IMPORTANT NOTE: I believe determining what should be monitored is the first thing that should be done when starting work on a new product, both from a technical and a business point of view. The reason is that if you know what you need to measure, then you know what your product should deliver. Treat it as a specification of the fitness function for your product.
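To make the "specification of the fitness function" idea concrete, here is a minimal sketch of monitoring targets written down as data before any feature exists. The metric names and thresholds are invented for illustration, and the `Target` class and `unmet_targets` function are hypothetical, not part of any monitoring library.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Target:
    metric: str
    limit: float
    higher_is_better: bool

    def is_met(self, observed: float) -> bool:
        # A target is met when the observed value is on the right side of the limit.
        if self.higher_is_better:
            return observed >= self.limit
        return observed <= self.limit


# Hypothetical targets: one business metric, one technical metric.
TARGETS = [
    Target("checkout_success_ratio", 0.99, higher_is_better=True),
    Target("p99_latency_ms", 500.0, higher_is_better=False),
]


def unmet_targets(targets: List[Target], observed: Mapping[str, float]) -> List[str]:
    """Return the names of targets that are missed or not measured at all."""
    failures = []
    for target in targets:
        value = observed.get(target.metric)
        if value is None or not target.is_met(value):
            failures.append(target.metric)
    return failures
```

A metric that is not reported at all counts as a failure here, which is a deliberate choice in this sketch: if you cannot measure it, you cannot claim the product delivers it.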

Whenever you start working on a new service, always run it with two nodes from the start, regardless of the size of the service. If you cannot develop a multi-node service, then your service is not horizontally scalable, you cannot provide even a sensible measure of availability, zero-downtime or canary deployments become very hard, etc. If I see such a service in development, that is a huge red flag.
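A minimal sketch of the kind of bug that running two nodes from day one exposes: state kept in process memory. Everything here is a simulation with hypothetical class names. `SharedStore` stands in for whatever external store (database, cache) the nodes would share, and `round_robin` stands in for a load balancer.

```python
from itertools import cycle
from typing import List


class StatefulNode:
    """Keeps a counter in process memory, so each node only sees its own requests."""

    def __init__(self) -> None:
        self.count = 0

    def handle(self) -> int:
        self.count += 1
        return self.count


class SharedStore:
    """Stand-in for an external store shared by all nodes."""

    def __init__(self) -> None:
        self.count = 0

    def increment(self) -> int:
        self.count += 1
        return self.count


class StatelessNode:
    """Keeps no state of its own and delegates to the shared store."""

    def __init__(self, store: SharedStore) -> None:
        self.store = store

    def handle(self) -> int:
        return self.store.increment()


def round_robin(nodes: list, requests: int) -> List[int]:
    """Send requests to the nodes in turn, like a simple load balancer."""
    picker = cycle(nodes)
    return [next(picker).handle() for _ in range(requests)]
```

With two `StatefulNode` instances, four requests return `[1, 1, 2, 2]`; with two `StatelessNode` instances over one `SharedStore`, they return `[1, 2, 3, 4]`. On a single node both variants look identical, which is exactly why the problem stays hidden until a second node appears.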

Regarding performance. I am not that sold on the benefits of microservices here. Sure, you can horizontally scale the services that are under heavy use (although I have yet to see a Spring application with a small startup time), or make sure that a service that needs a low response time doesn’t get killed by that report generation. When you reach this point and these become true problems that can’t be fixed by just adding another machine, it will most likely mean that you have succeeded in making a valid product and that your traffic is probably in the top 1%. Rejoice.

As a closing remark, people who warn against microservices often mention that you are not like Netflix. I say the opposite. You are like Netflix and you should behave like them. They had a monolithic application until well over 100 developers were working on it. That is when they decided to split the application. So be like Netflix.

F*** Microservices