Finding service boundaries: more than just the bounded context
Domain-Driven Design (DDD) can provide a good starting point for discovering service boundaries. It assumes that you can't capture a large and complex business domain in a single model, so you should break it down into a series of smaller, self-contained models, or "bounded contexts".
Each of these bounded contexts is a cohesive collection of data and behaviour that could represent a service boundary. The emphasis on capabilities is important, as services should be more than collections of data entities and CRUD methods. There is also a recognition that different services will have different views of the same concept. A classic example is a customer, where a billing service associates them with payment details, while a shipping service will only be interested in their delivery address.
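To make this concrete, here is a minimal sketch of how two bounded contexts might each model the customer concept on their own terms. All of the types, fields and functions are hypothetical, invented purely for illustration.

```typescript
// Two bounded contexts modelling the same "customer" concept differently.
// All names are hypothetical; this is an illustration, not a real API.

// Billing context: a customer is someone who can be charged.
namespace Billing {
  export interface Customer {
    customerId: string;
    paymentMethod: "card" | "invoice";
    billingAddress: string;
  }

  export function charge(customer: Customer, amountPence: number): void {
    console.log(`Charging ${amountPence}p via ${customer.paymentMethod}`);
  }
}

// Shipping context: a customer is somewhere a parcel gets delivered.
namespace Shipping {
  export interface Customer {
    customerId: string;
    deliveryAddress: string;
    deliveryInstructions?: string;
  }

  export function dispatch(customer: Customer, parcelId: string): void {
    console.log(`Dispatching ${parcelId} to ${customer.deliveryAddress}`);
  }
}

// The contexts share only the customer's identity; everything else is local.
Billing.charge(
  { customerId: "c-1", paymentMethod: "card", billingAddress: "1 High St" },
  4999
);
Shipping.dispatch({ customerId: "c-1", deliveryAddress: "2 Station Rd" }, "parcel-42");
```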
Although DDD provides a useful theoretical framework to identify ideal service boundaries, many service implementations are driven by more practical concerns. Pragmatism should be a big part of service design and it makes sense to factor in a range of organisational and technical concerns into your boundaries.
Style of service
The desired characteristics of a service can help to determine its boundaries. Bounded contexts tend to describe the largest grouping of capabilities that still maintains internal cohesion, so they may not be useful if you want your services to be relatively small. On the other hand, if you want services to be fully autonomous and look after their own data, then this may imply services that are large enough to encapsulate a self-contained business process. If you want services to be owned by a single team and use independent deployment pipelines, then this may also place practical limits on their optimal size.
Data processing
Services can also be defined according to how they process data. You may want to distinguish reporting from operational data, as the two tend to require very different styles of solution. Similarly, patterns such as Command Query Responsibility Segregation (CQRS) imply separate service implementations to accommodate different processing contexts, as in the sketch below.
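As a rough illustration of the CQRS idea, this sketch separates a write model that handles commands from a read model that answers queries out of its own projection. All names are hypothetical, and a real implementation would typically use a message broker and a persistent read store rather than in-memory structures.

```typescript
// Hypothetical CQRS sketch: commands flow through the write side, queries
// are served from a separate, denormalised read side.

interface PlaceOrderCommand {
  orderId: string;
  customerId: string;
  totalPence: number;
}

interface OrderPlacedEvent {
  orderId: string;
  customerId: string;
  totalPence: number;
  placedAt: Date;
}

// Write side: validates the command and publishes an event on success.
function handlePlaceOrder(
  cmd: PlaceOrderCommand,
  publish: (event: OrderPlacedEvent) => void
): void {
  if (cmd.totalPence <= 0) {
    throw new Error("Order total must be positive");
  }
  publish({ ...cmd, placedAt: new Date() });
}

// Read side: a projection optimised for queries, kept up to date from events.
const ordersByCustomer = new Map<string, OrderPlacedEvent[]>();

function project(event: OrderPlacedEvent): void {
  const orders = ordersByCustomer.get(event.customerId) ?? [];
  ordersByCustomer.set(event.customerId, [...orders, event]);
}

// Queries are answered from the projection, never from the write model.
function getOrders(customerId: string): OrderPlacedEvent[] {
  return ordersByCustomer.get(customerId) ?? [];
}

// Usage: wire the write side to the projection (in-process for the sketch;
// across services this would flow through a message broker).
handlePlaceOrder({ orderId: "o-1", customerId: "c-1", totalPence: 2500 }, project);
console.log(getOrders("c-1"));
```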
The volume of data is also significant. If you need to provide an interface that supports batch processes, then you may want to isolate this larger-scale processing in a different implementation. This can often be preferable to hardening the entire service for high volumes.
Resilience
Genuinely high-availability services require a greater investment than those that can tolerate occasional failure. You may want to isolate the interfaces that really must not fail into separate implementations to reduce the overall burden of complexity.
Organisation
Developing a service across teams tends to be much slower than keeping the work within a single organisational unit. This is largely a matter of communication, which tends to be more immediate within teams than between them. It is also one of ownership, as it is easier to maintain clarity over the design of a service when it is controlled by a single team.
This is where Conway's law comes into play: it suggests that the interfaces in a system will tend to reflect the organisational boundaries within it. There is nothing necessarily wrong with this, as drawing boundaries along team lines can often be the most efficient means of delivering a service.
Scope of change
Dependencies between services can be very hard to manage, quickly giving rise to "death by co-ordination meeting". The conceptual neatness of a bounded context should also be considered alongside the likely scope of future changes. If these are going to regularly sweep across multiple services, then it may make sense to aggregate services to make change easier to deliver.
Overheads
Some design decisions can be driven by a desire to reduce the incremental overhead associated with each new service.
Services have an operational cost. Each one involves some effort in terms of development, deployment, versioning, monitoring, tracing and resilience, and this can suddenly overwhelm your infrastructure and processes if you have not prepared for scale. There is also the cognitive overhead to consider, as a large, sprawling estate can be difficult to comprehend and navigate.
Security
Each new service increases the attack surface area for the system. As the number of moving parts increases it can become more difficult to automate security updates, assert best practice and ensure consistent security scanning. You need to ensure that your security strategies, practices and policies can scale in step with your service infrastructure or you will fall foul of unexpected vulnerabilities and insecure communication.
Concurrency and consistency
Although services should be autonomous, they inevitably duplicate data to some degree. This is where the CAP theorem applies: in a distributed system, a network partition forces you to choose between consistency and availability. If you want to avoid direct, real-time coupling between services, then you will have to become accustomed to eventual consistency.
This has implications for service design: if data needs to be kept in strict synchronisation, then it makes sense to keep it within a single service.
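As an illustration of what eventual consistency can look like in practice, here is a hypothetical sketch in which a shipping service maintains its own copy of customer addresses, updated asynchronously from events rather than by calling the customer service in real time. The event shape and names are invented for the example.

```typescript
// Hypothetical sketch: the shipping service keeps a local replica of
// customer addresses, fed by events published by the customer service.

interface CustomerAddressChanged {
  customerId: string;
  newAddress: string;
  occurredAt: Date;
}

// Local replica owned by the shipping service. It may briefly lag behind
// the source of truth; that staleness is the price of eventual consistency.
const localAddresses = new Map<string, string>();

// Event handler, invoked whenever the customer service publishes a change.
function onCustomerAddressChanged(event: CustomerAddressChanged): void {
  localAddresses.set(event.customerId, event.newAddress);
}

// Reads are served from the local copy, so shipping stays available even
// if the customer service is down.
function addressFor(customerId: string): string | undefined {
  return localAddresses.get(customerId);
}

// Usage: simulate an event arriving, then read the replica.
onCustomerAddressChanged({
  customerId: "c-1",
  newAddress: "2 Station Rd",
  occurredAt: new Date(),
});
console.log(addressFor("c-1")); // "2 Station Rd"
```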
Third party products
Third party products don't always support a clear or consistent boundary, particularly if they have been inherited from the distant past. Their scope tends to change over time and can be defined as much by company politics as any coherent understanding of the domain. Third-party platforms such as ERPs can creep ominously across the architecture, gobbling up responsibilities as they go.
One way of protecting the wider architecture against this amorphous boundary is to use an anti-corruption layer. This can force a clear definition of the third-party platform's responsibilities while also acting as a barrier to scope creep.
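The sketch below shows the basic shape of an anti-corruption layer: a single adapter that translates an imagined ERP record into the domain's own model, so knowledge of the third-party system's quirks is confined to one place. All field names, and the meaning of the status code, are assumptions made for illustration.

```typescript
// Hypothetical anti-corruption layer: translation from an imagined ERP
// record into the domain's own model is confined to a single adapter.
// Field names and the status-code convention are invented for illustration.

// The shape the ERP exposes: flat, cryptic, and liable to change.
interface ErpCustomerRecord {
  CUST_NO: string;
  ADDR_LINE_1: string;
  ADDR_LINE_2: string;
  STATUS_CD: number;
}

// The domain model, defined on our own terms.
interface Customer {
  id: string;
  address: string;
  isActive: boolean;
}

// The anti-corruption layer: all knowledge of the ERP's quirks lives here,
// so a change to the ERP only ever touches this adapter.
function toDomainCustomer(record: ErpCustomerRecord): Customer {
  return {
    id: record.CUST_NO,
    address: [record.ADDR_LINE_1, record.ADDR_LINE_2]
      .filter((line) => line.length > 0)
      .join(", "),
    isActive: record.STATUS_CD === 1, // assumed meaning of the status code
  };
}

// Usage: the rest of the system only ever sees the clean Customer type.
const customer = toDomainCustomer({
  CUST_NO: "00042",
  ADDR_LINE_1: "1 High St",
  ADDR_LINE_2: "",
  STATUS_CD: 1,
});
console.log(customer);
```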
Time and understanding
Your understanding of a domain tends to improve over time, so it can be difficult to get service boundaries right first time. The more services you have, the more difficult it becomes to redraw these boundaries.
It therefore makes sense to limit the number of separate services to begin with, as it tends to be easier to decompose a larger service than to aggregate several smaller ones. This implies starting with much larger services that have fewer mutual dependencies. You will still reap the benefits of scalability, resilience and flexibility, without being hamstrung by premature decomposition.
Available resources
In the real world of deadlines and limited resources, the way you organise data and behaviour can be defined by who is available to do the work. This is particularly the case if you assert clear code ownership by teams. If a team that looks after a specific service does not have the bandwidth to develop a certain feature, then the pragmatic choice may be to have it implemented by another team – and in a different service.
Decomposition strategy
If you are moving from a monolith to something a little more distributed, there is always a clear and present danger of over-decomposition. While bounded contexts can give rise to quite large service implementations, there can be a temptation to apply the single responsibility principle to service design and produce lots of small, specialised services.
This can be a mistake that is difficult to correct. If you over-decompose your domain, the cost of change starts to climb as you are weighed down by mounting overheads and growing dependencies.