Service Mesh

A Service Mesh is a dedicated infrastructure layer for managing and controlling service-to-service communication within a microservices architecture. In a microservices environment, applications are broken down into smaller, independent services that can be developed, deployed, and scaled individually. As the number of these services grows, managing communication, security, and monitoring between them becomes more complex. That’s where a Service Mesh comes in.

Service Mesh Benefits

Load balancing

Distributing requests evenly across multiple instances of a service to ensure optimal resource usage and minimize response times.
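
To make the idea concrete, here is a minimal sketch of round-robin selection, one of the simplest load-balancing strategies. The class name and the instance addresses are hypothetical and chosen only for illustration; a mesh proxy performs this kind of selection for you and typically offers more sophisticated strategies as well.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Cycle through service instances so each one gets an equal share of requests."""

    def __init__(self, instances):
        instances = list(instances)
        if not instances:
            raise ValueError("at least one instance is required")
        self._cycle = itertools.cycle(instances)

    def pick(self):
        """Return the instance that should receive the next request."""
        return next(self._cycle)


# Hypothetical instance addresses, for illustration only.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])

# Nine requests spread over three instances: each instance receives three.
counts = Counter(balancer.pick() for _ in range(9))
```

Round robin ignores how busy each instance is, which is why real proxies usually also offer load-aware strategies.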

Service discovery

Automatically identifying and locating available instances of a service, allowing for seamless communication between services.

Traffic management

Controlling and shaping the flow of traffic between services, enabling features like canary releases, blue-green deployments, and circuit breaking.
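
As a rough illustration of a canary release, the sketch below sends a fixed percentage of requests to a new version and the rest to the stable one. The function name, the version labels, and the hash-based bucketing are assumptions made for this example; it is not how any particular mesh implements traffic splitting, only the general idea of shifting traffic gradually rather than all at once.

```python
import hashlib


def choose_version(request_id: str, canary_percent: int) -> str:
    """Pick "canary" or "stable" for a request, based on a hash of its ID.

    Hashing makes the decision repeatable: the same request ID always
    lands on the same version for a given percentage.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Hypothetical rollout: send roughly 10% of requests to the new version.
sample = [choose_version(f"request-{i}", 10) for i in range(10_000)]
canary_share = sample.count("canary") / len(sample)
```

Raising `canary_percent` step by step (for example 1, 10, 50, 100) moves traffic to the new version gradually, and setting it back to 0 acts as a rollback.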

Security

Implementing secure communication between services through mutual TLS (mTLS), as well as providing access control and authentication mechanisms.

Fault tolerance

Adding resiliency to the system by enabling features such as retries, timeouts, and circuit breaking to handle failures gracefully.
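
The sketch below shows, in simplified form, what retries and circuit breaking do. All names and thresholds are hypothetical, and timeouts are left out for brevity. In a mesh, this logic lives in the proxy and is configured rather than coded, so the example only illustrates the behavior, not a real implementation.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Stop calling a failing dependency for a while after repeated errors."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def call_with_retries(func, attempts=3):
    """Call func up to `attempts` times and re-raise the last error if all fail."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # retrying against an open circuit would defeat its purpose
        except Exception as exc:
            last_error = exc
    raise last_error
```

Retries help with short, transient failures, while the circuit breaker protects a struggling service from being flooded with calls that are likely to fail anyway.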

Observability

Providing telemetry data, metrics, and logs for monitoring and debugging purposes, giving insight into the health and performance of the system.

How can Syntio help?

Here at Syntio, we combined several different systems that, together, offer companies a clear view of entire platforms and help them to enforce best practices and standards, which are necessary to have a truly scalable solution. Each component of the Service Mesh package is carefully selected so that everything works together seamlessly. These components include Istio as the control plane, along with Envoy, Istio’s default data plane, Kiali as Istio’s management console, and a highly customized Backstage as the developer portal.
However, a collection of tools brings the challenge of installing and configuring them, work that is often split across different internal teams or different vendors. Our engineers will not only install and configure the tools to your needs but also make sure that you fully understand each one: how it works, how to manage it, and how to operate it. Syntio doesn’t want to be a bottleneck in your organization; our goal is to deliver a working solution that you will operate after a handover period, putting you back in charge of your destiny.

More on Service Mesh

Like many modern solutions, a Service Mesh is really a collection of tools and systems working together to deliver the benefits described above.

To better understand the benefits of adopting the concept of Service Mesh, let’s imagine an organization called 3RC.

The core product at 3RC is a monolithic application called Lgc, which is developed and maintained by three separate technology teams. 3RC soon realized that running a monolith in production was cumbersome and that developing and rolling out new features took longer and longer, because synchronization and communication between the technology teams had become exceedingly difficult and turned into the bottleneck for delivery.

Other issues arose as well. When development first started, all three teams had to agree on a common language and framework. Team A insisted on Java and Spring Boot, since it was in charge of the parts of the system that handle much of the business logic, which benefit greatly from Spring Boot’s batteries-included approach. As a result, teams B and C had to implement their parts of the system in Java as well. This became a problem when they realized that Java had very poor library support for the features now on their roadmap: they would have to spend months hacking together a Java library, even though a popular, widely used Python library already existed that they couldn’t use because of the commitment to keeping everything in Java.

For these reasons (and many others, like the difficulty of performing updates, replication, scalability, and high availability), 3RC decided to split up Lgc into multiple, independent services, taking full advantage of a modern microservices-oriented architecture and technologies like Docker and Kubernetes. This decision was a great accelerator in the development of new features, allowing the three technology teams to work mostly independently.

The modules of Lgc were split up into three standalone services, A, B, and C, which were packaged into containers and deployed on a Kubernetes cluster. Later on, whenever a new feature warranted a standalone service, a new microservice was added, since the teams could now use whichever language and framework best fit the job.

However, as time went on and new features were added, the three technology teams took increasingly divergent approaches to fundamental engineering concerns. This became obvious once strange problems started to appear.

3RC noticed inconsistencies and strange fluctuations in the latencies of user requests. During peak customer usage, service A started to experience slowdowns and other intermittent issues and could not serve traffic at the desired rate, which left the whole system unable to serve any traffic. 3RC also noticed that when service B had trouble processing user requests, service A behaved erratically as well, but only for certain requests. Unfortunately, the team responsible for service B had not yet found the time to expose request-handling metrics, while team A only printed its metrics to logs in its own custom format, making monitoring and debugging of these issues borderline impossible. Tracing of individual customer requests had not been implemented either, so there was little hope of getting to the bottom of the strange behavior.

The second thing the teams at 3RC noticed was that their way of doing automated production deployments was very risky and prone to leaking bugs into production. They practiced an approach called “blue-green deployment”: they brought up the new deployment (the “blue” deployment) and then, at some point, cut all traffic over to it from the old deployment (the “green” deployment). They realized that this approach didn’t solve the core problem they had before (directly pushing new code to production), since a blue-green deployment still led to a “big bang” release, which is exactly what they wanted to avoid.

The third thing 3RC realized was that the teams implementing services A, B, and C were handling security completely differently. Team A favored secure connections with certificates and private keys, relying heavily on Spring Beans to do the heavy lifting. Team B, which developed its microservices in Go, created its own custom framework (since Go has much poorer framework support than Java) built on passing tokens and verifying signatures. The team operating service C decided it didn’t need any additional security, since these were “internal” services behind the company firewall.

The administrators at 3RC were also becoming frustrated that they had very little insight into how inter-service (east-west) communication was configured. They had to read countless pages of documentation to get a grasp of how the system worked, with no way to visualize the system and the interactions between its components.

The administrators were also frustrated since the teams that actually used certificates for secure communications were managing these certificates manually, redeploying the system every time one of the certificates expired.

Product owners and management were also having a hard time keeping track of how many components there were and what they did, what the available endpoints were, where the up-to-date documentation was located, and which programming languages and frameworks were being used. They needed to jump between numerous different systems to solve this puzzle, as no centralized, easy-to-understand service (developer) catalog was in use. New developers experienced the same issues, albeit from a different perspective, during onboarding, since they needed to comb through the entire Confluence space at 3RC to fully understand how the system worked.

These challenges are not unique to 3RC, nor is the extent of the challenges limited to what they encountered. The following things must be addressed when moving to a services-oriented architecture:

Understanding what’s happening to the overall system as it constantly changes and evolves

Building applications/services capable of responding to changes in their environment

Building systems capable of running in partially failed conditions

Keeping faults from jumping isolation boundaries

Controlling the runtime behavior of the system

Implementing strong security as the attack surface grows

Lowering the risk of making changes to the system

Setting up rate limits

Setting up user quotas

Enforcing policies about who or what can use system components, and when

Having a centralized service catalog so that stakeholders have easier oversight of the platform

Having an easy-to-understand UI that shows the components, their interactions, and their (layer 7) network configuration
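
To illustrate one item from the list above, rate limiting, here is a minimal sketch of a token bucket, a common way to cap request rates while still allowing short bursts. The class name and parameters are hypothetical and used only for illustration; with a mesh, limits like this are normally declared as configuration instead of being written into each service.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add the tokens earned since the last call, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A caller would check `allow()` before handling each request and reject the request (for example with an HTTP 429 response) when it returns False.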

A Service Mesh addresses these challenges at the infrastructure layer, in a battle-tested, industry-standard fashion and with minimal overhead. Product owners gain peace of mind, because these core concerns are handled by the mesh and are no longer the responsibility of individual developers. Developers, in turn, no longer have to spend time on these issues or become experts in the field (building that expertise in-house is the only viable alternative to adopting a Service Mesh), which frees up valuable time that is better spent on implementing systems.