Distributed Cache
Using Akka cluster-sharding and Akka HTTP on Kubernetes
Using Akka cluster-sharding and Akka HTTP on Kubernetes
This article captures the implementation of an application serving data over HTTP which is stored in cluster-sharded actors and deployed on Kubernetes.
Use case: An application, serving data over HTTP and with a high request rate, and the latency of order of 10ms with limited database IOPS available.
My initial idea was to cache it in memory, which worked pretty well for some time. But this meant larger instances due to duplication of cached data in the instances behind the load balancer. As an alternative I wanted to use Kubernetes for this problem and do a proof of concept (PoC) of a distributed cache with Akka cluster-sharding and Akka-HTTP on Kubernetes.
This article is by no means a complete tutorial to Akka cluster sharding or Kubernetes. It outlines knowledge I gained while doing this PoC. The code for this PoC can be found here.
Let’s dig into the details of this implementation.
To form an Akka Cluster, there needs to a pre-defined ordered set of contact points often called seed nodes. Each Akka node will try to register itself with the first node from the list of seed nodes. Once, all the seed nodes have joined the cluster, any new node can join the cluster programmatically.
The ordered part is important here, because if the first seed node changes frequently then the chances of split-brain increases. More info about Akka Clustering can be found here.
So, the challenge here with Kubernetes was the ordered set of predefined nodes, and here comes StatefulSet(s) and Headless Services to the rescue.
StatefulSet guarantees stable and ordered pod creation, which satisfies the requirement of our seed nodes, and Headless Service is responsible for their deterministic discovery in the network. So, the first node will be “-0” and the second will be “-1” and so on.
- is replaced by the actual name of the application
The DNS for the seed nodes will be of the form:
-...svc.cluster.local
Steps:
- Start with creating the Kubernetes resources. First, the Headless Service, which is responsible for deterministic discovery of seed nodes(Pods), can be created using the following manifest:
kind: Service
apiVersion: v1
metadata:
name: distributed-cache
labels:
app: distributed-cache
spec:
clusterIP: None
selector:
app: distributed-cache
ports:
- port: 2551
targetPort: 2551
protocol: TCP
Note, that the “clusterIP” is set to “None.” Which indicates it’s a Headless Service.
Create a StatefulSet, which is a manifest for ordered pod creation:
apiVersion: "apps/v1beta2" kind: StatefulSet metadata: name: distributed-cache spec: selector: matchLabels: app: distributed-cache serviceName: distributed-cache replicas: 3 template: metadata: labels: app: distributed-cache spec: containers: - name: distributed-cache image: "localhost:5000/distributed-cache-on-k8s-poc:1.0" env: - name: AKKA_ACTOR_SYSTEM_NAME value: "distributed-cache-system" - name: AKKA_REMOTING_BIND_PORT value: "2551" - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: AKKA_REMOTING_BIND_DOMAIN value: "distributed-cache.default.svc.cluster.local" - name: AKKA_SEED_NODES value: "distributed-cache-0.distributed-cache.default.svc.cluster.local:2551,distributed-cache-1.distributed-cache.default.svc.cluster.local:2551,distributed-cache-2.distributed-cache.default.svc.cluster.local:2551" ports: - containerPort: 2551 readinessProbe: httpGet: port: 9000 path: /health
Create a service, which will be responsible for redirecting outside internet traffic to pods:
apiVersion: v1 kind: Service metadata: labels: app: distributed-cache name: distributed-cache-service spec: selector: app: distributed-cache type: ClusterIP ports: - port: 80 protocol: TCP # this needs to match your container port targetPort: 9000
Create an Ingress, which is responsible for defining a set of rules to route traffic from outside internet to services.
apiVersion: extensions/v1beta1 kind: Ingress metadata: name: distributed-cache-ingress spec: rules: # DNS name your application should be exposed on - host: "distributed-cache.com" http: paths: - backend: serviceName: distributed-cache-service servicePort: 80
And the distributed cache is ready to use:
Summary This article covers Akka Cluster-sharding on Kubernetes with the pre-requirements of an ordered set of Seed Nodes and their deterministic discovery in the network, and how it can be solved with StatefulSet(s) and Headless Service(s).
This approach of caching data in a distributed fashion offered the following advantages:
- Less database lookup, saving database IOPS
- Efficient usage of resources; fewer instances as a result of no duplication of data
- Lower latencies to serve data
This PoC opens up new doors to think about how we cache data in-memory. Give it a try (all steps to run it locally are mentioned in the Readme).
We're hiring! Do you like working in an ever evolving organization such as Zalando? Consider joining our teams as a Software Engineer!