A Journey On End To End Testing A Microservices Architecture
End to end testing is a technique for testing the flow of an application through a business transaction. In a microservices architecture, different components work together to enable a business capability, so testing all of them can get tricky. In this article you can read about our team’s journey:
- What our system looks like
- What do you get from e2e testing?
- How to define e2e tests
- How to deal with authentication
- What testing framework we chose
- How to test canvas
- How to test async flows
- Automation
1) Our system
In our team we maintain a system that offers business capabilities such as the ability to explore and filter orders. The high-level components that enable that feature are: the front-end application, the backend for the front end, various databases (PostgreSQL, Solr and DynamoDB), message brokers (we use Nakadi), and a number of microservices. You can read more details about our architecture later in this article.
2) What to expect from e2e testing
As you can see, the architecture is quite complex and things might break on different levels. You might have great unit test coverage for each component, but if the components can’t talk to each other, users’ expectations of the product are not met.
You can introduce some integration testing but things might get out of sync if more than one team is responsible for the same product or even if not all team members share equal ownership of each component (which they should).
You can achieve integration testing from a user perspective by mocking your dependencies (by intercepting requests). This approach adds complexity to writing tests. End to end testing, on the other hand, moves the complexity to running all systems in a desired state where you can make your assertions confidently.
Because we wanted to be able to ensure that whenever we are releasing a new feature we are not breaking anything else, and because changes in the backend could introduce bugs if we are not on the same page, we decided to introduce end to end testing in our systems. This way we could spot bugs on staging environments.
“Having end to end tests is also a very nice way to document all the user journeys of your application.”
3) Coverage and tests definition
The first step that we took was defining the scope of the systems that we were going to put under test. It is strongly advised that when you perform end to end testing you put all the components under test, but at the scale of our company this is not always possible.
In our case we decided to do “domain scoped e2e testing”, because systems out of the domain might already have some e2e testing and our systems are decoupled from each other. Also it is pretty hard to put systems that are out of the domain in the desired state you need to perform your tests.
The architecture of the systems that we wanted to test is something like this:
After the scope was known we defined the list of features that we wanted to test. Basically that was all the features, but either way having a list around helps a lot. Once you have it, it is easy to group related features (one way of grouping features is by business capability) and split them into smaller tasks so the whole team can work on them. This will also help you prioritise groups and implement the important ones first. You can prioritise by urgency as well so you have the critical ones covered first. This also helped us identify which of them require some support from other teams and this way you remove some potential blockers early.
4) Authentication on test environment
One of the problems we had was authentication. We use an SSO server to authenticate users, and the only way to log in through SSO is with an actual email address and a password. That would have required real accounts, which came with a few implications. Because of this we decided to authenticate users with an auto-generated token when running end to end tests, bypassing SSO. We implemented this feature only for the staging environment.
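The article does not describe the token format, so the following is only a minimal sketch of how such a staging-only bypass could look in Node.js. It assumes an HMAC-signed, short-lived token. The function names, the `"staging"` environment label and the claim names (`sub`, `exp`) are all hypothetical.

```javascript
const crypto = require("crypto");

// Hypothetical environment label; the real deployment may name it differently.
const STAGING = "staging";

// Sign the payload with a shared secret (assumed to be known to the test runner and the backend).
function sign(payload, secret) {
  return crypto.createHmac("sha256", secret).update(payload).digest("base64url");
}

// Create a short-lived token of the form "<payload>.<signature>".
function createTestToken(userId, secret, ttlSeconds = 300, now = Date.now()) {
  const claims = { sub: userId, exp: Math.floor(now / 1000) + ttlSeconds };
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Accept the token only on staging, and only if signature and expiry check out.
// Returns the claims on success and null otherwise.
function verifyTestToken(token, secret, environment, now = Date.now()) {
  if (environment !== STAGING) return null; // the bypass must not exist anywhere else
  const [payload, signature] = String(token).split(".");
  if (!payload || !signature) return null;
  const given = Buffer.from(signature);
  const expected = Buffer.from(sign(payload, secret));
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  return claims.exp * 1000 > now ? claims : null;
}
```

Checking the environment first keeps the bypass strictly opt-in: even a correctly signed token is rejected outside staging.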
5) Choosing a testing framework that solves main problems
So far so good. We had an idea what we wanted to achieve and pretty quickly we ended up thinking about how to start and write end to end tests.
It is suggested that when you write and run e2e tests you should be able to have a deterministic state of all the systems so that you can easily assert whether the action was performed as it should.
Our main problems were:
- Have a desired state of the system. This was hard because we didn’t own all the systems.
- Have a desired state of the application. This was hard because we had a component that uses the HTML canvas.
The first one was solved by having an API that allows us to insert some data into the system which normally was not an application use case. The second one was solved by being able to talk to the state management component from e2e tests.
Now comes the best part: choosing a testing framework. We did some research and decided to focus on two options, Zalenium and Cypress. Options like Nightwatch and Puppeteer were considered as well. Both Zalenium and Cypress offered a really nice set of features like video recording, good integration with CI and Docker, a clean API and a nice dashboard, but the final winner for us was Cypress. We chose it because, first of all, our users mainly use Chrome. Also, Cypress seemed to be much faster than Zalenium and it managed to solve the problem of flaky tests. Another cool Cypress feature is its dashboard, which you can use to interact with your tests. But the killer feature is that Cypress executes tests in the same environment as your application.
6) What if you want to test canvas?
Some parts of our application are rendered on an HTML canvas, and interacting with canvas content from a test is almost impossible. We decided to avoid the canvas completely and interact with the application runtime instead. Our application is written in React, and because Cypress runs in the same environment as our application, we could dispatch actions and read from state in our tests.
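One common way to reach application state from Cypress is to attach the store to `window` only when Cypress is driving the browser. The sketch below assumes a Redux-style store with `dispatch` and `getState`. The tiny `createStore`, the reducer, the `order/selected` action and the `store` property name are hypothetical stand-ins, not the application's real code.

```javascript
// Minimal stand-in for a Redux-style store; a real application would use its own.
function createStore(reducer, initialState) {
  let state = initialState;
  return {
    getState: () => state,
    dispatch(action) {
      state = reducer(state, action);
      return action;
    },
  };
}

// Hypothetical reducer: records which order is selected on the canvas.
function ordersReducer(state, action) {
  switch (action.type) {
    case "order/selected":
      return { ...state, selectedOrderId: action.orderId };
    default:
      return state;
  }
}

// Attach the store to the window only when Cypress is present,
// so regular users never get a global handle on application state.
function exposeStoreForTests(win, store) {
  if (win.Cypress) {
    win.store = store;
  }
  return win;
}

// In a spec, the exposed store could then be used instead of clicking on the canvas:
//   cy.window().its("store").invoke("dispatch", { type: "order/selected", orderId: "42" });
//   cy.window().its("store").invoke("getState").its("selectedOrderId").should("equal", "42");
```

In the application entry point this would be called once as `exposeStoreForTests(window, store)`.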
7) Testing asynchronous flows
An interesting problem was how to test parts of the application that communicate with the backend in a highly asynchronous way. Some parts of the application do short polling. To test this, Cypress lets you configure timeouts per command. For instance you could do something like:
`cy.get('some-selector', {timeout: 50000})`
This way Cypress periodically retries the query until the element is present or the timeout elapses, at which point the test fails. As timeout values we simply used the SLO targets that were agreed between teams.
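To keep timeouts tied to the agreed SLOs rather than scattered as magic numbers, the targets can live in one place. This is only a sketch: the flow names and values below are hypothetical, and `timeoutFor` is an illustrative helper, not part of Cypress.

```javascript
// Hypothetical SLO targets in seconds; real values come from the agreement between teams.
const SLO_TARGETS_SECONDS = {
  orderAppearsInList: 50,
  exportFinished: 120,
};

// Convert a named SLO target into a Cypress timeout in milliseconds.
function timeoutFor(flow) {
  if (!Object.prototype.hasOwnProperty.call(SLO_TARGETS_SECONDS, flow)) {
    throw new Error(`No SLO target defined for flow "${flow}"`);
  }
  return SLO_TARGETS_SECONDS[flow] * 1000;
}

// In a spec:
//   cy.get("some-selector", { timeout: timeoutFor("orderAppearsInList") });
```

Failing loudly on an unknown flow name avoids silently falling back to the default command timeout.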
8) Automating tests
Automating the tests was quite straightforward. On our CI/CD server we spin up two containers, one that runs the application and another that runs the tests. After the process is done, those containers are destroyed.
This whole process was quite fun to work on and I learned a lot. Having end to end tests helped us understand how users could use our system, and we automated quality assurance, which was previously a manual and sometimes error-prone process.