The testing balance for large integrations


Continuous Integration

The first step to delivering consistent and high-quality software is Continuous Integration (CI).
CI is all about ensuring your software is in a deployable state at all times.
That is, the code compiles and the quality of the code can be assumed to be reasonably good.

Source control

CI starts with some shared repository, typically a source control system, such as Subversion (SVN) or Git. Source control systems make sure all code is kept in a single place.
It’s easy for developers to check out the source, make changes, and check in those changes. Other developers can then check out those changes.
In modern source control systems, such as Git, you can have multiple branches of the same software.
This allows you to work on different stages of the software without troubling, or even halting, other stages of the software.
For example, it is possible to have a development branch, a test branch, and a production branch. All new code gets committed on development; when it is tested and approved, it can move on to the test branch and, when your customer has given you approval, you can move it into production.

Another possibility is to have a single main branch and create a new (frozen) branch for every release. You could still apply bug fixes to release branches, but preferably not new features.

Don’t underestimate the value of source control.

It makes it possible for developers to work on the same project and even the same files without having to worry too much about overwriting others’ code or being overwritten by others.

Next to code, you should keep everything that’s necessary for your project in your repository. That includes requirements, test scripts, build scripts, configurations, database scripts, and so on.

Each check-in to this repository should be validated by your automated build server. As such, it’s important to keep check-ins small. If you write a new feature and change too many files at once, it becomes harder to find any bugs that arise.

CI server

Your builds are automated using some sort of CI server. Popular CI server software includes Jenkins (formerly Hudson), Team Foundation Server (TFS), CruiseControl, and Bamboo. Each CI server has its own pros and cons. TFS, for example, is the Microsoft CI server; it works well with .NET (C#, VB.NET, and F#) and integrates with Visual Studio. The free version has limited features and is only suitable for small teams. Bamboo is the Atlassian CI server and, thus, works well with JIRA and Bitbucket. Like TFS, Bamboo is not free. Jenkins is open source and free to use. It works well for Java, the language in which Jenkins itself was built, and is extended through plugins. There are a lot of other CI servers, all with their own pros and cons, but the thing they all have in common is that they automate software builds.

Your CI server monitors your repository and starts a build on every check-in. A single build can compile your code, run unit tests, calculate code coverage, check style guidelines, lint your code, minify your code, and much more. Whenever a build fails, for example, because a programmer forgot a semicolon and checked in invalid code or because a unit test fails, the team should be notified. The CI server may send an email to the programmer who committed the offending code, or to the entire team, or you could do nothing (which is not best practice) and just check the status of your build every once in a while. The conditions for failure are completely up to the developer (or the team). Obviously, when your code does not compile because it’s missing a semicolon, that’s a fail. Likewise, a failing unit test is an obvious fail. Less obvious is that a build can fail when a certain project does not have at least 90% test code coverage, or when your technical debt (that is, the time it takes to rewrite quick-and-dirty solutions into more elegant ones) grows to more than 40 hours.
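As a sketch of what such an automated build might look like, here is a minimal Jenkins declarative pipeline that compiles a .NET solution, runs the unit tests, and mails the team on failure. The solution name, agent, and email address are placeholders for this example, not something prescribed by any particular project:

```groovy
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // Fails the build on any compiler error, such as a missing semicolon.
                bat 'dotnet build MySolution.sln --configuration Release'
            }
        }
        stage('Test') {
            steps {
                // A single failing unit test fails the entire build.
                bat 'dotnet test MySolution.sln --configuration Release'
            }
        }
    }
    post {
        failure {
            // Notify the team; you could also mail only the offending committer.
            mail to: 'team@example.com',
                 subject: "Build ${env.BUILD_NUMBER} failed",
                 body: 'Check the CI server for details.'
        }
    }
}
```

Conditions such as minimum code coverage or a technical-debt threshold would be added as extra stages using the appropriate plugins.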

The CI server should build your software, notify about failures and successes, and ultimately create an artifact. This artifact, an executable of the software, should be easily available to everyone on the team. Since the build passed all of the team’s criteria for passing a build, this artifact is ready for delivery to the customer.


Software quality

That brings us to the point of software quality.

If a build on your CI server succeeds, it should guarantee a certain level of software quality.

I’m not talking about perfect software that is bug-free all of the time, but software that’s well tested and checked for best practices. Numerous types of tests exist, but we will only look at a few of them in this article.


Unit tests

One of the most important things you can do to guarantee that certain parts of your software produce correct results is to write unit tests. A unit test is simply a piece of code that calls a method (the method to be tested) with a predefined input and checks whether the result is what you expect it to be. If the result is correct, it reports success; otherwise, it reports failure. A unit test, as the name implies, tests a small and isolated unit of code.

Let’s say you write a function int Add(int a, int b) in C# (I’m pretty sure every programmer can follow along, though):

public static class MyMath
{
    public static int Add(int a, int b)
    {
        return a + b;
    }
}
The first thing you want to test is whether Add indeed returns a + b and not a + a, or b + b, or even something random. That may sound easier than it is. If you test whether Add(1, 1) returns 2 and the test succeeds, someone might still have implemented it as a + a or b + b. So, at the very least, you should test it using two unequal integers, such as Add(1, 2). Now what happens when you call Add(2147483647, 1)? Does it overflow or throw an exception, and is that indeed the outcome you expected? Likewise, you should test for an underflow (while adding!?): Add(-2147483648, -1) will not return what you’d expect. That’s three unit tests for such a simple function! Arguably, you could also test for +/-, -/+, and -/- (-3 + -3 equals -6 and not 0), but you’d have to try really hard to break that kind of functionality, so those tests would probably not be useful additions. Your final unit tests may look something like the following:


using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestAPlusB()
    {
        int expected = 3;
        int actual = MyMath.Add(1, 2);
        Assert.AreEqual(expected, actual, "Somehow, 1 + 2 did not equal 3.");
    }

    [TestMethod]
    [ExpectedException(typeof(System.OverflowException))]
    public void TestOverflowException()
    {
        // MyMath.Add currently overflows instead of throwing, so this test will fail.
        MyMath.Add(int.MaxValue, 1);
    }

    [TestMethod]
    [ExpectedException(typeof(System.OverflowException))]
    public void TestUnderflowException()
    {
        // MyMath.Add currently underflows instead of throwing, so this test will fail.
        MyMath.Add(int.MinValue, -1);
    }
}
Of course, if you write a single unit test and it succeeds, that is no guarantee that your software actually works. In fact, a single function usually has more than one unit test. Likewise, if you have written a thousand unit tests, but all they do is check that true indeed equals true, that’s not any indication of the quality of your software either. Suffice it to say, your tests should cover a large portion of your code and, at least, the most likely scenarios. I would say quality over quantity, but in the case of unit testing, quantity is also pretty important. You should actually keep track of your code coverage. There are tools that do this for you, although they cannot check whether your tests actually make any sense.

It is important to note that unit tests should not depend upon other systems, such as a database, the filesystem, or (third-party) services. The input and output of our tests need to be predefined and predictable. Also, we should always be able to run our unit tests, even when the network is down and we can’t reach the database or third-party service. It also helps in keeping tests fast, which is a must, as you’re going to have hundreds or even thousands of tests that you want to run as fast as possible. Instant feedback is important. Luckily, we can mock (or fake) such external components.
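As an illustration of such mocking, here is a hand-rolled fake (all the names here are made up for this example); in real projects, you would typically use a mocking framework to generate fakes like this:

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// The code under test depends on this abstraction, not on a real database.
public interface ICustomerRepository
{
    string GetCustomerName(int id);
}

// A hand-written fake that returns predefined data; no network or database needed.
public class FakeCustomerRepository : ICustomerRepository
{
    public string GetCustomerName(int id)
    {
        return id == 1 ? "Alice" : "Unknown";
    }
}

public class GreetingService
{
    private readonly ICustomerRepository repository;

    public GreetingService(ICustomerRepository repository)
    {
        this.repository = repository;
    }

    public string Greet(int customerId)
    {
        return "Hello, " + repository.GetCustomerName(customerId) + "!";
    }
}

[TestClass]
public class GreetingServiceTests
{
    [TestMethod]
    public void GreetUsesCustomerName()
    {
        // The fake makes the test fast, predictable, and independent of any database.
        var service = new GreetingService(new FakeCustomerRepository());
        Assert.AreEqual("Hello, Alice!", service.Greet(1));
    }
}
```

Because GreetingService receives its repository through the constructor, the test can swap in the fake while production code passes in the real database-backed implementation.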

Just writing some unit tests is not going to cut it. Whenever a build passes, you should have reasonable confidence that your software is correct. Also, you do not want unit tests to fail every time you make even the slightest change. Furthermore, specifications change and so do unit tests. As such, unit tests should be understandable and maintainable, just like the rest of your code. And writing unit tests should be a part of your day-to-day job. Write some code, then write some unit tests (or turn that around if you want to do Test-Driven Development). This means testing is not something only testers do; developers do it as well.

In order to write unit tests, your code should be testable as well. Each if statement makes your code harder to test. Each function that does more than one thing makes your code harder to test. A thousand-line function with multiple nested if and while loops (and I’ve seen plenty) is pretty much untestable. So when writing unit tests for your code, you are probably already refactoring and making your code prettier and easier to read. Another added benefit of writing unit tests is that you have to think carefully about possible inputs and desirable outputs early, which helps in finding edge cases in your software and preventing bugs that may come from them.

Integration tests

Checking whether an Add function really adds a and b is nice, but it does not really give you an indication that the system as a whole works. As mentioned, unit tests only test small and isolated units of code and should not interact with external components (those are mocked). That is why you will want integration tests as well. Integration tests check whether the system as a whole operates as expected. We need to know whether a record can indeed be saved in and retrieved from a database, whether we can request some data from an external service, and whether we can log to some file on the filesystem. Or, more practically, we can check whether the frontend that was created by the frontend team actually fits the backend that was created by the backend team. If these two teams have had any problems or confusion in communication, the integration tests will, hopefully, sort that out.
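A database integration test, unlike a unit test, talks to the real thing. The following sketch saves a record and reads it back; the connection string and the Customers table are assumptions for this example and require a dedicated test database to exist:

```csharp
using System.Data.SqlClient;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class DatabaseIntegrationTests
{
    // Points to a dedicated test database; a placeholder in this sketch.
    private const string ConnectionString =
        "Server=localhost;Database=IntegrationTests;Integrated Security=true";

    [TestMethod]
    public void SavedRecordCanBeReadBack()
    {
        using (var connection = new SqlConnection(ConnectionString))
        {
            connection.Open();

            // Insert a record into a real (test) table...
            using (var insert = new SqlCommand(
                "INSERT INTO Customers (Id, Name) VALUES (42, 'Test customer')",
                connection))
            {
                insert.ExecuteNonQuery();
            }

            // ...and check that it can actually be retrieved again.
            using (var select = new SqlCommand(
                "SELECT Name FROM Customers WHERE Id = 42", connection))
            {
                Assert.AreEqual("Test customer", (string)select.ExecuteScalar());
            }
        }
    }
}
```

Such tests are slower and more fragile than unit tests, which is exactly why they are kept in a separate category.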

Suppose, for example, you create a service for a third party that wants to interface with a system you wrote. The service does not do a lot: basically, it takes the received message and forwards it to another service that you use internally (and that isn’t available outside of the network). The internal service has all of the business rules and can read from, and write to, a database. Furthermore, it will, in some cases, create additional jobs that are put on an (asynchronous) queue, which is yet another service. Last, a fourth service picks up any messages from the queue and processes them. In order to process a single request, you potentially need five components (external service, internal service, database, queue, and queue processor). The internal service is thoroughly unit tested, so the business rules are covered. However, that still leaves a lot of room for errors and exceptions when one of the components is not available or has an incompatible interface.


Big bang testing

There are two approaches to integration testing: big bang testing and incremental testing. With big bang testing, you simply wait until all the components of a system are ready and then start testing. In the case of the example service, that means developing and installing everything, then posting some requests and checking whether the external service can call the internal service, and whether the internal service can access the database and the queue and, not unimportantly, give feedback to the external service. Furthermore, of course, you have to test whether the queue triggers the processing service and whether the processing service processes the message correctly too.

In reality, the processing service also used the database; it put new messages on the queue and sent emails in case of errors. Additionally, all the components have to access the hard drive for logging to a file (and do not assume the filesystem is always available; the first time on production, I actually ran into an UnauthorizedAccessException and nothing was logged). So that means even more integration testing.


Incremental testing

With incremental testing, you test components as soon as they are available, and you create stubs or drivers (some sort of placeholder) for components that are not yet available. There are two approaches here:

  • Top-down testing: You check whether the external service can make a call to the internal service and, if the internal service is not available yet, create a stub that pretends to be the internal service.
  • Bottom-up testing: This is testing the other way around, so you’d start testing the internal service and create a driver that mimics the external service.

Incremental testing has the advantage that you can start defining tests early, before all the components are complete. After that, it becomes a matter of filling in the gaps.
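A stub for the not-yet-available internal service can be very small. In this sketch, the interface and the canned response are assumptions for the example service described above:

```csharp
// The contract both the real internal service and the stub implement.
public interface IInternalService
{
    string Process(string message);
}

// A stub standing in for the internal service until the real one exists.
// It returns a canned response so the external service can already be tested top-down.
public class InternalServiceStub : IInternalService
{
    public string Process(string message)
    {
        return "OK";
    }
}
```

A driver for bottom-up testing is the mirror image: a small program that calls the real internal service the way the external service eventually will.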


Acceptance tests

After having unit tested our code and checked whether the system as a whole works, we can now assume our software works and is of decent quality (at least, the quality we expect). However, that does not mean that our software actually does what was requested. It often happens that the customer requests feature A, the project manager communicates B, and the programmer builds C. There is a really funny comic about it involving a swing (do a Google image search for "how projects really work"). Luckily, we have acceptance tests.

The important thing here is that the tests do more or less exactly what our users will do as well.

There is some confusion on the difference between integration tests and acceptance tests. Both test the entire system, but the difference is that integration tests are written from a technical perspective while acceptance tests are written from the perspective of the product owner or business users.

Smoke tests

Of course, even when all of your tests succeed, a product can still break in production. The database may be down or maybe you have a website and the web server is down. It is always important to also test whether your software is actually working in a production environment, so be sure to always do an automated smoke test after deployment that gives you fast and detailed feedback when something goes wrong. A smoke test should test whether the most important parts of your system work. A manual smoke test is fine (and I’d always manually check whether your software, at least, runs after a release), but remember it’s another human action that may be forgotten or done poorly.

Some people run smoke tests before doing integration and acceptance tests. Integration and acceptance tests test an entire system and, as such, may take a bit of time. A smoke test, however, tests only basic functionality, such as does the page load? When a smoke test fails, you can skip the rest of your tests, saving you some time and giving you faster feedback.
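An automated smoke test can be as simple as requesting the most important page and checking the response. A minimal sketch (the URL is a placeholder, and the target site must be reachable for the test to pass):

```csharp
using System.Net.Http;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class SmokeTests
{
    [TestMethod]
    public void HomePageLoads()
    {
        using (var client = new HttpClient())
        {
            // Fast, basic check: if even the home page is down,
            // there is no point in running the slower test suites.
            var response = client.GetAsync("https://www.example.com/").Result;
            Assert.IsTrue(response.IsSuccessStatusCode, "The home page did not load.");
        }
    }
}
```

Run after every deployment, this catches the "web server is down" class of failures within seconds.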

There are many types of tests available out there. Unit tests, smoke tests, integration tests, system tests, acceptance tests, database tests, functional tests, regression tests, security tests, load tests, UI tests… it never ends! I’m pretty sure you could spend an entire year doing nothing but writing tests. Try selling that to your customer; you can’t deliver any working software, but the software you can’t deliver is really very well tested. Personally, I’m more pragmatic. A test should support your development process, but a test should never be a goal on its own. When you think you need a test, write a test. When you don’t, don’t. Unfortunately, I have seen tests that did absolutely nothing (except give you a false sense of security), but I’m guessing someone just wanted to see tests, any tests, really bad.


Imagine doing all this locally on your own computer. For simplicity, let’s say you’ve got some code that has to compile and some unit tests that have to run. Easy enough; everybody should be able to do that. Except your manager, who doesn’t have the developer software installed at all. Or the intern, who forgot to kick off the unit tests. Or the developer who works on a different OS, making some tests that aren’t important to him fail (for example, we have an application developed on and for Windows, but a complementary app for iOS developed on a Mac). Suddenly, getting a working and tested executable becomes a hassle for everyone who isn’t working on the project on a daily basis.


E2E Testing

Whatever level of tests you have in place in your testing organization, you need to assess and mitigate the risks that come from the simple fact that something is changing in your software ecosystem.
Each level of testing described above gives confidence in a single component or integration point, but not necessarily in the complete flow through the system.

End-to-end testing is a methodology used to test whether the flow of an application is performing as designed from start to finish. The purpose of carrying out end-to-end tests is to identify system dependencies and to ensure that the right information is passed between various system components and systems.
End-to-end testing involves ensuring that the integrated components of an application function as expected. The entire application is tested in a real-world scenario, such as communicating with the database, network, hardware, and other applications.

For example, a simplified end-to-end testing of an email application might involve:

  • Logging in to the application
  • Accessing the inbox
  • Opening and closing the mailbox
  • Composing, forwarding or replying to email
  • Checking the sent items
  • Logging out of the application
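Automated, such a flow might be scripted with a browser-automation tool such as Selenium WebDriver. The following sketch walks the login and inbox steps; the URL and element IDs are assumptions for an imaginary mail application:

```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public class EmailE2ETest
{
    public void LoginAndCheckInbox()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("https://mail.example.com/login");

            // Log in to the application (element IDs are made up for this sketch).
            driver.FindElement(By.Id("username")).SendKeys("testuser");
            driver.FindElement(By.Id("password")).SendKeys("secret");
            driver.FindElement(By.Id("login-button")).Click();

            // Access the inbox and verify it actually rendered.
            driver.FindElement(By.Id("inbox-link")).Click();
            if (driver.FindElements(By.ClassName("mail-row")).Count == 0)
                throw new System.Exception("Inbox did not load any mail rows.");

            // Log out of the application.
            driver.FindElement(By.Id("logout-link")).Click();
        }
    }
}
```

Each step exercises the real UI, backend, and database at once, which is exactly what makes E2E tests both valuable and expensive.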


Integration or E2E Testing? Where is the balance?

Allow me to introduce you to the “testing pyramid”.

Let me give you a short breakdown of what it all means: as of today, the majority of testing effort still goes into End-To-End (E2E) testing, which means, by default, that most of the testing budget goes into E2E testing as well.

Manual testing is by far dominant in many organizations; whatever level of automation you have in place, there will always be manual testing, accounting for over 70% of testing efforts. Bottom line: the “testing pyramid”, standing on its head, makes for a pretty unstable geometric body.

Even so, there are many reasons to love E2E testing. Developers like it, since it means testing is shifted to some “higher level” and someone else needs to do it. Executives like it, since it promises to be very close to the real user experience and “real” scenarios are covered. Testers like it, since they prefer testing through the UI over testing APIs; UIs are simply more concrete and tangible, and closer to the business knowledge they have.

On the other hand however, E2E testing has some severe drawbacks, which become more and more critical:

E2E tests require a fully functioning system landscape, which defers E2E tests to the very end of the development/test cycle. The possibility of accelerating speed-to-market is fundamentally contradicted by this constraint.

E2E tests are very costly and time-consuming, exacerbating the pressure placed on the tests from a cost and time perspective.

Due to increasingly interconnected system landscapes with SOA architectures, E2E processes impact a broader variety of systems, pushing the beginning of testing even further along the project’s timeline.

Bottom line: having E2E tests as the major test stage drives towards a dead end.

The solution to this problem was introduced several years ago in the course of Agile development theory and is called the “inversion of the testing pyramid”:

Allow me to introduce you to the “inverted testing pyramid”.


In the “inverted testing pyramid”, E2E tests make up only a small portion of the test effort. Test automation is driven to its extreme, achieving automation rates of 90+%. The overall testing effort is reduced significantly.

With regard to test duration, which is critical to time-to-market requirements, the benefits of the inversion become even more apparent: test cycles have been reduced from 8 weeks to 3 days while achieving even higher coverage of business risks; an efficiency gain of 90+%.


References used:

Continuous Integration, Delivery, and Deployment
by Sander Rossel