I’ve worked on 5+ apps using The Composable Architecture, and I really like it, but I’ve observed a few issues that we can run into in bigger apps.
As you probably know I’m also a big proponent of TDD / BDD, so let’s start by talking about what I don’t like about the way TCA does testing.
What does exhaustivity mean?
Exhaustivity in the context of TCA tests means that every test has to replicate all the actions, state changes, and effects stemming from the first trigger you want to verify.
This means that you usually will write a single test for each action you have in your application.
This sounds good on the surface, right?
Unfortunately, it leads to code that exhibits a lot of TDD anti-patterns. Let’s get into the details.
Benefits of TDD
If you read my Best practices for testing apps you might remember that I do TDD for a few different reasons.
Let’s highlight the 2 most important ones in the context of TCA.
Confidence to refactor internal implementation details without breaking expected behavior
When we write new tests, we want to go through the process of:
- Red - Write failing test for non-existent (empty) API
- Green - Implement minimal code to make the test pass
- Refactor - Change implementation detail without breaking the test
Now the key factor here is that once the test is Green, it should usually not change as we change the implementation.
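As a minimal sketch of what that looks like in practice (the `Counter` type and test here are hypothetical, not from TCA), a behavior-focused test keeps passing even when we swap out the internals:

```swift
import XCTest

// Hypothetical type whose internal storage we are free to refactor.
struct Counter {
    private var value = 0
    mutating func increment() { value += 1 }
    var current: Int { value }
}

final class CounterTests: XCTestCase {
    func testIncrementRaisesCurrentByOne() {
        var counter = Counter()
        counter.increment()
        // Asserts observable behavior only; changing the backing storage
        // (e.g. to a log of events) should not break this test.
        XCTAssertEqual(counter.current, 1)
    }
}
```

Because the assertion targets the public behavior (`current`), the Refactor step can rework `value` freely while the test stays green.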
—
Why does that matter?
It gives us the confidence to constantly refactor and improve implementation details, and if our tests are passing without any changes we know we aren’t breaking consumer/user expectations (assuming decent test coverage).
—
That confidence is even more important for the long-term maintenance of our codebase: if we need to change tests whenever our implementation changes, it means they are fragile.
This fragility is one of the most common anti-patterns in TDD, and it’s why a lot of people who are new to testing end up saying testing isn’t worth it.
A single test should focus on and verify a single side-effect
If you need to analyze your test each time it fails, it probably means your test is doing way more than it should.
Test failures should be clear and concise: you should immediately know what failed. Focused tests also serve as good documentation of expected behavior.
As such, even if a single user action causes N side-effects, it’s better to create N tests rather than 1 test with N assertions.
It’s also part of The giant anti-pattern
TCA Exhaustive tests
Exhaustive tests, by design, undermine the 2 main benefits we just talked about.
Your tests have to verify every single assertion, and almost any change in your implementation details will force you to update your tests, making them extremely fragile.
As an example, one of the tests verifying what happens when the user closes a tab in Arc Browser was 100+ lines of code, not counting the setup code, and that wasn’t even our biggest test.
Do you know what I’ve seen happen in apps that have tests like this, time and time again?
- People stop writing new tests because it’s too complicated
- When an old test fails, folks will just copy the new expectation to make the test pass
- Without really analyzing the logic, because it’s too complicated
- When a test fails, it’s hard to understand why, because you have to read everything in it; it’s not fun to wade through 100+ lines of test code to figure out what’s going on
- Even re-ordering events will cause tests to fail
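To make the fragility concrete, here is a schematic sketch of an exhaustive test (the tabs feature, its types, and its actions are hypothetical; the `TestStore` usage follows TCA’s API at the time of writing):

```swift
// Hypothetical exhaustive test: every state change and every effect
// action must be asserted, in the exact order they occur.
let store = TestStore(
    initialState: TabsState(),      // hypothetical feature state
    reducer: tabsReducer,           // hypothetical reducer
    environment: .test              // hypothetical environment
)

store.send(.closeTab(id: 1)) {
    $0.tabs.removeAll { $0.id == 1 }
}
// Every follow-up effect action must be received explicitly...
store.receive(.persistTabs)
store.receive(.updateRecentlyClosed(id: 1)) {
    $0.recentlyClosed.append(1)
}
// ...and swapping the order of these `receive` calls fails the test,
// even though the user-visible outcome is identical.
```

Every one of these lines couples the test to the current implementation, which is exactly what makes refactoring painful.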
Coming up with an alternative
When I started working at The Browser Company, I kicked off a discussion about the issues I’d observed.
These are the proposal requirements I came up with:
- Non-exhaustive
- Devs decide which state changes to assert and which to ignore
- Need to be able to verify effect actions were received
- Effects are processed as normal (not waiting for receive)
- Tests can finish successfully even with long-running effects still active
- We still want to be able to ensure they did in fact finish (opt-in)
- Try to keep the API as close as possible to the original, so we can choose when we want `.exhaustive` tests and when we want `.nonExhaustive` ones
- Since there are still places where exhaustive tests can make sense (think TDD vs BDD testing differences)
- e.g. `let store = TestStore(.nonExhaustive, ...)`
- Additionally, offer the ability to access state outside of `send` blocks for standard `XCTAssert` testing
After discussing and clearing this with the team I proposed a new API.
New API
Reference state / actions used in the samples below:
struct State: Equatable {
var name: String = "Krzysztof"
var surname: String = "Zabłocki"
var age: Int = 33
var mood: Int = 0
}
enum Action: Equatable {
case changeIdentity(name: String, surname: String)
case changeAge(Int)
case changeMood(Int)
case advanceAgeAndMoodAfterDelay
}
let reducer = Reducer<State, Action, AnySchedulerOf<DispatchQueue>> { state, action, scheduler in
switch action {
case let .changeIdentity(name, surname):
state.name = name
state.surname = surname
return .none
case .advanceAgeAndMoodAfterDelay:
return .merge(
.init(value: .changeAge(state.age + 1)),
.init(value: .changeMood(state.mood + 1))
)
.delay(for: 1, scheduler: scheduler)
.eraseToEffect()
case let .changeAge(age):
state.age = age
return .none
case let .changeMood(mood):
state.mood = mood
return .none
}
}
- Verifying state happens inside send or receive blocks, same as in the original TestStore, but in a non-exhaustive manner: if your reducer modifies many state properties, you can simply ignore the ones you don’t care about, e.g.
store.send(.changeIdentity(name: "Marek", surname: "Ignored")) {
$0.name = "Marek"
// we don't verify surname since we don't care
}
- You can send multiple actions without asserting any state changes, and verify the state change in the last send block
store.send(.changeIdentity(name: "Adam", surname: "Stern"))
store.send(.changeIdentity(name: "Piotr", surname: "Galiszewski"))
// Verify final state matches
store.send(.changeIdentity(name: "Merowing", surname: "Info")) {
$0.name = "Merowing"
$0.surname = "Info"
}
- You can verify actions received from effects, and their state changes, using the trailing block; unlike a standard store, you do it in a non-exhaustive manner. E.g. here we only check that we get `changeAge` and ignore the `changeMood` effect callback
store.send(.advanceAgeAndMoodAfterDelay)
// When moving scheduler forward in time
testScheduler.advance(by: 1)
// Verify that it received delayed action and updated state
// note that we choose to ignore checking whether `changeMood` was received
store.receive(.changeAge(34)) {
$0.age = 34
}
You could also use an alternative approach for the final check:
XCTAssertEqual(store.state.age, 34)
Converting from Exhaustive to Non-Exhaustive tests
- Given N behaviors you were verifying in a single test
- Split that 1 test across N tests with common setup
- Copy the implementation and then remove assertions unrelated to the behavior under test
- Your exhaustive test probably has many crufty checks that were irrelevant to the behavior but were needed to satisfy exhaustivity. Now that the test is non-exhaustive, delete as many of these as you can to keep your tests focused and easy to understand.
- You should be able to drop most, if not all, store.receive calls, since these were testing effects that are usually implementation details; in most cases you don’t want to be testing them in a non-exhaustive world.
This focuses your tests on as small a surface area as possible. It’s better to split one big test into N small tests around the different side-effects of an action, because if the code introduces a bug, only some of them will fail, making it much clearer what broke.
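Using the sample reducer above, the split might look like this (the `makeStore` helper, the `scheduler` property, and the test names are illustrative, not part of the proposed API):

```swift
// Before: one exhaustive test asserted the age change, the mood change,
// and their ordering together. After: one focused test per behavior,
// sharing a common setup.

func testDelayedActionAdvancesAge() {
    let store = makeStore(.nonExhaustive) // hypothetical setup helper
    store.send(.advanceAgeAndMoodAfterDelay)
    scheduler.advance(by: 1)
    // Only the age behavior is asserted; changeMood is ignored.
    store.receive(.changeAge(34)) { $0.age = 34 }
}

func testDelayedActionAdvancesMood() {
    let store = makeStore(.nonExhaustive)
    store.send(.advanceAgeAndMoodAfterDelay)
    scheduler.advance(by: 1)
    // Only the mood behavior is asserted; changeAge is ignored.
    store.receive(.changeMood(1)) { $0.mood = 1 }
}
```

If a regression breaks only the mood logic, only the second test fails, and its name tells you what broke.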
If you are using state mutations as functions on your State objects, e.g.
state.doSomething()
try to avoid calling them in tests and instead set the expected state changes explicitly:
- Those functions will depend on previous state values
- They are an implementation detail, and if you introduce a bug in them, your test won’t fail anymore, defeating the purpose of the test existing in the first place.
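For example (here `doSomething()` stands in for any mutating helper on `State`, and the action reuses the sample reducer above):

```swift
// Fragile: the expectation is computed by the implementation under test,
// so a bug in `doSomething()` changes both sides of the comparison and
// the test keeps passing.
store.send(.changeAge(34)) {
    $0.doSomething() // hypothetical State helper that bumps `age`
}

// Better: assert the concrete expected value explicitly.
store.send(.changeAge(34)) {
    $0.age = 34
}
```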
What does this give us?
This pattern means we test less logic in a single test, so we need a higher number of tests to reach the same coverage, but that’s a good thing because:
- You duplicate fewer implementation details (especially side-effect relationships)
- Tests are easier to read because they are focused
- If we introduce a regression, it should only break the tests that exercised that part of the application’s behavior, not everything
Conclusion
These problems might not be visible in smaller-scale projects, but they still exist; they become very visible in larger projects. For reference, our browser is one of the biggest clients of the TCA framework.
If you don’t like exhaustivity and agree with my thoughts about it, you can start using the proposed test store implementation today: it’s available as `TBCTestStore` in our develop fork of TCA, which we keep up-to-date with the original repo.