Since many people rely on this library working properly, and we don't want to accidentally introduce changes that cause it to break, we use unit tests to ensure that the individual functions in our code-base work as expected. This guide will help you get started with writing new unit tests, or editing existing ones, which is often needed when changing things around.
NOTE: This is a practical guide to quickly get you started with writing unit tests, not a full introduction to what unit tests are; it only covers the basics needed to understand our tests. If you're looking for a full introduction, take a look at the Additional Resources section at the bottom.
We are using the following modules and packages for our unit tests:
- pytest
- pytest-cov
- coverage.py (as a part of pytest-cov)
- unittest.mock (standard library)
We decided on using pytest instead of the unittest module from the standard library, since it's much more beginner friendly and generally easier to use.
When running the tests, you should always be in an activated virtual environment (or use poetry run to run the test commands from within the environment).
To make things simpler, we made a few shortcuts/aliases using taskipy:
- poetry run task test-nocov will run all unit tests using pytest.
- poetry run task test will run pytest with pytest-cov, collecting code coverage information.
- poetry run task test /path/to/test.py will run a specific test file.
- poetry run task retest will rerun only the previously failed tests.
When actively developing, you'll most likely only be working on some portion of the code-base, and as a result, you won't need to run the entire test suite. Instead, you can run just the tests for a specific file with
poetry run task test-nocov /path/to/test.py
When you are done and are preparing to commit and push your code, it's a good idea to run the entire test suite as a sanity check that you haven't accidentally introduced some unexpected bugs:
poetry run task test
Since consistency is an important consideration for collaborative projects, we have written some guidelines on writing tests for the project. In addition to these guidelines, it's a good idea to look at the existing code base for examples (e.g., test_connection.py).
To organize our test suite, we have chosen to mirror the directory structure of mcproto in the tests subdirectory. This makes it easy to find the relevant tests by providing a natural grouping of files. More general testing files, such as helpers.py, are located directly in the tests subdirectory.
All files containing tests should have a filename starting with test_ to make sure pytest will discover them. This prefix is typically followed by the name of the file the tests are written for. If needed, a test file can contain multiple test classes, both to provide structure and to be able to provide different fixtures/set-up methods for different groups of tests.
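As a purely hypothetical illustration of this mirroring (the actual file names and nesting in the repository may differ), the layout looks something like this:

mcproto/
    connection.py
tests/
    helpers.py
    mcproto/
        test_connection.py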
When writing unit tests, it's really important to make sure that each test you write runs independently from all of the other tests. This means both that the code you write for one test shouldn't influence the result of another test, and that if one test fails, the other tests should still run.
The basis for this is that when you write a test method, it should really only test a single aspect of the thing you're testing. This often means that you do not write one large test that tests "everything" that can be tested for a function, but rather that you write multiple smaller tests that each test a specific branch/path/condition of the function under scrutiny.
To make sure you're not repeating the same set-up steps in all of these smaller tests, pytest provides fixtures that can be executed before and after each test is run. In addition to test fixtures, it also provides support for parametrization, which is a way of re-running the same test with different values. If there's a failure, pytest will then show us the values that were being used when the failure occurred, making it a much better solution than just manually using the values in the test function.
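As a quick sketch of both features (the parse_port function below is made up purely for illustration and isn't part of mcproto):

import pytest


def parse_port(value: str) -> int:
    """Example function under test: parse a port number from a string."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError("Port out of range")
    return port


@pytest.fixture
def default_port() -> int:
    """Shared set-up code; runs before each test that requests this fixture."""
    return 25565


def test_parse_port_returns_int(default_port: int):
    assert parse_port(str(default_port)) == default_port


@pytest.mark.parametrize("value", ["0", "65536", "-20"])
def test_parse_port_rejects_out_of_range(value: str):
    # The same test body is re-run once per parametrized value; on a failure,
    # pytest reports which value was being used.
    with pytest.raises(ValueError):
        parse_port(value)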
As we are trying to test our "units" of code independently, we want to make sure that we don't rely on objects and data generated by "external" code. If we did, we might end up observing a failure caused by something external, and not a failure in the code we're actually testing.
However, the objects that we're trying to test often depend on these external pieces of code. Fortunately, there is a solution for that: we use fake objects that act like the real objects. We call these fake objects "mocks".
To create these mock objects, we use the unittest.mock module (part of Python's standard library). In addition, we have also defined some helper mixin classes to make our mocks behave how we want (see the examples below).
As an example of mocking, let's create a fake socket, which the connection class can use to make the send calls when sending out some data. That way, we don't have to actually establish a connection to some external server, and can instead test that the connection class works properly and calls our mocked methods with the correct data.
import socket
from unittest.mock import Mock

from mcproto.connection import TCPSyncConnection


def test_connection_sends_correct_data():
    mock_socket = Mock(spec_set=socket.socket)
    conn = TCPSyncConnection(mock_socket)

    data = bytearray("hello", "utf-8")
    conn.write(data)

    mock_socket.send.assert_called_once_with(data)
In the example above, we've just made sure that when we try to write some data into the connection class, it properly calls the send method of the socket with our data, sending it out.
The spec_set attribute limits which attributes will be accessible through our mock socket. For example, mock_socket.close will work, because the socket.socket class has it defined. However, mock_socket.abc will not be accessible and will produce an error, because the socket class doesn't define it.
By default, a mock will allow access to any attribute, which is usually not what we want, as a test should fail if an attribute that shouldn't exist is accessed. That's why we often end up setting spec_set with our mocks.
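For instance, continuing with the mock socket from the example above:

mock_socket = Mock(spec_set=socket.socket)

mock_socket.close()  # Works: socket.socket defines a close() method.
mock_socket.abc()    # AttributeError: Mock object has no attribute 'abc'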
Alright, now let's consider a slightly more interesting example. What if we wanted to ensure that our connection can properly read data that was sent to us through a socket?
def test_connection_reads_correct_data():
    mock_socket = Mock(spec_set=socket.socket)
    mock_socket.recv.return_value = bytearray("data", "utf-8")

    conn = TCPSyncConnection(mock_socket)
    received = conn.read(4)  # 4 bytes, for the 4 characters in the word "data"

    assert received == bytearray("data", "utf-8")
    mock_socket.recv.assert_called_once_with(4)
Cool! But in real tests, we'll need something a bit more complicated, as right now, our recv method will just naively return 4-byte-long data, no matter what the passed length argument was. We can afford to do this here, as we know we'll be reading 4 bytes and we'll only make one recv call to do so. But what if our connection actually read the data procedurally, only reading a few bytes at a time, and then joining them together?
Well, this is a bit more complex, but it's still doable. Let's see it:
import socket
from unittest.mock import Mock

import pytest

from mcproto.connection import TCPSyncConnection
from tests.helpers import CustomMockMixin  # Explained later, in its own section


class ReadFunctionMock(Mock):
    def __init__(self, *a, combined_data: bytearray, **kw):
        super().__init__(*a, **kw)
        self.combined_data = combined_data

    def __call__(self, length: int) -> bytearray:
        """Override mock's __call__ to make it return part of our combined_data bytearray.

        This allows us to define the combined data we want the mocked read function to be
        returning, and have each call only take the requested part (length) of that data.
        """
        self.return_value = self.combined_data[:length]
        del self.combined_data[:length]
        return super().__call__(length)


class MockSocket(CustomMockMixin, Mock):
    spec_set = socket.socket

    def __init__(self, *args, read_data: bytearray, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._recv = ReadFunctionMock(combined_data=read_data)

    def recv(self, length: int) -> bytearray:
        return self._recv(length)


def test_connection_partial_read():
    mock_socket = MockSocket(read_data=bytearray("data", "utf-8"))
    conn = TCPSyncConnection(mock_socket)

    data1 = conn.read(2)
    assert data1 == bytearray("da", "utf-8")

    data2 = conn.read(2)
    assert data2 == bytearray("ta", "utf-8")


def test_connection_empty_read_fails():
    mock_socket = MockSocket(read_data=bytearray())
    conn = TCPSyncConnection(mock_socket)

    with pytest.raises(IOError, match="Server did not respond with any information."):
        conn.read(1)
Well, that was a lot! But it finally gives us an idea of what mocks can look like in tests, and how they help us represent the objects they're "acting" as.
By default, the unittest.mock.Mock and unittest.mock.MagicMock classes cannot mock coroutines, since the __call__ method they provide is synchronous. The AsyncMock class, introduced in Python 3.8, is an asynchronous version of MagicMock that can be used anywhere a coroutine is expected.
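As a minimal, self-contained sketch of this behavior (the mock_reader name is purely illustrative, not an actual mcproto object):

import asyncio
from unittest.mock import AsyncMock


async def example():
    # Attributes of an AsyncMock are themselves AsyncMocks, so calling them
    # returns a coroutine that can be awaited.
    mock_reader = AsyncMock()
    mock_reader.read.return_value = bytearray("data", "utf-8")

    data = await mock_reader.read(4)
    assert data == bytearray("data", "utf-8")
    mock_reader.read.assert_awaited_once_with(4)


asyncio.run(example())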
While the Mock classes are pretty well written, there are some features which we often want to change. For this reason, we have a special mixin class, tests.helpers.CustomMockMixin, which performs these custom overrides for us.
Namely, we stop the propagation of spec_set restrictions into child mocks. Let's look at an example to better understand what this means:
from unittest.mock import Mock

from tests.helpers import CustomMockMixin


class CustomDictMock(CustomMockMixin, Mock):
    spec_set = dict


normal_mock = Mock(spec_set=dict)
custom_mock = CustomDictMock()

# Let's run the `pop` method, which is accessible from both mocks, as it's a
# part of the `dict`'s specification.
x = normal_mock.pop("abc")
y = custom_mock.pop("abc")

x.foobar()  # AttributeError: No such attribute!
y.foobar()  # Works!

x.pop("x")  # Works
y.pop("x")  # Works
As you can see from the example above, by default, mocks return new child mocks whenever any attribute is accessed. With mocks limited to some spec_set, these child mocks will also be limited to the same spec_set. However, in most cases, the attributes/functions of the mocked class wouldn't actually hold/return instances of that same class; they can really hold anything, so this kind of limitation doesn't make much sense. That's why we instead return regular, unrestricted mocks as the child mocks.
Additionally, the CustomMockMixin also provides support for using spec_set as a class attribute, which regular mocks don't have. This has proven to be quite useful when making custom mock classes, as the alternative would be to override __init__ and pass the spec_set attribute manually each time.
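For comparison, this is a rough sketch of what that alternative would look like without the mixin (a hypothetical class, shown only to illustrate the extra boilerplate):

import socket
from unittest.mock import Mock


class ManualSocketMock(Mock):
    # Without CustomMockMixin, every custom mock class has to wire up its own
    # spec_set by overriding __init__ and forwarding it to Mock manually.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, spec_set=socket.socket, **kwargs)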
Over time, more helpful features might be added to this class, and so it's advised to always inherit from it whenever making a mock object, unless you have a good reason not to.
Even though mocking is a great way to let us use fake objects acting as real ones, without patching we can only pass mocks in as arguments. That greatly limits what we can test, as some functions may call/reference the external resources we'd like to mock directly inside of them, without those being overridable through arguments.
Cases like these are when patching comes into the picture. Basically, patching means (usually temporarily) replacing some built-in or external object with a mock, or with some other object that we can control from the tests.
A good example is the open function for reading/writing files. We likely don't want any actual files to be written in the tests, however we might need to test a function that writes these files, and perhaps check that the content written matches some pattern, ensuring that it works properly.
While there is some built-in support for patching in the unittest.mock library, we generally use pytest's monkeypatching, as it can act as a fixture and integrates well with the rest of our test codebase, which is written with pytest in mind.
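As a minimal sketch of what this can look like (the read_token function below is made up for this example and is not part of mcproto):

import io
import json

import pytest


def read_token(path: str) -> str:
    """Example function under test: read a token from a JSON config file."""
    with open(path, "r") as f:
        return json.load(f)["token"]


def test_read_token(monkeypatch: pytest.MonkeyPatch):
    # Replace the built-in open with a fake that returns an in-memory file,
    # so no real file is ever touched. The patch is automatically undone
    # once the test finishes.
    fake_file = io.StringIO('{"token": "secret"}')
    monkeypatch.setattr("builtins.open", lambda *args, **kwargs: fake_file)

    assert read_token("config.json") == "secret"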
Finally, there are some considerations to make when writing tests, both for writing tests in general and for writing tests for our project in particular.
Having test coverage is a good starting point for unit testing: If a part of your code was not covered by a test, we know that we have not tested it properly. The reverse is unfortunately not true: Even if the code we are testing has 100% branch coverage, it does not mean it's fully tested or guaranteed to work.
One problem is that 100% branch coverage may be misleading if we haven't tested our code against all the realistic input it may get in production. For instance, take a look at the following format_join_time function and the test we've written for it:
# Source file:
from datetime import datetime
from typing import Optional


def format_join_time(name: str, time: Optional[datetime] = None) -> str:
    str_time = time.strfptime("%d-%m-%Y") if time else "unknown"
    return f"User {name!r} has joined, time: {str_time}"


# Test file:
from source_file import format_join_time


def test_format_join_time():
    res = format_join_time("ItsDrike", None)
    assert res == "User 'ItsDrike' has joined, time: unknown"
If you were to run this test, the function would pass it, and branch coverage would show 100% coverage for this function. Can you spot the bug the test suite did not catch?
The problem here is that we have only tested our function with a time that was None. That means that time.strfptime("%d-%m-%Y") was never executed during our test, leading to us missing the spelling mistake in strfptime (it should be strftime).
Adding another test would not increase the test coverage we have, but it does ensure that we'll notice that this function can fail with realistic data:
from datetime import datetime


def test_format_join_time_with_non_none_time():
    res = format_join_time("ItsDrike", datetime(2022, 12, 31))
    assert res == "User 'ItsDrike' has joined, time: 2022-12-31"
Leading to the test catching our bug:
collected 2 items
run-last-failure: rerun previous 1 failure first

tests/test_foo.py::test_format_join_time_with_non_none_time FAILED    [ 50%]
tests/test_foo.py::test_format_join_time PASSED                       [100%]

=============================================== FAILURES ===============================================
_______________________________ test_format_join_time_with_non_none_time _______________________________

    def test_format_join_time_with_non_none_time():
>       res = format_join_time("ItsDrike", datetime(2022, 12, 31))

tests/test_foo.py:11:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

name = 'ItsDrike', time = datetime.datetime(2022, 12, 31, 0, 0)

    def format_join_time(name: str, time: Optional[datetime] = None) -> str:
>       str_time = time.strfptime("%d-%m-%Y") if time else "unknown"
E       AttributeError: 'datetime.datetime' object has no attribute 'strfptime'. Did you mean: 'strftime'?

mcproto/foo.py:5: AttributeError
======================================= short test summary info ========================================
FAILED tests/test_foo.py::test_format_join_time_with_non_none_time - AttributeError: 'datetime.datetime'
object has no attribute 'strfptime'. Did you mean: 'strftime'?
================================ 1 failed, 1 passed, 2 warnings in 0.02s ================================
What's more, even if the spelling mistake hadn't been there, the first test did not check whether the format_join_time function formats the join time according to the output we actually want to see.
All in all, it's not only important to consider if all statements or branches were touched at least once with a test, but also if they are extensively tested in all situations that may happen in production.
Another restriction of unit testing is that it tests, well, in units. Even if we can guarantee that the units work as they should independently, we have no guarantee that they will actually work well together. Even more, while the mocking described above gives us a lot of flexibility in factoring out external code, we are working under the implicit assumption that we fully understand those external parts and are utilizing them correctly. What if our mocked socket object works with a send method, but it got changed to a send_message method in a recent update? Our tests could be passing while the code they're testing still doesn't work in production.
The answer to this is that we also need to make sure that the individual parts come together into a working application. Since we currently have no automated integration tests or functional tests, that means it's still very important to test out the code you've written manually in addition to the unit tests you've written.
- Quick guide on using mocks in official python docs
- Ned Batchelder's PyCon talk: Getting Started Testing
- Corey Schafer video about unittest
- RealPython tutorial on unittest testing
- RealPython tutorial on mocking
This document was heavily inspired by python-discord's tests README