Srikanth Sastry | A Techie in Boston

Mocks, Stubs, and how to use them

Photo by Polina Kovaleva from Pexels

Test doubles are the standard mechanism for isolating your System-Under-Test (SUT) from external dependencies in unit tests. Unsurprisingly, it is important to use the right test double for each use case if you want a maintainable and robust test suite. However, I have seen a lot of misuse of test doubles, and I have suffered through the consequences often enough to want to write down some (admittedly subjective) guidelines on when and how to use them.

Briefly, test doubles are replacements for a production object used for testing. Depending on who you ask, there are multiple different categorizations of test doubles; but two categories that appear in all of them are mocks and stubs, so I will focus on these two. I have often seen mocks and stubs conflated, and the problem is made worse by the test-double frameworks’ terminology: they are often referred to as ‘mocking’ frameworks, and the test doubles they generate are all called ‘mocks’.

Mocks

woman wearing an emoji mask

Image by Andii Samperio from Pixabay

Mocks are objects used to verify ‘outbound’ interactions of the SUT with external dependencies. This is different from the notion of ‘mocks’ that ‘mocking frameworks’ generate; those ‘mocks’ are really the broader family of test doubles. Examples where mocks are useful include the SUT logging to a log server, sending an email, or filing a task/ticket in response to a given input or user journey. This becomes clearer with an illustration.

import unittest
from unittest.mock import MagicMock

class TestSUT(unittest.TestCase):
    def test_log_success(self) -> None:
        mock_log_server = MagicMock(spec=LogServerClass)
        mock_log_server.log = MagicMock(return_value=True)
        sut = SUT(log_server=mock_log_server)

        sut.test_method(input="foo")

        # This is ok!
        mock_log_server.log.assert_called_once_with(message="foo")

Note that in the above illustration, we verify that the message is sent to the log server exactly once. This is an important part of the SUT’s specification. If the SUT were to start logging multiple messages/records for the request, it could pollute the logs or even overwhelm the log server. Here, even though logging appears to be a side effect of test_method, this side effect is almost certainly part of the SUT’s specification, and needs to be verified correctly. Mocks play a central role in such verifications.

Stubs

Robot imitating family

Unlike mocks, stubs stand in for ‘inbound’ interactions from external dependencies to the SUT. Stubs are useful for replacing external dependencies that ‘send’ data to the SUT, data the SUT needs in order to satisfy its specification. Examples include key value stores, databases, event listeners, etc. The important note here is that the outbound interaction to the stub should not be asserted in the tests; that’s an anti-pattern (it results in over-specification)! Here is an illustration.

import unittest
from unittest.mock import MagicMock

class TestSUT(unittest.TestCase):
    def test_email_retrieval(self) -> None:
        stub_key_value_store = MagicMock(spec=KeyValueStoreClass)
        stub_key_value_store.get = MagicMock(return_value="user@special_domain.com")
        sut = SUT(key_value_store=stub_key_value_store)

        email_domain = sut.get_user_email_domain(username="foo")

        # This is ok!
        self.assertEqual("special_domain.com", email_domain)

        # THIS IS NOT OK!
        stub_key_value_store.get.assert_called_once_with(username="foo")

In the above illustration, we create a stub for the key value store (note that this is a stub even though the object is a ‘mock’ class) that returns "user@special_domain.com" as a canned response to a get call. The test verifies that when the SUT’s get_user_email_domain is called, it returns the correct email domain. What is important here is that we should not assert that there was a get call to the stub. Why? Because the call to the key value store is an implementation detail. Imagine a refactor that causes a previously fetched value to be cached locally. If the unit tests were to assert on calls to the stubs, then such refactors would result in unit test failures, which undermines the utility, maintainability, and robustness of unit tests.
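To make that concrete, here is a minimal sketch of the hazard, reusing the illustrative SUT, KeyValueStoreClass, and get_user_email_domain names from above (none of this is a real API): the behavioral assertions are indifferent to a caching refactor, but the call-count assertion is not.

import unittest
from unittest.mock import MagicMock

class TestSUTRefactorHazard(unittest.TestCase):
    def test_email_retrieval_twice(self) -> None:
        stub_key_value_store = MagicMock(spec=KeyValueStoreClass)
        stub_key_value_store.get = MagicMock(return_value="user@special_domain.com")
        sut = SUT(key_value_store=stub_key_value_store)

        # The same user journey, exercised twice.
        self.assertEqual("special_domain.com", sut.get_user_email_domain(username="foo"))
        self.assertEqual("special_domain.com", sut.get_user_email_domain(username="foo"))

        # Over-specified: whether this passes depends entirely on whether the
        # SUT caches the value between calls. Adding or removing a cache flips
        # this assertion while the behavioral assertions above keep passing.
        stub_key_value_store.get.assert_called_once_with(username="foo")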

Fakes, instead of stubs

A small detour here. When using a stub, always consider whether you can use a fake instead. There are multiple definitions of a fake; the one I am referring to is the following: a fake is a special kind of stub that implements the same API as the production dependency, but with a much more lightweight implementation. This implementation may be correct only within the context of the unit tests where it is used. Let’s reuse the previous illustration of using a stub, and replace the stub with a fake. Recall that we stubbed out the get method of KeyValueStoreClass to return the canned value "user@special_domain.com". Instead, we can implement a fake KeyValueStoreClass that uses a Dict, as follows.

import unittest
from unittest.mock import MagicMock
from typing import Dict

# We assume a simplistic API for KeyValueStoreClass with just
# update and get methods.
class KeyValueStoreClass:
    def update(self, k: str, v: str) -> None:
        ...
    def get(self, k: str) -> str:
        ...

class FakeKeyValueStoreClassImpl:
    def __init__(self):
        self.kvs: Dict[str, str] = {}

    def update(self, k: str, v: str) -> None:
        self.kvs[k] = v

    def get(self, k: str) -> str:
        return self.kvs[k]


class TestSUT(unittest.TestCase):
    def test_email_retrieval(self) -> None:
        FakeKeyValueStoreClass = MagicMock(return_value=FakeKeyValueStoreClassImpl())
        fake_key_value_store = FakeKeyValueStoreClass()
        fake_key_value_store.update(k="foo", v="user@special_domain.com")
        sut = SUT(key_value_store=fake_key_value_store)

        email_domain = sut.get_user_email_domain(username="foo")

        self.assertEqual("special_domain.com", email_domain)

The advantage of using a fake is that the test becomes much more robust and more resistant to refactoring. It also becomes more extensible. When using a stub, if we wanted to test a different user journey, we would need to inject a new return value for the KeyValueStoreClass.get method. We could do this in one of two ways: (1) reset the mock, which is a bad anti-pattern, or (2) initialize the stub to return a preconfigured list of canned values, in order, which makes the test more brittle (consider what happens if the SUT chooses to call get for the same key twice vs. calling get for different keys once each). Using a fake sidesteps these issues.
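Here is a sketch of that extensibility, reusing the hypothetical FakeKeyValueStoreClassImpl from the listing above: a second user journey only needs another entry in the fake, with no mock resetting and no ordered list of canned values.

import unittest

class TestSUTOtherJourneys(unittest.TestCase):
    def test_email_retrieval_for_two_users(self) -> None:
        fake_key_value_store = FakeKeyValueStoreClassImpl()
        fake_key_value_store.update(k="foo", v="user@special_domain.com")
        fake_key_value_store.update(k="bar", v="admin@other_domain.org")
        sut = SUT(key_value_store=fake_key_value_store)

        # The fake answers both lookups consistently, however many times and in
        # whatever order the SUT chooses to call get().
        self.assertEqual("special_domain.com", sut.get_user_email_domain(username="foo"))
        self.assertEqual("other_domain.org", sut.get_user_email_domain(username="bar"))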

But my dependency has both inbound and outbound interactions!

Photograph of man double exposure

Despite all your efforts to separate out the test cases that need stubs and the ones that need mocks, you will inevitably find yourself needing to test a scenario in which you need to verify both inbound and outbound interactions with an external dependency. How do we address that?

First, if you need to assert on the outbound interaction of the same call that is stubbed, then you really don’t need that test: just use a stub/fake and do not assert on the outbound interaction. The only legitimate case for verifying both inbound and outbound interactions is when they are on distinct APIs of the same dependency. For example, the SUT could be reading from a file, and you need to test that (1) the contents of the file were read correctly, and (2) the file object was closed after the file was read. In this case, it is perfectly ok to stub the file read method while mocking the close method. Here is an illustration.

import unittest
from unittest.mock import MagicMock, patch

class TestSUT(unittest.TestCase):
    def test_file_read(self) -> None:
        file_mock_stub_combo = MagicMock()
        # Using this as a stub by injecting canned contents of the file
        file_mock_stub_combo.__iter__.return_value = ["1234"]

        # Next, we treat the file open call as a mock.
        with patch("builtins.open",
                   return_value=file_mock_stub_combo,
                   create=True
                  ) as mock_file:
            sut = SUT(filename="foo")
            file_contents = sut.get_contents()

            # Assertions on call to file open.
            # Treating the 'open' call as a mock.
            mock_file.assert_called_once_with("foo")

            # Assertion on the contents returned.
            # Treating the `read` as a stub.
            self.assertEqual("1234", file_contents)

            # Assertion on the outbound interaction of file close.
            # Treating the 'close' call as a mock.
            file_mock_stub_combo.close.assert_called_once()

DRY unit tests are bad... mkay

DRY

“Don’t Repeat Yourself” (DRY) is arguably one of the most important principles in software engineering, and it is considered a truism by many. A consequence of such dogmatic allegiance to DRYness is that we see a lot of DRY unit tests; this is where the utility of the DRY principle breaks down and starts causing more problems than it solves.

So, what’s wrong with DRY Unit Tests?

Presumably, we are all convinced of the benefits of DRYing out your code (interested readers can go to the Wikipedia page). It does have some downsides, which is why we have the notions of the DAMP/MOIST/AHA principles. Interestingly, the reasons why DRYness doesn’t always work out in production code are different from the reasons why it is a bad idea to write DRY unit tests. I see five ways in which (a) test code is different from production code, and (b) those differences contribute to why test code should not be DRY.

  1. Tests (conceptually) do not yield well to common abstractions.
  2. Test code’s readability always takes precedence over performance, but not so for production code.
  3. Production code enjoys the safety net of test code, but test code has no such backstop.
  4. DRY production code can speed up developer velocity, but DRY test code hinders developer velocity.
  5. Complex changes to production code can be reviewed faster with pure green/pure red test code changes, but complex changes to test code cannot be reviewed easily.

Let’s explore each one in more detail.

DRYness and Abstraction

In practice, DRYing out code results in building abstractions that fold a collection of semantically identical operations into a common procedure. If done prematurely, DRYing can result in poorer software; in fact, premature DRYing is the motivation for advocating the AHA principle. While that argument against DRYness works well in production code, it does not apply to test code.

Test code is often a collection of procedures, where each procedure steps the System-Under-Test (SUT) through a distinct user journey and compares the SUT’s behavior against pre-defined expectations. Thus, almost by design, test code does not lend itself to semantically similar abstractions. The mistake that I have seen software engineers make is to confuse syntactic similarity with semantic similarity. Just because the tests’ ‘Arrange’ sections look similar does not mean that they are doing semantically the same thing in both places; in fact, they are almost certainly doing semantically different things, because otherwise the tests would be duplicates of each other!

By DRYing out such test code, you are effectively forcing abstractions where none exist, and that leads to the same issues that DRYness leads to in production code (See [1], [2], [3], [4] for examples).

Readability

Most code is read more often than it is written/edited. Unsurprisingly, it is important to favor code readability, even in production code. However, in production code, if readability comes at a steep cost in performance and/or efficiency, then it is common (and prudent) to favor performance over readability. Test code, on the other hand, is less subject to this (potential) tension between readability and performance. Yes, unit tests need to be ‘fast’, but given the minuscule amount of data/inputs that unit tests process, speed is not an issue with hermetic unit tests. The upshot is that there is no practical drawback to keeping test code readable.

DRYing out test code directly affects its readability. Why? Remember that we read unit tests to understand the expected behavior of the system-under-test (SUT), and we do so in the context of a user journey. So, a readable unit test needs to explain the user journey it is executing, the role played by the SUT in realizing that user journey, and what a successful user journey looks like. This is reflected in the Arrange-Act-Assert structure of the unit test. When you DRY out your unit tests, you are also obfuscating at least one of those sections in your unit test. This is better illustrated with an example.

A common form of DRYing in unit tests that I have seen looks as follows:

import typing
import unittest

from parameterized import parameterized

class TestInput(typing.NamedTuple):
    param1: str
    param2: typing.Optional[int]
    ...

class TestOutput(typing.NamedTuple):
    status: SomeEnum
    return_value: typing.Optional[int]
    exception: typing.Optional[Exception]
    ...

class TestCase(typing.NamedTuple):
    input: TestInput
    expected_output: TestOutput


class TestSequence(unittest.TestCase):

    @parameterized.expand([
        [test_input1, expected_output1],
        [test_input2, expected_output2],
        ...
    ])
    def test_somethings(self, test_input: TestInput, expected_output: TestOutput) -> None:
        self._run_test(test_input, expected_output)

    def _run_test(self, test_input: TestInput, expected_output: TestOutput) -> None:
        sut = SUT(...)
        prepare_sut_for_tests(sut, test_input)
        output = sut.do_something(test_input.param2)
        test_output = make_test_output(output, sut)
        self.assertEqual(expected_output, test_output)

On the face of it, this looks like well-organized, DRY code. But for someone reading this test to understand what the SUT does, it is very challenging. They have no idea why this set of test_inputs was chosen, what the material differences among the inputs are, which user journeys each of those test cases represents, what preconditions need to be satisfied before running sut.do_something(), why the expected output is the specified output, and so on.

Instead, consider a non-DRY alternative.

import unittest

class TestSequence(unittest.TestCase):

    def test_foo_input_under_bar_condition(self):
        """
        This test verifies that when condition bar is true, then calling `do_something()`
        with input foo results in sigma behavior
        """
        sut = SUT()
        ensure_precondition_bar(sut, param1=bar1, param2=bar2)
        output = sut.do_something(foo)
        self.assertEqual(output, sigma)

This code tests one user journey and is readable at a glance by someone who does not have an in-depth understanding of the SUT. We can similarly define all the other test cases, with some code duplication but greater readability, and with negligible negative impact.
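For instance, a second user journey simply gets its own, equally readable test. The names baz, qux, tau, and ensure_precondition_qux below are placeholders in the same spirit as the listing above, not anything previously defined.

class TestSequence(unittest.TestCase):
    ...

    def test_baz_input_under_qux_condition(self):
        """
        This test verifies that when condition qux is true, then calling `do_something()`
        with input baz results in tau behavior
        """
        sut = SUT()
        ensure_precondition_qux(sut, param1=qux1)
        output = sut.do_something(baz)
        self.assertEqual(output, tau)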

Who watches the watchmen?

colink., CC BY-SA 2.0 <https://creativecommons.org/licenses/by-sa/2.0> [Original image posted to Flickr by colink. License: Creative Commons ShareAlike]

Production code has the luxury of being fine-tuned, optimized, DRY’d out, and subjected to all sorts of gymnastics, mostly because production code is defended by tests and test code. For instance, if, to improve performance, you replaced a copy with a reference and accidentally mutated that reference inside a function, you have a unit test that can catch such unintended mutations. However, test code has no such backstop. If you introduce a bug in your test code, then only a careful inspection by a human will catch it. The upshot is the following: the less simple/obvious the test code is, the more likely it is that a bug in that test code will go undetected, at least for a while. If a buggy test passes, it may well be hiding an undetected bug in your production code; conversely, if a test fails, it might just denote a bug in the test code. Either way, you lose confidence in your test suite, and nothing good can come from that.
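To illustrate the backstop, here is a minimal, self-contained sketch; normalize and its test are hypothetical names invented for this example.

import unittest
from typing import List

def normalize(scores: List[int]) -> List[int]:
    # An 'optimized' version might drop this copy and sort in place,
    # accidentally mutating the caller's list.
    result = list(scores)
    result.sort()
    return result

class TestNormalize(unittest.TestCase):
    def test_does_not_mutate_input(self) -> None:
        scores = [3, 1, 2]
        normalize(scores)
        # This is the backstop: it fails if the copy is ever removed.
        self.assertEqual([3, 1, 2], scores)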

DRY code inevitably asks the reader to jump from one function to another and requires them to keep the previous context in mind while navigating these functions. In other words, it increases the cognitive burden on the reader compared to straight-line duplicated code. That makes it difficult to verify the correctness of the test code quickly and easily. So, when you DRY out your test code, you are increasing the odds that bugs creep into your test suite and that developers lose confidence in the tests, which in turn significantly reduces the utility of your tests.

Developer Velocity

Woman developer

Recall from the previous section that while tests might have duplicate code, they do not actually represent semantic abstractions replicated in multiple places. If you do mistake them for common semantic abstractions and DRY them out, then eventually there will be an addition to the production code whose tests break this assumed abstraction. At that point, the developer adding the feature will run into trouble modifying the existing test code to add the new test case. For instance, consider a class that is hermetic, stateless, and does not throw exceptions. It would not be surprising to organize DRY tests for this class that assume exceptions are never thrown. Now a new feature is added to this class that requires an external dependency, and the class can now throw exceptions. Adding a new test case to the DRY’d out unit test suite will not be easy or straightforward. The sunk cost fallacy associated with the existing test framework makes it more likely that the developer will try to force-fit the new test case(s) into the existing framework (see the sketch after the list below). As a result:

  1. It slows the developer down because they now have to grok the existing test framework, think of ways in which to extend it for a use case that it was not designed for, and make those changes without breaking existing tests.
  2. Thanks to poor abstractions, you have now incurred more technical debt in your test code.
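Here is a sketch of what that force-fit tends to look like, using the hypothetical TestInput/TestOutput/SomeEnum names from the DRY listing earlier (SomeEnum.ERROR is invented for this sketch): the shared runner grows branching that every existing test case now flows through.

    def _run_test(self, test_input: TestInput, expected_output: TestOutput) -> None:
        sut = SUT(...)
        prepare_sut_for_tests(sut, test_input)
        try:
            output = sut.do_something(test_input.param2)
            test_output = make_test_output(output, sut)
        except Exception as e:
            # New branch, needed only by the new test cases, but now part of
            # the path that every existing test case runs through.
            test_output = TestOutput(status=SomeEnum.ERROR, return_value=None, exception=e)
        self.assertEqual(expected_output, test_output)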

Code Reviews

Developers doing code reviews

DRY’d out tests not only impede developer velocity, they also make it harder to review code/diffs/pull requests. This is a second-order effect of DRYing out your test code. Let’s revisit the example where we are adding a new feature to an existing piece of code, and this is a pure addition of behavior (not a modification of existing behavior). If the tests were not DRY’d out, then adding tests for this new feature would involve just adding new test cases, and thus just green lines in the generated diff. In contrast, recall from the previous subsection that adding tests to DRY test code will likely involve modifying existing code and then adding new test cases. In the former case, reviewing the tests is much easier, and as a result, reviewing that the new feature behaves correctly is also that much easier. Reviewing the diff in the latter case is cognitively more taxing: not only does the reviewer need to verify that the new feature is implemented correctly, they also have to verify that the changes to the test code are correct and do not introduce new holes for bugs to escape testing. This can significantly slow down code reviews in two ways: (1) it requires more time to review the code, and (2) because it takes longer to review the code, reviewers are more likely to delay even starting the review.

Do not index on test coverage metrics

Coverage Chart

We live in a data driven world, and as the saying goes “[…] What is not measured, cannot be improved. […]”

What is not defined cannot be measured. What is not measured, cannot be improved. What is not improved, is always degraded.

– William Thomson Kelvin

The temptation, therefore, is to measure everything, even the quality of your unit tests, and that is where the trouble usually begins. For a detailed explanation of why indexing on test coverage metrics is a bad idea, I highly recommend Jason Rudolph’s collection of posts on this topic here. To drive home the point more explicitly (and motivate you to actually go read Jason’s posts), here are some illustrative examples.

There are many coverage metrics including function coverage, statement coverage, line coverage, branch coverage, condition coverage, etc. Here, we will only look at line coverage and branch coverage, because those are the most popular.

Line coverage

Let’s start with line coverage, which is the number of lines of code executed by tests divided by the total number of lines of code. The most common target for the line coverage metric is 80%; that is, 80% of your code should be executed by your tests. While that might seem like a good idea, indexing on this metric can actually take you away from good test coverage! How? Consider the following (contrived) example.

import unittest

def has_three_digits(value: int) -> bool:
    strlen = len(str(value))
    if strlen == 3:
        return True
    return False

class TestHasThreeDigits(unittest.TestCase):
    def test_has_three_digits_234(self) -> None:
        output_value = has_three_digits(234)
        self.assertTrue(output_value)

Clearly, TestHasThreeDigits is inadequate as a test suite for has_three_digits: it tests only the True case and misses the False case completely! The line coverage of the test suite is 3/4 = 75%. You could say that the test coverage is less than 80%, and therefore not adequate. Here, it appears that the line coverage metric does indeed point to inadequate testing. However, this confidence in the metric is severely misplaced! Consider the following refactoring of has_three_digits.

def has_three_digits(value: int) -> bool:
    value_as_str = str(value)
    strlen = len(value_as_str)
    if strlen == 3:
        return True
    return False

Now TestHasThreeDigits’ line coverage magically improves to 4/5 = 80%, and per the 80% target, the metric seems to suggest adequate coverage! In fact, you can play this game some more and refactor has_three_digits to

def has_three_digits(value: int) -> bool:
    value_as_str = str(value)
    strlen = len(value_as_str)
    return (strlen == 3)

Now, with the same test suite, TestHasThreeDigits has 100% line coverage! Recall that semantically the tests still do the same thing; they still test only the True case and ignore the False case completely.

Branch coverage

An easy retort to the above example is that line coverage is not a sufficiently nuanced metric, and what you really need is branch coverage, which is the number of branches executed by the tests divided by the total number of branches in the code.

Looking at the branch coverage of TestHasThreeDigits, we can see that it has 50% branch coverage, which is inadequate. Well, here’s an easy way to improve that.

class TestHasThreeDigits(unittest.TestCase):
    def test_has_three_digits_true(self) -> None:
        true_output_value = has_three_digits(234)

    def test_has_three_digits_false(self) -> None:
        false_output_value = has_three_digits(23)

See, now the test suite has 100% branch coverage! However, note that it has no assertions at all. So, despite having 100% line and branch coverage, this test suite is completely useless! (This is a form of the incidental coverage anti-pattern.)

Here is a more nuanced example:

class HasThreeDigits:
    def __init__(self) -> None:
        self.counter = 0

    def test(self, x: int) -> bool:
        self.counter += 1
        return (len(str(x)) == 3)


class TestHasThreeDigits(unittest.TestCase):
    def test_has_three_digits_234(self) -> None:
        output_value = HasThreeDigits().test(234)
        self.assertTrue(output_value)

    def test_has_three_digits_23(self) -> None:
        output_value = HasThreeDigits().test(23)
        self.assertFalse(output_value)

The line coverage is 100%, and the branch coverage is 100%. But self.counter is never verified!
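The verification that the coverage metrics never asked for would look something like this (a sketch against the HasThreeDigits class above):

import unittest

class TestHasThreeDigitsCounter(unittest.TestCase):
    def test_counter_increments_per_call(self) -> None:
        sut = HasThreeDigits()
        sut.test(234)
        sut.test(23)
        # The side effect that 100% line and branch coverage never demanded.
        self.assertEqual(2, sut.counter)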

Wait, there’s more!

Coverage metrics only consider the code that is in your project, and ignore all external libraries. However, your code is correct only if it satisfies the preconditions of your external library calls, and test coverage metrics do not capture any of that. Here is an illustration with a contrived example.

from external.lib import convert_to_num

def has_three_digits(value: str) -> bool:
    v = convert_to_num(value)
    return v is not None and v > -1000 and v < 1000

The above code is expected to return True if value is an integer with 3 digits. Here is a test suite.

import unittest

class TestHasThreeDigits(unittest.TestCase):
    def test_has_three_digits_234(self) -> None:
        output_value = has_three_digits('234')
        self.assertTrue(output_value)

    def test_has_three_digits_23(self) -> None:
        output_value = has_three_digits('23')
        self.assertFalse(output_value)

    def test_has_three_digits_minus_23(self) -> None:
        output_value = has_three_digits('-23')
        self.assertFalse(output_value)

    def test_has_three_digits_minus_234(self) -> None:
        output_value = has_three_digits('-234')
        self.assertTrue(output_value)

    def test_has_three_digits_ten_times_ten(self) -> None:
        output_value = has_three_digits('10*10')
        self.assertTrue(output_value)

The test suite looks reasonable. Your line and branch coverage is 100%, and nothing in the metrics suggests that anything is amiss. Except that we have said nothing about how convert_to_num is implemented. It is easy to imagine some preconditions for the input to convert_to_num; for instance, it might throw a ValueError exception if you pass in an input of the form 3/0. Now you can see how the test suite is not adequate (has_three_digits('10/0') will throw an exception), but your test coverage metrics will never be able to help here.
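The missing test case would look something like the sketch below, assuming (as the text above does) that convert_to_num raises ValueError for inputs like '10/0'; no coverage metric will ever ask for it.

class TestHasThreeDigitsPreconditions(unittest.TestCase):
    def test_has_three_digits_division_by_zero(self) -> None:
        # As written, has_three_digits lets the ValueError escape; this test
        # pins down that behavior (or whatever behavior the author decides on).
        with self.assertRaises(ValueError):
            has_three_digits('10/0')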

Beware of using patch.object to test your Python code

Software Testing

The Liskov substitution principle states that a class and its subclasses must be interchangeable without breaking the program. Unfortunately, Python’s patch.object breaks this principle in a big way. In fact, it can make your tests untrustworthy and turn them into a maintenance headache, with failures every time you extend your base class. Here is a contrived, but concrete, example.

Say, you decide to build a special class called ImmutableList with a factory that looks as follows:

from typing import List, Sequence

class ImmutableList:
  @staticmethod
  def create_list(input: List[int]) -> "ImmutableList":
    return ImmutableList(input)

  def __init__(self, input: List[int]) -> None:
    self._inner_list = tuple(input)

  def get_inner_list(self) -> Sequence[int]:
    return self._inner_list

Next, your system under test is a class SUT that uses an instance of ImmutableList as an injected dependency.

class SUT:
  def __init__(self, wrapper: ImmutableList) -> None:
    self.wrapper = wrapper

  def get_wrapper_length(self) -> int:
    return len(self.wrapper.get_inner_list())

Now, when testing SUT, say, we patch the get_inner_list() method with patch.object:

from unittest import mock

with mock.patch.object(ImmutableList, 'get_inner_list', return_value=[1, 2, 3]) as mock_method:
  sut = SUT(ImmutableList.create_list([]))
  assert sut.get_wrapper_length() == 3, "FAILURE"
  print("SUCCESS")

When you run this test, it does print SUCCESS, and therefore, works as intended.

Now, let’s say that we found a special case of ImmutableList that corresponds to a zero length list, and we implement it as follows:

class ZeroLengthImmutableList(ImmutableList):
  def __init__(self):
    super().__init__([])
  
  def get_inner_list(self) -> Sequence[int]:
    return tuple()

Next, we modify the factory method to return this ZeroLengthImmutableList, when the input is an empty list, as follows:

  @staticmethod
  def create_list(input: List[int]) -> "ImmutableList":
    if len(input) == 0:
      return ZeroLengthImmutableList()
    return ImmutableList(input)

Thus, the two classes look as follows:

class ImmutableList:
  @staticmethod
  def create_list(input: List[int]) -> "ImmutableList":
    if len(input) == 0:
      return ZeroLengthImmutableList()
    return ImmutableList(input)

  def __init__(self, input: List[int]) -> None:
    self._inner_list = tuple(input)

  def get_inner_list(self) -> Sequence[int]:
    return self._inner_list

class ZeroLengthImmutableList(ImmutableList):
  def __init__(self):
    super().__init__([])
  
  def get_inner_list(self) -> Sequence[int]:
    return tuple()

Now, let’s go back to our test, which is still

from unittest import mock

with mock.patch.object(ImmutableList, 'get_inner_list', return_value=[1, 2, 3]) as mock_method:
  sut = SUT(ImmutableList.create_list([]))
  assert sut.get_wrapper_length() == 3, "FAILURE"
  print("SUCCESS")

Since sut.wrapper is still an ImmutableList, by the Liskov Substitution Principle, mock.patch.object(ImmutableList, 'get_inner_list', return_value=[1, 2, 3]) should still return [1, 2, 3] when sut.get_wrapper_length() is called. However, this does not happen! The above test fails with

AssertionError                            Traceback (most recent call last)

<ipython-input-21-1c1b12b89ff3> in <module>()
     23 with mock.patch.object(ImmutableList, 'get_inner_list', return_value=[1, 2, 3]) as mock_method:
     24   sut = SUT(ImmutableList.create_list([]))
---> 25   assert sut.get_wrapper_length() == 3, "FAILURE"
     26   print("SUCCESS")

AssertionError: FAILURE

This happens because patch.object replaces get_inner_list only on ImmutableList itself; ZeroLengthImmutableList’s override takes precedence, so the patched return value is never used. This forces you to change the tests every time you refactor ImmutableList.create_list to return a ‘better’ implementation of ImmutableList!
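Two sketches of more refactor-resistant alternatives, reusing the classes defined above: skip patching altogether when the real object is cheap enough, or, if you must stub, patch the specific instance rather than the class.

from unittest import mock

# Option 1: no patching; ImmutableList is already lightweight enough to use directly.
sut = SUT(ImmutableList.create_list([1, 2, 3]))
assert sut.get_wrapper_length() == 3

# Option 2: patch the instance, not the class. The instance attribute shadows
# whichever get_inner_list override the factory-chosen subclass defines.
wrapper = ImmutableList.create_list([])
with mock.patch.object(wrapper, 'get_inner_list', return_value=[1, 2, 3]):
    sut = SUT(wrapper)
    assert sut.get_wrapper_length() == 3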

When I first realized my privilege

Oppressed

Ironically, a sure marker of privilege is not realizing your own privilege. I grew up being told that I am a Brahmin, which is the highest caste, and that it makes us superior and better than others. Unsurprisingly, I was taught that we were, in fact, an oppressed minority. The government reservations for the so-called Scheduled Castes and Scheduled Tribes were often cited as evidence of such oppression. So, naturally, I grew up knowing nothing about the privilege that I enjoyed.

For the first 23 years of my life, I was convinced that everything that came my way was hard earned, and that it came despite the oppression against our community. At 23, I was working as a young software engineer in Bangalore. I needed the house deep cleaned, and a contractor said that he would send a couple of people over who would take care of it for us. I was about to get my first glimpse into the privilege that I had enjoyed all my life.

The two people that the contractor sent over were completely clueless. They had no idea how to go about the job. They showed up empty-handed and asked us what they should use to clean the house. They required constant supervision and direction, which consumed most of our time and defeated the purpose of hiring them in the first place. By the end of the day, less than half the work was done, and I was completely frustrated.

It is important to mention that these two folks’ work ethic was never in doubt. They worked as hard and diligently as anyone could be expected to. They were from a village some hours away, quite illiterate, and desperate for work. They hadn’t seen houses in cities before, and so had no idea what was involved in the upkeep of a proper house. They probably lived in small shanty houses, and all of this was alien to them.

Coming back to the main story: it was close to evening, and very little of the work was done. At this point, they said that they had to leave, because if they didn’t leave right away, they’d miss the last bus to their village and would have to walk home. So if we could pay them, they would be on their way (and settle the account with the contractor later).

I was pretty upset at this point, and I told them that they hadn’t completed even half the work, and so I would pay them only half. They just looked at each other and simply nodded at me. They had the look of people who have always been helpless and have resigned themselves to that fate for so long that they couldn’t imagine life any other way. I saw all of that, but my frustration overrode it, and I gave them just half the agreed-upon amount and sent them on their way.

As soon as they closed the gate behind them, I felt incredibly sorry for them. It wasn’t their fault that they were not skilled. It wasn’t their fault that the contractor sent them our way. And our paying them just half the amount simply meant that the contractor would take a larger cut of the money. Despite all of that, all they did was meekly nod. Next, I felt shame and guilt. Paying them the entire amount would not have made the slightest difference to my finances; I spent more going out with friends on a weekend evening.

All of this took about a minute to register, and I immediately called them back and gave them the full amount that was promised to them.

That look of helplessness and resignation stayed with me a long time, but I simply couldn’t understand it in its larger context. For a while, I saw this as a fault in their character that would get them swindled, and I almost felt good about not being one of the many who cheated them out of what they earned. Nonetheless, this event stayed with me, and as I learned more, I kept recasting that experience with new sociological lenses. It took me years to recognize it as a natural consequence of multi-generational societal oppression, and, with that, to recognize my own privilege.