Stephen Darlington

In Part I of this series, we looked at how to evaluate a new algorithm using BenchmarkDotNet. Benchmarking allowed us to test the execution time and memory usage of a new code path in isolation. Sometimes that’s not enough. Say you’re moving your database from an RDBMS to a NoSQL database or decomposing a monolith into a suite of microservices. Before going live, you need to find out how your new system handles production loads. There is nothing worse than flipping the switch and having your whole new system crash down around your ears. You also want to ensure that the results your new code (or system) returns match the results your old system was returning. But how do you do that? Your friendly neighborhood GitHub has a library that can help you achieve both of these goals: Scientist.

The goal of Scientist is to let you run an experiment without interfering with your production system. It allows you to execute both your current code AND a new code path for every request your system handles. With Scientist in play, your customers continue along their merry way while you collect data about your change. Be careful, though. Scientist runs things synchronously by default, so unless you explicitly run things asynchronously, your customers will end up waiting until both your original code and your refactor complete before they get their result.

GitHub initially wrote Scientist in Ruby, but there are ports to many popular languages. For our experiment today, we’re using the .NET port. Let’s jump into some code.

Running an Experiment

For demonstration purposes, we’re going to continue with our string manipulation example from Part I of this series. It’s a trivial example, but gives us some continuity. The first thing we’re going to do is refactor our code a bit. Rather than having two public methods, we’re going to have two private methods to perform our string operation. The public GiveMeAnA will call our classic implementation. When complete, our code will look like this:
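A minimal sketch of that shape, assuming hypothetical names (`StringDemo`, `ClassicA`, `RefactoredA`) and stand-in string operations. The real implementations come from Part I and may differ:

```csharp
using System;

public class StringDemo
{
    // Public entry point: for now it simply delegates to the classic implementation.
    public string GiveMeAnA()
    {
        return ClassicA();
    }

    // Hypothetical stand-in for the current implementation: repeated concatenation.
    private static string ClassicA()
    {
        var result = string.Empty;
        for (var i = 0; i < 5; i++)
        {
            result += "a";
        }
        return result;
    }

    // Hypothetical stand-in for the refactored implementation: one allocation.
    // Nothing calls it yet; the experiment will.
    private static string RefactoredA()
    {
        return new string('a', 5);
    }
}
```

Callers only ever see `GiveMeAnA()`, so the experiment can be added and later removed without touching any calling code.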

Now that our code is set up, it’s time to add the NuGet package: Install-Package Scientist. Next, we set up our experiment:
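A sketch of what such an experiment can look like, assuming the `Scientist.Science<T>` entry point from the Scientist NuGet package (namespace `GitHub`). The class name, the private method names, and their bodies are hypothetical stand-ins:

```csharp
using GitHub; // namespace exposed by the Scientist NuGet package

public class StringDemo
{
    public string GiveMeAnA()
    {
        // "gimme-a" is the experiment name that shows up in published results.
        return Scientist.Science<string>("gimme-a", experiment =>
        {
            // Control: the current code. Its result is what the caller receives.
            experiment.Use(() => ClassicA());

            // Candidate: the new code under evaluation. The name is optional.
            experiment.Try("refactored", () => RefactoredA());
        });
    }

    // Hypothetical stand-ins for the two implementations from Part I.
    private static string ClassicA() => "a" + "a" + "a" + "a" + "a";

    private static string RefactoredA() => new string('a', 5);
}
```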

The first line of our method sets up a new Science that returns a string. We’re giving it a name of “gimme-a” so we can find these results in our log. After that, we set up Use and Try blocks. The Use statement goes around your original code block. You can put your code directly in the lambda, but we extracted the functionality into another method in our example for cleanliness. The Try statement is for the code you want to evaluate. The name is optional, but very helpful if you want to run multiple experiments in this single block of code. Note: I’m running this synchronously. As mentioned above, you’ll want to do async in production code.
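For the asynchronous case, the .NET port offers a `ScienceAsync` entry point. The sketch below assumes that its `Use` and `Try` accept functions returning `Task<T>`; the class and helper names are hypothetical, and the helpers stand in for real awaitable work such as a database call:

```csharp
using System.Threading.Tasks;
using GitHub; // namespace exposed by the Scientist NuGet package

public class AsyncStringDemo
{
    public Task<string> GiveMeAnAAsync()
    {
        return Scientist.ScienceAsync<string>("gimme-a", experiment =>
        {
            experiment.Use(() => ClassicAAsync());                   // control
            experiment.Try("refactored", () => RefactoredAAsync());  // candidate
        });
    }

    // Hypothetical stand-ins for awaitable implementations.
    private static Task<string> ClassicAAsync() => Task.FromResult("aaaaa");

    private static Task<string> RefactoredAAsync() => Task.FromResult(new string('a', 5));
}
```

Check the library's documentation for how it schedules the control and candidate tasks before relying on any particular latency behavior.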

That’s all you need to do to set up an experiment.

Science performs the following actions when GiveMeAnA() is invoked:

- Decides whether or not to run the Try block
- Randomizes the order in which Use and Try blocks are run
- Measures the durations of all behaviors
- Compares the result of Try to the result of Use
- Swallows (but records) any exceptions raised in the Try block
- Publishes all this information
- Returns the result of the Use block to the caller
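To make those steps concrete, here is a deliberately simplified, self-contained model of the flow. It is not the library's implementation: `MiniExperiment`, its parameters, and the string-based publishing are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class MiniExperiment
{
    public static T Run<T>(string name, Func<T> use, Func<T> tryCandidate,
                           Func<bool> enabled, Action<string> publish)
    {
        // Decide whether to run the candidate at all.
        if (!enabled()) return use();

        T control = default(T), candidate = default(T);
        Exception candidateError = null;
        TimeSpan controlTime = TimeSpan.Zero, candidateTime = TimeSpan.Zero;

        Action runControl = () =>
        {
            var sw = Stopwatch.StartNew();   // measure the control's duration
            control = use();                 // control exceptions reach the caller
            controlTime = sw.Elapsed;
        };
        Action runCandidate = () =>
        {
            var sw = Stopwatch.StartNew();   // measure the candidate's duration
            try { candidate = tryCandidate(); }
            catch (Exception ex) { candidateError = ex; }  // swallow, but record
            candidateTime = sw.Elapsed;
        };

        // Randomize the order in which the two behaviors run.
        if (new Random().Next(2) == 0) { runControl(); runCandidate(); }
        else { runCandidate(); runControl(); }

        // Compare the results; a candidate that threw counts as a mismatch.
        bool matched = candidateError == null
            && EqualityComparer<T>.Default.Equals(control, candidate);

        // Publish what was observed.
        publish($"{name}: {(matched ? "MATCH" : "MISMATCH")} " +
                $"control={controlTime} candidate={candidateTime}");

        // The caller always gets the control's result.
        return control;
    }
}
```

Calling `MiniExperiment.Run("gimme-a", () => "a", () => throw new InvalidOperationException(), () => true, Console.WriteLine)` returns "a" and publishes a MISMATCH line, because the candidate's exception is recorded rather than propagated.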

That’s all well and good, but what good is it running experiments if you don’t examine the results? That’s where the results publisher comes in.

Checking Our Results

As all good scientists know, you need to evaluate the data that comes out of your experiments. Science publishes its results to memory by default, so you need to implement the IResultPublisher interface to write to a log file or database. The interface has a Publish method that gets invoked whenever a Science has completed its execution. This method takes a Result<T, TClean> result parameter, which contains the result of our experiment. Here’s a simple example that writes our results out to our default logger:

public Task Publish<T, TClean>(Result<T, TClean> result)
{
    _logger.LogInformation($"Publishing results for experiment '{result.ExperimentName}'");
    _logger.LogInformation($"Result: {(result.Matched ? "MATCH" : "MISMATCH")}");
    _logger.LogInformation($"Control value: {result.Control.Value}");
    _logger.LogInformation($"Control duration: {result.Control.Duration}");

    foreach (var observation in result.Candidates)
    {
        _logger.LogInformation($"Candidate name: {observation.Name}");
        _logger.LogInformation($"Candidate value: {observation.Value}");
        _logger.LogInformation($"Candidate duration: {observation.Duration}");
    }

    if (result.Mismatched)
    {
        // if the answers of our methods don't match, do something like log to db
    }

    return Task.FromResult(0);
}

As the name implies, result.Matched tells you whether or not the results of your experiment matched the results of your control. When they don’t match, you’ll more than likely want to log the result plus any additional data about the requests so you can examine the behavior before making the new code live. Scientist allows you to add as much context as you want to the experiment, so you can log inputs and user details or other information needed to reproduce the request.
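As a sketch of adding context, assuming the experiment exposes an `AddContext(key, value)` method as the Scientist.NET port does; the class name, the `userName` parameter, and the private helpers are hypothetical:

```csharp
using GitHub; // namespace exposed by the Scientist NuGet package

public class ContextDemo
{
    public string GiveMeAnA(string userName)
    {
        return Scientist.Science<string>("gimme-a", experiment =>
        {
            // Attach whatever is needed to reproduce this request later.
            experiment.AddContext("userName", userName);

            experiment.Use(() => ClassicA());
            experiment.Try("refactored", () => RefactoredA());
        });
    }

    // Hypothetical stand-ins for the two implementations from Part I.
    private static string ClassicA() => "aaaaa";

    private static string RefactoredA() => new string('a', 5);
}
```

Inside the publisher, the attached values should then be available on the result (assumed here to be a `Contexts` dictionary), so they can be logged next to a mismatch.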

Finally, wherever your setup code runs (like for dependency injection or environment setup), register your result publisher with Scientist: Scientist.ResultPublisher = new ExperimentPublisher();. With this data in hand, you can run both your current and new systems in parallel in production for as long as you want while you tweak the new system to go live. When you’re ready to flip the switch, all you need to do is remove the Science code and have your application execute the refactored code.


In Part I of this series, we looked at how to compare two pieces of code in isolation. In Part II of this series, we looked at how to run current and refactored code in production at the same time. Using Scientist allows you to validate your new solution both for correctness of results and for its ability to handle true production usage. By using both of these techniques, you can deploy large refactors into production with higher confidence that you haven’t broken anything.

Now go out there and refactor something!

The full code for this article’s example is available here.