unitizeR - Test Details

Understanding Tests

Test Outcomes

When unitize is run with a test file against an existing unitizer store, each test in the file is matched and compared to the corresponding test in the store. Here is a comprehensive list of possible outcomes:

  • New: a test present in the file is not in the store and needs to be reviewed to confirm it is correct.
  • Passed: the test matched the reference test in the store and need not be reviewed.
  • Failed: the evaluation of the test from the file differs from the one produced by the same expression in the store.
  • Deleted/Removed: a test present in the unitizer store no longer exists in the test file so you will be prompted to remove it from the store.
  • Corrupted/Error: an error occurred while attempting to compare the file and store tests; this should occur very rarely and is likely the result of using a custom comparison function to compare the tests (see unitizer_sect for more details on custom comparison functions). Because the comparison function itself failed, unitizer has no way of knowing whether the test passed or failed; you can think of it as an NA outcome.

When reviewing tests, unitizer will group them by test type, so you will review all new tests in one go, then all the failed tests, and so on. As a result, the order in which you review tests may not be the same as the order in which they appear in the test file.

What Constitutes a Test?

As noted previously, simple assignments are not considered tests. They are stored in the unitizer store, but you are not asked to review them, and their values are not compared to existing reference values prior to storage. The implicit assumption is that if there is an assignment, the intent is to use the resulting object in some later test, at which point any issues will crop up. Skipping assignment review saves some unnecessary user interaction.

You can force assignments to become tests by wrapping them in parentheses:

a <- my_fun(25)     # this is not a test
(a <- my_fun(42))   # this is a test

The actual rule unitizer uses is that an expression is not a test if it returns invisibly without signalling any conditions; everything else is a test. Wrapping parentheses around an expression that returns invisibly makes it visible, which is why assignments in parentheses become tests. Conversely, you can wrap an expression in invisible(...) to prevent it from being treated as a test, so long as it does not signal conditions.
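
For example, using the same hypothetical my_fun as in the snippet above:

my_fun(42)              # this is a test: the result is returned visibly
invisible(my_fun(42))   # this is not a test: the result is made invisible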

Recall that newly evaluated tests are matched to reference tests by deparsing the test expression. Some expressions such as strings with non-ASCII bytes (even in their escaped form) or numbers with long decimal tails will deparse differently on different systems, and thus may cause tests to fail to match. You can still use these by storing them in a variable, as the assignment step is not a test:

chr <- "hello\u044F" # this is not a test
fun_to_test(chr)     # this is a test

unitizer Test Components

The following aspects of a unitizer test are recorded for future comparison:

  • Value.
  • Conditions.
  • Screen (stdout) output.
  • Message (stderr) output.
  • Whether the expression issued an “abort” invokeRestart (e.g. whether stop was called in the expression).

Currently only the first two elements are actually compared when determining whether a test passes or fails. These two should capture almost everything you would care about from a unit testing perspective.

Screen output is omitted from comparison because it can vary substantially for reasons unrelated to source code changes (e.g. console display width). Screen output is also usually redundant with the value, since most of the time it is just the result of printing the return value of the expression. This will not be the case if the expression itself prints to stdout explicitly, or if it returns its value invisibly.
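
For example, with a made-up function that writes to the screen in addition to returning a value, the screen output and the value diverge:

noisy_add <- function(x, y) {
  cat("adding...\n")   # explicit stdout output, separate from the value
  x + y
}
noisy_add(1, 2)        # screen shows "adding..." followed by 3; the value is 3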

Message output is omitted because all typical mechanisms for producing stderr output also produce conditions with the messages embedded, so comparing the messages separately is usually superfluous. One exception would be an expression that uses cat to write to stderr directly.
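
For instance, the following writes to stderr without signalling any condition, so only the message stream would record it:

cat("a note for the user\n", file=stderr())   # stderr output, but no condition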

The “abort” invokeRestart is omitted because it is generally implied by the presence of an error condition, and actively monitoring it clutters the diagnostic messages produced by unitizer. It is recorded because it is possible to signal a “stop” condition without actually triggering the “abort” restart, so in some cases it can come in handy.
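
As a minimal illustration, the following signals an error condition, but since no handler aborts, the “abort” restart is never invoked and evaluation continues:

signalCondition(simpleError("an error condition"))  # condition signalled, no abort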

While we omit the last three components from comparison, this is just default behavior. You can change this by using the compare argument for unitizer_sect.

Sections

unitizer_sect

Often it is useful to group tests in sections for the sake of documentation and clarity. Here is a slightly modified version of the original demo file with sections:

unitizer_sect("Basic Tests", {
  library(unitizer.fastlm)
  x <- 1:10
  y <- x ^ 3
  res <- fastlm(x, y)

  get_slope(res)
})

unitizer_sect("Advanced Tests", {
  2 * get_slope(res) + get_intercept(res)
  get_rsq(res)
})

Now re-running unitizer segments everything by section (note, the first few lines are set-up):

(.unitizer.fastlm <- copy_fastlm_to_tmpdir())
update_fastlm(.unitizer.fastlm, version="0.1.2")
install.packages(.unitizer.fastlm, repos=NULL, type='source', quiet=TRUE)
unitize(file.path(.unitizer.fastlm, "tests", "unitizer", "unitizer.fastlm.R"))

+------------------------------------------------------------------------------+
| unitizer for: tests/unitizer/unitizer.fastlm.R                               |
+------------------------------------------------------------------------------+

                    Pass Fail  New
 1.    Basic Tests     -    -    1
 2. Advanced Tests     -    -    2
..................................
                       -    -    3

If there are tests that require reviewing, each section will be reviewed in turn.

Note that unitizer_sect does not create separate evaluation environments for each section. Any object that is created will be available to all lexically subsequent tests, regardless of whether they are in the same section or not; this is why res, created in “Basic Tests” above, can be used in “Advanced Tests”. Additionally, on.exit expressions at the top level of a unitizer_sect are evaluated immediately, not when the section exits.

It is possible to have nested sections, though at this point in time unitizer only explicitly reports information at the outermost section level.

Controlling Test Comparison

By default tested components (values and conditions) are compared with all.eq, a wrapper around all.equal that returns FALSE on inequality instead of a character description of the inequality. If you want to override the function used for value comparisons, it is as simple as creating a new section for the tests you want to compare differently and using the compare argument:

unitizer_sect("Accessor Functions", compare=identical,
  {
    get_slope(res)
    get_rsq(res)
    get_intercept(res)
} )

The values produced by these three tests will be compared using identical instead of all.eq. If you want to modify how other components of the test are compared, then you can pass a unitizerItemTestsFuns object as the value to the compare argument instead of a function:

unitizer_sect("Accessor Functions",
  compare=unitizerItemTestsFuns(
    value=identical,
    output=all.equal,
    message=identical
  ),
  {
    get_slope(res)
    get_rsq(res)
    get_intercept(res)
} )

This will cause the value of tests to be compared with identical, the screen output with all.equal, and messages (stderr) with identical.

If you want to change the comparison function for conditions, keep in mind that what you are comparing are conditionList objects so this is not straightforward (see getMethod("all.equal", "conditionList")). In the future we might expose a better interface for custom comparison functions for conditions (see issue #32).

If you need to have different comparison functions within a section, use nested sections. While unitizer will only report the outermost section metrics in top-level summaries, the specified comparison functions will be used for each nested section.
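
For example, here is a sketch using the same hypothetical accessor functions, where only the nested section is compared with identical:

unitizer_sect("Accessor Functions", {
  get_slope(res)
  unitizer_sect("Strictly Compared", compare=identical, {
    get_rsq(res)
    get_intercept(res)
  })
})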

Special Semantics

Almost Like source

When unitizer runs the test expressions in a test file it does more than just evaluate each one in sequence. As a result there are some slight differences in semantics relative to using source. We discuss the most obvious ones here.

on.exit

Each top-level statement, or top-level statement within a unitizer_sect (i.e. anything considered a test), is evaluated directly with eval in its own environment. This means any on.exit expressions will be executed when the top-level expression that defines them finishes evaluating. For example, it is not possible to set an on.exit(...) for an entire unitizer_sect() block, although it is possible to set one for a single sub-expression:

unitizer_sect('on.exit example', {
  d <- c <- b <- 1
  on.exit(b <- 2)
  b                  # == 2!
  {
    on.exit(d <- c <- 3)
    c                # Still 1
  }
  d                  # == 3
})

Evaluation Environments

Each test is evaluated in its own environment, whose enclosing (parent) environment is the environment of the prior test. This means that a test has access to all the objects created or used by earlier tests, but not to objects created or used by subsequent tests. See the Reproducible Tests Vignette for more details.
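
Conceptually the evaluation looks something like the following sketch (an illustration only, not the actual unitizer implementation):

exprs <- expression(x <- 1:10, y <- x ^ 3, mean(y))  # stand-ins for test expressions
env <- globalenv()
for (i in seq_along(exprs)) {
  env <- new.env(parent=env)   # fresh environment enclosed by the prior test's
  eval(exprs[[i]], env)        # later tests see earlier objects, not vice versa
}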

Options and Streams

In order to properly capture output, unitizer will modify streams and options. In particular, it will do the following:

  • Temporarily set options(warn=1L) during expression evaluation.
  • Temporarily set options(error=NULL) during expression evaluation.
  • Use sink() to capture any output to stdout.
  • Use sink(type="message") to capture output to stderr.

This should all be transparent to the user, unless the user is also attempting to modify these settings in the test expressions. The problematic interactions are mostly around the options function. If the user sets options(warn=1) in the hope that the setting will persist beyond the execution of the test scripts, that will not happen. If the user sets options(error=recover) or some such in a test expression, and that expression throws an error, they will be dropped into recovery mode with no visibility of stderr or stdout, which makes for pretty challenging debugging. Similarly, unitizing debugged functions, or functions that require interactive input, is unlikely to work well.

You should be able to use options(warn=2) and options(error=recover) from the interactive unitizer prompt.

If unitize is run with stderr or stdout sunk, it will temporarily subvert those sinks during test evaluation and restore them on exit. If a test expression itself sinks either stream, unitizer will stop capturing output from that point on until the end of the test file, at which point it will attempt to restore the sinks to what they were when unitizer started. Sometimes this is not actually possible; if such a situation occurs, unitizer will release all sinks to avoid returning control to the user with output streams still captured.

To reduce the odds of storing massive and mostly useless stdout, unitize limits how much output is stored by default. If you exceed the limit you will be warned. You may modify this setting with options("unitizer.max.capture.chars").
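
For example, to raise the limit (the value shown is purely illustrative):

options(unitizer.max.capture.chars=200000)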

Other Details

Matching Tests

Whenever you re-run unitize on a file that has already been unitized, unitizer matches the expressions in that file to those stored in the corresponding unitizer store. unitizer matches only on the deparsed expression, and does not care at all where in the file the expression occurs. If multiple identical expressions exist in a file they will be matched in the order they show up.

The unitizer_sect a test was in when it was first unitized has no bearing whatsoever on matching a new test to a reference test. For example, if a particular test was in “Section A” when it was first unitized, but in the current version of the test file it is in “Section X”, the stored reference test will still be matched to the current test in “Section X”.

Some expressions may deparse differently on different systems or with different settings (e.g. numbers with decimal places, non-ASCII characters) so tests containing them may not match correctly across them. See the Introductory Vignette for how to avoid problems with this.

Commenting Tests

unitizer parses the comments in the test files and attaches them to the test they document. Comments are attached to a test if they are on the same line as the test, or on the lines between that test and the previous one. Comments are displayed with the test expression during interactive review. Comment parsing is done on a best-effort basis; it may miss some comments, or even fail to work entirely.
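
For example, both comments below would be attached to the fun_to_test test from the earlier example:

# this comment documents the test that follows
fun_to_test(chr)   # and so does this one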