Enhance testing experience #136

Merged
merged 6 commits into alfert:master from saner_tests on Dec 15, 2019

Conversation

@x4lldux (Contributor) commented Nov 29, 2019

Sorry for the big dump, but I didn't want to create a bunch of smaller but codependent PRs. Each commit message has some more description in it, so please look there as well.

The overall motive behind it is to make testing of PropCheck simpler and easier. As mentioned in #130, running tests is not straightforward:

  • some tests need to be disabled by default,
  • tests are very noisy with a lot of debug information being printed out,
  • some tests are flaky or even unstable,
  • they are quite slow.

Dispatching tests

I've disabled unstable and external tests by default and added two new aliases:

  • test_ext for running external tests,
  • tests for running both external and normal tests. CI uses this with the --cover and --trace arguments, like before. (A sketch of how such aliases can be wired follows below.)
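For illustration, a minimal sketch of how such aliases could be wired in `mix.exs`; the external-test script path here is a placeholder, not necessarily what this PR uses:

```elixir
defp aliases do
  [
    # run only the external, shell-driven tests
    test_ext: ["cmd bash test/run_external_tests.sh"],
    # run the regular suite first, then the external tests
    tests: ["test", "cmd bash test/run_external_tests.sh"]
  ]
end
```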

Noisy tests

Printing of debug information is now controlled by the DEBUG and VERBOSE environment variables. To do that, I've introduced a new case template, PropCheck.TestCase, which imports a few debug-printing functions and sets default options for each module. From now on, all tests need to use PropCheck.TestCase instead of ExUnit.Case (not counting external tests).
The only things still creating noise are the doctests, but changing that would either make them less comprehensible or require copying the doctests into their own separate tests.
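A minimal sketch of a case template along these lines, assuming a `debug/1` helper gated on the `DEBUG` variable (the exact helper names and option handling in the PR may differ):

```elixir
defmodule PropCheck.TestCase do
  use ExUnit.CaseTemplate

  # Pulled into every test module that does `use PropCheck.TestCase`.
  using do
    quote do
      use PropCheck
      import PropCheck.TestCase, only: [debug: 1]
    end
  end

  # Print only when debug output was requested via the environment.
  def debug(msg) do
    if System.get_env("DEBUG") in ["1", "TRUE"], do: IO.puts(msg)
    msg
  end
end
```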

Flaky tests

I've isolated 2 flaky tests and increased their number of tests or search steps to raise the probability that the generated examples contain the "correct" path (both tests check for at least one correct example). I kept increasing numtests/search_steps and then running each test 3000 times to check for failures, until I was no longer able to make them fail.

Slow tests

I've reduced numtests to 33 and search_steps to 333, and introduced NUMTESTS and SEARCH_STEPS environment variables that control those options (via PropCheck.TestCase). Tests with a hard-coded number of tests are now scaled proportionally. Additionally, most tests are now run asynchronously. This reduced the runtime of mix test from ~28 seconds (on my computer) to ~3.7 seconds. Running external and regular tests now takes around 12 seconds.
To offset the reduced numbers, CI builds have them increased to 200 and 1000 (for numtests and search_steps). CI also disables asynchronous tests by using --trace.
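A rough sketch of how these env-driven defaults might be read (the variable names are from this PR; the exact parsing is an assumption):

```elixir
# Defaults of 33 tests / 333 search steps unless overridden by the
# NUMTESTS / SEARCH_STEPS environment variables.
numtests = String.to_integer(System.get_env("NUMTESTS") || "33")
search_steps = String.to_integer(System.get_env("SEARCH_STEPS") || "333")
```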

@alfert (Owner) commented Dec 1, 2019

Wow, this is quite a mouthful. I took a quick glance at it and I really like most of it. Thank you! 💚

Besides the conflict with .travis.yml, the handling of the environment variables should be changed to follow the implementation started in #126. So please add a PROPCHECK_ prefix and check whether you can reuse (or refactor) the existing solution for handling configurations. Verbosity is similar to exception handling, so I suggest applying the same approach.

Please rebase onto the current state of master!

@alfert (Owner) commented Dec 2, 2019

Another thought: are those debug output functions intended to be used by regular PropCheck users? If yes, then we need to document them and explain how/why to use them. For the latter, I would suggest this approach:

  • use the new functions for any kind of output
  • if the test fails, do a rerun with DEBUG enabled, then the recorded counter-examples are run together with their output shown

However, this approach fails for measure, collect and the like, since they expect a working property. Here, DEBUG means checking whether the property is defined well; it is not about finding/identifying bugs in the system under test.

@alfert (Owner) commented Dec 2, 2019

And something else: I like the various test targets. Please document them in the README.md.

@x4lldux (Contributor, Author) commented Dec 2, 2019

So please add a PROPCHECK_ prefix

Will do!

handling of the environment variables should be changed and follow along the implementation started in #126

I deliberately chose to implement that based on #114, because it was just easier to do, and I also think we need to tackle configuration in a separate issue. I'd leave it as is and change it once we unify configuration. Right now there are 5 ways to pass options to PropEr from PropCheck - three of which are somewhat "global".

Are those debug output functions intended to be used by regular PropCheck users?

No. It's just for testing PropCheck itself. PropCheck.TestCase is also just for tests meant to test PropCheck itself.

Here, DEBUG means checking whether the property is defined well; it is not about finding/identifying bugs in the system under test.

Right, DEBUG-output is used to inspect the property, so in most cases it's not useful to have it on.

And something else: I like the various test targets. Please document them in the README.md.

Good point! I should have thought about that!

@alfert (Owner) commented Dec 2, 2019

PropCheck.TestCase is also just for tests meant to test PropCheck itself.

Ahh, interesting. I did not see this. We should state that explicitly in the docs. This also goes for the debug functions. The reason behind this is that the test cases also serve as examples of how to use PropCheck. Using functions that are not supposed to be used by ordinary users requires at least an explanation. Your writing above about the noisy tests is a good rationale that should find its place in the documentation of PropCheck.TestCase.

we need to tackle configuration in a separate issue

Valid point. I'll open a feature request for that.

@evnu (Collaborator) commented Dec 2, 2019

We should state that explicitly in the docs.

Maybe it would make sense to mark the module with @moduledoc false and state its purpose in a regular comment. It would then not be part of the documentation.
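Something like this sketch (module body abridged):

```elixir
defmodule PropCheck.TestCase do
  # Internal case template for testing PropCheck itself; the rationale
  # lives in this comment instead of in the generated docs.
  @moduledoc false

  use ExUnit.CaseTemplate
end
```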

@x4lldux (Contributor, Author) commented Dec 2, 2019

It should then not be part of the documentation.

ExDoc won't generate docs from the test/support path unless MIX_ENV is set to test. It's also marked as only: :dev in deps, so that wouldn't even work.
That said, I'd leave it as docs, but improve them and add a note to the README.

@alfert (Owner) commented Dec 3, 2019

I assume the conflict with .travis.yml disappears after rebasing.

@x4lldux (Contributor, Author) commented Dec 3, 2019

Ahh, interesting. I did not see this. We should state that explicitly in the docs. This also goes for the debug functions. The reason behind this is that the test cases also serve as examples of how to use PropCheck. Using functions that are not supposed to be used by ordinary users requires at least an explanation.

That's a very good point! We have to choose between new users' experience and our convenience. Maybe it would be better to pick the former, undo PropCheck.TestCase, and explicitly replace it in each test module with something like:

```elixir
use PropCheck, default_opts: &PropCheck.TestHelpers.config/0
import PropCheck.TestHelpers, except: [config: 0]
```

Here PropCheck.TestHelpers would contain those debug_* functions and the default config that currently lives in PropCheck.DefaultOpts. Maybe the need for default_opts: &PropCheck.TestHelpers.config/0 will disappear altogether once #138 is closed.


Or we could extract those examples into a separate examples/ directory, not run them during testing, and leave our tests to just check the library rather than explain the use cases.

@alfert (Owner) commented Dec 4, 2019

Or we could extract those examples into a separate examples/ directory, not run them during testing, and leave our tests to just check the library rather than explain the use cases.

At least we must run them regularly to ensure that they are still valid code. From that perspective they are still test code, just not optimised for our internal development processes.

Removing PropCheck.TestCase does not help, since we would still use the debug functions, which are not supposed to be used by (new) users. What about setting the verbosity more aggressively to false?

@x4lldux (Contributor, Author) commented Dec 4, 2019

The simplest solution would be to just comment out any collect/3, measure/3 and aggregate/3 calls (and keep only the debug/1 and scale_*/2,3 functions from TestHelpers). I don't think they are needed for day-to-day test runs. A test would then look roughly like the sketch below.
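For illustration, a quiet test using only the proposed `debug/1` helper; the property itself is invented for this sketch:

```elixir
property "reversing twice is the identity" do
  forall l <- list(integer()) do
    # printed only when debug output is enabled via the environment
    debug("input: #{inspect(l)}")
    Enum.reverse(Enum.reverse(l)) == l
  end
end
```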

@x4lldux force-pushed the saner_tests branch 2 times, most recently from e17fb88 to 4512fcf on December 10, 2019
Running tests in PropCheck isn't exactly easy. There are a few tagged
tests that are intended to fail and some which are unstable; additionally,
there are a few external tests that need to be run from a bash shell. So
simply doing `mix test` doesn't work and will just confuse users.

This commit simplifies that by introducing two new Mix aliases:
 * `test_ext`, which runs external tests,
 * `tests`, which runs both external and regular tests (limited only to
   reliable tests).
To accomplish that, this commit replaces most printing with the `debug/1`
function and comments out unnecessary printouts (like `collect/3` or
`aggregate/3`). Printing of debug information can now be controlled by
setting the `PROPCHECK_DEBUG` environment variable. Setting
`PROPCHECK_DEBUG` to `1` or `TRUE` will enable printing of debug
statements in the tests themselves.

In the CI environment, both variables are set to preserve the current
output state.
@x4lldux (Contributor, Author) commented Dec 10, 2019

Strange. All of my Travis builds were OK, while here all but one of the builds passed; the failing one crashed with an erl_crash.dump. Probably memory consumption got too big when creating a tree in that last test. @alfert can you rerun the build?

If this keeps happening, search_steps will need to be adjusted.

@alfert (Owner) commented Dec 11, 2019

Rerunning the branch on my notebook works smoothly (mix tests). But on Travis, it fails while allocating 3.9 GB of heap, which is quite a lot. Travis provides "only" 7.5 GB RAM on their machines, so we could really be running into a wall here.

I would suggest adjusting the search_steps.

The default number of tests is reduced to 33, and tests which had a
hard-coded number of tests are scaled based on their relation to PropEr's
default of 100. That is, if a test had a hardcoded `numtests: 10`, it's
now `scale_numtests(0.1)`, because 10 is 0.1 of 100. This keeps the
proportions of individual tests while globally controlling the number of
tests via the `PROPCHECK_NUMTESTS` environment variable.

The same was done for `search_steps`, which is controlled by the
`PROPCHECK_SEARCH_STEPS` environment variable and scaled by the
`scale_search_steps` function.

This is meant to reduce the time spent waiting for tests to complete
during day-to-day development. To compensate for the potentially missed
opportunities to generate failing examples, the number of tests is
increased on CI.
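A plausible shape for the `scale_numtests` helper described above (a sketch, not necessarily the PR's actual code):

```elixir
def scale_numtests(factor) do
  # Global default of 33, overridable via PROPCHECK_NUMTESTS.
  base = String.to_integer(System.get_env("PROPCHECK_NUMTESTS") || "33")
  max(1, round(factor * base))
end
```

With this, a test that previously hard-coded `numtests: 10` (0.1 of PropEr's default 100) becomes `numtests: scale_numtests(0.1)` and still scales with the global setting.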
Tests involving the PingPong and Movie servers cannot safely be run in
parallel, so only one in each group was chosen to run asynchronously.
Increasing the number of tests increases the chances of finding the
correct path.
@x4lldux (Contributor, Author) commented Dec 11, 2019

I've modified .travis.yml and explicitly set PROPCHECK_SEARCH_STEPS to 1000, which is the default value, and PROPCHECK_NUMTESTS to 200, twice the default. If that doesn't help, then these particular tests will have to be altered.

But the VM I'm using for development has "only" 5.9 GB RAM, half of it used by Firefox and Emacs, and I didn't have any issue like that. I think the probabilistic nature of these tests will generate enormous structures on "rare" occasions. We'll see how rare.

@alfert merged commit 4725c0f into alfert:master on Dec 15, 2019