July 20, 2015
Related Prior ksummit-discuss Threads:
Additional Participants: Alexey Dobriyan, Andy Lutomirski, Dan Carpenter, Fengguang Wu, Geert Uytterhoeven, Guenter Roeck, Jiri Kosina, Josh Boyer, Julia Lawall, Kees Cook, Kevin Hilman, Mel Gorman, Michael Ellerman, Peter Hüwe, Shuah Khan, and Steven Rostedt.
People tagged: Geert Uytterhoeven, Grant Likely, Kevin Hilman, Stephen Rothwell, and Wolfram Sang.
Given the high-volume discussions in prior years, one might assume that there was nothing new to add. However, the topic of testing does appear to have progressed. This is a good thing, especially if it has managed to progress enough to keep up with the bugs.
Mark Brown noted that the topic of testing is usually covered, but suggested that additional discussion would be helpful in making people aware of what is available and also in working out what additional testing would be useful. Mark called out Shuah Khan's kselftest development, Fengguang Wu's 0day test robot, and kernelci.org. Mark suggested that further progress could be made by pooling resources and by upstreaming Kconfig fragments designed to support testing, and also that defconfigs be slimmed down in favor of Kconfig fragments. Finally, Mark called out the perennial topic of what additional testing would be useful, and suggested that this discussion take the form of a workshop session combined with a core-day readout. Shuah Khan volunteered to organize these sessions, though her focus would primarily be on kselftest.
In response to Mark's Kconfig-fragment suggestion, Alexey Dobriyan argued that this would result in everyone testing with the same .configs, which could actually decrease test coverage. Mark replied that although that might happen, a big benefit of Kconfig fragments would be lowering the barriers to new people joining the testing efforts.
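For readers unfamiliar with the mechanism, a minimal sketch of what a testing-oriented Kconfig fragment and its use might look like follows. The fragment name and its option list are purely illustrative, but the merge_config.sh helper already ships in the kernel tree:

    # kselftest-ftrace.config -- hypothetical fragment listing what a set
    # of ftrace tests might need
    CONFIG_FTRACE=y
    CONFIG_FUNCTION_TRACER=y
    CONFIG_DYNAMIC_FTRACE=y

    # Merge the fragment on top of an existing configuration:
    make defconfig
    ./scripts/kconfig/merge_config.sh -m .config kselftest-ftrace.config
    make olddefconfig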
Steven Rostedt took this as his cue to argue that tests should have three results instead of two, with UNSUPPORTED joining the traditional PASS and FAIL. A test that (for example) attempts to use ftrace in a kernel that does not have ftrace enabled should result in UNSUPPORTED. As an alternative to Kconfig fragments, Steven suggested a central repository of working .config files, complete with documentation on what hardware or filesystem is required to support a given test. Mark agreed, noting that current tests are supposed to call out the unsupported case, and that easing the job of running tests was central to the discussion. Kees Cook pointed out that a given test might need a particular sysctl setting or privilege level as well, and that it would be good to record that information in machine-readable form to enable automated configuration checking. Andy Lutomirski noted that the topic of machine-readable output was discussed at the 2014 Linux Kernel Summit, but was not aware of it having been implemented. Mark replied that the 2014 discussion was focused on very basic PASS/FAIL criteria, and that it might be time to look at extending the automation.
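A rough sketch of what the three-way result might look like for Steven's ftrace example is shown below; the script and its reporting convention are purely illustrative and not taken from any existing test suite:

    #!/bin/sh
    # Illustrative only: report UNSUPPORTED rather than FAIL when the kernel
    # simply lacks the feature under test.
    TRACING=/sys/kernel/debug/tracing
    if [ ! -d "$TRACING" ]; then
        echo "ftrace smoke test: UNSUPPORTED (tracing not enabled in this kernel)"
        exit 0
    fi
    if ! grep -qw function "$TRACING"/available_tracers 2>/dev/null; then
        echo "ftrace smoke test: UNSUPPORTED (function tracer not available)"
        exit 0
    fi
    if echo function > "$TRACING"/current_tracer 2>/dev/null; then
        echo "ftrace smoke test: PASS"
    else
        echo "ftrace smoke test: FAIL"
    fi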
Masami Hiramatsu would like selftesting to be extended to the tools in the tools/ directory, instead of focusing only on the kernel itself. Masami also pointed out that people can get working .config files on many systems via /proc/config.gz.
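Assuming the running kernel was built with CONFIG_IKCONFIG_PROC=y, recovering such a configuration is a one-liner:

    # Extract the running kernel's configuration as a starting .config
    zcat /proc/config.gz > .config
    make olddefconfig    # resolve any options added since that kernel was built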
Steven Rostedt still likes the idea of having per-test .config files, possibly with an “all tests” .config file that covers all tests.
Guenter Roeck would like more shared configurations, especially working configurations for qemu. Guenter stated that he builds various configurations for a number of architectures, but has no idea whether his chosen configurations are relevant.
Mark Brown added Olof's build-and-boot testing, Dan's runs of smatch and other static-analysis tools, Coverity, and his own builder to the kselftest, 0day test robot, and kernelci.org efforts that he had called out in his initial posting; Michael Ellerman and Guenter Roeck called out kernelci.org as the closest thing to a proper kernel continuous-integration setup.
Other tests called out include:
Dan Carpenter wondered if the 0day test robot had obsoleted Build Regression, and Geert replied that he believed that kisskb builds more and different configurations, but confirmed that people mostly ignore the resulting emails. Jiri Kosina speculated that the reason they are ignored is that the emails are one huge report, making it difficult to tell whether your code is responsible for a given failure, and that they are sent directly to LKML without CCing the people of interest. Josh Boyer agreed that although LKML is a great archival mechanism, it is now useless as a general form of communication; Josh believes that this is the case for bug reports in general, not just Build Regression emails.

Kevin Hilman asked how quickly build results are available after a branch is pushed and whether build artifacts are available; Kevin is interested in reducing overall computational load by having kernelci.org consume build artifacts produced by others. It turns out that kernelci.org has an interface for consuming a JSON file requesting that a given build be tested. Guenter Roeck replied that the build-result latency depends on the load on the system and the branch in question, but that the typical time is anywhere between two hours and about three days. Guenter does not currently post build products, but could in theory do so; in practice, this would require about 100GB of storage somewhere out on the web, as well as a lot of bandwidth. Guenter used to do JSON, but had to curtail this due to the high CPU load from the resulting JSON requests. Kevin feels Guenter's storage pain, given that kernelci.org uses about 400GB for 45 days worth of builds, boots, and tests. Michael Ellerman noted that some corporations' lawyers were less than happy with the thought of distributing binaries on an ongoing basis; Guenter had not considered the legal aspect, but believes that gaining approval should be possible given reasonable lawyers.
Mark Brown noted that the 0day test robot did not suffer from Build Regression's issues because the 0day test robot sends errors directly to the people mentioned in the offending commit. That said, the 0day test robot's notifications are one-offs: for many classes of problems, if the problem persists, no additional notifications are sent. Mark noted that the regular “all the issues” email from Build Regression helps keep these issues on the front burner.
Fengguang Wu stated that the 0day test robot tests randconfig and all(yes|no|def|mod)config, and that it also builds anything that it finds in arch/*/configs. However, to maintain the one-hour response-time goal, only about 10% of them are run immediately, with the remaining 90% tested as time permits.
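For reference, the configuration styles Fengguang mentions correspond to standard kernel make targets, roughly as follows (the ARM example is just one of the many entries under arch/*/configs/):

    make allyesconfig    # everything built in
    make allmodconfig    # everything that can be a module, as a module
    make allnoconfig     # as little as possible
    make defconfig       # the architecture's default configuration
    make randconfig      # a randomized selection of options
    # ...plus each fragment shipped under arch/*/configs/, for example:
    make ARCH=arm multi_v7_defconfig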
The 0day test robot now tests 543 git trees, and Fengguang is always interested in adding more. Fengguang agreed that the 0day test robot can sometimes fail to catch errors, for example, due to:
Fengguang added that heavy load can increase latency, but should not result in anything being lost. He also noted that you can request a full build report by dropping him an email. [ Ed. note: I subscribe to the full build report for RCU, and it can be quite helpful. ] Mark Brown suspected that what he had been seeing was latency, as sometimes the problem was fixed in -next before 0day reported it. Mark would also prefer to get build results on demand rather than being emailed them. Fengguang Wu noted that 0day tests -next at a lower priority, and said that he would increase that priority due to the interest in it. He also asked that people send him reports when errors are missed or subjected to excessive delays so that he can fix any problems.

Fengguang explained that the report-once logic only suppresses re-notification for ten days, so that build errors will eventually be re-reported if not fixed. Finally, Fengguang suggested that Mark use procmail to direct the full build reports to a local mbox and then check it on demand. Mark objected that this procmail approach wouldn't make the errors go away when fixed, to which Fengguang suggested just checking the most recent full build report, which would contain all the latest information.
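A hypothetical procmail recipe along the lines Fengguang suggests might look like the following; the header patterns are guesses and would need to be adjusted to match the actual report emails:

    # ~/.procmailrc: file full 0day build reports into a local mbox to be
    # checked on demand rather than read as they arrive.
    :0:
    * ^From:.*kbuild
    * ^Subject:.*build.*report
    0day-build-reports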
Mark Brown also noted that Grant Likely and Kevin Hilman have been working on testing under qemu, including working configurations. Julia Lawall added that the 0day test robot runs Coccinelle, and that Coccinelle can run correctly and find problems even if the build fails. Shuah Khan asked whether there is interest in including kselftest in qemu boot tests (to which Guenter Roeck replied in the affirmative), and noted that a “quicktest” option was added in 4.2 in order to meet qemu time constraints.
Alexey Dobriyan argues that the confusion about testing tools happens because the testsuite directories are hidden under tools/ (in tools/testing/selftests/), and would like the directory containing self-tests to instead appear as a top-level directory, with make test running its tests, rather than the current make kselftest.
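For those who have not tried it, the current entry points look roughly like this, run from the top of a kernel tree (the TARGETS value is just an example):

    make kselftest                                             # run all selftests
    make -C tools/testing/selftests TARGETS=ftrace run_tests   # run a single suite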
[ Ed. note: There appears to be some disagreement over what the various .config files would contain and what variety of hardware the tests are expected to run on automatically. ]
Shuah Khan questioned the usefulness of relocating selftests/, and suggested that ktest could handle the required .config files. Alexey Dobriyan said that his goal was visibility rather than usability, calling out git's t/ as an example of good visibility.
Steven Rostedt is willing to make ktest run the kselftests, but believes that some other way of handling .config files is needed, especially in the case where ktest is not being run from within a git repository.
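A hypothetical ktest configuration fragment along these lines might look as follows; the option names come from tools/testing/ktest/sample.conf, but the values and the idea of pointing each test at its own pre-canned .config are only a sketch:

    MACHINE = testbox
    BUILD_DIR = /home/tester/linux
    OUTPUT_DIR = /home/tester/ktest-build

    TEST_START
    TEST_TYPE = boot
    # Build using a pre-canned, per-test configuration file:
    BUILD_TYPE = useconfig:/home/tester/configs/ftrace-tests.config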
Shuah thanked Steven for the clarification, and pointed out that a few of the kselftests depend on specific kernel configurations, and that kselftest simply exits gracefully if the proper configuration is not in place. So kselftest tests as much as it can given the kernel configuration, and relies on the user to build the kernel appropriately.
Guenter Roeck suggested that listing all of the testing efforts would be helpful. Guenter also argued that simply providing test results on the web is insufficient, because people don't go looking. Guenter believes that even email notification is insufficient, arguing that the Build Regression emails sent by Geert are widely ignored. Guenter instead argues that new problems should be automatically bisected, with the patch author and maintainer being notified.

Mark Brown agreed with the need for bisection, calling it out as a necessary aspect of interpreting test results (and later noting that his own builder's failure reports are “lovingly hand crafted using traditional artisan techniques”). Mark also pointed out that testers need to actively push and to test against ongoing development. Prompt analysis and notification is something that Mark believes the 0day test robot gets right. [ Ed. note: Agreed! ] Finally, Mark pointed out that the more we come to rely on automated testing, the greater the impact of bugs that break the build or prevent successful boot. Guenter Roeck agreed, calling out 4.1 as being particularly bad in the build-and-boot department, and wondered whether automatic reversion of broken patches will be needed to minimize the impact. Mark Brown suggested that this impact needs to be discussed, and that automated reversion is one solution, but that Stephen Rothwell's carrying of fixup patches in -next is another possible approach.
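The automatic bisection Guenter describes would presumably be built on git's existing machinery; a minimal sketch, assuming a hypothetical build-and-boot script that exits non-zero on failure, is:

    git bisect start <first-bad-commit> <last-good-commit>
    git bisect run ./build-and-boot-test.sh   # hypothetical test script
    git bisect reset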