Li Zefan: Stable issues

May 4, 2014

Participants: Andy Lutomirski, Ben Hutchings, Dan Carpenter, Daniel Vetter, Davidlohr Bueso, Greg KH, Guenter Roeck, Jan Kara, Jason Cooper, Josh Boyer, Laurent Pinchart, Li Zefan, Masami Hiramatsu, Matt Fleming, Michal Simek, Olof Johansson, Steven Rostedt, Takashi Iwai, and Theodore Ts'o.

People tagged: Ben Hutchings, Dave Jones, Fengguang Wu, Greg KH, Jiri Slaby, and Willy Tarreau.

Li Zefan raised a number of questions and issues with -stable:

Are there too many long-term-stable kernels? There are currently five (2.6.32, 3.2, 3.4, 3.10, and 3.12), which means that maintainers might need to prepare patches for all five of them.
Does Greg need a sub-maintainer? There are a number of patches that were backported to 3.2 that have not been backported to Greg's 3.4 long-term-stable kernel. The difference appears to be that Ben is willing to invest significant time into manually backporting patches that do not apply cleanly, a task that Greg does not have time to undertake.
Are maintainers doing a good job of tagging patches for -stable? Li feels that they might not be, based on a manual scan of patches against kernel/sched/rt.c.
Should stable trees maintain a known_issues.txt file? This file could track things like patches that did not apply, allowing the consumer of the stable tree to take appropriate action.
Do stable kernels need more testing? Automated testing (for example, Dave Jones's Trinity and Fengguang Wu's 0day) seems to focus on mainline, with little automated testing of the stable trees.

Too many LTS kernels?

Josh Boyer pointed out that Dave Miller is under no obligation to prepare five sets of pull requests, but that he is glad that Dave chooses to do so. Li Zefan agreed, but pointed out that many other maintainers do not follow Dave Miller's example, hence the idea of having fewer long-term stable kernels. Theodore Ts'o suggests that additional coordination between those consuming stable trees might greatly reduce the work required, for example, by announcing their intended kernel version ahead of time. However, he points out that the choice of kernel is their decision, and this decision should include the work required to interact with the corresponding stable kernel. In addition, some consumers will backport selected major features as well as bug fixes.

Li Zefan asked if instead of consumers announcing their intent ahead of time, the choice of version to provide long-term stable support for might be announced earlier, noting that large infrastructural changes might then be scheduled to fit in well with the choice of long-term stable versions. Jan Kara worries that declaring a long-term stable version ahead of time might cause people to cram a bunch of immature code into it, with these people figuring that the mess could “simply” be cleaned up over time. Greg KH seconded Jan's concerns about cramming immature code, but said that almost all companies are willing to share their LTS desires privately. However, Greg notes that he cannot please all the people all the time because not all release schedules are going to align nicely.

Steven Rostedt suggests a bait-and-switch approach, where 3.x is announced as the new LTS tree, but at the last minute it is changed to 3.x-1. That way, the immature code would miss the LTS tree. Guenter Roeck indicated that people would catch on to this sort of subterfuge very quickly. Steven Rostedt agreed that people would game whatever system was set up.

Takashi Iwai agrees with Jan on the risk, but points out that making a big change just after a long-term stable release makes backporting fixes more difficult, which prompted Jason Cooper to ask for an example. Jason pointed out that git is pretty good about tracking renames, and that in the worst case, the code can always be manually backported. Takashi Iwai said that problems arose when a file was split, requiring manual handling of backports. Although manual backporting is always possible, Takashi would like to avoid this additional burden on maintainers.

Equip Greg with a sub-maintainer?

Josh Boyer noted that Greg seems quite happy to have help with backporting and to get reports of what distros are carrying on top of a stable release. Josh asked if there should be more formal requirements placed on distros that base off of a given stable tree. Li Zefan hopes that we can do better than the current situation, which results in hundreds of missing fixes, hence the suggestion of having someone help Greg with backporting.

Greg KH said that he was not only happy to have people help him, but also to have people come up with new ways of helping him. He also said that the 3.4 tree was a bit behind due to his recent travel.

Maintainers tagging adequately for -stable?

Josh Boyer believes that this is a real problem, and would like to see people working through each subsystem identifying patches for -stable. Li Zefan prefers that subsystem maintainers tag fixes properly, arguing that digging through git logs for -stable candidates is neither fun nor productive.

Add a known_issues.txt?

Josh Boyer asked if this file was to be per-stable-tree or per-subsystem. Li Zefan gave the example of an NFS oops in 3.4.x that required significant time to resolve. Having a known-issues file might motivate people to do the needed backporting, or at least alert them to possible issues.

Testing stable kernels

Guenter Roeck would love to do -stable testing, but does not have time to set it up. Josh Boyer believes that most regressions in the -stable trees are due to problems that trinity and 0day are not yet set up to find, limiting the usefulness of running trinity and 0day on the -stable trees. Guenter Roeck argued that there is a significant number of regressions due to missing patches and bad or incomplete backports, and that these could be caught by trinity and 0day. Josh Boyer agreed that running them would be of some help, but that the benefit would be quite limited compared to that gained by running them on the upstream kernel. Li Zefan agreed with Josh, noting that if trinity and 0day could give as much benefit to the stable trees as they do to upstream, the stable trees would not deserve the word “stable”.

Guenter Roeck agreed that the stable trees were more stable than mainline, but said that he finds quite a few build failures, and believes that he would find runtime errors if his tests extended beyond simple boot-up. Guenter's concern is that even minor -stable regressions, even performance regressions, will be used as arguments against updating production kernels, which could leave them with significant bugs or other vulnerabilities. For this reason, Guenter believes that automated testing of the -stable trees would be valuable, and Li Zefan agreed that trinity and 0day could be useful for this purpose.

Masami Hiramatsu suggested standard test suites, with bugfixes including a test case to verify no regressions. Li Zefan recalls Andrew Morton having made a similar test-with-feature suggestion, which Masami Hiramatsu agreed was a good idea. Laurent Pinchart argued that in addition to tests, documentation should also be provided.

Davidlohr Bueso suggested LTP for testing the user API, and believes that Fengguang Wu's 0day tests include LTP. Dan Carpenter feels that LTP is more complicated, arguing that trinity is much easier to use. Jan Kara has no problems with LTP usability, and says that it is improving over time. Li Zefan wonders how useful LTP is in finding kernel bugs, given that he rarely sees such bug reports. Jan Kara sees 3-5 per year, and is also working to extend LTP's coverage. Andy Lutomirski wondered if anything had come of earlier discussions to provide a make test target, and mentioned that he is working on a tool called virtme. Greg KH said that make test already exists, but is currently broken, and that the plan is to fix it. Masami asked what make test would do, wondering if it would run tools/testing/selftests. Greg said that the first goal is to run the tests in the kernel today, then go from there.

Masami Hiramatsu agreed that LTP is good for testing the user API, but is also concerned about the non-syscall portions of the user API. In addition, Masami is concerned about version mismatch between LTP and the kernel, suggesting that an in-tree test suite might have advantages in this area. Davidlohr Bueso agreed that he is focused mostly on syscalls and ioctls, and agreed that although LTP does some procfs testing, it is mostly concerned with syscalls. Davidlohr believes that LTP should continue focusing mostly on syscalls, which should allow it to avoid worrying quite as much about the exact kernel version, given that the user API does not change much. Davidlohr also believes that LTP is an example of the usefulness of out-of-tree test suites.

Matt Fleming likes in-tree test suites because they allow the fix and test to be added with a single commit. Masami Hiramatsu likes the single-commit approach because it can be used by git bisect. Daniel Vetter doesn't believe that in-tree tests for drm/i915 are useful because current tests rely on the “piglit” out-of-tree test framework. Furthermore, integrating the tests and documentation with the framework improves usability. Daniel instead links the kernel commit to its testcase using the “Testcase:” tag. Masami would like to clarify each subsystem's testing policy, be it use of dedicated out-of-tree tools or an in-kernel self-test. Masami likes the linking to testcases, suggesting that this might allow automating test configuration when doing git bisect, even if the tests are out-of-tree. Daniel would like to see minimal standards that apply everywhere, including how any external tests are documented and how much pressure should be applied to bring them in-tree. Masami agreed, also asking that a unit-test tag be added to the MAINTAINERS file, which would make it clear how to test the corresponding subsystem when preparing to submit patches.

Jason Cooper would like to see some sort of filtered automated reporting of differences between -stable releases, for example, from v3.12.3 to v3.12.7, in order to allow release engineers to easily identify any changes that they are uncomfortable with. Guenter disagrees, arguing that any such list would be used as an excuse to exclude all patches, resulting in running a less-stable mainline release.
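
One simple form such a report could take is a path-filtered git log between two release tags. The Python sketch below only builds the command line; the stable_diff_command() helper and the example paths are assumptions made for illustration, and nobody in the discussion proposed this particular interface.

```python
def stable_diff_command(old, new, paths=()):
    """Build a git log command listing commits in `new` that are not in `old`,
    optionally limited to the given paths."""
    cmd = ["git", "log", "--oneline", "--no-merges", f"{old}..{new}"]
    if paths:
        # Everything after "--" is treated by git as a path, not a revision.
        cmd += ["--", *paths]
    return cmd


# Example paths chosen arbitrarily for illustration.
cmd = stable_diff_command("v3.12.3", "v3.12.7", ["fs/nfs", "kernel/sched"])
print(" ".join(cmd))
```

Running the resulting command inside a stable tree checkout would list one line per commit touching the selected paths, which a release engineer could then review.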

Michal Simek would prefer that additional testing effort be invested in testing patches earlier in the development/maintainership process, before they hit mainline. Michal would also like to see more testing of the more obscure architectures via simulators.

Ben Hutchings believes that Fengguang's 0day tests exclude old patches based on author date, so that they will refrain from testing backports. Ben suggests that the tests switch to commit date so that backports would be tested. Li Zefan agreed that it would be good for Fengguang to extend his framework to cover stable releases. Guenter Roeck volunteered to publish -stable branches on kernel.org in order to gain the benefit of 0day testing, though his workflow would result in repeated rebases (and thus possibly repeated tests).
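
The distinction matters because git records two dates per commit: a backported patch normally keeps the author date of the original change but receives a new commit date when it is applied to the stable tree. The Python sketch below shows the effect of filtering on each date. The Commit records, the cutoff, and the recent() helper are invented for illustration and do not describe how 0day is actually implemented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Commit:
    subject: str
    author_date: date  # when the patch was originally written
    commit_date: date  # when it was applied to this tree


def recent(commits, cutoff, key):
    """Return subjects of commits whose chosen date is on or after cutoff."""
    return [c.subject for c in commits if getattr(c, key) >= cutoff]


# Hypothetical data: a fix written in January and backported in May.
commits = [
    Commit("new mainline feature", date(2014, 5, 1), date(2014, 5, 2)),
    Commit("backported fix", date(2014, 1, 10), date(2014, 5, 3)),
]
cutoff = date(2014, 4, 1)

by_author = recent(commits, cutoff, "author_date")
by_commit = recent(commits, cutoff, "commit_date")
print(by_author)  # the backport is filtered out
print(by_commit)  # the backport is kept
```

Under these assumptions, only the commit-date filter selects the backport for testing, which is the behavior Ben is asking for.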

Olof Johansson runs Greg's -stable queue through his builder for ARM configurations, and also boots them on his and Kevin Hilman's ARM board farms. Olof noted that the test coverage was not very extensive, but that it at least covers the basics.