August 4, 2013
Participants: Jiri Kosina, Josh Boyer, Greg KH, Ted Ts'o, Rafael J. Wysocki, Nicholas A. Bellinger, Steven Rostedt, John W. Linville, James Bottomley, H. Peter Anvin, Linus Torvalds, Guenter Roeck, Shuah Khan, Ingo Molnar, Li Zefan, Willy Tarreau, Rob Landley, David Lang, Tony Luck, Takashi Iwai, Mark Brown, Ben Hutchings, Paul Gortmaker, Jason Cooper, Dave Airlie, Kees Cook, Joe Perches, Kalle Valo, Jan Kara, Motohiro Kosaki, and Trond Myklebust.
People tagged: Dave Miller
Jiri Kosina
proposed discussing the criteria for deciding which patches go into
the various stable trees.
As one might guess from his title, Jiri believes that some
less-than-stable patches are frequently added to the stable trees,
for example, the
random.c update
in 3.0.41.
This update included 902c098a, which Jiri says was buggy and
not marked for -stable, and caused pain to distros, which
Rafael J. Wysocki seconded.
However,
Greg KH
pointed out that these commits were due to a security issue, which
Ted Ts'o
provided
a link to.
Jiri
wondered why this wasn't in the changelog and why, given the wide deployment,
such a rush was necessary.
Ted pleaded “guilty”
on the changelog's deficiencies, agreeing that even if there were security
reasons to limit information flow, additional high-level technical information
would have been good.
However, Ted noted that he was not the person who pushed them to -stable,
but speculated that getting these fixes into -stable might have been of great
value for embedded devices.
In any case, Ted believes that this patchset was an exception to the
normal -stable processes.
Josh Boyer seconded Jiri's concerns, noting that the first few stable releases in a given series were significantly less stable than later releases, almost as if they were release candidates rather than releases. On the other hand, Josh also noted that there have recently been significant lags between fixes being posted to LKML and eventual appearance in mainline, and that there clearly needs to be a balance.
Greg KH asked that this discussion be kicked off immediately at “stable at vger.kernel.org” instead of waiting, but said that he would be up for an in-person discussion as well.
Theodore Ts'o noted that Linus has rather firmly stated that the only fixes that should be pushed to mainline after -rc2 (or -rc3 at the latest) are for regressions or for very serious data-integrity issues. At that time, the concern was that careless late-in-cycle fixes for unimportant bugs might generate far more serious bugs. Ted wonders if the pendulum has now swung too far in the other direction. John W. Linville agreed that there seems to be some oscillation in the rules and their interpretation, stating that “a good repetitive flogging and a restatement of the One True Way to handle these things might be worthwhile once again”.
In contrast with the practices for late-rc bug fixing, Greg Kroah-Hartman said that he has been consistent in enforcing the rules documented at Documentation/stable_kernel_rules.txt, and that even the SCSI maintainers were finally following them. James Bottomley objected to Greg's “finally following them”, stating that the SCSI tree has had patches marked for -stable for quite some time. Rafael J. Wysocki further wondered why people complained to Greg rather than to the maintainer who marked the patch for -stable. Greg suggested that this was because he is a big, easy target.
A key theme running through this discussion was differences of opinion as to what fixes should be included in -stable trees, including differences in risk assessment.
In this spirit, David Lang kicked off a debate as to what level of risk is acceptable by arguing that a regression rate of one per ten fixes is insufficient. Tony Luck pointed out that Linux testing is carried out by inflicting changes on a gradually increasing pool of users over a multi-year timeframe, which means that there is a tradeoff between timely fixes and avoidance of regressions. Linus Torvalds agreed, but added that testing is usually self-selecting, so that the initial tests are carried out by the people who suffered from the bug, and who are thus likely to report an improvement even if there is some negative side effect. Linus therefore suggested that only the most critical fixes should be immediately sent to -stable, and that others could wait so as to get more testing. Greg Kroah-Hartman said that people already mark stable patches as follows:
Cc: stable # delay for 3.12-rc4
Greg's workflow respects this sort of notation, so it can be used whenever needed. Willy Tarreau suggested that unadorned Cc to -stable be deferred by default, so that a patch would need to be tagged specially to be immediately applied to -stable.
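The tagging convention discussed here can be illustrated with a small sketch. It is not Greg's actual tooling: the function names are hypothetical, the address part of the tag is not shown in the text and is therefore ignored, and treating a note that begins with "delay" as a deferral request is an assumption made only for illustration.

```python
import re

# Matches a "Cc: ... stable ..." line with an optional "# note" suffix.
# Whatever sits between "stable" and "#" (for example an address) is
# accepted and ignored, because the exact form is an assumption here.
STABLE_TAG = re.compile(
    r"^Cc:[^#\n]*?\bstable\b[^#\n]*(?:#\s*(?P<note>.*))?$",
    re.IGNORECASE | re.MULTILINE,
)


def stable_disposition(commit_message):
    """Return (tagged, note): whether the message carries a stable tag,
    and the free-form note after '#', if any."""
    match = STABLE_TAG.search(commit_message)
    if match is None:
        return (False, None)
    note = match.group("note")
    return (True, note.strip() if note else None)


def is_deferred(commit_message):
    """Hypothetical policy: a note starting with 'delay' asks for deferral."""
    tagged, note = stable_disposition(commit_message)
    return tagged and note is not None and note.lower().startswith("delay")


if __name__ == "__main__":
    message = "fix foo\n\nCc: stable # delay for 3.12-rc4\n"
    print(stable_disposition(message), is_deferred(message))
```

Under Willy's suggestion the default would flip: an unadorned tag would be deferred, and only a specially marked one would be applied immediately. In this sketch that would amount to changing the policy in is_deferred, not the parsing.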
H. Peter Anvin
noted that it is not unusual for a patch to be flagged for -stable
after Linus has pulled it to mainline.
Peter would therefore like some out-of-band mechanism for flagging
-stable patches.
Greg
said that such a mechanism already exists, namely sending the git SHA-1
to “stable at vger.kernel.org” along with the destination
-stable trees.
Greg also noted that some maintainers also keep separate trees to maintain
commits destined for -stable.
However,
Theodore Ts'o
pointed out that Documentation/stable_kernel_rules.txt
currently says that you should send the patch, not just the SHA-1, and
that he had been doing just this without seeing any complaints.
Guenter Roeck
says that he does the same, but also adds the SHA-1 commit ID from mainline.
H. Peter Anvin
clarified his original request, stating that he wanted better automation
of this process, suggesting something based on git notes.
Linus
said that while he was OK with maintainers using git notes
locally (adding that they can be very powerful for certain workflows),
he would neither pull them to nor push them from mainline.
Steven Rostedt
speculated that a process based on git notes could be made
to work even given that Linus wasn't going to pull them into mainline,
for example, by polling mainline and upon seeing a commit appear there,
checking the local tree for git notes.
Shuah Khan
added that such a process could include quick sanity tests to make sure
that the flagged patches applied cleanly to the relevant -stable trees.
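The polling idea, together with the sanity check Shuah mentions, might look roughly like the following sketch. Everything here is an assumption made for illustration: the notes ref name "stable", the convention that a note lists target trees separated by whitespace, and the function names. The only git command used is git notes show, and the apply check is left as a pluggable callback; none of this is tooling described in the thread.

```python
import subprocess


def read_note(sha, notes_ref="stable", repo="."):
    """Return the local note attached to `sha`, or None if there is none.

    Runs `git notes --ref=<ref> show <sha>`; the ref name "stable" is a
    hypothetical local convention, not something mainline carries.
    """
    result = subprocess.run(
        ["git", "-C", repo, "notes", f"--ref={notes_ref}", "show", sha],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def poll_once(mainline_shas, already_seen, lookup_note=read_note,
              applies_cleanly=lambda sha, tree: True):
    """One polling pass over commits currently visible in mainline.

    For each commit not seen before, look up a local note; if present,
    treat it as a whitespace-separated list of target -stable trees and
    queue (sha, tree) for every tree that passes the sanity check.
    """
    queue = []
    for sha in mainline_shas:
        if sha in already_seen:
            continue
        already_seen.add(sha)
        note = lookup_note(sha)
        if note is None:
            continue
        for tree in note.split():
            # A real tool would report failures here, not drop them silently.
            if applies_cleanly(sha, tree):
                queue.append((sha, tree))
    return queue
```

Because the notes stay in the maintainer's local tree, this arrangement does not require Linus to pull or push them, which is the constraint Steven was working around.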
Greg KH
echoed Linus in saying that he would not be using git notes
in his -stable trees, further asking if it was really all that hard to
just remember what has been marked for -stable, for example, by placing
the patch in a mailbox or a separate git tree.
H. Peter Anvin
argued that the value of something like git notes was
that it preserved information on why and how the patch made it to -stable.
Ingo Molnar
countered by saying that one advantage of the limited-time acknowledgment
of review and testing contributions is that it encourages this review
and testing to happen in a timely manner.
Takashi Iwai
would nevertheless like to see some sort of metadata linking a buggy
commit with its fix, perhaps via tags or notes.
Takashi also considered the option of linking from the fix to the
buggy commit, but argued that this makes reverse mapping (of interest
to bisection) harder.
[Editor's note: It appears that there is great scope for creativity in
workflows interacting with -stable.]
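Takashi's concern about reverse mapping can be made concrete with a sketch. It assumes, purely for illustration, that each fix carries a tag line of the form "Fixes: <sha>" naming the buggy commit; the discussion itself did not settle on any tag name or format. Given that forward link, the reverse index that bisection would want has to be built by scanning every commit message:

```python
import re
from collections import defaultdict

# Hypothetical tag format: "Fixes: <abbreviated or full SHA-1>".
FIX_TAG = re.compile(r"^Fixes:\s*(?P<sha>[0-9a-f]{7,40})\b", re.MULTILINE)


def reverse_fix_map(commits):
    """Build {buggy_sha: [fix_sha, ...]} from (sha, message) pairs.

    Each fix points at the commit it repairs; inverting that relation
    requires one pass over all candidate fix commits.
    """
    fixed_by = defaultdict(list)
    for sha, message in commits:
        for match in FIX_TAG.finditer(message):
            fixed_by[match.group("sha")].append(sha)
    return dict(fixed_by)
```

The full scan is the cost Takashi pointed to: the fix-to-bug link is cheap to record, but answering "has this commit since been fixed?" needs an index like this one, or metadata attached to the buggy commit after the fact, as notes would allow.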
Nicholas A. Bellinger
agreed with the danger that late-in-cycle fixes might reduce rather than
increase stability, and gave an example from iSCSI where he delayed
mainlining a fix for exactly this reason.
The fix required too large a change and too much manual testing to
justify addition to a late -rc release.
Steven
suggested git cherry-pick -x to place such commits into a
separate branch of the maintainer's main git tree, but
Linus
expressed a strong preference either for identical SHA-1 IDs in a
separate git tree or identical commit summary lines.
Linus also said that he would much rather see a given fix committed twice
by two maintainers than to have cross-maintainer dependencies, at least
assuming that it is a reasonably small and contained fix.
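The two ways of tying a duplicated commit back to its original can be sketched as follows. git cherry-pick -x appends a line of the form "(cherry picked from commit <sha>)" to the commit message, while matching on identical summary lines needs no extra metadata at all. The function names and the (sha, message) input shape are hypothetical.

```python
import re

# Line appended to the commit message by `git cherry-pick -x`.
CHERRY = re.compile(r"\(cherry picked from commit (?P<sha>[0-9a-f]{40})\)")


def upstream_sha(message):
    """Return the original commit ID recorded by cherry-pick -x, if any."""
    match = CHERRY.search(message)
    return match.group("sha") if match else None


def summary(message):
    """Return the first non-empty line of a commit message."""
    stripped = message.strip()
    return stripped.splitlines()[0].strip() if stripped else ""


def match_backports(mainline, branch):
    """Map each commit on `branch` to its mainline counterpart.

    Both arguments are lists of (sha, message). A recorded cherry-pick
    origin is tried first, then an identical summary line.
    """
    known = {sha for sha, _ in mainline}
    by_summary = {summary(msg): sha for sha, msg in mainline}
    matched = {}
    for sha, msg in branch:
        origin = upstream_sha(msg)
        if origin in known:
            matched[sha] = origin
        elif summary(msg) in by_summary:
            matched[sha] = by_summary[summary(msg)]
    return matched
```

Summary-line matching keeps working when the same fix is committed twice by two maintainers, which is the case Linus said he would prefer over cross-maintainer dependencies.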
Steven Rostedt suggested the following criteria for the -rc levels:
The discussion must have become too boring for James Bottomley, who suggested dispensing with the “Cc: stable” tags entirely in favor of having the maintainer be directly responsible for sending patches to -stable. Steven Rostedt suggested keeping these tags, but changing the workflow so that patches not be accepted into -stable without the approval of the relevant maintainer. Willy Tarreau disagreed with Steven, stating that the current process already involves maintainer review. Paul Gortmaker suggested that -stable trees for older kernel versions should be more strict than the N-1 stable trees, given the higher risks inherent in applying patches to older kernel versions. The ensuing discussion raised concerns about the scalability of the current process (along with some contention over what “scalability” even meant in this context), concerns about losing patches needed for -stable, risks of bad patches appearing to apply without errors, challenges of managing -stable trees for old kernel versions, and tutorials on how the various -stable tree maintainers manage the workflow.
Several -stable maintainers noted that they simply took whatever patches Greg KH took, which caused Steven Rostedt to raise concerns about Greg's mortality. (Steven also noted that because there are many -stable maintainers but only one Greg KH, any response to the scalability concerns should push work away from Greg and onto the -stable maintainers.) Greg replied that his workflow was thoroughly and publicly documented and scripted, and that his requests for specific help have rarely been answered, but that his joining the Linux Foundation now lets him focus on -stable as a part of his day job.
Greg KH listed the following two issues that he had seen in the thread:
H. Peter Anvin suggested, to general acclamation, that the kernel-summit discussion be about different -stable workflows and what the maintainers' options are rather than about a specific proposal. Those in agreement included James, who expects to look into different workflows based on this discussion. Peter also offered to present on the -tip tree workflow.
Finally, H. Peter Anvin called out the risk of getting too hung up on policy. Different fixes at different times in different subsystems may need to be handled in different ways.