By Jake Edge
August 31, 2022
A fairly lengthy discussion of whether there should be a way to break out of (or continue) more than one level of nested loops in Python recently took place in the Ideas category of the language's discussion forum. The idea is attractive, at least in an abstract sense—some other languages support jumping out of multiple loops at once—but it seems unlikely to go anywhere for Python. The barrier to new features is fairly high, for sure, but there is also a need for proponents to provide real-world examples that demonstrate their advantages. That, too, is a difficult bar to clear, as was seen in the discussion.
Idea
A user called "Python Millionaire" posted an example of some loops that they had written to process data about some basketball players; "I want to continue or break out of the nested loops because I am no longer interested in the player". They proposed adding an integer to break and continue statements to specify how many loops to operate on. For example:
    for player in all_players:
        for player_tables in all_tables:
            for version in player_tables:
                # things have gone wrong, need to break
                break 2

            this_is_not_reached = True

        this_line_is_called()
If Python ever gets this feature, though, the (un-Pythonic?) integer mechanism will surely not be part of it. It is terribly fragile when code gets shuffled around, for one thing. Also, as Bryan Van de Ven observed, it would be "a usability nightmare" because it would be difficult "to quickly locate the target of your fancy goto by means of a simple code grep".
This is not the first time the idea has come up; the feature was raised 15 years ago by Matt Chisholm in PEP 3136 ("Labeled break and continue"). The PEP was rejected by Guido van Rossum for a variety of reasons, including a worry that the feature would "be abused more than it will be used right, leading to a net decrease in code clarity". Peter Suter pointed to the PEP in the discussion, noting that Millionaire "would presumably need at least some very convincing examples that outweigh the reasons given in the rejection notice".
PEP 3136 offered several possibilities for the syntax of the feature, and did not choose one, which was another reason Van Rossum rejected it. But it seems clear that the labeled version is seen as the most viable path, even among those who are against adding the feature. A labeled break might look something like the following:
    for a in a_list as a_loop:
        for b in b_list as b_loop:
            if ...:
                break a_loop
That break would exit both loops immediately; a labeled continue would go to the next iteration of the named loop.
Millionaire thought that after 15 years it might be time to reconsider the idea. They lamented that the approaches suggested to work around the lack of multi-level break are "infinitely clumsier" and "anti-pythonic". Suter agreed with that to a certain extent, noting the first search result for "python multiple for loop break" is a Stack Overflow answer that is overly clever. Suter adapted it to the original example as follows:
    for sport in all_sports:  # "for sport" loop
        for player in all_players:
            for player_tables in all_tables:  # "for player_tables" loop
                for version in player_tables:
                    # things have gone wrong, go to next iteration of all_sports loop
                    break
                else:
                    continue
                break
            else:
                continue
            break
That uses the else clause for loops, which executes only when the loop runs to completion without hitting a break. So if the innermost loop runs to completion, the continue in the else will result in another iteration of the "for player_tables" loop. If the inner loop uses break, however, it will break twice more, all the way back to the "for sport" loop. As can be seen from that convoluted description, the construct is far from readable or maintainable.
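The for/else behavior is easier to follow in a two-level case. The following is a minimal, runnable sketch; the function name and the "first negative number" condition are made up for illustration and do not come from the discussion:

```python
def find_first_negative(rows):
    """Return (row_index, value) for the first negative number, or None."""
    found = None
    for i, row in enumerate(rows):
        for value in row:
            if value < 0:
                found = (i, value)
                break      # leaves the inner loop and skips its else clause
        else:
            continue       # inner loop ran to completion: try the next row
        break              # reached only after an inner break: leave the outer loop
    return found


print(find_first_negative([[1, 2], [3, -4, 5], [-6]]))
print(find_first_negative([[1, 2], [3]]))
```

Each additional loop level needs its own else/continue/break trio, which is how the three-level version above ends up so hard to read.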
Other ways
There are multiple ways to accomplish what Millionaire is trying to do, some of which were described in the discussion. Using flags is one obvious, if perhaps clunky, mechanism; another is to use exceptions, though that may not be much less clunky. Overall, though, several participants thought that the code itself should be refactored in some fashion. Chris Angelico thought that moving the search operation into its own function, which can return once the outcome is known, would simplify things. Steven D'Aprano agreed:
The obvious fix for that ugly code is to refactor into a function:

    def handle_inner_loops(sport):
        for player in all_players:
            for player_tables in all_tables:
                for version in player_tables:
                    if condition:
                        # things have gone wrong, bail out early.
                        return
                    block()

    for sport in all_sports:
        handle_inner_loops(sport)

The solution to "Python needs a way to jump out of a chunk of code" is usually to put the chunk of code into a function, then return out of it.
Millionaire thought that requiring refactoring into a function was less than ideal. It is also not possible to implement a multi-level continue that way. Beyond that, Millionaire pushed back on the notion that labeled break/continue was a better syntactic choice in all cases; offering the numeric option too would give the most flexibility. There was little or no support for keeping the numeric version, however.
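For comparison with the function refactoring, the flag and exception approaches mentioned earlier might look something like the sketch below. None of this code is from the thread; the names (StopSearch, first_bad_with_flag, first_bad_with_exception) and the "negative version" condition are hypothetical stand-ins for "things have gone wrong":

```python
class StopSearch(Exception):
    """Hypothetical exception used only to unwind the nested loops."""


def first_bad_with_flag(tables):
    """Flag version: every enclosing loop has to check the flag."""
    bad = None
    done = False
    for table in tables:
        for version in table:
            if version < 0:    # stand-in for "things have gone wrong"
                bad = version
                done = True
                break
        if done:
            break
    return bad


def first_bad_with_exception(tables):
    """Exception version: raise jumps past both loops at once."""
    bad = None
    try:
        for table in tables:
            for version in table:
                if version < 0:
                    bad = version
                    raise StopSearch
    except StopSearch:
        pass
    return bad
```

Both variants return the same result. The flag version grows one extra check per loop level, while the exception version keeps the jump in one place at the cost of a try block and a dedicated exception class.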
But the arguments given in support of the feature were generally fairly weak; they often used arbitrary, "made up" examples that demonstrated a place where multi-level break could be used, but were not particularly compelling. For example, "Gouvernathor" posted the following:
    for system in systems:
        for planet in system:
            for moon in planet.moons:
                if moon.has_no_titanium:
                    break 2 # I don't want to be in a system with a moon with no titanium
                if moon.has_atmosphere:
                    break # I don't want to be in the same planetary system
As D'Aprano pointed out, though, that is hardly realistic; "your example seems so artificial, and implausible, as to be useless as a use-case for multilevel break". He reformulated the example in two different ways, neither of which exactly duplicated the constraints of Gouvernathor's example, however. He also had some thoughts on what it would take to continue pursuing the feature:
To make this proposal convincing, we need a realistic example of an algorithm that uses it, and that example needs to be significantly more readable and maintainable than the refactorings into functions, or the use of try…except (also a localised goto).

If you intend to continue to push this idea, I strongly suggest you look at prior art: find languages which have added this capability, and see why they added it.
Angelico noted that he has used the Pike programming language, which does have a labeled break. He found that he had used the feature twice in all of the Pike code he has written. Neither of the uses was particularly compelling in his opinion; one was in a quick-and-dirty script and the other would need refactoring if he were still working on that project, he said. That was essentially all of the real-world code that appeared in the discussion.
Paul Moore suggested that needing a multi-level break may be evidence that the code needs to be reworked; "Think of it in terms of 'having to break out of multiple loops is a code smell, indicating that you should re-think your approach'." Though he questioned the value of doing so, he did offer up a recent example:
I don't think everyone piling in with their code samples is particularly helpful. The most recent example I had, though, was a "try to fetch a URL 10 times before giving up" loop, inside the body of a function. I wanted to break out if one of the tries returned a 304 Not Modified status. A double-break would have worked. But in reality, stopping and thinking for a moment and factoring out the inner loop into a fetch_url function was far better, named the operation in a way that was more readable, and made the outer loop shorter and hence more readable itself.
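Moore did not post his code, so the following is only a guess at the shape of the refactoring he describes. The fetch_url name comes from his message; everything else is an assumption made for illustration, including the injected get callable (a stand-in for whatever HTTP client is in use), the refresh function, and the treatment of a 200 status as success:

```python
def fetch_url(url, get, attempts=10):
    """Try get(url) up to `attempts` times.

    `get` is a hypothetical callable returning (status, body). Returns the
    body on a 200 response, or None on 304 or when every attempt fails.
    """
    for _ in range(attempts):
        status, body = get(url)
        if status == 304:      # Not Modified: nothing to do, stop retrying
            return None
        if status == 200:
            return body
    return None                # gave up after `attempts` tries


def refresh(urls, get):
    """Outer loop stays short: one call per URL, no nested break needed."""
    updated = {}
    for url in urls:
        body = fetch_url(url, get)
        if body is not None:
            updated[url] = body
    return updated
```

The early return on 304 plays the role that a double break would have played, and the retry logic now has a name that the outer loop can call.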
Workarounds?
Millionaire complained that all of the suggestions that had been made for ways to restructure the code were workarounds of various sorts; "They are all ways of getting around the problem, not actual solutions presented by the programming language, which should be the case." But Oscar Benjamin said that he could not "picture in my mind real maintainable code where labelled break is significantly better than a reorganisation". All he can see in his mind is the feature "being used to extend the kind of spaghetti code that I already wish people didn't write". There is, of course, an alternative: "if real life examples were provided then we could discuss the pros and cons in those cases without depending on my imagination".
Meanwhile, others in the discussion pushed back against the workaround complaint and also reiterated calls for real-world code. Millionaire returned to their earlier basketball example, with a beefed-up version that uses labeled break and continue. While Millionaire seemed to think it was a perfectly readable chunk of code that way, others were less impressed. Angelico questioned some of the logic, while Van de Ven thought it did not demonstrate quite what Millionaire was claiming:
A 50-line loop body and eight levels of indentation (assuming this is inside a function) and this is the good version? Having a multi-break didn't fix that. All the complexity in that code stems from trying to do ad-hoc relational querying with imperative code, at the same time as pre- and post-processing.
Van de Ven and Millionaire went back and forth a few times, with Millionaire insisting that Van de Ven's refactorings and other suggestions were not mindful of various constraints (which were never mentioned up front, of course). Van de Ven thought that the episode was an example of an XY problem, where someone asks about their solution rather than their problem, but he still persisted in trying to show Millionaire alternative ways to structure their code. There are, seemingly, several avenues that Millionaire could pursue to improve their code overall, while also avoiding the need for multi-level break—if they wished to. But that is apparently not a viable path for Millionaire.
The discussion was locked by David Lord shortly thereafter; it was clear that it had run its course.
The convoluted examples presented in the thread were not particularly helpful to the cause, in truth. Users who want to add a feature to Python should have an eye on compelling use cases from the outset, rather than generalized feelings that "this would be a nice addition" to the language. If, for example, code from the standard library had been shown, where a multi-level break would have significantly improved it, the resurrected feature idea might have gained more traction. There are lots of other huge, open Python code bases out there, as well; any of those might provide reasonable examples. So far, at least, no one has brought anything like that to the fore.
This is something of a recurring theme in discussions about ideas for new Python features. To those who are proposing the feature, it seems like an extremely useful, rather straightforward addition to the language, but the reception to the idea is much different than expected. Python developers need to cast a critical eye on any change to the language and part of that is to determine whether the benefit outweighs the substantial costs of adopting it. That is not going to change, so it makes sense for those who are looking to add features to Python to marshal their arguments—examples—well.