In the top division of sumo, wrestlers participate in 15-day tournaments in which each wrestler fights one bout a day and cannot face the same opponent more than once in a tournament. With 42 top-division wrestlers, there are a staggering number of possible tournaments, so I became curious about finding tournaments fitting certain optimal criteria. I could not figure out analytic solutions to some of these questions, so I wrote this tool to encode sumo tournament scheduling as an integer linear programming problem to generate some amusing tournaments, as well as to have an excuse to play around with an ILP tool.
This specifically started because I wondered how a sumo round-robin tournament can be scheduled
while still abiding by the "each wrestler fights once a day, every day" rule.
In more abstract terms you can phrase this problem as,
"If you have
In many cases, it is easy to verify that the values the solver came up with are optimal but not quite as easy to come up with schemes that produce schedules with the optimum values. In a couple of the cases, I have figured out how to produce the appopriate schedules, but if you have analytic solutions for the others (e.g., the 39 winners/losers case), I would be very curious to hear about it. Additionally, if you are an ILP expert and you can tell me how to improve these encodings, I would love to learn.
Developed using Python 3.9. I did not consciously make use of 3.9's features, but I would bet on needing 3.7+ at least.
Also requires Python-MIP, which I chose because it was very easy to install: pip install mip
.
I only used the default CBC solver; I do not know if I might get better results on my machine using Gurobi or CPlex.
Specs on the machine I used (a slightly dated desktop): Intel i7-4790, 16 GB of RAM.
You can run python sumo_query.py -h
to see a description of the commands and options.
In the examples below, I give the commands I used. The default settings correspond to the top division (Makuuchi).
You can generate tournaments for the second division (Juryo) by setting N
to 28.
I made the encoding general enough to handle the divisions below Juryo,
where wrestlers fight 7 times per tournament and so don't fight every day,
but the real sizes of the lower divisions seem to be very hard on the solver so I did not explore these extensively.
For example, I generated this schedule for division 3, Makushita,
with the following query: python sumo_query.py --N 120 --LB 20 --UB 30 --M 7 --time 3600 generate
,
taking 1414 seconds, compared to under a minute for a top-division schedule.
(I did not include wrestler names and ranks because I was too lazy to manually compose a list or scrape one; my apologies.)
For cosmetic purposes, I included files encoding wrestler names and ranks for Makuuchi and Juryo here per the November 2020 ranks. If you specify a names file, the schedules produced will include the wrestler names and indicate their correct side. I transcribed the list manually, so I apologize if there are any typos or mistakes.
In real sumo tournaments, each day's matches are decided by a committee of coaches and are typically announced during the previous day's bouts. The scheduling committee is interested in putting on a good show for the audience, in making the race for the championship exciting, and following certain customs like having the yokozuna and ozeki face low-ranked opponents in the first week and each other in the second week. This tool is intended to model possible tournaments rather than likely ones, so for now I've at least not made any attempt to formalize the unwritten (and often-changing) criteria used to make real scheduling decisions.
Additionally, tournaments are often complicated by wrestlers dropping out before the tournament starts or during the tournament due to injuries, resulting in fewer bouts on some days and opponents from the lower divisions being brought up to fight in the top division. For simplicity, this tool elides some of these complexities, though it might be feasible to eventually support more of these.
- We assume that all wrestlers participate on all days of the tournament, not requiring any "visitors" from the lower division.
- We enforce no constraints about which matchups happen when, other than a "koreyori sanyaku" option I included (the highest-ranked fighters will only meet in the last bouts of the last day) and an option to provide a file specifying which matchups are not allowed due to wrestlers being in the same stable. (Conflict files for Makuuchi and Juryo are given here. I manually put it together based on looking at wrestlers' stables so I apologize for any transcription mistakes.)
We also do not model the results of playoffs, partly because the rules about playoffs between more than two wrestlers get very complicated.
Suppose there are
We can model these fights using a set of integer variables
Each wrestler fights at most once a day (if
Each wrestler fights a total of
Any two opponents can face each other at most once in the same tournament, which we can encode with constraints of the form
For scheduling purposes, we also enforce that each day of the tournament have at least
Given the above, let us define binary variables
For
To model the scores over the course of the tournament, let us define integer variables
Several of the below queries are concerned about conditions related to the champion. In sumo, a wrestler is the champion if he has the most wins after the final day's bouts. If more than one wrestler is tied, then there is a tiebreaker playoff between them; we will not model the results of playoffs for now.
To specify that a given wrestler mip
permits only inclusive constraints).
While most championships are decided on the final day and some go to a playoff, sometimes a sumo championship is mathematically secure before the final day. The championship is mathematically secure up to a tie on day
The best possible score wrestler
A few queries examine how we can maximize or minimize the number of wrestlers with a specific score (a special case of which is modeling the largest ties, which simply requires fixing the champion's score too). We can give this as an optimization objective to the solver by defining variables that encode whether a value is strictly less than a constant.
Based on this Stack Exchange question,
we can use this encoding to define a binary variable
To maximize the number of wrestlers at least a specific score
To optimize for the largest tie, we specify that the champion have a particular score and maximize the number of wrestlers with a score greater than or equal to that (since they are already constrained to have less than or equal to the champion's score, this will maximize the number exactly equal to the champion's score).
Similarly, we can optimize for the number of wrestlers with exactly a specific score
As mentioned in the introduction, I was curious as to how one would schedule a round-robin tournament while still meeting the requirements that each wrestler fight once a day and never face the same opponent more than once. As there are 42 top-division wrestlers, each wrestler has 41 possible opponents, so such a round-robin tournament must be 41 days long (what a gauntlet).
I used the following query (omitting wins and losses because they're not necessary for the question and make the encoding larger):
python sumo_query.py -k 0 -D 41 -M 41 --time 1200 --names names_files/makuuchi_11_2020.json generate
. I omitted the conflicts file because this would prevent certain matchups and it would likely be impossible to schedule it while still meeting the "each wrestler fights exactly once each day" requirement in the top division (without visitors from the second division).
I include the resulting schedule here, which took 567 seconds to solve. You would be much better off using a normal round-robin scheduling program.
Fun fact: Setting -k
to 3 (constraining the final 3 fights on the final day to be between the top-rankers)
caused the solver to sputter. It did not find a solution within an hour and I was not willing to wait.
If I recall correctly, the lowest winning score in any real top-division tournament has been 11-4, with Harumafuji being the last to achieve it in his final tournament in September 2017. As we see below, the lowest possible winning score is 8-7, with a playoff, but I was also curious what the lowest winning score without a playoff would be.
While there are very many hard fighters in the current top division, I think Takayasu should have the honor of being the hypothetical champion in this situation, since he has not yet (as of Nov. 2020) won a championship despite having been an ozeki and having been second place many times.
Query: python sumo_query.py --names names_files/makuuchi_11_2020.json --conflicts conflicts_files/makuuchi_11_2020.json champ --idx 8 --score 9 --no-ties
I include the resulting schedule here, which took 105 seconds to solve. Amusingly it includes a perfect opening week from Hakuho.
We can conclude that at least one wrestler must have a winning score, since if every wrestler had 7 or fewer wins
at the end of a 15-day tournament with 21 bouts a day, the total number of wins would be at most
I used the following query to try to optimize for the biggest tie with an 8-7 championship score: python sumo_query.py --names names_files/makuuchi_11_2020.json --conflicts conflicts_files/makuuchi_11_2020.json --time 1200 champ --score 8 --max-tie
I include the resulting schedule here, which took 870 seconds to solve.
All but 3 wrestlers ended up tied for the lead. I imagine the rankings committee would have an easy time later: Keep almost everyone in their current ranks, except for the unlucky saps (one's an ozeki and so would only be kadoban). The PR officials might have a harder time.
Since 8-7 is the smallest possible winning score, I was similarly curious to see what would be the smallest possible playoff for it
rather than the largest possible. In retrospect, the solver result suggests how one can construct such a tournament: Divide the wrestlers into two groups, which we'll call
I used the following query to try to optimize for the smallest tie with an 8-7 championship: python sumo_query.py --names names_files/makuuchi_11_2020.json --conflicts conflicts_files/makuuchi_11_2020.json --time 1200 champ --score 8 --min-tie
I include the resulting schedule here, which took 246 seconds to solve. We can also verify that this is the smallest possible tie with an 8-7 championship: Suppose it were possible to produce a tie of
Championships with perfect scores are very uncommon and I do not know if there has ever been a playoff between 15-0 wrestlers (I didn't check), so I was curious to see how many wrestlers could simultaneously attain a perfect score over 15 days. I felt silly when the solver finished, because this one isn't hard to work out on paper: Pick half the division to win all their bouts, and other half to lose all their bouts, and bounce all the losers between the winners for the 15 days, with the winners never facing off until the monster playoff.
I used the following query to try to optimize for the biggest tie with 15-0 championship score: python sumo_query.py --names names_files/makuuchi_11_2020.json --conflicts conflicts_files/makuuchi_11_2020.json --time 1200 champ --score 15 --max-tie
I include the resulting schedule here, which took 921 seconds to solve. A larger tie is not possible, as the 21 wrestlers' 15 wins each represent all the bout victories in the entire tournament.
The bloggers at Tachiai.org like to use the term "Darwin bout" to refer to bouts between
wresters who both have a 7-7 record on the final day, since an 8-7 final score means a promotion while a 7-8 score
means a demotion. I was too lazy to specifically maximize final day faceoffs between 7-7 wrestlers,
but I did optimize for having the most wrestlers at 7-7 after day 14, whose fates would be determined by their final-day performance.
As it happens, every bout on this day 15 is a Darwin bout. (The scheme described above with an A
group and a B
group will produce such a schedule, I only realized later.)
Query: python sumo_query.py --names names_files/makuuchi_11_2020.json --conflicts conflicts_files/makuuchi_11_2020.json --time 600 opt-score --max --lower-score 7 --upper-score 7 --day 13
I include the schedule here, which took 366 seconds of solver time. (The result is largely the same as the "smallest tie with an 8-7 championship" tournament, which I did not inspect very closely, though the solver time required was substantially different.)
This query is slightly semantically different from seeking the maximal tie with an 8-7 championship, but the result was largely the same.
Query: python sumo_query.py --names names_files/makuuchi_11_2020.json --conflicts conflicts_files/makuuchi_11_2020.json --time 1200 opt-score --max --lower-score 8 --upper-score 15 --day 14
I include the schedule here, which took over 2000 seconds of solver time (yes, it didn't follow the time limit I set).
The solver warns it cannot guarantee that this is optimal. However, we can verify that it is not possible for there to be more than 39 winning scores. Suppose there were at least 40 winning scores. That represents at least 320 wins between the wrestlers with those winning scores. The total number of bouts in a 15-day, 21-match-per-day tournament is 315, so this is impossible.
I wonder what sort of reception this tournament would have. Incidentally, one can produce a schedule with 39 losing scores by taking the one with 39 winning scores above and reversing the outcome of every single match.
Query: python sumo_query.py --names names_files/makuuchi_11_2020.json --conflicts conflicts_files/makuuchi_11_2020.json --time 600 opt-score --max --lower-score 0 --upper-score 7 --day 14
I include the schedule here, which took 432 seconds of solver time. We can verify that 39 losing scores is optimal by similar reasoning to the above. Suppose there were 40 losing scores. Those 40 wrestlers collectively lost at least 320 bouts, but there are 315 bouts in a tournament.
I suppose a very early mathematically secure championship would be somewhat boring since the tension would be removed from much of the tournament, but a lot of people would have to be losing matches they would be expected to win to make that possible, which would surely also make for an exciting tournament.
I gave the hypothetical honor of this championship to Terunofuji, whose comeback to the top division has been utterly miraculous.
Query: python sumo_query.py --names names_files/makuuchi_11_2020.json --conflicts conflicts_files/makuuchi_11_2020.json --time 600 champ --score 15 --secure 9 --idx 7
I include the schedule here, which took 42 seconds of solver time. Note that the constraints only required that the championship be secure up to a tie on day 10: they did not require a tie to actually happen.
Who but Hakuho could have this one?
Query: python sumo_query.py --names names_files/makuuchi_11_2020.json --conflicts conflicts_files/makuuchi_11_2020.json --time 600 champ --score 15 --secure 10 --no-ties
I include the schedule here, which took 83 seconds of solver time.
I used the following query: python sumo_query.py -k 0 --time 600 champ --score 15 --secure 8
and, after about 180 seconds, the solver concluded that the linear reduction of the integer linear program
was infeasible. So day 10 seems to be the earliest that the championship can be mathematically secure up to a tie.
We can verify this with the following reasoning: Suppose some wrestler
I used the following query: python sumo_query.py -k 0 --time 600 champ --score 15 --secure 9 --no-ties
.
After only about 14 seconds, the solver concluded that the linear reduction was infeasible, so day 11
seems to be the earliest that the championship can be secured outright.
We can use similar reasoning to the above: Suppose some wrestler