Top Modern Deck Win Percentages

Back in 2015, a few intrepid MTGO data-miners were able to bring us the win percentages for top Modern decks. They achieved this through programs that interpreted visual and textual clues from MTGO replay screens to identify decks and their records in matches. Different authors and analysts then aggregated all those results. In the end, this allowed Modern content creators to calculate the gold standard of Magic data analysis: Match Win Percentage (MWP), in two forms:

  1. Overall Match Win Percentage against all decks in the field.
  2. Matchup-specific Match Win Percentage in particular Deck A vs. Deck B matchups.

From what I remember, the two most prominent authors who published these results were Frank Karsten of ChannelFireball and SaffronOlive of MTGGoldfish. Karsten’s “Magic Math – The New Modern by the Numbers” is still up to this day, and you can see what some of those old MTGO data-scraping techniques were able to calculate. These numbers busted some myths (e.g. the allegedly unfavorable Twin vs. Jund matchup was actually 50/50) and supported some widely-held beliefs (e.g. Tron really was a 44/56 dog to Twin). Unfortunately, Wizards later requested that content creators remove some of this data. For example, SaffronOlive’s “28K Games of Modern Analyzed” article itself is still live, but many of its associated data tables were deleted long ago.

Wizards ultimately changed the MTGO client to stop these data-scraping analysis techniques, primarily by preventing replay viewing for games that a player did not participate in. Between this technical change and deliberate throttling of League results, it has been challenging to compile accurate data on any Magic metagames, let alone the MWP gold standard.

Thankfully, other Magic community members stepped up to carry the data torch. The most recent data heroes have been Reddit users mindspank and Erzel_. Both users solicited input from the broader Reddit, Twitter, and online communities to determine what players brought to various paper GP events. Both users published their work in well-received Reddit posts: mindspank for GP Vegas and Erzel_ for a triple GP dump plus GP Prague alone. Erzel’s recent infographics are particularly helpful in providing both a Modern metagame picture and an MWP analysis for many top decks.

Both users did so much work on their respective datasets that I didn’t have much to add. I did notice, however, that Erzel’s recent analysis did not aggregate MWPs from the previous GPs he had collected: GP Vegas, GP Barcelona, and GP Sao Paulo. This gave me a unique opportunity to combine all the data into one concise picture of top Modern deck MWPs across multiple events.

The Data

Raw data for this post comes from Erzel’s GP analysis Google Sheets.

In summary, Erzel_ creates a survey that respondents can self-submit to identify their GP decks and the decks of their GP opponents. Erzel then aggregates those survey responses, tabulates them, and cross-references the findings with the GP pairings and results pages on the Wizards website. This allows him to determine which decks won which matches, even for matches that respondents didn’t report, and from there to calculate overall MWPs and a metagame picture. His response rate for the surveys hovers around 50%, which gives us a satisfying picture of who is playing what. We’ll talk about some limitations of this data collection method later.

I cannot emphasize enough how much hard work Erzel_ must have put into building, cleaning, and aggregating this data. Another huge shoutout to him for that labor. I encourage everyone to contribute to his surveys/projects as much as they can; I would never have the time to do these initial steps, and this post is only possible because of the gathering and cleaning work he did up front.

Reviewing Erzel’s data and analyses, I simply combined all the GP results (Vegas, Barcelona, Sao Paulo, Prague) to get some aggregated overall MWPs and matchup MWPs for top Modern decks across the tournaments. I didn’t do additional analysis because, again, his work was already so comprehensive.

Overall MWP for Top Modern Decks

The list below shows the MWP for all Modern decks with more than 200 recorded matches in the combined GP Prague/Triple GP samples. This represents their overall win rate across matches against all known opponents; matches against unknown opponents are excluded. So if UW Control beat Infect 2-1, that match counts towards the total, but if UW Control beat an unknown deck 2-1 or lost to another unknown deck 0-2, those matches would not count. See the “Data Limitations” section for how this can create some issues, but it’s overall an effective and transparent approach.

The numbers in parentheses represent total recorded known match-wins vs. total known matches (e.g. we saw KCI win 272 of its observed 469 matches).

  1. KCI: 58% (272/469)
  2. Counters Company: 54.8% (181/330)
  3. UW Control: 54.5% (362/664)
  4. Humans: 52.1% (551/1058)
  5. Gx Tron: 51.9% (402/775)
  6. Bogles: 50.7% (113/223)
  7. Burn: 50.6% (383/757)
  8. Hollow One: 50.5% (222/440)
  9. Infect: 50.2% (155/309)
  10. Grixis Death’s Shadow: 50% (182/364)
  11. Storm: 50% (148/296)
  12. Jeskai Control: 47.8% (314/657)
  13. Mardu Pyromancer: 45.6% (259/568)
  14. Titanshift: 43.8% (144/329)
  15. Affinity: 43.8% (224/512)
  16. Jund: 43.6% (158/362)

Notable omissions from the list include Bant Spirits, UW Spirits, Hardened Scales, and Bridgevine. These decks all had strong GP Prague performances but were unknown or underplayed at GP Vegas, Barcelona, and Sao Paulo for a combination of reasons (card legality, under-appreciated power level, etc.). If you look at GP Prague performance alone, here are their MWPs for reference (just remember that these extra MWP figures come from a single GP, not from the four-GP pool):

  • UW Spirits: 56.3% (40/71)
  • Hardened Scales: 54.9% (89/162)
  • Bant Spirits: 53.3% (49/92)
  • Bridgevine: 52.9% (72/136)

Given these numbers, here are some high-level takeaways. Note that the caveats described in the “Data Limitations” section apply to every takeaway and analysis below:

  • The majority of top-played Modern decks have MWPs at 50%+: Including the four GP Prague-only decks (UW Spirits, etc.), there are 15 top Modern decks with 50%+ MWP. This suggests that you have a lot of options for tournament-viable strategies.
  • KCI presents as a really good Modern deck: KCI has won multiple GPs, it placed in the Top 4 of a Team PT, and it now posts the highest MWP of any widely-played Modern deck. This deck is exactly as good as people give it credit for; maybe better!
  • Midrange may be struggling relative to other decks: Mardu and Jund have MWPs under 50%, and GDS sits right at the 50% edge. This isn’t necessarily a problem, as these decks are often touted as 50/50 decks in the format. But with most other top decks over that 50% mark, it could suggest a deeper weakness.
  • Counters Company might be a sleeper hit: Many of the top 51%+ decks are widely-appreciated format pillars that appear in tournament after tournament. Counters Company, however, gets far less press than those decks. Its MWP trails only KCI’s, which hints the deck might be better than its prevalence suggests.

Feel free to post any other takeaways in the comments section or other related discussions. I’m curious to hear what you find.

Matchup-specific MWPs between Top Modern Decks

Next we turn to the matchup matrix. This only considers the 16 decks with combined match samples greater than 200 observed matches. The table below (click for a larger image) shows the MWP of the left/Y-axis deck against the top/X-axis deck. So Affinity appears to have a 38.2% MWP against Bogles, Humans is 45.2% against Gx Tron, etc.

To adjust for smaller samples, the MWPs represent a weighted average of the observed MWP and the expected MWP between two decks. I calculated expected MWP using one of my favorite sports formulas, Bill James’ Log5 probability estimate. This formula calculates the expected win percentage between an entity with MWP of P(a) and an entity with MWP of P(b). Like every statistical method, Log5 has limitations, but it works very well to stabilize a small N in samples like ours, where the true MWP is unknown. It helps us account for the small N in a sample like Infect vs. Jund (only 7/10 in Jund’s favor in observed matches) by blending in an expected MWP based on Infect’s and Jund’s individual overall MWPs. From what I can see, this is also a different calculation method than Erzel’s weighted average in his own charts/sheets.

(09/05/2018 edit to address a relevant criticism.) The other reason I am using Log5 is that many Modern matchups have historically gravitated to the MWP mean. That is, when two decks battle, the matchup is often closer to 50-50 than we initially believe, even if it is ultimately unfavorable for one side. We saw this in the 2015 MTGO dataset in Karsten’s earlier article. In that large-N dataset, numerous matchups clustered far closer to the 50-50 range than we initially believed they would, including allegedly bad/good ones. Examples included Twin vs. Jund (51/49), Twin vs. Tron (56/44), Delver vs. Twin (52/48), Burn vs. Tron (54/46), etc. There were very few matchups outside of the 45/55 bracket, and only 4 of the 42 recorded matchups were sub-40/60. This suggests a gravitation towards an MWP equilibrium point, which is exactly the pull Log5 induces. That historical precedent gives me some confidence a Log5 adjustment will reasonably normalize most small-N matchups toward that mean.
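To make the adjustment concrete, here is a minimal Python sketch of the Log5 estimate plus one possible weighted blend with the observed matchup record. The `log5` and `blended_mwp` names and the pseudo-count weight `k` are illustrative assumptions of mine, not the exact weighting used for the matrix below.

```python
def log5(p_a: float, p_b: float) -> float:
    """Bill James' Log5: expected win rate of deck A against deck B,
    given each deck's overall match win percentage."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

def blended_mwp(wins: int, matches: int, p_a: float, p_b: float, k: int = 20) -> float:
    """Shrink a small observed sample toward the Log5 expectation.
    k acts as a pseudo-count: the expected MWP is weighted as if it
    were backed by k extra matches (k=20 is an arbitrary choice here)."""
    expected = log5(p_a, p_b)
    return (wins + k * expected) / (matches + k)

# Infect vs. Jund: only 3 observed Infect wins in 10 matches, but both
# decks sit near 50% overall, so the blend pulls the estimate upward.
print(round(blended_mwp(3, 10, 0.502, 0.436), 3))
```

With two true 50% decks, Log5 returns exactly 50%, so the blend simply shrinks any lopsided small sample back toward the midpoint.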

Here’s the table (again, click for a larger image).

2018 MWP Matrix

And here’s an embedded Google Sheet (also linked):

There’s so much information in this matrix that I struggle to come up with a succinct list of conclusions. That said, here are some standouts for me:

  • If you want to beat Humans, play Gx Tron (54.8% MWP) or KCI (56.7% MWP).
  • If you want to beat KCI, play Grixis DS (57.1%), Infect (57.4%), or Storm (54.3%).
  • If you want a deck that is basically 50/50+ against everything, play UW Control (only sub-50% matchups are Jeskai at 47.8% and Storm at 43%).
  • I’d avoid Titanshift. Only three of its matchups are over 50%, and it has the most matchups in the 30%-and-lower range (6 total).
  • The most polarizing matchup is KCI vs. Titanshift, which is 78% in KCI’s favor.
  • Humans vs. UW Control is effectively a 50/50 matchup.

Even with our adjustment, sample size is still an issue for most matchups. If we developed N% (90%, 95%, etc.) confidence intervals around those MWP calculations (i.e. bounds within which we know with N% certainty the true MWP lies), we’d find that the true MWP could be anywhere from +/-5% to +/-10% around the recorded MWP. So UW Control vs. Humans might not be 50.9%/49.1%; it could actually be anywhere from 45/55 to 55/45. For a lower-N sample, such as Grixis DS vs. Infect, the gap is even larger: it could be anywhere from 27/73 to 47/53. This is why we will need a larger N to really sharpen our MWP picture.
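For readers who want to reproduce interval ranges like these, here is a small Python sketch using the Wilson score interval for a binomial proportion, a standard choice for small N (the exact bounds quoted above may have come from a different interval method).

```python
import math

def wilson_ci(wins: int, matches: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a match win percentage
    (z=1.96 gives an approximate 95% confidence interval)."""
    p = wins / matches
    denom = 1 + z**2 / matches
    center = (p + z**2 / (2 * matches)) / denom
    half = z * math.sqrt(p * (1 - p) / matches + z**2 / (4 * matches**2)) / denom
    return center - half, center + half

# A 50% MWP over 100 matches is only pinned down to roughly 40-60%.
lo, hi = wilson_ci(50, 100)
print(f"{lo:.3f} - {hi:.3f}")  # ~0.404 - 0.596
```

Note that even a 469-match sample like KCI’s still leaves a band of several points around the 58% estimate.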

Data Limitations

As with all analysis projects, both the data collection and the analysis methods come with limitations. In my experience with statistical work, both professional and personal, it’s very easy to find limitations and objections in any posted analysis. It takes much more time and energy to acknowledge known limitations and adjust for them. With that in mind, here’s a list of limitations readers will want to consider when interpreting this post. I also suggest some possible adjustments for those interested in taking the analysis a step further.

  1. Reporting bias in the surveys: If I bomb out of a GP at 1-4 on Day 1, I am less likely to report my embarrassing performance than someone who goes 11-4 all the way to Round 15. This means the sample is probably skewed positive towards the players who are reporting their own finishes. To adjust for this, we could find the overall MWP of all players at the GP and then the overall MWP of all players who filled out the survey (they provide their names). The difference in MWP could be used as a coefficient to adjust our numbers.
  2. Small sample size: Some matchup samples are based on 50+ matches; some are based on just 10, even across all four GPs. No single matchup had more than 100 observations in the entire pooled dataset. Even with the Log5 adjustment, this means our numbers could be skewed too favorably or unfavorably. The best adjustment is adding more observations; confidence intervals also help.
  3. Limited event relevance: GP dynamics, players, and deck choices might shift our MWP away from what they would be on MTGO or even at a local/regional event. For example, a deck like KCI or Counters Company is much harder to play on MTGO than in paper due to loop rules. This means that MTGO players may find these GP-generated numbers less useful for their Magic medium.
  4. Log5 adjustment sensitivity: The Log5 formula assumes its inputs are “true” MWPs (i.e. rates from very large samples). Our MWPs aren’t actually “true” MWPs; they are just MWPs calculated from larger samples. This means our weighted averages of the observed MWP and the expected Log5 MWP could place undue weight on the expected MWP when perhaps we should just trust the observed rate. You can adjust for this by changing the weighting factors depending on how much or little you trust the observed MWP.
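As a concrete illustration of the first limitation’s proposed fix, here is a hypothetical Python sketch; the additive-shift approach, the function name, and the example numbers are all my own assumptions rather than values computed from the actual GP data.

```python
def bias_adjusted_mwp(deck_mwp: float, responders_mwp: float,
                      field_mwp: float = 0.5) -> float:
    """Shift a survey-derived deck MWP by the gap between survey
    responders' overall MWP and the full field's overall MWP.
    In a Swiss event the full field wins roughly 50% of matches
    by construction, so 0.5 is a reasonable default."""
    bias = responders_mwp - field_mwp
    return deck_mwp - bias

# If survey responders as a group won 53% of their matches, every
# survey-derived MWP gets shifted down by those 3 points.
print(round(bias_adjusted_mwp(0.58, 0.53), 2))  # 0.55
```

A multiplicative coefficient would work similarly; the key input either way is the responders’ overall MWP, which the survey names make computable.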

I am confident there are other limitations, just as I am confident we can come up with possible adjustments for any limitation we encounter. Feel free to post other suggestions in the comments.

More Analysis to Come!

I’m excited to see more Reddit work done on GP/Modern data collection. Hopefully this further builds the MWP/matchup datasets. I’m also going to keep tinkering with ways to apply those adjustments mentioned in the limitation section to sharpen our Modern understanding even further.

My next step is likely to add MTGO Challenge/PTQ/MOCS, and potentially even SCG, results into the matchup matrix. This should increase N dramatically, but I foresee some issues. For one, I’m not certain it’s justifiable to cross mediums like this; MTGO and paper are two different beasts, and SCG events also tend to see different fields than GPs. Second, I would need to confine my analysis to feature matches and Top 8s, which naturally skews the field towards better, or at least more prominent, players. All of this could be worth it for a larger N, but we need to be careful not to widen our net too much.

Let me know if you have any questions, comments, or criticisms and I look forward to seeing you all in our next ModernMetrics data dive.

Published by

ktkenshinx

Magic! Modern! Stats!
