Tireless Tracker

Mapping the Magic Landscape

July 5th, 2022 — Jett Crowdis

Communicating the diversity within a Magic format is a difficult task. Nowhere is this more true than in Commander and Cube, two formats with unrestricted card pools that are unparalleled in their capacity for self-expression.

By partnering with EDHREC and Cube Cobra, the largest databases for their respective formats, I had the incredible opportunity to assess this diversity. I set out to visualize the over one million Commander decklists from EDHREC and the 30,000 cubes from Cube Cobra.

The results are the Commander Map and the Cube Map, interactive atlases that visualize the full variety of these formats. This article will explain the math behind these maps, the limitations of our approach, and our plans for them in the future!

The Framework
The Data

The fundamental unit of data from both EDHREC and Cube Cobra is simply a list of cards. In the case of a Commander deck, this list is 100 cards, but in the case of a cube, the list can be any size. Because the Commander and Cube Maps both analyze this type of data, the mathematical approach to building them is identical.

To evaluate diversity, we need a foundation to compare two different lists (i.e. two different Commander decks or cubes to each other). The most intuitive framework is to simply count the number of each card present within a list. As an example, consider four simple lists: one contains only Lightning Bolt, one contains only Ponder, one contains both, and one contains no cards at all. In this framework, Lightning Bolt and Ponder each represent a spatial dimension, and we can plot each list as a data point:

[Figure: the four example lists plotted as points on two axes, "Lightning Bolt" and "Ponder".]

Lists with two or fewer cards are represented as corners on a square, like above. Lists with three or fewer cards can be represented as vertices of a geometrical cube. Lists with four or more…

See the problem?

There is no way for humans to directly visualize four or more dimensions. Even worse, there are over 20,000 unique cards printed in Magic’s history, each of which is a dimension in this framework. If we can’t visualize four dimensions, how can we hope to visualize more than 20,000?

Dimensionality reduction is the solution.

Dimensionality Reduction

The goal of dimensionality reduction, or “projection”, is straightforward: find the best way to represent multi-dimensional data with just two dimensions1. A person’s shadow, for example, is a 2D projection of their 3D body. Projection usually loses information, as two dimensions can rarely retain all the information stored in higher dimensions. A shadow might show how tall and wide someone is, but it cannot communicate how “thick” they are2. The goal of dimensionality reduction is to find dimensions that best preserve higher dimensional relationships. We want points that are similar (close together) in high dimensions to also be close together in our low-dimensional representation.

The key is that some dimensions are more important than others. Take the following data as an example:

These data are two-dimensional, but imagine that we want to reduce their dimensionality and “project” them into just one dimension. Can you spot the single dimension (a line) that best explains the data? Click the toggle below the graph to find out.

The best dimension is a mix of our two original dimensions! It explains 98% of the information present in the data3. Dimensionality reduction works by combining our original dimensions into one or two dimensions that best explain the data. It leverages relationships between the original dimensions to do this. In the above data, dimension 1 is linearly correlated with dimension 2, so a single line that “combines” them preserves the most information.

The logic for card lists is much the same—we might, for example, recognize that Splinter Twin is very often played with Pestermite and Deceiver Exarch. Rather than consider each card a single dimension, we could declare a single dimension that represents the presence (or absence) of all three. This reduces the number of dimensions but still preserves information.
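To make the idea of combining dimensions concrete, here is a tiny, purely illustrative sketch in Python. The "package" framing is mine; PCA, described below, finds its combinations automatically.

```python
# Collapse three card dimensions into one: the fraction of the "Twin package"
# a list plays. This is a linear combination of the three original
# presence/absence dimensions, each weighted by 1/3.
twin_package = {"Splinter Twin", "Pestermite", "Deceiver Exarch"}

def twin_dimension(card_list):
    return len(twin_package & set(card_list)) / len(twin_package)

print(twin_dimension(["Splinter Twin", "Pestermite", "Lightning Bolt"]))  # ~0.67
```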

For a more complete example, let’s again take some example card lists:

List 1

Swords to Plowshares
Mother of Runes
Oblivion Ring
Wrath of God
Birds of Paradise
Sylvan Library
Cultivate
Eternal Witness
Faithless Looting
Lightning Bolt
Abrade
Chaos Warp
Dark Ritual
Demonic Tutor
Toxic Deluge
Ravenous Chupacabra

List 2

Swords to Plowshares
Mother of Runes
Oblivion Ring
Wrath of God
Brainstorm
Ponder
Counterspell
Dig Through Time
Faithless Looting
Lightning Bolt
Abrade
Chaos Warp
Dark Ritual
Demonic Tutor
Toxic Deluge
Ravenous Chupacabra

List 3

Brainstorm
Ponder
Counterspell
Dig Through Time
Birds of Paradise
Sylvan Library
Cultivate
Eternal Witness
Faithless Looting
Lightning Bolt
Abrade
Chaos Warp
Dark Ritual
Demonic Tutor
Toxic Deluge
Ravenous Chupacabra

There are 20 unique cards among these lists, so these three data points (lists) each have 20 dimensions. Since each dimension represents an individual card, combinations of dimensions will represent the weighted presence or absence of multiple cards. Can you find the two dimensions that best explain the differences between these lists?

You’ll notice that each list has the same red and black cards but contains cards from only two of the three remaining colors (green, blue, or white). Thus our dataset, despite having 20 original dimensions (cards), actually has only two dimensions related to the presence or absence of green, blue, and white cards.

These dimensions explain 100% of the information in these data4. The goal of dimensionality reduction in the context of lists is to find cards that are present in some lists and absent in others. These are the cards that best explain the differences between lists and therefore preserve the most information.
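To make this concrete, here is a minimal sketch (using scikit-learn and abbreviated card names) that encodes the three example lists as presence/absence vectors and confirms that two principal components capture all of their variance. It is purely illustrative and not the actual map pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA

# The 20 unique cards across the three example lists, grouped by color
# (names abbreviated).
white = ["Swords", "Mother of Runes", "O-Ring", "Wrath"]
green = ["Birds", "Sylvan Library", "Cultivate", "Eternal Witness"]
blue = ["Brainstorm", "Ponder", "Counterspell", "Dig Through Time"]
red = ["Faithless Looting", "Lightning Bolt", "Abrade", "Chaos Warp"]
black = ["Dark Ritual", "Demonic Tutor", "Toxic Deluge", "Chupacabra"]
cards = white + green + blue + red + black

lists = [
    set(white + green + red + black),  # List 1
    set(white + blue + red + black),   # List 2
    set(green + blue + red + black),   # List 3
]

# One row per list, one column per card: 1 if the card is present, 0 if absent.
X = np.array([[1 if card in lst else 0 for card in cards] for lst in lists])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_.sum())  # ~1.0: two dimensions explain everything
```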

Unfortunately, the real-world datasets of Cube or Commander databases are messier, containing as many as one million lists in the case of EDHREC’s data. Reducing the dimensionality of these data cannot be done by eye—the lists are too varied and the dataset too large.

Principal Component Analysis

Luckily, there are dimensionality reduction algorithms to automate the process, each with strengths and use cases. I will explain the first approach I tried: principal component analysis (PCA). One of the most popular and intuitive dimensionality reduction algorithms, it is in fact the exact approach outlined with the toy examples above.

PCA algorithmically finds the best linear dimensions (or “principal components”), which means that the reduced dimensions it finds are linear combinations of the original dimensions. In the example of the data on a line, we went from two dimensions to one, and the best “principal component” was ½×(dimension 1) + ½×(dimension 2). In the example with the lists missing colors, we went from 20 dimensions (cards) to two. Principal component 1 was the “blue” dimension which gave blue cards a positive weight, while principal component 2 was the “green and white” dimension, giving white cards a positive weight and green cards the same weight but negative.

PCA is popular because it is interpretable. By seeing how the original dimensions contribute to the new dimensions, we can understand the patterns that PCA is using to explain differences between lists.

To apply PCA to our Commander and Cube data, I constructed a matrix for each dataset, where the rows represent each list and the columns represent each card. Each entry of the matrix contained a 1 if the card was present in that list and 0 if it was absent5. Because the presence of each card in a list is denoted with a 1, this process treats cards within a list equally, regardless of whether they are a commander, partner, or basic land6. At the time of this article’s release, the Commander matrix was of size (~1.2M x ~22k), while the Cube matrix was (~30k x ~23k). The larger number of cards for Cube reflects the format’s lack of banlist or legality restrictions. After constructing these matrices, I applied PCA to each. The results illuminate some high level differences between Commander decks and between cubes:

Notably, PCA only explains 3% and 10% of the differences among Commander decks and cubes, respectively, which is far worse than in our toy examples. We can look closer at the cards that contribute most to each dimension:

Commander

Dimension 1 (Card: weight)
Fetchlands: 14.5
Mana Crypt: 13.7
Rhystic Study: 13.0
Mystical Tutor: 12.7
Swan Song: 12.2
Cyclonic Rift: 12.2
Chrome Mox: 12.0
Beast Within: -10.9
Kodama's Reach: -11.9
Terramorphic Expanse: -12.0
Rampant Growth: -12.0
Evolving Wilds: -14.6
Cultivate: -16.2
Forest: -19.9

Dimension 2 (Card: weight)
Sol Ring: 32.6
Command Tower: 30.0
Arcane Signet: 23.3
Basics: 17
Exotic Orchard: 13.4
Lightning Greaves: 11.3
Reliquary Tower: 11.2
Ridgeline Rager: ~0
Raging Spirit: ~0
Serra Inquisitors: ~0
Hulking Ogre: ~0
Coastal Hornclaw: ~0
Brutal Suppression: ~0
Cinder Crawler: ~0
Cards relating to each PCA dimension. Lists with high dimension values will have many cards with positive weights, and lists with low dimension values will have many cards with negative or low weights.

Cube

Dimension 1 (Card: weight)
Fetchlands: 6.2
Birds of Paradise: 6.2
Restoration Angel: 6.1
Snapcaster Mage: 6.1
Shocklands: 6.1
Thoughtseize: 6.0
Thalia, Guardian of Thraben: 6.0
Giant Growth: -1.1
Murder: -1.2
Guildgates: -1.3
Dead Weight: -1.3
Gravedigger: -1.4
Pacifism: -1.7
Gainlands: -1.8

Dimension 2 (Card: weight)
Evolving Wilds: 7.0
Terramorphic Expanse: 6.4
Man-o'-War: 6.3
Rancor: 6.1
Carrion Feeder: 6.0
Raise the Alarm: 5.8
Faith's Fetters: 5.7
Ancestral Recall: -1.7
Black Lotus: -1.7
Moxen: -1.7
Vampiric Tutor: -1.8
Balance: -1.8
Wheel of Fortune: -1.8
Dual Lands: -2.0

While we can see how each card contributes to each dimension, interpreting the meaning of these new dimensions is more difficult. My loose interpretation for both formats is as follows:

Commander

  • Dimension 1 appears related to budget. Cards with positive weights are expensive and played in decks without price restrictions, while cards with negative weights are cheaper.
  • Dimension 2 may reflect Commander “homogeneity”. Cards with high weights are played in “traditional” Commander decks, while cards with low weights are played in “off the beaten path” Commander decks that may be focused on less common themes.

Cube

  • Dimension 1 is again related to budget. Cards with positive weights are those that are commonly played in cubes without budgets, while cards with negative weights are played in budget cubes.
  • Dimension 2 appears to be capturing elements of cube categories and power level. Cards with positive weights are popular Pauper cards, while cards with negative weights are frequently present in Vintage cubes.

In both cases, the principal dimensions capture underlying themes that are somewhat overlapping, which reveals that PCA is not as useful for our dataset as it might be for others7.

Even though PCA can extract higher level features of each format—power level, budget, and broad categories—it does a poor job of separating lists visually in the plots, so we need more granularity. For example, PCA doesn’t give us enough detail to understand the differences between different Bant Commander decks or see niche Cube designs like Unset cubes. In short, PCA doesn’t communicate the diversity of design approaches in each format.

While PCA is poorly equipped for this task, nonlinear dimensionality reduction is a perfect fit.

Nonlinear Dimensionality Reduction

PCA is interpretable because it is a linear approach, so it preserves “global” structure. Lists that are far apart in the plot shown earlier are truly very different, and we can easily interpret the meaning of the axes using the weights assigned to each card.

Nonlinear dimensionality reduction sacrifices interpretability and global structure to learn more about “local” structure, or differences between lists that are relatively similar. As an example, let’s consider some two-dimensional data:

Imagine we want to project this data into one dimension. PCA would find the line that best fits this data, and in doing so, completely fail to explain the shape of the data. A nonlinear approach might find a more complicated representation; in this case, a spiral instead of a line8. In doing so, it loses interpretability, as we can no longer clearly discern how the original dimensions contribute to the new dimensions. We also lose global structure—a point at the center is considered very far “along the spiral” from the spiral’s end, even though the distance in our original dimensions wasn’t very large.

However, the nonlinear approach strongly emphasizes local structure. By keeping close points together, the nonlinear approach is able to “unfurl” the spiral shape of the data. Practically, the result is that if we use a nonlinear approach, moderate to long range distances are not preserved, but data points (lists) that are very similar will be located close together9. There are a bevy of nonlinear approaches we could use, but I settled on Uniform Manifold Approximation and Projection (UMAP) for this project, mostly because it makes the nicest plots10.

UMAP is significantly more subjective than PCA. For example, PCA compares lists using the Euclidean distance, the form of distance with which most people are familiar (formally, the length of the line segment between two points)11. UMAP has no such restriction, so we can use other distance metrics that are more suitable for our dataset.

An intuitive approach is to instead define similarity between lists by the number of cards they share (closely related to the Hamming distance, which counts the cards found in one list but not the other). Unfortunately, this metric is heavily influenced by the size of the list, as larger lists naturally share more cards with other lists. This is most relevant for cubes, but Commander decks vary in effective size too (due to the presence of repeated basic lands).

Much early work on the maps was devoted to determining the right distance metric—Euclidean distance resulted in a homogeneous blob, while Hamming distance tended to group larger lists together regardless of their underlying design goals. Eventually, I decided on the Jaccard distance, which is based on the Jaccard similarity: the number of cards shared between two lists divided by the total number of unique cards they contain (the distance is one minus this similarity)12.
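In code, the Jaccard distance between two lists (treated as sets of unique cards) is a one-liner; the card names below are just illustrative:

```python
def jaccard_distance(list_a, list_b):
    """1 minus (shared cards divided by the total unique cards across both lists)."""
    a, b = set(list_a), set(list_b)
    return 1 - len(a & b) / len(a | b)

# Two lists sharing 2 of 4 unique cards -> a distance of 0.5
print(jaccard_distance(["Sol Ring", "Ponder", "Brainstorm"],
                       ["Sol Ring", "Ponder", "Counterspell"]))
```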

UMAP also has several parameters that control how it reduces dimensionality13. Choosing these parameters is entirely subjective. I chose parameters that encouraged the formation of small clusters, because I knew that very small clusters of cubes or Commander decks with unique design restrictions were possible (for example, the Degenerate Micro Cubes or Chair Tribal Commander decks). Other equally valid parameter choices merged larger landmasses or led to more islands. Because of this subjectivity, the maps don’t represent some ground truth about Commander or Cube—they are merely one way among many of visualizing their respective formats.
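For the curious, here is a minimal sketch of what the UMAP step looks like with the umap-learn library. The matrix here is a tiny random stand-in for the real lists-by-cards matrices, and the parameter values are illustrative rather than the ones used for the published maps.

```python
import numpy as np
from scipy.sparse import csr_matrix
import umap

# Stand-in for the real presence/absence matrix: rows are lists, columns are
# cards. The real matrices are built the same way but are ~1.2M x ~22k
# (Commander) and ~30k x ~23k (Cube).
rng = np.random.default_rng(0)
X = csr_matrix(rng.integers(0, 2, size=(200, 50), dtype=np.int8))

reducer = umap.UMAP(
    n_components=2,
    metric="jaccard",   # shared cards over total unique cards
    n_neighbors=15,     # illustrative; smaller values emphasize local structure
    min_dist=0.1,       # illustrative; smaller values pack similar lists tighter
    random_state=42,    # fixed seed so reruns give the same layout
)
embedding = reducer.fit_transform(X)  # one (x, y) coordinate per list
```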

Running UMAP on the matrices we defined earlier resulted in the Commander Map and Cube Map that you see. The algorithm does an excellent job of separating out lists of unique and niche design restrictions, like Graveyard Cubes or Commander decks focused on Sagas.

Clustering

Another feature of the Cube and Commander maps is that they highlight different “clusters” and depict the cards that define each cluster. This is not a feature of UMAP, which is just a dimensionality reduction algorithm that does not make distinctions between groups of points. UMAP cannot tell us, for example, whether two or three types of Pauper Cubes exist, even if we visually see different clusters of Pauper cubes. Because UMAP’s dimensions are not interpretable, it also cannot tell us what cards distinguish different kinds of cubes or Commander decks. To tackle these questions, we can use clustering algorithms.

Clustering is a process that groups similar data points together and separates them from other groups. For example, most approaches would group the following two-dimensional data into four clusters:

In our dataset, clustering will draw “boundaries” on our maps to identify lists that are similar to each other and different from others.

Clustering algorithms have trouble with the following kinds of data:

  • High-dimensional data. Due to something called the curse of dimensionality, clustering data with thousands of dimensions is very difficult14.
  • Data with clusters of different sizes and densities. There is an intrinsic tradeoff with most approaches: if you focus on finding small clusters, you risk erroneously splitting large clusters. If you focus on finding large clusters, small clusters will often be needlessly merged.
  • Data with many clusters. Not only does this make computations difficult, it also increases the chances your clustering will be wrong and makes fine-tuning difficult.

Unfortunately, our datasets exhibit all of these features. Both datasets are extremely high-dimensional, since we’ve defined each of the 22,000+ Magic cards as a dimension. Lists with more general goals (e.g. “goodstuff” Commander decks or Legacy cubes) form the large, lower-density “mainland” portions of the maps, but lists with narrow design restrictions (e.g. Ape Tribal Commander decks or Alara set cubes) result in high density “islands” on the maps. Finally, there are hundreds of different kinds of lists represented on each map.

Of course, that won’t stop us!

Choosing an Algorithm

There are dozens of clustering algorithms, each with their own strengths and weaknesses. For better or worse, our needs are actually quite narrow. Because the scale of our data is so large, we must run clustering after dimensionality reduction with UMAP15. This eliminates algorithms that assume a uniform cluster shape or density, as UMAP projections do not fit these assumptions. Additionally, some algorithms require you to specify a priori the number of clusters, which is inappropriate in cases like ours where the number of clusters is both large and unknown. After some tinkering, I settled on an algorithm called HDBSCAN, which is designed to handle clusters of different densities16.

Interestingly, HDBSCAN can decline to assign points to a cluster if it deems them too far away from existing clusters and not dense enough to form their own. While it calls these points “noise”, these are just lists with slightly different design goals or compositions that prevent them from fitting nicely into clusters. For example, the algorithm declines to cluster my own cube, likely because I cube Unset cards in an otherwise power-optimized Legacy cube. This phenomenon is widespread—HDBSCAN consistently refuses to cluster around 30% of the lists for both maps!

This demonstrates an inherent flaw in our approach. Fundamentally, we are placing boundaries on a continuous spectrum of design choices. These boundaries are data-driven and in some cases quite obvious (e.g. Mirrodin set cubes), but they cannot hope to capture the essence of what makes Cube and Commander great: the ability to fine-tune a list to each player’s unique preferences and design goals. I see this wealth of unclassified cubes and Commander decks as a testament to the boundless creativity of Cube designers and Commander players.

With that said, we can force HDBSCAN to assign all lists to a cluster. I decided to do this because it leads to more complete maps and allows people to better explore individual decklists.
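Here is a minimal sketch of this clustering step with the hdbscan library, run on a synthetic stand-in for the 6-dimensional UMAP embedding mentioned in footnote 16. The parameter values are illustrative, and the soft-membership trick at the end is one way to force every point into a cluster, not necessarily the exact method used for the maps.

```python
import numpy as np
import hdbscan
from sklearn.datasets import make_blobs

# Synthetic stand-in for the 6-dimensional UMAP embedding of all lists,
# included only so the sketch runs end to end.
embedding_6d, _ = make_blobs(n_samples=2000, n_features=6, centers=12, random_state=0)

clusterer = hdbscan.HDBSCAN(
    min_cluster_size=25,   # illustrative; smaller values find smaller clusters
    min_samples=5,         # illustrative; higher values label more points as noise
    prediction_data=True,  # needed for the soft memberships used below
)
labels = clusterer.fit_predict(embedding_6d)
print("fraction labeled noise:", (labels == -1).mean())

# One way to force every point into a cluster: assign each "noise" point to the
# cluster with its highest soft-membership score.
soft = hdbscan.all_points_membership_vectors(clusterer)
forced_labels = np.argmax(soft, axis=1)
```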

Clustering Results

HDBSCAN consistently identifies a few thousand clusters for the Commander Map and a few hundred for the Cube Map. The clusters vary dramatically in size; on the Cube Map, they range from a small cluster of cubes designed around terrible cards to larger clusters encompassing Commander Cubes. The clusters overlap fairly well with areas of density on the map.

But HDBSCAN doesn’t tell us what cards define these clusters—after all, it was run on UMAP-reduced data, which has uninterpretable dimensions. Once clusters are assigned, however, we can investigate which cards are played in each cluster. We could, for example, simply examine the most popular cards within each cluster.

This ends up being reasonably informative, but in reality, it’s not truly what guided the maps’ formation in the first place. The most popular card in most clusters of the Commander Map is Sol Ring, and on the Cube Map, Lightning Bolt sits in the top 5 most popular cards of nearly half the clusters. Looking at popular cards within any given cluster tends to yield globally popular cards and doesn’t help us assess what truly defines each cluster17.

Cluster-Defining Cards

We should instead ask what cards are enriched in any given cluster relative to other clusters. There are a few different methods available to determine these “cluster-defining” cards. The simplest is to compare the play rate of cards in lists of a cluster to that in lists outside that cluster. This is exactly how EDHREC defines its “synergy” metric. For example, if 75% of Commander decks in a cluster play Windfall and 5% of decks outside that cluster play it, its synergy would be 75% - 5% = 70% in that cluster. Cards that define clusters will have high synergy in that cluster, while popular cards that are played in all clusters will have low synergy18.
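A minimal sketch of this synergy calculation, assuming a dense 0/1 lists-by-cards array and hard cluster labels (both names are placeholders):

```python
import numpy as np

def synergy(X, labels, cluster):
    """Per-card play rate inside `cluster` minus its play rate in all other lists."""
    inside = labels == cluster
    return X[inside].mean(axis=0) - X[~inside].mean(axis=0)

# Toy example: 4 decks, 3 cards, two clusters. Card 0 (think Sol Ring) is played
# everywhere, so its synergy is 0; card 1 is unique to cluster 0 (synergy +1.0).
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]])
labels = np.array([0, 0, 1, 1])
print(synergy(X, labels, cluster=0))  # [ 0.  1. -1.]
```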

These results are much more informative and are the defining cards you see for clusters on each map. Defining cards among Pauper cubes, for example, include Phyrexian Rager and not Lightning Bolt, which is clearly more informative! On the Commander Map, defining cards tend to follow commanders but can also illustrate emergent themes like Golem Tribal.

As we developed the Cube Map, however, we noted that neighboring clusters would often have the same defining cards. Many clusters in the MTGO-Vintage-Cube-inspired portion of the map, for example, have Sneak Attack and Fetchlands in their defining cards. But HDBSCAN considers these clusters different enough to be distinct, and we’d like to know why.

To answer this, we can instead identify cards that differentiate a cluster from its closest neighbors using the same process. These are the locally defining cards listed in the Cube Map for each cluster. The results are fascinating—for example, Pauper cubes are partially differentiated by whether they choose to play Bouncelands. Locally defining cards were unnecessary for the Commander Map, probably because Commander decks tend to be more focused than cubes (both thematically and due to color identity restrictions).
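One simple way to implement this (a sketch, not necessarily the exact criterion used for the maps) is to compare a cluster only against the few clusters whose centroids sit nearest to it in the embedding:

```python
import numpy as np

def nearest_clusters(embedding, labels, cluster, k=3):
    """Find the k clusters whose centroids lie closest to `cluster`'s centroid."""
    ids = np.array(sorted(set(labels)))
    centroids = np.array([embedding[labels == c].mean(axis=0) for c in ids])
    target = centroids[np.where(ids == cluster)[0][0]]
    order = ids[np.argsort(np.linalg.norm(centroids - target, axis=1))]
    return [c for c in order if c != cluster][:k]

def local_synergy(X, labels, cluster, neighbors):
    """The same synergy calculation, but compared only against nearby clusters."""
    inside = labels == cluster
    nearby = np.isin(labels, neighbors)
    return X[inside].mean(axis=0) - X[nearby].mean(axis=0)
```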

Commander Map Features
Submaps

The Commander Map has a central map that includes all ~1.5 million decklists19, but it also has submaps that reveal differences between decks of a specific color identity, commander, tribe, or theme. To generate submaps, I simply subsetted the full dataset to decks satisfying each condition (e.g. the ~10k Kenrith, the Returned King decks) and ran the entire UMAP/HDBSCAN/defining card process on just that subset. Accounting for commander-partner pairings, there are approximately 3,000 submaps.

Because the number of decks for each submap can vary wildly (from ~70k WUBRG decks to a handful of Ur-Drago decks), some aspects of map generation need to change. Submaps with a smaller number of decks needed smaller parameters for HDBSCAN and UMAP, so I scaled these parameters based on the input number of decks. In addition, HDBSCAN tended to classify small submaps to a single cluster, which is mathematically valid but uninteresting—after all, we want to highlight the different ways to build a single commander. In these cases, I lowered the clustering parameters until more than one cluster was found.
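As a hedged sketch of that adaptive loop (the scaling rule and starting values here are illustrative guesses, not the exact ones used for the submaps):

```python
import hdbscan

def cluster_submap(embedding, n_decks):
    """Cluster one submap, loosening the parameters until more than one cluster appears."""
    # Illustrative scaling: larger submaps get a larger minimum cluster size.
    min_cluster_size = max(5, n_decks // 100)
    labels = None
    while min_cluster_size >= 2:
        labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embedding)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        if n_clusters > 1:
            break
        min_cluster_size //= 2  # loosen the parameter and try again
    return labels
```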

Submaps also require a different way to calculate synergy. There are, in fact, three types of synergy associated with defining cards on the Commander Map. Here are these synergies explained in the context of Atraxa, Praetors' Voice Commander decks:

  • Synergy and cluster-defining cards on the main map compare the decks within a single cluster to decks outside that cluster. One cluster on the main map is composed of ~90% Atraxa decks. This cluster is focused on +1/+1 counters, so one of its defining cards is Fathom Mage (play rate 86% in this cluster, 1% elsewhere = 85% synergy).

  • Synergy and submap-defining cards, by contrast, compare decks in that submap to all other decks on the main map. For example, the Atraxa submap displays Astral Cornucopia in its defining cards because its play rate in Atraxa decks (58%) is much higher than in all other Commander decks (1%), yielding a 57% synergy. Fathom Mage has a lower synergy here (26%) because its play rate in all Atraxa, Praetors’ Voice decks is only 30%. The synergy depicted on EDHREC most closely mirrors this synergy20.

  • Synergy and cluster-defining cards within a submap compare decks within a submap’s cluster to all other decks in that submap. One cluster on the Atraxa submap has the Myojins as defining cards because this cluster plays that cycle much more (57% play rate) than other Atraxa decks (1% play rate), for a synergy of 56%. Astral Cornucopia does not have a high synergy score in any single cluster because its play rate in all Atraxa decks is high (58%).

The first score helps us understand the driving patterns on the main map, the second helps us understand cards enriched for each commander generally, and the third illustrates different ways to build the same commander or theme21. I have personally found the last to be the most interesting, as it can reveal very niche and flavorful design restrictions like Chair Tribal Oloro decks.

In the future, we hope to implement submaps for cubes based on decks drafted from each cube. So build and save your Cube Cobra drafts, people!

Average Decks

The Commander Map also depicts an “Average Deck” for each cluster on the main map and submaps, as well as each submap generally (e.g. an average Cromat deck), but there are many different interpretations of the word average. We could attempt to construct a “mathematically average” decklist, where the frequency/presence of a card in the average list is proportional to its frequency in the decks in question. Unfortunately, this approach can lead to decklists with conflicting themes, like an Atraxa deck with Blighted Agent (infect), Master Biomancer (+1/+1 counters), and Carth the Lion (planeswalkers).

A better option is to instead pick a decklist that best represents the decks of a cluster or submap. For this purpose, I decided to focus on synergy. For each decklist in a cluster or submap, I calculated a “synergy score”: the mean synergy of the unique cards in the decklist, based on the defining cards for that cluster or submap. The decklist with the highest synergy score is the average deck. But this metric is biased towards decklists with a few high-synergy cards—the mean synergy for a decklist of just Gorilla Chieftain and basics is very high for the Ape submap. To account for this, we enforce that the average deck’s number of basics must fall between the 20th and 80th percentiles of the decks in that cluster or submap.
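A minimal sketch of that selection step, assuming per-deck card sets, a per-card synergy lookup, and per-deck basic-land counts are already in hand (all names are placeholders):

```python
import numpy as np

def average_deck(decks, card_synergy, basic_counts):
    """Return the deck with the highest mean synergy, subject to a basics constraint."""
    lo, hi = np.percentile(basic_counts, [20, 80])
    best_deck, best_score = None, -np.inf
    for deck, n_basics in zip(decks, basic_counts):
        if not lo <= n_basics <= hi:
            continue  # skip decks with unusually few or many basic lands
        score = np.mean([card_synergy.get(card, 0.0) for card in deck])
        if score > best_score:
            best_deck, best_score = deck, score
    return best_deck
```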

Conclusion

The UMAP and HDBSCAN algorithms allow us to compactly visualize the astounding breadth of the Commander and Cube formats for the first time. If you have any ideas on ways to improve the maps or explore the data otherwise, please let us know at [email protected]! We update the maps whenever new data becomes available, generally about once a month.

This work would not have been possible without EDHREC and Cube Cobra—we at Lucky Paper are incredibly grateful for their partnerships and support on these projects. I also personally want to express my gratitude for the team at Lucky Paper, who supported the crazy idea to visualize these formats and had the web development skills to bring it to life.

The aspect of both Commander and Cube that I have always loved is the way that they allow players to express themselves. We hope that these maps support this vision for both formats and allow players to explore the depth of Magic’s card pool and their own ingenuity.

Code to Make the Maps

The code I wrote to make the maps and clusters is available on GitHub here. This does not include the underlying data from EDHREC and Cube Cobra, which should be requested from the EDHREC team or developers on the Cube Cobra Discord, respectively.

The code and approach to these data is constantly changing22. If you’re into data science, want to learn or contribute, or have advice on ways to improve, feel free to contact me on twitter at jettcrowdis.

Frequently Asked Questions

What do the axes on the map mean?

The axes on the map have no direct meaning. This is because UMAP, the algorithm we use to make the map, does not result in interpretable dimensions. High-dimensional data is complicated—no one has devised an approach that is interpretable and preserves local connections between points. Approaches that are interpretable fail miserably on this data.

Some of the cluster-defining cards are odd. Why don’t commanders show up? Why isn’t the cluster containing the MTGO Vintage Cube defined by the Power Nine?

This is because the algorithm has no concept of card importance. In calculating similarities, the inclusion of Black Lotus or Sol Ring is no more important than the inclusion of Evolving Wilds. Broadly, this illustrates an important point about the maps—just because two lists are close together does not necessarily mean their gameplay is similar. For the Commander Map, we removed commanders from defining cards, reasoning that their importance was already conveyed by the “% Commander” metrics.

Why do many of the clusters overlap? How is it possible that two lists close together are in different clusters?

The confusing answer is that the clusters may not actually overlap, at least not in higher dimensions. Clustering is run on a higher-dimensional form of the data, where clusters may not overlap like they appear to in two dimensions. It’s like taking a photo with one person in the foreground and one in the background—in 3D space they are separated, but in the 2D picture, they may appear to overlap.

Is there some significance of the location of island X in the map?

In short, no. UMAP does not preserve global structure, so the locations of islands are essentially arbitrary. This also means that long-range distance can’t be easily interpreted. On the Cube Map, for example, a cube spatially halfway between the MTGO Vintage and MTGO Modern cubes can resemble the former far more than the latter.

Why has the map changed since I last saw it?

UMAP is stochastic, which means that its results can change in different iterations. While we keep UMAP’s random seed the same when we update the map, the maps can still change as new data is added. Plus, people change their lists and new cards are printed, so the maps dynamically evolve to reflect these changes!

This article was originally published March 24th, 2021 with the release of the Cube Map and has been updated, expanded, and generalized to both the Cube and Commander maps.

  1. We can visualize three dimensions, so why not use three? The answer is that humans are terrible at interpreting three-dimensional plots.

  2. Of course this isn’t strictly true. You can guess someone’s “thickness” by knowing their height and width because these metrics are correlated. Dimensionality reduction works much better when correlations exist because the effective dimensionality of the data is lower.

  3. The term “information” can mean many things. In this context, I am referring to the variance in the data, or essentially the differences between points. Intuitively, describing a point’s position along that line allows you to distinguish it from other points.

  4. Astute readers might wonder—how can two dimensions fully explain presence or absence of three colors? Does that not require three dimensions? Only two dimensions are needed because for these lists, the presence of blue and green automatically implies the absence of white. So the “third” dimension gives us no additional information. If there were a fourth list with green, white, and blue cards, a third dimension would be needed to explain all the information.

  5. In preparing the data, I removed duplicate lists from each dataset. For cubes, I filtered out lists that had fewer than 50 unique cards, as these cubes tended to be unfinished (my sincerest apologies to the Hidden Gibbons Cube). No such restriction is possible for Commander, as there are legitimate decklists with a Commander and 99 basic lands. For reasons that I’ll explain later, duplicate cards are ignored (see footnote 12), though for the Commander dataset we receive deduplicated data from the start. Ignoring duplicates doesn’t treat cards like Rat Colony correctly, but the map ends up grouping decks containing cards like these together anyway.

  6. In the Commander dataset, one could alternatively assign more weight to commanders or less weight to basic lands. But this involves value judgments. A commander is not always critical to the deck’s function, while the presence of basics can reflect a deck’s budget or theme. Unsurprisingly, commanders influence deck similarity a great deal anyway by influencing which maindeck cards are played—Gishath decks, for example, almost always play Regisaur Alpha. Using only 0’s and 1’s for weights also makes the math much easier.

  7. To explain a bit more: PCA identifies new dimensions that are uncorrelated (or “orthogonal”). Somewhat confusingly, this does not mean independent. Two variables can be uncorrelated but still dependent if their relationship is nonlinear—as a result, PCA is terrible at capturing complicated relationships in a dataset. As it turns out, the Commander and Cube ecosystems are full of complicated relationships.

  8. Real nonlinear methods do not work like this. The two most popular methods, UMAP and t-SNE, work by establishing connections between similar points in higher-dimensional space (“local structure”) and then trying to maintain those connections in a lower-dimensional representation.

  9. This means that the location of “islands” in the maps are essentially arbitrary. Different iterations of nonlinear algorithms, for example, will place the Degenerate Micro Cube cluster in different positions. Sometimes it’s next to the MTGO Vintage Cube’s cluster, and sometimes it’s halfway across the map.

  10. UMAP preserves nonlinear relationships, which it calls “manifolds”, between similar data points. It is incredibly efficient and can run directly on the original data matrix (for the Commander Map, ~1.2M x 22k). The other popular method, t-SNE, cannot handle such large data and requires you to first reduce the dimensionality to 10-20 dimensions with another algorithm like PCA. For some niche data-science drama, the creators of UMAP claim that their algorithm is better than t-SNE at preserving global structure. In reality, recent work has shown that the algorithms are essentially equivalent and equally poor at explaining data in an interpretable way. But I think UMAP is prettier!

  11. PCA does not actually “choose” to use Euclidean distance—in a beautiful bit of math, it arises naturally from PCA’s goal of maximizing the variance explained in the data.

  12. This is why duplicate cards are ignored—the Jaccard distance only looks at the presence or absence of cards to define similarity. We could use or define another distance metric that accounts for duplicates, but duplicates are rare given the usually singleton nature of Cube and Commander. The Jaccard distance is also much faster to calculate.

  13. These are controlled by the n_neighbors and min_dist parameters, respectively. The documentation for UMAP is fantastic, and they have made some excellent interactive visualizations to understand exactly how these parameters control the dimensionality reduction.

  14. The curse of dimensionality refers to the fact that high-dimensional data do not conform to our low-dimensional expectations. In particular, distances between points become extremely large, so even similar points end up far from each other. As a result, clustering in low dimensions is like trying to separate colors of the rainbow, but in high dimensions, it’s like trying to distinguish between slightly different shades of green. Euclidean distance is particularly vulnerable to this issue.

  15. Clustering on the results of UMAP is contentious because it does not preserve global structure and can result in fake clusters. Luckily for us, it is easy to validate clusters by seeing if the lists share design goals. Clustering on PCA results is more standard because it preserves global distance, but it fails miserably on these data.

  16. HDBSCAN stands for Hierarchical Density-Based Spatial Clustering of Applications with Noise. What a mouthful! It is similar to other algorithms in that it builds a hierarchy of clusters, but it begins by transforming the data to increase local density—doing so helps fight “noise”. For both maps, I ran HDBSCAN on the data reduced by UMAP to 6 dimensions.

  17. This isn’t strictly true. There are clear cases where cluster-popular cards are also cluster-unique—Sheltering Ancient is one of the most popular cards within strictly-worse cubes and is also unique to that cluster.

  18. The first iteration of the Cube Map used a “fancier” approach with Fisher’s exact test to compare a cluster to each other cluster. This has a number of benefits, namely that it accounts for noise in play rates for small cluster sizes, but ultimately this proved unnecessary. The results using play rate (i.e. “synergy”) are nearly identical, and calculating pairwise Fisher’s tests for the Commander Map was far too slow with even the fastest implementations.

  19. The map included about 1.2 million decklists when this was first published.

  20. For submaps, our synergy results are slightly different from EDHREC. This could be because we control for date and color identity differently. How EDHREC calculates synergy for tribes and themes isn’t clear.

  21. In determining synergy, there are also tricky issues with handling card color identity and the date the card was released. For example, when examining the synergy of Eerie Ultimatum in Atraxa decks compared to other commanders, we must consider that this card cannot be played in most decks, so its play rate in other decks will be low regardless of its true popularity (and thus its synergy may be inflated). Similarly, some Commander decks may not have been updated since the release of Eerie Ultimatum (April 2020), so its play rate may be lower than Abzan cards released earlier. Fully explaining how I accounted for these issues is beyond the scope of this article, but if you’re curious, I encourage you to read the code.

  22. In particular, I’d love to improve the approach to clustering. Clustering on UMAP embeddings is suspect, and dynamically adjusting parameters to cluster submaps in the Commander Map is inconsistent. Some submaps are severely overclustered, while others are underclustered. I plan on exploring other techniques like Leiden clustering on PCA embeddings and using silhouette scores to optimize hyperparameters.
