I've thought a lot about how MU charts should be made and about how tier lists should be made from them if you could make a good MU chart (impossible now but perhaps possible someday). This is going to be really long, but I do think it's a thorough look at the whole concept and the best way we could proceed if we wanted to make a community MU chart. Here's my result:
We all know that's not how the ratios work. If we have two hypothetically equally skilled players (who don't exist), I don't think a ratio like 20-80 can exist in results. If one player has an advantage big enough to win 80% of the time, they will win 100% of the time. This is honestly probably true for even something like 30-70 as well. If we go super theory, the distribution of wins as the balance changes is probably along a bell curve that is very densely packed toward the center of the distribution (50-50) and at virtually all reasonable points outside of the 60-40 to 40-60 range approximates either 0-100 or 100-0 as appropriate. You could never simulate this with results; only the inspection of knowledgeable players can provide insight into the balance, and even then it's an inexact science. We'll probably be able to put out something semi-rational after EVO; until then we're all still kinda spinning our wheels with some good ideas but also clearly incomplete pictures on a lot of fronts.
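To make the "steep curve" idea concrete, here's a rough sketch. Everything in it is my own assumption for illustration: the logistic shape, the "advantage" axis (0 meaning perfectly even), and the steepness value of 40 are all made up, not measured from anything.

```python
import math

def observed_win_rate(advantage, steepness=40.0):
    """Map a hypothetical 'advantage' (0 = perfectly even) to a win rate.

    The logistic shape and the steepness value are assumptions chosen
    only to illustrate how quickly results could saturate.
    """
    return 1.0 / (1.0 + math.exp(-steepness * advantage))

# Small edges stay competitive; larger edges are close to guaranteed wins.
for adv in (0.0, 0.05, 0.10, 0.20, 0.30):
    print(f"advantage {adv:+.2f} -> win rate {observed_win_rate(adv):.3f}")
```

The point is only the shape: with a curve this steep, anything that looks like a big edge on paper shows up as a near-certain win between equal players.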
As for SRK, as far as I can tell their method for Street Fighter is to just treat the declarations of Japanese arcades as gospel and, since they have no idea how Japanese arcades make MU charts (does anyone?), there's no need to argue over where the numbers come from. It keeps it simple for them, but I'm pretty dubious of the results, and even if I weren't, I don't think it's possible for us to emulate. We actually have to delve into this if we want to know things about how the game is balanced.
I think the numbers just confuse people; I prefer to use words to describe the situation and actively avoid meaningless numbers (even worse when you try to construct match-up charts averaging numbers from different people). I also think the way people usually use the numbers obfuscates a lot of granularity that should exist. I find these are the meaningful levels of advantage:
Hard Counter: The only way it's possible to win this match-up from the wrong side is with a large skill gap. Characters with any hard counters at all are almost never viable in the long term, but in a very poorly balanced game, they may be relevant since poorly balanced games will have a lot more of these than well balanced games. Many people like to academically try to argue the difference between nonsense like 2-8 vs 0-10, but once a match-up goes to the point of "impossible", the point of "how impossible" isn't really knowable so in reality those are all the same match-up.
Soft Counter: This character will realistically lose when our hypothetical perfectly balanced players compete, but since those players don't exist, this character can win games. It's a hard road but it can happen and will happen often enough that the character will work out if it has a small number of these. A large number of these is just damning though. I often see this expressed as 6-4, but some people don't think 6-4 is as bad as this and use either 65-35 or 7-3 to mean this (part of why numbers are bad).
Disadvantage: This character has the harder end of the match-up to a great enough extent that we can perceive it, but even among players of very even skill, it's very realistic for either side to win. If we had perfect knowledge, all match-ups that are not some form of a counter would be this way for one side, but we do not have perfect knowledge. I often see this expressed as 55-45; some people refuse to use 55-45 and use 6-4 to mean this, but everyone has some number for this concept, and they tend to use it a ton on their personal MU charts.
Even: In theory no match-ups but dittos are even, but in reality, very close match-ups are often too close for our poor detection abilities to tell which side has the advantage. If we can't tell which side wins, we hand-wave it and call it even. You never know what will happen when these MUs are played, though in a mature metagame we'll usually know enough to be able to figure out which side actually wins most MUs. At least here the number used is always consistent: 5-5.
This will also be non-linear with skill; we generally look to highly skilled players, but since the top crowd is such a tiny group (like you could easily argue Zero is on a level of his own, but if you accept that argument, 100% of match data is useless since Zero has yet to play against himself in tournament), we usually accept all forms of "good" as decent approximations since the variance in MUs between top players and generally good players is typically not all that high (though it does tend to have a few quirks...). Smash also makes it hard with stages that matter more than people tend to like to admit; if we were being honest with ourselves, we'd formulate independent MU charts for every stage, but that's an insane amount of work. I think if we wanted to do really good work we'd indicate what the median stage in a MU (out of an agreed upon stagelist) works out to in each MU with the implicit assumption that most non-median stages will have either the same result or shift you just one level and then separately indicate any stages that shift the MU by two or more levels to provide additional information for the handful of extremely stage polarized characters. You do have to be careful to consider that the median stage could be something like Delfino Plaza; just acting like it's Smashville all the time will significantly corrupt your results. Customs are also relevant in that the game is completely different with customs on or off, but if you just assume one or the other, they're not too tough since you can just assume optimum customs chosen by both sides. Since you're allowed to switch characters in-between games, we would naturally be looking at a single game.
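Here's a minimal sketch of the median-stage idea, assuming each stage in the agreed stagelist has already been rated with one of the levels above. The stage ratings in the example are invented purely for illustration, and using the "low" median (so the answer is always a level that was actually rated) is my own choice.

```python
from statistics import median_low

# Ordinal ranks for the seven levels, from the rated character's point of view.
LEVELS = {
    "hard counter (loser)": -3,
    "soft counter (loser)": -2,
    "disadvantage": -1,
    "even": 0,
    "advantage": 1,
    "soft counter (winner)": 2,
    "hard counter (winner)": 3,
}
NAMES = {rank: name for name, rank in LEVELS.items()}

def summarize_stages(stage_levels):
    """Return (median level, stages that shift the MU by two or more levels)."""
    ranks = {stage: LEVELS[level] for stage, level in stage_levels.items()}
    # median_low always returns one of the ratings that was actually given.
    median_rank = median_low(list(ranks.values()))
    outliers = {
        stage: NAMES[rank]
        for stage, rank in ranks.items()
        if abs(rank - median_rank) >= 2
    }
    return NAMES[median_rank], outliers

# Hypothetical ratings for one MU across a five-stage list.
example = {
    "Smashville": "even",
    "Battlefield": "advantage",
    "Final Destination": "disadvantage",
    "Delfino Plaza": "soft counter (winner)",
    "Town and City": "even",
}
result = summarize_stages(example)
print(result)
```

The chart entry would be the median level, with the outlier stages listed separately as the extra information for stage polarized MUs.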
If anyone is wondering how I'd make a non-numeric chart readable, I'd rely mostly on color and symbols. Yellow "=" for even, light green "a" for advantage or light red "d" for disadvantage, mid-green "+" for soft counter (winning side) and mid-red "-" for soft counter (losing side), dark green "$" for hard counter (winning side) and dark red "X" for hard counter (losing side). If you want a tier list, you don't do a straight average. It's a point-based chart for your MUs:
Hard counter (winner): +3
Soft counter (winner): +2
Advantage: +1
Even: +0
Disadvantage: -1
Soft counter (loser): -7
Hard counter (loser): -1000
We could argue over precisely how negative soft counter (loser) should be; -7 is an arbitrary value chosen based on intuition to be "a lot worse than the symmetric -2 but not so bad as to override the nature of the other MUs this character has". Hard counter (loser) can be any number that is so large that it dominates all other factors; -1000 is chosen merely for convenience, but -100 would probably work if you wanted smaller numbers, and -1000000000 would have the same result if you wanted to be silly. Note that characters who are hard countered get one possible manual adjustment: if a character has hard counters but also has the single best MU against some other character, that character is put into a side tier we call "counterpick tier" that is not linearly related to the other tiers. If counterpick tier must be related linearly (like in an ordered list of character ranks), it will be above all of the non-viable tiers but below all of the otherwise viable ones (and we'll have to argue where that cut-off happens).
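As a rough sketch of how the scoring would work in practice, assuming the chart is stored as a table of levels per character: the point values are the ones listed above, while the character names and the chart itself are placeholders I made up. The counterpick check is one reading of the rule (the character has at least one hard counter loss and strictly beats every other character's result against some opponent), so treat that part as an assumption.

```python
POINTS = {
    "hard counter (winner)": 3,
    "soft counter (winner)": 2,
    "advantage": 1,
    "even": 0,
    "disadvantage": -1,
    "soft counter (loser)": -7,
    "hard counter (loser)": -1000,
}

def score_character(matchups):
    """Sum the points for one character's row of the MU chart."""
    return sum(POINTS[level] for level in matchups.values())

def is_counterpick(chart, name):
    """Assumed reading: hard countered, yet the single best answer to someone."""
    row = chart[name]
    if "hard counter (loser)" not in row.values():
        return False
    for opponent, level in row.items():
        rivals = [
            POINTS[chart[other][opponent]]
            for other in chart
            if other not in (name, opponent)
        ]
        if rivals and POINTS[level] > max(rivals):
            return True
    return False

# Placeholder four-character chart; chart[x][y] is x's level against y.
chart = {
    "A": {"B": "advantage", "C": "even", "D": "soft counter (winner)"},
    "B": {"A": "disadvantage", "C": "hard counter (loser)",
          "D": "hard counter (winner)"},
    "C": {"A": "even", "B": "hard counter (winner)", "D": "advantage"},
    "D": {"A": "soft counter (loser)", "B": "hard counter (loser)",
          "C": "disadvantage"},
}

scores = {name: score_character(row) for name, row in chart.items()}
for name in sorted(scores, key=scores.get, reverse=True):
    tag = " (counterpick tier)" if is_counterpick(chart, name) else ""
    print(f"{name}: {scores[name]}{tag}")
```

In this made-up chart, B sinks on points because of its hard counter but lands in counterpick tier since nobody else beats D as hard, while D is hard countered with nothing to show for it.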
The biggest challenge is that MUs involving generally low tier characters are often poorly understood; there are likely zero relevant data points for the Mii Swordfighter vs Mr. Game & Watch match-up. A practical MU chart has to not assume perfect knowledge and include a purple "?" ranking that basically says "we have no idea on account of this MU not being competitively relevant". You'd treat it like an even MU mathematically, but it would be a big procedural help to be allowed to call super obscure match-ups between two low tiers unknown instead of having to make wild and often severely wrong guesses.
I don't think we can do this now; there's just so much we're still figuring out. I think if we want to make such a chart, right after EVO would be a good time to get to work. We'll have the significant data of summer majors as well as all of us having a few more critical months of exploration time.