hillbillyhick
Fancy title isn't it?
Conceptualization: what is a tier list?
I would like to propose a "new" way of creating tier lists, but first what exactly is a tier list? Smashwiki has this to say:
A tier list is a ranking of each character's metagame, based on tournament settings. It indicates how professional smashers expect each character to be able to perform under tournament conditions. Tiers thus measure the potential of each character based on all currently known techniques and strategies that have been shown to be useful in tournaments.
According to this definition, the ranking is the result of professional smashers' expectations of character performance under tournament conditions. To me this means it's a list reflecting the opinion of professional smashers. The current official tier list is certainly an example of this. However, I propose a new kind of tier list, and with it a new definition:
A tier list reflects the expected chances of characters winning a matchup in a current tournament setting. It is based on empirical data gained from matchups in tournament settings.
Empirical is one of the keywords here, but expected chance is also important. Later in the text I'll list the benefits such a tier list might have compared to the current official tier list.
Method: what data is necessary, how do we arrive at this tier list?
(Here, I will be assuming sufficient representative data is gathered)
Matchup data from tournaments is needed; ratios are then calculated from it. Someone on the boards is already doing this:
http://www.smashboards.com/showthread.php?t=238102
This list of empirical matchup ratios would already be extremely handy for any competitive smasher. But it doesn't end there: by using this formula you can get the tier list I speak of:
E(P(Xj)) = Σ_{i=1}^{k} (fi / n) · P(Xj | Xi)

where fi is the frequency (number) of matchups involving character i in the data, n is the total number of matchups in the data, k is the number of characters, E(P(Xj)) is the expected winning chance for character j, and P(Xj | Xi) is the winning chance of character j against character i.
That is the complete formula; below I'll supply a simplified (four-character) version and an example:
Suppose we take MK's matchups (and let's suppose there are only four characters in this game, just to keep it simple and remember this is fictional data):
MK matchup ratios
P(W/MK) = 0.5 (an MK mirror is always an expected chance of 50%)
P(W/Snake) = 0.6 (expected chance of an MK win vs Snake in a tournament setting)
P(W/Falco) = 0.4 (expected chance of an MK win vs Falco in a tournament setting)
P(W/DDD) = 0.8 (expected chance of an MK win vs DDD in a tournament setting)
Now use this simplified formula (with more characters, you add a term for each):
P(W) = f(MK)/n*P(W/MK) + f(Snake)/n*P(W/Snake) + f(Falco)/n*P(W/Falco) + f(DDD)/n*P(W/DDD)
where f(MK) is the total number of matchups involving MK and n is the total number of matchups across all characters (don't count double: MK vs Snake is the same matchup as Snake vs MK). This way you get a relative frequency.
Now let's assume the matchup data gives us these relative frequencies: 10% MK, 40% Snake, 30% Falco and 20% DDD:
P(W) = 0.10*0.5 + 0.40*0.6 + 0.30*0.4 + 0.20*0.8 = 0.57 = 57%
This is the expected chance of MK winning a matchup in a tournament setting. If you do this for every character you'll eventually get my proposed tier list. So basically the only requirement is matchup data.
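Putting the whole calculation together, here's a minimal Python sketch of the proposed tier-list computation. Only MK's matchup row and the relative frequencies come from the example above; the other characters' rows are invented for illustration (chosen so that P(A beats B) + P(B beats A) = 1):

```python
# Fictional matchup matrix: matchup[a][b] = expected chance of a beating b.
# MK's row and the frequencies come from the worked example; the Snake,
# Falco and DDD rows are made up to complete the matrix.
matchup = {
    "MK":    {"MK": 0.5, "Snake": 0.6,  "Falco": 0.4,  "DDD": 0.8},
    "Snake": {"MK": 0.4, "Snake": 0.5,  "Falco": 0.55, "DDD": 0.6},
    "Falco": {"MK": 0.6, "Snake": 0.45, "Falco": 0.5,  "DDD": 0.7},
    "DDD":   {"MK": 0.2, "Snake": 0.4,  "Falco": 0.3,  "DDD": 0.5},
}
# Relative frequency of each character in the matchup data (f_i / n).
freq = {"MK": 0.10, "Snake": 0.40, "Falco": 0.30, "DDD": 0.20}

def expected_win_chance(char):
    """E(P(Xj)) = sum over opponents i of (f_i / n) * P(char beats i)."""
    return sum(freq[opp] * matchup[char][opp] for opp in freq)

# The proposed tier list: characters sorted by expected win chance.
tier_list = sorted(matchup, key=expected_win_chance, reverse=True)
for c in tier_list:
    print(c, round(expected_win_chance(c), 3))
```

With these numbers MK comes out on top at 0.57, matching the 57% from the worked example.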
Something you can do with my tier list that you can't do with others is calculate your expected chance of winning the next three matchups at a tournament, or even your chance of winning the whole tournament (if you know the number of matchups).
For example, using the fictional MK data, your expected chance of winning three matchups in a row at a tournament (assuming independent matchups) is:
0.57*0.57*0.57 = 0.185 = 18.5%
Suppose you know beforehand that your matchups are against three DDD players; your expected chance will then be:
0.8*0.8*0.8 = 0.512 = 51.2%
The number of wins also follows a binomial distribution, so you can calculate, for example, your chance of winning three out of 5 matchups with MK (see the technical section at the bottom of the text).
What this tier list can't do or claim.
This tier list can't tell you which character is best, or whether character A is better than character B (for a more elaborate discussion, see further in the text). It is also not static; just like any other tier list, it is only temporary. If you have reason to believe you're a lousy tournament player or a very highly skilled one, then these percentages will not predict your results well, because they are expected values. But seeing as skill follows a normal distribution (a statistical term, maybe better known as a bell curve: most players are in the middle), most tournament players would find the values reasonably good at predicting their results. If enough data were available you could even make another tier list that applies only to the best of the best tournament players, or the other way around, a tier list for amateurs.
A methodological analysis of the proposed tier list, Ankoku's ranking list and the official tier list.
Official tier list:
The official tier list is opinion based, and that makes it completely unreliable. Look at science: introspection and guessing left the field a long time ago and made way for data analysis and cold, hard number crunching. No matter how good the players might be at SSBB, their opinion is no substitute for factual numbers. Their opinion is still valuable, but it has no place in a real tier list. Data will always reflect the metagame better than opinion.
Ankoku's ranking list:
Ankoku's list is based on tournament rankings. It has an ordinal scale (it's ranked), so most mathematical operations on it are, in fact, a no-no.
The math behind the list is as follows:
Base values are
1 for top eight
4 for top four
7 for second
10 for first
Base values are then multiplied by number of entrants and entry fee, then divided by 160. This helps account for larger tournaments being more relevant than smaller ones. If two characters are listed, both gain half the points.
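As a sketch, the scoring rule described above could be written like this. The function and argument names are mine; only the base values, the multiplication by entrants and entry fee, and the division by 160 come from the description:

```python
# Sketch of Ankoku's scoring rule as described above (names are mine).
BASE = {"top8": 1, "top4": 4, "second": 7, "first": 10}

def ankoku_points(placing, entrants, entry_fee, characters=1):
    # Base value scaled by tournament size and stakes, then normalized by 160.
    points = BASE[placing] * entrants * entry_fee / 160
    # If two characters are listed for one player, each gains half the points.
    return points / characters

print(ankoku_points("first", 64, 10))      # a 64-entrant, $10-fee tournament
print(ankoku_points("second", 64, 10, 2))  # second place, split over two characters
```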
These base values are completely arbitrary, and so is multiplying by the number of entrants and the entry fee and then dividing by 160. Why not multiply by half the entry fee, or by double the number of entrants? In fact, why multiply at all? Do things like large entry fees correlate positively with more reliable results? By changing the base values just a bit (while still maintaining the ordinal structure) I could get a different ranking from the exact same data. Ankoku says this:
4. I CONTROL THE WAY YOU PERCEIVE THE METAGAME HAHAHA DANCE PUPPETS
He really isn't kidding here. But even given these flaws, I find this list more reliable than the official tier list, even though Ankoku says it's not intended to be one.
The proposed tier list:
I've already discussed my method, so I'll concentrate on answering possible questions about it that may arise.
Q: Stages are very important variables, and there's no real randomization of them at tournaments, so how can your tier list be trusted?
A: I don't generalize to non-tournament settings so this is no problem. Here's a simplified example of what I mean:
You're walking down the street and suddenly you find a coin. You have no idea whatsoever what the chances are that it will come up heads (I really mean no idea whatsoever). You have some spare time, so you decide to flip it 1000 times and count how many times it comes up heads. The result is 530 (53%); now you have this data. The next day you find another coin on the street that looks exactly the same. What do you think the outcome would be if you flipped it 100 times? About 53 heads, of course.
If I collect data from matchups in tournaments (flips of the coin), this information is generalizable to other tournaments (similar coins), but not to other coins (non-tournament settings). This is why stages are no problem for my method.
This question may also arise from a misunderstanding of what my tier list reflects: it does not, as you might think, reflect which characters are better (for that, randomization of stages is necessary; for a more elaborate and technical discussion, see the bottom of the text).
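The coin analogy above is easy to simulate. A quick sketch, where the true bias of 0.53 is made up and "hidden" from the observer:

```python
import random

random.seed(42)  # reproducible demo

# A coin with an unknown bias (0.53 here, but pretend we don't know it).
true_bias = 0.53

# Flip it 1000 times and record heads.
flips = [random.random() < true_bias for _ in range(1000)]
estimate = sum(flips) / len(flips)
print("estimated P(heads):", estimate)

# Prediction for 100 flips of a similar coin: about estimate * 100 heads.
print("expected heads in 100 flips:", round(estimate * 100))
```

The estimate generalizes to similar coins (other tournaments) but says nothing about different coins (non-tournament settings), which is the whole point of the analogy.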
Q: Is it not possible that very highly skilled tournament players tend to pick certain characters (like MK, Snake, Wario, Falco), thus skewing your results?
A: This is very similar to the question above, and it comes down to the same misinterpretation of what my tier list reflects: it does not reflect which characters are better.
Advantages and disadvantages of the proposed tier list compared to others.
Advantages:
- not opinion based, but empirical
- not just a ranking: with my list you can effectively say character A has twice the chance of winning as character B.
- statistical testing is possible
- is a good predictor of tournament results
Disadvantages:
- can not tell you which is the better character (no tier list can do this)
- the list is temporary (as is the case with every tier list)
- most important disadvantage: collecting representative data. The entire tier list depends on this data, so if it's biased, the tier list is biased. (Representative data can be had by collecting all the matchup results of a tournament, so it's no problem if the smash community is willing to help out.)
Important links
Ankoku's character rankings list:
http://www.smashboards.com/showthread.php?t=165954
The official tier list:
http://www.smashboards.com/showthread.php?t=236407
SuSa's ratios:
http://www.smashboards.com/showthread.php?t=238102
(GIVE HIM DATA, I COMMAND YOU)
Rajam's match-up chart and list:
http://www.smashboards.com/showthread.php?t=226315
(This is what the intended tier list would look like, but instead of being opinion-based like Rajam's, it would be empirical)
Technical Stuff
This might be difficult, as I'm not going to explain what some terms mean. It's only for those who are truly interested.
How would you go about constructing a tier list that DOES say which character is better? First of all, what is meant by "better"? Does it mean the character with the most wins after a 2-hour training session by professional smash players? A 20-hour training session by amateurs? A 200-hour training session by randomly chosen people from the population? This matters because it influences the generalizability of the experiment's outcome.

Suppose we take the operational definition of "better" to mean the character with the most wins after a 200-hour training session by tournament players. We would need an experimental setup where everything is randomized except the characters being used. 100 tournament players are randomly assigned to group A and group B; group A is given Falco and group B is given DDD. At assigned times each day they practice with their character for 2 hours, and they do this for 100 days. After this period, matches are played between random members of group A and random members of group B. This gives us a ratio for the matchup. Now is the time for statistical testing: a binomial test can be used to see whether there is enough reason to believe the difference in wins is attributable to the character used rather than to chance. But even with this method some criticisms can be made:
- There is no double blind or single blind setup (it is in fact impossible)
- The objection could be made that these tournament players are already used to using certain characters, this could significantly influence the results in a number of ways.
- The generalizability of these findings is low
- probably more, but I can't think of anything atm.
I mentioned the binomial distribution earlier in the text. A single matchup is in fact a Bernoulli trial, following a Bernoulli distribution. A matchup ratio summarizes a series of Bernoulli trials, so the number of wins follows a binomial distribution. This is handy because we can use the binomial distribution to calculate, for example, the expected chance of MK winning 3 out of 5 matchups in a tournament, or his chances of winning more than 7 out of 10 matchups.
http://www.stat.yale.edu/Courses/1997-98/101/binom.htm
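As a sketch, the binomial calculations just mentioned could be done like this in Python, using the fictional 57% figure for MK and assuming independent matchups:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k wins in n independent matchups, each won with chance p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.57  # MK's expected per-matchup win chance from the fictional example

# Chance of exactly 3 wins in 5 matchups:
print(binom_pmf(3, 5, p))
# Chance of more than 7 wins in 10 matchups (i.e. 8, 9 or 10 wins):
print(sum(binom_pmf(k, 10, p) for k in range(8, 11)))
```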