hillbillyhick
Fancy title isn't it?
Conceptualization: what is a tier list?
I would like to propose a "new" way of creating tier lists, but first what exactly is a tier list? Smashwiki has this to say:
A tier list is a ranking of each character's metagame, based on tournament settings. It indicates how professional smashers expect each character to be able to perform under tournament conditions. Tiers thus measure the potential of each character based on all currently known techniques and strategies that have been shown to be useful in tournaments.
According to this definition, the ranking is the result of professional smashers' expectations of character performance under tournament conditions. To me this means it's a list reflecting the opinion of professional smashers. The current official tier list is certainly an example of this. However, I propose a new kind of tier list, and with it a new definition:
A tier list reflects the expected chances of characters winning a matchup in a current tournament setting. It is based on empirical data gained from matchups in tournament settings.
Empirical is one of the keywords here, but expected chance is also important. Later in the text I'll list the benefits such a tier list might have compared to the current official tier list.
Method: what data is necessary, how do we arrive at this tier list?
(Here, I will be assuming sufficient representative data is gathered)
Matchup data from tournaments is needed; ratios are then calculated from it. Someone on the boards is already doing this:
http://www.smashboards.com/showthread.php?t=238102
This list of empirical matchup ratios would already be extremely handy for any competitive smasher. But it doesn't end there: by using this formula you can get the tier list I speak of:
E(P(Xj)) = Σ_{i=1}^{k} (fi / n) · P(Xj | Xi)

where fi is the frequency (number) of matchups involving character i in the data, n is the total number of matchups in the data, k is the number of characters, E(P(Xj)) is the expected winning chance for character j, and P(Xj | Xi) is the winning chance of character j against character i.
That is the complete formula; below I'll supply a simplified (four-character) version and an example:
Suppose we take MK's matchups (and let's suppose there are only four characters in this game, just to keep it simple and remember this is fictional data):
MK matchup ratios
P(W/MK) = 0.5 (an MK mirror is always an expected chance of 50%)
P(W/Snake) = 0.6 (expected chance of an MK win vs Snake in a tournament setting)
P(W/Falco) = 0.4 (expected chance of an MK win vs Falco in a tournament setting)
P(W/DDD) = 0.8 (expected chance of an MK win vs DDD in a tournament setting)
Now use this simplified formula (with more characters, you add a term for each):
P(W) = f(MK)/n*P(W/MK) + f(Snake)/n*P(W/Snake) + f(Falco)/n*P(W/Falco) + f(DDD)/n*P(W/DDD)
where f(MK) is the total number of matchups involving MK and n is the total number of matchups across all characters (don't count double: MK vs Snake is the same matchup as Snake vs MK). This way you get a relative frequency.
Now let's assume the matchup data gives us these relative frequencies: 10% MK, 40% Snake, 30% Falco and 20% DDD:
P(W) = 0.10*0.5 + 0.40*0.6 + 0.30*0.4 + 0.20*0.8 = 0.57 = 57%
This is the expected chance of MK winning a matchup in a tournament setting. If you do this for every character you'll eventually get my proposed tier list. So basically the only requirement is matchup data.
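Putting the whole calculation together, here's a minimal Python sketch of the proposed tier-list computation. Only MK's matchup row and the relative frequencies come from the example above; the other characters' rows are invented for illustration (chosen so that P(A beats B) + P(B beats A) = 1):

```python
# Fictional matchup matrix: matchup[a][b] = expected chance of a beating b.
# MK's row and the frequencies come from the worked example; the Snake,
# Falco and DDD rows are made up to complete the matrix.
matchup = {
    "MK":    {"MK": 0.5, "Snake": 0.6,  "Falco": 0.4,  "DDD": 0.8},
    "Snake": {"MK": 0.4, "Snake": 0.5,  "Falco": 0.55, "DDD": 0.6},
    "Falco": {"MK": 0.6, "Snake": 0.45, "Falco": 0.5,  "DDD": 0.7},
    "DDD":   {"MK": 0.2, "Snake": 0.4,  "Falco": 0.3,  "DDD": 0.5},
}
# Relative frequency of each character in the matchup data (f_i / n).
freq = {"MK": 0.10, "Snake": 0.40, "Falco": 0.30, "DDD": 0.20}

def expected_win_chance(char):
    """E(P(Xj)) = sum over opponents i of (f_i / n) * P(char beats i)."""
    return sum(freq[opp] * matchup[char][opp] for opp in freq)

# The proposed tier list: characters sorted by expected win chance.
tier_list = sorted(matchup, key=expected_win_chance, reverse=True)
for c in tier_list:
    print(c, round(expected_win_chance(c), 3))
```

With these numbers MK comes out on top at 0.57, matching the 57% from the worked example.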
Something you can do with my tier list that you can't do with others is calculate your expected chance of winning the next three matchups at a tournament, or even your chance of winning the whole tournament (if you know the number of matchups).
For example, using the fictional MK data, your expected chance of winning three matchups in a row at a tournament (assuming independent matchups) is:
0.57*0.57*0.57 = 0.185 = 18.5%
Suppose you know beforehand that your matchups are against three DDD players; your expected chance will then be:
0.8*0.8*0.8 = 0.512 = 51.2%
The number of wins also follows a binomial distribution, so you can calculate, for example, your chance of winning three out of 5 matchups with MK (see the technical section at the bottom of the text).
What this tier list can't do or claim.
This tier list can't tell you which character is best, or whether character A is better than character B (for a more elaborate discussion, see further in the text). It is also not static; just like any other tier list, it is only temporary. If you have reason to believe you're a lousy tournament player or a very highly skilled one, then these percentages will not predict your results well, because they are expected values. But seeing as skill follows a normal distribution (a statistical term, maybe better known as a bell curve: most players are in the middle), most tournament players would find the values reasonably good at predicting their results. If enough data were available you could even make another tier list that applies only to the best of the best tournament players, or the other way around, a tier list for amateurs.
A methodological analysis of the proposed tier list, Ankoku's ranking list and the official tier list.
Official tier list:
The official tier list is opinion based, and that makes it completely unreliable. Look at science: introspection and guessing left the field a long time ago and made way for data analysis and cold, hard number crunching. No matter how good the players might be at SSBB, their opinion is no substitute for factual numbers. Their opinion is still valuable, but it has no place in a real tier list. Data will always reflect the metagame better than opinion.
Ankoku's ranking list:
Ankoku's list is based on tournament rankings. It has an ordinal scale (it's ranked), so most mathematical operations on it are, in fact, a no-no.
The math behind the list is as follows:
Base values are
1 for top eight
4 for top four
7 for second
10 for first
Base values are then multiplied by number of entrants and entry fee, then divided by 160. This helps account for larger tournaments being more relevant than smaller ones. If two characters are listed, both gain half the points.
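As a sketch, the scoring rule described above could be written like this. The function and argument names are mine; only the base values, the multiplication by entrants and entry fee, and the division by 160 come from the description:

```python
# Sketch of Ankoku's scoring rule as described above (names are mine).
BASE = {"top8": 1, "top4": 4, "second": 7, "first": 10}

def ankoku_points(placing, entrants, entry_fee, characters=1):
    # Base value scaled by tournament size and stakes, then normalized by 160.
    points = BASE[placing] * entrants * entry_fee / 160
    # If two characters are listed for one player, each gains half the points.
    return points / characters

print(ankoku_points("first", 64, 10))      # a 64-entrant, $10-fee tournament
print(ankoku_points("second", 64, 10, 2))  # second place, split over two characters
```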
These base values are completely arbitrary, and so is multiplying by the number of entrants and the entry fee and then dividing by 160. Why not multiply by half the entry fee, or by double the number of entrants? In fact, why multiply at all? Do things like large entry fees correlate positively with more reliable results? By changing the base values just a bit (while still maintaining the ordinal structure) I could get a different ranking from the exact same data. Ankoku says this:
4. I CONTROL THE WAY YOU PERCEIVE THE METAGAME HAHAHA DANCE PUPPETS
He really isn't kidding here. But even given these flaws, I find this list more reliable than the official tier list, even though Ankoku says it's not intended to be one.
The proposed tier list:
I've already discussed my method, so I'll concentrate on answering possible questions about it that may arise.
Q: Stages are very important variables, and there's no real randomization of them at tournaments, so how can your tier list be trusted?
A: I don't generalize to non-tournament settings so this is no problem. Here's a simplified example of what I mean:
You're walking down the street and suddenly you find a coin. You have no idea whatsoever what the chances are that it will come up heads (I really mean no idea whatsoever). You have some spare time, so you decide to flip it 1000 times and count how many times it comes up heads. The result is 530 (53%); now you have this data. The next day you find another coin on the street that looks exactly the same. What do you think the outcome would be if you flipped it 100 times? About 53 heads, of course.
If I collect data from matchups in tournaments (flips of the coin), this information is generalizable to other tournaments (similar coins), but not to other coins (non-tournament settings). This is why stages are no problem for my method.
This question may also arise from a misunderstanding of what my tier list reflects: it does not, as you might think, reflect which characters are better (for that, randomization of stages is necessary; for a more elaborate and technical discussion, see the bottom of the text).
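The coin analogy above is easy to simulate. A quick sketch, where the true bias of 0.53 is made up and "hidden" from the observer:

```python
import random

random.seed(42)  # reproducible demo

# A coin with an unknown bias (0.53 here, but pretend we don't know it).
true_bias = 0.53

# Flip it 1000 times and record heads.
flips = [random.random() < true_bias for _ in range(1000)]
estimate = sum(flips) / len(flips)
print("estimated P(heads):", estimate)

# Prediction for 100 flips of a similar coin: about estimate * 100 heads.
print("expected heads in 100 flips:", round(estimate * 100))
```

The estimate generalizes to similar coins (other tournaments) but says nothing about different coins (non-tournament settings), which is the whole point of the analogy.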
Q: Is it not possible that very highly skilled tournament players tend to pick certain characters (like MK, Snake, Wario, Falco), thus skewing your results?
A: This is very similar to the question above, and it comes down to the same misinterpretation of what my tier list reflects: it does not reflect which characters are better.
Advantages and disadvantages of the proposed tier list compared to others.
Advantages:
- not opinion based, but empirical
- not just a ranking: with my list you can effectively say character A has twice the chance of winning as character B.
- statistical testing is possible
- is a good predictor of tournament results
Disadvantages:
- can not tell you which is the better character (no tier list can do this)
- the list is temporary (as is the case with every tier list)
- most important disadvantage: collecting representative data. The entire tier list depends on this data, so if it's biased, the tier list is biased. (Representative data can be had by collecting all the matchup results of a tournament, so it's no problem if the smash community is willing to help out.)
Important links
Ankoku's character rankings list:
http://www.smashboards.com/showthread.php?t=165954
The official tier list:
http://www.smashboards.com/showthread.php?t=236407
SuSa's ratios:
http://www.smashboards.com/showthread.php?t=238102
(GIVE HIM DATA, I COMMAND YOU)
Rajam's match-up chart and list:
http://www.smashboards.com/showthread.php?t=226315
(This is what the intended tier list would look like, but instead of being opinion-based like Rajam's, it would be empirical)
Technical Stuff
This might be difficult, as I'm not going to explain what some terms mean. It's only for those who are truly interested.
How would you go about constructing a tier list that DOES say which character is better? First of all, what is meant by "better"? Does it mean the character with the most wins after a 2-hour training session by professional smash players? A 20-hour training session by amateurs? A 200-hour training session by randomly chosen people from the population? This matters because it influences the generalizability of the experiment's outcome.

Suppose we take the operational definition of "better" to mean the character with the most wins after a 200-hour training session by tournament players. We would need an experimental setup where everything is randomized except the characters being used. 100 tournament players are randomly assigned to group A and group B; group A is given Falco and group B is given DDD. At assigned times each day they practice with their character for 2 hours, and they do this for 100 days. After this period, matches are played between random members of group A and random members of group B. This gives us a ratio for the matchup. Now is the time for statistical testing: a binomial test can be used to see whether there is enough reason to believe the difference in wins is attributable to the character used rather than to chance. But even with this method some criticisms can be made:
- There is no double blind or single blind setup (it is in fact impossible)
- The objection could be made that these tournament players are already used to using certain characters, this could significantly influence the results in a number of ways.
- The generalizability of these findings is low
- probably more, but I can't think of anything atm.
I mentioned the binomial distribution earlier in the text. A single matchup is in fact a Bernoulli trial, following a Bernoulli distribution. A matchup ratio summarizes a series of Bernoulli trials, so the number of wins follows a binomial distribution. This is handy because we can use the binomial distribution to calculate, for example, the expected chance of MK winning 3 out of 5 matchups in a tournament, or his chances of winning more than 7 out of 10 matchups.
http://www.stat.yale.edu/Courses/1997-98/101/binom.htm
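As a sketch, the binomial calculations just mentioned could be done like this in Python, using the fictional 57% figure for MK and assuming independent matchups:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k wins in n independent matchups, each won with chance p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.57  # MK's expected per-matchup win chance from the fictional example

# Chance of exactly 3 wins in 5 matchups:
print(binom_pmf(3, 5, p))
# Chance of more than 7 wins in 10 matchups (i.e. 8, 9 or 10 wins):
print(sum(binom_pmf(k, 10, p) for k in range(8, 11)))
```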