Towards a conceptually and methodologically sound way of constructing tiers.

Red Arremer

Smash Legend
Joined
Nov 27, 2005
Messages
11,437
Location
Vienna
Then let's take this scenario:
I play an incredible Ganondorf and go to a tournament full of bad players. I do this very often, so I win a lot. Your results become skewed.
Of course, that would also influence the list from Ankoku, but your "empirical" also would be skewed.
 

hillbillyhick

Smash Cadet
Joined
Jun 23, 2008
Messages
51
Location
Ghent, Belgium
I think the data SuSa is gathering comes from the best players, and I would do that too, so your scenario normally wouldn't come up. Suppose it did: then yes, it would skew the list a bit in favor of Ganon, but how much depends on how much data has been gathered. If a lot of data has been gathered, the skew would be minimal. Like I said though, with the right data (that of top players) your scenario wouldn't arise.

Also, I think Ankoku's list would be more affected, because of the multiplying and because the value added remains constant, while in a ratio the value added is relative to the amount of data already gathered.
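
To put rough numbers on the ratio half of that argument, here is a minimal sketch. The game counts and the 20 extra wins are invented for illustration, and it only models a plain win ratio, not how Ankoku's list is actually computed.

```python
def win_rate(wins: int, games: int) -> float:
    """Plain win ratio: wins divided by games played."""
    return wins / games


def shift_from_extra_wins(wins: int, games: int, extra_wins: int) -> float:
    """How far the win ratio moves if `extra_wins` easy wins are added on top."""
    before = win_rate(wins, games)
    after = win_rate(wins + extra_wins, games + extra_wins)
    return after - before


# Hypothetical data sets of different sizes, each starting at a 50% win rate,
# each receiving the same 20 wins from tournaments full of bad players.
shifts = {}
for games in (40, 400, 4000):
    shifts[games] = shift_from_extra_wins(games // 2, games, 20)
    print(f"{games} games recorded: win rate moves by {shifts[games] * 100:.2f} points")
```

The same 20 wins move a small data set a lot and a large one hardly at all, which is the sense in which the value added in a ratio is relative to the data already gathered.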
 

Jane

Smash Hero
Joined
Jul 29, 2008
Messages
5,593
Location
Ba Sing Se, EK
They use empirical data in medicine, psychology, sociology, biology, chemistry, physics,... Surely you can't say a game is more complex than what they're researching. What can be observed can be measured and to measure is to know.
this is very important.


Why couldn't we do it like in other games. If you get your data from tournaments then you know items and certain stages will be banned, making it so that your data comes from essentially the same pool. It's perfectly possible, witness the miracle of randomization as data increases. Also the fact that certain characters have more stages or AT's to benefit from is something that SHOULD be reflected in a tier list as these convey the character with certain benefits over others.
also this. (most tourneys follow the SBR ruleset)

Suppose MK is on the top of such a list with a chance of 70% of winning a random match in a tournament setting, your best friend plays an MK and he has no idea which character he'll be playing against next. His expected chance of winning the next match is 70% (this is irregardless of how good or bad he is, it's an expected value, not the real value, which is impossible to know).
this is what this list would achieve! how can you guys NOT want this?!

I find it funny that you're comparing us to shamans who have no idea of what they're talking about.
don't feel insulted! he's not at all saying you have no idea what you're talking about! and just for the record shamans aren't complete idiots. haha :D

Then let's take this scenario:
I play an incredible Ganondorf and go to a tournament full of bad players. I do this very often, so I win a lot. Your results become skewed.
Of course, that would also influence the list from Ankoku, but your "empirical" also would be skewed.
think about how many tourneys you can go to FULL of bad players. it can definitely happen, but very often? come on, let's be realistic.
 

Zankoku

Never Knows Best
Administrator
BRoomer
Joined
Nov 8, 2006
Messages
22,906
Location
Milpitas, CA
NNID
SSBM_PLAYER
Also, I think Ankoku's list would be more affected, because of the multiplying and because the value added remains constant, while in a ratio the value added is relative to the amount of data already gathered.
It's an interesting thing that happens, actually. If a lot (we're talking weekly events here) of this happens, then you'll see the rise of a character maybe up to bottom of C. But, since it's not the only tournament that happens every weekend, unless the character gets represented in something actually significant, it'll never be able to outrun the more heavily repped ones in larger tourneys.

If anything, the main bother to me regarding tournament results is that they do not follow the absolute top of the tournament metagame, since using only national level tournaments would provide insufficient data to work with.
 

Mr.-0

Smash Ace
Joined
Mar 26, 2008
Messages
986
I wanna say one thing: (okay, a bunch of stuff)

First of all, your so-called theory: as long as the stages are somewhat randomized, then out of 15 stages in, say, MK vs. Snake, a couple would favor one character, a couple would favor the other, and some would give no advantage to either. You said that if you played one match on each of the 15 stages, the stages would have no effect on the overall outcome (which you based on the standard matchup). Now, there are so many things wrong with that statement that it's funny. What you say is true, but we don't play 15 times in a row in a tourney set. Nor are we forced to do battle on all of the legal stages. We fight on the stages that we counterpick, which means that an MK player will usually counterpick a stage that benefits him over Snake. So instead of playing on 15 stages, you'd play on 3 or 4, and not in a set order. Then, since counterpicking different stages changes the matchup for each character, you'd have to consider all possible legal stages and their effect on all matchups if you wanted to include everything numerical in the list. After all, the entire list you're making is based completely on matchups, so you'd have to include every variable. That's 20 times however many matchups there are. (If you're playing as MK, that's roughly 20 times 35 possible matchup outcomes. MK vs. Snake alone would have twenty different ratios. And you'd have to include ALL variables to make it fully scientific and numerically fair, not to mention that some of these stages won't get played on for that ONE matchup. Have fun doing all THAT math.) Plus there's no way in heck that SWF would suspend the counterpicking rule and make the set of stages randomized. Even then, there'd still be all of those variables.

Second, this whole tier list is based ALL on matchups. Even if the matchups are based on the current top players' metagame, every time the metagame changed (like if a new AT is discovered), even for only one matchup, it would shift your whole list around. Not to mention you'd have to factor that into all of the possible legal stages. You'd be up all night. And that's for one out of 35 times 35 matchups. That's 20 times 35 times 35 in total, and if the metagame shifts a lot, then ALL of the matchups are outdated and must be redone. And for popular characters, you'd definitely have to change their matchups if a new groundbreaking (or even moderately awesome) AT came in, because of how often it would be used in the matchup. That would take a ton of time.

Third, don't you think that movesets and things other than matchup data should be considered? For example, if you want to take everything number-related that could help make a tier list, then you'd have to put in moveset data: add up how much priority a character has per move on average, or how much recovery a character has, or how laggy or strong or comboable (yes, even in Brawl, with the couple of combos it has) their moves are on average. Don't you think those things should be considered? I mean, those are the things that create the outcome of the matchups, but things other than matchups alone should be considered. Of course, on a tier list, some sort of opinion should be included as long as it's legit and on the level, but if you want to make it all numerical, that's fine with me. With every matchup variable that there is, if you want to make it credible, have fun crunching all those numbers.

And if you say that you don't have to include popular counterpicks or counterpicking or stages in matchup/tier list evaluation, ask any top player: stages and counterpicking constitute a large part of the matchup. Like Spadefox said, FD heavily shifts all of Diddy's matchups in his favor. (Well, most of them, you get my point.) I just think that it's stupid to make a purely numerical, purely matchup-based tier list with no opinion whatsoever; people play Smash, not robots. There are some things that will need human opinion. And not including counterpicking or stages in matchup evaluation is just stupid.

So, yeah, that's my output on this and stuff. Yeah, hmm... Stuff. :)

So, who agrees with me?
 

hillbillyhick

Smash Cadet
Joined
Jun 23, 2008
Messages
51
Location
Ghent, Belgium
I've only slept one hour tonight so I'll probably have to edit this later.
Your arguments really don't make sense if you realize what my proposed tier list actually means, but I see you've put effort into making them so I'll try to put effort into refuting them.

You don't have to play 15 matches in a row; 15 people playing the same matchup in different tournaments provide exactly the same expected result. It really doesn't matter that in practice they only play on 3-4 stages. My proposed tier list would give the expected average winning percentage of a character fighting a random character in a tournament setting. IN A TOURNAMENT SETTING means that what you're saying is already accounted for. The advantage is that you get the real expected winning chance. Suppose MK has a percentage of 65%; then you can predict that the next time an MK fights, he'll have an expected winning chance of 65%. If you then watched 100 MK matches, the number of MK wins wouldn't, statistically speaking, be far from 65 (normal distribution, too hard to explain here, you should look it up). Maybe you'll understand it in the following way, because I've heard similar arguments over and over again (and I reject them every time):

Where you get your data from, that’s where the data also applies to. Data from tournament settings will apply to tournament settings.
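
For anyone who wants to see what "not far from 65" means in numbers, here is a rough sketch of the 65% example. It assumes every match is an independent event with the same 65% chance, which is a simplification, and the function names are made up for this illustration.

```python
import math
import random


def expected_wins(n_matches: int, p_win: float) -> float:
    """Mean of a binomial distribution: number of matches times win chance."""
    return n_matches * p_win


def wins_std_dev(n_matches: int, p_win: float) -> float:
    """Standard deviation of a binomial distribution."""
    return math.sqrt(n_matches * p_win * (1.0 - p_win))


def simulate_wins(n_matches: int, p_win: float, rng: random.Random) -> int:
    """Play n_matches simulated matches and count how many are won."""
    return sum(1 for _ in range(n_matches) if rng.random() < p_win)


n, p = 100, 0.65
mean = expected_wins(n, p)
sd = wins_std_dev(n, p)
rng = random.Random(0)  # fixed seed so the simulation is repeatable
samples = [simulate_wins(n, p, rng) for _ in range(5)]

print(f"expected wins: {mean:.0f}, standard deviation: {sd:.2f}")
print(f"about 95% of runs land between {mean - 2 * sd:.1f} and {mean + 2 * sd:.1f}")
print("five simulated runs of 100 matches:", samples)
```

Under those assumptions the win count in 100 matches usually stays within about ten of 65, which is what the normal approximation mentioned above predicts.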

I'd have to include all variables (you mean a near-infinite number) to make it fair?
I'm a regular reader of scientific articles and this just baffles me. If you're right, then science is all wrong, and the two million research papers published in biomedicine alone every year just use the wrong methodology? I'm sorry, lack of sleep has made me cranky. :urg: But statistics was developed precisely because we need to handle so many variables. The experimental method in evidence-based medicine uses a control group (with a placebo) and an experimental group (with the medicine). There's a near-infinite number of possible variables here, but through randomization all are accounted for but one: medicine vs. placebo. If the medicine gives better results, then this is the causal variable, not any other from the infinite list.

Your second point makes no sense, as it applies to every tier list, even opinion-based ones. If a new super AT were discovered that only Pit has, then even an opinion-based list would have to be changed, as the AT changes the metagame. All tier lists are temporary; I've said this a million times already and it's even in my original post.

The suggestions in your third point are too hard to quantify in a methodologically sound way; I wouldn't even know where to begin. And it doesn't matter, as all these variables have already been accounted for. What you're suggesting is an extreme reduction to the SSBB laws of physics, making it ridiculously hard to create a tier list. It's like trying to explain social behavior by looking at people's atoms.

I'll probably use some of your questions and suggestions to make a Q and A in my original post. Some of them I've heard already, so maybe I should explain them better. It's just hard to explain statistics from the ground up to people (I'm not trying to be arrogant; I actually think I suck at explaining, which is why the same questions pop up again and again).

Excuse me for my crankiness. ;)
 

choknater

Smash Obsessed
Joined
Dec 25, 2002
Messages
27,296
Location
Modesto, CA
NNID
choknater
you certainly don't give up... i suppose that is commendable!

irregardless isn't a word :D
 

Mr.-0

Smash Ace
Joined
Mar 26, 2008
Messages
986
After reading over my post, I get what you mean: you don't have to include every variable in an equation. We don't even do that now. And yeah, you would have to rewrite our tier list now if somebody got a new super AT, but it's a lot easier to just say "I think we should move Pit up" and get close to what you'd get, rather than redo all of Pit's data (which, since you wouldn't have to factor in variables, would be much lighter) to get a similar result. And then my thing with counterpicking is: a matchup can swing 5 or maybe even 10% based on the stage, and if it does, then chances are that stage will be counterpicked. Doesn't that kinda screw up your tier list system, since the matchup changed a little and your list is based entirely on matchups? Like, I'm saying, that influences tier lists now, but it doesn't affect them as much when you have a bunch of people vote on it like a democracy. And then my last point was that a tier list shouldn't be based only on matchups; there should be some human opinion (like on recovery; that doesn't usually influence matchups), because it's a human game. Assuming that we all play perfectly like the matchup suggests, then sure, it would be fine, but we're humans; unless we're one of the top smashers, we probably won't. But then, the tier list is based on the current metagame at its top potential, so...

I just think it's kinda stupid to base something completely on matchups. I think there should be some opinion in it, and there's also the fact that matchups will sway depending on the counterpick. I'm just saying.
 

hillbillyhick

Smash Cadet
Joined
Jun 23, 2008
Messages
51
Location
Ghent, Belgium
No really, the stage thing is no problem for my list. But I think I know why the same comments keep coming back. Many of you have a misconception of what my proposed tier list actually reflects, and granted, I might not have worded it as well as I could. I'll try to word it exactly now (I'll also post this at the top of my original post).

My proposed tier list reflects the expected chances of characters winning a matchup in a current tournament setting. It says nothing more than this. But I can't stress enough that THIS IS WHAT IT REFLECTS.

And yes, an opinion-based tier list is easier, but so much more unreliable. What happens is that those who make the list guess what the data is, instead of looking at the data itself. The range of psychological effects people are subject to is astounding: take the recency effect (remembering only recent matchups), confirmation bias (because you like a specific character, you ignore bad information and accept only good), the halo effect (generalizing one good aspect of a character to the entirety of the character), ... I'm not even talking about the social influence that is happening all the time on these boards. Also, statistics is handy because it can turn so much data and so many variables into one neat little number; humans can't possibly remember and compare a similar number of variables and data points. Numbers are always the way to go when possible.

Why wouldn't recovery influence matchups? A good recovery makes you more likely to win, doesn't it?
 

Mr.-0

Smash Ace
Joined
Mar 26, 2008
Messages
986
True. That is, true. So I decided to take a look at the matchups for some characters and averaged them. Most of them are incomplete, and some are outdated, crappy, or biased, but here goes. (And by the way, this is what their respective character boards said, so...)

Bowser (incomplete): 42.8571
Diddy Kong (incomplete): 47.5
Donkey Kong (incomplete): 48.89
Falco (incomplete): 49.375
Fox (incomplete; they cheated and lied, probably): 55 (unreal: two matchups, MK and Snake; I doubt it)
GaW (done, pretty much): 60.1389
MK (incomplete, almost there): 60.3846153
Snake (incomplete, also almost there but not quite): 51 (they finished all his bad matchups and hardly any of his good ones; the exact opposite of the Fox boards, surprisingly enough)

The only problem is, even though it's incomplete and in no particular order, if you put it in order it's a joke. No way Snake is only fourth best out of these characters, with Donkey Kong 6th and still better than Diddy, who's second worst.
 

hillbillyhick

Smash Cadet
Joined
Jun 23, 2008
Messages
51
Location
Ghent, Belgium
Is this based on opinion or on tournament matchups? If the latter, how many matchups have been collected for each character?

I've also thought of a good and simple analogy for what I'm doing.
You're walking down the street and suddenly you find a coin. You have no idea whatsoever what the chances are that the coin will come up heads (I really mean no idea whatsoever). You have some spare time, so you decide to flip it 100 times and count how many times it comes up heads. The result is 53 (53%); now you have this data. The next day you find another coin on the street, and it looks exactly the same. What do you think the outcome would be if you flipped it 100 times? 53, of course.
If I collect data from matchups in tournaments (data from flipping the coin) this information is generalizable to other tournaments (similar coins). Now you see why stages make no difference.
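
One caveat worth making concrete: 53 is the best single guess, but 100 flips still leave a fair amount of uncertainty around it. The sketch below uses the textbook normal-approximation interval for a proportion; the function name and the 95% level (z = 1.96) are choices made for this illustration, not something from the proposal itself.

```python
import math


def normal_approx_interval(successes: int, trials: int, z: float = 1.96):
    """Approximate confidence interval for a proportion (normal approximation)."""
    p_hat = successes / trials
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat - z * std_err, p_hat + z * std_err


# The coin from the analogy: 53 heads in 100 flips.
low, high = normal_approx_interval(53, 100)
print(f"100 flips: {low:.3f} to {high:.3f}")

# The same 53% observed over a hypothetical 10,000 flips is pinned down far more tightly.
low_big, high_big = normal_approx_interval(5300, 10000)
print(f"10,000 flips: {low_big:.3f} to {high_big:.3f}")
```

With only 100 flips the interval runs from roughly 43% to 63%, so the number generalizes to similar coins but only as precisely as the amount of data allows. The same holds for matchup data from tournaments.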
 

Mr.-0

Smash Ace
Joined
Mar 26, 2008
Messages
986
Real, valid tournament matchups are few and far between. These are tournament matchups, based on top-level play, but really only for Snake, GaW, and MK. The rest are crappy, outdated, and pretty biased.
 

Zankoku

Never Knows Best
Administrator
BRoomer
Joined
Nov 8, 2006
Messages
22,906
Location
Milpitas, CA
NNID
SSBM_PLAYER
you certainly don't give up... i suppose that is commendable!

irregardless isn't a word :D
Actually, it is. It's nonstandard, but dictionaries do list it. Oddly enough, it's a synonym for "regardless."
 

Dekar173

Justice Man
Joined
Jun 25, 2008
Messages
3,126
Location
Albuquerque, NM
Any and all research into ANYTHING Smash-related that can generate something positive for this community should be commended and embraced.

That being said, keep it up man :) I like different tier lists, regardless of how they look in the beginning (hell, Diddy was C-rank in the very beginning of Brawl, now look where he is ;D)
 

hillbillyhick

Smash Cadet
Joined
Jun 23, 2008
Messages
51
Location
Ghent, Belgium
I totally revamped the original post. The message and method described is exactly the same, but I've essentially rewritten almost everything. There's more structure now which makes it an easier read I hope. I've also added some things not mentioned in the previous original post, some technical stuff, a small Q and A, more links. And if I'm not mistaken the post is smaller now XD
 

Merce

Smash Cadet
Joined
Nov 10, 2008
Messages
57
I don't understand why people are still concerned about stage selection. If players operate in a manner that best suits their character by appropriately counter picking, they are representing the average performance of their character in a tournament setting, adjusted for the give-and-take nature of stage selection.

I like the level of detail you've put into your suggestion, hillbilly. This is certainly a more statistically sound method than the current one. Although, I don't know if you truly appreciate how difficult it is going to be to gather this data from scratch.
 

Mr.-0

Smash Ace
Joined
Mar 26, 2008
Messages
986
I just realized something.

Go to the matchup chart and lists in the tactical section. They have this already; it's just under the name matchup tier list. It's essentially the same thing. Or am I missing something?
 

hillbillyhick

Smash Cadet
Joined
Jun 23, 2008
Messages
51
Location
Ghent, Belgium
I don't understand why people are still concerned about stage selection. If players operate in a manner that best suits their character by appropriately counter picking, they are representing the average performance of their character in a tournament setting, adjusted for the give-and-take nature of stage selection.

I like the level of detail you've put into your suggestion, hillbilly. This is certainly a more statistically sound method than the current one. Although, I don't know if you truly appreciate how difficult it is going to be to gather this data from scratch.
Thank you.
Yes, data gathering is tricky. I suppose tournament organizers are also the ones who post the tournament results. If they included matchup info, data would be no problem. Of course, it would mean keeping track of the winner of every matchup, but that's not so much to ask, is it? They could just pass a list around, and after each match the players could fill in something like: falco -MK 1-0 falco -MK 0-1 diddy kong - MK 1-0
This way you could gather a lot of data from only one tournament. You get a lot less data from just taking the rankings.
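
As a sketch of how such a sheet could be turned into numbers: the one-result-per-line layout, the regular expression, and the `tally` function below are assumptions made for illustration, since no exact format has been agreed on. It also assumes character names contain no hyphen.

```python
import re
from collections import defaultdict

# Assumed line format: "<character> - <character> <wins>-<wins>", e.g. "falco -MK 1-0".
LINE = re.compile(r"^(.+?)\s*-\s*(.+?)\s+(\d+)-(\d+)$")


def tally(lines):
    """Return each character's overall win rate from a list of result lines."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip anything that does not fit the assumed format
        left = match.group(1).strip().lower()
        right = match.group(2).strip().lower()
        left_wins, right_wins = int(match.group(3)), int(match.group(4))
        total = left_wins + right_wins
        wins[left] += left_wins
        wins[right] += right_wins
        games[left] += total
        games[right] += total
    return {name: wins[name] / games[name] for name in games if games[name] > 0}


# The three example entries from the post, one per line.
sheet = ["falco -MK 1-0", "falco -MK 0-1", "diddy kong - MK 1-0"]
rates = tally(sheet)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.0%}")
```

Three results say nothing about the characters, of course; the point is only that per-match sheets like this can be pooled across tournaments with very little effort.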

I just realized something.

Go to the matchup chart and lists in the tactical section. They have this already, it's just under the name matchup tier list. It's essentialy the same thing. or am I missing something?
If you're referring to Rajam's list, I've already put a link in the original post. His list misses one of the most crucial elements though: empirical data. It is based on the character boards' opinions. I actually find his list better than the official tier list, but still, not being empirical is a huge flaw.
 

Mr.-0

Smash Ace
Joined
Mar 26, 2008
Messages
986
Well, you have to understand, we won't find empirical matchups for a LONG time. His is the closest we can get right now.
 

IThinkAboutIvysaur

Smash Apprentice
Joined
Jul 26, 2009
Messages
102
Other fighting games also don't have the same balance as Smash. All 3 Smash games are insanely unbalanced, whereas other fighting games (usually) seek out a very good balance in between the whole cast.
Actually Smash64 was pretty balanced.
 

Nanaki

Smash Lord
Joined
Jul 25, 2008
Messages
1,063
Location
The Golden Saucer
I think you've got a fine idea here, and I think you'll get some interesting results, but don't expect it to be anything groundbreaking.

Your biggest obstacle, as you well know, is going to be gathering enough data quickly, before the metagame shifts drastically and makes that data invalid. Obviously you're going to get plenty of data on the top-tier characters, as they're played more in tournament at the high levels and present more data for you to work with. You're going to have the same problem as the SBR: "inexperience" (in your case, lack of data) with low tier characters.

That being said, go for it. I'll sure be interested to see what comes out of all that number crunching.

If you want a ridiculous undertaking to add to this, add doubles results to your measurements to see which characters make the best doubles partners. Of course, you'd then have to make it a two-way analysis (two-way ANOVA) with character and partner as your factors and test the significance of their interaction, which would be interesting to learn anyway (do a character and their partner choice significantly impact one another in terms of tournament success?).

I think that as long as you don't expect people to "believe" your list over the SBR's, you'll come out alright in the end. Simply make it an alternate resource. People don't easily disbelieve the 'experts' in a field, as the SBR is in theirs.

Good luck! I do have to nitpick one thing you said, though:

What can be observed can be measured and to measure is to know.
To measure is certainly NOT to know. To measure is to give evidence that your hypothesis may be correct. I can measure curative effects of carbonated water on damaged muscle tissue (random, I know), but I have to prove beyond a shadow of a doubt that the carbonated water is the source of the curative properties and understand HOW it causes restoration before I 'know' that it has any curative effect.
 

hillbillyhick

Smash Cadet
Joined
Jun 23, 2008
Messages
51
Location
Ghent, Belgium
I think you've got a fine idea here, and I think you'll get some interesting results, but don't expect it to be anything groundbreaking.

Your biggest obstacle, as you well know, is going to be gathering enough data quickly, before the metagame shifts drastically and makes that data invalid. Obviously you're going to get plenty of data on the top-tier characters, as they're played more in tournament at the high levels and present more data for you to work with. You're going to have the same problem as the SBR: "inexperience" (in your case, lack of data) with low tier characters.

That being said, go for it. I'll sure be interested to see what comes out of all that number crunching.

If you want a ridiculous undertaking to add to this, add doubles results to your measurements to see which characters make the best doubles partners. Of course, you'd then have to make it a two-way analysis (two-way ANOVA) with character and partner as your factors and test the significance of their interaction, which would be interesting to learn anyway (do a character and their partner choice significantly impact one another in terms of tournament success?).
Remember that you can get a large number of matchups from a single tournament. Granted though, there may be problems with unpopular characters.

ANOVA and t-testing are a good idea: they give evidence that certain results are unlikely to be due to chance alone. I was going to say something about it in the OP, but decided it was already too long and complicated.
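As a rough illustration of that kind of check, here is a sketch of an exact two-sided binomial test on a single matchup's win count. Note that this is a simpler stand-in, not the ANOVA or t-test itself. The function name binomial_two_sided_p and the example numbers are made up for illustration.

```python
from math import comb


def binomial_two_sided_p(wins, games, p=0.5):
    """Exact two-sided p-value for `wins` out of `games`.

    Null hypothesis: each game is won with probability p
    (p=0.5 means an even matchup). The p-value sums the
    probability of every outcome that is no more likely
    than the observed one.
    """
    def pmf(k):
        return comb(games, k) * p**k * (1 - p) ** (games - k)

    observed = pmf(wins)
    # Small tolerance so outcomes tied with the observed one are counted.
    total = sum(pmf(k) for k in range(games + 1)
                if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical numbers: one character wins 8 of 10 recorded games.
p_value = binomial_two_sided_p(8, 10)
```

For 8 wins out of 10 this gives a p-value of about 0.11, so even a lopsided-looking record like that is not strong evidence against an even matchup. That is why the amount of data per matchup matters so much.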

To measure is certainly NOT to know. To measure is to give evidence that your hypothesis may be correct. I can measure curative effects of carbonated water on damaged muscle tissue (random, I know), but I have to prove beyond a shadow of a doubt that the carbonated water is the source of the curative properties and understand HOW it causes restoration before I 'know' that it has any curative effect.
I agree somewhat:)
"To measure is to know" is a quote by Lord Kelvin, the physicist. I found it kind of appropriate and poetic for that specific comment.

You can "know" that it has curative effects without knowing how it causes it. A well-designed randomized clinical trial with a placebo control group and an experimental group can give you this information (in some cases, you wouldn't find a significant effect even though it is curative under certain conditions). Knowing HOW it causes it is of course more valuable information.
 

Nanaki

Smash Lord
Joined
Jul 25, 2008
Messages
1,063
Location
The Golden Saucer
Remember that you can get a large number of matchups from a single tournament. Granted though, there may be problems with unpopular characters.

ANOVA and t-testing are a good idea: they give evidence that certain results are unlikely to be due to chance alone. I was going to say something about it in the OP, but decided it was already too long and complicated.
Yeah, you'd need to run those tests or the whole point is moot. Glad we agree.
I agree somewhat:)
"To measure is to know" is a quote by Lord Kelvin, the physicist. I found it kind of appropriate and poetic for that specific comment.

You can "know" that it has curative effects without knowing how it causes it. A well-designed randomized clinical trial with a placebo control group and an experimental group can give you this information (in some cases, you wouldn't find a significant effect even though it is curative under certain conditions). Knowing HOW it causes it is of course more valuable information.
Meh, I'll agree, kind of.

You may 'know' that it works (if your experiment was indeed perfectly designed and got excellent results), but good luck getting anyone to believe you if you don't know the underlying cause of the results. Sure, you'll be able to publicize it, but it won't be 'known' that your treatment works until someone figures out why.

Anyway, off topic galore. Your tier list will be interesting, anyway. I hope it changes some people's opinions of which characters are 'viable'.
 