Pilgrim
Smash Rookie
Hello Smashboards,
This post is intended to inform the Smashboards community of some major mathematical and scientific flaws in the creation of a formal “Tier” list, as previously used in Melee and currently used in many competitive fighting games. Before critiquing the techniques, I want to assure readers that in no way am I claiming that all characters are equal, or that this is a perfectly balanced game. However, I do want to bring up some potential confounds in the methods of data analysis used. My primary reason for addressing this issue is that people debate and make vehement claims based on this data without much consideration of the methods that produced the numbers. In addition, I hope to inform the competitive community of better and more scientifically sophisticated methods of analyzing data.
Tier lists rest primarily on basic statistics that anybody can do in Excel (e.g., measures of central tendency, t tests, and regression). However, these relatively “simple” methods of data analysis are also very vulnerable to threats to internal and external validity. When choosing a method of data analysis, you must consider how the data was collected. For nearly all fighting-game tier lists, the majority of data comes from passive observational studies: tournaments are held, and data is recorded on who entered and who won, both per matchup and overall. This method of data collection does not account for a large number of other factors that contribute to the “win” and the “loss”. A couple of examples are as follows: the popularity of a character will greatly skew average matchup and tournament wins; different characters generally attract different players and play styles; sampling error (e.g., fluke losses or lucky wins); regression to the mean (e.g., if many people play certain characters and few play others, the average over the large pool of players on a popular character gets compared against a small, and therefore much noisier, sample of players on the others); and tournament site characteristics.
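To make the popularity point concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration, not real tournament data: two hypothetical characters that are exactly equal, 90 entrants on the popular one and 10 on the rare one, skill drawn from the same distribution for everyone, and placement decided purely by skill (no bracket, no luck).

```python
import random

def simulate_top8(n_popular=90, n_rare=10, top_k=8, n_tournaments=2000, seed=0):
    """Simulate tournaments between two equally strong, hypothetical characters.

    Returns a tuple:
      (share of top-k slots taken by the popular character,
       chance that one popular-character entrant makes top-k,
       chance that one rare-character entrant makes top-k)
    """
    rng = random.Random(seed)
    popular_top = 0
    rare_top = 0
    for _ in range(n_tournaments):
        # Every entrant's skill comes from the same distribution:
        # the character itself contributes nothing to the result.
        entrants = [("popular", rng.gauss(0.0, 1.0)) for _ in range(n_popular)]
        entrants += [("rare", rng.gauss(0.0, 1.0)) for _ in range(n_rare)]
        # Assumption: final placement is decided purely by skill.
        entrants.sort(key=lambda e: e[1], reverse=True)
        for character, _skill in entrants[:top_k]:
            if character == "popular":
                popular_top += 1
            else:
                rare_top += 1
    share = popular_top / (n_tournaments * top_k)
    rate_popular = popular_top / (n_tournaments * n_popular)
    rate_rare = rare_top / (n_tournaments * n_rare)
    return share, rate_popular, rate_rare

if __name__ == "__main__":
    share, rate_popular, rate_rare = simulate_top8()
    print("popular character's share of top-8 slots:", round(share, 3))
    print("per-entrant top-8 rate (popular, rare):",
          round(rate_popular, 3), round(rate_rare, 3))
```

Under these assumptions the popular character should take roughly nine out of ten top-8 slots, simply because it has nine out of ten entrants, while the chance of any single entrant making top 8 is about the same for both characters. Raw counts of placements tell you about popularity at least as much as about character strength.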
Given these limitations, and many others, it is highly inappropriate to make causal inferences based on this data. More appropriate methods of data analysis may include logistic regression, multilevel modeling, or structural equation modeling, which are typically run in statistics software such as Mplus, SYSTAT, or SAS. The use of such statistics could more accurately estimate the so-called “tier” status of characters. If anyone is interested in these statistics or how to apply them to this game, feel free to contact me. Below are some wiki links for a basic understanding of how these statistics work, and the extent of the inferences that can be drawn from them.
http://en.wikipedia.org/wiki/Structural_equation_modeling
http://en.wikipedia.org/wiki/Multilevel_models
http://en.wikipedia.org/wiki/Logistic_regression
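As a rough sketch of why logistic regression helps, here is a toy example in plain Python (standard library only, fitted by simple gradient ascent rather than by any of the packages named above). The data is simulated, and every number is an assumption for illustration: a hypothetical "character A" has no true effect on winning, but its players are stronger on average, and each match comes with a skill gap between the two players that is assumed to be measurable (say, from some rating system).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_matches(n=3000, seed=1):
    """Rows of (uses_a, skill_gap, won). Character A has zero true effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        uses_a = 1.0 if rng.random() < 0.5 else 0.0
        # Confound (assumed): character A attracts stronger players.
        skill_gap = rng.gauss(0.8 if uses_a else 0.0, 1.0)
        p_win = sigmoid(skill_gap)  # only the skill gap drives the result
        won = 1.0 if rng.random() < p_win else 0.0
        rows.append((uses_a, skill_gap, won))
    return rows

def naive_win_rates(rows):
    """Raw win rate with and without character A, ignoring player skill."""
    a = [won for uses_a, _gap, won in rows if uses_a == 1.0]
    b = [won for uses_a, _gap, won in rows if uses_a == 0.0]
    return sum(a) / len(a), sum(b) / len(b)

def fit_logistic(rows, lr=1.0, steps=400):
    """Fit won ~ intercept + uses_a + skill_gap by gradient ascent."""
    b0 = b_char = b_gap = 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g_char = g_gap = 0.0
        for uses_a, gap, won in rows:
            err = won - sigmoid(b0 + b_char * uses_a + b_gap * gap)
            g0 += err
            g_char += err * uses_a
            g_gap += err * gap
        # Step along the average gradient of the log-likelihood.
        b0 += lr * g0 / n
        b_char += lr * g_char / n
        b_gap += lr * g_gap / n
    return b0, b_char, b_gap

if __name__ == "__main__":
    rows = simulate_matches()
    print("naive win rates (A, not A):", naive_win_rates(rows))
    print("fitted (intercept, character, skill gap):", fit_logistic(rows))
```

In this setup the naive win rate makes character A look clearly better, while the fitted character coefficient should land near zero once the skill gap is in the model. The catch is that real tournament data does not come with a clean skill measure, which is where multilevel models (players nested within characters and tournament sites) come in.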