The SBR Online Rating Project

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
Greetings Smash 64 players! The Smash 64 Backroom is going to be kicking off a new ratings project designed to promote friendly competition among Smash 64's online community.

We will be calculating ratings using an adaptation of the Elo rating system. The basic premise of Elo ratings is that they compute relative skill levels using tournament results. Unrated players begin at 1200. Your rating increases when you win and decreases when you lose. The bigger the rating differential between you and your opponent, the more your rating will change after an upset and the less it will change after an expected result.
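
For reference, the standard Elo update can be sketched as below. This is only a minimal sketch of textbook Elo, not the Backroom's adaptation: the K-factor of 32 and the function names are assumptions, since the post does not specify them.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score for `rating` against `opponent` (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Return the new rating after one match.

    score is 1.0 for a win and 0.0 for a loss; k=32 is an assumed K-factor.
    """
    return rating + k * (score - expected_score(rating, opponent))


# Two unrated players both start at 1200, so each is expected to score 0.5.
winner = update(1200, 1200, 1.0)  # 1216.0
loser = update(1200, 1200, 0.0)   # 1184.0
```

The change is proportional to how surprising the result was, which is why an upset moves ratings more than an expected win.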

All TOs may apply for their tournaments to be rated! In order for your tournament to be rated, it must follow these guidelines:
  • Recommended Backroom rules must be used, including the stage striking procedure and stage list.
  • Tournaments must be seeded according to rating. Obviously this won't apply until there is a ratings database.
  • Tournaments must be submitted as completed tio files to nintendude1189@gmail.com. Click here to get the latest release of tio tournament organizer.
  • All names in your tio file must be entered in such a way that there is no confusion about who the players are.
  • All entrants of your tournament must be from the same continent (with reasonable exceptions). Having a single European in a North American tournament will make it ineligible to be rated. Likewise, having Americans in a European tournament will make it ineligible.

Even though regions cannot be mixed for rated events, the ratings list will be universal. A rating calculated vs. European opponents is just as valid as a rating calculated vs. American opponents.

Please feel free to ask any questions you may have about making your tournament an official SBR rated event!

CURRENT RATINGS
Name | Rating
Peek|1214
JaimeHR|1142
Near|1125
Star King|1092
chef|1050
Surri-Sama|1020
Kangaskhan|990
$N@K3|987
Battlecow|975
Nova|945
clubba22|915
Hai|900

Rated events:
SBR Kickoff Event
2011 Send-Off
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
This is a really good idea.

You should combine it with clubba's league idea and with the ladder tho, otherwise we have too many "rating" things going on.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
Ratings serve a different purpose from leagues and other competitive organizations. This is basically going to be the universal system for indicating player strength, modeled after how competitive scenes like chess do it.
 

Sangoku

Smash Master
Joined
Apr 25, 2010
Messages
3,931
Location
Geneva, Switzerland
"Even though regions cannot be mixed for rated events, the ratings list will be universal. A rating calculated vs. European opponents is just as valid as a rating calculated vs. American opponents."

Just wondering, how can you do that? Will the best of Europe and the best of America have the same Elo? If yes, how can you be sure that one isn't significantly better than the other? And if no, how would you compute how much they differ?
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
You just put an (E) or something next to their rating if they're European, etc. That way no one gets mixed up.

Also, forfeited matches shouldn't count for ratings. There are just too many of them and they'd screw up the system.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
Forfeited matches count as a loss. Otherwise people can forfeit matches in order to dodge opponents they think they would lose to.

Sangoku, there obviously are limitations when you have two non-connected groups of players under the same rating system, but the idea is you are rating players in a completely non-biased manner. Who are we to assume that Europe's overall skill level differs from America's overall skill level (ignoring Boom and malva)?
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
Forfeited matches count as a loss. Otherwise people can forfeit matches in order to dodge opponents they think they would lose to.

Sangoku, there obviously are limitations when you have two non-connected groups of players under the same rating system, but the idea is you are rating players in a completely non-biased manner. Who are we to assume that Europe's overall skill level differs from America's overall skill level (ignoring Boom and malva)?
This is going to work poorly.

1. People have to miss matches all the time, and they wouldn't skip to save their rating b/c they like to win tourneys. Example: Boom has to miss a week of the SSBL for Genesis. Bam, he loses mad points, the people who "beat" him go way up, the people he beats next time go down a lot, and the system's ****ed. This happens on a very regular basis, mind you. Every tournament has multiple DQ's.

2. The skill levels are obviously going to be different to some degree. If the Euros aren't playing the Americans, it IS biased to say that beating a certain percentage of Euros is the same as beating a certain percentage of NA'ers. Just differentiate a little bit and everything will be fine.

****ing backroom, swear to god. It would have just killed this project to have had input from us noobs, huh?
 

asianaussie

Smash Hero
Joined
Mar 14, 2008
Messages
9,337
Location
Sayonara Memories
Just take it with a grain of salt.

Not hard unless people try to trumpet their success using the rankings as indisputable evidence, in which case ignore the idiots.
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
Yes, we could take it with a grain of salt. It wouldn't be the end of the world.

Or we could, y'know, differentiate, and then we wouldn't have to, and it would be better.
 

The Star King

Smash Hero
Joined
Nov 6, 2007
Messages
9,681
This is going to work poorly.

1. People have to miss matches all the time, and they wouldn't skip to save their rating b/c they like to win tourneys. Example: Boom has to miss a week of the SSBL for Genesis. Bam, he loses mad points, the people who "beat" him go way up, the people he beats next time go down a lot, and the system's ****ed. This happens on a very regular basis, mind you. Every tournament has multiple DQ's.
Because they like to win tourneys? If I was up against, say, malva, I know I'm going to lose anyways, so if there's no punishment for forfeits, the logical thing to do would be to forfeit and preserve my rating.

Also, it's your problem if you get DQ'ed because of real life obligations. I know I was recently a victim of this myself, but it's my fault for foolishly thinking that I would have time for your tournament. Besides, everybody forfeiting every match they expect to lose will be far more harmful than the hypothetical scenario you gave.

Or we could, y'know, differentiate, and then we wouldn't have to, and it would be better.
How would you suggest doing that? I don't think there's any good solution.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
battlecow, you do realize that a lot of other online gaming communities use very similar systems right? At chess.com there are match forfeits / time outs ALL THE TIME, but that doesn't invalidate their rating system - it is still a very good indicator of skill level.
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
Because they like to win tourneys? If I was up against, say, malva, I know I'm going to lose anyways, so if there's no punishment for forfeits, the logical thing to do would be to forfeit and preserve my rating.

Also, it's your problem if you get DQ'ed because of real life obligations. I know I was recently a victim of this myself, but it's my fault for foolishly thinking that I would have time for your tournament. Besides, everybody forfeiting every match they expect to lose will be far more harmful than the hypothetical scenario you gave.



How would you suggest doing that? I don't think there's any good solution.
If you were up against malva, your rating would suffer only very slightly when you lost. If someone is so good you know you'll lose, the effect on your rating is minimal.
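
To put rough numbers on that, here is a small sketch using the textbook Elo formula. The ratings (1200 and 1800) and the K-factor of 32 are made up for illustration; the project's real parameters aren't stated anywhere in this thread.

```python
def rating_change(rating, opponent, score, k=32.0):
    """Points gained (positive) or lost (negative) under standard Elo.

    score is 1.0 for a win and 0.0 for a loss; k=32 is an assumed K-factor.
    """
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return k * (score - expected)


# Hypothetical ratings: a 1200 player loses to an 1800 player, then to another 1200 player.
loss_vs_favorite = rating_change(1200, 1800, 0.0)  # roughly -1 point
loss_vs_equal = rating_change(1200, 1200, 0.0)     # -16.0 points
```

Under these assumptions, losing to a heavy favorite costs about a sixteenth of what losing to an equal does.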

It's your problem if you get DQ'd, but your misrating affects everyone negatively. The people you play later, etc.

How to differentiate? Just stick an (E) by the Europeans' ratings, etc.

For example:

Olikus would be rated a 1960 (E)

Battlecow would be rated a 9001 (A)

King Funk would be rated a 730 (E)

Star King would be rated an 1190 (A)

Doesn't even need to be that. Just some sort of official acknowledgement that ratings don't translate intercontinentally.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
battlecow, you are making the assumption that the average strength of American and European smashers is unequal. That may be true, but there is no way to actually prove that unless the regions mix. In the event that regions do mix (players travel, etc.), the system will have new information and adjust accordingly with the results. It is a completely unbiased way of rating players since it makes no assumptions.

I have no problem with designating regions though. Actually, I can just put flags next to everyone's name.
 

NovaSmash

Banned via Administration
Joined
Dec 28, 2009
Messages
2,012
Location
Marietta, Ga
3DS FC
2079-8171-3301
wait, so there's not gonna be separate rankings for North America and Europe?
 

The Star King

Smash Hero
Joined
Nov 6, 2007
Messages
9,681
If you were up against malva, your rating would suffer only very slightly when you lost. If someone is so good you know you'll lose, the effect on your rating is minimal.
So? A small deduction in rating is still worse than no change, and the logical thing to do is still forfeit.

It's your problem if you get DQ'd, but your misrating affects everyone negatively. The people you play later, etc.
I want to say no to this, but this might be true. I'm not exactly sure how the rating system works, and that's what it depends on:

If changes in your rating look only at your wins and losses vs other people and their ratings, then no, it doesn't.

If changes in your rating look at your placing relative to other people and their ratings, then yes, it does.

But even if it's the latter, I still think that everybody dropping out of every match they expect to lose would harm the system far more.

How to differentiate? Just stick an (E) by the Europeans' ratings, etc.

For example:

Olikus would be rated a 1960 (E)

Battlecow would be rated a 9001 (A)

King Funk would be rated a 730 (E)

Star King would be rated an 1190 (A)

Doesn't even need to be that. Just some sort of official acknowledgement that ratings don't translate intercontinentally.
Oh, by differentiate I thought you meant a way to make up for the differences in skill level for different regions and use a different base number or something. Bleh never mind, comprehension fail on my part. I guess I agree with you here, then.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
First of all they are ratings, not rankings. Rankings are a measure of absolute skill while ratings are a measure of relative skill. Ratings can be used as a means of ranking people, but rankings can't be used to rate people. Does that make sense?

This is another reason why you can compare ratings cross-continentally. If Maddy ends up with the second highest rating on the list, that doesn't necessarily imply that he is ranked #2.

To actually answer your question, Nova, there will be a single list of ratings but I will designate region.
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
battlecow, you do realize that a lot of other online gaming communities use very similar systems right? At chess.com there are match forfeits / time outs ALL THE TIME, but that doesn't invalidate their rating system - it is still a very good indicator of skill level.
I'm familiar with the system. Chess.com has a gigantic community, so they have a margin of error to work with; one forfeit doesn't have near as much effect (and I'd bet there are a lot fewer forfeits at high levels). More importantly, they're ALL ABOUT the ratings, and there's no tightly-knit community like the SWF 64 one (so people would be far more likely to forfeit and keep their rating). Most importantly of all, the majority of matches on Chess.com are essentially friendlies; 1v1's outside of tourney, where the only consequence of loss is rating-change. That's obviously a totally different system than the one we'd have.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
On chess.com you can choose if you want your match to be rated or not, just like how in SSB64 you can choose whether or not to enter a rated tournament. I don't really see the significance of any of your other points. Smash 64 may be a small community but one of the goals behind this project is to expand and generate more interest - ratings give people something to work towards and a way to gauge their improvement.
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
OK... so you can choose if you want your match rated or not. How is that significant?

Look, on chess.com, most rated matches are friendlies. Forfeiting in those would be a much-abused strategy if it didn't change ratings because there's no tournament that you have to stay in, there's no community calling you out for being a *****, and, oh yeah, you can do it in the middle of a match when you see that you're gonna lose.

Forfeiting wouldn't be abused in this community. Why?

-We all know each other. If someone consistently ****ies out of matches that would hurt their ratings, we'd censure them.

- You want to not get DQ'd from the entire tourney and possibly kept from entering future tourneys, even if it means a tiny drop in rating.

Also, a forfeits-count rule would ruin this system because

-We have far more unavoidable forfeits than Chess.com does (both players have to be online at the same time, and at their own computer with their controller/whatever rather than just in a place with internet access)

-Our community is much smaller, so the effects of an undeserved rating would be more keenly felt.

None of this is hard to understand.

Of course, lag-forfeits would still count, and people who too often couldn't do their matches would be kept from future tournaments.
 

NovaSmash

Banned via Administration
Joined
Dec 28, 2009
Messages
2,012
Location
Marietta, Ga
3DS FC
2079-8171-3301
im still not understanding how u can lump different regions in the same ratings when they dont play each other even if ratings are just a relative measure of skill
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
One of the issues that arises when you completely separate ratings into regions is when players travel. It's nonsense to have that player be considered unrated just because they are playing different opponents. Way down the road, when the ratings are in a database, there will be functionality to separate ratings lists by region.

I'm not really sure what to say battlecow, except I simply disagree with your points. You are treating SSB64 as a bubble community, and basing a lot of your arguments on that. I'm saying that SSB64 can (and hopefully will) be much more than that, and this system must be prepared for that. Not to mention that eventually this project can be assimilated into the overall Smash Elo rating project in which all players can have profiles, with Melee, Brawl, online 64, and console 64 ratings.

If you don't like it, nobody's forcing you to participate. I think that the value of this method will become apparent over time.
 

clubbadubba

Smash Master
Joined
Apr 27, 2011
Messages
4,086
By lumping regions in one rating system, you are making an assumption. You assume that they are comparable, which is much less likely than that they are not comparable. It should be separated by region, like the SC2 ladders, where there is one for each region and they are completely separate.

Forfeits shouldn't count. If someone seriously is gonna forfeit out of a tournament for this, then they have problems, and may be a little too obsessed with how good people think they are at smash. Easy to spot repeat offenders and get rid of them, as battlecow said.

Also, would you take SSBL results in this? Because it says they have to be seeded by rating. My league isn't exactly a tournament, but it is competitive play. Maybe you should change this to include all forms of competitive play, not just tournaments.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
I don't think it's an assumption to say that ratings are comparable because that's pretty much the default standard. It's an assumption to believe otherwise. Kinda hard to explain clearly...
I don't see why you guys have such a problem with this. We aren't ranking players. You can still look at a single list of ratings and pick out the top-3 European players and how they stack up among each other.

There can be exceptions for forfeits. Like, if someone enters and doesn't play a single match, it's as if they didn't even enter the tournament. If someone plays a match and then forfeits the rest then those forfeits count.
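
That forfeit exception could be applied to a results list along these lines. This is just a sketch: the `Match` record and `rateable_matches` are hypothetical names, not anything taken from tio or from the actual rating spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    winner: str
    loser: str
    forfeit: bool = False  # True if the loser forfeited or was DQ'd


def rateable_matches(matches: list[Match]) -> list[Match]:
    """Keep played matches, plus forfeits by players who played at least one match."""
    played = set()
    for m in matches:
        if not m.forfeit:
            played.update((m.winner, m.loser))
    # A forfeit by someone who never played is dropped, as if they never entered.
    return [m for m in matches if not m.forfeit or m.loser in played]


matches = [
    Match("A", "B"),                # played normally
    Match("A", "C", forfeit=True),  # C never played a match: dropped
    Match("D", "B", forfeit=True),  # B played earlier, so this forfeit counts
]
rated = rateable_matches(matches)
```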

And yes your SSBL can be rated. We'll discuss this at a later time because I don't have time right now.
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
I'm not really sure what to say battlecow, except I simply disagree with your points. You are treating SSB64 as a bubble community, and basing a lot of your arguments on that. I'm saying that SSB64 can (and hopefully will) be much more than that, and this system must be prepared for that. Not to mention that eventually this project can be assimilated into the overall Smash Elo rating project in which all players can have profiles, with Melee, Brawl, online 64, and console 64 ratings.

If you don't like it, nobody's forcing you to participate. I think that the value of this method will become apparent over time.
SSB64 is a bubble community. I'd love to see it grow, but that kind of growth, if it happens, will take years. We'll have ample time to adjust if the community gets too large. It's as simple as changing the Elo rating rules by just a smidgeon.

I do like the idea, I just don't like one part of it. Of course I'll participate; I'm only trying to make it so that this will work. It won't work if 25% of all results (that's how many forfeits we had in my America! tourney, close as I can count) are total garbage. And that's what a forfeit is- a garbage result. It'll mess with the system to no end, making the ratings inaccurate to a very high degree.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
I don't think either of us can actually prove our point (that forfeits will or will not make ratings very inaccurate) at this time. I'm sticking with my decision to count forfeits unless someone forfeited every single match. If it becomes apparent that there's a big problem, I may reconsider the policy. Don't forget that ratings become a lot less variable as more matches are played.
 

kys

Smash Ace
Joined
Aug 17, 2009
Messages
660
Location
World Traveler
How many more rated matches are played on chess.com than here? My guess is a lot. We don't have many competitive events right now. They are few and far between, and a change in the ratings based on forfeits will leave people with a long time before they can right the ship.

The community is small enough now that forfeits can be handled on a case by case basis. Everyone knows everyone here, as cow said, and I think the peeps in charge of the ratings system could judge fairly. If we do grow, then the rules can change accordingly. It's always better to remain flexible.
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
So you're saying that you think a ratings system wherein 25% of the results are garbage will work fine?

Also kys' point about the few+farbetween.
 

SheerMadness

Smash Master
Joined
Aug 18, 2005
Messages
4,781
Battlecow, this was one of the two systems the backroom decided to try. The other one is the AiB ladder, which had more votes than this system.

But the main person promoting it, me, lost faith in the backroom after certain things happened and I no longer care about promoting it.

You're welcome to if you want though (if you honestly think this system is worthless). You can even make a thread saying it's backroom sponsored, because technically it is, as we all agreed upon.

I'm fine with giving this system a try. It's not going to be very efficient or aesthetically pleasing until there's an automated website for it though.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
So you're saying that you think a ratings system wherein 25% of the results are garbage will work fine?

Also kys' point about the few+farbetween.
I think your stat is really misleading. First of all it's less than 25%. Secondly, it appears that Glowworm didn't play a single match so he would get dropped. Finally, you are excluding the matches that haven't been played yet, and with all the inactive people already DQ'd, chances are all of those matches will be played. Your stat should actually be less than half of what you put it as.

kys's point is also misleading. Look at an individual player who's playing 3-day per move correspondence games (the standard on chess.com). Do you realize how long it takes to complete these games? I have some 3-day games that have been going on for like 4 or 5 months. At a MINIMUM these games take me 1 month to complete. Now THAT is what I call few and far between. I think I've won about 10% of my matches due to timeouts, but my rating is only inflated by 50 points or so. And the moment I lose to someone rated lower due to being overrated (which is about to happen), that stat will correct itself quickly.

It's really not worth my time to debate if I have to deal with twisted claims.
 

Sangoku

Smash Master
Joined
Apr 25, 2010
Messages
3,931
Location
Geneva, Switzerland
I have to agree with Nintendude. That's how statistics work. If you're making a randomized clinical trial and people change group without permission, you still have to include them in their initial group, as if nothing happened. Or if you measure something in any kind of study, you have to keep it, otherwise you're already affecting the "randomness factor".

The goal is to avoid any bias from the statisticians and I think in the long run this method is valid. Just my little opinion.

Also, the different regions might be a problem too. While it is natural to assume average skill level does not differ, we still all agree that if malva is first in America and fays is first in Europe, their level is not equal. Will they get the same Elo rating? (real question as I don't know the system) Since it's about relative skill I think the answer is yes. Right?
 

clubbadubba

Smash Master
Joined
Apr 27, 2011
Messages
4,086
Just to add some more data to the picture, week 1 of SSBL is over in an hour and a half, and at the moment 4/16 games have not been played and don't look like they are going to be played. Though sometimes I will hand out 2 losses for one match in SSBL, when neither player tries to make the match happen very hard, and I don't think those should count (nor does elo have a way to count that, I don't think).

Also, regarding the "randomness factor," right now it's up to the competition hosts to determine what a forfeit is and when to declare it. Isn't it a conflict of interest to let the host (who is presumably in the rating system) have control over the ratings of 2 other players? Say the host is going to play player A soon, and wants to bump up A's rating before the match. Maybe he would be hastier to declare a forfeit of player B in a match between B and A. I don't think this would actually happen, but I think it's along the lines of players forfeiting a match to save their ratings. They are both possible, but not very probable. The system can be manipulated either way we do it, I guess is what I'm saying.

As far as number of games goes... we are going to have fewer, and I don't think that's in question. Serious chess players play lots of 3 day games at once. SC2 players have hundreds of matches recorded (though that is a slightly different system). We will be working with fewer than 20 matches for most players, based on the number of matches people get in during tournaments and the number of tournaments we actually do. Point being, every match should be accurate and meaningful, i.e. forfeits don't count. Maybe have a maximum number of forfeits over a time interval before you are DQ'd from the rating system?
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
With respect to intercontinental ratings, consider the real-life analog of this system: tournaments that feature only players from a single continent. If there's ever mixing of continents, that means that a player traveled to another region. By disallowing intercontinental play to be rated, I am basically mimicking this real-life scenario. I think most people would agree that for console play, there's no question that there must be a universal rating system. Imagine what chess would be like if each continent had their own set of ratings. Separating tournaments into tiers of ratings would be chaotic.

clubba, in response to your own event, I may reconsider rating it. The reason is that players signed up for it not knowing that it would be rated. I think you guys are also ignoring that many people will care about their rating (especially if it is used to seed, which will be a requirement for rated events) and will have further incentive to actually play their matches since there is now something on the line at all times.

I don't really understand your point about manipulation. Bracket matches will be rated sequentially and pools will be rated simultaneously. With clear guidelines, there's no "manipulation."
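
The difference between rating sequentially and rating simultaneously can be sketched like this. It assumes textbook Elo with a K-factor of 32 and uses hypothetical function names; the actual adaptation may differ.

```python
def expected(r_a, r_b):
    """Standard Elo expected score for a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rate_sequential(ratings, results, k=32.0):
    """Bracket style: each result is scored with the ratings left by the previous one."""
    ratings = dict(ratings)
    for winner, loser in results:
        delta = k * (1.0 - expected(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


def rate_simultaneous(ratings, results, k=32.0):
    """Pool style: every result is scored against the pre-event ratings."""
    deltas = {player: 0.0 for player in ratings}
    for winner, loser in results:
        delta = k * (1.0 - expected(ratings[winner], ratings[loser]))
        deltas[winner] += delta
        deltas[loser] -= delta
    return {player: ratings[player] + deltas[player] for player in ratings}


start = {"A": 1200.0, "B": 1200.0, "C": 1200.0}
results = [("A", "B"), ("A", "C")]  # A beats B, then A beats C
bracket = rate_sequential(start, results)
pool = rate_simultaneous(start, results)
```

In the pool version A gains a full 16 points for each win and ends at 1232. In the bracket version the second win is worth slightly less, because A is already rated above C by the time it is scored.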
 

Battlecow

Play to Win
Joined
May 19, 2009
Messages
8,746
Location
Chicago
The TO can manipulate **** for his own benefit, is what he's saying.

If my stat about the America tourney was misleading, it's because I just counted the losses and the total matches and didn't think too hard.

I'm fine with universal ratings as long as it's clear who's European and who's North American, and that they aren't the same thing.
 

clubbadubba

Smash Master
Joined
Apr 27, 2011
Messages
4,086
I meant that whether or not forfeits are counted, there is always someone who will be able to use that to their advantage.

About the league, that's okay. Maybe next season I'll let everyone know that it will be rated beforehand.

Also, there might be a problem with seeding tournaments using the rating system. With chess, it's not just tournament play (at least for online chess). They play serious games online apart from tourneys that count for their rating. Problem is, with our competitive system, almost all of our games are tournament play. Which means that the worst players do nothing but play the best players in the first round of every tournament. Winnable rated matches are scarce for them. Middle of the pack players, however, play far more winnable games against each other. I don't think the problem is in seeding tournaments, I think the problem is that nobody plays serious matches outside of tourneys. It would be nice if 2 players could decide to play a rated match whenever they wanted and report it back to you. Not trying to knock the system, I think it's a great idea.
 

Nintendude

Smash Hero
Joined
Feb 23, 2006
Messages
5,024
Location
San Francisco
One of the ideas I had was to have BR-hosted round robins, either weekly or biweekly. They wouldn't really be tournaments, just collections of rated matches. What's good about round robins is that nobody can intentionally dodge opponents; if there were only individual matches, people can easily be selective about their opponents.
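
A round robin like that could be scheduled with the usual circle method. This is a sketch only: the function name is hypothetical and the player list is just borrowed from the current ratings table.

```python
def round_robin(players):
    """Circle method: return a list of rounds, each a list of (p1, p2) pairings."""
    roster = list(players)
    if len(roster) % 2:
        roster.append(None)  # None marks a bye when the player count is odd
    n = len(roster)
    rounds = []
    for _ in range(n - 1):
        pairs = [(roster[i], roster[n - 1 - i]) for i in range(n // 2)]
        rounds.append([p for p in pairs if None not in p])
        # Keep the first player fixed and rotate everyone else one seat.
        roster = [roster[0]] + [roster[-1]] + roster[1:-1]
    return rounds


schedule = round_robin(["Peek", "JaimeHR", "Near", "Star King"])
```

Every player meets every other player exactly once, so nobody gets to choose their opponents.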

Also, for the seeding, keep in mind that when a 16 seed loses to a 1 seed there's almost no rating change. Then when that 16 seed is in the losers bracket, he'll play other people that lost in the first round and have a shot at boosting his rating. The top seeds, while they play more matches, don't really get much of a boost from their first couple matches. Of course, this is all more realistic once the ratings become established.
 