The problem with your and Darklin401's observations is that you're not adequately controlling for the fact that, as amiibos level up, their behavior will naturally change from that of a lv.1 AI to a lv.9 AI. Or something similar. The reason they start playing smarter is because they're switching to a superior set of pre-defined scripts. Even inside those pre-defined scripts, there's a great deal of randomness, which would explain why the AI sometimes does one thing and not another. It also explains why it's so easy for people to see patterns in the noise.
There is very little evidence to support the theory that amiibos are just like normal CPUs; in fact, a lot of evidence contradicts it. Well-trained amiibos are notably superior to their Level 9 counterparts and will consistently outplay them. Furthermore, the normal CPU does not tend to develop strategies: it follows the simple pattern of "approach opponent, then attack opponent," just acting faster and smarter as the level goes up. Meanwhile, my Link amiibo clearly alternates between a "projectile spam mode," where it does things like move to one side of the platform and spam projectiles in various ways, and a "melee mode," where it runs in and tries to finish off opponents with a sword. I don't see CPU players doing that at all, ever.
In short, this statement blatantly contradicts already established evidence.
That's not true at all. They're the same thing, except that the latter additionally involves code to evaluate the success or failure of actions. And like I said, if they had that code, it would be way smarter for them to just use regular old reinforcement learning.
Dude, what you basically said was "They're not using reinforcement learning, they're using reinforcement learning!" Dafuq do you think reinforcement learning is? Look, I'm an experienced programmer and I have a pretty good idea of what is and is not possible. If you don't even know what Big O or O(1) means in terms of measuring program efficiency, you're not anywhere near my level. Status-to-response mapping (different people might call it different things) is pretty basic, can be done in constant time (in layman's terms, very quickly), and is exactly the kind of thing I'd expect from Nintendo. People have already used such learning AIs to learn simple games like checkers. (As a side note, most chess AIs use a different method based on completely different properties.)
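To make the "constant time" point concrete, here's a rough sketch of what a status-to-response table could look like. Everything here is illustrative: the state names, actions, and weights are made up, and nobody outside Nintendo knows the real data structure.

```python
import random

# Hypothetical status-to-response table: each observed game state maps to a
# list of (action, weight) pairs. A dict lookup is O(1) on average, so picking
# a response takes constant time no matter how many states are stored.
response_table = {
    "opponent_offstage":    [("edge_guard", 0.7), ("charge_smash", 0.3)],
    "opponent_shielding":   [("grab", 0.8), ("wait", 0.2)],
    "opponent_approaching": [("projectile", 0.5), ("shield", 0.5)],
}

def pick_response(status):
    """Constant-time lookup, then a weighted random choice among responses."""
    options = response_table.get(status, [("idle", 1.0)])
    actions, weights = zip(*options)
    return random.choices(actions, weights=weights)[0]

print(pick_response("opponent_shielding"))  # usually "grab"
```

The weighted random choice is also why two runs of the "same" AI can behave differently: the mapping is fixed, but the response it samples is not.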
Dude. No. That was not learning, or the amiibo imitating you. That was the AI being ********. Supervised learning methods take dozens of repetitions AT BEST to learn something. Hundreds is usually more realistic. It's absolutely not going to copy you after you do something once. What kind of genius computer scientists do you think Nintendo or Bamco employs? That wasn't learning, that was you seeing Jesus in a jar of mayonnaise. It's just random noise.
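The "dozens of repetitions" claim is easy to demonstrate with a toy model. This is not the amiibo's actual algorithm, just the simplest possible learner: a one-weight model trained by gradient descent on a single repeated example. One repetition barely moves it; it takes many to actually learn the mapping.

```python
# Toy illustration: a one-weight model learning y = 2*x by gradient descent
# on squared error. The learning rate and target are arbitrary choices.
def train(steps, lr=0.05):
    w = 0.0
    for _ in range(steps):
        x, y = 1.0, 2.0                  # the same training example, repeated
        w -= lr * 2 * (w * x - y) * x    # gradient step on (w*x - y)^2
    return w

print(train(1))    # ~0.2  -- one repetition barely changes behavior
print(train(100))  # ~2.0  -- many repetitions to actually learn
```

Small per-example updates are the norm in supervised learning precisely because one noisy example shouldn't overwrite everything already learned, which is why one-shot imitation of a player is implausible.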
You clearly didn't see what I saw. It mistook my mistake for an effective tactic and deliberately went from "standing perfectly still and watching me die" to "jumping into the waterfall." There's no question in my mind. Considering some of the other things you've said, I'm not really inclined to believe your statistics. You're ignoring the fact that these amiibo are learning in VERY controlled circumstances, where they only have to worry about maybe 20-30 things happening at once. The numbers you give are more believable for AI that has to interact with the real world, where the number of uncontrolled variables can be ridiculous. Computers can easily deal with a small number of variables in real time.
It saw me do something new and wanted to see if it was effective. It experimented. It learned that it was not effective and went back to the tactics that it already knew.
Pretty sure that's just the level 1 CPU being stupid.
Again, I really doubt that the amiibo mimic the normal CPUs in any meaningful way. There are a ton of habits that the level 9 CPUs have that the amiibo clearly don't, such as psychic shield timing or dodge timing. Yes the amiibo dodge well and shield well, but none of the blatant cheating that the CPUs are actually capable of.
If this observation's actually legit, as opposed to more randomness, the more likely explanation would be that the amiibo switched from a level 3 CPU to a level 4 one (for example), and they have different scripts for projectile use. If there WERE reinforcement learning going on (basically learning by experimentation), it wouldn't make sense for the amiibo to start experimenting with moves in chunks. No method I've heard of does it that way.
You clearly don't understand how reinforcement learning works, but we've already established that. It makes perfect sense to limit move learning to chunks. Put another way, it makes a lot more sense to limit the amiibo to, say, normals at first. By focusing on one thing, the amiibo can get a pretty good grasp on it before deciding to expand its horizons. We already know that the amiibo AI, post-training, is superior to the default CPU, so there's clearly something more going on than just "it becomes a higher-level CPU." The amiibo clearly changes its behavior according to what does and does not work, and I've been able to prove that to myself through dozens of controlled experiments.
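For what "changes its behavior according to what does and does not work" might look like mechanically, here's a minimal value-update sketch. This is a standard tabular RL idea, not anybody's confirmed knowledge of the amiibo internals; the actions, reward values, and learning rate are all made up.

```python
# Each action keeps a running value estimate; successes raise it, failures
# lower it, and the agent prefers whichever action currently scores highest.
values = {"grab": 0.0, "smash": 0.0, "projectile": 0.0}
ALPHA = 0.3  # learning rate: how fast new results override old estimates

def update(action, reward):
    values[action] += ALPHA * (reward - values[action])

def best_action():
    return max(values, key=values.get)

# Simulate training: grabs keep landing, smashes keep getting shielded.
for _ in range(10):
    update("grab", 1.0)
    update("smash", -1.0)

print(best_action())  # "grab"
```

Note how little machinery this needs: a small table and one arithmetic update per outcome, which is cheap enough to run on any console.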
Keep in mind that humans learn through reinforcement learning, or "conditioning" as psychologists like to call it. Humans also like to learn things in chunks before moving on to other things. We learn normals, smashes, aerials, and specials, and only after that do we move on to ATs and combos. Also consider the education system: you don't learn Calculus at the same time that you learn Arithmetic. No, you learn the "chunk" of Arithmetic, then Algebra or Geometry, then Calc, and so on. Learning in chunks is the smarter way to do it. In fact, for any sufficiently complex topic, learning in chunks is the only way to do it.
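If you wanted to implement chunked learning, one simple (and purely speculative) scheme is a curriculum: only one category of moves is available at first, and the next category unlocks once the current one is reasonably mastered. The chunk contents and the 0.6 threshold below are invented for illustration.

```python
# Hypothetical curriculum: move categories unlock in order.
CHUNKS = [["jab", "tilt"], ["smash"], ["aerial"], ["special"]]

def available_moves(mastery):
    """Unlock the next chunk only after the current chunk's average
    mastery score (0.0 to 1.0) passes 0.6."""
    moves = []
    for chunk in CHUNKS:
        moves.extend(chunk)
        avg = sum(mastery.get(m, 0.0) for m in chunk) / len(chunk)
        if avg < 0.6:
            break  # stop unlocking: current chunk not mastered yet
    return moves

print(available_moves({"jab": 0.9, "tilt": 0.8}))  # ['jab', 'tilt', 'smash']
```

This would also explain why moves appear to come online in bursts rather than one at a time.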
Like I said, I tried to train my Link as a grabber by doing nothing but shielding and using grounded down-B as Kirby. He was much more successful when grabbing me than when using other attacks, but as far as I could tell, he did not grab any more than a regular CPU does.
Because you did not do it correctly. If you wanted to train him as a grabber, you should have picked Link, attacked only via grabs, and then tried to shield everything the amiibo did. It would have imitated the successful grab, then found success on its own because you kept shielding. The way you did it, maybe it would grab, or maybe it'd do other things. Even if it found out that grabs were effective, it would still experiment on its own until it realized that other tactics failed. Unless it sees you grabbing, and having success with grabbing, it's not going to have any reason to think that grabbing is effective.
I'm not yet ready to rule out learning entirely. Like I said, the amiibo's scripted behaviors are quite complicated and fairly random, which can make it difficult to see whether or not there's a real pattern to their madness. But I haven't found any decent evidence in favor of learning yet.
The randomness is the amiibo randomly trying new things to see what's effective. If it doesn't know what to do, or wants to try a new move, it randomly selects something to try.
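There's a textbook name for "usually do what works, occasionally try something random": epsilon-greedy action selection. The sketch below shows the idea; it's a standard RL technique offered as a possibility, not the amiibo's confirmed behavior, and the action values are invented.

```python
import random

def choose(values, epsilon=0.1):
    """With probability epsilon, explore (pick any action at random);
    otherwise exploit (pick the best-scoring known action)."""
    if random.random() < epsilon:
        return random.choice(list(values))  # explore: try something new
    return max(values, key=values.get)      # exploit: use what works

vals = {"grab": 0.8, "smash": -0.2, "projectile": 0.1}
print(choose(vals))  # "grab" most of the time, occasionally something else
```

Under this kind of scheme, the occasional out-of-character move isn't noise in the usual sense: it's a deliberate, if random, experiment.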
Edit: there might not be learning going on, but it's still possible that there are scripted behaviors that are amiibo-exclusive. It would be nice if, say, amiibos could go up to being a level 10 CPU or something.
I highly doubt it's just the same CPU but better, as it doesn't follow the simple pattern of approach->destroy that CPUs tend to do.