
Media AI learning to play smash in real time

MeleeNeat

Smash Rookie
Joined
Jan 11, 2021
Messages
1
Just figured I'd drop a link in here for anyone interested in following along with this project. I've included a quick write-up of what the project is.
This is a NEAT algorithm implementation running in a Kotlin server environment. The Kotlin server connects to a Python interface that uses the libmelee library to read from and write to an active Melee instance directly. You can learn more about NEAT here.

The AI is configured with 53 inputs and 9 outputs. This experiment started with a population of 200. The inputs are 26 data points for player 1 and 26 for the other player, plus one distance metric, for a total of 53.

Each player sends the following data: stock, x, y, speedXAir, speedXGround, speedXAttack, speedYAttack, speedY, percent, facingRight (bool), 8 values defining the environmental collision box, currentAction, isAttack (bool), isGrab (bool), isBMove (bool), isShield (bool), rangeBackward (of action), rangeForward (of action), hitboxCount (of action).
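To make the input layout concrete, here is a rough Python sketch of how the 53 values could be assembled. It is not the project's actual code: the PlayerFrame class, the snake_case field names, and build_inputs are hypothetical, and the distance metric is assumed to be the Euclidean distance between the two players' positions, which the write-up does not specify. No normalization is applied here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PlayerFrame:
    """Hypothetical container for the 26 per-player values listed above."""
    stock: float = 0.0
    x: float = 0.0
    y: float = 0.0
    speed_x_air: float = 0.0
    speed_x_ground: float = 0.0
    speed_x_attack: float = 0.0
    speed_y_attack: float = 0.0
    speed_y: float = 0.0
    percent: float = 0.0
    facing_right: bool = False
    # 8 values defining the environmental collision box
    ecb: Tuple[float, ...] = field(default_factory=lambda: (0.0,) * 8)
    current_action: float = 0.0
    is_attack: bool = False
    is_grab: bool = False
    is_b_move: bool = False
    is_shield: bool = False
    range_backward: float = 0.0
    range_forward: float = 0.0
    hitbox_count: float = 0.0

    def to_inputs(self) -> List[float]:
        """Flatten this player into 26 floats (bools become 0.0 or 1.0)."""
        if len(self.ecb) != 8:
            raise ValueError("expected 8 collision box values")
        values = [
            self.stock, self.x, self.y,
            self.speed_x_air, self.speed_x_ground,
            self.speed_x_attack, self.speed_y_attack, self.speed_y,
            self.percent, self.facing_right,
            *self.ecb,
            self.current_action,
            self.is_attack, self.is_grab, self.is_b_move, self.is_shield,
            self.range_backward, self.range_forward, self.hitbox_count,
        ]
        return [float(v) for v in values]


def build_inputs(p1: PlayerFrame, p2: PlayerFrame) -> List[float]:
    """26 values per player plus one distance value gives 53 inputs."""
    # Assumption: Euclidean distance between the two players' positions.
    distance = math.hypot(p1.x - p2.x, p1.y - p2.y)
    return p1.to_inputs() + p2.to_inputs() + [distance]
```

Calling build_inputs with two PlayerFrame objects returns a flat list of 53 floats, with the distance value in the last slot.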

The AI receives the game state every frame and replies with a vector of controller commands. Indices 0-8 of this vector map to [A, B, Y, Z, cStickX, cStickY, mainStickX, mainStickY, leftShoulder]. The buttons A, B, Y, and Z are digital, so each output is a 0-1 value that rounds up or down at .5. The remaining outputs are analog and map from 0 to 1, with the sticks in their neutral position at .5. The shoulder (shield) button gets special handling because of the impact of shielding. The mapping is as follows:
If x < .5 THEN f(x) = 0
If x >= .5 THEN f(x) = 2(x - .5)
This maps the .5-1 range of the output onto the controller's 0-1 range, with 1 being a digital press.
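The output mapping above can be sketched in a few lines of Python. This is only an illustration and not the project's code: decode_outputs, shoulder_value, and clamp01 are hypothetical names, a value of exactly .5 is assumed to count as a press, and clamping out-of-range values to 0-1 is also an assumption.

```python
from typing import Dict, Sequence, Union

# Order of the 9 outputs as described in the post.
OUTPUT_NAMES = [
    "A", "B", "Y", "Z",
    "cStickX", "cStickY", "mainStickX", "mainStickY",
    "leftShoulder",
]
DIGITAL_BUTTONS = {"A", "B", "Y", "Z"}


def clamp01(value: float) -> float:
    """Keep a raw network output inside the 0-1 range (assumption)."""
    return max(0.0, min(1.0, value))


def shoulder_value(x: float) -> float:
    """Shield mapping: 0 below .5, otherwise f(x) = 2(x - .5)."""
    x = clamp01(x)
    if x < 0.5:
        return 0.0
    return 2.0 * (x - 0.5)


def decode_outputs(outputs: Sequence[float]) -> Dict[str, Union[bool, float]]:
    """Turn the 9 raw outputs into named controller commands."""
    if len(outputs) != len(OUTPUT_NAMES):
        raise ValueError("expected 9 outputs")
    commands: Dict[str, Union[bool, float]] = {}
    for name, raw in zip(OUTPUT_NAMES, outputs):
        if name in DIGITAL_BUTTONS:
            # Digital button: pressed when the output is at or above .5.
            commands[name] = raw >= 0.5
        elif name == "leftShoulder":
            commands[name] = shoulder_value(raw)
        else:
            # Analog stick axis: 0-1, with .5 as neutral.
            commands[name] = clamp01(raw)
    return commands
```

For example, an output of .75 on the shoulder slot becomes a half press (.5), while anything under .5 leaves the shield released.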

The evaluation in this rendition uses a time-growing boundary and a score linked to damage. The damage score is adjusted by various events such as getting a kill, dying, the ratio of damage dealt to damage taken, and so on (I will return to this section to get more specific here). The time-growing boundaries are controlled by clock mechanisms, which act as a playtime reward but are not necessarily linked to the score reward. The agent does not know the criteria it is being evaluated on.

Feel free to ask any questions, and please help me make it easier for new viewers to understand what is going on in the stream. Many improvements to come.
 