What’s New in Abmarl

Abmarl version 0.2.7 features the new Smart Simulation and Registry, which streamlines creating simulations by allowing components to be specified at the simulation’s initialization; a new Ammo Agent that restricts how many attacks an agent can issue during the simulation; the ability to barricade a target with barriers at the start of a simulation; and an updated Debugger that outputs the log file by event, so you can see each action and state update in order.

Smart Simulation and Registry

Previously, changing a component in the simulation required a change to the simulation definition. For example, changing between the PositionCenteredEncodingObserver and the AbsoluteEncodingObserver in the Team Battle Simulation required users to manually change the simulation definition or to define multiple simulations that were identical except for the observer. The Smart Simulation streamlines creating simulations by allowing components to be specified at the simulation’s initialization instead of in the simulation definition. This avoids a workflow issue where the config file written to an output directory includes a different version of the simulation than the one used in training because the user changed the simulation definition between training runs.

States, Observers, and Dones can be given at initialization as the class (e.g. TargetDone). Any registered component can also be given as the class name (e.g. "TargetDone"). All built-in features are automatically registered, and users can register custom components.


The Smart Simulation does not currently support Actors, so those must still be defined in the simulation definition.
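The registry pattern described above can be sketched in plain Python. This is an illustrative sketch, not Abmarl’s actual API: the class and function names here are hypothetical stand-ins showing how a registry lets a simulation accept a component either as a class or as a registered class name.

```python
# Illustrative sketch of the registry pattern (not Abmarl's actual API).
# Component classes register under their class names so a simulation can
# accept either the class itself or its registered name as a string.

registry = {}

def register(cls):
    """Register a component class under its class name."""
    registry[cls.__name__] = cls
    return cls

@register
class TargetDone:
    """Toy stand-in for a done-condition component."""
    pass

def resolve(component):
    """Accept a component given as a class or as a registered class name."""
    if isinstance(component, str):
        return registry[component]
    return component

# Both forms resolve to the same class:
assert resolve(TargetDone) is resolve("TargetDone")
```

Accepting either form lets configs stay serializable (strings) while interactive code can pass classes directly.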

Ammo Agents

Ammo Agents have limited ammunition that determines how many attacks they can issue per simulation. The Attack Actors interpret ammunition together with simultaneous attacks, so users can control both how many attacks an agent can issue per step and, with the addition of Ammo Agents, how many it can issue during the entire simulation. Agents that have run out of ammo can still choose to attack, but that attack will be unsuccessful.
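The out-of-ammo behavior can be sketched as follows. This is a minimal illustration of the rule described above, not Abmarl’s implementation; the names `AmmoAgent` and `process_attack` are hypothetical.

```python
# Sketch (not Abmarl's implementation) of how an attack actor might
# interpret ammo: the agent can always *choose* to attack, but the
# attack only succeeds while ammo remains.

class AmmoAgent:
    def __init__(self, initial_ammo):
        self.ammo = initial_ammo

def process_attack(agent):
    """Return True if the attack succeeds; out-of-ammo attacks fail."""
    if agent.ammo <= 0:
        return False  # attack was chosen, but it is unsuccessful
    agent.ammo -= 1
    return True

agent = AmmoAgent(initial_ammo=2)
assert process_attack(agent)      # succeeds, ammo -> 1
assert process_attack(agent)      # succeeds, ammo -> 0
assert not process_attack(agent)  # still allowed to attack, but it fails
```

Keeping the attack action valid (rather than masking it out) means the agent must learn from failed attacks that its ammo is spent.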

Target Barricading

Similar to the MazePlacementState, Abmarl now includes the ability to cluster barriers around the target in such a way that the target is completely enclosed. For example, a target with 8 barriers will have a single layer of barricade, 24 barriers two layers, 48 barriers three layers, and so on (with some variation if the target starts near an edge or corner). The following animation shows some example starting states using the TargetBarriersFreePlacementState:


Animation showing a target (green) starting at random positions at the beginning of each episode. Barriers (gray squares) completely enclose the target. Free agents (blue and red) are scattered far from the target.
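The barrier counts above follow simple grid arithmetic. On a square grid, the k-th ring around a cell contains (2k+1)² − (2k−1)² = 8k cells, so n full enclosing layers need 8 + 16 + … + 8n = 4n(n+1) barriers. A short sketch (not Abmarl code; the function names are hypothetical):

```python
# Sketch of the barrier-count arithmetic from the text (not Abmarl code).
# The k-th square ring around a cell has (2k+1)**2 - (2k-1)**2 = 8k cells,
# so n full layers need 8 + 16 + ... + 8n = 4n(n+1) barriers.

def barriers_for_layers(n):
    """Barriers needed to fully enclose a target with n layers."""
    return 4 * n * (n + 1)

def full_layers(num_barriers):
    """How many complete enclosing layers num_barriers can form."""
    n = 0
    while barriers_for_layers(n + 1) <= num_barriers:
        n += 1
    return n

assert barriers_for_layers(1) == 8
assert barriers_for_layers(2) == 24
assert barriers_for_layers(3) == 48
assert full_layers(30) == 2  # 24 barriers form two layers; 6 are left over
```

As the text notes, targets near an edge or corner need fewer barriers per layer, so these counts are the free-field case.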

Debugging by Event

Abmarl’s Debugger now outputs log files by agent and by event to the output directory. The file Episode_by_agent.txt organizes the data by type and then by agent, so one can see all the observations a specific agent made during the simulation, or all the actions it took. Episode_by_event.txt, on the other hand, shows the events in order, starting with reset and moving through each step.
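The two orderings are different views of the same event stream. The sketch below uses a hypothetical record format (not the Debugger’s actual file layout) to show how one stream can be grouped by agent or kept in event order.

```python
# Sketch of the two log orderings described above. The record format
# here is hypothetical, not the Debugger's actual file layout.

from collections import defaultdict

events = [
    ("reset", "agent0", "observation"),
    ("reset", "agent1", "observation"),
    ("step 1", "agent0", "action"),
    ("step 1", "agent1", "action"),
]

# By agent: all of one agent's records are grouped together.
by_agent = defaultdict(list)
for step, agent, kind in events:
    by_agent[agent].append((step, kind))

# By event: records stay in simulation order, reset first.
by_event = [f"{step}: {agent} {kind}" for step, agent, kind in events]

assert by_agent["agent0"] == [("reset", "observation"), ("step 1", "action")]
assert by_event[0] == "reset: agent0 observation"
```

The by-agent view answers "what did this agent see and do?", while the by-event view answers "what happened at this moment?".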


Interface changes

Other Features

  • Abmarl provides a custom box space that returns True when checking whether a single numeric value is in a Box space with dimension 1. That is, Abmarl’s Box does not distinguish between [24] and 24; both are in, say, Box(-3, 40, (1,), int).

  • MazePlacementState can take the target agent by object or by id, which is useful in situations where one does not have the target object, such as when building the simulation from an array with an object registry.

  • A new TargetDestroyedDone condition, similar to the existing TargetAgentDone, except that the target must become inactive in order for the agent to be considered done.

  • Enhanced Abmarl’s RLlib wrapper to produce fewer warnings when training with RLlib.
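The scalar-friendly Box check described in the first bullet above can be sketched in plain Python. This is a hedged illustration of the behavior, not Abmarl’s actual class.

```python
# Sketch of the scalar-friendly Box membership check described above
# (not Abmarl's actual class). A 1-dimensional box treats the scalar
# 24 and the array [24] the same when testing membership.

import numpy as np

class Box:
    def __init__(self, low, high, shape, dtype):
        self.low, self.high, self.shape, self.dtype = low, high, shape, dtype

    def contains(self, x):
        x = np.asarray(x, dtype=self.dtype).reshape(-1)  # scalar -> [scalar]
        if x.shape != self.shape:
            return False
        return bool(np.all(self.low <= x) and np.all(x <= self.high))

box = Box(-3, 40, (1,), int)
assert box.contains(24)        # bare scalar accepted
assert box.contains([24])      # one-element array accepted
assert not box.contains(50)    # out of range
assert not box.contains([1, 2])  # wrong shape
```

Flattening the input before the shape check is what makes 24 and [24] indistinguishable to the space.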

Bug fixes

  • The TurnBasedManager no longer expects output from non-learning agents, that is, entities in the simulation that are not observing or acting.

  • Inactive agents no longer block other agents.

  • The Debug command line interface now makes use of the -s argument, which specifies simulation horizon (i.e. max steps to take in a single run).