What’s New in Abmarl
Abmarl version 0.2.7 features the new Smart Simulation and Registry, which streamlines creating simulations by allowing components to be specified at the simulation’s initialization; a new Ammo Agent that restricts how many attacks an agent can issue during the simulation; the ability to barricade a target with barriers at the start of a simulation; and an updated Debugger that outputs the log file by event, so you can see each action and state update in order.
Smart Simulation and Registry
Previously, changing a component in the simulation required a change to the simulation definition. For example, switching between the PositionCenteredEncodingObserver and the AbsoluteEncodingObserver in the Team Battle Simulation required users either to edit the simulation definition by hand or to define multiple simulations that were identical except for the observer. The Smart Simulation streamlines creating simulations by allowing components to be specified at the simulation’s initialization instead of in the simulation definition. This avoids workflow issues in which the config file in an output directory contains a different version of the simulation than the one used in training because the user changed the simulation definition between training runs.
States, Observers, and Dones can be given at initialization as the class (e.g. TargetDone). Any registered component can also be given as the class name (e.g. "TargetDone"). Built-in features are automatically registered, and users can register custom components.
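The idea of accepting a component either as a class or as a registered class name can be illustrated with a small, self-contained sketch. Everything here (the `register` decorator, `resolve`, `SmartSimSketch`, and the stand-in done classes) is hypothetical and written only to show the pattern; it is not Abmarl's actual registry API.

```python
# Illustrative registry pattern. All names are hypothetical stand-ins,
# not Abmarl's real classes or functions.
_REGISTRY = {}


def register(cls):
    """Register a component class under its class name and return it."""
    _REGISTRY[cls.__name__] = cls
    return cls


def resolve(component):
    """Accept either a class or the name of a registered class."""
    if isinstance(component, str):
        if component not in _REGISTRY:
            raise ValueError(f"{component!r} is not a registered component")
        return _REGISTRY[component]
    if isinstance(component, type):
        return component
    raise TypeError("component must be a class or a registered class name")


@register
class TargetDone:
    """Stand-in for a built-in done condition."""


@register
class MyCustomDone:
    """Stand-in for a user-defined done condition."""


class SmartSimSketch:
    """Chooses its done condition at initialization, not in its definition."""

    def __init__(self, done=TargetDone):
        self.done = resolve(done)()


sim_a = SmartSimSketch(done=TargetDone)      # given as the class
sim_b = SmartSimSketch(done="TargetDone")    # given as the class name
sim_c = SmartSimSketch(done="MyCustomDone")  # custom registered component
```

With this pattern, swapping one component for another is a change to the arguments passed at initialization rather than a change to the simulation definition.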
Ammo Agents
Ammo Agents have limited ammunition that determines how many attacks they can issue per simulation. The Attack Actors interpret the ammunition in conjunction with simultaneous attacks, so it is possible to control both how many attacks can be issued per step and, with Ammo Agents, how many attacks can be issued during the entire simulation. Agents that have run out of ammo can still choose to attack, but that attack will be unsuccessful.
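The interaction between a per-step limit and a per-simulation ammunition budget can be sketched as follows. This is a minimal illustration under stated assumptions: the `AmmoAgentSketch` class and `issue_attacks` function are hypothetical, and the sketch assumes that each issued attack consumes one round of ammunition, which the text above does not specify.

```python
from dataclasses import dataclass


@dataclass
class AmmoAgentSketch:
    """Hypothetical agent with a total ammo budget and a per-step attack cap."""
    ammo: int
    simultaneous_attacks: int = 1


def issue_attacks(agent, requested):
    """Return how many of the requested attacks are actually issued this step.

    Assumption: each issued attack consumes one round of ammunition.
    """
    # The per-step cap limits how many attacks may be attempted at once.
    allowed = min(requested, agent.simultaneous_attacks)
    # The remaining ammunition limits how many of those can be issued.
    issued = min(allowed, agent.ammo)
    agent.ammo -= issued
    return issued


agent = AmmoAgentSketch(ammo=3, simultaneous_attacks=2)
# The agent tries to attack twice on each of three steps.
issued_per_step = [issue_attacks(agent, 2) for _ in range(3)]
```

In this run the agent issues two attacks on the first step, one on the second (only one round is left), and none on the third: it may still choose to attack, but nothing is issued.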
Target Barricading
Similar to the MazePlacementState, Abmarl can now cluster barriers around the target so that the target is completely enclosed. For example, 8 barriers provide a single layer of barricade, 24 barriers two layers, 48 barriers three, and so on (with some variation if the target starts near an edge or corner).
[Animation: example starting states using the TargetBarriersFreePlacementState]
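The barrier counts of 8, 24, and 48 follow from counting the cells in square rings around a single target cell. The helper below is a hypothetical illustration of that arithmetic, assuming a square grid and a target far enough from the edges that every ring is complete.

```python
def barriers_for_layers(layers: int) -> int:
    """Cells needed to enclose one target cell with `layers` complete rings.

    A block of side (2 * layers + 1) centered on the target contains
    side * side cells; all of them except the target itself are barriers.
    """
    if layers < 0:
        raise ValueError("layers must be non-negative")
    side = 2 * layers + 1
    return side * side - 1


counts = [barriers_for_layers(k) for k in (1, 2, 3)]  # [8, 24, 48]
```

Near an edge or corner some ring cells fall outside the grid, which is one way to account for the variation mentioned above.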
Debugging by Event
Abmarl’s Debugger now outputs log files by agent and by event to the output directory. The file Episode_by_agent.txt organizes the data by type and then by agent, so one can see all the observations made by a specific agent during the simulation, or all the actions made by an agent during the simulation. Episode_by_event.txt, on the other hand, shows the events in order, starting with reset and moving through each step.
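The difference between the two views can be sketched with a small set of made-up records. The record layout and the `by_event` and `by_agent` helpers are hypothetical and are not the Debugger's actual file format; they only show how the same data can be ordered by event or grouped by type and agent.

```python
# Made-up episode records; the field names are assumptions for illustration.
events = [
    {"step": 0, "agent": "agent0", "type": "observation", "value": "obs0"},
    {"step": 0, "agent": "agent1", "type": "observation", "value": "obs1"},
    {"step": 1, "agent": "agent0", "type": "action", "value": "move"},
    {"step": 1, "agent": "agent1", "type": "action", "value": "attack"},
    {"step": 1, "agent": "agent0", "type": "observation", "value": "obs2"},
]


def by_event(records):
    """Order records chronologically: reset (step 0) first, then each step."""
    # sorted() is stable, so records within a step keep their original order.
    return sorted(records, key=lambda r: r["step"])


def by_agent(records):
    """Group records by data type, then by agent."""
    grouped = {}
    for r in records:
        per_type = grouped.setdefault(r["type"], {})
        per_type.setdefault(r["agent"], []).append((r["step"], r["value"]))
    return grouped


timeline = by_event(events)
per_agent = by_agent(events)
# per_agent["observation"]["agent0"] lists every observation agent0 made.
```

The by-agent view answers questions such as "what did this agent observe over the whole episode?", while the by-event view shows how actions and state updates interleave from one step to the next.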
Miscellaneous
attack_count has been changed to simultaneous_attacks to deconflict the concept with the new ammunition feature.
Attack mapping now expects a set of attackable encodings instead of a list.
The SingleGridObserver has been changed to the PositionCenteredEncodingObserver.
The MultiGridObserver has been changed to the StackedPositionCenteredEncodingObserver.
Abmarl provides a custom Box space that returns true when checking whether a single numeric value is in a Box space with dimension 1. That is, Abmarl’s Box does not distinguish between [24] and 24; both are in, say, Box(-3, 40, (1,), int).
MazePlacementState can take the target agent by object or by id, which is useful in situations where one does not have the target object, such as if one is building the sim from an array with an object registry.
Enhanced RLlib’s wrapper so that fewer warnings are emitted when training with RLlib.
The TurnBasedManager no longer expects output from non-learning agents, that is, entities in the simulation that are not observing or acting.
Inactive agents no longer block.
The Debug command line interface now makes use of the -s argument, which specifies the simulation horizon (i.e. the maximum number of steps to take in a single run).
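The Box containment rule from the list above (a bare number is accepted wherever a one-element array would be) can be sketched in plain Python. `ScalarTolerantBox` is a hypothetical stand-in restricted to one integer dimension with inclusive bounds; it is not Abmarl's Box class and does not depend on any space library.

```python
class ScalarTolerantBox:
    """Hypothetical one-dimensional integer box with inclusive bounds."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __contains__(self, value):
        # Unwrap a one-element sequence so that [24] and 24 are treated alike.
        if isinstance(value, (list, tuple)):
            if len(value) != 1:
                return False
            value = value[0]
        return isinstance(value, int) and self.low <= value <= self.high


box = ScalarTolerantBox(-3, 40)
checks = (24 in box, [24] in box, 41 in box, [1, 2] in box)
```

Here both `24` and `[24]` are inside the box, while an out-of-range value and a two-element list are not.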