Abmarl API Specification

Abmarl Simulations

class abmarl.sim.PrincipleAgent(id=None, seed=None, **kwargs)

Principle Agent class for agents in a simulation.

property active

True if the agent is still active in the simulation.

Active means that the agent is in a valid state. For example, suppose agents in our Simulation can die. Then active is True if the agents are alive or False if they’re dead.

property configured

All agents must have an id.

finalize(**kwargs)
property id

The agent’s unique identifier.

property seed

Seed for random number generation.

class abmarl.sim.ObservingAgent(observation_space=None, null_observation=None, **kwargs)

ObservingAgents can observe the state of the simulation.

The agent’s observation must be in its observation space. The SimulationManager will send the observation to the Trainer, which will use it to produce actions.

property configured

Observing agents must have an observation space.

finalize(**kwargs)

Wrap all the observation spaces with a Dict and seed it if the agent was created with a seed.

property null_observation

The null point in the observation space.

property observation_space
class abmarl.sim.ActingAgent(action_space=None, null_action=None, **kwargs)

ActingAgents can act in the simulation.

The Trainer will produce actions for the agents and send them to the SimulationManager, which will process those actions in its step function.

property action_space
property configured

Acting agents must have an action space.

finalize(**kwargs)

Wrap all the action spaces with a Dict if applicable and seed it if the agent was created with a seed.

property null_action

The null point in the action space.

class abmarl.sim.Agent(observation_space=None, null_observation=None, **kwargs)

Bases: ObservingAgent, ActingAgent

An Agent that can both observe and act.

class abmarl.sim.AgentBasedSimulation

AgentBasedSimulation interface.

Under this design model, the observations, rewards, and done conditions of the agents are treated as part of the simulation's internal state rather than as output from reset and step. Thus, it is the simulation's responsibility to manage rewards and dones as part of its state (e.g. via a self.rewards dictionary).

This interface supports both single- and multi-agent simulations by treating the single-agent simulation as a special case of the multi-agent, where there is only a single agent in the agents dictionary.
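The getter-based design above can be pictured with a toy simulation that stores rewards and dones internally. This is a schematic sketch only; a real simulation would subclass abmarl.sim.AgentBasedSimulation and implement all of the abstract methods, and the class and attribute names here are illustrative.

```python
# Schematic sketch of the AgentBasedSimulation design: observations,
# rewards, and dones live in internal state, and the manager pulls
# them out through getters rather than from reset/step return values.
class TargetSim:
    """Each agent is done once its counter reaches its target value."""

    def __init__(self, targets):
        self.targets = targets          # {agent_id: target value}
        self.state = {}
        self.rewards = {}
        self.dones = {}

    def reset(self):
        # Reset mutates internal state; it returns nothing.
        self.state = {agent: 0 for agent in self.targets}
        self.rewards = {agent: 0 for agent in self.targets}
        self.dones = {agent: False for agent in self.targets}

    def step(self, action_dict):
        # Step also returns nothing; it updates the internal state.
        for agent, move in action_dict.items():
            self.state[agent] += move
            hit = self.state[agent] == self.targets[agent]
            self.rewards[agent] = 1 if hit else -1
            self.dones[agent] = self.state[agent] >= self.targets[agent]

    # The SimulationManager collects output through these getters.
    def get_obs(self, agent_id):
        return self.state[agent_id]

    def get_reward(self, agent_id):
        return self.rewards[agent_id]

    def get_done(self, agent_id):
        return self.dones[agent_id]

    def get_all_done(self):
        return all(self.dones.values())
```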

property agents

A dict that maps the Agent’s id to the Agent object. An Agent must be an instance of PrincipleAgent. A multi-agent simulation is expected to have multiple entries in the dictionary, whereas a single-agent simulation should only have a single entry in the dictionary.

finalize()

Finalize the initialization process. At this point, every agent should be configured with action and observation spaces, which we convert into Dict spaces for interfacing with the trainer.

abstract get_all_done(**kwargs)

Return the simulation’s done status.

abstract get_done(agent_id, **kwargs)

Return the agent’s done status.

abstract get_info(agent_id, **kwargs)

Return the agent’s info.

abstract get_obs(agent_id, **kwargs)

Return the agent’s observation.

abstract get_reward(agent_id, **kwargs)

Return the agent’s reward.

abstract render(**kwargs)

Render the simulation for visualization.

abstract reset(**kwargs)

Reset the simulation to a start state, which may be randomly generated.

abstract step(action, **kwargs)

Step the simulation forward one discrete time-step. The action is a dictionary that contains the action of each agent in this time-step.

class abmarl.sim.DynamicOrderSimulation

An AgentBasedSimulation where the simulation chooses the agents’ turns dynamically.

property next_agent

The next agent(s) in the game.

Abmarl Simulation Managers

class abmarl.managers.SimulationManager(sim)

Control interaction between Trainer and AgentBasedSimulation.

A Manager implements the reset and step API, by which it calls the AgentBasedSimulation API, using the getters within reset and step to accomplish the desired control flow.

sim

The AgentBasedSimulation.

agents

The agents that are in the AgentBasedSimulation.

done_agents

Set of agents that are done.

render(**kwargs)
abstract reset(**kwargs)

Reset the simulation.

Returns

The first observation of the agent(s).

abstract step(action_dict, **kwargs)

Step the simulation forward one discrete time-step.

Parameters

action_dict – Dictionary mapping agent(s) to their actions in this time step.

Returns

The observations, rewards, done status, and info for the agent(s) whose actions we expect to receive next.

Note: We do not necessarily return anything for the agent whose actions we just received in this time-step. This behavior is defined by each Manager.

class abmarl.managers.TurnBasedManager(sim)

The TurnBasedManager allows agents to take turns. The order of the agents is stored and the obs of the first agent is returned at reset. Each step returns the info of the next agent “in line”. Agents who are done are removed from this line. Once all the agents are done, the manager returns all done.

reset(**kwargs)

Reset the simulation and return the observation of the first agent.

step(action_dict, **kwargs)

Assert that the incoming action does not come from an agent who is recorded as done. Step the simulation forward and return the observation, reward, done, and info of the next agent. If that next agent finished in this turn, then include the obs for the following agent, and so on until an agent is found that is not done. If all agents are done in this turn, then the manager returns all done.
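The "in line" bookkeeping above can be sketched with a small queue that skips done agents. This is a hypothetical illustration of the turn-taking rule, not the TurnBasedManager's actual implementation.

```python
from collections import deque

def next_active_agent(turn_queue, done_agents):
    """Rotate through the turn queue until an agent that is not done
    is found; done agents are removed from the line as they are met.

    Returns None when every agent is done, which is when the manager
    would report all done. turn_queue is mutated in place.
    """
    while turn_queue:
        agent = turn_queue[0]
        if agent in done_agents:
            turn_queue.popleft()        # done agents leave the line
        else:
            turn_queue.rotate(-1)       # send this agent to the back
            return agent
    return None
```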

class abmarl.managers.AllStepManager(sim)

The AllStepManager gets the observations of all agents at reset. At step, it gets the observations of all the agents that are not done. Once all the agents are done, the manager returns all done.

reset(**kwargs)

Reset the simulation and return the observation of all the agents.

step(action_dict, **kwargs)

Assert that the incoming action does not come from an agent who is recorded as done. Step the simulation forward and return the observation, reward, done, and info of all agents that were not already done, including agents that became done in this step. If all agents are done in this turn, then the manager returns all done.

class abmarl.managers.DynamicOrderManager(sim)

The DynamicOrderManager allows agents to take turns dynamically decided by the Simulation.

The order of the agents is dynamically decided by the simulation as it runs. The simulation must be a DynamicOrderSimulation. The agents reported at reset and step are those given in the sim’s next_agent property.

reset(**kwargs)

Reset the simulation and return the observation of the first agent.

step(action_dict, **kwargs)

Assert that the incoming action does not come from an agent who is recorded as done. Step the simulation forward and return the observation, reward, done, and info of the next agent. The simulation is responsible for ensuring that there is at least one next_agent that did not finish in this turn, unless it is the last turn.

Abmarl Wrappers

class abmarl.sim.wrappers.RavelDiscreteWrapper(sim)

Convert complex observations and action spaces into Discrete spaces.

Convert Discrete, MultiBinary, MultiDiscrete, bounded integer Box, and any nesting of these observations and actions into Discrete observations and actions by "ravelling" their values according to numpy's ravel_multi_index function. Thus, observations and actions that are represented by arrays are converted into unique numbers. This is useful for building Q tables, where each observation and action is a row and column of the Q table, respectively.

If the agent has a null observation or a null action, that value is also ravelled into the new Discrete space.
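The ravelling idea can be seen directly with NumPy. This illustrates the underlying numpy.ravel_multi_index mechanism the wrapper's name refers to, not the wrapper's exact code; the example space is an assumption.

```python
import numpy as np

# Suppose an observation is a MultiDiscrete([3, 4]) point: the first
# component takes values 0-2 and the second 0-3. Ravelling maps each
# (i, j) pair to a unique integer in a Discrete(12) space.
dims = (3, 4)

point = (2, 1)
flat = int(np.ravel_multi_index(point, dims))   # a single Discrete value

# Unravelling recovers the original array-valued point.
recovered = tuple(int(x) for x in np.unravel_index(flat, dims))
```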

unwrap_action(from_agent, action)
unwrap_observation(from_agent, observation)
wrap_action(from_agent, action)
wrap_observation(from_agent, observation)
class abmarl.sim.wrappers.FlattenWrapper(sim)

Flattens all agents’ action and observation spaces into continuous Boxes.

unwrap_action(from_agent, action)
unwrap_observation(from_agent, observation)
wrap_action(from_agent, action)
wrap_observation(from_agent, observation)
class abmarl.sim.wrappers.SuperAgentWrapper(sim, super_agent_mapping=None, **kwargs)

The SuperAgentWrapper creates “super” agents who cover and control multiple agents.

The super agents take the observation and action spaces of all their covered agents. In addition, the observation space is given a “mask” channel to indicate which of their covered agents is done. This channel is important because the simulation dynamics change when a covered agent is done but the super agent may still be active (see comments on get_done). Without this mask, the super agent would experience completely different simulation dynamics for some of its covered agents with no indication as to why.

Unless handled carefully, the super agent will generate observations for done covered agents. This may contaminate the training data with an unfair advantage. For example, a dead covered agent should not be able to provide the super agent with useful information. In order to correct this, the user may supply the null observation for an ObservingAgent. When a covered agent is done, the SuperAgentWrapper will try to use its null observation going forward.

Furthermore, super agents may still report actions for covered agents that are done. This wrapper filters out those actions before passing them to the underlying sim. See step for more details.

get_done(agent_id, **kwargs)

Report the agent’s done condition.

Because super agents are composed of multiple agents, it could be the case that some covered agents are done while others are not for the same super agent. Because we still want those non-done agents to interact with the simulation, the super agent only reports done when ALL of its covered agents report done.

Parameters

agent_id – The id of the agent for whom to report the done condition. Should not be a covered agent.

Returns

The requested done condition. Super agents are done when all their covered agents are done.

get_info(agent_id, **kwargs)

Report the agent’s additional info.

Parameters

agent_id – The id of the agent for whom to get info. Should not be a covered agent.

Returns

The requested info. Super agents info is collected from covered agents.

get_obs(agent_id, **kwargs)

Report observations from the simulation.

Super agent observations are collected from their covered agents. Super agents also have a “mask” channel that tells them which of their covered agents is done. This should assist the super agent in understanding the changing simulation dynamics for done agents (i.e. why actions from done agents don’t do anything).

The super agent will report an observation for done covered agents. This may result in an unfair advantage during training (e.g. dead agent should not produce useful information), and Abmarl will issue a warning. To properly handle this, the user can supply the null observation for each covered agent. In that case, the super agent will use the null observation for any done covered agents.

Parameters

agent_id – The id of the agent for whom to produce an observation. Should not be a covered agent.

Returns

The requested observation. Super agent observations are collected from the covered agents.

get_reward(agent_id, **kwargs)

Report the agent’s reward.

A super agent’s reward is the sum of all its active covered agents’ rewards.

Parameters

agent_id – The id of the agent for whom to report the reward. Should not be a covered agent.

Returns

The requested reward. Super agent rewards are summed from the active covered agents.

reset(**kwargs)

Reset the simulation to a start state, which may be randomly generated.

step(action_dict, **kwargs)

Give actions to the simulation.

Super agent actions are decomposed into the covered agent actions and then passed to the underlying sim. Because of the nature of this wrapper, the super agents may provide actions for covered agents that are already done. We filter out these actions.

Parameters

action_dict – Dictionary that maps agent ids to the actions. Covered agents should not be present.

property super_agent_mapping

A dictionary that maps from a super agent’s id to a list of covered agent ids.

Suppose our simulation has 5 agents and we use the following super agent mapping: {'super0': ['agent0', 'agent1'], 'super1': ['agent3', 'agent4']}. The resulting agents dict would have keys 'super0', 'super1', and 'agent2', where 'agent0', 'agent1', 'agent3', and 'agent4' have been covered by the super agents and 'agent2' is left uncovered and therefore included in the dict of agents. If the super agent mapping is changed, then the dictionary of agents gets recreated immediately.

Super agents cannot have the same id as any of the agents in the simulation. Two super agents cannot cover the same agent. All covered agents must be learning agents.
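The action decomposition and done-agent filtering that step performs can be sketched as follows. This is a hypothetical helper, not the wrapper's code; it only follows the behavior described above, assuming a super agent's action is a dict keyed by covered agent id.

```python
def decompose_super_actions(action_dict, super_agent_mapping, done_agents):
    """Split super-agent actions into covered-agent actions, dropping
    actions for covered agents that are already done."""
    sim_actions = {}
    for agent_id, action in action_dict.items():
        if agent_id in super_agent_mapping:
            # A super agent acts for all of its covered agents at once.
            for covered_id, covered_action in action.items():
                if covered_id not in done_agents:
                    sim_actions[covered_id] = covered_action
        elif agent_id not in done_agents:
            # Uncovered agents pass through unchanged.
            sim_actions[agent_id] = action
    return sim_actions
```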

Abmarl External Integration

class abmarl.external.GymWrapper(sim)

Wrap an AgentBasedSimulation object with only a single agent to the gym.Env interface. This wrapper exposes the single agent’s observation and action space directly in the simulation.

property action_space

The agent’s action space is the environment’s action space.

property observation_space

The agent’s observation space is the environment’s observation space.

render(**kwargs)

Forward render calls to the composed simulation.

reset(**kwargs)

Return the observation from the single agent.

step(action, **kwargs)

Wrap the action by storing it in a dict that maps the agent’s id to the action. Pass to sim.step. Return the observation, reward, done, and info from the single agent.
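The single-agent translation in step amounts to the following pattern. This is a sketch of the described behavior, assuming a manager whose step returns per-agent dictionaries; the function and stub names are illustrative, not the GymWrapper's code.

```python
def gym_style_step(manager, agent_id, action):
    """Forward a bare gym action to a multi-agent manager by wrapping
    it in a dict keyed by the agent's id, then unpack the single
    agent's observation, reward, done, and info from the results."""
    obs, reward, done, info = manager.step({agent_id: action})
    return obs[agent_id], reward[agent_id], done[agent_id], info[agent_id]
```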

property unwrapped

Fall through all the wrappers and obtain the original, completely unwrapped simulation.

class abmarl.external.MultiAgentWrapper(sim)

Enable connection between SimulationManager and RLlib Trainer.

Wraps a SimulationManager and forwards all calls to the manager. This class is boilerplate and needed because RLlib checks that the simulation is an instance of MultiAgentEnv.

sim

The SimulationManager.

render(*args, **kwargs)

See SimulationManager.

reset()

See SimulationManager.

step(actions)

See SimulationManager.

class abmarl.external.OpenSpielWrapper(sim, discounts=1.0, **kwargs)

Enable connection between Abmarl’s SimulationManager and OpenSpiel agents.

OpenSpiel supports turn-based and simultaneous simulations, which Abmarl provides through the TurnBasedManager and AllStepManager. OpenSpiel expects TimeStep objects as output, which include the observations, rewards, and step type. Among the observations, it expects a list of legal actions available to the agent. The OpenSpielWrapper converts output from the simulation manager to the expected format. Furthermore, OpenSpiel provides actions as a list. The OpenSpielWrapper converts those actions to a dict before forwarding it to the underlying simulation manager.

OpenSpiel does not support the ability for some agents in a simulation to finish before others. The simulation is either ongoing, in which all agents are providing actions, or else it is done for all agents. In contrast, Abmarl allows some agents to be done before others as the simulation progresses. Abmarl expects that done agents will not provide actions. OpenSpiel, however, will always provide actions for all agents. The OpenSpielWrapper removes the actions from agents that are already done before forwarding the action to the underlying simulation manager. Furthermore, OpenSpiel expects every agent to be present in the TimeStep outputs. Normally, Abmarl will not provide output for agents that are done since they have finished generating data in the episode. In order to work with OpenSpiel, the OpenSpielWrapper forces output from all agents at every step, including those already done.

Currently, the OpenSpielWrapper only works with simulations in which the action and observation space of every agent is Discrete. Most simulations will need to be wrapped with the RavelDiscreteWrapper.

action_spec()

The agents’ action spaces.

Abmarl uses gym spaces for the action space. The OpenSpielWrapper converts the gym space into a format that OpenSpiel expects.

property current_player

The agent that currently provides the action.

Current player is used in the observation part of the TimeStep output. If it is a turn based simulation, then the current player is the single agent who is providing an action. If it is a simultaneous simulation, then OpenSpiel does not use this property and the current player is just the first agent in the list of agents in the simulation.

property discounts

The learning discounts for each agent.

If provided as a number, then that value will apply to all the agents. Make separate discounts for each agent by providing a dictionary assigning each agent to its own discount value.

Return the legal actions available to the agent.

By default, the OpenSpielWrapper uses the agent’s entire action space as its legal actions in each time step. This function can be overwritten in a derived class to add logic for obtaining the actual legal actions available.

property is_turn_based

Whether the underlying simulation manager is a TurnBasedManager.

property num_players

The number of learning agents in the simulation.

observation_spec()

The agents’ observations spaces.

Abmarl uses gym spaces for the observation space. The OpenSpielWrapper converts the gym space into a format that OpenSpiel expects.

reset(**kwargs)

Reset the simulation.

Returns

TimeStep object containing the initial observations. Uniquely at reset, the rewards and discounts are None and the step type is StepType.FIRST.

step(action_list, **kwargs)

Step the simulation forward using the reported actions.

OpenSpiel provides an action list of either (1) the agent whose turn it is in a turn-based simulation or (2) all the agents in a simultaneous simulation. The OpenSpielWrapper converts the list of actions to a dictionary before passing it to the underlying simulation.

OpenSpiel does not support the ability for some agents of a simulation to finish before others. As such, it may provide actions for agents that are already done. To work with Abmarl, the OpenSpielWrapper removes actions for agents that are already done.

Parameters

action_list – list of actions for the agents.

Returns

TimeStep object containing the observations of the new state, the rewards, and StepType.MID if the simulation is still progressing, otherwise StepType.LAST.

Abmarl GridWorld Simulation Framework

Base

class abmarl.sim.gridworld.base.GridWorldSimulation

GridWorldSimulation interface.

Extends the AgentBasedSimulation interface for the GridWorld. We provide builders for streamlining the building process.

classmethod build_sim(rows, cols, **kwargs)

Build a GridSimulation.

Specify the number of rows, the number of cols, a dictionary of agents, and any additional parameters.

Parameters
  • rows – The number of rows in the grid. Must be a positive integer.

  • cols – The number of cols in the grid. Must be a positive integer.

  • agents – The dictionary of agents in the grid.

Returns

A GridSimulation configured as specified.

classmethod build_sim_from_file(file_name, object_registry, **kwargs)

Build a GridSimulation from a text file.

Parameters
  • file_name – Name of the file that specifies the initial grid setup. In the file, each cell should be a single alphanumeric character indicating which agent will be at that position (from the perspective of looking down on the grid). That agent will be given that initial position. 0’s are reserved for empty space.

  • object_registry – A dictionary that maps characters from the file to a function that generates the agent. This must be a function because each agent must have a unique id, which is generated here.

Returns

A GridSimulation built from the file.
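The file format can be illustrated with a minimal parser. This is a sketch of the format only, assuming whitespace-separated cells; build_sim_from_file additionally calls the object_registry functions to construct an agent for each character.

```python
def parse_grid_file(text):
    """Map each non-'0' character in a grid text to its (row, col)
    positions, viewed from above; '0' marks empty space."""
    positions = {}
    for row, line in enumerate(text.strip().splitlines()):
        for col, char in enumerate(line.split()):
            if char != '0':
                positions.setdefault(char, []).append((row, col))
    return positions
```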

render(fig=None, **kwargs)

Draw the grid and all active agents in the grid.

Agents are drawn at their positions using their respective shape and color.

Parameters

fig – The figure on which to draw the grid. It’s important to provide this figure because the same figure must be used when drawing each state of the simulation; otherwise, a new figure pops up for every rendered state.

class abmarl.sim.gridworld.base.GridWorldBaseComponent(agents=None, grid=None, **kwargs)

Component base class from which all components will inherit.

Every component has access to the dictionary of agents and the grid.

property agents

A dict that maps the Agent’s id to the Agent object. All agents must be GridWorldAgents.

property cols

The number of columns in the grid.

property grid

The grid indexes the agents by their position.

For example, an agent whose position is (3, 2) can be accessed through the grid with self.grid[3, 2]. Components are responsible for maintaining the connection between agent position and grid index.

property rows

The number of rows in the grid.

class abmarl.sim.gridworld.grid.Grid(rows, cols, overlapping=None, **kwargs)

A Grid stores the agents at indices in a numpy array.

Components can interface with the Grid. Each index in the grid is a dictionary that maps the agent id to the agent object itself. If agents can overlap, then there may be more than one agent per cell.

Parameters
  • rows – The number of rows in the grid.

  • cols – The number of columns in the grid.

  • overlapping – Dictionary that maps the agents’ encodings to a list of encodings with which they can occupy the same cell. To avoid undefined behavior, the overlapping should be symmetric, so that if 2 can overlap with 3, then 3 can also overlap with 2.
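The symmetry requirement on overlapping can be checked with a small helper. This is a hypothetical utility for validating your own configuration; Abmarl itself does not ship this function.

```python
def is_symmetric_overlapping(overlapping):
    """True if, whenever encoding a can overlap encoding b,
    b can also overlap a."""
    return all(
        a in overlapping.get(b, [])
        for a, partners in overlapping.items()
        for b in partners
    )
```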

property cols

The number of columns in the grid.

place(agent, ndx)

Place an agent at an index.

If the cell is available, the agent will be placed at that index in the grid and the agent’s position will be updated. The placement is successful if the new position is unoccupied or if the agent already occupying that position is overlappable AND this agent is overlappable.

Parameters
  • agent – The agent to place.

  • ndx – The new index for this agent.

Returns

The successfulness of the placement.

query(agent, ndx)

Query a cell in the grid to see if it is available to this agent.

The cell is available for the agent if it is empty or if both the occupying agent and the querying agent are overlappable.

Parameters
  • agent – The agent for which we are checking availability.

  • ndx – The cell to query.

Returns

The availability of this cell.
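The cell-availability rule behind place and query can be sketched with a small predicate. This illustrates the stated rule only, assuming a cell is represented as the list of its occupants' encodings; it is not the Grid class's code.

```python
def cell_available(occupant_encodings, agent_encoding, overlapping):
    """A cell is available if it is empty, or if every occupant's
    encoding can overlap with the querying agent's encoding."""
    return all(
        agent_encoding in overlapping.get(occupant, [])
        for occupant in occupant_encodings
    )
```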

remove(agent, ndx)

Remove an agent from an index.

Parameters
  • agent – The agent to remove.

  • ndx – The old index for this agent.

reset(**kwargs)

Reset the grid to an empty state.

property rows

The number of rows in the grid.

Agents

class abmarl.sim.gridworld.agent.GridWorldAgent(initial_position=None, blocking=False, encoding=None, render_shape='o', render_color='gray', **kwargs)

The base agent in the GridWorld.

property blocking

Specify if this agent blocks other agents’ observations and actions.

property configured

All agents must have an id.

property encoding

The numerical value that identifies the type of agent.

The value does not necessarily identify the agent itself. For example, other agents who observe this agent will see this value.

property initial_position

The agent’s initial position at reset.

property position

The agent’s position in the grid.

property render_color

The agent’s color in the rendered grid.

property render_shape

The agent’s shape in the rendered grid.

class abmarl.sim.gridworld.agent.GridObservingAgent(view_range=None, **kwargs)

Observe the grid up to view range cells away.

property configured

Observing agents must have an observation space.

property view_range

The number of cells away this agent can observe in each step.

class abmarl.sim.gridworld.agent.MovingAgent(move_range=None, **kwargs)

Move up to move_range cells.

property configured

Acting agents must have an action space.

property move_range

The maximum number of cells away that the agent can move.

class abmarl.sim.gridworld.agent.HealthAgent(initial_health=None, **kwargs)

Agents have health points and can die.

Health is bounded between 0 and 1. Agents become inactive when their health falls to 0.

property health

The agent’s health throughout the simulation trajectory.

The health will always be between 0 and 1.

property initial_health

The agent’s initial health between 0 and 1.

class abmarl.sim.gridworld.agent.AttackingAgent(attack_range=None, attack_strength=None, attack_accuracy=None, attack_count=1, **kwargs)

Agents that can attack other agents.

property attack_accuracy

The effective accuracy of the agent’s attack.

Should be between 0 and 1. To make deterministic attacks, use 1.

property attack_count

The number of attacks the agent can make per turn.

This parameter is interpreted differently by each attack actor, but generally it specifies how many attacks this agent can carry out in a single step. See specific AttackActor documentation for more information.

property attack_range

The maximum range of the attack.

property attack_strength

The strength of the attack.

Should be between 0 and 1.

property configured

Acting agents must have an action space.

State

class abmarl.sim.gridworld.state.StateBaseComponent(agents=None, grid=None, **kwargs)

Abstract State Component base from which all state components will inherit.

abstract reset(**kwargs)

Resets the part of the state for which it is responsible.

class abmarl.sim.gridworld.state.PositionState(agents=None, grid=None, **kwargs)

Manage the agents’ positions in the grid.

reset(**kwargs)

Give agents their starting positions.

We use the agent’s initial position if it exists. Otherwise, we randomly place the agents in the grid.

class abmarl.sim.gridworld.state.HealthState(agents=None, grid=None, **kwargs)

Manage the state of the agents’ healths.

Every HealthAgent has a health. If that health falls to zero, that agent dies and is removed from the grid.

reset(**kwargs)

Give HealthAgents their starting healths.

We use the agent’s initial health if it exists. Otherwise, we randomly assign a value between 0 and 1.

Actors

class abmarl.sim.gridworld.actor.ActorBaseComponent(agents=None, grid=None, **kwargs)

Abstract Actor Component class from which all Actor Components will inherit.

abstract property key

The key in the action dictionary.

The action space of all acting agents in the gridworld framework is a dict. We can build up complex action spaces with multiple components by assigning each component an entry in the action dictionary. Actions will be a dictionary even if your simulation only has one Actor.

abstract process_action(agent, action_dict, **kwargs)

Process the agent’s action.

Parameters
  • agent – The acting agent.

  • action_dict – The action dictionary for this agent in this step. The dictionary may have different entries, each of which will be processed by different Actors.

abstract property supported_agent_type

The type of Agent that this Actor works with.

If an agent is this type, the Actor will add its entry to the agent’s action space and will process actions for this agent.

class abmarl.sim.gridworld.actor.MoveActor(**kwargs)

Agents can move to nearby squares.

property key

This Actor’s key is “move”.

process_action(agent, action_dict, **kwargs)

The agent can move to nearby squares.

The agent’s new position must be within the grid and the cell-occupation rules must be met.

Parameters
  • agent – Move the agent if it is a MovingAgent.

  • action_dict – The action dictionary for this agent in this step. If the agent is a MovingAgent, then the action dictionary will contain the “move” entry.

Returns

True if the move is successful, False otherwise.

property supported_agent_type

This Actor works with MovingAgents.

class abmarl.sim.gridworld.actor.BinaryAttackActor(attack_mapping=None, stacked_attacks=False, **kwargs)

Agents can attack other agents.

Agents can choose to use up to some number of their attacks. For example, if an agent has an attack count of 3, then it can choose no attack, attack once, attack twice, or attack thrice. The BinaryAttackActor searches the nearby local grid defined by the agent’s attack range for attackable agents, and randomly chooses from that set up to the number of attacks issued.

Observers

class abmarl.sim.gridworld.observer.ObserverBaseComponent(agents=None, grid=None, **kwargs)

Abstract Observer Component base from which all observer components will inherit.

abstract get_obs(agent, **kwargs)

Observe the state of the simulation.

Parameters

agent – The agent for which we return an observation.

Returns

This agent’s observation.

abstract property key

The key in the observation dictionary.

The observation space of all observing agents in the gridworld framework is a dict. We can build up complex observation spaces with multiple components by assigning each component an entry in the observation dictionary. Observations will be a dictionary even if your simulation only has one Observer.

abstract property supported_agent_type

The type of Agent that this Observer works with.

If an agent is this type, the Observer will add its entry to the agent’s observation space and will produce observations for this agent.

class abmarl.sim.gridworld.observer.SingleGridObserver(observe_self=True, **kwargs)

Observe a subset of the grid centered on the agent’s position.

The observation is centered around the observing agent’s position. Each agent in the “observation window” is recorded in the relative cell using its encoding. If there are multiple agents on a single cell with different encodings, the agent will observe only one of them chosen at random.
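The "observation window" can be illustrated with NumPy padding, marking cells outside the grid with an out-of-bounds value. This is an illustrative sketch of the windowing idea only; the sentinel values for out-of-bounds and empty cells here are assumptions, not the observer's actual encoding.

```python
import numpy as np

def local_window(grid, position, view_range, out_of_bounds=-1):
    """Return the (2 * view_range + 1)-square sub-grid centered on
    position, padding cells outside the grid with out_of_bounds."""
    padded = np.pad(grid, view_range, constant_values=out_of_bounds)
    r, c = position[0] + view_range, position[1] + view_range
    return padded[r - view_range:r + view_range + 1,
                  c - view_range:c + view_range + 1]
```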

get_obs(agent, **kwargs)

The agent observes a sub-grid centered on its position.

The observation may include other agents, empty spaces, out of bounds, and masked cells, which can be blocked from view by other blocking agents.

Returns

The observation as a dictionary.

property key

This Observer’s key is “grid”.

property observe_self

Agents can observe themselves, which may hide relevant information when agents can overlap. This can be turned off by setting observe_self to False.

property supported_agent_type

This Observer works with GridObservingAgents.

class abmarl.sim.gridworld.observer.MultiGridObserver(**kwargs)

Observe a subset of the grid centered on the agent’s position.

The observation is centered around the observing agent’s position. The observing agent sees a stack of observations, one for each positive encoding, where the number of agents of each encoding is given rather than the encoding itself. Out of bounds and masked indicators appear in every grid.

get_obs(agent, **kwargs)

The agent observes one or more sub-grids centered on its position.

The observation may include other agents, empty spaces, out of bounds, and masked cells, which can be blocked from view by other blocking agents. Each grid records the number of agents on a particular cell correlated to a specific encoding.

Returns

The observation as a dictionary.

property key

This Observer’s key is “grid”.

property supported_agent_type

This Observer works with GridObservingAgents.

Done

class abmarl.sim.gridworld.done.DoneBaseComponent(agents=None, grid=None, **kwargs)

Abstract Done Component class from which all Done Components will inherit.

abstract get_all_done(**kwargs)

Determine if all the agents are done and/or if the simulation is done.

Returns

True if all agents are done or if the simulation is done. Otherwise False.

abstract get_done(agent, **kwargs)

Determine if an agent is done in this step.

Parameters

agent – The agent we are querying.

Returns

True if the agent is done, otherwise False.

class abmarl.sim.gridworld.done.ActiveDone(agents=None, grid=None, **kwargs)

Inactive agents are indicated as done.

get_all_done(**kwargs)

Return True if all agents are inactive. Otherwise, return False.

get_done(agent, **kwargs)

Return True if the agent is inactive. Otherwise, return False.

class abmarl.sim.gridworld.done.OneTeamRemainingDone(agents=None, grid=None, **kwargs)

Inactive agents are indicated as done.

If all remaining active agents have the same encoding, then the simulation ends.

get_all_done(**kwargs)

Return True if all active agents have the same encoding. Otherwise, return False.
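The done condition reduces to checking that the set of encodings among active agents has at most one member. A minimal sketch, using hypothetical (active, encoding) pairs rather than Abmarl's agent objects:

```python
def one_team_remaining(agents):
    """Return True when every active agent has the same encoding."""
    # Collect the encodings of active agents only.
    encodings = {encoding for active, encoding in agents if active}
    return len(encodings) <= 1

# Team 2's only agent is inactive, so one team remains.
agents = [(True, 1), (True, 1), (False, 2)]
print(one_team_remaining(agents))  # True
```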

Wrappers

class abmarl.sim.gridworld.wrapper.ComponentWrapper(agents=None, grid=None, **kwargs)

Wraps GridWorldBaseComponent.

Every wrapper must be able to wrap the respective space and wrap/unwrap points to and from that space. Agents and Grid are referenced directly from the wrapped component rather than received as initialization parameters.

property agents

The agent dictionary is directly taken from the wrapped component.

abstract check_space(space)

Verify that the space can be wrapped.

property grid

The grid is directly taken from the wrapped component.

abstract unwrap_point(space, point)

Unwrap a point using a reference space.

Parameters
  • space – The reference space for unwrapping the point.

  • point – The point to unwrap.

property unwrapped

Fall through all the wrappers and obtain the original, completely unwrapped component.

abstract wrap_point(space, point)

Wrap a point using a reference space.

Parameters
  • space – The reference space for wrapping the point.

  • point – The point to wrap.

abstract wrap_space(space)

Wrap the space.

Parameters

space – The space to wrap.

abstract property wrapped_component

Get the first-level wrapped component.

class abmarl.sim.gridworld.wrapper.ActorWrapper(component)

Wraps an ActorComponent.

Modify the action space of the agents involved with the Actor, namely the specific actor’s channel. The actions received from the trainer are in the wrapped space, so we need to unwrap them before sending them to the actor. This is the opposite of how we wrap and unwrap observations.

property key

The key is the same as the wrapped actor’s key.

process_action(agent, action_dict, **kwargs)

Unwrap the action and pass it to the wrapped actor to process.

Parameters
  • agent – The acting agent.

  • action_dict – The action dictionary for this agent in this step. The action in this channel comes in the wrapped space.

property supported_agent_type

The supported agent type is the same as the wrapped actor’s supported agent type.

property wrapped_component

Get the wrapped actor.

class abmarl.sim.gridworld.wrapper.RavelActionWrapper(component)

Use numpy’s ravel capabilities to convert space and points to Discrete.

check_space(space)

Ensure that the space is of a type that can be ravelled to a Discrete value.

unwrap_point(space, point)

Ravel point to a single discrete value.

wrap_point(space, point)

Unravel a single discrete point to a value in the space.

Recall that the action from the trainer arrives in the wrapped discrete space, so we must unravel it into the unwrapped space before giving it to the actor.

wrap_space(space)

Convert the space into a Discrete space.
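The ravel/unravel idea behind this wrapper can be shown with plain numpy on a hypothetical MultiDiscrete-like space with dimensions (3, 4); this is a sketch of the technique, not Abmarl's implementation.

```python
import numpy as np

dims = (3, 4)  # 12 joint actions in total

# "Unwrap": flatten a multi-dimensional point to a single discrete value.
flat = np.ravel_multi_index((2, 1), dims)
print(flat)  # 9

# "Wrap": recover the multi-dimensional point from the discrete value.
point = np.unravel_index(flat, dims)
print(tuple(int(x) for x in point))  # (2, 1)
```

A trainer thus sees a single Discrete(12) space while the underlying actor still receives multi-dimensional points.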

Abmarl Trainers

class abmarl.trainers.MultiPolicyTrainer(sim=None, policies=None, policy_mapping_fn=None, **kwargs)

Train policies with data generated by agents interacting in a simulation.

compute_actions(obs)

Compute actions for agents in the observation.

Forwards the observations to the respective policy for each agent that reports an observation.

Parameters

obs – An observation dictionary, where the keys are the agents reporting from the sim and the values are the observations.

Returns

An action dictionary where the keys are the agents from the observation and the values are the actions generated from each agent’s policy.
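The per-agent dispatch can be sketched as a dictionary comprehension that routes each observing agent through a mapping function to its policy. The policy objects and mapping function below are illustrative stand-ins, not Abmarl classes.

```python
# Hypothetical policies: callables from observation to action.
policies = {
    "predator_policy": lambda obs: "chase",
    "prey_policy": lambda obs: "flee",
}

def policy_mapping_fn(agent_id):
    # Map each agent id to a policy id.
    return "predator_policy" if agent_id.startswith("predator") else "prey_policy"

def compute_actions(obs):
    # Only agents that report an observation receive an action.
    return {
        agent_id: policies[policy_mapping_fn(agent_id)](agent_obs)
        for agent_id, agent_obs in obs.items()
    }

print(compute_actions({"predator0": None, "prey0": None}))
# {'predator0': 'chase', 'prey0': 'flee'}
```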

generate_episode(horizon=200, render=False, **kwargs)

Generate an episode of data.

The fundamental data object is a SAR, a (state, action, reward) tuple. We restart the sim, generating initial observations (states) for agents reporting from the sim. Then we use the compute_action function to generate actions for agents who report an observation. Those actions are given to the sim, which steps forward and generates rewards and new observations for reporting agents. This loop continues until the simulation is done or we hit the horizon.

Parameters
  • horizon – The maximum number of steps per episode. The episode may finish early, but it will not progress further than this number of steps.

  • render – Renders the simulation. This should be False when training, and can be True when debugging or evaluating in post-processing.

Returns

Four dictionaries, one for observations, another for actions, another for rewards, and another for dones. This makes the SAR sequence and provides additional information on the done condition since some algorithms need this. The data is organized by agent_id, so you would call {observations, actions, rewards}[agent_id][i] in order to extract the ith SAR for an agent. NOTE: In multiagent simulations, the number of SARs may differ for each agent.
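Extracting SARs from the returned dictionaries looks like the following; the episode data here is fabricated to match the documented shape, with different lengths per agent.

```python
# Hypothetical per-agent episode data in the documented return shape.
observations = {"agent0": [0, 1, 2], "agent1": [5, 6]}
actions = {"agent0": ["a", "b", "c"], "agent1": ["x", "y"]}
rewards = {"agent0": [0.0, 1.0, -1.0], "agent1": [0.5, 0.5]}

# Extract the ith SAR for an agent.
i = 1
sar = (observations["agent0"][i], actions["agent0"][i], rewards["agent0"][i])
print(sar)  # (1, 'b', 1.0)

# Agents may have different numbers of SARs in a multiagent simulation.
print(len(observations["agent0"]), len(observations["agent1"]))  # 3 2
```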

property policies

A dictionary that maps policy ids to policy objects.

property policy_mapping_fn

A function that takes an agent’s id as input and outputs its corresponding policy id.

property sim

The SimulationManager.

abstract train(iterations=10000, **kwargs)

Train the policy objects using generated data.

This function is abstract and should be implemented by the algorithm.

Parameters
  • iterations – The number of training iterations.

  • **kwargs – Any additional parameter your algorithm may need.

class abmarl.trainers.SinglePolicyTrainer(sim=None, policy=None, **kwargs)

Train a single policy with data generated by agents interacting in a simulation.

property policies

A dictionary that maps policy ids to policy objects.

property policy

The policy to train.

property policy_mapping_fn

This function always returns “policy”, which is the name we give the policy.

class abmarl.trainers.monte_carlo.OnPolicyMonteCarloTrainer(sim=None, policy=None, **kwargs)

train(iterations=10000, gamma=0.9, **kwargs)

Implements on-policy Monte Carlo training.
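At the heart of on-policy Monte Carlo is the discounted-return computation over a completed episode, G_t = r_t + gamma * G_{t+1}, computed backwards. This is a sketch of that step in plain Python, not Abmarl's actual implementation.

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over an episode."""
    returns = [0.0] * len(rewards)
    future = 0.0
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        returns[t] = future
    return returns

# A reward of 1 at the final step is discounted back through earlier steps.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))
# approximately [0.81, 0.9, 1.0]
```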

class abmarl.trainers.DebugTrainer(policies=None, name=None, output_dir=None, **kwargs)

Debug the training setup.

The DebugTrainer generates episodes using the simulation and policies. Rather than training those policies, the DebugTrainer simply dumps the observations, actions, rewards, and dones to disk.

The DebugTrainer can be run without policies. In this case, it generates a random policy for each agent. This effectively debugs the simulation without having to debug the policy setup too.

property name

The name of the experiment.

If a name is not specified, then we just use “DEBUG”. The date and time are appended to the name.

property output_dir

The directory for where to dump the episode data.

If the output dir is not specified, then we use “~/abmarl_results/”. We append the experiment name to the end of the directory.

train(iterations=5, render=False, **kwargs)

Generate episodes and write them to disk.

Nothing is trained here. We just generate and dump the data and visualize the simulation if requested.

Parameters
  • iterations – The number of episodes to generate.

  • render – Set to True to visualize the simulation.