Abmarl API Specification

Abmarl Simulations

class abmarl.sim.PrincipleAgent(id=None, seed=None, **kwargs)

Principle Agent class for agents in a simulation.

property active

True if the agent is still active in the simulation.

Active means that the agent is in a valid state. For example, suppose agents in our Simulation can die. Then active is True if the agents are alive or False if they’re dead.

property configured

All agents must have an id.

finalize(**kwargs)
property id

The agent’s unique identifier.

property seed

Seed for random number generation.

class abmarl.sim.ObservingAgent(observation_space=None, null_observation=None, **kwargs)

ObservingAgents can observe the state of the simulation.

The agent’s observation must be in its observation space. The SimulationManager will send the observation to the Trainer, which will use it to produce actions.

property configured

Observing agents must have an observation space.

finalize(**kwargs)

Wrap all the observation spaces with a Dict and seed it if the agent was created with a seed.

property null_observation

The null point in the observation space.

property observation_space
class abmarl.sim.ActingAgent(action_space=None, null_action=None, **kwargs)

ActingAgents can act in the simulation.

The Trainer will produce actions for the agents and send them to the SimulationManager, which will process those actions in its step function.

property action_space
property configured

Acting agents must have an action space.

finalize(**kwargs)

Wrap all the action spaces with a Dict if applicable and seed it if the agent was created with a seed.

property null_action

The null point in the action space.

class abmarl.sim.Agent(observation_space=None, null_observation=None, **kwargs)

Bases: ObservingAgent, ActingAgent

An Agent that can both observe and act.
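
A minimal sketch of constructing such an agent, assuming gym spaces (the particular spaces below are illustrative only):

    from gym.spaces import Box, Discrete
    from abmarl.sim import Agent

    # Illustrative spaces; any gym observation and action spaces work here.
    agent = Agent(
        id='agent0',
        observation_space=Box(low=0, high=1, shape=(3,)),
        action_space=Discrete(4),
    )
    assert agent.configured  # id, observation space, and action space are all set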

class abmarl.sim.AgentBasedSimulation(agents=None, **kwargs)

AgentBasedSimulation interface.

Under this design model the observations, rewards, and done conditions of the agents are treated as part of the simulation’s internal state instead of as output from reset and step. Thus, it is the simulation’s responsibility to manage rewards and dones as part of its state (e.g. via a self.rewards dictionary).

This interface supports both single- and multi-agent simulations by treating the single-agent simulation as a special case of the multi-agent, where there is only a single agent in the agents dictionary.

Parameters:

agents – Dictionary of agents

property agents

A dict that maps the Agent’s id to the Agent object. An Agent must be an instance of PrincipleAgent. A multi-agent simulation is expected to have multiple entries in the dictionary, whereas a single-agent simulation should only have a single entry in the dictionary.

finalize()

Finalize the initialization process. At this point, every agent should be configured with action and observation spaces, which we convert into Dict spaces for interfacing with the trainer.

abstract get_all_done(**kwargs)

Return the simulation’s done status.

abstract get_done(agent_id, **kwargs)

Return the agent’s done status.

abstract get_info(agent_id, **kwargs)

Return the agent’s info.

abstract get_obs(agent_id, **kwargs)

Return the agent’s observation.

abstract get_reward(agent_id, **kwargs)

Return the agent’s reward.

abstract render(**kwargs)

Render the simulation for visualization.

abstract reset(**kwargs)

Reset the simulation to a start state, which may be randomly generated.

abstract step(action, **kwargs)

Step the simulation forward one discrete time-step. The action is a dictionary that contains the action of each agent in this time-step.
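
A minimal sketch of a concrete simulation implementing this interface; the self.rewards and self.dones bookkeeping is one possible way to manage internal state, not a prescribed one:

    from abmarl.sim import AgentBasedSimulation

    class MySim(AgentBasedSimulation):
        def __init__(self, agents=None, **kwargs):
            super().__init__(agents=agents, **kwargs)
            self.rewards, self.dones = {}, {}

        def reset(self, **kwargs):
            self.rewards = {agent_id: 0 for agent_id in self.agents}
            self.dones = {agent_id: False for agent_id in self.agents}

        def step(self, action_dict, **kwargs):
            for agent_id, action in action_dict.items():
                # Apply each action to the internal state and update
                # self.rewards / self.dones accordingly.
                self.rewards[agent_id] = 0

        def render(self, **kwargs):
            pass

        def get_obs(self, agent_id, **kwargs):
            return {}

        def get_reward(self, agent_id, **kwargs):
            return self.rewards[agent_id]

        def get_done(self, agent_id, **kwargs):
            return self.dones[agent_id]

        def get_all_done(self, **kwargs):
            return all(self.dones.values())

        def get_info(self, agent_id, **kwargs):
            return {}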

class abmarl.sim.DynamicOrderSimulation(agents=None, **kwargs)

An AgentBasedSimulation where the simulation chooses the agents’ turns dynamically.

property next_agent

The next agent(s) in the game.

Abmarl Simulation Managers

class abmarl.managers.SimulationManager(sim, **kwargs)

Control interaction between Trainer and AgentBasedSimulation.

A Manager implements the reset and step API, by which it calls the AgentBasedSimulation API, using the getters within reset and step to accomplish the desired control flow.

sim

The AgentBasedSimulation.

agents

The agents that are in the AgentBasedSimulation.

done_agents

Set of agents that are done.

render(**kwargs)
abstract reset(**kwargs)

Reset the simulation.

Returns:

The first observation of the agent(s).

abstract step(action_dict, **kwargs)

Step the simulation forward one discrete time-step.

Parameters:

action_dict – Dictionary mapping agent(s) to their actions in this time step.

Returns:

The observations, rewards, done status, and info for the agent(s) whose actions we expect to receive next.

Note: We do not necessarily return anything for the agent whose actions we just received in this time-step. This behavior is defined by each Manager.

class abmarl.managers.TurnBasedManager(sim)

The TurnBasedManager allows agents to take turns. The order of the agents is stored and the obs of the first agent is returned at reset. Each step returns the info of the next agent “in line”. Agents who are done are removed from this line. Once all the agents are done, the manager returns all done.

reset(**kwargs)

Reset the simulation and return the observation of the first agent.

step(action_dict, **kwargs)

Assert that the incoming action does not come from an agent who is recorded as done. Step the simulation forward and return the observation, reward, done, and info of the next agent. If that next agent finished in this turn, then include the obs for the following agent, and so on until an agent is found that is not done. If all agents are done in this turn, then the manager returns all done.

class abmarl.managers.AllStepManager(sim, randomize_action_input=False, **kwargs)

The AllStepManager gets the observations of all agents at reset. At step, it gets the observations of all the agents that are not done. Once all the agents are done, the manager returns all done.

property randomize_action_input

Randomize the order of the action input at each step.

Multiple agents will report actions within a single step. Depending on how those actions are generated, the ordering within the action_dict may always be the same, which may result in unintended imposed-ordering in the simulation. For example, agent0’s action may always come before agent1’s. If randomize_action_input is set to True, then the agent ordering in the action dict is randomized each step.

reset(**kwargs)

Reset the simulation and return the observation of all the agents.

step(action_dict, **kwargs)

Assert that the incoming action does not come from an agent who is recorded as done. Step the simulation forward and return the observation, reward, done, and info of all the non-done agents, including the agents that were done in this step. If all agents are done in this turn, then the manager returns all done.
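
A minimal sketch of driving a simulation through the AllStepManager; MySim is a hypothetical AgentBasedSimulation, and the '__all__' key in the dones dictionary follows the usual multi-agent convention:

    from abmarl.managers import AllStepManager

    sim = AllStepManager(MySim(agents=agents))
    obs = sim.reset()  # observations for all agents
    for _ in range(10):
        actions = {
            agent_id: sim.agents[agent_id].action_space.sample()
            for agent_id in obs  # only agents that reported an observation act
        }
        obs, rewards, dones, infos = sim.step(actions)
        if dones.get('__all__'):
            break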

class abmarl.managers.DynamicOrderManager(sim)

The DynamicOrderManager allows agents to take turns dynamically decided by the Simulation.

The order of the agents is dynamically decided by the simulation as it runs. The simulation must be a DynamicOrderSimulation. The agents reported at reset and step are those given in the sim’s next_agent property.

reset(**kwargs)

Reset the simulation and return the observation of the first agent.

step(action_dict, **kwargs)

Assert that the incoming action does not come from an agent who is recorded as done. Step the simulation forward and return the observation, reward, done, and info of the next agent. The simulation is responsible for ensuring that there is at least one next_agent that did not finish in this turn, unless it is the last turn.

Abmarl Wrappers

class abmarl.sim.wrappers.Wrapper(sim)

Abstract Wrapper class that implements the AgentBasedSimulation interface. The simulation is stored and the simulation agents are deep-copied. Interface function calls are forwarded to the simulation.

get_all_done(**kwargs)

Return the simulation’s done status.

get_done(agent_id, **kwargs)

Return the agent’s done status.

get_info(agent_id, **kwargs)

Return the agent’s info.

get_obs(agent_id, **kwargs)

Return the agent’s observation.

get_reward(agent_id, **kwargs)

Return the agent’s reward.

render(**kwargs)

Render the simulation for visualization.

reset(**kwargs)

Reset the simulation to a start state, which may be randomly generated.

step(action, **kwargs)

Step the simulation forward one discrete time-step. The action is a dictionary that contains the action of each agent in this time-step.

property unwrapped

Fall through all the wrappers and obtain the original, completely unwrapped simulation.

class abmarl.sim.wrappers.SARWrapper(sim)

Wraps the actions and observations for all the agents at reset and step. To create your own wrapper, inherit from this class and override the wrap and unwrap functions.

Note: wrapping the action “goes the other way” than the reward and observation, like this:

obs:    sim agent -> wrapper -> trainer
reward: sim agent -> wrapper -> trainer
action: sim agent <- wrapper <- trainer

If you wrap an action, be aware that the wrapper must return what the simulation agents expect; whereas if you wrap an observation or reward, the wrapper must return what the trainer expects. The expectations are defined by the observation and action spaces of the wrapped simulation agents at initialization.
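
A minimal sketch of a custom wrapper that rescales rewards by overriding a wrap/unwrap pair; it assumes the inherited behavior passes values through unchanged:

    from abmarl.sim.wrappers import SARWrapper

    class ScaledRewardWrapper(SARWrapper):
        def wrap_reward(self, reward):
            # Rewards flow from the sim to the trainer, so wrapping controls
            # what the trainer sees.
            return 0.1 * reward

        def unwrap_reward(self, reward):
            return 10.0 * reward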

get_obs(agent_id, **kwargs)

Return the agent’s observation.

get_reward(agent_id, **kwargs)

Return the agent’s reward.

step(action_dict, **kwargs)

Wrap each of the agent’s actions from the policies before passing them to sim.step.

unwrap_action(from_agent, action)
unwrap_observation(from_agent, observation)
unwrap_reward(reward)
wrap_action(from_agent, action)
wrap_observation(from_agent, observation)
wrap_reward(reward)
class abmarl.sim.wrappers.RavelDiscreteWrapper(sim)

Convert observation and action spaces into a Discrete space.

Convert Discrete, MultiBinary, MultiDiscrete, bounded integer Box, and any nesting of these observations and actions into Discrete observations and actions by “ravelling” their values according to numpy’s ravel_multi_index function. Thus, observations and actions that are represented by arrays are converted into unique numbers. This is useful for building Q tables where each observation and action is a row and column of the Q table, respectively.

If the agent has a null observation or a null action, that value is also ravelled into the new Discrete space.

unwrap_action(from_agent, action)
unwrap_observation(from_agent, observation)
wrap_action(from_agent, action)
wrap_observation(from_agent, observation)
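
A minimal sketch of applying the wrapper; MySim is a hypothetical simulation whose agents all use ravel-able (e.g. Discrete or bounded integer Box) spaces:

    from abmarl.sim.wrappers import RavelDiscreteWrapper

    sim = RavelDiscreteWrapper(MySim(agents=agents))
    # The wrapped agents' observation and action spaces are now Discrete.
    for agent in sim.agents.values():
        print(agent.id, agent.observation_space, agent.action_space)
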
class abmarl.sim.wrappers.FlattenWrapper(sim)

Flattens all agents’ action and observation spaces into Boxes.

Nested spaces (e.g. Tuples and Dicts) are flattened element-wise, each element being concatenated onto the previous. A Discrete space is converted to a Box with a single element, whose bounds are 0 to space.n - 1. MultiBinary and MultiDiscrete are simply converted to Box with the corresponding bounds and integer dtype. A Box space is flattened to a one-dimensional array equivalent.

If the resulting Box can be made with dtype int, then it will be. Otherwise, it will cast up to float.

If the agent has a null observation or a null action, that value is also flattened into the new Box space.

NOTE: Sampling from the flattened space will not produce the same results as sampling from the original space and then flattening.

unwrap_action(from_agent, action)
unwrap_observation(from_agent, observation)
wrap_action(from_agent, action)
wrap_observation(from_agent, observation)
class abmarl.sim.wrappers.SuperAgentWrapper(sim, super_agent_mapping=None, **kwargs)

The SuperAgentWrapper creates “super” agents who cover and control multiple agents.

The super agents take the observation and action spaces of all their covered agents. In addition, the observation space is given a “mask” channel to indicate which of their covered agents are done. This channel is important because the simulation dynamics change when a covered agent is done but the super agent may still be active (see comments on get_done). Without this mask, the super agent would experience completely different simulation dynamics for some of its covered agents with no indication as to why.

Unless handled carefully, the super agent will generate observations for done covered agents. This may contaminate the training data with an unfair advantage. For example, a dead covered agent should not be able to provide the super agent with useful information. In order to correct this, the user may supply the null observation for an ObservingAgent. When a covered agent is done, the SuperAgentWrapper will try to use its null observation going forward.

Furthermore, super agents may still report actions for covered agents that are done. This wrapper filters out those actions before passing them to the underlying sim. See step for more details.

get_done(agent_id, **kwargs)

Report the agent’s done condition.

Because super agents are composed of multiple agents, it could be the case that some covered agents are done while others are not for the same super agent. Because we still want those non-done agents to interact with the simulation, the super agent only reports done when ALL of its covered agents report done.

Parameters:

agent_id – The id of the agent for whom to report the done condition. Should not be a covered agent.

Returns:

The requested done condition. Super agents are done when all their covered agents are done.

get_info(agent_id, **kwargs)

Report the agent’s additional info.

Parameters:

agent_id – The id of the agent for whom to get info. Should not be a covered agent.

Returns:

The requested info. Super agent info is collected from the covered agents.

get_obs(agent_id, **kwargs)

Report observations from the simulation.

Super agent observations are collected from their covered agents. Super agents also have a “mask” channel that tells them which of their covered agents are done. This should assist the super agent in understanding the changing simulation dynamics for done agents (i.e. why actions from done agents don’t do anything).

The super agent will report an observation for done covered agents. This may result in an unfair advantage during training (e.g. a dead agent should not produce useful information), and Abmarl will issue a warning. To properly handle this, the user can supply the null observation for each covered agent. In that case, the super agent will use the null observation for any done covered agents.

Parameters:

agent_id – The id of the agent for whom to produce an observation. Should not be a covered agent.

Returns:

The requested observation. Super agent observations are collected from the covered agents.

get_reward(agent_id, **kwargs)

Report the agent’s reward.

A super agent’s reward is the sum of all its active covered agents’ rewards.

Parameters:

agent_id – The id of the agent for whom to report the reward. Should not be a covered agent.

Returns:

The requested reward. Super agent rewards are summed from the active covered agents.

reset(**kwargs)

Reset the simulation to a start state, which may be randomly generated.

step(action_dict, **kwargs)

Give actions to the simulation.

Super agent actions are decomposed into the covered agent actions and then passed to the underlying sim. Because of the nature of this wrapper, the super agents may provide actions for covered agents that are already done. We filter out these actions.

Parameters:

action_dict – Dictionary that maps agent ids to the actions. Covered agents should not be present.

property super_agent_mapping

A dictionary that maps from a super agent’s id to a list of covered agent ids.

Suppose our simulation has 5 agents and we use the following super agent mapping: {‘super0’: [‘agent0’, ‘agent1’], ‘super1’: [‘agent3’, ‘agent4’]} The resulting agents dict would have keys ‘super0’, ‘super1’, and ‘agent2’; where ‘agent0’, ‘agent1’, ‘agent3’, and ‘agent4’ have been covered by the super agents and ‘agent2’ is left uncovered and therefore included in the dict of agents. If the super agent mapping is changed, then the dictionary of agents gets recreated immediately.

Super agents cannot have the same id as any of the agents in the simulation. Two super agents cannot cover the same agent. All covered agents must be learning agents.
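
A minimal sketch of the mapping described above; MySim and its five agents are hypothetical:

    from abmarl.sim.wrappers import SuperAgentWrapper

    sim = SuperAgentWrapper(
        MySim(agents=agents),
        super_agent_mapping={
            'super0': ['agent0', 'agent1'],
            'super1': ['agent3', 'agent4'],
        },
    )
    print(list(sim.agents))  # ['super0', 'super1', 'agent2']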

Abmarl External Integration

class abmarl.external.GymWrapper(sim)

Wrap an AgentBasedSimulation object with only a single learning agent to the gym.Env interface. This wrapper exposes the single agent’s observation and action space directly in the simulation.

property action_space

The agent’s action space is the environment’s action space.

property observation_space

The agent’s observation space is the environment’s observation space.

render(**kwargs)

Forward render calls to the composed simulation.

reset(**kwargs)

Return the observation from the single agent.

step(action, **kwargs)

Wrap the action by storing it in a dict that maps the agent’s id to the action. Pass to sim.step. Return the observation, reward, done, and info from the single agent.

property unwrapped

Fall through all the wrappers and obtain the original, completely unwrapped simulation.
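
A minimal sketch of exposing a simulation through the gym.Env interface; single_agent_sim is a hypothetical simulation with exactly one learning agent:

    from abmarl.external import GymWrapper

    env = GymWrapper(single_agent_sim)
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())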

class abmarl.external.MultiAgentWrapper(sim)

Enable connection between SimulationManager and RLlib Trainer.

Wraps a SimulationManager and forwards all calls to the manager. This class is boilerplate and needed because RLlib checks that the simulation is an instance of MultiAgentEnv.

sim

The SimulationManager.

render(*args, **kwargs)

See SimulationManager.

reset()

See SimulationManager.

step(actions)

See SimulationManager.

property unwrapped

Fall through all the wrappers and obtain the original, completely unwrapped simulation.

class abmarl.external.OpenSpielWrapper(sim, discounts=1.0, **kwargs)

Enable connection between Abmarl’s SimulationManager and OpenSpiel agents.

OpenSpiel supports turn-based and simultaneous simulations, which Abmarl provides through the TurnBasedManager and AllStepManager. OpenSpiel expects TimeStep objects as output, which include the observations, rewards, and step type. Among the observations, it expects a list of legal actions available to the agent. The OpenSpielWrapper converts output from the simulation manager to the expected format. Furthermore, OpenSpiel provides actions as a list. The OpenSpielWrapper converts those actions to a dict before forwarding them to the underlying simulation manager.

OpenSpiel does not support the ability for some agents in a simulation to finish before others. The simulation is either ongoing, in which all agents are providing actions, or else it is done for all agents. In contrast, Abmarl allows some agents to be done before others as the simulation progresses. Abmarl expects that done agents will not provide actions. OpenSpiel, however, will always provide actions for all agents. The OpenSpielWrapper removes the actions from agents that are already done before forwarding the action to the underlying simulation manager. Furthermore, OpenSpiel expects every agent to be present in the TimeStep outputs. Normally, Abmarl will not provide output for agents that are done since they have finished generating data in the episode. In order to work with OpenSpiel, the OpenSpielWrapper forces output from all agents at every step, including those already done.

Currently, the OpenSpielWrapper only works with simulations in which the action and observation space of every agent is Discrete. Most simulations will need to be wrapped with the RavelDiscreteWrapper.
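
A minimal sketch of preparing a simulation for OpenSpiel: ravel the spaces to Discrete, manage the turns, then wrap for OpenSpiel; MySim is hypothetical:

    from abmarl.sim.wrappers import RavelDiscreteWrapper
    from abmarl.managers import TurnBasedManager
    from abmarl.external import OpenSpielWrapper

    sim = OpenSpielWrapper(
        TurnBasedManager(RavelDiscreteWrapper(MySim(agents=agents))),
        discounts=0.95,
    )
    time_step = sim.reset()  # StepType.FIRST; rewards and discounts are None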

action_spec()

The agents’ action spaces.

Abmarl uses gym spaces for the action space. The OpenSpielWrapper converts the gym space into a format that OpenSpiel expects.

property current_player

The agent that currently provides the action.

Current player is used in the observation part of the TimeStep output. If it is a turn based simulation, then the current player is the single agent who is providing an action. If it is a simultaneous simulation, then OpenSpiel does not use this property and the current player is just the first agent in the list of agents in the simulation.

property discounts

The learning discounts for each agent.

If provided as a number, then that value will apply to all the agents. Make separate discounts for each agent by providing a dictionary assigning each agent to its own discount value.

Return the legal actions available to the agent.

By default, the OpenSpielWrapper uses the agent’s entire action space as its legal actions in each time step. This function can be overwritten in a derived class to add logic for obtaining the actual legal actions available.

property is_turn_based

True if the simulation is managed by a TurnBasedManager; otherwise the simulation is simultaneous.

property num_players

The number of learning agents in the simulation.

observation_spec()

The agents’ observation spaces.

Abmarl uses gym spaces for the observation space. The OpenSpielWrapper converts the gym space into a format that OpenSpiel expects.

reset(**kwargs)

Reset the simulation.

Returns:

TimeStep object containing the initial observations. Uniquely at reset, the rewards and discounts are None and the step type is StepType.FIRST.

step(action_list, **kwargs)

Step the simulation forward using the reported actions.

OpenSpiel provides an action list of either (1) the agent whose turn it is in a turn-based simulation or (2) all the agents in a simultaneous simulation. The OpenSpielWrapper converts the list of actions to a dictionary before passing it to the underlying simulation.

OpenSpiel does not support the ability for some agents of a simulation to finish before others. As such, it may provide actions for agents that are already done. To work with Abmarl, the OpenSpielWrapper removes actions for agents that are already done.

Parameters:

action_list – list of actions for the agents.

Returns:

TimeStep object containing the observations of the new state, the rewards, and StepType.MID if the simulation is still progressing, otherwise StepType.LAST.

property unwrapped

Fall through all the wrappers and obtain the original, completely unwrapped simulation.

Abmarl GridWorld Simulation Framework

Base

class abmarl.sim.gridworld.base.GridWorldSimulation(grid=None, **kwargs)

GridWorldSimulation interface.

Extends the AgentBasedSimulation interface for the GridWorld. We provide builders for streamlining the building process.

Parameters:

grid – The underlying grid. This is typically provided by the builder.

classmethod build_sim(rows, cols, **kwargs)

Build a GridSimulation.

Specify the number of rows, the number of cols, a dictionary of agents, and any additional parameters.

Parameters:
  • rows – The number of rows in the grid. Must be a positive integer.

  • cols – The number of cols in the grid. Must be a positive integer.

  • agents – The dictionary of agents in the grid.

Returns:

A GridSimulation configured as specified.
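
A minimal sketch of building directly from dimensions; MyGridSim is a hypothetical GridWorldSimulation subclass:

    from abmarl.sim.gridworld.agent import GridWorldAgent

    agents = {
        'agent0': GridWorldAgent(id='agent0', encoding=1),
        'agent1': GridWorldAgent(id='agent1', encoding=2),
    }
    sim = MyGridSim.build_sim(5, 8, agents=agents)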

classmethod build_sim_from_array(array, object_registry, extra_agents=None, **kwargs)

Build a GridSimulation from an array.

Parameters:
  • array – An array from which to build the initial grid. Each entry should be an alphanumeric character indicating which agent will be at that location. The agent will be given that initial position.

  • object_registry – A dictionary that maps the characters in the array to a function that generates the agent with its unique id. Zeros, periods, and underscores are reserved for empty space.

  • extra_agents – A dictionary of agents which are not in the input grid but which we want to be a part of the simulation. Note: if there is an agent in the array and in extra_agents, we will use the one from the array.

Returns:

A GridSimulation built from the array along with any extra agents.
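
A minimal sketch of building from an array; this assumes the object_registry functions receive an integer used to build a unique id, and MyGridSim is a hypothetical GridWorldSimulation subclass:

    import numpy as np
    from abmarl.sim.gridworld.agent import GridWorldAgent

    array = np.array([
        ['_', 'A', '_'],
        ['_', '_', 'A'],
    ])
    object_registry = {
        # '_' is reserved for empty space.
        'A': lambda n: GridWorldAgent(id=f'agent{n}', encoding=1),
    }
    sim = MyGridSim.build_sim_from_array(array, object_registry)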

classmethod build_sim_from_file(file_name, object_registry, extra_agents=None, **kwargs)

Build a GridSimulation from a text file.

Parameters:
  • file_name – Name of the file that specifies the initial grid setup. In the file, each cell should be a single alphanumeric character indicating which agent will be at that position (from the perspective of looking down on the grid). That agent will be given that initial position.

  • object_registry – A dictionary that maps characters from the file to a function that generates the agent. This must be a function because each agent must have a unique id, which is generated here. Zeros, periods, and underscores are reserved for empty space.

  • extra_agents – A dictionary of agents which are not in the input grid but which we want to be a part of the simulation. Note: if there is an agent in the file and in extra_agents, we will use the one from the file.

Returns:

A GridSimulation built from the file along with any extra agents.

classmethod build_sim_from_grid(grid, extra_agents=None, **kwargs)

Build a GridSimulation from a Grid object.

Parameters:
  • grid – A Grid contains all the agents indexed by location, so we can construct a simulation from it.

  • extra_agents – A dictionary of agents which are not in the input grid but which we want to be a part of the simulation. Note: if there is an agent in the grid and in extra_agents, we will use the one from the grid.

Returns:

A GridSimulation built from the grid along with any extra agents.

property grid

The underlying grid in the Grid World Simulation.

render(fig=None, gridlines=True, background_color='w', **kwargs)

Draw the grid and all active agents in the grid.

Agents are drawn at their positions using their respective shape and color.

Parameters:
  • fig – The figure on which to draw the grid. It’s important to provide this figure because the same figure must be used when drawing each state of the simulation. Otherwise, a new figure will be created for every rendered frame.

  • gridlines – If true, then draw the gridlines.

  • background_color – The background color of the grid, default is white.

class abmarl.sim.gridworld.smart.SmartGridWorldSimulation(states=None, observers=None, dones=None, **kwargs)

Default “template” for building and running simulations.

The SmartGridWorldSimulation supports varying some components of a simulation at initialization without changing simulation code. Actor components and the step function must still be implemented by the subclass.

Parameters:
  • states – A set of state components. It could be the component class or the name of a registered state component.

  • observers – A set of observer components. It could be the component class or the name of a registered observer component.

  • dones – A set of done components. It could be the component class or the name of a registered done component.

get_all_done(**kwargs)

Return the simulation’s done status.

get_done(agent_id, **kwargs)

Return the agent’s done status.

get_info(agent_id, **kwargs)

Return the agent’s info.

get_obs(agent_id, **kwargs)

Return the agent’s observation.

get_reward(agent_id, **kwargs)

Return the agent’s reward.

reset(**kwargs)

Reset the simulation to a start state, which may be randomly generated.
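
A minimal sketch of a simulation built on this template: state, observer, and done components are passed by their registered names, while the actor and step function are implemented in the subclass; the agents dictionary is hypothetical:

    from abmarl.sim.gridworld.smart import SmartGridWorldSimulation
    from abmarl.sim.gridworld.actor import MoveActor

    class MySmartSim(SmartGridWorldSimulation):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.move_actor = MoveActor(**kwargs)
            self.finalize()

        def step(self, action_dict, **kwargs):
            for agent_id, action in action_dict.items():
                agent = self.agents[agent_id]
                if agent.active:
                    self.move_actor.process_action(agent, action, **kwargs)

    sim = MySmartSim.build_sim(
        6, 6,
        agents=agents,
        states={'PositionState'},
        observers={'PositionCenteredEncodingObserver'},
        dones={'ActiveDone'},
    )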

class abmarl.sim.gridworld.base.GridWorldBaseComponent(agents=None, grid=None, **kwargs)

Component base class from which all components will inherit.

Every component has access to the dictionary of agents and the grid.

property agents

A dict that maps the Agent’s id to the Agent object. All agents must be GridWorldAgents.

property cols

The number of columns in the grid.

property grid

The grid indexes the agents by their position.

For example, an agent whose position is (3, 2) can be accessed through the grid with self.grid[3, 2]. Components are responsible for maintaining the connection between agent position and grid index.

property rows

The number of rows in the grid.

class abmarl.sim.gridworld.grid.Grid(rows, cols, overlapping=None, **kwargs)

A Grid stores the agents at indices in a numpy array.

Components can interface with the Grid. Each index in the grid is a dictionary that maps the agent id to the agent object itself. If agents can overlap, then there may be more than one agent per cell.

Parameters:
  • rows – The number of rows in the grid.

  • cols – The number of columns in the grid.

  • overlapping – Overlapping matrix tracks which agents can overlap based on their encodings.

property cols

The number of columns in the grid.

property overlapping

Overlapping matrix tracks which agents can overlap based on their encodings.

A dictionary that maps agents’ encodings to a set of encodings with which they can overlap. If the overlapping matrix is not symmetrical, then we update it here to be symmetrical. That is, if 2 can overlap with 3, then 3 can overlap with 2.

place(agent, ndx)

Place an agent at an index.

If the cell is available, the agent will be placed at that index in the grid and the agent’s position will be updated. The placement is successful if the new position is unoccupied or if the agent already occupying that position is overlappable AND this agent is overlappable.

Parameters:
  • agent – The agent to place.

  • ndx – The new index for this agent.

Returns:

The successfulness of the placement.

query(agent, ndx)

Query a cell in the grid to see if it is available to this agent.

The cell is available for the agent if it is empty or if both the occupying agent and the querying agent are overlappable.

Parameters:
  • agent – The agent for which we are checking availability.

  • ndx – The cell to query.

Returns:

The availability of this cell.

remove(agent, ndx)

Remove an agent from an index.

Parameters:
  • agent – The agent to remove

  • ndx – The old index for this agent

reset(**kwargs)

Reset the grid to an empty state.

property rows

The number of rows in the grid.
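
A minimal sketch of interacting with the Grid directly, as a component might when placing agents; agent is a hypothetical GridWorldAgent with encoding 1:

    from abmarl.sim.gridworld.grid import Grid

    grid = Grid(4, 4, overlapping={1: {2}, 2: {1}})
    grid.reset()
    if grid.query(agent, (2, 3)):         # is the cell available to this agent?
        assert grid.place(agent, (2, 3))  # place it and update its position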

abmarl.sim.gridworld.registry.register(component)

Register a component.

Parameters:

component – The component will be registered by its type (actor, done, observer, or state) and class name.
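
A minimal sketch of registering a custom done component so it can be referenced by name (for example, in the dones set of a SmartGridWorldSimulation); the component itself is a made-up example:

    from abmarl.sim.gridworld.done import DoneBaseComponent
    from abmarl.sim.gridworld.registry import register

    class NeverDone(DoneBaseComponent):
        def get_done(self, agent, **kwargs):
            return False

        def get_all_done(self, **kwargs):
            return False

    register(NeverDone)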

Agents

class abmarl.sim.gridworld.agent.GridWorldAgent(initial_position=None, blocking=False, encoding=None, render_shape='o', render_color='gray', render_size=200, **kwargs)

The base agent in the GridWorld.

property blocking

Specify if this agent blocks other agents’ observations and actions.

property configured

All agents must have an id.

property encoding

The numerical value that identifies the type of agent.

The value does not necessarily identify the agent itself. For example, other agents who observe this agent will see this value.

property initial_position

The agent’s initial position at reset.

property position

The agent’s position in the grid.

property render_color

The agent’s color in the rendered grid.

property render_shape

The agent’s shape in the rendered grid.

property render_size

The agent’s size in the rendered grid.

class abmarl.sim.gridworld.agent.GridObservingAgent(view_range=None, **kwargs)

Observe the grid up to view range cells away.

property configured

Observing agents must have an observation space.

property view_range

The number of cells away this agent can observe in each step.

class abmarl.sim.gridworld.agent.MovingAgent(move_range=None, **kwargs)

Move up to move_range cells.

property configured

Acting agents must have an action space.

property move_range

The maximum number of cells away that the agent can move.
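
Agent features are typically combined through multiple inheritance; a minimal sketch (ExplorerAgent is a made-up class name):

    from abmarl.sim.gridworld.agent import GridObservingAgent, MovingAgent

    class ExplorerAgent(GridObservingAgent, MovingAgent):
        pass

    explorer = ExplorerAgent(
        id='explorer0',
        encoding=1,
        view_range=3,
        move_range=1,
        render_color='blue',
    )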

class abmarl.sim.gridworld.agent.OrientationAgent(initial_orientation=None, **kwargs)

Agent that has an orientation, either 1: Left, 2: Down, 3: Right, 4: Up.

property initial_orientation

The agent’s starting orientation at the beginning of the simulation.

property orientation

The agent’s orientation.

class abmarl.sim.gridworld.agent.HealthAgent(initial_health=None, **kwargs)

Agents have health points and can die.

Health is bounded between 0 and 1. Agents become inactive when the health falls to 0.

property health

The agent’s health throughout the simulation trajectory.

The health will always be between 0 and 1.

property initial_health

The agent’s initial health between 0 and 1.

class abmarl.sim.gridworld.agent.AttackingAgent(attack_range=None, attack_strength=None, attack_accuracy=None, simultaneous_attacks=1, **kwargs)

Agents that can attack other agents.

property attack_accuracy

The effective accuracy of the agent’s attack.

Should be between 0 and 1. To make deterministic attacks, use 1.

property attack_range

The maximum range of the attack.

property attack_strength

The strength of the attack.

Should be between 0 and 1.

property configured

Acting agents must have an action space.

property simultaneous_attacks

The number of attacks the agent can make per turn.

This parameter is interpreted differently by each attack actor, but generally it specifies how many attacks this agent can carry out in a single step. See specific AttackActor documentation for more information.

class abmarl.sim.gridworld.agent.AmmoAgent(initial_ammo=None, **kwargs)

Agent that has a limited amount of ammunition.

property ammo

The agent’s ammo throughout the simulation trajectory.

property initial_ammo

The amount of ammo with which this agent starts.

class abmarl.sim.gridworld.agent.AmmoObservingAgent(initial_ammo=None, **kwargs)

Boilerplate class required to work with the AmmoObserver.

State

class abmarl.sim.gridworld.state.StateBaseComponent(agents=None, grid=None, **kwargs)

Abstract State Component base from which all state components will inherit.

abstract reset(**kwargs)

Resets the part of the state for which it is responsible.

class abmarl.sim.gridworld.state.PositionState(no_overlap_at_reset=False, randomize_placement_order=False, **kwargs)

Manage the agents’ positions in the grid.

property no_overlap_at_reset

Attempt to place each agent on its own cell.

Agents with initial positions will override this property.

property randomize_placement_order

Randomize the order in which each agent in a category is placed.

All agents with initial positions will still be placed before agents without initial positions. Now, the subset of agents with initial positions will be placed in random order. Likewise, the subset of agents without initial positions will be placed in random order.

Agents are reshuffled every episode.

property ravelled_positions_available

A dictionary mapping the encodings to a list of positions available to agents of that encoding at reset. The list should contain cells represented in their ravelled form.

reset(**kwargs)

Give agents their starting positions.

We use the agent’s initial position if it exists. Otherwise, we randomly place the agents in the grid.

class abmarl.sim.gridworld.state.MazePlacementState(target_agent=None, barrier_encodings=None, free_encodings=None, cluster_barriers=False, scatter_free_agents=False, **kwargs)

Place agents in the grid based on a maze generated around a target.

Partition the cells into two categories, either a free cell or a barrier, based on a maze, which is generated starting at a target agent’s position. Specify available positions as follows: barrier-encoded agents will be placed at the maze barriers, free-encoded agents will be placed at free positions.

Note: Because the maze is randomly generated at the beginning of each episode and because the agents must be placed in either a free cell or barrier cell according to their encodings, it is highly recommended that none of your agents be given initial positions, except for the target agent.

Parameters:
  • target_agent – Start the maze generation at this agent’s position and place the target agent here.

  • barrier_encodings – A set of encodings corresponding to the maze’s barrier cells.

  • free_encodings – A set of encodings corresponding to the maze’s free cells.

  • cluster_barriers – Prioritize the placement of barriers near the target.

  • scatter_free_agents – Prioritize the placement of free agents away from the target.
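
A minimal sketch of configuring this state component; the agents and grid objects are hypothetical and are typically provided by the simulation builder:

    from abmarl.sim.gridworld.state import MazePlacementState

    position_state = MazePlacementState(
        agents=agents,
        grid=grid,
        target_agent=agents['target'],
        barrier_encodings={2},
        free_encodings={1, 3},
        cluster_barriers=True,
        scatter_free_agents=True,
    )
    position_state.reset()  # generate the maze and place the agents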

property barrier_encodings

A set of encodings corresponding to the maze’s barrier cells.

property cluster_barriers

If True, then prioritize placing barriers near the target agent.

property free_encodings

A set of encodings corresponding to the maze’s free cells.

reset(**kwargs)

Give the agents their starting positions.

property scatter_free_agents

If True, then prioritize placing free agents away from the target agent.

property target_agent

The target agent is the place from which to start the maze generation.

Other agents are placed relative to the target.

class abmarl.sim.gridworld.state.TargetBarriersFreePlacementState(target_agent=None, barrier_encodings=None, free_encodings=None, cluster_barriers=False, scatter_free_agents=False, **kwargs)

Place agents in the grid based on relationship to the target.

Place a target agent, either randomly or based on its initial position. Barrier agents can be placed near the target, and free agents can be placed far away from the target.

Note: Agents with initial positions may conflict with the target agent. If the target agent is configured for random placement, then we recommend not assigning an initial position to any agent.

Parameters:
  • target_agent – Barrier will cluster near this agent.

  • barrier_encodings – Set of encodings indicating which agents are to be treated as barriers.

  • free_encodings – Set of encodings indicating which agents are to be treated as free.

  • cluster_barriers – Prioritize the placement of barriers near the target.

  • scatter_free_agents – Prioritize the placement of free agents away from the target.

property barrier_encodings

A set of encodings corresponding to the maze’s barrier cells.

property cluster_barriers

If True, then prioritize placing barriers near the target agent.

property free_encodings

A set of encodings corresponding to the maze’s free cells.

reset(**kwargs)

Give the agents their starting positions.

property scatter_free_agents

If True, then prioritize placing free agents away from the target agent.

property target_agent

The target agent’s position is used to place the other agents.

class abmarl.sim.gridworld.state.HealthState(agents=None, grid=None, **kwargs)

Manage the state of the agents’ healths.

Every HealthAgent has a health. If that health falls to zero, that agent dies and is removed from the grid.

reset(**kwargs)

Give HealthAgents their starting healths.

We use the agent’s initial health if it exists. Otherwise, we randomly assign a value between 0 and 1.

class abmarl.sim.gridworld.state.AmmoState(agents=None, grid=None, **kwargs)

Manage the state of the agents’ ammo.

Every AmmoAgent has ammo.

reset(**kwargs)

Give AmmoAgents their starting ammo.

class abmarl.sim.gridworld.state.OrientationState(agents=None, grid=None, **kwargs)

Manages the state of the agent’s orientation.

Orientation determines not only which way the agent is “facing” but also includes drift, which will move the agent one cell away in the direction that it is moving.

reset(**kwargs)

Give OrientationAgents their initial orientation (or random if not assigned).

Actors

class abmarl.sim.gridworld.actor.ActorBaseComponent(agents=None, grid=None, **kwargs)

Abstract Actor Component class from which all Actor Components will inherit.

abstract property key

The key in the action dictionary.

The action space of all acting agents in the gridworld framework is a dict. We can build up complex action spaces with multiple components by assigning each component an entry in the action dictionary. Actions will be a dictionary even if your simulation only has one Actor.

abstract process_action(agent, action_dict, **kwargs)

Process the agent’s action.

Parameters:
  • agent – The acting agent.

  • action_dict – The action dictionary for this agent in this step. The dictionary may have different entries, each of which will be processed by different Actors.

abstract property supported_agent_type

The type of Agent that this Actor works with.

If an agent is this type, the Actor will add its entry to the agent’s action space and will process actions for this agent.

class abmarl.sim.gridworld.actor.MoveActor(**kwargs)

Agents can move to nearby squares.

property key

This Actor’s key is “move”.

process_action(agent, action_dict, **kwargs)

The agent can move to nearby squares.

The agent’s new position must be within the grid and the cell-occupation rules must be met.

Parameters:
  • agent – Move the agent if it is a MovingAgent.

  • action_dict – The action dictionary for this agent in this step. If the agent is a MovingAgent, then the action dictionary will contain the “move” entry.

Returns:

True if the move is successful, False otherwise.

property supported_agent_type

This Actor works with MovingAgents.

class abmarl.sim.gridworld.actor.CrossMoveActor(**kwargs)

Agents can move up, down, left, right, or stay in place.

grid_action(cross_action)

Grid action converts the cross action to an action in the grid.

0: Stay, 1: Move up, 2: Move right, 3: Move down, 4: Move left.

property key

This Actor’s key is “move”.

process_action(agent, action_dict, **kwargs)

The agent can move up, down, left, right, or stay in place.

The agent’s new position must be within the grid and the cell-occupation rules must be met.

Parameters:
  • agent – Move the agent if it is a MovingAgent.

  • action_dict – The action dictionary for this agent in this step. If the agent is a MovingAgent, then the action dictionary will contain the “move” entry.

Returns:

True if the move is successful, False otherwise.

property supported_agent_type

This Actor works with MovingAgents, but the move_range parameter is ignored.

class abmarl.sim.gridworld.actor.DriftMoveActor(**kwargs)

Agents can move up, down, left, right, or stay in place.

If the agent chooses to stay in place or if its attempt to change directions is unsuccessful, then we attempt to drift it in the direction of its orientation. For example, if the agent is moving right in a corridor and attempts to move up, that move will fail and it will continue drifting right. Similarly, if the agent is in a corner and attempts to change orientation (but remains in the corner), that change will fail and it will keep its current orientation, even though it is blocked in that direction too.

process_action(agent, action_dict, **kwargs)

The agent can move up, down, left, right, or stay in place.

If the agent chooses to stay in place or if its attempt to change directions is unsuccessful, then we attempt to drift it in the direction of its orientation.

Parameters:
  • agent – Move the agent if it is a MovingAgent and OrientationAgent.

  • action_dict – The action dictionary for this agent in this step. If the agent is a MovingAgent, then the action dictionary will contain the “move” entry.

Returns:

True if the move is successful, False otherwise.

class abmarl.sim.gridworld.actor.AttackActorBaseComponent(attack_mapping=None, stacked_attacks=False, **kwargs)

Abstract class that provides the properties and structure for attack actors.

The agent chooses to attack other agents within its surrounding grid. The derived attack actor interprets and implements the specific attack. Attacked agents have their health reduced by the attacking agent’s strength and possibly become inactive if their health falls too low.

property attack_mapping

Dict that dictates which agents the attacking agent can attack.

The dictionary maps the attacking agents’ encodings to a set of encodings that they can attack.

property key

This Actor’s key is “attack”.

process_action(attacking_agent, action_dict, **kwargs)

Process the agent’s attack.

The derived attack actor interprets and implements the action. In general, an attack is successful if there are attackable agents such that:

  1. The attackable agent is active.

  2. The attackable agent is positioned at the attacked cell.

  3. The attackable agent is valid according to the attack_mapping.

  4. The attacking agent’s accuracy is high enough.

  5. The attacking agent has enough ammo.

Furthermore, a single agent may only be attacked once if stacked_attacks is False. Additional attacks will be applied to other agents or wasted.

If the attack is successful, then the attacked agent’s health is depleted by the attacking agent’s strength, possibly resulting in its death.

Parameters:
  • attacking_agent – The attacking agent.

  • action_dict – The agent’s action in this step.

Returns:

Tuple of (bool, list). The first value is False if the agent is not an attacking agent or chose not to attack; otherwise it is True. The second value is a list of attacked agents, which will be empty if there was no attack or if the attack failed. Thus, there are three possible outcomes:

  1. An attack was not attempted: False, []

  2. An attack failed: True, []

  3. An attack was successful: True, [non-empty]

property stacked_attacks

Allows an agent to attack the same agent multiple times per step.

When an agent has more than 1 attack per turn, this parameter allows them to use more than one attack on the same agent. Otherwise, the attacks will be applied to other agents, and if there are not enough attackable agents, then the extra attacks will be wasted.

property supported_agent_type

This Actor works with AttackingAgents.

class abmarl.sim.gridworld.actor.BinaryAttackActor(attack_mapping=None, stacked_attacks=False, **kwargs)

Launch attacks in a local grid.

Agents can choose to launch attacks up to their attack count or not to attack at all. For example, if an agent has an attack count of 3, then it can choose no attack, attack once, attack twice, or attack thrice. The BinaryAttackActor searches the nearby local grid defined by the agent’s attack range for attackable agents, and randomly chooses from that set up to the number of attacks issued.
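
A minimal sketch of pairing an AttackingAgent with a BinaryAttackActor; the attack_mapping below says encoding 1 can attack encoding 2, and other_agents and grid are hypothetical:

    from abmarl.sim.gridworld.agent import AttackingAgent
    from abmarl.sim.gridworld.actor import BinaryAttackActor

    attacker = AttackingAgent(
        id='attacker0',
        encoding=1,
        attack_range=1,
        attack_strength=0.5,
        attack_accuracy=1,        # deterministic attacks
        simultaneous_attacks=2,
    )
    attack_actor = BinaryAttackActor(
        agents={'attacker0': attacker, **other_agents},
        grid=grid,
        attack_mapping={1: {2}},
    )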

class abmarl.sim.gridworld.actor.EncodingBasedAttackActor(attack_mapping=None, stacked_attacks=False, **kwargs)

Launch attacks in a local grid based on encoding.

The attacking agent specifies how many attacks it would like to use per available encoding, based on its attack count and the attack mapping. For example, if the agent can attack encodings 1 and 2 and has up to 3 attacks available, then it may launch up to 3 attacks on encoding 1 and up to 3 attacks on encoding 2. Agents with those encodings in the surrounding grid are liable to be attacked.

class abmarl.sim.gridworld.actor.SelectiveAttackActor(attack_mapping=None, stacked_attacks=False, **kwargs)

Launch attacks in a local grid by cell.

The attack is a local grid centered on the agent’s position, and its size depends on the agent’s attack range. Each cell in the grid has a nonnegative integer up to the agent’s attack count, and it indicates how many attacks to use on that cell.

class abmarl.sim.gridworld.actor.RestrictedSelectiveAttackActor(attack_mapping=None, stacked_attacks=False, **kwargs)

Launch attacks in a local grid by cell.

Agents choose to attack specific cells in the surrounding grid. The agent can attack up to its attack count. It can choose to attack different cells or the same cell multiple times.

Observers

class abmarl.sim.gridworld.observer.ObserverBaseComponent(agents=None, grid=None, **kwargs)

Abstract Observer Component base from which all observer components will inherit.

abstract get_obs(agent, **kwargs)

Observe the state of the simulation.

Parameters:

agent – The agent for which we return an observation.

Returns:

This agent’s observation.

abstract property key

The key in the observation dictionary.

The observation space of all observing agents in the gridworld framework is a dict. We can build up complex observation spaces with multiple components by assigning each component an entry in the observation dictionary. Observations will be a dictionary even if your simulation only has one Observer.

abstract property supported_agent_type

The type of Agent that this Observer works with.

If an agent is this type, the Observer will add its entry to the agent’s observation space and will produce observations for this agent.

class abmarl.sim.gridworld.observer.AmmoObserver(**kwargs)

Agents observe their own ammo.

get_obs(agent, **kwargs)

Agents observe their own ammo

property key

This Observer’s key is “ammo”.

property supported_agent_type

This Observer works with AmmoObservingAgents.

class abmarl.sim.gridworld.observer.AbsolutePositionObserver(**kwargs)

Agents observe their absolute position.

get_obs(agent, **kwargs)

Agents observe their absolute position.

property key

This Observer’s key is “position”.

property supported_agent_type

This Observer works with ObservingAgents

class abmarl.sim.gridworld.observer.AbsoluteEncodingObserver(**kwargs)

Observe the agents in the grid according to their actual positions.

This Observer represents agents by their encoding on cells according to their actual positions in the grid. If there are multiple agents on a single cell with different encodings, only a single randomly chosen encoding will be observed. To be consistent with other built-in observers, masked cells are indicated as -2. Typically, -1 is reserved for the out of bounds encoding, but because this Observer only reports cells in the grid, we don’t need an out of bounds distinction. Instead, in order for the observing agent to identify itself distinctly from other agents of the same encoding, it is reported as a -1.

get_obs(agent, **kwargs)

The agent observes the grid.

The observation may include the agent itself indicated by a -1, other agents indicated by their encodings, empty space indicated with a 0, and masked cells indicated as -2, which are masked either because they are too far away or because they are blocked from view by view-blocking agents.

property key

This Observer’s key is “absolute_encoding”.

property supported_agent_type

This Observer works with GridObservingAgents.

class abmarl.sim.gridworld.observer.PositionCenteredEncodingObserver(observe_self=True, **kwargs)

Observe a subset of the grid centered on the agent’s position.

The observation is centered around the observing agent’s position. Each agent in the “observation window” is recorded in the relative cell using its encoding. If there are multiple agents on a single cell with different encodings, the agent will observe only one of them chosen at random.

get_obs(agent, **kwargs)

The agent observes a sub-grid centered on its position.

The observation may include other agents, empty spaces, out of bounds, and masked cells, which can be blocked from view by other blocking agents.

Returns:

The observation as a dictionary.

property key

This Observer’s key is “position_centered_encoding”.

property observe_self

Agents can observe themselves, which may hide important information if overlapping is important. This can be turned off by setting observe_self to False.

property supported_agent_type

This Observer works with GridObservingAgents.

class abmarl.sim.gridworld.observer.StackedPositionCenteredEncodingObserver(**kwargs)

Observe a subset of the grid centered on the agent’s position.

The observation is centered around the observing agent’s position. The observing agent sees a stack of observations, one for each encoding, where the number of agents of each encoding at a cell is given rather than the encoding itself. Out of bounds and masked indicators appear in every grid.

get_obs(agent, **kwargs)

The agent observes one or more sub-grids centered on its position.

The observation may include other agents, empty spaces, out of bounds, and masked cells, which can be blocked from view by other blocking agents. Each grid records the number of agents on a particular cell correlated to a specific encoding.

Returns:

The observation as a dictionary.

property key

This Observer’s key is “stacked_position_centered_encoding”.

property supported_agent_type

This Observer works with GridObservingAgents.

Done

class abmarl.sim.gridworld.done.DoneBaseComponent(agents=None, grid=None, **kwargs)

Abstract Done Component class from which all Done Components will inherit.

abstract get_all_done(**kwargs)

Determine if all the agents are done and/or if the simulation is done.

Returns:

True if all agents are done or if the simulation is done. Otherwise False.

abstract get_done(agent, **kwargs)

Determine if an agent is done in this step.

Parameters:

agent – The agent we are querying.

Returns:

True if the agent is done, otherwise False.

class abmarl.sim.gridworld.done.ActiveDone(agents=None, grid=None, **kwargs)

Inactive agents are indicated as done.

get_all_done(**kwargs)

Return True if all agents are inactive. Otherwise, return False.

get_done(agent, **kwargs)

Return True if the agent is inactive. Otherwise, return False.

class abmarl.sim.gridworld.done.OneTeamRemainingDone(agents=None, grid=None, **kwargs)

Inactive agents are indicated as done.

If the only remaining active agents all have the same encoding, then the simulation ends.

get_all_done(**kwargs)

Return True if all active agents have the same encoding. Otherwise, return False.

class abmarl.sim.gridworld.done.TargetAgentDone(target_mapping=None, **kwargs)

Agents are done when they overlap their target.

The target is prescribed per agent.

get_all_done(**kwargs)

Determine if all the agents are done and/or if the simulation is done.

Returns:

True if all agents are done or if the simulation is done. Otherwise False.

get_done(agent, **kwarg)

Determine if an agent is done in this step.

Parameters:

agent – The agent we are querying.

Returns:

True if the agent is done, otherwise False.

property target_mapping

Maps the agent to its respective target.

Mapping is done via the agents’ ids.

class abmarl.sim.gridworld.done.TargetDestroyedDone(target_mapping=None, **kwargs)

Agents are done when their target agent becomes inactive.

get_all_done(**kwargs)

Determine if all the agents are done and/or if the simulation is done.

Returns:

True if all agents are done or if the simulation is done. Otherwise False.

get_done(agent, **kwarg)

Determine if an agent is done in this step.

Parameters:

agent – The agent we are querying.

Returns:

True if the agent is done, otherwise False.

property target_mapping

Maps the agent to its respective target.

Mapping is done via the agents’ ids.

Wrappers

class abmarl.sim.gridworld.wrapper.ComponentWrapper(agents=None, grid=None, **kwargs)

Wraps GridWorldBaseComponent.

Every wrapper must be able to wrap the respective space and points to/from that space. Agents and Grid are referenced directly from the wrapped component rather than received as initialization parameters.

property agents

The agent dictionary is directly taken from the wrapped component.

abstract check_space(space)

Verify that the space can be wrapped.

property grid

The grid is directly taken from the wrapped component.

abstract unwrap_point(space, point)

Unwrap a point using a reference space.

Parameters:
  • space – The reference space for unwrapping the point.

  • point – The point to unwrap.

property unwrapped

Fall through all the wrappers and obtain the original, completely unwrapped component.

abstract wrap_point(space, point)

Wrap a point using a reference space.

Parameters:
  • space – The reference space for wrapping the point.

  • point – The point to wrap.

abstract wrap_space(space)

Wrap the space.

Parameters:

space – The space to wrap.

abstract property wrapped_component

Get the first-level wrapped component.

class abmarl.sim.gridworld.wrapper.ActorWrapper(component)

Wraps an ActorComponent.

Modify the action space of the agents involved with the Actor, namely the specific actor’s channel. The actions received from the trainer are in the wrapped space, so we need to unwrap them to send them to the actor. This is the opposite of how we wrap and unwrap observations.

property key

The key is the same as the wrapped actor’s key.

process_action(agent, action_dict, **kwargs)

Unwrap the action and pass it to the wrapped actor to process.

Parameters:
  • agent – The acting agent.

  • action_dict – The action dictionary for this agent in this step. The action in this channel comes in the wrapped space.

property supported_agent_type

The supported agent type is the same as the wrapped actor’s supported agent type.

property wrapped_component

Get the wrapped actor.

class abmarl.sim.gridworld.wrapper.RavelActionWrapper(component)

Use numpy’s ravel capabilities to convert space and points to Discrete.

check_space(space)

Ensure that the space is of a type that can be ravelled to a discrete value.

unwrap_point(space, point)

Ravel point to a single discrete value.

wrap_point(space, point)

Unravel a single discrete point to a value in the space.

Recall that the action from the trainer arrives in the wrapped discrete space, so we need to unravel it so that it is in the unwrapped space before giving it to the actor.

wrap_space(space)

Convert the space into a Discrete space.

class abmarl.sim.gridworld.wrapper.ExclusiveChannelActionWrapper(component)

Ravel Dict space and points with top-level exclusion.

This wrapper works with Dict spaces, where each subspace is to be ravelled independently and then combined so that actions are exclusive. The wrapping occurs in two steps. First, we use numpy’s ravel capabilities to convert each subspace to a Discrete space. Second, we combine the Discrete spaces together in such a way that imposes exclusivity among the subspaces. The exclusion happens only on the top level, so a Dict nested within a Dict will be ravelled without exclusion.

check_space(space)

Top level must be Dict and subspaces must be ravel-able.

unwrap_point(space, point)

Ravel point to a single discrete value.

wrap_point(space, point)

Unravel a single discrete point to a value in the space.

Recall that the action from the trainer arrives in the wrapped discrete space, so we need to unravel it so that it is in the unwrapped space before giving it to the actor.

wrap_space(space)

Convert the space into a Discrete space.

The wrapping occurs in two steps. First, we use numpy’s ravel capabilities to convert each subspace to a Discrete space. Second, we combine the Discrete spaces together, imposing that actions among the subspaces are exclusive.

Abmarl Trainers

class abmarl.trainers.MultiPolicyTrainer(sim=None, policies=None, policy_mapping_fn=None, **kwargs)

Train policies with data generated by agents interacting in a simulation.

compute_actions(obs)

Compute actions for agents in the observation.

Forwards the observations to the respective policy for each agent that reports an observation.

Parameters:

obs – an observation dictionary, where the keys are the agents reporting from the sim and the values are the observations.

Returns:

An action dictionary where the keys are the agents from the observation and the values are the actions generated from each agent’s policy.

generate_episode(horizon=200, render=False, log=None, **kwargs)

Generate an episode of data.

The fundamental data object is a SAR, a (state, action, reward) tuple. We restart the sim, generating initial observations (states) for agents reporting from the sim. Then we use the compute_action function to generate actions for agents who report an observation. Those actions are given to the sim, which steps forward and generates rewards and new observations for reporting agents. This loop continues until the simulation is done or we hit the horizon.

Parameters:
  • horizon – The maximum number of steps per episode. The episode may finish early, but it will not progress further than this number of steps.

  • render – Renders the simulation. This should be False when training, and can be True when debugging or evaluating in post-processing.

  • log – Output SARs to this file as they are produced, allowing users to see a “play-by-play” of how the simulation progressed. If None, then logging is disabled.

Returns:

Four dictionaries: one for observations, one for actions, one for rewards, and one for dones. Together these make up the SAR sequence and provide additional information on the done condition, which some algorithms need. The data is organized by agent_id, so you would call {observations, actions, rewards}[agent_id][i] to extract the ith SAR for an agent. NOTE: In multiagent simulations, the number of SARs may differ for each agent.

property policies

A dictionary that maps policy ids to policy objects.

property policy_mapping_fn

A function that takes an agent’s id as input and outputs its corresponding policy id.

property sim

The SimulationManager.

abstract train(iterations=10000, **kwargs)

Train the policy objects using generated data.

This function is abstract and should be implemented by the algorithm.

Parameters:
  • iterations – The number of training iterations.

  • **kwargs – Any additional parameter your algorithm may need.

class abmarl.trainers.SinglePolicyTrainer(sim=None, policy=None, **kwargs)

Train a single policy with data generated by agents interacting in a simulation.

property policies

A dictionary that maps policy ids to policy objects.

property policy

The policy to train.

property policy_mapping_fn

This function always returns “policy”, which is the name we give the policy.

class abmarl.trainers.monte_carlo.OnPolicyMonteCarloTrainer(sim=None, policy=None, **kwargs)
train(iterations=10000, gamma=0.9, **kwargs)

Implements on-policy Monte Carlo.

class abmarl.trainers.DebugTrainer(policies=None, name=None, output_dir=None, **kwargs)

Debug the training setup.

The DebugTrainer generates episodes using the simulation and policies. Rather than training those policies, the DebugTrainer simply dumps the observations, actions, rewards, and dones to disk.

The DebugTrainer can be run without policies. In this case, it generates a random policy for each agent. This effectively debugs the simulation without having to debug the policy setup too.

property name

The name of the experiment.

If name is not specified, then we just use “DEBUG”. We append the name with the date and time.

property output_dir

The directory for where to dump the episode data.

If the output dir is not specified, then we use “~/abmarl_results/”. We append the experiment name to the end of the directory.

train(iterations=5, render=False, **kwargs)

Generate episodes and write them to disk.

Nothing is trained here. We just generate and dump the data and visualize the simulation if requested.

Parameters:
  • iterations – The number of episodes to generate.

  • render – Set to True to visualize the simulation.
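
A minimal sketch of debugging a simulation setup by generating random episodes and dumping them to disk; sim is a hypothetical SimulationManager:

    from abmarl.trainers import DebugTrainer

    debugger = DebugTrainer(
        sim=sim,
        name='debug_run',
        output_dir='~/abmarl_results',
    )
    debugger.train(iterations=3, render=False)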