Agents
Agent
An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.
- The term percept refers to the content an agent’s sensors are perceiving
- An agent’s percept sequence is the complete history of everything the agent has ever perceived
- The agent function maps any given percept sequence to an action (an abstract mathematical description)
- The agent function for an AI agent will be implemented by an agent program (a concrete implementation, running within some physical system)
To illustrate these ideas, Russell and Norvig (2022, 55) use a simple example, the vacuum-cleaner world, which consists of a robotic vacuum-cleaning agent in a world of squares that can be either dirty or clean. The vacuum agent perceives which square it is in and whether there is dirt in the square. The agent starts in square A. The available actions are to move to the right, move to the left, suck up the dirt, or do nothing. One very simple agent function is the following: if the current square is dirty, then suck; otherwise, move to the other square.
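This agent function can be sketched in a few lines of Python. This is only a minimal sketch; the percept format `(location, status)` and the function name are illustrative choices, not taken from the book.

```python
# Minimal sketch of the vacuum-world agent function described above.
# A percept is assumed to be a (location, status) pair, e.g. ("A", "Dirty").

def reflex_vacuum_agent(percept):
    location, status = percept
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    else:  # location == "B"
        return "Left"

print(reflex_vacuum_agent(("A", "Dirty")))  # -> Suck
print(reflex_vacuum_agent(("A", "Clean")))  # -> Right
```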
According to Russell and Norvig (2022, 56), all areas of engineering can be seen as designing artifacts that interact with the world. AI operates at the most interesting end of the spectrum, where the artifacts have significant computational resources and the task environment requires nontrivial decision making.
Rational agent
A rational agent is one that does the right thing.
For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has (Russell and Norvig 2022, 58).
What is rational at any given time depends on four things:
- The performance measure that defines the criterion of success
- The agent’s prior knowledge of the environment
- The actions that the agent can perform
- The agent’s percept sequence to date
Under the following circumstances, the vacuum-cleaning agent is rational (a simulation sketch follows the list):
- The performance measure of the vacuum cleaner might award one point for each clean square at each time step, over a “lifetime” of 1,000 time steps (to prevent the cleaner from oscillating needlessly back and forth).
- The “geography” of the environment is known a priori but the dirt distribution and the initial location of the agent are not. Clean squares stay clean and sucking cleans the current square. The Right and Left actions move the agent one square except when this would take the agent outside the environment in which case the agent remains where it is.
- The only available actions are Right, Left, and Suck.
- The agent correctly perceives its location and whether that location contains dirt.
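A rough simulation of this setup is sketched below. It is a sketch only; the two-square world, the dirt handling, and the scoring loop are assumptions made for illustration.

```python
# Score the reflex vacuum agent in a two-square world, awarding one point
# per clean square at each time step over a lifetime of 1,000 steps.

def reflex_vacuum_agent(percept):
    # Same reflex agent function as in the sketch above.
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

def simulate(dirt, location="A", steps=1000):
    score = 0
    for _ in range(steps):
        status = "Dirty" if dirt[location] else "Clean"
        action = reflex_vacuum_agent((location, status))
        if action == "Suck":
            dirt[location] = False
        elif action == "Right" and location == "A":
            location = "B"
        elif action == "Left" and location == "B":
            location = "A"
        score += sum(not dirty for dirty in dirt.values())  # points for clean squares
    return score

print(simulate({"A": True, "B": True}))  # -> 1998 (maximum 2000 if both start clean)
```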
For details, such as tabulated agent functions, please see Russell and Norvig (2022).
It can be quite hard to formulate a performance measure correctly, however:
If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere once we have started it […] we had better be quite sure that the purpose put into the machine is the purpose which we really desire (Wiener 1960, 1358).
Rationality
Rationality is not the same as perfection.
- Rationality maximizes expected performance
- Perfection maximizes actual performance
- Perfection requires omniscience
- Rational choice depends only on the percept sequence to date
As the environment is usually neither completely known a priori nor completely predictable (or stable), information gathering and learning are important parts of rationality (Russell and Norvig 2022, 59).
The vacuum cleaner needs to explore an initially unknown environment (i.e., exploration) to maximize its expected performance. In addition, a vacuum cleaner that learns to predict where and when additional dirt will appear will do better than one that does not.
Environments
Components
Before designing an agent (the solution), the task environment (the problem) must be specified as fully as possible, including
- the performance measure (P),
- the environment (E),
- the actuators (A), and
- the sensors (S)
Russell and Norvig (2022) call this specification of the task environment the PEAS description.
Task environment of a taxi driver agent
- P: Safe, fast, legal, comfortable, maximize profits, minimize impact on other road users
- E: Roads, other road users, police, pedestrians, customers, weather
- A: Steering, accelerator, brake, signal, horn, display, speech
- S: Cameras, radar, speedometer, GPS, engine sensors, accelerometer, microphones, touchscreen
Source: Russell and Norvig (2022, 61)
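A PEAS description can also be written down as a small data record; the following sketch merely transcribes the taxi-driver table above (the `PEAS` dataclass and its field names are illustrative assumptions, not an established API):

```python
from dataclasses import dataclass

# Hypothetical record type for a PEAS task-environment description.
@dataclass
class PEAS:
    performance_measure: list[str]
    environment: list[str]
    actuators: list[str]
    sensors: list[str]

taxi_driver = PEAS(
    performance_measure=["safe", "fast", "legal", "comfortable",
                         "maximize profits", "minimize impact on other road users"],
    environment=["roads", "other road users", "police", "pedestrians",
                 "customers", "weather"],
    actuators=["steering", "accelerator", "brake", "signal", "horn",
               "display", "speech"],
    sensors=["cameras", "radar", "speedometer", "GPS", "engine sensors",
             "accelerometer", "microphones", "touchscreen"],
)
print(taxi_driver.sensors)
```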
Properties
Task environments can be categorized along the following dimensions (Russell and Norvig 2022, 62–64):
- Fully observable vs. partially observable
- Single agent vs. multi-agent
- Deterministic vs. nondeterministic
- Episodic vs. sequential
- Static vs. dynamic
- Discrete vs. continuous
- Known vs. unknown
Explanations
- If an agent’s sensors give it access to the full state of the environment at any point in time, then we say that the task environment is fully observable (e.g., image analysis).
- When multiple agents intend to maximize a performance measure that depends on the behavior of other agents, we say the environment is multi-agent (e.g., chess).
- When the next state of the environment is completely determined by the current state and the actions performed by the agent(s), it is called a deterministic environment (e.g., crossword puzzle). When a model of the environment explicitly uses probabilities, it is called a stochastic environment (e.g., poker).
- If an agent’s experience is divided into atomic episodes in which the agent receives a percept and then performs a single action, and if the next episode does not depend on the actions performed in previous episodes, then we say that the task environment is episodic (e.g., image analysis).
- If the environment changes while an agent is deliberating, then the environment is dynamic (e.g., taxi driving).
- If the environment has a finite number of different states, we speak of discrete environments (e.g., chess).
- If the outcomes (or outcome probabilities) for all actions are given, then the environment is known (e.g., solitaire card game).
Source: Russell and Norvig (2022, 62–64)
The hardest case is partially observable, multi-agent, nondeterministic, sequential, dynamic, and continuous.
Agent types
Simple reflex agents
Rectangles denote the current internal state of the agent’s decision process; rectangles with rounded corners represent the background information used in the process.
Simple reflex agents select actions on the basis of the current percept, ignoring the rest of the percept history. Thus, these agents work only if the correct decision can be made on the basis of just the current percept; the environment needs to be fully observable (Russell and Norvig 2022, 68).
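A generic simple reflex agent can be sketched as a loop over condition-action rules. The rule representation below is an assumption made for illustration, not the book’s pseudocode.

```python
# Sketch of a simple reflex agent: rules map the interpreted current percept
# directly to an action; no percept history is kept.

def simple_reflex_agent(rules, interpret_input):
    def agent(percept):
        state = interpret_input(percept)   # description of the current percept only
        for condition, action in rules:
            if condition(state):           # first matching condition-action rule wins
                return action
        return "NoOp"
    return agent

# The vacuum world expressed as condition-action rules.
vacuum_rules = [
    (lambda s: s["status"] == "Dirty", "Suck"),
    (lambda s: s["location"] == "A", "Right"),
    (lambda s: s["location"] == "B", "Left"),
]
vacuum_agent = simple_reflex_agent(
    vacuum_rules, lambda percept: {"location": percept[0], "status": percept[1]}
)
print(vacuum_agent(("B", "Dirty")))  # -> Suck
```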
Model-based reflex agents
Model-based reflex agents use a transition model and a sensor model to keep track of the state of the world as perceived through the sensors (i.e., an internal state) (Russell and Norvig 2022, 70).
The transition model reflects “how the world works,” i.e., how the world evolves (a) independently of the agent and (b) depending on the agent’s actions.
The sensor model reflects how the state of the world is reflected in the agent’s percepts (i.e., by its sensors).
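The following sketch shows how such an agent might maintain its internal state; the `update_state` function stands for the combination of transition model and sensor model (all names and the state format are assumptions):

```python
# Sketch of a model-based reflex agent: an internal belief state is updated from
# the previous state, the last action, and the new percept, using the transition
# model ("how the world evolves") and the sensor model ("how percepts arise").

def model_based_reflex_agent(update_state, rules, initial_belief):
    memory = {"belief": initial_belief, "last_action": None}

    def agent(percept):
        memory["belief"] = update_state(memory["belief"], memory["last_action"], percept)
        for condition, action in rules:
            if condition(memory["belief"]):
                memory["last_action"] = action
                return action
        memory["last_action"] = "NoOp"
        return "NoOp"

    return agent
```

Compared with the simple reflex agent sketched earlier, the condition-action rules here test the updated belief state rather than the raw percept, so the agent can react to aspects of the world it cannot currently sense.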
The representations of states can be placed along an axis of increasing complexity and expressive power (Russell and Norvig 2022, 76–77):
- An atomic representation is one in which each state is treated as a black box with no internal structure, meaning the state either does or does not match what you’re looking for. In a sliding-tile puzzle, for instance, you either have the correct alignment of tiles or you do not.
- A factored representation is one in which the states are defined by a set of features (e.g., Boolean, real-valued, or one of a fixed set of symbols). In a sliding-tile puzzle, this might be a feature such as the number of tiles out of place.
- A structured representation is one in which the states are expressed in the form of objects and relations between them (e.g., expressed by logic or probability). Such knowledge about relations is expressed as facts.
A more expressive representation is much more concise, but it makes reasoning and learning more complex.
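For the sliding-tile puzzle used as an example above, the three levels could be sketched roughly as follows (the concrete encodings are illustrative assumptions):

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 = blank square

# Atomic: a state is a black box; we can only test whether it matches the goal.
def is_goal(state):
    return state == GOAL

# Factored: a state is described by features, e.g. the number of misplaced tiles.
def misplaced_tiles(state):
    return sum(1 for tile, goal_tile in zip(state, GOAL)
               if tile != 0 and tile != goal_tile)

# Structured: objects (tiles, squares) and the relations between them, as facts.
def facts(state):
    return {("on", tile, square) for square, tile in enumerate(state) if tile != 0}

state = (1, 2, 3, 4, 5, 6, 7, 0, 8)
print(is_goal(state), misplaced_tiles(state))   # -> False 1
print(("on", 8, 8) in facts(state))             # -> True
```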
Goal-based agents
A model-based, goal-based agent keeps track of the world state as well as a set of goals it is trying to achieve. Such an agent chooses an action that will (eventually) lead to the achievement of its goals (Russell and Norvig 2022, 72).
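Such look-ahead can be sketched as a search over the transition model until a goal state is found; the breadth-first sketch below, including all names and the toy route-finding example, is an assumption for illustration:

```python
from collections import deque

# Search forward with a transition model until the goal test succeeds,
# returning the first action sequence that reaches a goal state.
def plan_to_goal(start, actions, transition, is_goal):
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if is_goal(state):
            return plan
        for action in actions:
            nxt = transition(state, action)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None

# Toy route finding: move along a line of locations 0..5 to reach location 4.
plan = plan_to_goal(0, ["Left", "Right"],
                    lambda s, a: max(0, min(5, s + (1 if a == "Right" else -1))),
                    lambda s: s == 4)
print(plan)  # -> ['Right', 'Right', 'Right', 'Right']
```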
Utility-based agents
A model-based, utility-based agent uses a model of the world, along with a utility function that measures its preferences among states of the world. It chooses the action that leads to the best expected utility, where expected utility is computed by averaging over all possible states, weighted by the probability of the outcome (Russell and Norvig 2022, 73).
The utility function is essentially an internalization of the performance measure.
A utility-based agent has many advantages in terms of flexibility and learning, which are particularly helpful in environments characterized by partial observability and nondeterminism.
In addition, there are cases where the goals are insufficient but a utility-based agent can still make rational decisions based on the probabilities and the utilities of the outcomes:
- When there are conflicting goals, the utility function specifies the appropriate tradeoff.
- Likelihood of success (i.e., goal achievement) can be weighed against the importance of the goals
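Action selection by expected utility, as described above, can be sketched as follows; the outcome model and the utility values are invented for illustration:

```python
# Pick the action with the highest expected utility, where
# expected utility = sum over outcomes of P(outcome | action) * U(outcome).

def expected_utility(action, outcome_model, utility):
    return sum(p * utility(outcome) for outcome, p in outcome_model(action).items())

def choose_action(actions, outcome_model, utility):
    return max(actions, key=lambda a: expected_utility(a, outcome_model, utility))

# Toy example: a fast but risky route vs. a slow but safe one.
outcomes = {
    "highway":   {"arrive in 20 min": 0.8, "stuck in traffic": 0.2},
    "side road": {"arrive in 35 min": 1.0},
}
utilities = {"arrive in 20 min": 100, "stuck in traffic": -50, "arrive in 35 min": 60}

best = choose_action(["highway", "side road"], outcomes.get, utilities.get)
print(best)  # -> highway (0.8 * 100 + 0.2 * (-50) = 70 > 60)
```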
Model-based and utility-based agents are difficult to implement. They need to model and keep track of the task environment, which requires ingenious sensors, sophisticated algorithms, and considerable computational resources.
There are also utility-based agents that are not model-based. These agents simply learn which action is best in a particular situation, without any “understanding” of its impact on the environment (e.g., agents based on reinforcement learning).
Main differences
The main difference between simple reflex agents and model-based reflex agents is that the latter keep track of the state of the world. Model-based reflex agents maintain knowledge about how the world evolves independently of the agent and how the agent’s actions change the world (i.e., they have knowledge about “how the world works”). This knowledge is “stored” in the transition model of the world. A model-based reflex agent still decides which action to take based on condition-action rules (i.e., the codified reflexes).
The main difference between model-based reflex agents and goal-based agents is that the latter do not act on fixed condition-action rules, but on some sort of goal information that describes situations that are desirable (e.g., the destination in the case of route finding). Based on the goal and the knowledge of the world, the best possible action must be selected. Goal-based decision making involves consideration of the future based on the transition model (i.e., “what will happen if I do such-and-such?”) and of how it helps to achieve the goal. In reflex agent designs, this information is not explicitly represented, because the built-in rules map percepts directly to actions without considering or knowing the future state.
The main difference between goal-based agents and utility-based agents is that the performance measure of the latter is more general. It does not only consider a binary distinction between “goal achieved” and “goal not achieved” but allows comparing different world states according to their relative utility or expected utility, respectively (i.e., how happy the agent is with the resulting state). Utility-based agents can still make rational decisions when there are conflicting goals of which only one can be achieved (here the utility function needs to specify a trade-off) and when there are several goals that the agent can aim for, none of which can be achieved with certainty (here the utility function provides a way in which the likelihood of success can be weighed against the importance of the goals, e.g., speed and energy consumption in routing).
Example: a goal-based agent for routing just selects actions based on a single, binary goal: reaching the destination. A utility-based agent also considers additional goals such as spending as little time as possible on the road, spending as little money as possible, or having the best scenery on the trip, and tries to maximize overall utility across these goals. In this example, reaching the destination is the ultimate goal; without achieving it, utility would be zero. However, utility will increase or decrease depending on how the chosen actions affect the achievement of the other goals, whose importance needs to be weighed.
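A utility function for this routing example might combine the sub-goals as a weighted sum, returning zero utility if the destination is not reached; the attributes, weights, and reward value below are illustrative assumptions:

```python
# Illustrative weighted utility over route attributes; zero if the destination is missed.
WEIGHTS = {"time": -1.0, "cost": -2.0, "scenery": 5.0}

def route_utility(route, weights=WEIGHTS, destination_reward=100.0):
    if not route["reaches_destination"]:
        return 0.0  # the ultimate goal is missed, so the route has no utility
    return destination_reward + sum(weight * route[attr] for attr, weight in weights.items())

scenic = {"reaches_destination": True, "time": 50, "cost": 5, "scenery": 9}
fast   = {"reaches_destination": True, "time": 30, "cost": 8, "scenery": 3}
print(route_utility(scenic), route_utility(fast))  # -> 85.0 69.0
```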
Learning agents
Each type of agent can be either hand-programmed or created as a learning agent. The behavior of learning agents is (also) determined by their own experience, while the behavior of hand-programmed agents is solely determined by their initial programming. Thus, learning agents have greater autonomy.
A learning agent consists of four conceptual components (Russell and Norvig 2022, 74–75), as shown in Figure 6 (a structural code sketch follows the list):
- The performance element is responsible for taking percepts and selecting actions (i.e., what has previously been considered the entire agent program).
- The learning element is responsible for improvements. It uses feedback from the critic on how the agent is doing and determines how the performance element should be modified to do better in the future. It can make changes to any of the “knowledge components” shown in the agent diagrams (i.e., condition-action rules, transition model, sensor model).
- The performance standard is responsible for informing the agent about the meaning of its percepts, i.e., whether they are good or not (e.g., receiving no tip from passengers contributes negatively to an automated taxi’s overall performance). The standard is fixed and cannot be influenced by the agent.
- The problem generator is responsible for suggesting actions that lead to new and informative experiences. The problem generator suggests exploratory actions that may be suboptimal in the short term, but can lead to the discovery of better actions in the long term.
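The interplay of these components could be sketched as a skeleton class; this is a structural sketch only, and every name and the division into callables are assumptions:

```python
# Structural sketch of the four conceptual components of a learning agent.
class LearningAgent:
    def __init__(self, performance_element, learning_element, critic, problem_generator):
        self.performance_element = performance_element  # maps percepts to actions
        self.learning_element = learning_element        # improves the performance element
        self.critic = critic                            # feedback w.r.t. the fixed performance standard
        self.problem_generator = problem_generator      # suggests exploratory actions (or None)

    def step(self, percept):
        feedback = self.critic(percept)
        self.learning_element(self.performance_element, feedback)
        # Occasionally try an exploratory action instead of the currently best one.
        exploratory_action = self.problem_generator(percept)
        return exploratory_action or self.performance_element(percept)
```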
✏️ Exercises
Concepts
Define in your own words the following terms:
- Rationality
- Autonomy
- Agent
- Environment
- Sensor
- Actuator
- Percept
- Agent function
- Agent program
Agent types
Explain the differences between the following agent types in your own words. Describe the component(s) that is/are specific to each type.
- Reflex agent
- Model-based agent
- Goal-based agent
- Utility-based agent
- Learning agent
Vacuum cleaner
Under which circumstances does a robotic vacuum cleaner act rationally?
Describe the task environment of such an agent.
PEAS
For each of the following agents, specify the performance measure, the environment, the actuators, and the sensors.
- Microwave oven
- Chess program
- Autonomous supply delivery
Performance measure
Describe a task environment in which the performance measure is easy to specify completely and correctly, and one in which it is not.
Assertions
For each of the following assertions, say whether it is true or false and support your answer with examples or counterexamples where appropriate.
- An agent that senses only partial information about the state cannot be perfectly rational.
- There exist task environments in which no pure reflex agent can behave rationally.
- There exists a task environment in which every agent is rational.
- Every agent is rational in an unobservable environment.
- A perfectly rational poker-playing agent never loses.
Open only if you need help.
- False. Perfect rationality refers to the ability to make good decisions given the sensor information received.
- True. A pure reflex agent ignores previous percepts and cannot obtain an optimal state estimate in a partially observable environment.
- True. For example, in an environment with a single state, such that all actions have the same reward, it does not matter which action is taken.
- False. Some actions are stupid (and the agent may know this if it has a model) even if it has no environment input.
- False. Unless it draws the perfect hand, the agent can lose if an opponent has better cards.
Task environment
For each of the following activities, characterize the task environment in terms of the properties discussed in the lecture notes.
- Playing soccer
- Exploring the subsurface oceans of Titan
- Shopping for used AI books on the internet
- Playing a tennis match
Task environment #2
For each of the following task environment properties, rank the example task environments from most to least according to how well the environment satisfies the property.
Lay out any assumptions you make to reach your conclusions.
- Fully observable: driving; document classification; tutoring a student in calculus; skin cancer diagnosis from images
- Continuous: driving; spoken conversation; written conversation; climate engineering by stratospheric aerosol injection
- Stochastic: driving; sudoku; poker; soccer
- Static: chat room; checkers; tax planning; tennis