Agents
Agent
An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.
- The term percept refers to the content an agent’s sensors are perceiving
- An agent’s percept sequence is the complete history of everything the agent has ever perceived
- The agent function maps any given percept sequence to an action (an abstract mathematical description)
- The agent function for an AI agent will be implemented by an agent program (a concrete implementation, running within some physical system)
To illustrate these ideas, Russell and Norvig (2022, 55) use a simple example, the vacuum-cleaner world, which consists of a robotic vacuum-cleaning agent in a world of squares that can be either dirty or clean. The vacuum agent perceives which square it is in and whether there is dirt in the square. The agent starts in square A. The available actions are to move to the right, move to the left, suck up the dirt, or do nothing. One very simple agent function is the following: if the current square is dirty, then suck; otherwise, move to the other square.
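This agent function can be sketched in a few lines of Python. This is only a minimal sketch; the percept format `(location, status)` and the function name are illustrative choices, not taken from the book.

```python
# Minimal sketch of the vacuum-world agent function described above.
# A percept is assumed to be a (location, status) pair, e.g. ("A", "Dirty").

def reflex_vacuum_agent(percept):
    location, status = percept
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    else:  # location == "B"
        return "Left"

print(reflex_vacuum_agent(("A", "Dirty")))  # -> Suck
print(reflex_vacuum_agent(("A", "Clean")))  # -> Right
```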
According to Russell and Norvig (2022, 56), all areas of engineering can be seen as designing artifacts that interact with the world. AI operates at the most interesting end of the spectrum, where the artifacts have significant computational resources and the task environment requires nontrivial decision making.
Rational agent
A rational agent is one that does the right thing.
For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has (Russell and Norvig 2022, 58).
What is rational at any given time depends on four things:
- The performance measure that defines the criterion of success
- The agent’s prior knowledge of the environment
- The actions that the agent can perform
- The agent’s percept sequence to date
Under the following circumstances, the vacuum-cleaning agent is rational (a simulation sketch follows the list):
- The performance measure of the vacuum cleaner might award one point for each clean square at each time step, over a “lifetime” of 1,000 time steps (to prevent the cleaner from oscillating needlessly back and forth).
- The “geography” of the environment is known a priori but the dirt distribution and the initial location of the agent are not. Clean squares stay clean and sucking cleans the current square. The Right and Left actions move the agent one square except when this would take the agent outside the environment in which case the agent remains where it is.
- The only available actions are Right, Left, and Suck.
- The agent correctly perceives its location and whether that location contains dirt.
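A rough simulation of this setup is sketched below. It is a sketch only; the two-square world, the dirt handling, and the scoring loop are assumptions made for illustration.

```python
# Score the reflex vacuum agent in a two-square world, awarding one point
# per clean square at each time step over a lifetime of 1,000 steps.

def reflex_vacuum_agent(percept):
    # Same reflex agent function as in the sketch above.
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

def simulate(dirt, location="A", steps=1000):
    score = 0
    for _ in range(steps):
        status = "Dirty" if dirt[location] else "Clean"
        action = reflex_vacuum_agent((location, status))
        if action == "Suck":
            dirt[location] = False
        elif action == "Right" and location == "A":
            location = "B"
        elif action == "Left" and location == "B":
            location = "A"
        score += sum(not dirty for dirty in dirt.values())  # points for clean squares
    return score

print(simulate({"A": True, "B": True}))  # -> 1998 (maximum 2000 if both start clean)
```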
For details, such as tabulated agent functions, please see Russell and Norvig (2022).
It can be quite hard to formulate a performance measure correctly, however:
If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere once we have started it […] we had better be quite sure that the purpose put into the machine is the purpose which we really desire (Wiener 1960, 1358).
Rationality
Rationality is not the same as perfection.
- Rationality maximizes expected performance
- Perfection maximizes actual performance
- Perfection requires omniscience
- Rational choice depends only on the percept sequence to date
As the environment is usually neither completely known a priori nor completely predictable (or stable), information gathering and learning are important parts of rationality (Russell and Norvig 2022, 59).
The vacuum cleaner needs to explore an initially unknown environment (i.e., exploration) to maximize its expected performance. In addition, a vacuum cleaner that learns to predict where and when additional dirt will appear will do better than one that does not.
Environments
Components
Before designing an agent (the solution), the task environment (the problem) must be specified as fully as possible, including
- the performance measure (P),
- the environment (E),
- the actuators (A), and
- the sensors (S)
Russell and Norvig (2022) call this specification of the task environment the PEAS description.
Task environment of a taxi driver agent
- P: Safe, fast, legal, comfortable, maximize profits, minimize impact on other road users
- E: Roads, other road users, police, pedestrians, customers, weather
- A: Steering, accelerator, brake, signal, horn, display, speech
- S: Cameras, radar, speedometer, GPS, engine sensors, accelerometer, microphones, touchscreen
Source: Russell and Norvig (2022, 61)
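A PEAS description can also be written down as a small data record; the following sketch merely transcribes the taxi-driver table above (the `PEAS` dataclass and its field names are illustrative assumptions, not an established API):

```python
from dataclasses import dataclass

# Hypothetical record type for a PEAS task-environment description.
@dataclass
class PEAS:
    performance_measure: list[str]
    environment: list[str]
    actuators: list[str]
    sensors: list[str]

taxi_driver = PEAS(
    performance_measure=["safe", "fast", "legal", "comfortable",
                         "maximize profits", "minimize impact on other road users"],
    environment=["roads", "other road users", "police", "pedestrians",
                 "customers", "weather"],
    actuators=["steering", "accelerator", "brake", "signal", "horn",
               "display", "speech"],
    sensors=["cameras", "radar", "speedometer", "GPS", "engine sensors",
             "accelerometer", "microphones", "touchscreen"],
)
print(taxi_driver.sensors)
```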
Properties
Task environments can be categorized along the following dimensions (Russell and Norvig 2022, 62–64):
- Fully observable vs. partially observable
- Single agent vs. multi-agent
- Deterministic vs. nondeterministic
- Episodic vs. sequential
- Static vs. dynamic
- Discrete vs. continuous
- Known vs. unknown
Explanations
- If an agent’s sensors give it access to the full state of the environment at any point in time, then we say that the task environment is fully observable (e.g., image analysis).
- When multiple agents intend to maximize a performance measure that depends on the behavior of other agents, we say the environment is multi-agent (e.g., chess).
- When the next state of the environment is completely determined by the current state and the actions performed by the agent(s), it is called a deterministic environment (e.g., crossword puzzle). When a model of the environment explicitly uses probabilities, it is called a stochastic environment (e.g., poker).
- If an agent’s experience is divided into atomic episodes in which the agent receives a percept and then performs a single action, and if the next episode does not depend on the actions performed in previous episodes, then we say that the task environment is episodic (e.g., image analysis).
- If the environment changes while an agent is deliberating, then the environment is dynamic (e.g., taxi driving).
- If the environment has a finite number of different states, we speak of discrete environments (e.g., chess).
- If the outcomes (or outcome probabilities) for all actions are given, then the environment is known (e.g., solitaire card game).
Source: Russell and Norvig (2022, 62–64)
The hardest case is partially observable, multi-agent, nondeterministic, sequential, dynamic, and continuous.
Agent types
Simple reflex agents
Rectangles denote the current internal state of the agent’s decision process; rectangles with rounded corners represent the background information used in the process.
Simple reflex agents select actions on the basis of the current percept, ignoring the rest of the percept history. Thus, these agents work only if the correct decision can be made on the basis of just the current percept; the environment needs to be fully observable (Russell and Norvig 2022, 68).
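A generic simple reflex agent can be sketched as a loop over condition-action rules. The rule representation below is an assumption made for illustration, not the book’s pseudocode.

```python
# Sketch of a simple reflex agent: rules map the interpreted current percept
# directly to an action; no percept history is kept.

def simple_reflex_agent(rules, interpret_input):
    def agent(percept):
        state = interpret_input(percept)   # description of the current percept only
        for condition, action in rules:
            if condition(state):           # first matching condition-action rule wins
                return action
        return "NoOp"
    return agent

# The vacuum world expressed as condition-action rules.
vacuum_rules = [
    (lambda s: s["status"] == "Dirty", "Suck"),
    (lambda s: s["location"] == "A", "Right"),
    (lambda s: s["location"] == "B", "Left"),
]
vacuum_agent = simple_reflex_agent(
    vacuum_rules, lambda percept: {"location": percept[0], "status": percept[1]}
)
print(vacuum_agent(("B", "Dirty")))  # -> Suck
```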
Model-based reflex agents
Model-based reflex agents use a transition model and a sensor model to keep track of the state of the world as perceived through the sensors (i.e., an internal state) (Russell and Norvig 2022, 70).
The transition model reflects “how the world works,” i.e., how the world evolves (a) independently of the agent and (b) depending on the agent’s actions.
The sensor model reflects how the state of the world is reflected in the agent’s percepts (i.e., by its sensors).
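The following sketch shows how such an agent might maintain its internal state; the `update_state` function stands for the combination of transition model and sensor model (all names and the state format are assumptions):

```python
# Sketch of a model-based reflex agent: an internal belief state is updated from
# the previous state, the last action, and the new percept, using the transition
# model ("how the world evolves") and the sensor model ("how percepts arise").

def model_based_reflex_agent(update_state, rules, initial_belief):
    memory = {"belief": initial_belief, "last_action": None}

    def agent(percept):
        memory["belief"] = update_state(memory["belief"], memory["last_action"], percept)
        for condition, action in rules:
            if condition(memory["belief"]):
                memory["last_action"] = action
                return action
        memory["last_action"] = "NoOp"
        return "NoOp"

    return agent
```

Compared with the simple reflex agent sketched earlier, the condition-action rules here test the updated belief state rather than the raw percept, so the agent can react to aspects of the world it cannot currently sense.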
The representations of states can be placed along an axis of increasing complexity and expressive power (Russell and Norvig 2022, 76–77):
- An atomic representation is one in which each state is treated as a black box with no internal structure, meaning the state either does or does not match what you’re looking for. In a sliding-tile puzzle, for instance, you either have the correct alignment of tiles or you do not.
- A factored representation is one in which the states are defined by a set of features (e.g., Boolean, real-valued, or one of a fixed set of symbols). In a sliding-tile puzzle, this might be a feature such as the number of tiles out of place.
- A structured representation is one in which the states are expressed in the form of objects and relations between them (e.g., expressed by logic or probability). Such knowledge about relations is expressed as facts.
A more expressive representation is much more concise, but it makes reasoning and learning more complex.
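For the sliding-tile puzzle used as an example above, the three levels could be sketched roughly as follows (the concrete encodings are illustrative assumptions):

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 = blank square

# Atomic: a state is a black box; we can only test whether it matches the goal.
def is_goal(state):
    return state == GOAL

# Factored: a state is described by features, e.g. the number of misplaced tiles.
def misplaced_tiles(state):
    return sum(1 for tile, goal_tile in zip(state, GOAL)
               if tile != 0 and tile != goal_tile)

# Structured: objects (tiles, squares) and the relations between them, as facts.
def facts(state):
    return {("on", tile, square) for square, tile in enumerate(state) if tile != 0}

state = (1, 2, 3, 4, 5, 6, 7, 0, 8)
print(is_goal(state), misplaced_tiles(state))   # -> False 1
print(("on", 8, 8) in facts(state))             # -> True
```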
Goal-based agents
A model-based, goal-based agent keeps track of the world state as well as a set of goals it is trying to achieve. Such an agent chooses an action that will (eventually) lead to the achievement of its goals (Russell and Norvig 2022, 72).
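Such look-ahead can be sketched as a search over the transition model until a goal state is found; the breadth-first sketch below, including all names and the toy route-finding example, is an assumption for illustration:

```python
from collections import deque

# Search forward with a transition model until the goal test succeeds,
# returning the first action sequence that reaches a goal state.
def plan_to_goal(start, actions, transition, is_goal):
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if is_goal(state):
            return plan
        for action in actions:
            nxt = transition(state, action)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None

# Toy route finding: move along a line of locations 0..5 to reach location 4.
plan = plan_to_goal(0, ["Left", "Right"],
                    lambda s, a: max(0, min(5, s + (1 if a == "Right" else -1))),
                    lambda s: s == 4)
print(plan)  # -> ['Right', 'Right', 'Right', 'Right']
```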
Utility-based agents
A model-based, utility-based agent uses a model of the world, along with a utility function that measures its preferences among states of the world. It chooses the action that leads to the best expected utility, where expected utility is computed by averaging over all possible states, weighted by the probability of the outcome (Russell and Norvig 2022, 73).
The utility function is essentially an internalization of the performance measure.
A utility-based agent has many advantages in terms of flexibility and learning, which are particularly helpful in environments characterized by partial observability and nondeterminism.
In addition, there are cases where the goals are insufficient but a utility-based agent can still make rational decisions based on the probabilities and the utilities of the outcomes:
- When there are conflicting goals, the utility function specifies the appropriate tradeoff.
- Likelihood of success (i.e., goal achievement) can be weighed against the importance of the goals
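Action selection by expected utility, as described above, can be sketched as follows; the outcome model and the utility values are invented for illustration:

```python
# Pick the action with the highest expected utility, where
# expected utility = sum over outcomes of P(outcome | action) * U(outcome).

def expected_utility(action, outcome_model, utility):
    return sum(p * utility(outcome) for outcome, p in outcome_model(action).items())

def choose_action(actions, outcome_model, utility):
    return max(actions, key=lambda a: expected_utility(a, outcome_model, utility))

# Toy example: a fast but risky route vs. a slow but safe one.
outcomes = {
    "highway":   {"arrive in 20 min": 0.8, "stuck in traffic": 0.2},
    "side road": {"arrive in 35 min": 1.0},
}
utilities = {"arrive in 20 min": 100, "stuck in traffic": -50, "arrive in 35 min": 60}

best = choose_action(["highway", "side road"], outcomes.get, utilities.get)
print(best)  # -> highway (0.8 * 100 + 0.2 * (-50) = 70 > 60)
```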
Model-based and utility-based agents are difficult to implement. They need to model and keep track of the task environment, which requires ingenious sensors, sophisticated algorithms, and considerable computational resources.
There are also utility-based agents that are not model-based. These agents simply learn which action is best in a particular situation, without any “understanding” of its impact on the environment (e.g., agents based on reinforcement learning).
Main differences
The main difference between simple reflex agents and model-based reflex agents is that the latter keep track of the state of the world. Model-based reflex agents maintain knowledge about how the world evolves independently of the agent and how the agent’s actions change the world (i.e., they have knowledge about “how the world works”). This knowledge is “stored” in the transition model of the world. A model-based reflex agent still decides which action to take based on condition-action rules (i.e., the codified reflexes).
The main difference between model-based reflex agents and goal-based agents is that the latter do not act on fixed condition-action rules, but on some sort of goal information that describes situations that are desirable (e.g., the destination in the case of route finding). Based on the goal and the knowledge of the world, the best possible action must be selected. Goal-based decision making involves consideration of the future based on the transition model (i.e., “what will happen if I do such-and-such?”) and of how it helps to achieve the goal. In reflex agent designs, this information is not explicitly represented, because the built-in rules map percepts directly to actions without considering or knowing the future state.
The main difference between goal-based agents and utility-based agents is that the performance measure of the latter is more general. It does not only consider a binary distinction between “goal achieved” and “goal not achieved” but allows comparing different world states according to their relative utility or expected utility, respectively (i.e., how happy the agent is with the resulting state). Utility-based agents can still make rational decisions when there are conflicting goals of which only one can be achieved (here the utility function needs to specify a trade-off) and when there are several goals that the agent can aim for, none of which can be achieved with certainty (here the utility function provides a way in which the likelihood of success can be weighed against the importance of the goals, e.g., speed and energy consumption in routing).
Example: a goal-based agent for routing just selects actions based on a single, binary goal: reaching the destination. A utility-based agent also considers additional goals such as spending as little time as possible on the road, spending as little money as possible, or having the best scenery on the trip, and tries to maximize overall utility across these goals. In this example, reaching the destination is the ultimate goal; without achieving it, utility would be zero. However, utility will increase or decrease depending on how the chosen actions affect the achievement of the other goals, whose importance needs to be weighed.
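A utility function for this routing example might combine the sub-goals as a weighted sum, returning zero utility if the destination is not reached; the attributes, weights, and reward value below are illustrative assumptions:

```python
# Illustrative weighted utility over route attributes; zero if the destination is missed.
WEIGHTS = {"time": -1.0, "cost": -2.0, "scenery": 5.0}

def route_utility(route, weights=WEIGHTS, destination_reward=100.0):
    if not route["reaches_destination"]:
        return 0.0  # the ultimate goal is missed, so the route has no utility
    return destination_reward + sum(weight * route[attr] for attr, weight in weights.items())

scenic = {"reaches_destination": True, "time": 50, "cost": 5, "scenery": 9}
fast   = {"reaches_destination": True, "time": 30, "cost": 8, "scenery": 3}
print(route_utility(scenic), route_utility(fast))  # -> 85.0 69.0
```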
Learning agents
Each type of agent can be either hand-programmed or created as a learning agent. The behavior of learning agents is (also) determined by their own experience, while the behavior of hand-programmed agents is solely determined by their initial programming. Thus, learning agents have greater autonomy.
A learning agent consists of four conceptual components (Russell and Norvig 2022, 74–75), as shown in Figure 6 (a structural code sketch follows the list):
- The performance element is responsible for taking percepts and selecting actions (i.e., what has previously been considered the entire agent program).
- The learning element is responsible for improvements. It uses feedback from the critic on how the agent is doing and determines how the performance element should be modified to do better in the future. It can make changes to any of the “knowledge components” shown in the agent diagrams (i.e., condition-action rules, transition model, sensor model).
- The performance standard is responsible for informing the agent about the meaning of its percepts, i.e., whether they are good or not (e.g., receiving no tip from passengers contributes negatively to an automated taxi’s overall performance). The standard is fixed and cannot be influenced by the agent.
- The problem generator is responsible for suggesting actions that lead to new and informative experiences. The problem generator suggests exploratory actions that may be suboptimal in the short term, but can lead to the discovery of better actions in the long term.
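The interplay of these components could be sketched as a skeleton class; this is a structural sketch only, and every name and the division into callables are assumptions:

```python
# Structural sketch of the four conceptual components of a learning agent.
class LearningAgent:
    def __init__(self, performance_element, learning_element, critic, problem_generator):
        self.performance_element = performance_element  # maps percepts to actions
        self.learning_element = learning_element        # improves the performance element
        self.critic = critic                            # feedback w.r.t. the fixed performance standard
        self.problem_generator = problem_generator      # suggests exploratory actions (or None)

    def step(self, percept):
        feedback = self.critic(percept)
        self.learning_element(self.performance_element, feedback)
        # Occasionally try an exploratory action instead of the currently best one.
        exploratory_action = self.problem_generator(percept)
        return exploratory_action or self.performance_element(percept)
```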
✏️ Exercises
Concepts
Define in your own words the following terms:
- Rationality
- Autonomy
- Agent
- Environment
- Sensor
- Actuator
- Percept
- Agent function
- Agent program
Agent types
Explain the differences between the following agent types in your own words. Describe the component(s) that is/are specific to each type.
- Reflex agent
- Model-based agent
- Goal-based agent
- Utility-based agent
- Learning agent
Vacuum cleaner
Under which circumstances does a robotic vacuum cleaner act rationally?
Describe the task environment of such an agent.
PEAS
For each of the following agents, specify the performance measure, the environment, the actuators, and the sensors.
- Microwave oven
- Chess program
- Autonomous supply delivery
Performance measure
Describe a task environment in which the performance measure is easy to specify completely and correctly, and one in which it is not.
Assertions
For each of the following assertions, say whether it is true or false and support your answer with examples or counterexamples where appropriate.
- An agent that senses only partial information about the state cannot be perfectly rational.
- There exist task environments in which no pure reflex agent can behave rationally.
- There exists a task environment in which every agent is rational.
- Every agent is rational in an unobservable environment.
- A perfectly rational poker-playing agent never loses.
Open only if you need help.
- False. Perfect rationality refers to the ability to make good decisions given the sensor information received.
- True. A pure reflex agent ignores previous percepts and cannot obtain an optimal state estimate in a partially observable environment.
- True. For example, in an environment with a single state, such that all actions have the same reward, it does not matter which action is taken.
- False. Some actions are stupid (and the agent may know this if it has a model) even if it has no environment input.
- False. Unless it draws the perfect hand, the agent can lose if an opponent has better cards.
Task environment
For each of the following activities, characterize the task environment in terms of the properties discussed in the lecture notes.
- Playing soccer
- Exploring the subsurface oceans of Titan
- Shopping for used AI books on the internet
- Playing a tennis match
Task environment #2
For each of the following task environment properties, rank the example task environments from most to least according to how well the environment satisfies the property.
Lay out any assumptions you make to reach your conclusions.
- Fully observable: driving; document classification; tutoring a student in calculus; skin cancer diagnosis from images
- Continuous: driving; spoken conversation; written conversation; climate engineering by stratospheric aerosol injection
- Stochastic: driving; sudoku; poker; soccer
- Static: chat room; checkers; tax planning; tennis