Agents
Rational agents
A rational agent is anything that is
- perceiving its environment through sensors,
- thinking and deciding on the next actions (mapping given percepts to actions),
- and acting through actuators.
Rational means that the agent acts in a way that is expected to maximize its performance measure, given its
- built-in knowledge (i.e., the agent function1),
- perceived experience (i.e., the percept sequence2),
- and acting capabilities
A robotic vacuum cleaner moves around a grid of squares, some of which are dirty and some of which are clean. The vacuum cleaner perceives where it is and whether there is dirt there. Its actions are move to the right or left, suck up the dirt, or do nothing. The agent function prescribes that if the current square is dirty, it should suck up the dirt; otherwise, it should move to the other square.
Under the following circumstances, the vacuum cleaning agent is rational (Russel and Norvig 2022); a minimal code sketch of its agent function follows the list:
- The performance measure of the vacuum cleaner awards one point for each clean square at each time step
- The only available actions are right, left, and suck.
- The “geography” of the environment is known a priori but the dirt distribution and the initial location of the agent are not. Clean squares stay clean and sucking cleans the current square.
- The right and left actions move the agent one square except when this would take the agent outside the environment. Then it remains where it is.
- The robot correctly perceives its location and whether the square is dirty.
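The agent function of this two-square vacuum world can be sketched in a few lines. The square names "A" and "B" and the action names are illustrative assumptions, not taken from the source:

```python
# Minimal sketch of the two-square vacuum world (square names "A"/"B" and
# action names are illustrative assumptions). The agent function maps the
# current percept (location, status) to an action.

def reflex_vacuum_agent(percept):
    location, status = percept
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    else:
        return "Left"

# Example percepts and the resulting actions
for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty")]:
    print(percept, "->", reflex_vacuum_agent(percept))
```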
Performance measure
If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere once we have started it […] we had better be quite sure that the purpose put into the machine is the purpose which we really desire. Wiener (1960, 1358)
It is difficult to formulate a performance measure completely and correctly, which is a reason to design it with care.
Rationality vs. perfection
Rationality is not the same as perfection.
- Rationality maximizes expected performance.
- Perfection maximizes actual performance.
- Perfection requires omniscience.
- Rational choice depends only on the percept sequence to date.
As the environment is usually neither completely known a priori nor completely predictable (or stable), information gathering and learning are important parts of rationality (Russel and Norvig 2022, 59).
The vacuum cleaner needs to explore an initially unknown environment (i.e., exploration) to maximize its expected performance. In addition, a vacuum cleaner that learns to predict where and when additional dirt will appear will do better than one that does not.
Environments
Components
Before designing an agent (i.e., the solution), the task environment (i.e., the problem) must be specified as fully as possible, including
- the performance measure,
- the environment,
- the actuators, and
- the sensors
Russel and Norvig (2022) use the short form PEAS to describe these parts of the task environment.
Task environment of a taxi driver agent:
- P: Safe, fast, legal, comfortable, maximize profits, minimize impact on other road users
- E: Roads, other road users, police, pedestrians, customers, weather
- A: Steering, accelerator, brake, signal horn, display, speech
- S: Cameras, radar, speedometer, GPS, engine sensors, accelerometer, microphones, touchscreen
Source: Russel and Norvig (2022, 61)
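A PEAS specification can be made explicit as a simple record type. The class and field names below are a hypothetical choice for illustration; the values just restate the taxi example:

```python
from dataclasses import dataclass

@dataclass
class PEAS:
    """Hypothetical container for a task-environment specification."""
    performance: list[str]
    environment: list[str]
    actuators: list[str]
    sensors: list[str]

taxi = PEAS(
    performance=["safe", "fast", "legal", "comfortable", "maximize profits"],
    environment=["roads", "other road users", "pedestrians", "customers", "weather"],
    actuators=["steering", "accelerator", "brake", "signal horn", "display", "speech"],
    sensors=["cameras", "radar", "speedometer", "GPS", "accelerometer", "microphones"],
)
print(taxi.sensors)
```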
Properties
Task environments can be categorized along the following dimensions:
- Fully observable ↔︎ partially observable
- Single agent ↔︎ multi-agent
- Deterministic ↔︎ nondeterministic
- Episodic ↔︎ sequential
- Static ↔︎ dynamic
- Discrete ↔︎ continuous
- Known ↔︎ unknown
- Fully observable ↔︎ partially observable
- In a fully observable task environment, the agent has access to the complete state of the environment at all times. There is no hidden information, and the agent can make decisions based on full knowledge of the current state (e.g., chess). In a partially observable task environment, the agent does not have access to the complete state of the environment. Some information is hidden or uncertain, and the agent must make decisions based on incomplete or noisy data (e.g., poker).
- Single agent ↔︎ multi-agent
- In a single-agent task environment, there is only one agent interacting with the environment. The agent’s actions are solely based on its own goals and perceptions (e.g., crossword puzzles). In a multi-agent task environment, multiple agents interact with each other and the environment. The agents may cooperate, compete, or have mixed interactions.
- Deterministic ↔︎ nondeterministic
- When the next state of the environment is completely determined by the current state and the actions performed by the agent(s), the environment is called deterministic (e.g., crossword puzzle); otherwise it is nondeterministic. When a model of the environment explicitly uses probabilities, it is called stochastic (e.g., poker).
- Episodic ↔︎ sequential
- In an episodic task environment, each task or episode is independent of the others. The agent’s actions in one episode do not affect future episodes (e.g., spam email filtering). In a sequential task environment, the current task is dependent on previous tasks. The agent’s actions have long-term consequences and affect future states (e.g., chess game).
- Static ↔︎ dynamic
- In a static task environment, the environment does not change while the agent is deliberating. The agent can take its time to make decisions without worrying about the environment changing (e.g., chess game). In a dynamic task environment, the environment can change while the agent is deliberating. The agent must account for changes and adapt its actions accordingly (e.g., stock-trading).
- Discrete ↔︎ continuous
- In a discrete task environment, the state space, actions, and time are all distinct and separate. The environment can be broken down into a finite number of states and actions (e.g., chess). In a continuous task environment, the state space, actions, and time are continuous. The environment cannot be broken down into a finite number of states and actions (e.g., driving).
- Known ↔︎ unknown
- In a known task environment, the agent has complete information about the environment and the outcomes of its actions. The rules, states, and effects of actions are fully understood (e.g., solitaire card game). In an unknown task environment, the agent lacks complete information about the environment or the outcomes of its actions. The agent must learn or infer the rules and effects through interaction.
The hardest case is partially observable, multi-agent, nondeterministic, sequential, dynamic, and continuous (Russel and Norvig 2022, 62–64).
Agent types
Simple reflex agents
Simple reflex agents select actions on the basis of the current percept, ignoring the rest of the percept history. Thus, these agents work only if the correct decision can be made on the basis of just the current percept. The environment needs to be fully observable (Russel and Norvig 2022, 68).
A thermostat makes decisions based solely on the current temperature reading, without considering past temperatures or future predictions.
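A simple reflex agent amounts to a handful of condition-action rules over the current percept only. The sketch below assumes an illustrative target temperature and hysteresis band; none of the values come from the source:

```python
def thermostat_agent(current_temp, target=21.0, hysteresis=0.5):
    """Simple reflex agent: decides only from the current temperature reading."""
    if current_temp < target - hysteresis:
        return "heat_on"
    elif current_temp > target + hysteresis:
        return "heat_off"
    return "no_op"

print(thermostat_agent(19.0))  # -> heat_on
print(thermostat_agent(21.2))  # -> no_op
```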
Model-based reflex agents
Model-based reflex agents maintain an internal model of the world, which helps them keep track of the current state and make decisions based on this model. This allows them to handle partially observable environments more effectively (Russel and Norvig 2022, 70).
- The transition model reflects how the world evolves (a) independently of the agent and (b) depending on the agent’s actions.
- The sensor model reflects how the state of the world is reflected in the agent’s percepts (i.e., by its sensors).
A self-driving car uses its transition model to predict the state of the environment reflected in the sensor model and make decisions accordingly.
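A sketch of this agent type, loosely following the update-state/rule-matching structure described by Russel and Norvig (2022); the class and parameter names are illustrative assumptions:

```python
class ModelBasedReflexAgent:
    """Keeps an internal state, updated from the last action and the current
    percept via the transition model and the sensor model (illustrative names)."""

    def __init__(self, transition_model, sensor_model, rules, initial_state):
        self.state = initial_state
        self.transition_model = transition_model  # how the world evolves
        self.sensor_model = sensor_model          # how percepts reflect the world
        self.rules = rules                        # list of (condition, action) pairs
        self.last_action = None

    def __call__(self, percept):
        # Predict how the world has evolved, then reconcile with the percept.
        predicted = self.transition_model(self.state, self.last_action)
        self.state = self.sensor_model(predicted, percept)
        # Decide via condition-action rules, as in a simple reflex agent.
        self.last_action = next(
            (action for condition, action in self.rules if condition(self.state)),
            "no_op",
        )
        return self.last_action
```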
The representations of states can be placed along an axis of increasing complexity and expressive power (Russel and Norvig 2022, 76–77):
- An atomic representation is one in which each state is treated as a black box with no internal structure, meaning the state either does or does not match what you’re looking for. In a sliding tile puzzle, for instance, you either have the correct alignment of tiles or you do not.
- A factored representation is one in which the states are defined by a set of features (e.g., Boolean, real-valued, or one of a fixed set of symbols). In a sliding puzzle, this might be a simple heuristic like “number of tiles out of place”.
- A structured representation is one in which the states are expressed in the form of objects and relations between them (e.g., expressed by logic or probability). Such knowledge about relations is called facts.
The more expressive language is much more concise, but makes reasoning and learning more complex.
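The three levels can be illustrated with a tiny sliding-tile state; the concrete encodings below are illustrative assumptions:

```python
# Atomic: the state is an opaque label; we can only compare whole states.
atomic_state = "state_17"
is_goal = atomic_state == "goal_state"

# Factored: the state is a set of features (here, tile positions), which
# enables heuristics such as "number of tiles out of place".
factored_state = {"tile_1": (0, 0), "tile_2": (1, 0), "blank": (0, 1)}
goal = {"tile_1": (0, 0), "tile_2": (0, 1), "blank": (1, 0)}
misplaced = sum(factored_state[k] != goal[k] for k in goal)

# Structured: objects and relations between them, stored as facts.
facts = {("left_of", "tile_1", "blank"), ("adjacent", "tile_1", "tile_2")}

print(is_goal, misplaced, ("left_of", "tile_1", "blank") in facts)
```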
Goal-based agents
Goal-based agents make decisions based on a set of goals they aim to achieve. They consider the current state, possible actions, and the outcomes of these actions to determine the best path to reach their goals (Russel and Norvig 2022, 72).
Goal-based agents often use search and planning algorithms to find the optimal sequence of actions to achieve their goals, ideally considering both immediate and long-term consequences.
A delivery robot is designed to navigate from a starting point to a destination within an environment, often avoiding obstacles and optimizing its route.
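For the delivery robot, goal-based behavior can be sketched as a search for a path to the destination. Breadth-first search on a toy grid is used here purely as an example of such a planning step; the grid, the blocked cell, and the function names are assumptions:

```python
from collections import deque

def bfs_plan(start, goal, neighbors):
    """Breadth-first search: returns a list of states from start to goal."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None  # no path to the goal exists

# Illustrative 3x3 grid with one blocked cell; moves are up/down/left/right.
blocked = {(1, 1)}

def grid_neighbors(cell):
    x, y = cell
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        nxt = (x + dx, y + dy)
        if 0 <= nxt[0] < 3 and 0 <= nxt[1] < 3 and nxt not in blocked:
            yield nxt

print(bfs_plan((0, 0), (2, 2), grid_neighbors))
```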
Utility-based agents
Utility-based agents make decisions by evaluating the utility (or value) of different possible actions and choosing the one that maximizes their overall utility. These agents consider not only the goals but also the trade-offs and preferences to achieve the best possible outcome (Russel and Norvig 2022, 73).
The utility function assigns a numeric utility to states; the expected utility of an action is the utility of each possible outcome weighted by the probability of that outcome. The agent evaluates the expected utility of the available actions and selects the one that maximizes it.
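A minimal sketch of this calculation; the two candidate routes and their probabilities and utilities are made-up numbers for illustration:

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

# Illustrative example: two candidate routes with uncertain travel times.
actions = {
    "highway": [(0.8, 10.0), (0.2, -5.0)],   # usually fast, sometimes jammed
    "back_road": [(1.0, 6.0)],               # reliably mediocre
}
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))  # -> highway 7.0
```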
A utility-based agent has many advantages in terms of flexibility and learning, which are particularly helpful in environments characterized by partial observability and nondeterminism. In addition, there are cases where the goals are insufficient but a utility-based agent can still make rational decisions based on the probabilities and the utilities of the outcomes:
- When there are conflicting goals, the utility function specifies the appropriate tradeoff.
- Likelihood of success (i.e., goal achievement) can be weighed against the importance of the goals
Model- and utility-based agents are difficult to implement. They need to model and keep track of the task environment, which requires ingenious sensors and sophisticated algorithms and comes with high computational complexity. There are also utility-based agents that are not model-based. These agents just learn what action is best in a particular situation without any “understanding” of their impact on the environment (e.g., based on reinforcement learning).
A smart thermostat continuously evaluates the utility of different temperature settings and adjusts accordingly to maximize overall utility, balancing comfort and energy savings.
Main differences
The main difference between simple reflex agents and model-based reflex agents is that the latter keep track of the state of the world. Model-based reflex agents generate knowledge about how the world evolves independently of the agent and how actions of the agent change the world (i.e., they have knowledge about “how the world works”). This knowledge is “stored” in the transition model of the world. A model-based reflex agent still uses condition-action rules (i.e., codified reflexes) to decide which action to take.
The main difference between model-based reflex agents and goal-based agents is that the latter do not act on fixed condition-action rules, but on some sort of goal information that describes situations that are desirable (e.g., in the case of route-finding, the destination). Based on the goal, the best possible action (given the knowledge of the world) needs to be selected. Goal-based decision making involves consideration of the future based on the transition model (i.e., “what will happen if I do such-and-such?”) and how it helps to achieve the goal. In reflex agent designs, this information is not explicitly represented, because the built-in rules map directly from percepts to actions, without considering or knowing the future state.
The main difference between goal-based agents and utility-based agents is that the performance measure is more general. It does not only consider a binary distinction between “goal achieved” and “goal not achieved” but allows comparing different world states according to their relative utility or expected utility, respectively (i.e., how happy the agent is with the resulting state). Utility-based agents can still make rational decisions when there are conflicting goals of which only one can be achieved (here the utility function needs to specify a trade-off) and when there are several goals that the agent can aim for, none of which can be achieved with certainty (here the utility function provides a way in which the likelihood of success can be weighed against the importance of the goals, e.g., speed and energy consumption in routing).
Example: A goal-based agent for routing just selects actions based on a single, binary goal: reaching the destination. A utility-based agent also considers additional goals like spending as little time as possible on the road, spending as little money as possible, having the best scenery on the trip, etc., and tries to maximize overall utility across these goals. In this example, reaching the destination is the ultimate goal; without achieving it, utility would be zero. However, utility will increase or decrease depending on how the chosen actions affect the achievement of the other goals, whose importance needs to be weighed.
Learning agents
Learning agents are AI systems designed to improve their performance over time by learning from their environment and experiences. Unlike traditional AI systems that operate with fixed programming, learning agents adapt, evolve, and refine their actions based on feedback and data. Thus, learning agents have greater autonomy.
A learning agent consists of four conceptual components (Russel and Norvig 2022, 74–75), as shown in Figure 6:
- Learning element: Acquires knowledge and improves performance by analyzing data, interactions, and feedback. Uses techniques such as supervised, unsupervised, and reinforcement learning.
- Performance element: Executes tasks based on the knowledge acquired by the learning element.
- Performance standard or critic: Evaluates the actions taken by the performance element and provides feedback.
- Problem generator: Identifies opportunities for further learning and exploration. Exploratory actions may be suboptimal in the short term, but can lead to the discovery of better actions in the long term.
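The interaction of the four components can be sketched as follows; the class, method, and parameter names are illustrative assumptions, not a prescribed interface:

```python
class LearningAgent:
    """Skeleton of the four conceptual components (illustrative names)."""

    def __init__(self, performance_element, learning_element, critic, problem_generator):
        self.performance_element = performance_element  # selects actions
        self.learning_element = learning_element        # improves the performance element
        self.critic = critic                            # feedback against the performance standard
        self.problem_generator = problem_generator      # suggests exploratory actions

    def step(self, percept):
        feedback = self.critic(percept)
        self.learning_element(self.performance_element, feedback)
        exploratory_action = self.problem_generator(percept)
        return exploratory_action or self.performance_element(percept)
```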
Rational agents
Utility-based learning agents are rational agents as they act so as to achieve the best outcome or, when there is uncertainty, the best expected outcome. This means that for each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has, which evolves over time (Russel and Norvig 2022, 58).
Evolution of agents
Agentic AI
Definition
Agentic AI is an emerging paradigm in AI that refers to autonomous systems designed to pursue complex goals with minimal human intervention. Acharya, Kuppan, and Divya (2025, 18912)
Core characteristics of Agentic AI are
- Autonomy and goal complexity, as agentic AI systems
- can handle multiple complex goals simultaneously,
- can operate independently over extended periods,
- can shift between tasks to achieve higher-order objectives, and
- make decisions with minimal human supervision
- Environmental and situational adaptability, as agentic AI systems
- operate effectively in dynamic and unpredictable environments
- adapt to changing conditions in real-time
- make decisions with incomplete information
- handle uncertainty effectively
- Independent decision-making, as agentic AI systems
- can learn from experience and improve over time
- use reinforcement learning and meta-learning
- demonstrate flexibility in strategy selection
- reconceptualize approaches based on new information
Agentic AI systems need to have the ability to
- gather information from the environment,
- maintain the execution context over long periods,
- develop strategies to achieve goals (i.e., independent decision-making),
- communicate plans and goals at appropriate abstraction levels,
- perform operations that can influence the environment’s state,
- learn and adapt to their environment, and
- coordinate with other agents or humans in response to current situations (Anthropic 2024).
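These abilities are often organized as a loop that observes, plans, acts, and learns. The following is a minimal, self-contained sketch under that assumption; all names and the toy “environment” are illustrative and do not describe any particular framework:

```python
from dataclasses import dataclass, field

@dataclass
class ToyAgent:
    goal: str
    memory: list = field(default_factory=list)     # long-lived execution context

    def plan(self, observation):
        # Placeholder strategy: stop once the goal string has been observed.
        return "stop" if self.goal in observation else "search"

    def learn(self, observation, action, result):
        self.memory.append((observation, action, result))

def agentic_loop(agent, observations):
    for observation in observations:                # gather information
        action = agent.plan(observation)            # develop a strategy
        if action == "stop":
            break
        result = f"executed {action}"               # influence the environment (stubbed)
        agent.learn(observation, action, result)    # adapt over time
    return agent.memory

print(agentic_loop(ToyAgent(goal="report found"), ["searching", "report found"]))
```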
Agentic AI vs. traditional AI
Acharya, Kuppan, and Divya (2025) identify three key technical foundations for Agentic AI:
- Reinforcement learning enables systems to learn through trial and error, continuously refining strategies based on feedback.
- Goal-oriented architectures provide frameworks for managing complex objectives, breaking larger goals into manageable sub-goals.
- Adaptive control mechanisms allow agents to adjust to changing environments, recalibrating parameters in response to external variations.
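As an illustration of the reinforcement-learning foundation, the tabular Q-learning update rule is sketched below with a made-up transition; the states, actions, and numbers are assumptions, not taken from the source:

```python
from collections import defaultdict

Q = defaultdict(float)          # Q[(state, action)] -> estimated value
alpha, gamma = 0.1, 0.9         # learning rate and discount factor
actions = ["stay", "move"]

def q_update(state, action, reward, next_state):
    """One step of the standard tabular Q-learning update rule."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Illustrative transition: moving from state "A" to "B" yields reward 1.
q_update("A", "move", 1.0, "B")
print(Q[("A", "move")])  # -> 0.1
```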
Comparison with traditional AI
| Feature | Traditional AI | Agentic AI |
|---|---|---|
| Primary purpose | Task-specific automation | Goal-oriented autonomy |
| Human intervention | High (predefined parameters) | Low (autonomous adaptability) |
| Adaptability | Limited | High |
| Environment interaction | Static or limited context | Dynamic and context-aware |
| Learning type | Primarily supervised | Reinforcement and self-supervised |
| Decision-making | Data-driven, static rules | Autonomous, contextual reasoning |
Comparison of agent types
| Feature | Classical agents | Learning agents | Agentic AI |
|---|---|---|---|
| Primary purpose | Fixed-task automation | Reward-driven optimization | Goal-oriented autonomy |
| Adaptability | Low | Moderate | High |
| Learning type | Supervised | Reinforcement learning | Hybrid, including RAG and memory |
| Applications | Static systems | Dynamic environments | Complex, multi-objective tasks |
Exercises
Concepts
Define in your own words the following terms:
- Rationality
- Autonomy
- Agent
- Environment
- Sensor
- Actuator
- Percept
- Agent function
- Agent program
Agent types
Explain the differences between the following agent types in your own words. Describe the component(s) that is/are specific to each type.
- Reflex agent
- Model-based agent
- Goal-based agent
- Utility-based agent
- Learning agent
Vacuum cleaner
Under which circumstances does a robotic vacuum cleaner act rationally?
Describe the task environment of such an agent.
PEAS
For each of the following agents, specify the performance measure, the environment, the actuators, and the sensors.
- Microwave oven
- Chess program
- Autonomous supply delivery
Performance measure
Describe a task environment in which the performance measure is easy to specify completely and correctly, and one in which it is not.
Assertions
For each of the following assertions, say whether it is true or false and support your answer with examples or counterexamples where appropriate.
- An agent that senses only partial information about the state cannot be perfectly rational.
- There exist task environments in which no pure reflex agent can behave rationally.
- There exists a task environment in which every agent is rational.
- Every agent is rational in an unobservable environment.
- A perfectly rational poker-playing agent never loses.
- False. Perfect rationality refers to the ability to make good decisions given the sensor information received.
- True. A pure reflex agent ignores previous percepts and cannot obtain an optimal state estimate in a partially observable environment.
- True. For example, in an environment with a single state, such that all actions have the same reward, it does not matter which action is taken.
- False. Some actions are stupid (and the agent may know this if it has a model) even if it has no environment input.
- False. Unless it draws the perfect hand, the agent can lose if an opponent has better cards.
Task environment
For each of the following activities, characterize the task environment in terms of the properties discussed in the lecture notes.
- Playing soccer
- Exploring the subsurface oceans of Titan
- Shopping for used AI books on the internet
- Playing a tennis match
Task environment #2
For each of the following task environment properties, rank the example task environments from most to least according to how well the environment satisfies the property.
Lay out any assumptions you make to reach your conclusions.
- Fully observable: driving; document classification; tutoring a student in calculus; skin cancer diagnosis from images
- Continuous: driving; spoken conversation; written conversation; climate engineering by stratospheric aerosol injection
- Stochastic: driving; sudoku; poker; soccer
- Static: chat room; checkers; tax planning; tennis
Literature
Footnotes
The agent function maps any given percept sequence to an action (an abstract mathematical description).↩︎
The term percept refers to the content an agent’s sensors are perceiving. The percept sequence is the complete history of everything an agent has ever perceived.↩︎
Rectangles are used to denote the current internal state of the agent’s decision process, rectangles with rounded corners to represent the background information used in the process.↩︎