Environments & Agents

Introduction to AI (I2AI)

Andy Weeger

Neu-Ulm University of Applied Sciences

March 12, 2025

Agents

Rational agents

A rational agent is anything that

perceives its environment through sensors,
thinks and decides on the next action
(mapping given percepts to actions),
and acts through actuators

Rational means that the agent acts in a way that is expected to maximize its performance measure, given its

built-in knowledge (i.e., the agent function¹),
perceived experience (i.e., the percept sequence²),
and its acting capabilities.

Example

A robotic vacuum cleaner moves around a grid of squares, some of which are dirty and some of which are clean. The vacuum cleaner perceives where it is and whether there is dirt there. Its actions are move right, move left, suck up the dirt, or do nothing. The agent function prescribes that if the current square is dirty, it should suck up the dirt; otherwise, it should move to the other square.

Under the following circumstances, the vacuum cleaning agent is rational (Russell & Norvig, 2022):

The performance measure of the vacuum cleaner awards one point for each clean square at each time step.
The only available actions are right, left, and suck.
The “geography” of the environment is known a priori but the dirt distribution and the initial location of the agent are not. Clean squares stay clean and sucking cleans the current square.
The right and left actions move the agent one square except when this would take the agent outside the environment. Then it remains where it is.
The robot correctly perceives its location and whether the square is dirty.
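The example can be simulated in a few lines. The following is a minimal Python sketch, assuming a two-square world with squares named "A" (left) and "B" (right); the names `reflex_vacuum_agent`, `move`, and `run` are hypothetical. The score is counted after each action.

```python
SQUARES = ("A", "B")  # assumed two-square world, "A" is the left square

def reflex_vacuum_agent(location: str, dirty: bool) -> str:
    """Agent function from the example: suck if dirty, otherwise move to the other square."""
    if dirty:
        return "Suck"
    return "Right" if location == "A" else "Left"

def move(location: str, action: str) -> str:
    """Right/Left move one square; at the edge of the grid the agent stays where it is."""
    i = SQUARES.index(location)
    if action == "Right":
        i = min(i + 1, len(SQUARES) - 1)
    elif action == "Left":
        i = max(i - 1, 0)
    return SQUARES[i]

def run(dirt: dict, location: str, steps: int) -> int:
    """Simulate the agent and return its score: one point per clean square per time step."""
    dirt = dict(dirt)  # copy, so the caller's initial configuration stays unchanged
    score = 0
    for _ in range(steps):
        action = reflex_vacuum_agent(location, dirt[location])
        if action == "Suck":
            dirt[location] = False  # sucking cleans the current square; clean squares stay clean
        else:
            location = move(location, action)
        score += sum(1 for is_dirty in dirt.values() if not is_dirty)
    return score

print(run({"A": True, "B": True}, "A", steps=4))    # → 6
print(run({"A": False, "B": False}, "A", steps=4))  # → 8
```

Starting with both squares dirty, no agent can score more than 6 in four steps: the first step yields at most one clean square, a move is needed before the second square can be cleaned, and only then are both squares clean. In that case the reflex agent is therefore optimal.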

Exercise

Under which circumstances does a vacuum cleaning agent act rationally?

Performance measure

If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere once we have started it […] we had better be quite sure that the purpose put into the machine is the purpose which we really desire.
Wiener (1960, p. 1358)

It is difficult to formulate a performance measure correctly.
This is a reason to be careful: an agent optimizes the measure it is given, not the purpose its designers had in mind.

Rationality vs. perfection

Rationality is not the same as perfection.

Rationality maximizes expected performance.
Perfection maximizes actual performance.
Perfection requires omniscience.
Rational choice depends only on the percept sequence to date.

Environments

Components

Before designing an agent (i.e., the solution), the task environment (i.e., the problem) must be specified as fully as possible, including

the performance measure,
the environment,
the actuators, and
the sensors

Russell & Norvig (2022) use the acronym PEAS (Performance measure, Environment, Actuators, Sensors) to describe these parts of the task environment.
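A PEAS description can be written down as a simple record. The following Python sketch fills it in for the vacuum cleaner example; the class name `PEAS` and the wording of the entries are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PEAS:
    """Specification of a task environment (hypothetical record type)."""
    performance: tuple[str, ...]
    environment: tuple[str, ...]
    actuators: tuple[str, ...]
    sensors: tuple[str, ...]

vacuum = PEAS(
    performance=("one point per clean square per time step",),
    environment=("grid of squares with known geography", "unknown dirt distribution and start location"),
    actuators=("wheels (Left, Right)", "suction unit (Suck)"),
    sensors=("location sensor", "dirt sensor"),
)
print(vacuum.sensors)
```

Writing the specification down explicitly before designing the agent makes gaps visible, for example a performance measure that does not penalize unnecessary movement.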

Properties

Task environments can be categorized along the following dimensions:

Fully observable ↔︎ partially observable
Single agent ↔︎ multi-agent
Deterministic ↔︎ nondeterministic
Episodic ↔︎ sequential
Static ↔︎ dynamic
Discrete ↔︎ continuous
Known ↔︎ unknown

Fully observable ↔︎ partially observable: In a fully observable task environment, the agent has access to the complete state of the environment at all times. There is no hidden information, and the agent can make decisions based on full knowledge of the current state (e.g., chess). In a partially observable task environment, the agent does not have access to the complete state of the environment. Some information is hidden or uncertain, and the agent must make decisions based on incomplete or noisy data (e.g., poker).
Single agent ↔︎ multi-agent: In a single-agent task environment, there is only one agent interacting with the environment. The agent’s actions are based solely on its own goals and perceptions (e.g., crossword puzzles). In a multi-agent task environment, multiple agents interact with each other and the environment. The agents may cooperate, compete, or have mixed interactions (e.g., chess, where two players compete).
Deterministic ↔︎ nondeterministic: When the next state of the environment is completely determined by the current state and the actions performed by the agent(s), the environment is called deterministic (e.g., crossword puzzle). Otherwise, it is nondeterministic. When a model of the environment explicitly uses probabilities, the environment is called stochastic (e.g., poker).
Episodic ↔︎ sequential: In an episodic task environment, each task or episode is independent of the others. The agent’s actions in one episode do not affect future episodes (e.g., spam email filtering). In a sequential task environment, the current task is dependent on previous tasks. The agent’s actions have long-term consequences and affect future states (e.g., chess game).
Static ↔︎ dynamic: In a static task environment, the environment does not change while the agent is deliberating. The agent can take its time to make decisions without worrying about the environment changing (e.g., chess game). In a dynamic task environment, the environment can change while the agent is deliberating. The agent must account for changes and adapt its actions accordingly (e.g., stock-trading).
Discrete ↔︎ continuous: In a discrete task environment, the state space, actions, and time are all distinct and separate. The environment can be broken down into a finite number of states and actions (e.g., chess). In a continuous task environment, the state space, actions, and time are continuous. The environment cannot be broken down into a finite number of states and actions (e.g., driving).
Known ↔︎ unknown: In a known task environment, the agent (or its designer) knows the rules of the environment, i.e., the outcomes (or outcome probabilities) of all actions (e.g., solitaire card game). Known is not the same as fully observable: in solitaire, the rules are known, but the face-down cards are hidden. In an unknown task environment, the agent lacks knowledge of how the environment works or what its actions lead to. The agent must learn or infer the rules and effects through interaction (e.g., a new video game).

The hardest case is partially observable, multi-agent, nondeterministic, sequential, dynamic, continuous, and unknown (Russell & Norvig, 2022, pp. 62–64).
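One way to make such a classification explicit is a record with one flag per dimension. In the Python sketch below (class and field names are hypothetical), each flag is `True` for the harder end of the dimension. Chess is classified as in the examples above, plus multi-agent, deterministic, and known, which hold for standard chess played without a clock.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class TaskEnvironment:
    # Each flag is True for the harder end of its dimension
    partially_observable: bool
    multi_agent: bool
    nondeterministic: bool
    sequential: bool
    dynamic: bool
    continuous: bool
    unknown: bool

    def hard_dimensions(self) -> list[str]:
        """Names of the dimensions on which the environment is at the harder end."""
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def is_hardest_case(self) -> bool:
        """True only if the environment is at the harder end of every dimension."""
        return all(getattr(self, f.name) for f in fields(self))

chess = TaskEnvironment(
    partially_observable=False, multi_agent=True, nondeterministic=False,
    sequential=True, dynamic=False, continuous=False, unknown=False,
)
print(chess.hard_dimensions())  # → ['multi_agent', 'sequential']
```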

Exercise

Describe the task environment of a taxi driver agent.

Agent types

Simple reflex agents

Model-based reflex agents

Model-based reflex agents maintain an internal model of the world, which helps them keep track of the current state and make decisions based on this model. This allows them to handle partially observable environments more effectively (Russell & Norvig, 2022, p. 70).

The transition model describes how the world evolves (a) independently of the agent and (b) depending on the agent’s actions.
The sensor model describes how the state of the world is reflected in the agent’s percepts (i.e., in what its sensors register).

Example

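A self-driving car uses its transition model to predict how the traffic situation evolves (e.g., where a car that is temporarily hidden behind a truck is likely to be now) and its sensor model to interpret its camera and radar readings. Combining both, it maintains an estimate of the current state and makes decisions accordingly.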
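The interplay of transition and sensor model can also be sketched for the vacuum world. The Python code below is a minimal illustration, assuming a two-square world and the names `transition_model`, `sensor_model`, and `ModelBasedVacuumAgent` (all hypothetical). Because the agent remembers squares it cannot currently see, it can stop (`NoOp`) once it believes both squares are clean, something the reflex agent from the vacuum example cannot do.

```python
SQUARES = ("A", "B")  # assumed two-square world, "A" is the left square

def transition_model(state, action):
    """Predict the next state from the believed state and the agent's last action.
    Assumes no other agents, clean squares stay clean, and moves stop at the edge."""
    location, clean = state
    clean = dict(clean)
    i = SQUARES.index(location)
    if action == "Suck":
        clean[location] = True
    elif action == "Right":
        location = SQUARES[min(i + 1, len(SQUARES) - 1)]
    elif action == "Left":
        location = SQUARES[max(i - 1, 0)]
    return location, clean

def sensor_model(state, percept):
    """Correct the belief with what the sensors report: (location, dirty)."""
    _, clean = state
    location, dirty = percept
    clean = dict(clean)
    clean[location] = not dirty
    return location, clean

class ModelBasedVacuumAgent:
    def __init__(self):
        self.state = None  # internal model of the world; None before the first percept
        self.last_action = None

    def __call__(self, percept):
        if self.state is None:
            # None means "not observed yet" for each square
            self.state = (percept[0], {square: None for square in SQUARES})
        else:
            self.state = transition_model(self.state, self.last_action)
        self.state = sensor_model(self.state, percept)
        location, clean = self.state
        # Condition-action rules applied to the internal state, not only to the current percept
        if not clean[location]:
            action = "Suck"
        elif all(clean[square] for square in SQUARES):
            action = "NoOp"
        else:
            action = "Right" if location == "A" else "Left"
        self.last_action = action
        return action

agent = ModelBasedVacuumAgent()
percepts = [("A", True), ("A", False), ("B", True), ("B", False)]
print([agent(p) for p in percepts])  # → ['Suck', 'Right', 'Suck', 'NoOp']
```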

Types of representation of states

The representations of states can be placed along an axis of increasing complexity and expressive power (Russell & Norvig, 2022, pp. 76–77):

An atomic representation is one in which each state is treated as a black box with no internal structure, meaning the state either does or does not match what you’re looking for. In a sliding tile puzzle, for instance, you either have the correct alignment of tiles or you do not.
A factored representation is one in which the states are defined by a set of variables or features (e.g., Boolean, real-valued, or one of a fixed set of symbols). In a sliding tile puzzle, this might be the position of each tile, or a derived feature such as “number of tiles out of place”.
A structured representation is one in which the states are expressed in the form of objects and the relations between them (e.g., expressed by logic or probability). Statements about such relations are called facts.

A more expressive representation is much more concise, but makes reasoning and learning more complex.
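The three levels can be contrasted on the 8-puzzle (a 3×3 sliding tile puzzle). The Python sketch below is illustrative only: the goal layout, the use of 0 for the blank, the state label "s42", and the relation names `left_of` and `above` are assumptions.

```python
GOAL = (1, 2, 3,
        4, 5, 6,
        7, 8, 0)  # assumed goal layout, read row by row; 0 marks the blank

# Atomic: a state is an opaque label; the only question we can ask is "is this the goal?"
def atomic_is_goal(state_label: str, goal_label: str) -> bool:
    return state_label == goal_label

# Factored: a state is a vector of variables (the tile in each cell), so features can be computed
def tiles_out_of_place(state: tuple) -> int:
    return sum(1 for tile, goal_tile in zip(state, GOAL) if tile != 0 and tile != goal_tile)

# Structured: a state is a set of facts, i.e., relations between objects (tiles)
def to_facts(state: tuple, width: int = 3) -> set:
    facts = set()
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        right, below = i + 1, i + width
        if right % width != 0 and state[right] != 0:  # no wrap-around to the next row
            facts.add(("left_of", tile, state[right]))
        if below < len(state) and state[below] != 0:
            facts.add(("above", tile, state[below]))
    return facts

state = (1, 2, 3,
         4, 5, 6,
         0, 7, 8)
print(atomic_is_goal("s42", "goal"))         # → False
print(tiles_out_of_place(state))             # → 2
print(("left_of", 7, 8) in to_facts(state))  # → True
```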

Goal-based agents

Figure 4: A model-based, goal-based agent

Utility-based agents

Figure 5: A model-based, utility-based agent

Utility-based agents make decisions by evaluating the utility (or value) of different possible actions and choosing the one that maximizes their overall utility. These agents consider not only the goals but also the trade-offs and preferences to achieve the best possible outcome (Russell & Norvig, 2022, p. 73).

The utility function maps each state to a number expressing how desirable that state is. The expected utility of an action is the average utility of its possible outcome states, weighted by the probability of each outcome. The agent evaluates the expected utility of each available action and selects the one that maximizes it.
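In formula form, the expected utility of an action a is EU(a) = Σ_s' P(s' | a) · U(s'), and the agent chooses the action with the largest EU(a). The Python sketch below computes this for a hypothetical route choice of a taxi; the outcome states, probabilities, and utilities are made-up numbers for illustration.

```python
# Hypothetical outcome model: P(outcome state | action)
OUTCOMES = {
    "highway": {"arrive_fast": 0.7, "stuck_in_jam": 0.3},
    "back_roads": {"arrive_slow": 1.0},
}

# Hypothetical utility function: how desirable each outcome state is
UTILITY = {"arrive_fast": 10.0, "arrive_slow": 7.0, "stuck_in_jam": 2.0}

def expected_utility(action: str) -> float:
    """EU(a) = sum over outcome states s' of P(s' | a) * U(s')."""
    return sum(p * UTILITY[state] for state, p in OUTCOMES[action].items())

def choose_action() -> str:
    """Select the action that maximizes expected utility."""
    return max(OUTCOMES, key=expected_utility)

for action in OUTCOMES:
    print(action, round(expected_utility(action), 2))  # highway 7.6, back_roads 7.0
print(choose_action())  # → highway
```

A goal-based agent whose goal is simply "reach the destination" could not distinguish the two routes, since both eventually get there; the utility function makes the trade-off between speed and risk explicit.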

A utility-based agent has many advantages in terms of flexibility and learning, which are particularly helpful in environments characterized by partial observability and nondeterminism. In addition, there are cases where goals alone are insufficient, but a utility-based agent can still make rational decisions based on the probabilities and the utilities of the outcomes:

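When there are conflicting goals (e.g., speed and safety), the utility function specifies the appropriate trade-off.
When there are several goals, none of which can be achieved with certainty, the likelihood of success (i.e., goal achievement) can be weighed against the importance of each goal.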

Model- and utility-based agents are difficult to implement. They need to model and keep track of the task environment, which requires capable sensors, sophisticated algorithms, and considerable computational resources. There are also utility-based agents that are not model-based. These agents simply learn which action is best in a particular situation without any “understanding” of its impact on the environment (e.g., through model-free reinforcement learning).

Example

A smart thermostat continuously evaluates the utility of different temperature settings and adjusts accordingly to maximize overall utility, balancing comfort and energy savings.
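The trade-off between conflicting goals can be sketched with a weighted utility function. The Python code below assumes a deterministic setting (so expected utility reduces to utility), a quadratic discomfort term, a linear energy term, and the parameter values shown; all of these are illustrative assumptions, not how any particular thermostat works.

```python
def utility(setpoint_c: float, preferred_c: float = 21.0, outdoor_c: float = 5.0,
            energy_weight: float = 0.3) -> float:
    """Higher is better: penalize deviation from the preferred temperature
    and the heating effort, which grows with the indoor-outdoor gap."""
    discomfort = (setpoint_c - preferred_c) ** 2
    energy = max(setpoint_c - outdoor_c, 0.0)
    return -(discomfort + energy_weight * energy)

def best_setpoint(candidates, **kwargs) -> float:
    """Pick the candidate setpoint with the highest utility."""
    return max(candidates, key=lambda t: utility(t, **kwargs))

candidates = [18.0 + 0.5 * i for i in range(11)]  # 18.0 to 23.0 °C in half-degree steps
print(best_setpoint(candidates, energy_weight=0.0))  # → 21.0 (comfort only)
print(best_setpoint(candidates, energy_weight=2.0))  # → 20.0 (energy savings matter more)
```

The weight expresses how comfort and energy savings are traded off; changing it changes the rational setpoint without changing any goal.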

Recap

What are the main differences between the agents?

Learning agents

Learning agents are AI systems designed to improve their performance over time by learning from their environment and experiences. Unlike traditional AI systems that operate with fixed programming, learning agents adapt, evolve, and refine their actions based on feedback and data. Thus, learning agents have greater autonomy.

A learning agent consists of four conceptual components (Russell & Norvig, 2022, pp. 74–75), as shown in Figure 6:

Learning element: Acquires knowledge and improves performance by analyzing data, interactions, and feedback. Uses techniques such as supervised, unsupervised, and reinforcement learning.
Performance element: Executes tasks based on the knowledge acquired by the learning element.
Critic: Evaluates the actions taken by the performance element against a fixed performance standard and provides feedback to the learning element.
Problem generator: Identifies opportunities for further learning and exploration. Exploratory actions may be suboptimal in the short term, but can lead to the discovery of better actions in the long term.
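The interplay of the four components can be illustrated with a deliberately small task: repeatedly choosing one of several routes whose travel times the agent does not know in advance (a multi-armed bandit problem). The Python sketch below is one of many possible ways to map the components to code, under stated assumptions: the route names and travel times are hypothetical, the problem generator uses ε-greedy exploration, and the learning element keeps a running average of rewards.

```python
import random

class LearningAgent:
    """Sketch of the four conceptual components of a learning agent."""

    def __init__(self, actions, performance_standard, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.performance_standard = performance_standard  # fixed, not changed by learning
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.estimates = {a: 0.0 for a in self.actions}  # knowledge used by the performance element
        self.counts = {a: 0 for a in self.actions}

    def performance_element(self):
        # Select the action currently believed to be best
        return max(self.actions, key=self.estimates.get)

    def problem_generator(self):
        # Occasionally suggest an exploratory action (may be suboptimal in the short term)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return None

    def critic(self, outcome):
        # Judge the outcome against the performance standard and turn it into feedback
        return self.performance_standard(outcome)

    def learning_element(self, action, reward):
        # Improve the knowledge of the performance element: running average of rewards
        self.counts[action] += 1
        self.estimates[action] += (reward - self.estimates[action]) / self.counts[action]

    def step(self, environment):
        exploratory = self.problem_generator()
        action = exploratory if exploratory is not None else self.performance_element()
        outcome = environment(action)
        self.learning_element(action, self.critic(outcome))
        return action

ROUTES = ["route_a", "route_b", "route_c"]
TRAVEL_MINUTES = {"route_a": 30, "route_b": 25, "route_c": 20}  # hypothetical, unknown to the agent
agent = LearningAgent(ROUTES, performance_standard=lambda minutes: -minutes)
for _ in range(200):
    agent.step(TRAVEL_MINUTES.get)  # the environment maps an action to an outcome
print(agent.performance_element())  # → route_c
```

Since all estimates start at 0 and every observed reward is negative, untried routes look best at first, so the agent tries each route early on; the problem generator adds further exploration throughout.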

On rationality

A rational agent is one
that does the right thing.

Utility-based learning agents are rational agents as they act so as to achieve the best outcome or, when there is uncertainty, the best expected outcome. This means that for each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has, which evolves over time (Russell & Norvig, 2022, p. 58).

Evolution of agents

Agentic AI

Definition

Agentic AI is an emerging paradigm in AI that refers to autonomous systems designed to pursue complex goals with minimal human intervention (Acharya et al., 2025, p. 18912).

Core characteristics of Agentic AI are

higher autonomy and goal complexity,
ability to adapt to environmental and situational unpredictability, and
independent decision-making.

Autonomy and goal complexity, as agentic AI systems
- can handle multiple complex goals simultaneously,
- can operate independently over extended periods,
- can shift between tasks to achieve higher-order objectives, and
- make decisions with minimal human supervision
Environmental and situational adaptability, as agentic AI systems
- operate effectively in dynamic and unpredictable environments
- adapt to changing conditions in real-time
- make decisions with incomplete information
- handle uncertainty effectively
Independent decision-making, as agentic AI systems
- can learn from experience and improve over time
- use reinforcement learning and meta-learning
- demonstrate flexibility in strategy selection
- reconceptualize approaches based on new information

Agentic AI systems need to have the ability to

gather information from the environment,
maintain the execution context over long periods,
develop strategies to achieve goals (i.e., independent decision-making),
communicate plans and goals at appropriate abstraction levels,
perform operations that can influence the environment’s state,
learn and adapt to their environment, and
coordinate with other agents or humans in response to current situations (Anthropic, 2024).
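Several of these abilities show up in the basic control loop of an agentic system, which Anthropic (2024) describes as a model using tools based on environmental feedback in a loop. The Python sketch below shows only that loop; the planner is a hypothetical rule-based stub standing in for a model call, and the tool names, prices, and step budget are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Context:
    goal: str
    history: list = field(default_factory=list)  # (tool, argument, observation) triples

def run_agent(goal: str, plan_next_step: Callable[[Context], tuple],
              tools: dict, max_steps: int = 10) -> tuple:
    """Plan, act with a tool, observe, remember, repeat.

    plan_next_step returns (tool_name, argument), ("finish", answer),
    or ("ask_human", question) to hand control back to a person.
    """
    context = Context(goal)
    for _ in range(max_steps):
        kind, payload = plan_next_step(context)
        if kind in ("finish", "ask_human"):
            return kind, payload, context
        observation = tools[kind](payload)  # act on or query the environment
        context.history.append((kind, payload, observation))  # maintain the execution context
    return "ask_human", "step budget exhausted", context  # escalate instead of looping forever

# Hypothetical tools and a rule-based stub planner (a real system would call a model here)
PRICES = {"used AI book": 12.0}
TOOLS = {"lookup_price": PRICES.get, "multiply": lambda args: args[0] * args[1]}

def stub_planner(context: Context) -> tuple:
    if not context.history:
        return "lookup_price", "used AI book"
    if len(context.history) == 1:
        price = context.history[0][2]
        if price is None:
            return "ask_human", "price not found"
        return "multiply", (price, 3)
    return "finish", context.history[-1][2]

kind, answer, ctx = run_agent("total price of 3 used AI books", stub_planner, TOOLS)
print(kind, answer)  # → finish 36.0
```

The step budget and the `ask_human` outcome are one simple way to keep a human in the loop when the agent cannot make progress.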

Comparison with traditional AI

Comparison of traditional AI and Agentic AI based on Acharya et al. (2025)
Feature	Traditional AI	Agentic AI
Primary purpose	Task-specific automation	Goal-oriented autonomy
Human intervention	High (predefined parameters)	Low (autonomous adaptability)
Adaptability	Limited	High
Environment interaction	Static or limited context	Dynamic and context-aware
Learning type	Primarily supervised	Reinforcement and self-supervised
Decision-making	Data-driven, static rules	Autonomous, contextual reasoning

Comparison of agent types

Comparison between classical agents, reinforcement learning agents, and agentic AI based on Acharya et al. (2025)
Feature	Classical Agents	Reinforcement Learning Agents	Agentic AI
Primary Purpose	Fixed-task automation	Reward-driven optimization	Goal-oriented autonomy
Adaptability	Low	Moderate	High
Learning Type	Supervised	Reinforcement Learning	Hybrid, including RAG and Memory
Applications	Static systems	Dynamic environments	Complex, multi-objective tasks

Q&A

Exercises

Concepts

Define in your own words the following terms:

Rationality
Autonomy
Agent
Environment
Sensor
Actuator
Percept
Agent function
Agent program

Agent types

Explain the differences between the following agent types in your own words. Describe the component(s) specific to each type.

Reflex agent
Model-based agent
Goal-based agent
Utility-based agent
Learning agent

Vacuum cleaner

Under which circumstances does a robotic vacuum cleaner act rationally?

Describe the task environment of such an agent.

PEAS

For each of the following agents, specify the performance measure, the environment, the actuators, and the sensors.

Microwave oven
Chess program
Autonomous supply delivery

Performance measure

Describe a task environment in which the performance measure is easy to specify completely and correctly, and one in which it is not.

Assertions

For each of the following assertions, say whether it is true or false and support your answer with examples or counterexamples where appropriate.

An agent that senses only partial information about the state cannot be perfectly rational.
There exist task environments in which no pure reflex agent can behave rationally.
There exists a task environment in which every agent is rational.
Every agent is rational in an unobservable environment.
A perfectly rational poker-playing agent never loses.

Task environment

For each of the following activities, characterize the task environment in terms of the properties discussed in the lecture notes.

Playing soccer
Exploring the subsurface oceans of Titan
Shopping for used AI books on the internet
Playing a tennis match

Task environment #2

For each of the following task environment properties, rank the example task environments from most to least according to how well the environment satisfies the property.

Lay out any assumptions you make to reach your conclusions.

Fully observable: driving; document classification; tutoring a student in calculus; skin cancer diagnosis from images
Continuous: driving; spoken conversation; written conversation; climate engineering by stratospheric aerosol injection
Stochastic: driving; sudoku; poker; soccer
Static: chat room; checkers; tax planning; tennis

Literature

Acharya, D. B., Kuppan, K., & Divya, B. (2025). Agentic AI: Autonomous intelligence for complex goals–a comprehensive survey. IEEE Access.

Anthropic. (2024). Building effective agents. Anthropic Research Team. https://www.anthropic.com/engineering/building-effective-agents

Russell, S., & Norvig, P. (2022). Artificial intelligence: A modern approach. Pearson Education.

Wiener, N. (1960). Some moral and technical consequences of automation. Science, 131(3410), 1355–1358.

Footnotes

The agent function maps any given percept sequence to an action (an abstract mathematical description).
The term percept refers to the content an agent’s sensors are perceiving. The percept sequence is the complete history of everything an agent has ever perceived.
Rectangles are used to denote the current internal state of the agent’s decision process, rectangles with rounded corners to represent the background information used in the process.