V 1.0

Environments & Agents

Introduction to AI (I2AI)

Deinera Jechle    Neu-Ulm University of Applied Sciences
March 2, 2026

Agency

Is AI really just a fancy calculator? We examine what sets agents apart — and why the distinction matters.

01

Is AI Different from a Calculator? If so, why?

Discussion
💬 Recap

Is AI different from a calculator? If so, why?

Modern AI has moved beyond isolated "calculators." The paradigm of agency shifts our engineering focus from "correct output" to "intelligent behavior" — accounting for feedback loops, uncertainties, and real-time constraints.

Agency is the capacity of a system to maintain a continuous feedback loop with its environment. Agency requires a mapping of a history of environmental percepts to a sequence of actions designed to achieve a goal or maximize a performance measure. — Russell & Norvig, 2022

Core Components & Architecture of Agents

💬 Think about it

What are the core components that define an agent? Explain what each means.

✓ Core components
  • Agent: anything that perceives its environment through sensors and acts upon it through actuators.
  • Percept: the agent's perceptual inputs at any given instant.
  • Percept sequence: the complete history of everything the agent has ever perceived; action depends on this full sequence.
  • Sensors: mechanisms (cameras, GPS, microphones) that receive environmental input.
  • Actuators: mechanisms (wheels, display screens, robotic joints) that execute actions.

Core Components & Architecture of Agents

💬 Think about it

What are the components of the Agent Architecture?

✓ Agent Architecture
  • Agent Function: An abstract mathematical mapping f : P* → A — describing how any given percept sequence results in an action. This is the what.
  • Agent Program: The concrete physical implementation — the actual code — running on a specific architecture. This is the how.
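The distinction can be made concrete with a small sketch: the agent function f : P* → A as a lookup table over percept sequences, and the agent program as the code that maintains the percept history and queries that table. All names here are illustrative, not from any library.

```python
def make_table_driven_agent(table):
    """Return an agent program closed over a percept-sequence table."""
    percepts = []  # the agent's full percept history (P*)

    def program(percept):
        percepts.append(percept)
        # The agent function: map the percept sequence so far to an action.
        return table.get(tuple(percepts), "no-op")

    return program

# Example: a calculator viewed as an agent that answers '2 + 2 ='.
table = {("2", "+", "2", "="): "display 4"}
agent = make_table_driven_agent(table)
for p in ("2", "+", "2"):
    agent(p)
print(agent("="))  # display 4
```

Note that the table-driven approach is only a conceptual device: for any realistic environment the table is astronomically large, which is exactly why the agent program must compute the agent function rather than store it.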

Can We Consider a Calculator an Agent?

Recap
💬 Think about it

A calculator takes inputs and produces outputs. Could we consider a calculator to be an agent?

✓ Answer

Technically yes — but the framing provides no design leverage:

"One could view a hand-held calculator as an agent that chooses the action of displaying '4' when given the percept sequence '2 + 2 =,' but such an analysis would hardly aid our understanding of the calculator … AI operates at … the most interesting end of the spectrum, where the artifacts have significant computational resources and the task environment requires nontrivial decision making." — p. 36, Russell & Norvig, 2022

Rational Agents

A rational agent selects an action that is expected to maximize its performance measure, given the prior percept sequence and its built-in knowledge.

Rationality is not about the internal process, but the external outcome.

Figure 1: Rational agents interact with environments through sensors and actuators.
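The definition above can be sketched in a few lines: a rational agent compares actions by their expected performance under a probability model of outcomes. The outcome model below (an umbrella decision) is invented purely for illustration.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, performance) pairs."""
    return sum(p * v for p, v in outcomes)

def rational_choice(action_models):
    """Pick the action maximizing *expected* performance.

    action_models: {action: [(probability, performance), ...]}
    """
    return max(action_models, key=lambda a: expected_value(action_models[a]))

# Hypothetical model: 30% chance of rain.
models = {
    "take umbrella": [(0.3, 8), (0.7, 6)],    # dry either way, slight hassle
    "leave umbrella": [(0.3, -10), (0.7, 10)],  # soaked if it rains
}
print(rational_choice(models))  # take umbrella (EV 6.6 vs 4.0)
```

The agent may still get unlucky — it maximizes expected, not actual, performance, which is exactly the rationality/perfection distinction discussed next.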

Rationality vs. Perfection

Rationality is not the same as perfection:

  • Rationality maximizes expected performance.
  • Perfection maximizes actual performance.
  • Perfection requires omniscience.
  • Rational choice depends only on the percept sequence to date.

  • Rationality: maximizes expected performance; requires the percept sequence plus prior knowledge; feasible — the engineering standard.
  • Omniscience: knows the actual outcome of actions; requires complete present and future data; impossible.
  • Perfection: maximizes actual performance; requires omniscience; impossible in unpredictable worlds.

Environments

Before designing an agent (the solution), the task environment (the problem) must be specified as fully as possible using the PEAS framework.

02

PEAS Framework

Definition

The task environment must be specified across four dimensions:

  • Performance measure
  • Environment
  • Actuators
  • Sensors
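A PEAS specification can be written down as a small record before any agent design begins. The sketch below, including the vacuum-cleaner example values, is illustrative, assuming nothing beyond the four PEAS dimensions named above.

```python
from dataclasses import dataclass

@dataclass
class PEAS:
    """Task-environment specification: the problem, stated before the solution."""
    performance_measure: list[str]
    environment: list[str]
    actuators: list[str]
    sensors: list[str]

# Hypothetical example: a robotic vacuum cleaner.
vacuum = PEAS(
    performance_measure=["cleanliness", "battery efficiency", "no damage"],
    environment=["rooms", "dirt", "furniture", "pets"],
    actuators=["wheels", "suction motor", "brush"],
    sensors=["dirt sensor", "bump sensor", "cliff sensor"],
)
print(vacuum.sensors)
```

Writing the specification as data rather than prose forces each dimension to be filled in explicitly — a missing field is immediately visible.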

Exercise

In-class Group Exercise · 10 min
📝 Exercise

Describe the task environment of the following agents using PEAS.

Microwave oven
  • Performance measure: food heated to the correct temperature throughout; heating time minimized; no overcooking, burning, or cold spots
  • Environment: kitchen; food items of varying type, size, and density
  • Actuators: magnetron (microwave emitter); turntable motor
  • Sensors: interior temperature sensor; timer; door open/close sensor

Chess program
  • Performance measure: win the game; minimize the opponent's winning probability; compute within the time limit
  • Environment: 8×8 board with 32 pieces; opponent; time constraint
  • Actuators: move selection output (piece + target square); display/board to communicate moves
  • Sensors: current board state; remaining time on the clock; full game history

Autonomous supply delivery
  • Performance measure: package delivered on time and undamaged; route efficiency; safety
  • Environment: roads, traffic, pedestrians, …; delivery addresses and access points; weather, …
  • Actuators: steering, brakes; cargo hold/release mechanism
  • Sensors: GPS position; lidar, radar, cameras; speedometer, accelerometer, …

Bidding on an item at an auction
  • Performance measure: obtain the item (if wanted); minimize the price paid
  • Environment: auction house / eBay
  • Actuators: placing a bid (by phone, electronically)
  • Sensors: eyes, ears

Properties of Task Environments

Task environments can be categorized along seven dimensions:

  • Fully observable ↔ Partially observable: Does the agent have access to the complete state of the environment at all times? (e.g., chess vs. poker)
  • Single agent ↔ Multi-agent: Is only one agent interacting with the environment? (e.g., crossword puzzle vs. chess)
  • Deterministic ↔ Nondeterministic: Is the next state completely determined by current state and action? (e.g., crossword vs. poker)
  • Episodic ↔ Sequential: Is each episode independent of prior ones? (e.g., spam filtering vs. chess)
  • Static ↔ Dynamic: Does the environment change while the agent is deliberating? (e.g., chess vs. stock trading)
  • Discrete ↔ Continuous: Is the state space finite and distinct, or continuous? (e.g., chess vs. self-driving)
  • Known ↔ Unknown: Does the agent have complete information about outcomes of its actions? (e.g., solitaire vs. new environment)
  • ⚠️ Hardest case
    Partially observable, multi-agent, nondeterministic, sequential, dynamic, continuous, and unknown — this is the most challenging combination for agent design (Russell & Norvig, 2022, pp. 62–64).
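The seven dimensions can be captured as a small record, with the hardest case as a derived check. The example classifications (chess, wilderness exploration) follow the lecture's pattern; the class and method names are illustrative.

```python
from dataclasses import dataclass, asdict

@dataclass
class TaskEnvironment:
    """One flag per dimension; True = the 'easier' pole of each pair."""
    fully_observable: bool
    single_agent: bool
    deterministic: bool
    episodic: bool
    static: bool
    discrete: bool
    known: bool

    def is_hardest_case(self):
        # Hardest: partially observable, multi-agent, nondeterministic,
        # sequential, dynamic, continuous, and unknown — all flags False.
        return not any(asdict(self).values())

# Chess: fully observable, multi-agent, deterministic, sequential,
# static, discrete, known.
chess = TaskEnvironment(True, False, True, False, True, True, True)
# A hypothetical unexplored physical environment: everything hard.
wilderness = TaskEnvironment(False, False, False, False, False, False, False)
print(chess.is_hardest_case(), wilderness.is_hardest_case())  # False True
```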

    Exercise

    In-class Group Exercise · 10 min

    Which of these games would a rational agent always win and why?

    • Sudoku
    • Chess
    • Tic-Tac-Toe
    • Lottery
    • Minesweeper

    Exercise

    In-class Group Exercise · 10 min

    Which of these games would a rational agent always win and why?

    • Sudoku: Always wins (if a solution exists) — single agent, fully observable, deterministic, episodic. Rational = perfection here because the environment is known and static. Only caveat: some puzzles are intentionally unsolvable.
    • Chess: Theoretically always wins (practically depends) — environment is deterministic and fully observable. However, the state space is enormous (~10^43 board positions, ~10^120 possible games). A winning/drawing strategy has not yet been explicitly found and is currently computationally infeasible.
    • Tic-Tac-Toe: Always at least draws — fully observable, deterministic, known. Rational = perfect here. A rational agent with perfect play can always force at least a draw. Against a non-rational agent it can win; against another rational agent it always draws (see xkcd 832).
    • Lottery: Rational agent would not play — the outcome is stochastic, but the probabilities are known and static. The agent can calculate exact expected values and determine that keeping its money maximizes expected performance. The rational move is not to play (negative expected value).
    • Minesweeper: ~50/50 chance — partially observable and stochastic. Many endgame configurations require a pure guess between two equally likely mines. A rational agent reasons perfectly up to that point, then faces a 50/50 with no additional information. Perfection would require knowing mine locations — i.e., omniscience.
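The lottery case reduces to a two-line expected-value calculation. The ticket price, prize, and odds below are invented for illustration; the conclusion holds for any real lottery, where the expected payout is below the ticket price by design.

```python
# Hypothetical lottery parameters (not from any real lottery).
ticket_price = 2.0
prize = 1_000_000.0
p_win = 1 / 14_000_000

ev_play = p_win * prize - ticket_price  # expected net gain from playing
ev_skip = 0.0                            # keep the money: nothing gained, nothing lost

print(f"EV(play) = {ev_play:.4f}, EV(skip) = {ev_skip}")
print("rational action:", "play" if ev_play > ev_skip else "do not play")
```

With these numbers the expected value of playing is about −1.93 per ticket, so the rational agent keeps its money.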

    Exercise: Assertions

    ✏️ Exercise

    For each assertion, say whether it is true or false and support your answer with examples or counterexamples.

    1. An agent that senses only partial information about the state cannot be perfectly rational.
      False. Perfect rationality means making good decisions given available sensor information; partial observability does not preclude it.
    2. There exist task environments in which no pure reflex agent can behave rationally.
      True. A pure reflex agent ignores percept history and cannot obtain an optimal state estimate in partially observable environments.
    3. There exists a task environment in which every agent is rational.
      True. In a single-state environment where all actions yield the same reward, any action is rational.
    4. Every agent is rational in an unobservable environment.
      False. Even without sensory input, some actions are inherently suboptimal — an agent with an internal model can know this.
    5. A perfectly rational poker-playing agent never loses.
      False. Rationality maximizes expected outcomes; an opponent may simply hold better cards.
    6. An agentic AI system always outperforms a classical goal-based agent.
      False. In simple, static environments a classical agent already acts optimally; agentic AI adds overhead without benefit.
    7. Agentic AI systems can be fully rational without learning capabilities.
      False. In dynamic or initially unknown environments, learning is required to compensate for partial or incorrect prior knowledge.
    8. An agentic AI system that can decompose goals into sub-goals is always more rational than one that cannot.
      False. Goal decomposition is an architectural feature, not a prerequisite for rationality in simpler environments.

    Exercise: Task Environments

    ✏️ Exercise

    For each of the following activities, characterize the task environment in terms of the properties discussed in the lecture.

    • Playing soccer
    • Exploring the subsurface oceans of Titan
    • Shopping for used AI books on the internet
    • Playing a tennis match

    Solution: Part 1

    Playing Soccer

    • Observability: Partial — field not fully visible; opponent intentions hidden
    • Agents: Multi — cooperative teammates and competitive opponents
    • Determinism: Stochastic — ball bounce and weather introduce uncertainty
    • Episodes: Sequential — actions affect the flow of the game and future options
    • Dynamics: Dynamic — ball and players continuously move while deliberating
    • Continuity: Continuous — speed and position of players and ball sweep smooth ranges

    Exploring the Subsurface Oceans of Titan

    • Observability: Partial — sensors limited to local range in dark, murky ocean
    • Agents: Single — currents treated as physical laws, not agents
    • Determinism: Stochastic — unpredictable currents and unknown obstacles
    • Episodes: Sequential — path taken dictates future discoveries and energy budget
    • Dynamics: Dynamic — currents and conditions change while the agent processes data
    • Continuity: Continuous — movement and navigation occur through continuous space and time

    Solution Part 2

    Shopping for Used AI books

    • Observability: Partial — prices, stock, and inventories across the web not fully visible
    • Agents: Multi — other buyers, algorithmic sellers, and dynamic pricing bots
    • Determinism: Stochastic — item may be bought by a competing agent before checkout
    • Episodes: Sequential — search → evaluate → add to cart → checkout
    • Dynamics: Static / semidynamic — site waits for input; stock may change concurrently
    • Continuity: Discrete — keystrokes and clicks are distinct, separate actions

    Playing a Tennis Match

    • Observability: Partial — opponent's intentions and muscle movements not directly observable
    • Agents: Multi — strictly competitive opponent
    • Determinism: Stochastic — wind, spin, and string bed variation affect ball trajectory
    • Episodes: Sequential — shot placement determines positioning for the next shot
    • Dynamics: Dynamic — ball and opponent continue to move while player deliberates
    • Continuity: Continuous — ball trajectory, swing angles, and player movement are continuous

    Agent Types

    From simple reflex agents to learning agents — a progression in capability, complexity, and autonomy.

    03

    Types of Agents

    The fundamental equation of agency is Agent = Architecture + Program. As we move up the complexity scale, we face a trade-off between flexibility and computational overhead.

    💬 Recap

    What types of agents do you know?

    ✓ Agent Types
    • Simple reflex agents: Act solely on the current percept using condition-action rules; require a fully observable environment.
    • Model-based reflex agents: Maintain an internal model of the world (transition + sensor model) to handle partially observable environments.
    • Goal-based agents: Use goal information and search/planning to select actions that lead to a desired future state.
    • Utility-based agents: Maximize a utility function over possible outcomes, enabling rational trade-offs between conflicting or uncertain goals.
    • Learning agents: Improve performance over time via a learning element, performance element, critic, and problem generator — gaining greater autonomy through experience.
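The difference between the first two types can be shown in the classic two-square vacuum world: the simple reflex agent reacts only to the current percept, while the model-based agent tracks internal state about squares it believes are clean. The world model and names are a toy illustration.

```python
def simple_reflex_vacuum(percept):
    """Acts on the current percept only, e.g. ('A', 'Dirty')."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

class ModelBasedVacuum:
    """Adds internal state: which squares it believes are already clean."""
    def __init__(self):
        self.believed_clean = set()

    def act(self, percept):
        location, status = percept
        if status == "Dirty":
            return "Suck"
        self.believed_clean.add(location)
        if self.believed_clean == {"A", "B"}:
            return "NoOp"  # model says everything is clean: stop moving
        return "Right" if location == "A" else "Left"

print(simple_reflex_vacuum(("A", "Dirty")))  # Suck
agent = ModelBasedVacuum()
print(agent.act(("A", "Clean")), agent.act(("B", "Clean")))  # Right NoOp
```

The reflex agent would shuttle between A and B forever once both are clean; the model-based agent can stop, because its internal state compensates for what the current percept alone cannot tell it.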

    Exercise: Performance Measures & Agent Types

    In-class Group Exercise · 10 min
    ✏️ Exercise

    Suggest performance measures for each of the following agents and argue which type of agent should be used.

    Bomb disposal
    • Performance measure: bomb does not explode; casualties avoided; mission completed in time
    • Agent type: goal-based; utility-based if time constraints require trade-offs between effectiveness and speed

    Traffic light control
    • Performance measure: minimize average wait time; maximize throughput; ensure fairness across lanes
    • Agent type: simple reflex for fixed time cycles; model-based if queue-length tracking is required; utility-based with a trade-off function if fairness and throughput must be balanced

    Microwave oven
    • Performance measure: food heated uniformly to target temperature within the set time
    • Agent type: simple reflex — fixed rules mapping settings (time, power) to actions (run magnetron); fully observable, deterministic environment

    Content moderation
    • Performance measure: takedown rate of harmful content; false positive rate; false negative rate; appeal outcomes
    • Agent type: utility-based learning agent, trading off safety against freedom of expression. The utility function itself cannot be fully specified in advance, for two reasons: (1) what counts as harmful content evolves, and (2) the appropriate trade-off between safety and expression is not the same in every context.
    → A utility-based agent is needed because the problem has irreducible competing objectives, and a learning agent is needed because both the environment and the right weighting of those objectives change continuously. Neither alone is sufficient.
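A minimal sketch of such a competing-objectives utility function: the weights, function name, and evaluation numbers below are all hypothetical, and as noted above, in a real system the weights themselves would have to be learned and revised over time.

```python
def moderation_utility(false_negatives, false_positives,
                       w_safety=3.0, w_expression=1.0):
    """Higher (less negative) utility is better.

    Assumed weighting: a missed harmful item (safety) costs more than a
    wrongly removed item (expression). Both weights are illustrative.
    """
    return -(w_safety * false_negatives + w_expression * false_positives)

# Two candidate policies scored on the same hypothetical evaluation set:
aggressive = moderation_utility(false_negatives=2, false_positives=40)
lenient = moderation_utility(false_negatives=30, false_positives=5)
best = max([("aggressive", aggressive), ("lenient", lenient)],
           key=lambda item: item[1])
print(best[0])
```

Changing the weights flips which policy wins — which is precisely why the weighting, not just the policy, is part of what must be learned.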

    Performance vs Utility

    In-class Plenum
    💬 Think about it

    Both the performance measure and the utility function measure how well an agent is doing. What is the difference between the two?

    ✓ Key definitions
    • Performance measure: an external specification from the designer or programmer stating what the agent should do. An agent that always acts to maximize its performance measure is a rational agent.
    • Utility function: used internally by the agent itself to evaluate which course of action best achieves or maximizes the performance measure(s), given its perceived state. Not all agents have utility functions — reflex agents, for example, do not.

    Exercise: Properties of Task Environments

    ✏️ Exercise

    For each of the following task environment properties, rank the example task environments from most to least according to how well the environment satisfies the property. Lay out any assumptions you make to reach your conclusions.

    1. Fully observable: driving — document classification — tutoring a student in calculus — skin cancer diagnosis from images
    2. Continuous: driving — spoken conversation — written conversation — climate engineering by stratospheric aerosol injection
    3. Stochastic: driving — sudoku — poker — soccer
    4. Static: chat room — checkers — tax planning — tennis

    Solution: Properties of Task Environments (1/2)

    1. Fully observable — most to least

    Document classification → Skin cancer diagnosis → Tutoring a student → Driving

    Document classification provides the complete, static text upfront. The full image is visible for diagnosis, though subsurface biology and medical history are hidden. In tutoring, the student's true understanding is a hidden variable — only explicit answers are observable. Driving is highly partial: blind spots, truck interiors, and other drivers' intentions are all unobservable.

    Assumptions: Document classification provides complete text upfront; diagnosis relies only on the provided image; tutoring treats the student's mental state as hidden.

    2. Continuous — most to least

    Climate engineering → Driving → Spoken conversation → Written conversation

    Climate engineering involves planetary-scale fluid dynamics operating across sweeping continuous values. Driving sweeps speed, location, and steering angles continuously. Spoken conversation is continuous at the acoustic wave level, even though words are discrete units. Written conversation is strictly discrete — keystrokes, characters, and messages are all distinct.

    Assumptions: Climate engineering involves massive fluid models compared to localized driving physics. Spoken conversation is analyzed at the raw audio level.

    Solution: Properties of Task Environments (2/2)

    3. Stochastic — most to least

    Soccer → Driving → Poker → Sudoku

    Soccer is most stochastic: physical unpredictability (ball bounce, wind) combines with multiple adversarial agents creating chaotic states. Driving is highly stochastic due to unpredictable traffic behaviour and potential hardware failures. Poker is stochastic but constrained — uncertainty is strictly quantified by deck probabilities, without real-world physical chaos. Sudoku is fully deterministic; the board state is entirely determined by the agent's actions.

    Assumptions: Soccer's adversarial physical complexity edges out driving. Poker's stochasticity is purely mathematical.

    4. Static — most to least

    Tax planning → Checkers → Chat room → Tennis

    Tax planning is perfectly static — historical data and published laws do not change while the agent computes. Checkers is static: the board does not change while the agent deliberates. A chat room is semidynamic — other agents can post simultaneously, altering context while the agent thinks. Tennis is highly dynamic: ball and opponent continuously move while the player decides how to react.

    Assumptions: Tax planning relies on a closed financial year. Checkers is played without a strict clock. Chat room participants do not wait their turn like in a turn-based game.

    Agentic AI

    From reactive rules to autonomous goal pursuit — what changes when agents start acting on their own?

    04

    Learning Agents & True Autonomy

    True autonomy is achieved when an agent can compensate for partial or incorrect prior knowledge by learning from its experience. A learning agent consists of four components:

    • Performance Element: The core agent that selects actions based on current knowledge.
    • Learning Element: Responsible for making improvements to the agent function based on data and experience.
    • Critic: Evaluates the agent's performance against a fixed external standard and provides feedback to the learning element.
    • Problem Generator: Suggests exploratory actions that may be suboptimal in the short term but yield vital information for long-term improvement.
    💡 The Problem Generator as Scientist

    Consider Galileo's experiments at the Tower of Pisa — he wasn't dropping rocks because the action was inherently useful, but to gather data to update his internal model of motion. This tension between exploration (gathering new information) and exploitation (acting on what is already known) is the heart of autonomous learning.
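The four components, and the exploration/exploitation tension, can be sketched as a tiny bandit-style loop. The toy environment, the 10% exploration rate, and all names are assumptions for illustration only.

```python
import random

random.seed(0)

estimates = {"a": 0.0, "b": 0.0}    # performance element's current knowledge
counts = {"a": 0, "b": 0}
true_reward = {"a": 0.2, "b": 0.8}  # hidden from the agent

for step in range(2000):
    if random.random() < 0.1:
        # Problem generator: try a possibly suboptimal action to gather data.
        action = random.choice(list(estimates))
    else:
        # Performance element: exploit current knowledge.
        action = max(estimates, key=estimates.get)
    # Critic: feedback against a fixed external standard (the reward signal).
    reward = 1.0 if random.random() < true_reward[action] else 0.0
    # Learning element: improve the agent function (running-average update).
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(max(estimates, key=estimates.get))  # should settle on "b"
```

Without the problem generator's occasional exploration, an early lucky streak on the worse action could lock the agent in forever; the exploratory actions are Galileo's dropped rocks.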

    Agentic AI

    💬 Recap

    What is Agentic AI?

    Autonomous systems designed to pursue complex goals with minimal human intervention.

    Core characteristics:

    • Higher autonomy and goal complexity,
    • ability to adapt to environmental and situational unpredictabilities, and
    • independent decision-making.

    Exercise: Agentic AI for Literature Reviews

    In-class Group Exercise · 10 min
    ✏️ Exercise

    For an AI agent that independently conducts scientific literature reviews, characterize the task environment in terms of the properties discussed in the lecture. Then argue why the scenario requires an agentic AI approach rather than a classical agent design.

    Solution: Task Environment

    • Partially observable: cannot access all papers at once; paywalls hide content; relevance only clear after reading; full scope of literature never known
    • Multi-agent: interacts with search engine algorithms, publisher systems, paywalls, and other softbots — an environment of comparable complexity to the physical world
    • Stochastic: outcome of queries is uncertain; different searches yield different results; relevance judgments are probabilistic (deterministic only with identical queries)
    • Sequential: search → filter → read → refine → repeat; early mistakes (e.g., missing a key paper) propagate to later conclusions
    • Dynamic: new papers are constantly published; citations and research trends evolve during the review process (static only with a fixed, frozen corpus)
    • Discrete: actions such as selecting or excluding a paper are discrete choices
    • Unknown: the agent does not know the full relevant literature in advance, nor the optimal search strategy

    Solution: Why Agentic AI?

    A classical agent design is insufficient — an agentic approach is required for three key reasons:

    • Goal complexity and task shifting: A literature review is a multi-objective task requiring the agent to decompose goals into sub-goals (search → filter → synthesize) and shift between them autonomously. Classical agents assume predefined, fixed workflows; agentic AI sets and refines subgoals dynamically.
    • Adaptability and contextual reasoning: If a paper introduces a new relevant term, the agent must update its search strategy accordingly. Agentic systems incorporate reflection loops and flexible strategy selection; classical agents rely on static rules that cannot reconceptualize their approach based on new information.
    • Handling uncertainty and incomplete information: The environment is partially observable and continuously changing. Agentic AI uses adaptive control mechanisms to operate under uncertainty; classical agents struggle outside predefined, limited contexts with high reliance on supervised, static rules.

    Questions?

    What remains unclear — about agents, environments, or agentic AI?

    ?