Introduction to AI
Course :: I2AI
Topic :: Environments & Agents
Is AI really just a fancy calculator? We examine what sets agents apart — and why the distinction matters.
Note
💬 Recap
Is AI different from a calculator? If so, why?
Modern AI has moved beyond isolated “calculators.” The paradigm of agency shifts our engineering focus from “correct output” to “intelligent behavior” — accounting for feedback loops, uncertainties, and real-time constraints.
Agency is the capacity of a system to maintain a continuous feedback loop with its environment. Agency requires a mapping of a history of environmental percepts to a sequence of actions designed to achieve a goal or maximize a performance measure.
— Russell & Norvig, 2022
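This percept-history-to-action mapping can be sketched as a table-driven agent program. The code below is purely illustrative (the function names and the "noop" default are assumptions, not from the text), using the calculator percept sequence as its lookup key:

```python
from typing import Dict, List, Tuple

Percept = str
Action = str

def make_table_driven_agent(table: Dict[Tuple[Percept, ...], Action]):
    """Illustrative table-driven agent: append each percept to the
    history and look the full sequence up in a table."""
    percepts: List[Percept] = []

    def program(percept: Percept) -> Action:
        percepts.append(percept)
        return table.get(tuple(percepts), "noop")

    return program

# The calculator viewed as an agent: one entry in the table.
calc = make_table_driven_agent({("2", "+", "2", "="): "display 4"})
actions = [calc(p) for p in ["2", "+", "2", "="]]
print(actions[-1])  # display 4
```

The table grows exponentially with the length of the percept history, which is exactly why later agent designs replace the table with a program.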
Note
💬 Think about it
What are the core components that define an agent? Explain what each means.
✓ Core components
Note
💬 Think about it
What are the components of the Agent Architecture?
✓ Agent Architecture
Note
💬 Think about it
A calculator takes inputs and produces outputs. Could we consider a calculator to be an agent?
✓ Answer
Technically yes — but the framing provides no design leverage:
“One could view a hand-held calculator as an agent that chooses the action of displaying ‘4’ when given the percept sequence ‘2 + 2 =,’ but such an analysis would hardly aid our understanding of the calculator … AI operates at … the most interesting end of the spectrum, where the artifacts have significant computational resources and the task environment requires nontrivial decision making.”
— Russell & Norvig, 2022, p. 36
A rational agent selects an action that is expected to maximize its performance measure, given the prior percept sequence and its built-in knowledge.
Rationality is not about the internal process, but the external outcome.
Rationality is not the same as perfection:
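As a sketch with made-up numbers, expected-performance maximization looks like this; note that the rational choice can still lose in any single outcome:

```python
# Rationality as expected-performance maximization: for each action,
# average the performance measure over possible outcomes, weighted by
# the agent's beliefs. All probabilities and scores are illustrative.

def expected_performance(outcomes):
    """outcomes: list of (probability, score) pairs."""
    return sum(p * score for p, score in outcomes)

def rational_choice(action_models):
    """Pick the action with the highest expected performance."""
    return max(action_models, key=lambda a: expected_performance(action_models[a]))

# 'bet' is rational (expected value 1.6) even though it loses 70% of the time.
action_models = {
    "fold": [(1.0, 0.0)],
    "bet":  [(0.3, 10.0), (0.7, -2.0)],
}
print(rational_choice(action_models))  # bet
```

This is the distinction in the table: rationality maximizes the *expected* column, while perfection would require knowing which outcome actually occurs.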
| Metric | Definition | Info Required | Feasibility |
|---|---|---|---|
| Rationality | Maximizes expected performance | Percept sequence + prior knowledge | High — the engineering standard |
| Omniscience | Knows actual outcome of actions | Complete future & present data | Impossible |
| Perfection | Maximizes actual performance | Requires omniscience | Impossible in unpredictable worlds |
Before designing an agent (the solution), the task environment (the problem) must be specified as fully as possible using the PEAS framework.
The task environment must be specified across four dimensions:
Note
📝 Exercise
Describe the task environment of the following agents using PEAS.
| Type | Performance Measure | Environment | Actuators | Sensors |
|---|---|---|---|---|
| Microwave oven | Food heated correctly; time minimized; no burning | Kitchen; food of varying types/density | Magnetron; turntable motor | Temperature sensor; timer; door sensor |
| Chess program | Win the game; maximize winning probability; decide within time limits | 8×8 board; opponent; time constraint | Move selection; display | Board state; clock; game history |
| Autonomous supply delivery | On-time delivery; route efficiency; safety | Roads; traffic; pedestrians; weather | Steering; brakes; cargo release | GPS; lidar; radar; cameras; speedometer |
| Bidding at an auction | Obtain desired item; minimize price paid | Auction house / eBay | Placing, raising, or withdrawing bids | Current price; other bidders' bids; auction status |
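Each row of the table can be captured as a small record. A sketch using the microwave row above (the class and field names are illustrative):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PEAS:
    """The four dimensions of a task environment specification."""
    performance: List[str] = field(default_factory=list)
    environment: List[str] = field(default_factory=list)
    actuators: List[str] = field(default_factory=list)
    sensors: List[str] = field(default_factory=list)

microwave = PEAS(
    performance=["food heated correctly", "time minimized", "no burning"],
    environment=["kitchen", "food of varying types/density"],
    actuators=["magnetron", "turntable motor"],
    sensors=["temperature sensor", "timer", "door sensor"],
)
print(microwave.sensors)
```

Writing the specification down first forces the designer to separate the problem (PEAS) from the solution (the agent program).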
Task environments can be categorized along seven dimensions:
Warning
⚠️ Hardest case
Partially observable, multi-agent, nondeterministic, sequential, dynamic, continuous, and unknown — this is the most challenging combination for agent design (Russell & Norvig, 2022, pp. 62–64).
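The seven dimensions can be encoded as one flag each; the record below is a sketch (field and method names are assumptions), with `hardest_env` encoding the combination from the warning:

```python
from dataclasses import dataclass

@dataclass
class TaskEnvironment:
    """One boolean per dimension; True is the easier value."""
    fully_observable: bool
    single_agent: bool
    deterministic: bool
    episodic: bool
    static: bool
    discrete: bool
    known: bool

    def is_hardest_case(self) -> bool:
        """True when every dimension takes its harder value."""
        return not any(
            (self.fully_observable, self.single_agent, self.deterministic,
             self.episodic, self.static, self.discrete, self.known)
        )

# Every flag on its hard side: partially observable, multi-agent,
# nondeterministic, sequential, dynamic, continuous, unknown.
hardest_env = TaskEnvironment(*([False] * 7))
print(hardest_env.is_hardest_case())  # True
```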
Which of these games would a rational agent always win and why?
Note
✏️ Exercise
For each assertion, say whether it is true or false and support your answer with examples or counterexamples.
An agent that senses only partial information about the state cannot be perfectly rational.
False. Perfect rationality means making good decisions given available sensor information; partial observability does not preclude it.
There exist task environments in which no pure reflex agent can behave rationally.
True. A pure reflex agent ignores percept history and cannot obtain an optimal state estimate in partially observable environments.
There exists a task environment in which every agent is rational.
True. In a single-state environment where all actions yield the same reward, any action is rational.
Every agent is rational in an unobservable environment.
False. Even without sensory input, some actions are inherently suboptimal — an agent with an internal model can know this.
A perfectly rational poker-playing agent never loses.
False. Rationality maximizes expected outcomes; an opponent may simply hold better cards.
An agentic AI system always outperforms a classical goal-based agent.
False. In simple, static environments a classical agent already acts optimally; agentic AI adds overhead without benefit.
Agentic AI systems can be fully rational without learning capabilities.
False. In dynamic or initially unknown environments, learning is required to compensate for partial or incorrect prior knowledge.
An agentic AI system that can decompose goals into sub-goals is always more rational than one that cannot.
False. Goal decomposition is an architectural feature, not a prerequisite for rationality in simpler environments.
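The reflex-agent claims above can be made concrete in the classic two-square vacuum world, assuming the agent senses only dirt and not its location. This is a sketch; all names are illustrative:

```python
# A pure reflex agent has no memory: when the square is clean it must
# always emit the same fixed move, so it cannot adapt its coverage.
# A model-based agent keeps internal state across percepts.

def reflex_agent(percept: str) -> str:
    """Pure reflex: action depends only on the current percept."""
    return "suck" if percept == "dirty" else "right"  # always "right" when clean

def make_model_based_agent():
    """Model-based reflex agent: remembers what it has already done."""
    state = {"moved_right": False}

    def program(percept: str) -> str:
        if percept == "dirty":
            return "suck"
        if not state["moved_right"]:
            state["moved_right"] = True
            return "right"
        return "left"  # remembered history lets it cover the other square

    return program
```

Starting in the rightmost square, the reflex agent keeps choosing "right" forever, while the model-based agent's percept history lets it turn back — illustrating why some environments defeat any pure reflex design.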
Note
✏️ Exercise
For each of the following activities, characterize the task environment in terms of the properties discussed in the lecture.
Playing soccer

| Property | Characterization |
|---|---|
| Observability | Partial — field not fully visible; opponent intentions hidden |
| Agents | Multi — cooperative teammates and competitive opponents |
| Determinism | Stochastic — ball bounce and weather introduce uncertainty |
| Episodes | Sequential — actions affect the flow of the game and future options |
| Dynamics | Dynamic — ball and players continuously move while deliberating |
| Continuity | Continuous — speed and position of players and ball sweep smooth ranges |
Autonomous underwater exploration

| Property | Characterization |
|---|---|
| Observability | Partial — sensors limited to local range in dark, murky ocean |
| Agents | Single — currents treated as physical laws, not agents |
| Determinism | Stochastic — unpredictable currents and unknown obstacles |
| Episodes | Sequential — path taken dictates future discoveries and energy budget |
| Dynamics | Dynamic — currents and conditions change while the agent processes data |
| Continuity | Continuous — movement and navigation occur through continuous space and time |
Shopping for an item on the Internet

| Property | Characterization |
|---|---|
| Observability | Partial — prices, stock, and inventories across the web not fully visible |
| Agents | Multi — other buyers, algorithmic sellers, and dynamic pricing bots |
| Determinism | Stochastic — item may be bought by a competing agent before checkout |
| Episodes | Sequential — search → evaluate → add to cart → checkout |
| Dynamics | Static / semidynamic — site waits for input; stock may change concurrently |
| Continuity | Discrete — keystrokes and clicks are distinct, separate actions |
Playing a tennis match

| Property | Characterization |
|---|---|
| Observability | Partial — opponent’s intentions and muscle movements not directly observable |
| Agents | Multi — strictly competitive opponent |
| Determinism | Stochastic — wind, spin, and string bed variation affect ball trajectory |
| Episodes | Sequential — shot placement determines positioning for the next shot |
| Dynamics | Dynamic — ball and opponent continue to move while player deliberates |
| Continuity | Continuous — ball trajectory, swing angles, and player movement are continuous |
From simple reflex agents to learning agents — a progression in capability, complexity, and autonomy.
The fundamental equation of agency is Agent = Architecture + Program. As we move up the complexity scale, we face a trade-off between flexibility and computational overhead.
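The equation can be read operationally: the architecture runs a loop that feeds percepts to the agent program and executes its actions. At the bottom of the complexity scale, a simple reflex program is just a list of condition-action rules; the rules below are illustrative:

```python
# Simple reflex agent for a two-square vacuum world: the program is
# nothing but condition-action rules fired against the current percept.

RULES = [
    (lambda p: p["status"] == "dirty", "suck"),
    (lambda p: p["location"] == "A", "right"),
    (lambda p: p["location"] == "B", "left"),
]

def simple_reflex_agent(percept: dict) -> str:
    """Fire the first rule whose condition matches the current percept."""
    for condition, action in RULES:
        if condition(percept):
            return action
    return "noop"

print(simple_reflex_agent({"status": "dirty", "location": "A"}))  # suck
```

Everything above this level (model-based, goal-based, utility-based, learning) replaces or augments the rule list with state, goals, and value estimates — buying flexibility at the cost of computation.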
Note
💬 Recap
What types of agents do you know?
✓ Agent Types
Note
✏️ Exercise
Suggest performance measures for each of the following agents and argue which type of agent should be used.
| Agent | Performance Measure | Agent Type |
|---|---|---|
| Bomb disposal | Bomb does not explode; casualties avoided; mission completed in time | Goal-based; utility-based if time constraints require trade-offs |
| Traffic light control | Minimize avg. wait time; maximize throughput; ensure fairness | Simple reflex for fixed cycles; model-based for queue tracking; utility-based for fairness vs. throughput |
| Microwave oven | Food heated uniformly to target temperature within set time | Simple reflex — fixed rules (time, power setting) to actions; fully observable, deterministic environment |
| Content moderation | Takedown rate; false positive rate; false negative rate; appeal outcomes | Utility-based learning agent — harmful content evolves; the appropriate safety/expression trade-off changes continuously |
Note
💬 Think about it
Both the performance measure and the utility function measure how well an agent is doing. What is the difference between the two?
✓ Key definitions
Note
✏️ Exercise
For each of the following task environment properties, rank the example task environments from most to least according to how well the environment satisfies the property. Lay out any assumptions you make to reach your conclusions.
1. Fully observable — most to least
Document classification → Skin cancer diagnosis → Tutoring a student → Driving
Document classification provides the complete, static text upfront. The full image is visible for diagnosis, though subsurface biology and medical history are hidden. In tutoring, the student’s true understanding is a hidden variable — only explicit answers are observable. Driving is highly partial: blind spots, truck interiors, and other drivers’ intentions are all unobservable.
Assumptions: Document classification provides complete text upfront; diagnosis relies only on the provided image; tutoring treats the student’s mental state as hidden.
2. Continuous — most to least
Climate engineering → Driving → Spoken conversation → Written conversation
Climate engineering involves planetary-scale fluid dynamics operating across sweeping continuous values. Driving sweeps speed, location, and steering angles continuously. Spoken conversation is continuous at the acoustic wave level, even though words are discrete units. Written conversation is strictly discrete — keystrokes, characters, and messages are all distinct.
Assumptions: Climate engineering involves massive fluid models compared to localized driving physics. Spoken conversation is analyzed at the raw audio level.
3. Stochastic — most to least
Soccer → Driving → Poker → Sudoku
Soccer is most stochastic: physical unpredictability (ball bounce, wind) combines with multiple adversarial agents creating chaotic states. Driving is highly stochastic due to unpredictable traffic behaviour and potential hardware failures. Poker is stochastic but constrained — uncertainty is strictly quantified by deck probabilities, without real-world physical chaos. Sudoku is fully deterministic; the board state is entirely determined by the agent’s actions.
Assumptions: Soccer’s adversarial physical complexity edges out driving. Poker’s stochasticity is purely mathematical.
4. Static — most to least
Tax planning → Checkers → Chat room → Tennis
Tax planning is perfectly static — historical data and published laws do not change while the agent computes. Checkers is static: the board does not change while the agent deliberates. A chat room is semidynamic — other agents can post simultaneously, altering context while the agent thinks. Tennis is highly dynamic: ball and opponent continuously move while the player decides how to react.
Assumptions: Tax planning relies on a closed financial year. Checkers is played without a strict clock. Chat room participants do not wait their turn like in a turn-based game.
From reactive rules to autonomous goal pursuit — what changes when agents start acting on their own?
True autonomy is achieved when an agent can compensate for partial or incorrect prior knowledge by learning from its experience. A learning agent consists of four components: the performance element (selects actions), the learning element (improves the performance element), the critic (evaluates behavior against a performance standard), and the problem generator (suggests exploratory actions).
Tip
💡 The Problem Generator as Scientist
Consider Galileo’s experiments at the Tower of Pisa — he wasn’t dropping rocks because the action was inherently useful, but to gather data to update his internal model of motion. This tension between exploration (gathering new information) and exploitation (acting on what is already known) is the heart of autonomous learning.
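The exploration/exploitation tension is commonly resolved with an epsilon-greedy rule (a standard technique, not specific to this lecture); the action names and values below are made up:

```python
import random

def epsilon_greedy(estimates, epsilon=0.1, rng=random):
    """With probability epsilon, explore (the problem generator's role);
    otherwise exploit the action currently believed best.

    estimates: dict mapping action -> estimated value.
    """
    if rng.random() < epsilon:
        return rng.choice(list(estimates))    # explore: gather new data
    return max(estimates, key=estimates.get)  # exploit: act on the model

estimates = {"drop_rock": 0.2, "drop_feather": 0.1}
print(epsilon_greedy(estimates, epsilon=0.0))  # drop_rock
```

With epsilon = 0 the agent never experiments and can get stuck with a wrong model; with epsilon = 1 it never uses what it has learned — the design question is where between the two to sit.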
Note
💬 Recap
What is Agentic AI?
Autonomous systems designed to pursue complex goals with minimal human intervention.
Core characteristics:
Note
✏️ Exercise
For an AI agent that independently conducts scientific literature reviews, characterize the task environment in terms of the properties discussed in the lecture. Then argue why the scenario requires an agentic AI approach rather than a classical agent design.
| Property | Characterization |
|---|---|
| Partially observable | Cannot access all papers at once; paywalls hide content; relevance only clear after reading; full scope of literature never known |
| Multi-agent | Interacts with search engine algorithms, publisher systems, paywalls, and other softbots — an environment of comparable complexity to the physical world |
| Stochastic | Outcome of queries is uncertain; different searches yield different results; relevance judgments are probabilistic (deterministic only with identical queries) |
| Sequential | Search → filter → read → refine → repeat; early mistakes (e.g., missing a key paper) propagate to later conclusions |
| Dynamic | New papers are constantly published; citations and research trends evolve during the review process (static only with a fixed, frozen corpus) |
| Discrete | Actions such as selecting or excluding a paper are discrete choices |
| Unknown | The agent does not know the full relevant literature in advance, nor the optimal search strategy |
A classical agent design is insufficient — an agentic approach is required for three key reasons:
What remains unclear — about agents, environments, or agentic AI?