Search & Planning

Introduction to AI (I2AI)

Andy Weeger

Neu-Ulm University of Applied Sciences

April 4, 2026

Agenda

  • Warm-up 10 min
  • Search problems & modelling 20 min
  • Search algorithms in action 25 min
  • Adversarial search & MCTS 25 min
  • Wrap-up 10 min

Search problems & modelling

Recap: Problem-solving process

Agents in simple environments follow a four-phase process:

  1. Goal formulation: identify the objectives based on the current situation
  2. Problem formulation: decide which states and actions to consider
  3. Search: explore possible action sequences to find a solution path
  4. Execution: carry out the solution in the environment

Formal search problem

A search problem is formally defined by:

  • State space: all possible states the environment can be in
  • Initial state: the state the agent starts in
  • Actions \(\textrm{ACTIONS}(s)\) – available actions when in state \(s\)
  • Transition model \(\textrm{RESULT}(s, a)\) – state resulting from action \(a\) in state \(s\)
  • Goal test: condition determining whether a state is a goal state
  • Action cost function \(\textrm{ACTION-COST}(s, a, s')\) – numeric cost of performing \(a\) from \(s\) to \(s'\)
  • Path: a sequence of actions leading from one state to another
  • Solution: a path from the initial state to a goal state
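The components above can be collected into a small container. A minimal sketch, assuming nothing beyond the standard library; the names (`SearchProblem`, `is_goal`, ...) are our own choice, not a particular library's API:

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable

@dataclass(frozen=True)
class SearchProblem:
    initial: Hashable                                 # initial state
    actions: Callable[[Hashable], Iterable[str]]      # ACTIONS(s)
    result: Callable[[Hashable, str], Hashable]       # RESULT(s, a)
    is_goal: Callable[[Hashable], bool]               # goal test
    cost: Callable[[Hashable, str, Hashable], float]  # ACTION-COST(s, a, s')

# Toy instance: count from 0 to 2 with a single "inc" action.
p = SearchProblem(
    initial=0,
    actions=lambda s: ["inc"] if s < 2 else [],
    result=lambda s, a: s + 1,
    is_goal=lambda s: s == 2,
    cost=lambda s, a, t: 1.0,
)
```

Any search algorithm can then be written once against this interface and reused across problems.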

Recap: modelling

A simplified map of Romania with road distances in miles (Russell & Norvig, 2022)
Figure 1: A partial search tree for finding a route from Arad to Bucharest based on the simplified map of Romania (Russell & Norvig, 2022)

Definition sequence

Problem formulation must follow goal formulation.

Is this statement true or false? Why?

Problem formulation

There are six glass boxes in a row, each with a lock. Each of the first five boxes holds a key unlocking the next box in line; the last box holds a banana. You have the key to the first box and want the banana.

Write out all formal components:

  • State space
  • Initial state
  • Actions
  • Transition model
  • Goal test
  • Action cost function
Time: 10 min

Solution note

  • State space: A state is defined by which boxes are open (or equivalently, which keys you hold). Since each box is either open or closed, there are 2⁶ = 64 possible states in theory, though only 7 are reachable in practice: you can have boxes 1 through k open (for k = 0, 1, …, 6).
  • Initial state: All six boxes are closed. You hold the key to box 1. You do not have the banana.
  • Actions: \(\textrm{ACTIONS}(s) = \{ \textit{unlock and open box } k \}\), where k is the next locked box for which you hold the key. In each reachable state, there is exactly one available action (or zero, if all boxes are open).
  • Transition model: \(\textrm{RESULT}(s, \textit{open box } k)\) = the state where box k is now open. If k < 6, you now hold the key to box k+1 (and no longer need the key to box k). If k = 6, you now hold the banana.
  • Goal test: You have the banana, i.e., box 6 is open.
  • Action cost function: \(\textrm{ACTION-COST}(s, a, s')\) = 1 for each unlock action (uniform cost). Alternatively, 0 if only the final state matters, not the path.
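The formulation above can be checked mechanically. A minimal sketch, assuming (as in the solution note) that a state is simply the number of open boxes k = 0..6; function names are our own:

```python
from collections import deque

def actions(k):
    # The only available action: open the next box, if any remain closed.
    return [f"open box {k + 1}"] if k < 6 else []

def result(k, action):
    return k + 1  # opening a box reveals the key to the next one

def bfs(initial=0, goal=6):
    # Breadth-first search over the (tiny, linear) state space.
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for a in actions(state):
            nxt = result(state, a)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [a]))
    return None
```

Since each state has at most one successor, the search degenerates to following the single chain: the unique solution is the six-step path "open box 1" through "open box 6".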

Search algorithms in action

Recap: uninformed search strategies

Overview of uninformed search strategies

Trace the frontier

Simplified graph (edge costs in parentheses):

Arad -> Sibiu (140) -> Fagaras (99) -> Bucharest (211);
Sibiu -> Rimnicu Vilcea (80) -> Pitesti (97) -> Bucharest (101)

Tasks

  1. Breadth-first search from Arad to Bucharest.
    List the frontier after each expansion. Which path does BFS return?
  2. Uniform-cost search on the same graph.
    List the frontier with cumulative costs after each expansion. Which path does UCS return? Compare with BFS.
  3. (Bonus) Straight-line distance to Bucharest:
    Arad=366, Sibiu=253, Fagaras=176, Rimnicu Vilcea=193, Pitesti=100.
    Trace A*: how does the heuristic change the expansion order?
Time: 15 min
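To verify a hand trace, the two searches can be run on the simplified graph. A sketch under the assumption that the graph is exactly the six edges listed above (adjacency dict and function names are our own):

```python
import heapq
from collections import deque

GRAPH = {
    "Arad": {"Sibiu": 140},
    "Sibiu": {"Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Bucharest": 211},
    "Rimnicu Vilcea": {"Pitesti": 97},
    "Pitesti": {"Bucharest": 101},
    "Bucharest": {},
}

def bfs(start, goal):
    # Expand in FIFO order; returns the path with the fewest edges.
    frontier = deque([[start]])
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in GRAPH[path[-1]]:
            frontier.append(path + [nxt])

def ucs(start, goal):
    # Expand cheapest-first; returns (total cost, path) with the lowest cost.
    frontier = [(0, [start])]
    while frontier:
        cost, path = heapq.heappop(frontier)
        if path[-1] == goal:
            return cost, path
        for nxt, step in GRAPH[path[-1]].items():
            heapq.heappush(frontier, (cost + step, path + [nxt]))
```

BFS returns the three-edge Fagaras route (total cost 450); UCS returns Arad → Sibiu → Rimnicu Vilcea → Pitesti → Bucharest at cost 418, illustrating that fewest hops and lowest cost can disagree.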

Adversarial search & MCTS

Recap: MinMax

An adversarial game tree

Recap: Alpha-beta pruning

Monte Carlo Tree Search (MCTS)

Instead of exhaustive traversal, MCTS uses repeated simulation:

  1. Selection: follow tree policy (e.g., UCT) from root to a leaf
  2. Expansion: add a new child node at the selected leaf
  3. Simulation: play out the game randomly to a terminal state
  4. Backpropagation: update win/playout counts up to the root
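The four phases can be sketched on a toy game: players alternately take 1 or 2 tokens from a pile, and whoever takes the last token wins. All names (`Node`, `mcts`, `rollout`, ...) are our own illustration, not a library API:

```python
import math
import random

def legal_actions(state):
    tokens, _ = state
    return [a for a in (1, 2) if a <= tokens]

def apply_action(state, action):
    tokens, player = state
    return (tokens - action, 1 - player)

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state                   # (tokens_left, player_to_move)
        self.parent, self.action = parent, action
        self.children = []
        self.untried = legal_actions(state)  # actions not yet expanded
        self.wins = 0.0                      # from the view of the player who moved here
        self.visits = 0

def uct_select(node, c=1.4):
    # Tree policy: child maximizing the UCT score.
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(state):
    # Default policy: random play to a terminal state; return the winner.
    while state[0] > 0:
        state = apply_action(state, random.choice(legal_actions(state)))
    return 1 - state[1]                      # whoever took the last token won

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCT while fully expanded
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one new child below the selected node
        if node.untried:
            a = node.untried.pop()
            node.children.append(Node(apply_action(node.state, a), node, a))
            node = node.children[-1]
        # 3. Simulation: random playout from the new node
        winner = rollout(node.state)
        # 4. Backpropagation: update counts up to the root
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.state[1]:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).action
```

From a pile of 4 tokens the winning move is to take 1 (leaving the opponent a multiple of 3); with a few thousand playouts the visit counts concentrate on that action, with no hand-crafted evaluation function involved.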

MCTS: expansion and simulation

Pruning

Your task:

  1. Draw a game search tree with 3 levels of depth (MAX at root, MIN at level 1, terminal values at level 2). Assign arbitrary terminal values.
  2. Apply alpha-beta pruning step by step, tracking \(\alpha\) and \(\beta\) at each node.
  3. Mark the pruned branches and explain why each is skipped (i.e., state the \(\alpha \geq \beta\) condition).
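A pruning trace can be checked against code. A minimal sketch, assuming the tree is given as nested lists (MAX at the root, MIN below, terminal values at the leaves); the `pruned` list of skipped leaves is our own addition for inspection:

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, pruned=None):
    if pruned is None:
        pruned = []
    if not isinstance(node, list):       # terminal value
        return node, pruned
    value = -math.inf if maximizing else math.inf
    for i, child in enumerate(node):
        v, _ = alphabeta(child, alpha, beta, not maximizing, pruned)
        if maximizing:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:                # the prune condition from step 3
            pruned.extend(node[i + 1:])  # remaining siblings are skipped
            break
    return value, pruned
```

On the classic example [[3, 12, 8], [2, 4, 6], [14, 5, 2]], the root value is 3 and the leaves 4 and 6 are pruned: after seeing 2 in the middle MIN node, beta drops to 2 while alpha is already 3, so alpha >= beta holds and the rest of that node is skipped.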

Can MinMax be beaten?

Can the MinMax algorithm be beaten by an opponent playing suboptimally?

Why or why not?

  • Discuss and reach a verdict
  • Sketch a small game tree where MAX wins against a suboptimal MIN
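As a sanity check for the verdict, a minimal sketch on a toy two-ply tree of our own choosing: the minimax value is a lower bound on MAX's outcome no matter which move MIN actually plays.

```python
TREE = [[3, 5], [2, 9]]   # MAX at root, MIN at level 1, terminals at level 2

def minimax_value(node, maximizing=True):
    if not isinstance(node, list):
        return node
    vals = [minimax_value(child, not maximizing) for child in node]
    return max(vals) if maximizing else min(vals)

guaranteed = minimax_value(TREE)   # MAX moves left: min(3, 5) = 3
# Whatever leaf MIN picks in that subtree, MAX gets at least the minimax
# value -- a suboptimal MIN (choosing 5 instead of 3) only helps MAX.
outcomes = [TREE[0][choice] for choice in (0, 1)]
```

Here an optimal MIN yields 3 and a suboptimal MIN yields 5, so every outcome satisfies outcome >= guaranteed.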

Wrap-up

Key Takeaways

Search

  • Search operates over models; planning is “in simulation”
  • Search is only as good as the models it uses
  • BFS finds shortest paths (fewest hops); UCS finds cheapest paths (lowest cost); A* adds a heuristic to guide expansion toward the goal

Adversarial search

  • MinMax assumes optimal adversarial play; against a suboptimal opponent, MAX is still guaranteed at least the minimax value
  • Alpha-Beta Pruning achieves the same result as MinMax while evaluating fewer nodes (\(\alpha \geq \beta\) prune)
  • MCTS estimates state value via simulation rather than exhaustive search – no evaluation function needed

xkcd

xkcd 1263: reassuring

Q&A

Literature

Russell, S., & Norvig, P. (2022). Artificial intelligence: A modern approach. Pearson Education.