Introduction to AI (I2AI)
Neu-Ulm University of Applied Sciences
March 12, 2025
Rational agents select actions to maximize performance across time.
Planning involves mapping action sequences from current to goal states.
Solutions for search problems are efficient action sequences reaching goal states.
Optimization differs from planning: it focuses on finding optimal states, not paths.
Agents that plan by considering a sequence of actions that form a path to a goal are called planning agents (Russel and Norvig 2022, 81).
Here only simple environments are considered (episodic, single agent, fully observable, deterministic, static, discrete, and known).
We assume that information about the environment is given (e.g., a map).
In simple environments, agents can follow a four-phase-problem-solving process (Russel and Norvig 2022, 81–82):
A search problem can be defined formally as (Russel and Norvig 2022, 83):
The formulation of a search problem, encompassing the definition of state space, initial state, goal states, transition model, and other pertinent elements, effectively constitutes the blueprint for a search tree:
The search tree is the computational manifestation of your problem formulation.
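To make this concrete, the components of a problem formulation can be written down as a small Python sketch; the class, the tiny route-finding map, and the city names used here are illustrative assumptions, not part of the lecture material.

```python
from dataclasses import dataclass, field

# Minimal sketch of a formal search problem (illustrative only, not the lecture's code).
# States are city names, actions are "drive to a neighbouring city", costs are road lengths.
@dataclass
class RouteProblem:
    initial: str                                 # initial state
    goal: str                                    # goal state (a single one, for simplicity)
    roads: dict = field(default_factory=dict)    # transition model: state -> {neighbour: cost}

    def actions(self, state):
        """Actions applicable in `state` (here: the neighbours that can be driven to)."""
        return list(self.roads.get(state, {}))

    def result(self, state, action):
        """Transition model: the state reached by doing `action` in `state`."""
        return action

    def action_cost(self, state, action, result):
        """Cost of reaching `result` from `state` via `action`."""
        return self.roads[state][action]

    def is_goal(self, state):
        return state == self.goal

# A tiny made-up map used in the sketches below.
problem = RouteProblem(initial="A", goal="C",
                       roads={"A": {"B": 75, "C": 140}, "B": {"C": 71}, "C": {}})
```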
The frontier (sometimes called the “open list” or “fringe”) is the collection of nodes that have been generated but not yet expanded or evaluated.
The frontier serves as the boundary between explored and unexplored parts of the search space. It contains all the nodes that are immediate candidates for expansion.
You might also call the frontier the “to-do list” of nodes that the search algorithm maintains.
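A minimal sketch of that idea, assuming the RouteProblem interface from the sketch above and a FIFO frontier (i.e., breadth-first expansion), both purely for illustration:

```python
from collections import deque

def breadth_first_search(problem):
    """Sketch only: the frontier is the 'to-do list' of generated but not yet expanded states."""
    frontier = deque([problem.initial])      # FIFO queue -> breadth-first behaviour
    reached = {problem.initial}              # states already generated (the explored boundary)
    while frontier:                          # the search ends when the to-do list is empty
        state = frontier.popleft()           # expand the oldest node on the frontier
        if problem.is_goal(state):
            return state
        for action in problem.actions(state):
            successor = problem.result(state, action)
            if successor not in reached:     # only genuinely new states join the frontier
                reached.add(successor)
                frontier.append(successor)
    return None                              # failure: no goal state is reachable
```

On the tiny map above this returns "C"; for brevity the sketch tracks states only instead of reconstructing the solution path.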
Search algorithms take a search problem as input and return a solution, or an indication of failure (Russel and Norvig 2022, 89).
Depending on the problem formulation, they can implement different methods:
Performance metrics guide the choice of algorithms based on problem requirements, help predict computational demands, and establish clear expectations about what an algorithm can and cannot do.
Greedy best-first search uses the evaluation function \(f(n) = h(n)\).
It expands first the node with the lowest \(h(n)\) value in the queue.
In route finding problems, the straight-line distance heuristic is used as \(h(n)\).
Greedy best-first graph search is complete in finite state spaces, but not in infinite ones.
The A* search (pronounced “A-star search”) uses the evaluation function \(f(n) = g(n) + h(n)\).
A* search is complete; whether it is cost-optimal depends on the heuristic.
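The difference between the two evaluation functions can be sketched with a priority-queue frontier; the generic best-first search below and the heuristic values (made-up straight-line distances for the tiny map above) are illustrative assumptions, not the lecture's code.

```python
import heapq

def best_first_search(problem, f):
    """Generic best-first search: always expand the frontier node with the lowest f-value."""
    frontier = [(f(problem.initial, 0), 0, problem.initial, [problem.initial])]
    best_g = {problem.initial: 0}                 # cheapest known path cost g(n) per state
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if problem.is_goal(state):
            return path
        for action in problem.actions(state):
            successor = problem.result(state, action)
            g2 = g + problem.action_cost(state, action, successor)
            if g2 < best_g.get(successor, float("inf")):
                best_g[successor] = g2
                heapq.heappush(frontier, (f(successor, g2), g2, successor, path + [successor]))
    return None

# Greedy best-first search: f(n) = h(n); A* search: f(n) = g(n) + h(n).
h = {"A": 100, "B": 60, "C": 0}          # made-up straight-line distances to the goal C
greedy = lambda state, g: h[state]       # ignores the path cost accumulated so far
a_star = lambda state, g: g + h[state]   # path cost so far plus heuristic estimate
# best_first_search(problem, a_star)  -> ['A', 'C'] on the tiny map above (cost 140 < 75 + 71)
```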
The agents considered so far use offline search algorithms: they compute a complete solution before taking their first action. Such algorithms are not helpful in unknown environments.
Online search agents interleave computation and action:
These agents can discover successors only for a state that they currently occupy or have already learned (i.e., one contained in a map created online).
Online search is a good idea in dynamic or semi-dynamic environments.
In competitive environments, two or more agents have conflicting goals, which gives rise to adversarial search problems.
The algorithm described in this section is applicable for all games with the following characteristics:
two players, fully observable, deterministic, zero-sum
In such games, the state is easy to represent, agents are restricted to a small number of actions, and the effects of actions are defined by precise rules (Russel and Norvig 2022, 192).
Given a state-space search tree, each node’s MinMax value is calculated.
The MinMax value is the best achievable utility of being in a given state (against a rational adversary).
MAX prefers to move to a state of maximum value.
MIN prefers to move to a state of minimum value.
The MinMax algorithm performs a complete depth-first exploration of the game tree (Russel and Norvig 2022, 196–96).
At Min nodes, MinMax will choose in each state the action that leads to the successor of minimum utility, since minimum utility for Max is maximum utility for Min. Therefore, each node at level \(L-1\) is assigned the minimum utility of its successor nodes.
Max then knows its next action: the one that leads to the successor with the maximum utility value.
Alpha-Beta pruning is an improvement of the MinMax algorithm in the sense that it reduces the number of nodes that have to be generated and evaluated, by applying a simple logic (Russel and Norvig 2022, 198–99). The process is as follows:
Max nodes store the maximum value of the underlying subtree traversed so far in the variable \(\alpha\) (alpha).
Min nodes store the minimum value of the underlying subtree traversed so far in the variable \(\beta\) (beta).
The concept of a limited planning horizon, as applied in the MinMax algorithm, is one way to handle complex problems that cannot be planned through to the goal state in one step.
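A minimal sketch of the MinMax computation with alpha-beta pruning described above; representing the game tree as nested lists of terminal utilities is an assumption made purely for illustration.

```python
import math

def alpha_beta_value(node, maximizing, alpha=-math.inf, beta=math.inf):
    """MinMax value of `node` with alpha-beta pruning.

    `node` is either a utility value (a terminal state) or a list of successor nodes.
    `alpha` is the best value found so far for Max, `beta` the best value so far for Min.
    """
    if not isinstance(node, list):          # terminal state: return its utility
        return node
    if maximizing:                          # Max prefers the successor of maximum value
        value = -math.inf
        for child in node:
            value = max(value, alpha_beta_value(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:               # Min will never allow this subtree: prune it
                break
        return value
    else:                                   # Min prefers the successor of minimum value
        value = math.inf
        for child in node:
            value = min(value, alpha_beta_value(child, True, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:               # Max will never allow this subtree: prune it
                break
        return value

# Three-level example: Max to move at the root, three Min nodes below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alpha_beta_value(tree, maximizing=True))  # -> 3 (parts of the second subtree are pruned)
```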
Another concept for handling such complex problems is to construct a tree down to a final state in such a way that in each state only promising actions and their corresponding successors are generated. This concept is applied by Monte Carlo tree search.
Monte Carlo tree search (MCTS) estimates the value of a state as the average utility over a number of simulations of complete games starting from the current state (Russel and Norvig 2022, 207–9).
Each iteration follows four steps: selection, expansion, simulation, and back-propagation.
When the computational budget is reached, MCTS returns the best action \(a\) for the given root node.
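The four steps can be sketched on a toy game; the game (a simple Nim variant: take 1 or 2 stones, whoever takes the last stone wins), the node layout, and the exploration constant are all illustrative assumptions, not the lecture's implementation.

```python
import math
import random

class Node:
    """One node of the MCTS tree: a game state plus win/playout statistics."""
    def __init__(self, stones, player, parent=None):
        self.stones, self.player = stones, player    # stones left, player to move (0 or 1)
        self.parent, self.children = parent, []
        self.wins, self.playouts = 0, 0

    def moves(self):
        return [m for m in (1, 2) if m <= self.stones]

def ucb1(node, c=math.sqrt(2)):
    """Selection policy: exploitation term plus exploration term (UCB1)."""
    return (node.wins / node.playouts
            + c * math.sqrt(math.log(node.parent.playouts) / node.playouts))

def select(node):
    """1. Selection: descend via UCB1 while the node is fully expanded."""
    while node.children and len(node.children) == len(node.moves()):
        node = max(node.children, key=ucb1)
    return node

def expand(node):
    """2. Expansion: add one new child (its statistics start at 0/0)."""
    if not node.moves():                              # terminal state: nothing to expand
        return node
    tried = {child.stones for child in node.children}
    move = random.choice([m for m in node.moves() if node.stones - m not in tried])
    child = Node(node.stones - move, 1 - node.player, parent=node)
    node.children.append(child)
    return child

def simulate(node):
    """3. Simulation: play out the game randomly and return the winner (0 or 1)."""
    stones, player = node.stones, node.player
    while stones > 0:
        stones -= random.choice([m for m in (1, 2) if m <= stones])
        player = 1 - player
    return 1 - player                                 # the player who took the last stone

def backpropagate(node, winner):
    """4. Back-propagation: update playouts (and wins) on the path up to the root."""
    while node is not None:
        node.playouts += 1
        if winner != node.player:                     # a win for the player who moved into this node
            node.wins += 1
        node = node.parent

def mcts(stones, player=0, budget=2000):
    root = Node(stones, player)
    for _ in range(budget):                           # repeat until the computational budget is spent
        leaf = expand(select(root))
        backpropagate(leaf, simulate(leaf))
    best = max(root.children, key=lambda child: child.playouts)
    return root.stones - best.stones                  # number of stones the best action takes

print(mcts(stones=7))   # taking 1 stone is the winning move here, and MCTS should usually find it
```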
Figure 6 shows a tree with the root representing a state where P-A has won 37/100 playouts done so far.
P-A has just moved to the root node.
P-B selects a move to a node where it has won 60/79 playouts; this is the best win percentage among the available moves.
P-A will select a move to a node where it has won 16/53 playouts (assuming it plays optimally).
P-B then continues on the leaf node marked 27/35.
Upper confidence bounds applied to trees (UCT) is a very effective selection policy that ranks possible moves based on an upper confidence bound formula (UCB1):
\[ \textrm{UCB1}(n) = \frac{\textrm{U}(n)}{\textrm{N}(n)} + C \cdot \sqrt{\frac{\log{\textrm{N}(\textrm{PARENT}(n))}}{\textrm{N}(n)}} \]
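Plugging in the numbers from Figure 6 for the node that has won 60 of 79 playouts (its parent, the root, has 100 playouts) gives the following; \(C = \sqrt{2}\) is an assumed choice of the exploration constant.

```python
import math

# UCB1 for the Figure 6 node with 60 wins out of 79 playouts; its parent (the root)
# has been visited 100 times. C = sqrt(2) is an assumed exploration constant.
U, N, N_parent, C = 60, 79, 100, math.sqrt(2)
ucb1 = U / N + C * math.sqrt(math.log(N_parent) / N)
print(round(ucb1, 3))   # exploitation 60/79 ≈ 0.759, exploration ≈ 0.341, UCB1 ≈ 1.101
```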
Figure 7 shows a tree where a new child of the selected node is generated and marked with 0/0 (expansion).
A playout for the newly generated child node is performed (simulation).
The result of the simulation is used to update all the search tree nodes going up to the root.
P-B's nodes are incremented in both the number of wins and the number of playouts.
P-A's nodes are incremented in the number of playouts only.
Search operates over models of the world (which might be observed online).
Problem formulation must follow goal formulation.
Discuss that statement. Is it true or false? Why?
Give a complete problem formulation for each of the following problems. Choose a formulation that is precise enough to be implemented.
Your goal is to navigate a robot out of a maze. It starts in the center of the maze facing north. You can turn the robot to face north, east, south, or west, and you can direct the robot to move forward a certain distance (it will stop before hitting a wall).
Explain in your own words the following terms:
Explain if the MINIMAX algorithm is complete and optimal.
Can it be beaten by an opponent playing suboptimally? Why (not)?
Come up with a game tree in which MAX will beat a suboptimal MIN.
Read the note about pruning (and consult Russel and Norvig (2022) if necessary).
Explain in your own words, under what conditions a subtree is skipped using Alpha-beta pruning.
Draw an example (game search tree, 3 levels depth).
An atomic representation is one in which each state is treated as a black box with no internal structure, meaning the state either does or does not match what you’re looking for.
Problem formulation is specifically about defining the formal components of a search problem (state space, initial state, goal state, actions, etc.) — it’s about translating a real-world problem into the specific mathematical structure needed for search algorithms.
A state is a situation that an agent can find itself in.
Expressed by a graph whose nodes are the set of all states, and whose links are actions that transform one state into another
Can be a single goal state, a small set of alternative goal states, or a property that applies to many states (e.g., no dirt in any location)
First-in-first-out (FIFO) queue first pops the node that was added to the queue first
Last-in-first-out queue (LIFO; also known as a stack) pops first the most recently added node
Priority queue first pops the node with the minimum cost according to \(f(n)\)
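A minimal illustration of the three queue disciplines in Python; the node names and \(f\)-values are made up.

```python
from collections import deque
import heapq

# FIFO queue: the node added first is popped first (breadth-first search).
fifo = deque(["n1", "n2", "n3"])
fifo.popleft()                 # -> 'n1'

# LIFO queue (stack): the most recently added node is popped first (depth-first search).
lifo = ["n1", "n2", "n3"]
lifo.pop()                     # -> 'n3'

# Priority queue: the node with the minimum f(n) is popped first (best-first search).
priority = [(7, "n1"), (3, "n2"), (5, "n3")]   # entries are (f(n), node)
heapq.heapify(priority)
heapq.heappop(priority)        # -> (3, 'n2')
```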
The average utility is guided by the selection policy.
For games with binary outcomes, average utility equals win percentage
A playout policy biases the moves toward good ones. For Go and other games, playout policies have been successfully learned from self-play using neural networks. Sometimes game-specific heuristics are also used (e.g., take the corner square in Othello).