🧠 Introduction to AI
Neu-Ulm University of Applied Sciences
February 12, 2024
Agents that plan ahead by considering a sequence of actions that form a path to a goal state are called problem-solving agents (Russel and Norvig 2022, 81).
Here only simple environments are considered (episodic, single agent, fully observable, deterministic, static, discrete, and known).
We assume that information about the environment is given (e.g., a map)
In simple environments, agents can follow a four-phase problem-solving process (Russel and Norvig 2022, 81–82):
A search problem can be defined formally as (Russel and Norvig 2022, 83):
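The components of that formal definition (a state space, an initial state, goal states, actions, a transition model, and an action cost function) can be sketched in Python roughly as follows; the route-finding map and all names below are invented for illustration and are not from the lecture or the textbook:

```python
# A minimal sketch of a search problem, assuming a hypothetical route-finding task.
class RouteProblem:
    """States: cities. Actions: drive to a neighboring city."""

    def __init__(self, initial, goal, road_map):
        self.initial = initial        # initial state
        self.goal = goal              # goal state (could also be a set)
        self.road_map = road_map      # {city: {neighbor: distance}}

    def actions(self, state):
        """Actions applicable in `state`."""
        return list(self.road_map[state])

    def result(self, state, action):
        """Transition model RESULT(s, a): driving to a city puts the agent there."""
        return action

    def is_goal(self, state):
        """Goal test."""
        return state == self.goal

    def action_cost(self, state, action, new_state):
        """Numeric cost of applying `action` in `state`."""
        return self.road_map[state][new_state]

# Example instance with made-up distances
problem = RouteProblem("A", "C", {"A": {"B": 4}, "B": {"A": 4, "C": 2}, "C": {"B": 2}})
```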
Search algorithms take a search problem as input and return a solution, or an indication of failure (Russel and Norvig 2022, 89).
They can implement
The agents considered so far use offline search algorithms: they compute a complete solution before taking their first action. They are not helpful in unknown environments.
Online search agents interleave computation and action:
These agents can discover successors only for a state that they occupy or that they have learned (i.e., that is contained in a map created online)
Online search is a good idea in dynamic or semi-dynamic environments.
In competitive environments, two or more agents have conflicting goals.
This gives rise to adversarial search problems.
The AI community is particularly interested in games of a simplified nature (e.g., chess, Go, and poker), as in such games
The games most commonly studied within AI are deterministic, two-player, turn-taking, fully observable, and zero-sum (Russel and Norvig 2022, 192).
Possible formalization
Players (here MIN and MAX, who alternate turns) need to have a conditional plan—a contingent strategy specifying a response to each move of the opponent.
Given a state-space search tree, each node’s minimax value is calculated.
The minimax value is the best achievable utility of being in a given state (against a rational adversary).
MAX prefers to move to a state of maximum value.
MIN prefers to move to a state of minimum value.
The minimax algorithm performs a complete depth-first exploration of the game tree (Russel and Norvig 2022, 196).
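As a rough sketch (assuming a hypothetical game object with is_terminal, utility, actions, and result methods; none of these names come from the lecture), the depth-first exploration looks like this:

```python
# Minimal minimax sketch: MAX maximizes, MIN minimizes, explored depth-first.
def minimax_value(game, state, maximizing):
    """Return the minimax value of `state` from MAX's perspective."""
    if game.is_terminal(state):
        return game.utility(state)
    values = (
        minimax_value(game, game.result(state, a), not maximizing)
        for a in game.actions(state)
    )
    return max(values) if maximizing else min(values)

def minimax_decision(game, state):
    """MAX moves to the successor state with the highest minimax value."""
    return max(
        game.actions(state),
        key=lambda a: minimax_value(game, game.result(state, a), maximizing=False),
    )
```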
The exponential complexity makes the minimax algorithm impractical for complex games (even with alpha-beta pruning applied; the chess game tree is larger than the number of atoms in the universe).
What is pruning?
Read the note about pruning (and consult Russel and Norvig (2022) or the internet if necessary).
Explain in your own words under what conditions a subtree is skipped when using alpha-beta pruning.
Draw an example (game search tree, 3 levels deep).
Monte Carlo tree search (MCTS) estimates the value of a state as the average utility over a number of simulations of complete games starting from the current state (Russel and Norvig 2022, 207–9).
Each iteration follows four steps:
selection (starting at the root, a move is chosen, guided by the selection policy, and this is repeated down the tree to a leaf)
expansion (the search tree grows by generating a new child of the selected node)
simulation (a playout is performed from the new node to a terminal state, following the playout policy)
back-propagation (the result of the simulation is used to update all the search tree nodes going up to the root)
This repeats for a set number of iterations or until a given time runs out. At the end, the successor with the highest utility is chosen.
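A condensed Python skeleton of this loop, assuming a hypothetical game interface (actions, result, to_move, is_terminal, winner) and ignoring draws for brevity; this is an illustrative sketch, not the book's code:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None, mover=None):
        self.state = state
        self.parent = parent
        self.mover = mover      # player who moved into this state
        self.children = []
        self.playouts = 0       # N(n)
        self.wins = 0           # U(n), from the mover's perspective

def ucb1(n, c=1.4):
    """UCB1 selection value; unvisited nodes are tried first."""
    if n.playouts == 0:
        return float("inf")
    return n.wins / n.playouts + c * math.sqrt(
        math.log(n.parent.playouts) / n.playouts)

def mcts(game, root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: descend the tree guided by UCB1
        node = root
        while node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: generate the children of the selected node
        if not game.is_terminal(node.state):
            mover = game.to_move(node.state)
            node.children = [Node(game.result(node.state, a), node, mover)
                             for a in game.actions(node.state)]
            node = random.choice(node.children)
        # 3. Simulation: random playout from the new node to a terminal state
        state = node.state
        while not game.is_terminal(state):
            state = game.result(state, random.choice(game.actions(state)))
        winner = game.winner(state)
        # 4. Back-propagation: update counts up to the root; a node's wins are
        #    counted from the perspective of the player who moved into it
        while node is not None:
            node.playouts += 1
            if node.mover == winner:
                node.wins += 1
            node = node.parent
    # choose the successor with the highest average utility (win rate)
    return max(root.children, key=lambda n: n.wins / n.playouts).state
```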
To what generic function of agents does MCTS relate?
It relates to the utility function, which measures the agent's preferences among states of the world.
What should the selection process look like?
Figure 5 shows a tree with the root representing a state where P-A has won 37/100 playouts done.
P-A has just moved to the root node.
P-B selects a move to a node where it has won 60/79 playouts; this is the best win percentage among the available moves.
P-A will select a move to a node where it has won 16/53 playouts (assuming it plays optimally).
P-B then continues on to the leaf node marked 27/35.
Upper confidence bounds applied to trees (UCT) is a very effective selection policy, ranking possible moves based on an upper confidence bound formula (UCB1):
\[ \textrm{UCB1}(n) = \frac{\textrm{U}(n)}{\textrm{N}(n)} + C \cdot \sqrt{\frac{\log{\textrm{N}(\textrm{PARENT}(n))}}{\textrm{N}(n)}} \]
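For illustration, plugging the 60/79 node from Figure 5 into the formula, with a 100-playout parent, the natural logarithm, and the commonly used C = 1.4 (about \(\sqrt{2}\)):

```python
import math

U, N, N_parent, C = 60, 79, 100, 1.4   # illustrative numbers from Figure 5
ucb1 = U / N + C * math.sqrt(math.log(N_parent) / N)
print(round(ucb1, 2))                  # ≈ 1.1
```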
Figure 6 shows a tree where a new child of the selected node is generated and marked with 0/0 (expansion).
The result of the simulation is used to update all the search tree nodes going up to the root.
P-B's nodes are incremented in both the number of wins and the number of playouts.
P-A's nodes are incremented in the number of playouts only.
Search operates over models of the world (which might be observed online)
Problem formulation must follow goal formulation.
Discuss that statement. Is it true or false? Why?
Give a complete problem formulation for each of the following problems. Choose a formulation that is precise enough to be implemented.
Your goal is to navigate a robot out of a maze. It starts in the center of the maze facing north. You can turn the robot to face north, east, south, or west, or direct the robot to move forward a certain distance (it will stop before hitting a wall).
Explain in your own words the following terms:
Explain if the MINIMAX algorithm is complete and optimal.
Can it be beaten by an opponent playing suboptimally? Why (not)?
Come up with a game tree in which MAX will beat a suboptimal MIN.
Read the note about pruning (and consult Russel and Norvig (2022) if necessary).
Explain in your own words under what conditions a subtree is skipped when using alpha-beta pruning.
Draw an example (game search tree, 3 levels deep).
An atomic representation is one in which each state is treated as a black box with no internal structure, meaning the state either does or does not match what you’re looking for.
A state is a situation that an agent can find itself in.
Expressed by a graph whose nodes are the set of all states, and whose links are actions that transform one state into another
Can be a single goal state, a small set of alternative goal states, or a property that applies to many states (e.g., no dirt in any location)
first-in-first-out queue first pops the node that was added to the queue first
last-in-first-out queue (also known as a stack) pops first the most recently added node
A priority queue first pops the node with the minimum cost according to \(f(n)\)
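A small illustration of the three queue behaviors using the Python standard library (node names and costs are made up):

```python
from collections import deque
import heapq

fifo = deque(["A", "B", "C"])
fifo.append("D")
fifo.popleft()                        # -> "A" (added first, popped first)

lifo = ["A", "B", "C"]
lifo.append("D")
lifo.pop()                            # -> "D" (most recently added, popped first)

priority = []
heapq.heappush(priority, (3, "A"))    # entries are (f(n), node)
heapq.heappush(priority, (1, "B"))
heapq.heappop(priority)               # -> (1, "B") (minimum f(n) popped first)
```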
Zero-sum means that what is good for one player is just as bad for the other: there is no “win-win” outcome
States where the game has ended are called terminal states
The search tree consists of AND nodes and OR nodes. AND nodes reflect the environment’s choice of action (which all need to be considered), OR nodes the agent’s own choices in each state (where only one needs to be considered). AND and OR nodes alternate. A solution is a conditional plan that considers every nondeterministic outcome and makes a plan for each one.
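A condensed sketch of such an AND-OR search in Python, assuming a hypothetical nondeterministic problem with actions(s), results(s, a) returning a set of outcome states, is_goal(s), and initial; failure is represented as None:

```python
def and_or_search(problem):
    return or_search(problem, problem.initial, path=[])

def or_search(problem, state, path):
    """OR node: the agent needs only one action that leads to a plan."""
    if problem.is_goal(state):
        return []                      # empty plan: already at the goal
    if state in path:
        return None                    # avoid cycles
    for action in problem.actions(state):
        subplan = and_search(problem, problem.results(state, action), path + [state])
        if subplan is not None:
            return [action, subplan]   # do `action`, then follow the conditional subplan
    return None

def and_search(problem, states, path):
    """AND node: every nondeterministic outcome needs its own plan."""
    plan = {}
    for s in states:
        subplan = or_search(problem, s, path)
        if subplan is None:
            return None
        plan[s] = subplan              # conditional plan: if outcome s, follow subplan
    return plan
```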
guided by the selection policy
For games with binary outcomes, average utility equals win percentage
A heuristic evaluation function returns an estimate of the expected utility of state \(s\) to player \(p\).
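For instance, a simple heuristic evaluation for chess could be a weighted material count (a sketch with conventional piece values; the piece-count dictionaries are assumed inputs):

```python
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(my_counts, opponent_counts):
    """Weighted material difference as an estimate of the expected utility of s for p."""
    return sum(
        value * (my_counts.get(piece, 0) - opponent_counts.get(piece, 0))
        for piece, value in PIECE_VALUES.items()
    )
```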
Playout policies bias the moves toward good ones. For Go and other games, playout policies have been successfully learned from self-play by using neural networks. Sometimes game-specific heuristics are also used (e.g., take the corner square in Othello)