Search

🧠 Introduction to AI

Andy Weeger

Neu-Ulm University of Applied Sciences

February 12, 2024

Examples

Agents

Agents that plan ahead by considering a sequence of actions that form a path to a goal state are called problem-solving agents (Russel and Norvig 2022, 81)

  • The computational process they undertake is called search
  • The representations the agents use are atomic representations1
  • There are search algorithms for several environments

Here only simple environments are considered (episodic, single agent, fully observable, deterministic, static, discrete, and known).

We assume that information about the environment is given (e.g., a map).

Process

In simple environments, agents can follow a four-phase problem-solving process (Russel and Norvig 2022, 81–82):

  1. Goal formulation: goals organize behavior by limiting the objectives and hence the actions to be considered
  2. Problem formulation: the agent devises a description of the states and actions necessary to reach the goal—an abstract model of the relevant part of the environment
  3. Search: the agent simulates sequences of actions in its model, searching until it finds a sequence that reaches the goal (i.e., the solution)
  4. Execution: the agent executes the actions in the solution, one at a time

Search problem

Definition

A search problem can be defined formally as (Russel and Norvig 2022, 83):

  • The state space: a set of possible states2 the environment can be in3
  • The initial state: the state that the agent starts in
  • The goal state(s): a state that the agent is trying to reach4
  • A transition model: \(\textrm{RESULT}(s,a)\) (returns the state \(s'\) that results from performing action \(a\) in state \(s\))
  • The successor function: \(\textrm{ACTIONS}(s)\) (returns the set of (action, state) pairs for state \(s\), where each state is the one reachable from \(s\) by taking the corresponding action)
  • An action cost function: \(\textrm{ACTION-COST}(s,a,s')\) (gives the numeric cost of performing action \(a\) in state \(s\) to reach state \(s'\))
  • A path: a sequence of actions
  • A solution: a path from the initial state to a goal state (see the sketch below)
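
To make this definition concrete, below is a minimal Python sketch of a route-finding problem on part of the map in Figure 1. The class name RouteProblem, its method names, and the reduced road map are illustrative choices for this example, not an API from Russel and Norvig (2022).

```python
# A minimal sketch of a search problem: route finding on part of the Romania map.
# Only a subset of cities is included; class and method names are illustrative.

ROADS = {  # state -> {neighboring state: road distance}
    "Arad":           {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Zerind":         {"Arad": 75, "Oradea": 71},
    "Oradea":         {"Zerind": 71, "Sibiu": 151},
    "Timisoara":      {"Arad": 118},
    "Sibiu":          {"Arad": 140, "Oradea": 151, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras":        {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti":        {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Bucharest":      {"Fagaras": 211, "Pitesti": 101},
}

class RouteProblem:
    """States are cities; an action is 'drive to a neighboring city'."""

    def __init__(self, initial, goal, roads=ROADS):
        self.initial = initial              # the initial state
        self.goal = goal                    # the goal state
        self.roads = roads                  # implicit state-space graph

    def actions(self, state):               # ACTIONS(s)
        return list(self.roads[state])

    def result(self, state, action):        # RESULT(s, a)
        return action                       # driving to a city puts us in that city

    def action_cost(self, state, action, new_state):   # ACTION-COST(s, a, s')
        return self.roads[state][new_state]

    def is_goal(self, state):
        return state == self.goal

problem = RouteProblem("Arad", "Bucharest")
```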

Modelling

Figure 1: Example: A simplified map of Romania, with road distances in miles; based on Russel and Norvig (2022)

Search tree

Structure

Figure 2: Example: A partial search tree for finding a route from Arad to Bucharest; based on Russel and Norvig (2022)

Search algorithms

Definition

Search algorithms take a search problem as input and return a solution, or an indication of failure (Russel and Norvig 2022, 89).

  • They superimpose a search tree over the state-space graph,
  • form various paths from the initial state, and
  • try to find a path that reaches a goal state

They can implement

  • uninformed search methods, which only have access to the problem definition but no clue about how close a state is to the goal(s).
  • informed search methods, which have access to a heuristic function that gives domain-specific hints about how close a state is to the goal(s) (e.g., using straight-line distance in route-finding problems)
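
As an illustration of an uninformed method, here is a breadth-first search over the RouteProblem sketch from above: it has no information about how close a city is to Bucharest and simply expands states in FIFO order, so it returns the solution with the fewest actions (not necessarily the cheapest route).

```python
from collections import deque

def breadth_first_search(problem):
    """Uninformed search: expand the shallowest unexpanded state first (FIFO frontier)."""
    if problem.is_goal(problem.initial):
        return []
    frontier = deque([(problem.initial, [])])    # (state, path of actions so far)
    reached = {problem.initial}
    while frontier:
        state, path = frontier.popleft()
        for action in problem.actions(state):
            child = problem.result(state, action)
            if problem.is_goal(child):
                return path + [action]           # solution: a sequence of actions
            if child not in reached:
                reached.add(child)
                frontier.append((child, path + [action]))
    return None                                  # failure: no path to a goal state

print(breadth_first_search(RouteProblem("Arad", "Bucharest")))
# ['Sibiu', 'Fagaras', 'Bucharest'] -- fewest road segments, not the shortest distance
```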

Competitive environments

Tree search in action

Deterministic games

The games most commonly studied within AI are deterministic (two-player, turn-taking, fully observable, zero-sum8) (Russel and Norvig 2022, 192).

Possible formalization

  • States: \(S\) (start at \(S_0\))
  • Player: \(\textrm{TO-MOVE}(s)\) (defines which player has the move in state \(s\))
  • Actions: \(\textrm{ACTIONS}(s)\) (the set of legal moves in state \(s\))
  • Transition model: \(\textrm{RESULT}(s,a)\) (defines the state resulting from action \(a\) in state \(s\))
  • Terminal test: \(\textrm{IS-TERMINAL}(s)\) (is true when the game is over9)
  • Utility function: \(\textrm{UTILITY}(s,p)\) (defines the final numeric value to player \(p\) when the game ends in terminal state \(s\))
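
A minimal Python interface mirroring this formalization could look as follows (an illustrative sketch, not a standard library); concrete games such as tic-tac-toe would subclass it, and the algorithm sketches further below are written against it.

```python
class Game:
    """Abstract two-player, turn-taking, zero-sum game; method names mirror the formalization."""

    def to_move(self, state):            # TO-MOVE(s): which player has the move in s?
        raise NotImplementedError

    def actions(self, state):            # ACTIONS(s): the set of legal moves in s
        raise NotImplementedError

    def result(self, state, action):     # RESULT(s, a): the state after playing a in s
        raise NotImplementedError

    def is_terminal(self, state):        # IS-TERMINAL(s): is the game over?
        raise NotImplementedError

    def utility(self, state, player):    # UTILITY(s, p): final payoff for p in terminal s
        raise NotImplementedError
```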

Optimal decisions

Players (here MIN and MAX, who alternate turns) need a conditional plan—a contingent strategy specifying a response to each possible move of the opponent.

  • For games with binary outcomes (win or lose), an AND-OR10 search tree can be used
    (see Russel and Norvig 2022, 143)
  • For games with multiple outcome scores, the minimax search algorithm is used

Minimax value

Given a state-space search tree, each node’s minimax value is calculated.

The minimax value is the best achievable utility of being in a given state (against a rational adversary).

  • MAX prefers to move to a state of maximum value
  • MIN prefers to move to a state of minimum value
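
Formally, the minimax value of a state can be written recursively, using the game formalization from above (Russel and Norvig 2022):

\[
\textrm{MINIMAX}(s) =
\begin{cases}
\textrm{UTILITY}(s, \textrm{MAX}) & \text{if } \textrm{IS-TERMINAL}(s) \\
\max_{a \in \textrm{ACTIONS}(s)} \textrm{MINIMAX}(\textrm{RESULT}(s, a)) & \text{if } \textrm{TO-MOVE}(s) = \textrm{MAX} \\
\min_{a \in \textrm{ACTIONS}(s)} \textrm{MINIMAX}(\textrm{RESULT}(s, a)) & \text{if } \textrm{TO-MOVE}(s) = \textrm{MIN}
\end{cases}
\]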

Adversarial game tree

Figure 4: Example: An adversarial game tree

Minimax search algorithm

The minimax algorithm performs a complete depth-first exploration of the game tree (Russel and Norvig 2022, 196).

  • Assumes that the adversary plays optimally
  • Returns the action whose terminal state has the optimal \(\textrm{MINIMAX}\) value
    • If the state is a terminal state (\(\textrm{IS-TERMINAL}(s) = true\)):
      return the state’s utility (\(\textrm{UTILITY}(s,p)\))
    • If the next agent is \(\textrm{MAX}\) (\(\textrm{TO-MOVE}(s) = MAX\)): return \(\textrm{MAX-VALUE}(s)\)
    • If the next agent is \(\textrm{MIN}\) (\(\textrm{TO-MOVE}(s) = MIN\)): return \(\textrm{MIN-VALUE}(s)\) (see the Python sketch below)
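
A minimal Python sketch of this recursion, assuming the abstract Game interface from above and strictly alternating turns; it returns the action whose resulting state has the best minimax value for the player to move (MAX at the root).

```python
def minimax_search(game, state):
    """Depth-first minimax: return the optimal action for the player to move in `state`."""
    player = game.to_move(state)        # treated as MAX; the opponent is MIN

    def max_value(s):
        if game.is_terminal(s):
            return game.utility(s, player)
        return max(min_value(game.result(s, a)) for a in game.actions(s))

    def min_value(s):
        if game.is_terminal(s):
            return game.utility(s, player)
        return min(max_value(game.result(s, a)) for a in game.actions(s))

    # Pick the move leading to the state with the highest minimax value.
    return max(game.actions(state), key=lambda a: min_value(game.result(state, a)))
```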

The exponential complexity makes the minimax algorithm impractical for complex games (even with alpha-beta pruning applied; the chess game tree has more nodes than there are atoms in the universe).

Pruning

What is pruning?

Read the note about pruning (and consult Russel and Norvig (2022) or the internet if necessary).

Explain in your own words, under what conditions a subtree is skipped using Alpha-beta pruning.

Draw an example (game search tree, 3 levels depth).

Function

To what generic function of agents does MCTS relate?

It relates to the utility function, which measures the agent’s preferences among states of the world.

Function

What should the selection process look like?

Selection

Figure 5: Example: MCTS — selection

Figure 5 shows a tree whose root represents a state in which P-A has won 37 of the 100 playouts performed so far.

  • P-A has just moved to the root node
  • P-B selects a move to a node where it has won 60/79 playouts; this is the best win percentage among the available moves
  • P-A will select a move to a node where it has won 16/53 playouts (assuming it plays optimally)
  • P-B then continues on to the leaf node marked 27/35
  • … until a terminal state is reached

Selection policy example

Upper confidence bounds applied to trees (UCT) is a very effective selection policy; it ranks possible moves based on an upper confidence bound formula (UCB1):

\[ \textrm{UCB1}(n) = \frac{\textrm{U}(n)}{\textrm{N}(n)} + C * \sqrt{\frac{\log{\textrm{N}(\textrm{PARENT}(n))}}{\textrm{N}(n)}} \]

  • \(\textrm{U}(n)\) is the total utility of all playouts that went through node \(n\)
  • \(\textrm{N}(n)\) is the number of playouts through node \(n\)
  • \(\textrm{PARENT}(n)\) is the parent node of \(n\) in the tree
  • \(\frac{\textrm{U}(n)}{\textrm{N}(n)}\) is the average utility of \(n\) (exploitation term, “how good are the stats?”)
  • \(\frac{\log{\textrm{N}(\textrm{PARENT}(n))}}{\textrm{N}(n)}\) is higher for nodes \(n\) that have been explored only a few times
    (exploration term, “how much has the child been ‘ignored’?”)
  • \(C\) is a constant that balances exploitation and exploration (theoretically \(\sqrt{2}\); see the sketch below)
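
Translated directly into Python and applied to the 60/79 node from Figure 5 (whose parent, the root, has 100 playouts); the constant c = 1.4 used here is a common practical choice close to the theoretical \(\sqrt{2}\).

```python
import math

def ucb1(utility, n, n_parent, c=1.4):
    """UCB1(n) = U(n)/N(n) + C * sqrt(log(N(PARENT(n))) / N(n))."""
    if n == 0:
        return math.inf                                    # unexplored children go first
    exploitation = utility / n                             # average utility so far
    exploration = c * math.sqrt(math.log(n_parent) / n)    # large for rarely visited nodes
    return exploitation + exploration

# The node where P-B has won 60 of 79 playouts, below the root with 100 playouts:
print(ucb1(60, 79, 100))   # ≈ 1.10
```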

Expansion and simulation

Figure 6: Example: MCTS — expansion and simulation

Figure 6 shows a tree where a new child of the selected node is generated and marked with 0/0 (expansion).

A playout for the newly generated child node is performed (simulation).

  • Moves for both players are chosen according to the playout policy15
  • The moves are not recorded in the search tree
  • In Figure 6, the simulation results in a win for P-B
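
A sketch of the simulation phase under a purely random playout policy, again assuming the Game interface from above; real systems use stronger playout policies (see the footnote), but the structure is the same.

```python
import random

def simulate(game, state):
    """Simulation: play the game out from `state` using a random playout policy.

    The moves chosen here are not added to the search tree; only the terminal
    state is returned so that its utility can be back-propagated afterwards.
    """
    while not game.is_terminal(state):
        action = random.choice(list(game.actions(state)))  # playout policy: uniform random
        state = game.result(state, action)
    return state
```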

Back-propagation

Figure 7: Example: MCTS — back-propagation

The result of the simulation is used to update all the search tree nodes going up to the root.

  • P-B's nodes are incremented in both the number of wins and the number of playouts
  • P-A's nodes are incremented in the number of playouts only
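
A sketch of back-propagation matching Figure 7, assuming a simple Node class for the search tree and the Game interface from above. Each node stores its statistics from the perspective of the player who moved into it, so a win for P-B increases the utility counts only on P-B's nodes, while every node on the path gets one more playout.

```python
class Node:
    """One node of the Monte Carlo search tree."""

    def __init__(self, state, parent=None, player=None):
        self.state = state
        self.parent = parent
        self.player = player      # the player who just moved into this state
        self.children = []        # expanded children
        self.utility = 0.0        # U(n): total utility of playouts through this node
        self.playouts = 0         # N(n): number of playouts through this node

def backpropagate(game, node, terminal_state):
    """Back-propagation: update the statistics on the path from `node` up to the root."""
    while node is not None:
        node.playouts += 1
        if node.player is not None:
            # Credit the result from the view of the player who chose this node
            # (with win = 1 and loss = 0, U(n)/N(n) is exactly the win percentage).
            node.utility += game.utility(terminal_state, node.player)
        node = node.parent
```

One MCTS iteration thus chains the four phases: select a promising leaf (e.g., via UCB1), expand one new child, simulate a playout from it, and back-propagate the result; when the playout budget is exhausted, the move to the most-explored child is typically played.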

Wrap-up

Search

Search operates over models of the world (which might be observed online)

  • Agents do not try all possible plans
  • Planning is all “in simulation”
  • Search is only as good as the models are

Adversarial search

  • In two-player, discrete, deterministic, turn-taking zero-sum games with perfect information, the minimax algorithm can select optimal moves by a depth-first search in the game tree
  • Efficiency can be improved by using the alpha-beta search algorithm, which eliminates subtrees that are shown to be irrelevant.
  • Monte Carlo tree search evaluates states by playing out the game all the way to the end to see who won. This playout simulation is repeated multiple times. The evaluation is an average of the results.

xkcd

Figure 8: xkcd 1263: reassuring

✏️ Exercises

Definition sequence

Problem formulation must follow goal formulation.

Discuss that statement. Is it true or false? Why?

Problem formulation

Give a complete problem formulation for each of the following problems. Choose a formulation that is precise enough to be implemented.

  • There are six glass boxes in a row, each with a lock. Each of the first five boxes holds a key unlocking the next box in line; the last box holds a banana. You have the key to the first box, and you want the banana.
  • There is an n x n grid of squares, each square initially being either unpainted floor or a bottomless pit. You start standing on an unpainted floor square, and can either paint the square under you or move into an adjacent unpainted floor square. You want the whole floor painted.

Maze

Your goal is to navigate a robot out of a maze. It starts in the center of the maze, facing north. You can turn the robot to face north, east, south, or west, and you can direct the robot to move forward a certain distance (it will stop before hitting a wall).

  1. Formally define this problem as a search problem. How large is the state space?
  2. In navigating a maze, the only place we need to turn is at the intersection of two or more corridors. Reformulate this problem using this observation. How large is the state space now?
  3. From each point in the maze, we can move in any of the four directions until we reach a turning point, and this is the only action we need to do. Reformulate the problem using these actions. Do we need to keep track of the robot’s orientation now?
  4. In our initial description of the problem we already abstracted from the real world. Name three such simplifications we made.

Concepts

Explain in your own words the following terms:

  • Zero-sum
  • Terminal test
  • Minimax value
  • Selection policy
  • Playout policy
  • Monte Carlo tree
  • Back-propagation

MINIMAX

Explain if the MINIMAX algorithm is complete and optimal.

Can it be beaten by an opponent playing suboptimally? Why (not)?

Come up with a game tree in which MAX will beat a suboptimal MIN.

Pruning

Read the note about pruning (and consult Russel and Norvig (2022) if necessary).

Explain in your own words, under what conditions a subtree is skipped using Alpha-beta pruning.

Draw an example (game search tree, 3 levels depth).

Literature

Russel, Stuart, and Peter Norvig. 2022. Artificial Intelligence: A Modern Approach. Harlow: Pearson Education.

Footnotes

  1. An atomic representation is one in which each state is treated as a black box with no internal structure, meaning the state either does or does not match what you’re looking for.

  2. A state is a situation that an agent can find itself in.

  3. Expressed by a graph whose nodes are the set of all states, and whose links are actions that transform one state into another

  4. Can be a single goal state, a small set of alternative goal states, or a property that applies to many states (e.g., no dirt in any location)

  5. A first-in-first-out (FIFO) queue pops the node that was added to the queue earliest

  6. A last-in-first-out (LIFO) queue (also known as a stack) pops the most recently added node first

  7. A priority queue first pops the node with the minimum cost according to \(f(n)\)

  8. Zero-sum means that what is good for one player is just as bad for the other: there is no “win-win” outcome

  9. States where the game has ended are called terminal states

  10. The search tree consists of AND nodes and OR nodes. AND nodes reflect the environment’s choice of action (which all need to be considered), OR nodes the agent’s own choices in each state (where only one needs to be considered). AND and OR nodes alternate. A solution is a conditional plan that considers every nondeterministic outcome and makes a plan for each one.

  11. Reading recommendation

  12. guided by the selection policy

  13. For games with binary outcomes, average utility equals win percentage

  14. A heuristic evaluation function returns an estimate of the expected utility of state \(s\) to player \(p\).

  15. A playout policy biases the moves toward good ones. For Go and other games, playout policies have been successfully learned from self-play using neural networks. Sometimes game-specific heuristics are also used (e.g., take the corner square in Othello).