Search
Examples
Examples for search problems are (Russel and Norvig 2022, 87–88):
- Route-finding problems (e.g., car navigation, airline travel-planning)
- Touring problems (e.g., the traveling salesperson problem)
- VLSI layout problems (positioning millions of components and connections on a chip)
- Robot navigation (e.g., vacuum robots)
- Automatic assembly sequencing of complex objects (e.g., protein design)
Agents
Agents that plan ahead by considering a sequence of actions that form a path to a goal state are called problem-solving agents (Russel and Norvig 2022, 81)
- The computational process they undertake is search
- The representations the agents use are atomic representations1
- There are search algorithms for several environments
Here only simple environments are considered (episodic, single agent, fully observable, deterministic, static, discrete, and known).
We assume that information about the environment is given (e.g., a map)
There are also search algorithms for problems in partially observable, nondeterministic, unknown, and continuous environments (i.e., complex environments) like local search methods (e.g., hill-climbing search, local beam search, evolutionary algorithms). For details please see Russel and Norvig (2022).
Process
In simple environments, agents can follow a four-phase-problem-solving process (Russel and Norvig 2022, 81–82):
- Goal formulation: goals organize behavior by limiting the objectives and hence the actions to be considered
- Problem formulation: the agent devises a description of the states and actions necessary to reach the goal—an abstract model of the relevant part of the environment
- Search: the agent simulates sequences of actions in its model, searching until it finds a sequence that reaches the goal (i.e., the solution)
- Execution: the agent executes the actions in the solution, one at a time
In goal formulation, we decide which aspects of the world we are interested in, and which can be ignored or abstracted away. Then in problem formulation we decide how to manipulate the important aspects (and ignore the others). If we did problem formulation first we would not know what to include and what to leave out.
It can happen that there is a cycle of iterations between goal formulation, problem formulation, and problem-solving until one arrives at a sufficiently useful and efficient solution.
Search problem
Definition
A search problem can be defined formally as (Russel and Norvig 2022, 83):
- The state space: a set of possible states2 the environment can be in3
- The initial state: the state that the agent starts in
- The goal state(s): a state that the agent is trying to reach4
- A transition model: \(\textrm{RESULT}(s,a)\) (returns the state \(s'\) that results from performing action \(a\) in state \(s\))
- The successor function: \(\textrm{ACTIONS}(s)\) (returns the set of actions that can be executed in state \(s\); each action \(a\), together with \(\textrm{RESULT}(s,a)\), yields a successor state)
- An action cost function: \(\textrm{ACTION-COST}(s,a,s')\) (gives the numeric cost of performing action \(a\) in state \(s\) to reach state \(s'\))
- A path: a sequence of actions
- A solution: a path from the initial state to the goal state
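To make the definition concrete, the components can be sketched in a few lines of Python. This is a minimal, illustrative sketch for a fragment of the Romania route-finding example; the names (`ROADS`, `actions`, `result`, …) are ours and not taken from Russel and Norvig (2022):

```python
# Illustrative sketch of the formal components for a fragment of the Romania map.
# Names and structure are assumptions for this example, not a reference implementation.

ROADS = {  # state space as a graph: city -> {neighboring city: road distance}
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
}

INITIAL_STATE = "Arad"
GOAL_STATES = {"Bucharest"}

def actions(s):
    """ACTIONS(s): the actions applicable in state s (here: 'drive to <city>')."""
    return set(ROADS.get(s, {}))

def result(s, a):
    """RESULT(s, a): the transition model; driving to a city puts the agent in that city."""
    return a

def action_cost(s, a, s_prime):
    """ACTION-COST(s, a, s'): the road distance between the two cities."""
    return ROADS[s][a]

def is_goal(s):
    return s in GOAL_STATES
```

A solution is then a sequence of actions, e.g. `["Sibiu", "Rimnicu Vilcea", "Pitesti", "Bucharest"]`, whose summed action costs give the path cost (here 418).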
Modelling
Figure 1 depicts the search problem as a model, the state-space graph (Russel and Norvig 2022, 82–84):
- State space: cities (vertices, each state occurs only once)
- Initial state: Arad
- Goal state: Bucharest (goal test: is the current state Bucharest?)
- Actions: directed edges between the vertices (paths)
- Action costs: numbers on the paths
The model is an abstract mathematical description, here a simple atomic state description. The model is an abstraction as it ignores many details of the reality (e.g., weather and scenery).
A good problem formulation has the right level of detail (i.e., an appropriate level of abstraction).
The choice of a good abstraction involves removing as much detail as possible while retaining validity and ensuring that the abstract actions are easy to carry out. An abstraction is valid if any abstract solution can be elaborated into a solution in the more detailed world.
Search tree
Structure
Figure 2 visualizes a partial search tree: the first few steps in finding a path from Arad (initial state) to Bucharest (goal).
- The root node of the search tree is the initial state \(s\) (Arad)
- The available actions are determined using \(\textrm{ACTIONS}(s)\)
- New nodes, called child nodes or successor nodes, are generated using \(\textrm{RESULT}(s,a)\) (i.e., the root node is expanded)
- Each child node has Arad as its parent node
- The child node to be considered next is selected by the minimum value of some evaluation function \(f(n)\) (see the strategies below)
In Figure 2, nodes that have been expanded are white with bold letters; nodes on the frontier that have been generated but not yet expanded are white with regular letters; the states corresponding to these two types of nodes are said to have been reached. Nodes that could be generated next are shown in faint dashed lines.
A search tree is a “what if” tree of plans and their outcomes.
- The start state is the root node,
- children correspond to successors,
- nodes show states, but correspond to PLANS that achieve those states
There is a lot of repeated structure in the search tree. Thus, for most problems, the whole tree could never actually be built. In practice, both state space graphs and search trees are constructed on demand and as little as possible (Russel and Norvig 2022).
Search algorithms require a data structure to keep track of the search tree. A node in the tree is represented by a data structure with four components (Russel and Norvig 2022, 91):
- node.STATE: the state to which the node corresponds
- node.PARENT: the node in the tree that generated this node
- node.ACTION: the action that was applied to the parent’s state to generate this node
- node.PATH-COST: the total cost of the path from the initial state to this node
Following the PARENT pointers back from a node allows us to recover the states and actions along the path to that node. Doing this from a goal node yields the solution.
Example data structure in pseudocode:
var searchTree = {
    STATE: "Bucharest",
    ACTION: "Pitesti to Bucharest",
    PATH-COST: 418,
    PARENT: {
        STATE: "Pitesti",
        ACTION: "Rimnicu Vilcea to Pitesti",
        PATH-COST: 317,
        PARENT: {
            STATE: "Rimnicu Vilcea",
            ACTION: "Sibiu to Rimnicu Vilcea",
            PATH-COST: 220,
            PARENT: {
                STATE: "Sibiu",
                ACTION: "Arad to Sibiu",
                PATH-COST: 140,
                PARENT: {
                    STATE: "Arad",
                    ACTION: NULL,
                    PATH-COST: 0,
                    PARENT: NULL
                }
            }
        }
    }
}
In addition, a data structure to store the frontier is needed. Usually this is realized using a queue with the following functions (Russel and Norvig 2022, 91):
- IS-EMPTY(frontier): returns true only if there are no nodes in the frontier
- POP(frontier): removes the top node from the frontier and returns it
- TOP(frontier): returns (but does not remove) the top node from the frontier
- ADD(node,frontier): inserts node into its proper place in the queue
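As a rough Python counterpart to this node structure and queue interface (a sketch under our own naming; the frontier is realized here as a FIFO queue, as used by breadth-first search):

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    state: str                       # node.STATE
    parent: Optional["Node"] = None  # node.PARENT
    action: Optional[str] = None     # node.ACTION
    path_cost: float = 0.0           # node.PATH-COST

def solution(node: Node) -> list:
    """Follow the PARENT pointers back to the root to recover the action sequence."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))

# The four queue operations, realized for a FIFO frontier (a LIFO stack or a
# priority queue would change POP/ADD but keep the same interface).
def is_empty(frontier) -> bool:   # IS-EMPTY(frontier)
    return len(frontier) == 0

def pop(frontier) -> Node:        # POP(frontier)
    return frontier.popleft()

def top(frontier) -> Node:        # TOP(frontier)
    return frontier[0]

def add(node: Node, frontier):    # ADD(node, frontier)
    frontier.append(node)

frontier = deque([Node("Arad")])  # a frontier containing only the root node
```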
Search algorithms
Definition
Search algorithms take a search problem as input and return a solution, or an indication of failure (Russel and Norvig 2022, 89).
- They superimpose a search tree over the state-space graph,
- form various paths from the initial state, and
- try to find a path that reaches a goal state
They can implement
- uninformed search methods, which only have access to the problem definition but no clue about how close a state is to the goal(s).
- informed search methods, which have access to a heuristic function that gives domain-specific hints about how close a state is to the goal(s) (e.g., using straight-line distance in route-finding problems)
Uninformed search
Breadth-first search
expands the shallowest nodes first;
\(f(n)\) is the depth of the node
- complete, optimal for unit action costs
- exponential space complexity
- FIFO queue5
- a reached table is necessary to store the nodes already reached
Depth-first search
expands the deepest unexpanded node first;
\(f(n)\) is the negative of the depth
- neither complete (in infinite state spaces, it can get stuck going down an infinite path) nor optimal
- linear space complexity, a depth bound can be added
- LIFO queue6
- the frontier is very small
Uniform-cost search
expands the node with the lowest path cost;
\(f(n)\) returns the cost of the path from the root to the current node
- complete, optimal for general action costs
- priority queue7 using cumulative cost
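Seen through the evaluation function \(f(n)\), the three strategies differ only in which node the frontier hands back next. A small sketch (the `depth` and `path_cost` attributes are assumed fields of a node object, cf. the node data structure above):

```python
# Evaluation functions that distinguish the three uninformed strategies when the
# frontier is a priority queue ordered by f(n); attribute names are assumptions.
f_breadth_first = lambda node: node.depth       # shallowest node first
f_depth_first   = lambda node: -node.depth      # deepest node first
f_uniform_cost  = lambda node: node.path_cost   # cheapest path first
```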
Illustration of strategies
General Process
- Add the start node to the frontier using \(\textrm{ADD}(node,frontier)\)
- Get the top node from the frontier using \(\textrm{POP}(frontier)\)
- Get all the actions available for the top node via \(\textrm{ACTIONS}(s)\)
- Get the child nodes for the retrieved actions via \(\textrm{RESULT}(s,a)\)
- Check whether one of the child nodes is a goal state; if yes, return the solution; if not, continue
- Add the child nodes to the frontier using \(\textrm{ADD}(node,frontier)\)
- Repeat steps 2 to 6 as long as no solution is found and \(\textrm{IS-EMPTY}(frontier)\) is not true (a Python sketch of this loop follows below)
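The seven steps above translate almost directly into code. A minimal sketch with a FIFO frontier (breadth-first search) and a reached set to avoid revisiting states; `actions` and `result` are problem-specific functions like the ones in the Romania sketch above:

```python
from collections import deque

def breadth_first_search(initial_state, is_goal, actions, result):
    """Sketch of the general process with a FIFO frontier and an early goal test."""
    root = (initial_state, [])                    # node as (state, action path)
    if is_goal(initial_state):
        return root
    frontier = deque([root])                      # step 1: ADD the start node
    reached = {initial_state}
    while frontier:                               # step 7: repeat while frontier is not empty
        state, path = frontier.popleft()          # step 2: POP the top node (FIFO)
        for a in actions(state):                  # step 3: ACTIONS(s)
            child = result(state, a)              # step 4: RESULT(s, a)
            if child in reached:
                continue
            child_node = (child, path + [a])
            if is_goal(child):                    # step 5: goal test on generation
                return child_node
            reached.add(child)
            frontier.append(child_node)           # step 6: ADD the child nodes
    return None                                   # failure
```

With the Romania fragment sketched earlier, this returns the path Arad–Sibiu–Fagaras–Bucharest (the path with the fewest steps, not the cheapest one).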
Example
Example using the tree depicted in Figure 2, assuming that the goal is not among the nodes shown:
- Add 0 to the frontier
- Get the top node from the frontier (0)
- Get the actions available from 0 (to 1 and to 2)
- Get the child nodes for the retrieved actions (1 and 2)
- Check if 1 is a goal state (false)
- Add 1 to the frontier
- Check if 2 is a goal state (false)
- Add 2 to the frontier
- Get the top node from the frontier, here using the FIFO (1)
- Get the actions available from 1 (to 3 and to 4)
- Get the child nodes for the retrieved actions (3 and 4)
- Check if 3 is a goal state (false)
- Add 3 to the frontier
- Check if 4 is a goal state (false)
- Add 4 to the frontier
- Get the top node from the frontier (2)
- Get the actions available from 2 (to 5 and to 6)
- Get the child nodes for the retrieved actions (5 and 6)
- Check if 5 is a goal state (false)
- Add 5 to the frontier
- Check if 6 is a goal state (false)
- Add 6 to the frontier
- Get the top node from the frontier (3)
- …
General Process
The overall process is the same, only the frontier queue returns the node that was added last.
General Process
The overall process is the same, only the frontier queue returns the node in the queue with the minimum path cost.
Example
Example using the tree depicted in Figure 1:
- Add Arad to the frontier
- Get the top node from the frontier (Arad)
- Get the actions available from Arad (to Zerind, to Timisoara, to Sibiu)
- Get the child nodes for the retrieved actions (Zerind, Timisoara, Sibiu)
- Add Zerind, Timisoara, and Sibiu to the frontier
- Get the node from the frontier with the lowest path cost (Zerind)
- Check if Zerind is a goal state (false)
- Get the actions available from Zerind (to Oradea)
- Get the child nodes for the retrieved actions (Oradea)
- Add Oradea to the frontier
- Get the node from the frontier with the lowest path cost (Timisoara)
- Check if Timisoara is a goal state (false)
- Get the actions available from Timisoara (to Lugoj)
- Get the child nodes for the retrieved actions (Lugoj)
- Add Lugoj to the frontier
- Get the node from the frontier with the lowest path cost (Sibiu)
- Check if Sibiu is a goal state (false)
- Get the actions available from Sibiu (to Fagaras, to Rimnicu Vilcea)
- Get the child nodes for the retrieved actions (Fagaras, Rimnicu Vilcea)
- Add Fagaras and Rimnicu Vilcea to the frontier
- Get the node from the frontier with the lowest path cost (Rimnicu Vilcea)
- Check if Rimnicu Vilcea is a goal state (false)
- Get the actions available from Rimnicu Vilcea (to Pitesti, to Craiova)
- Get the child nodes for the retrieved actions (Pitesti, Craiova)
- Add Pitesti and Craiova to the frontier
- Get the node from the frontier with the lowest path cost (Lugoj)
- Check if Lugoj is a goal state (false)
- Get the actions available from Lugoj (to Mehadia)
- Get the child nodes for the retrieved actions (Mehadia)
- Add Mehadia to the frontier
- Get the node from the frontier with the lowest path cost (Fagaras)
- Check if Fagaras is a goal state (false)
- Get the actions available from Fagaras (to Bucharest)
- Get the child nodes for the retrieved actions (Bucharest)
- Add Bucharest to the frontier
- Get the node from the frontier with the lowest path cost (Mehadia)
- Check if Mehadia is a goal state (false)
- Get the actions available from Mehadia (to Drobeta)
- Get the child nodes for the retrieved actions (Drobeta)
- Add Drobeta to the frontier
- Get the node from the frontier with the lowest path cost (Pitesti)
- Check if Pitesti is a goal state (false)
- Get the actions available from Pitesti (to Bucharest, to Craiova)
- Get the child nodes for the retrieved actions (Bucharest, Craiova)
- Add Bucharest and Craiova to the frontier
- Get the node from the frontier with the lowest path cost (Drobeta)
- Check if Drobeta is a goal state (false)
- Get the actions available from Drobeta (to Craiova)
- Get the child nodes for the retrieved actions (Craiova)
- Add Craiova to the frontier
- Get the node from the frontier with the lowest path cost (Bucharest)
- Check if Bucharest is a goal state (true)
- Return solution
Note that if we had checked for a goal upon generating a node rather than when expanding the lowest-cost node, we would have returned a higher-cost path (the one through Fagaras).
The performance of problem-solving algorithms can be evaluated in four ways (Russel and Norvig 2022, 93):
- Completeness: Is the algorithm guaranteed to find a solution when there is one, and to correctly report failure when there is not?
- Cost optimality: Does it find a solution with the lowest path cost of all solutions?
- Time complexity: How long does it take to find a solution (e.g., measured in seconds or by the number of states and actions considered)?
- Space complexity: How much memory is needed to perform the search?
Informed search
Informed (heuristic) search strategies use domain-specific hints about the location of goals (Russel and Norvig 2022, 102).
These strategies can find solutions more efficiently than uninformed strategies.
The hints are generated by a heuristic function \(h(n)\)
Greedy best-first search (Russel and Norvig 2022, 103ff)
Greedy best-first search uses the evaluation function \(f(n) = h(n)\).
It expands first the node with the lowest \(h(n)\) value in the queue.
In route finding problems, the straight-line distance heuristic is used as \(h(n)\).
Greedy best-first graph search is complete in finite state spaces, but not in infinite ones.
A* Search (Russel and Norvig 2022, 103ff)
The A* search (pronounced “A-star search”) uses the evaluation function \(f(n) = g(n) + h(n)\)
- \(g(n)\) is the path cost from the initial state to node \(n\) (e.g., computed by using the action-cost function)
- \(h(n)\) is the estimated cost of the shortest path from \(n\) to a goal state
A* search is complete; whether it is cost-optimal depends on the heuristic.
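Both strategies are instances of best-first search and differ only in the evaluation function. A minimal sketch for the Romania example (the straight-line distances are the usual values from the example; treat the code and names as illustrative):

```python
import heapq

SLD = {  # straight-line distance to Bucharest, used as the heuristic h(n)
    "Arad": 366, "Zerind": 374, "Timisoara": 329, "Sibiu": 253,
    "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0,
}

ROADS = {  # the same Romania fragment as above
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
}

def best_first_search(start, goal, f):
    """Expand the frontier node with minimal f(g, state); g is the path cost so far."""
    frontier = [(f(0, start), 0, start, [start])]          # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        for nxt, cost in ROADS.get(state, {}).items():
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (f(g2, nxt), g2, nxt, path + [nxt]))
    return None

greedy = lambda g, s: SLD[s]       # greedy best-first: f(n) = h(n)
a_star = lambda g, s: g + SLD[s]   # A*:                f(n) = g(n) + h(n)
```

Calling `best_first_search("Arad", "Bucharest", a_star)` yields the cost-optimal path via Rimnicu Vilcea and Pitesti (cost 418), while the greedy variant follows the heuristic to the more expensive path via Fagaras (cost 450).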
Online search
The agents considered so far use offline search algorithms. They compute a complete solution before taking their first action. They are not helpful in unknown environments.
Online search agents interleave computation and action:
- take an action,
- observe the environment, and
- compute the next action.
These agents can discover successors only for a state they currently occupy or have already learned about (i.e., a state contained in a map created online).
Online search is a good idea in dynamic or semi-dynamic environments.
Competitive environments
Tree search in action
AlphaGo is a computer program that plays the board game Go, which is considered much more difficult for computers to win than other games such as chess because its strategic and aesthetic nature makes it difficult to construct a direct scoring function. AlphaGo was developed by the London-based DeepMind Technologies, an acquired subsidiary of Alphabet Inc.
AlphaGo uses a Monte Carlo tree search algorithm to find its moves, based on knowledge previously acquired through machine learning, specifically an artificial neural network (a deep learning method) trained extensively on both human and computer games. A neural network is trained to recognize the best moves and the winning rates of those moves. This neural network improves the strength of the tree search, leading to stronger move selection in the next iteration.
Adversarial search
In competitive environments, two or more agents have conflicting goals.
This gives rise to adversarial search problems.
The AI community is particularly interested in games of a simplified nature (e.g., chess, go, and poker), as in such games
- the state of a game is easy to represent,
- agents are restricted to a few actions, and
- effects of actions are defined by precise rules.
Deterministic games
The games most commonly studied within AI are deterministic (two-player, turn-taking, fully observable, zero-sum8) (Russel and Norvig 2022, 192).
Possible formalization
- States: \(S\) (start at \(S_0\))
- Player: \(\textrm{TO-MOVE}(s)\) (defines which player has the move in state \(s\))
- Actions: \(\textrm{ACTIONS}(s)\) (the set of legal moves in state \(s\))
- Transition model: \(\textrm{RESULT}(s,a)\) (defines the state resulting from action \(a\) in state \(s\))
- Terminal test: \(\textrm{IS-TERMINAL}(s)\) (is true when the game is over9)
- Utility function: \(\textrm{UTILITY}(s,p)\) (defines the final numeric value to player \(p\) when the game ends in terminal state \(s\))
The initial state, \(\textrm{ACTIONS}(s)\), and \(\textrm{RESULT}(s,a)\) define the state space graph, where the vertices are states, the edges are moves, and a state might be reached by multiple paths.
Optimal decisions
Players (here MIN and MAX, who alternate turns) need to have a conditional plan—a contingent strategy specifying a response to each move of the opponent.
- For games with binary outcomes (win or lose), an AND-OR10 search tree can be used (see Russel and Norvig 2022, 143)
- For games with multiple outcome scores, the minimax search algorithm is used
Minimax search
Minimax value
Given a state-space search tree, each node’s minimax value is calculated.
The minimax value is the best achievable utility of being in a given state (against a rational adversary).
MAX prefers to move to a state of maximum value; MIN prefers to move to a state of minimum value.
Adversarial game tree
The △ nodes are “MAX nodes”, in which it is MAX’s turn to move; the ▽ nodes are “MIN nodes”. MAX’s best move at the root is α₁ (the highest minimax value), MIN’s best move is β₁ (the lowest minimax value).
Minimax search algorithm
The minimax algorithm performs a complete depth-first exploration of the game tree (Russel and Norvig 2022, 196–96).
- Assumes that the adversary plays optimally
- Returns action whose terminal state has the optimal \(\textrm{MINIMAX}\) value
- If the state is a terminal state (\(\textrm{IS-TERMINAL}(s) = true\)): return the state’s utility (\(\textrm{UTILITY}(s,p)\))
- If the next agent is \(\textrm{MAX}\) (\(\textrm{TO-MOVE}(s) = MAX\)): return \(\textrm{MAX-VALUE}(s)\)
- If the next agent is \(\textrm{MIN}\) (\(\textrm{TO-MOVE}(s) = MIN\)): return \(\textrm{MIN-VALUE}(s)\)
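These cases translate into a minimal Python sketch of the minimax recursion over an assumed game interface (`to_move`, `actions`, `result`, `is_terminal`, `utility`, mirroring the formalization above); it is an illustration, not a reference implementation:

```python
def minimax_search(game, state):
    """Return the move with the optimal minimax value for the player to move,
    assuming the adversary also plays optimally."""
    player = game.to_move(state)
    _, move = max_value(game, state, player)
    return move

def max_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = float("-inf"), None
    for a in game.actions(state):
        v2, _ = min_value(game, game.result(state, a), player)
        if v2 > v:
            v, move = v2, a
    return v, move

def min_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = float("inf"), None
    for a in game.actions(state):
        v2, _ = max_value(game, game.result(state, a), player)
        if v2 < v:
            v, move = v2, a
    return v, move
```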
The exponential complexity makes the minimax algorithm impractical for complex games (even with alpha-beta pruning applied; the chess game tree is larger than the number of atoms in the universe).
Pruning stops the search at the moment when it is determined that the value of a subtree is worse than the best solution already identified.
The general principle is as follows: consider a node n somewhere in the tree, such that Player has a choice of moving to n. If Player has a better choice either at the same level or at any point higher up in the tree, then Player will never move to n. So as soon as enough is known about n (by examining some of its descendants) to reach this conclusion, it can be pruned (Russel and Norvig 2022, 198).
Alpha-beta pruning gets its name from the two extra parameters in MAX-VALUE(state,α,β) that describe the bounds on the backed-up values that appear anywhere along the path:
- α = the value of the best choice for MAX found so far (“at least”)
- β = the value of the best choice for MIN found so far (“at most”)
Alpha-beta search updates the values of α and β as it goes along and prunes the remaining branches at a node as soon as the value of the current node is known to be worse than the current α or β value for MAX or MIN, respectively.
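The same recursion as in the minimax sketch above, with α and β threaded through and the two pruning tests added (again a sketch over the assumed game interface):

```python
def alpha_beta_search(game, state):
    """Minimax with alpha-beta pruning; returns the best move for the player to move."""
    player = game.to_move(state)
    _, move = max_value(game, state, player, float("-inf"), float("inf"))
    return move

def max_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = float("-inf"), None
    for a in game.actions(state):
        v2, _ = min_value(game, game.result(state, a), player, alpha, beta)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)   # best value MAX can guarantee so far ("at least")
        if v >= beta:               # MIN already has a better option higher up: prune
            return v, move
    return v, move

def min_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = float("inf"), None
    for a in game.actions(state):
        v2, _ = max_value(game, game.result(state, a), player, alpha, beta)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)     # best value MIN can guarantee so far ("at most")
        if v <= alpha:              # MAX already has a better option higher up: prune
            return v, move
    return v, move
```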
Monte Carlo tree search
Monte Carlo tree search (MCTS)11 estimates the value of a state as the average utility12 over a number of simulations of complete games starting from the current state (Russel and Norvig 2022, 207–9).
Each iteration follows four steps:
- Selection (choosing a move13 leading to a successor node, and repeating that process, moving down the tree to a leaf)
- Expansion (one to several new children are created for the selected node)
- Simulation (a simulation for the newly generated child node is performed)
- Back-propagation (the result of the simulation is used to update all the search tree nodes going up to the root)
This repeats for a set number of iterations or until the allotted time is over. At the end, the successor with the highest utility is chosen.
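The four steps can be sketched as follows (Python, over the same assumed game interface as in the minimax sketch above, with random playouts as the playout policy and UCB1 as the selection policy; this is an illustration, not AlphaGo's implementation):

```python
import math, random

class MCTSNode:
    """One node of the Monte Carlo search tree."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []      # expanded successor nodes
        self.untried = None     # actions not yet expanded (filled lazily on first expansion)
        self.U = 0.0            # total utility of playouts through this node
        self.N = 0              # number of playouts through this node

def monte_carlo_tree_search(game, root_state, iterations=1000, C=1.4):
    root = MCTSNode(root_state)
    for _ in range(iterations):
        leaf = select(root, C)                     # 1. selection (UCB1, see below)
        child = expand(game, leaf)                 # 2. expansion
        terminal = simulate(game, child.state)     # 3. simulation (random playout policy)
        backpropagate(game, child, terminal)       # 4. back-propagation
    # choose the successor of the root with the highest average utility
    return max(root.children, key=lambda c: c.U / c.N).state

def ucb1(node, C):
    return node.U / node.N + C * math.sqrt(math.log(node.parent.N) / node.N)

def select(node, C):
    """Walk down the tree, always following the child with the highest UCB1 score,
    until a node with untried actions (or a terminal node) is reached."""
    while node.untried == [] and node.children:
        node = max(node.children, key=lambda c: ucb1(c, C))
    return node

def expand(game, node):
    """Create one new child of the selected node (if it is not terminal)."""
    if game.is_terminal(node.state):
        return node
    if node.untried is None:
        node.untried = list(game.actions(node.state))
    action = node.untried.pop()
    child = MCTSNode(game.result(node.state, action), parent=node)
    node.children.append(child)
    return child

def simulate(game, state):
    """Play random moves until the game ends; return the terminal state."""
    while not game.is_terminal(state):
        state = game.result(state, random.choice(list(game.actions(state))))
    return state

def backpropagate(game, node, terminal):
    """Update playout counts and utilities on the path back to the root."""
    while node is not None:
        node.N += 1
        if node.parent is not None:
            mover = game.to_move(node.parent.state)   # the player who moved into this node
            node.U += game.utility(terminal, mover)
        node = node.parent
```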
Selection
Figure 5 shows a tree with the root representing a state where P-A has won 37/100 of the playouts done so far:
- P-A has just moved to the root node
- P-B selects a move to a node where it has won 60/79 playouts; this is the best win percentage among the available moves
- P-A will select a move to a node where it has won 16/53 playouts (assuming it plays optimally)
- P-B then continues on to the leaf node marked 27/35
- … until a terminal state is reached
It would also have been reasonable to select the 2/11 node for the sake of exploration—with only 11 playouts, the node still has high uncertainty in its valuation, and might end up being the best option if more information about it is gained. So it makes sense to use a selection policy that balances exploitation and exploration.
Selection policy example
Upper confidence bounds applied to trees (UCT) is a very effective selection policy that ranks possible moves based on an upper confidence bound formula (UCB1):
\[ \textrm{UCB1}(n) = \frac{\textrm{U}(n)}{\textrm{N}(n)} + C * \sqrt{\frac{\log{\textrm{N}(\textrm{PARENT}(n))}}{\textrm{N}(n)}} \]
- \(\textrm{U}(n)\) is the total utility of all playouts that went through node \(n\)
- \(\textrm{N}(n)\) is the number of playouts through node \(n\)
- \(\textrm{PARENT}(n)\) is the parent node of \(n\) in the tree
- \(\frac{\textrm{U}(n)}{\textrm{N}(n)}\) is the average utility of \(n\) (exploitation term, “how good are the stats?”)
- \(\frac{\log{\textrm{N}(\textrm{PARENT}(n))}}{\textrm{N}(n)}\) is higher for nodes \(n\) that have only been explored a few times (exploration term, “how much has the child been ‘ignored’?”)
- \(C\) is a constant that balances exploitation and exploration (theoretically \(\sqrt{2}\))
With \(C=1.4\), the 60/79 node in Figure 5 has the highest UCB1 score, but with \(C=1.5\), it would be the 2/11 node.
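This can be checked with a few lines of Python (assuming the natural logarithm and the playout counts given above, with the parent node having 100 playouts):

```python
import math

def ucb1(U, N, N_parent, C):
    return U / N + C * math.sqrt(math.log(N_parent) / N)

for C in (1.4, 1.5):
    print(C, round(ucb1(60, 79, 100, C), 3), round(ucb1(2, 11, 100, C), 3))
# C = 1.4: the 60/79 node scores higher (about 1.10 vs. 1.09) -> exploitation wins
# C = 1.5: the 2/11 node scores higher (about 1.15 vs. 1.12) -> exploration wins
```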
The conventional wisdom has been that Monte Carlo search has an advantage over heuristic alpha-beta tree search (not discussed here; see Russel and Norvig 2022, 202ff) for games where the branching factor is very high (and thus alpha-beta cannot search deep enough), or when it is difficult to define a good evaluation function14.
Expansion and simulation
Figure 6 shows a tree where a new child of the selected node is generated and marked with 0/0 (expansion).
Back-propagation
The result of the simulation is used to update all the search tree nodes going up to the root.
- P-B’s nodes are incremented in both the number of wins and the number of playouts
- P-A’s nodes are incremented in the number of playouts only
Wrap-up
Search
Search operates over models of the world (which might be observed online)
- Agents do not try all possible plans
- Planning is all “in simulation”
- Search is only as good as the models are
Adversarial search
- In two-player, discrete, deterministic, turn-taking zero-sum games with perfect information, the minimax algorithm can select optimal moves by a depth-first search in the game tree
- Efficiency can be improved by using the alpha-beta search algorithm, which eliminates subtrees that are shown to be irrelevant.
- Monte Carlo tree search evaluates states by playing out the game all the way to the end to see who won. This playout simulation is repeated multiple times. The evaluation is an average of the results.
xkcd
✏️ Exercises
Definition sequence
Problem formulation must follow goal formulation.
Discuss that statement. Is it true or false? Why?
Problem formulation
Give a complete problem formulation for each of the following problems. Choose a formulation that is precise enough to be implemented.
- There are six glass boxes in a row, each with a lock. Each of the first five boxes holds a key unlocking the next box in line; the last box holds a banana. You have the key to the first box, and you want the banana.
- There is an n x n grid of squares, each square initially being either unpainted floor or a bottomless pit. You start standing on an unpainted floor square, and can either paint the square under you or move into an adjacent unpainted floor square. You want the whole floor painted.
Maze
Your goal is to navigate a robot out of a maze. It starts in the center of the maze facing north. You can turn the robot to face north, east, south, or west, and you can direct the robot to move forward a certain distance (it will stop before hitting a wall).
- Formally define this problem as a search problem. How large is the state space?
- In navigating a maze, the only place we need to turn is at the intersection of two or more corridors. Reformulate this problem using this observation. How large is the state space now?
- From each point in the maze, we can move in any of the four directions until we reach a turning point, and this is the only action we need to do. Reformulate the problem using these actions. Do we need to keep track of the robot’s orientation now?
- In our initial description of the problem we already abstracted from the real world. Name three such simplifications we made.
Open only if you need help.
Concepts
Explain in your own words the following terms:
- Zero-sum
- Terminal test
- Minimax value
- Selection policy
- Playout policy
- Monte Carlo tree
- Back-propagation
MINIMAX
Explain whether the MINIMAX algorithm is complete and optimal.
Can it be beaten by an opponent playing suboptimally? Why (not)?
Come up with a game tree in which MAX will beat a suboptimal MIN.
Open only if you need help.
In two-player, discrete, deterministic, turn-taking zero-sum games with perfect information, the MINIMAX algorithm can select optimal moves by a depth-first enumeration of the game tree; the algorithm is also guaranteed to find a solution when there is one.
The algorithm performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is \(m\) and there are \(b\) legal moves at each point, then the time complexity is \(O(b^m)\). The space complexity is \(O(bm)\) for an algorithm that generates all actions at once, or \(O(m)\) for an algorithm that generates actions one at a time. The exponential complexity makes MINIMAX impractical for complex games. MINIMAX does, however, serve as a basis for the mathematical analysis of games. By approximating the minimax analysis in various ways, we can derive more practical algorithms.
If MIN does not play optimally, then MAX will do at least as well as it would against an optimal player, possibly better.
Pruning
Read the note about pruning (and consult Russel and Norvig (2022) if necessary).
Explain in your own words, under what conditions a subtree is skipped using Alpha-beta pruning.
Draw an example (game search tree, 3 levels depth).
Literature
Footnotes
An atomic representation is one in which each state is treated as a black box with no internal structure, meaning the state either does or does not match what you’re looking for.↩︎
A state is a situation that an agent can find itself in.↩︎
Expressed by a graph whose nodes are the set of all states, and whose links are actions that transform one state into another↩︎
Can be a single goal state, a small set of alternative goal states, or a property that applies to many states (e.g., no dirt in any location)↩︎
A first-in-first-out queue first pops the node that was added to the queue first↩︎
A last-in-first-out queue (also known as a stack) first pops the most recently added node↩︎
A priority queue first pops the node with the minimum cost according to \(f(n)\)↩︎
Zero-sum means that what is good for one player is just as bad for the other: there is no “win-win” outcome↩︎
States where the game has ended are called terminal states↩︎
The search tree consists of AND nodes and OR nodes. AND nodes reflect the environment’s choice of action (which all need to be considered), OR nodes the agent’s own choices in each state (where only one needs to be considered). AND and OR nodes alternate. A solution is a conditional plan that considers every nondeterministic outcome and makes a plan for each one.↩︎
guided by the selection policy↩︎
For games with binary outcomes, average utility equals win percentage↩︎
A heuristic evaluation function returns an estimate of the expected utility of state \(s\) to player \(p\).↩︎
Playout policies bias the moves toward good ones. For Go and other games, playout policies have been successfully learned from self-play by using neural networks. Sometimes game-specific heuristics are also used (e.g., take the corner square in Othello)↩︎