Lecture 2: Solving problems by searching
Prof. Gilles Louppe
- Planning agents
- Search problems
- Uninformed search methods
- Depth-first search
- Breadth-first search
- Uniform-cost search
- Informed search methods
Reflex agents
- select actions on the basis of the current percept;
- may have a model of the world's current state;
- do not consider the future consequences of their actions;
- consider only how the world is now.
.caption[For example, a simple reflex agent based on condition-action rules could move
to a dot if there is one in its neighborhood.
No planning is involved to take this decision. ]
Can a reflex agent be rational?
Yes, provided the correct decision can be made on the basis of the current percept. That is, if the environment is fully observable, deterministic and known.
In the figure, the sequence of actions is clearly suboptimal.
- Single-agent, observable, deterministic and known environment.
Problem-solving agents
- take decisions based on (hypothesized) consequences of actions, by considering how the world could be;
- must have a model of how the world evolves in response to actions;
- formulate a goal, explicitly.
.caption[A planning agent looks for sequences of actions to eat all the dots.]
- Problem-solving agents are offline. The solution is executed "eyes closed", ignoring the percepts.
- Online problem solving involves acting without complete knowledge. In this case, the sequence of actions might be recomputed at each step.
A search problem consists of the following components:
- A representation of the states of the agent and its environment.
- The initial state of the agent.
- A description of the actions available to the agent given a state $s$, denoted $\text{actions}(s)$.
- A transition model that returns the state $s' = \text{result}(s, a)$ that results from doing action $a$ in state $s$.
  - We say that $s'$ is a successor of $s$ if there is an acceptable action from $s$ to $s'$.
- Together, the initial state, the actions and the transition model define the state space of the problem, i.e. the set of all states reachable from the initial state by any sequence of actions.
  - The state space forms a directed graph:
    - nodes = states
    - links = actions
  - A path is a sequence of states connected by actions.
- A goal test which determines whether the solution of the problem is achieved in state $s$.
- A path cost that assigns a numeric value to each path.
  - In this course, we will also assume that the path cost corresponds to a sum of positive step costs $c(s,a,s')$ associated to the action $a$ in $s$ leading to $s'$.
A solution to a problem is an action sequence that leads from the initial state to a goal state.
- A solution quality is measured by the path cost function.
- An optimal solution has the lowest path cost among all solutions.
.exercise[What if the environment is partially observable? non-deterministic?]
.caption[How to go from Arad to Bucharest?]
- Representation of states: the city we are in.
$s \in \{ \text{in}(\text{Arad}), \text{in}(\text{Bucharest}), \ldots \}$
- Initial state = the city we start in.
$s_0 = \text{in}(\text{Arad})$
- Actions = Going from the current city to the cities that are directly connected to it.
$\text{actions}(s_0) = \{ \text{go}(\text{Sibiu}), \text{go}(\text{Timisoara}), \text{go}(\text{Zerind}) \}$
- Transition model = The city we arrive in after driving to it.
$\text{result}(\text{in}(\text{Arad}), \text{go}(\text{Zerind})) = \text{in}(\text{Zerind})$
- Goal test: whether we are in Bucharest.
$s \in \{ \text{in}(\text{Bucharest}) \}$
- Step cost: distances between cities.
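The formulation above can be sketched in code. This is a minimal illustration, not a reference implementation: the class name `RouteProblem` and its method names are hypothetical, a state $\text{in}(\text{city})$ is represented simply by the city name, an action $\text{go}(\text{city})$ by the destination name, and the distances are assumed from the standard textbook version of this map (only a fragment is included).

```python
# Road distances (km) for a fragment of the Romania map (assumed values).
ROADS = {
    ("Arad", "Zerind"): 75,
    ("Arad", "Sibiu"): 140,
    ("Arad", "Timisoara"): 118,
    ("Sibiu", "Fagaras"): 99,
    ("Sibiu", "Rimnicu Vilcea"): 80,
    ("Rimnicu Vilcea", "Pitesti"): 97,
    ("Fagaras", "Bucharest"): 211,
    ("Pitesti", "Bucharest"): 101,
}


class RouteProblem:
    """Hypothetical container for the components of a search problem."""

    def __init__(self, initial, goal, roads):
        self.initial = initial
        self.goal = goal
        # Roads are two-way: build city -> {neighbor: distance}.
        self.graph = {}
        for (a, b), km in roads.items():
            self.graph.setdefault(a, {})[b] = km
            self.graph.setdefault(b, {})[a] = km

    def actions(self, s):
        # Cities directly connected to s, one "go" action per neighbor.
        return sorted(self.graph[s])

    def result(self, s, a):
        # Driving to a directly connected city puts us in that city.
        assert a in self.graph[s]
        return a

    def goal_test(self, s):
        return s == self.goal

    def step_cost(self, s, a, s2):
        return self.graph[s][s2]


problem = RouteProblem("Arad", "Bucharest", ROADS)
print(problem.actions("Arad"))            # ['Sibiu', 'Timisoara', 'Zerind']
print(problem.result("Arad", "Zerind"))   # Zerind
```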
The real world is absurdly complex.
- The world state includes every last detail of the environment.
- A search state keeps only the details needed for planning.
.center[Search problems are models.]
Search problems are models, i.e. mathematical abstractions. These models omit details that are not relevant for solving the problem.
The process of removing details from a representation is called abstraction.
- States: $\{ (x, y), \text{dot booleans}\}$
- Actions: NSEW
- Transition: update location and possibly a dot boolean
- Goal test: dots all false
World state:
- Agent positions: 120
- Found count: 30
- Ghost positions: 12
- Agent facing: NSEW
- How many?
The set of acceptable sequences starting at the initial state forms a search tree.
- Nodes correspond to states in the state space, where the initial state is the root node.
- Branches correspond to applicable actions, with child nodes corresponding to successors.
For most problems, we can never actually build the whole tree. Yet we want to find some optimal branch!
- Fringe (or frontier) of partial plans under consideration
- Expansion
- Exploration
.exercise[Which fringe nodes to explore? How to expand as few nodes as possible, while achieving the goal?]
Uninformed search strategies use only the information available in the problem definition. They do not know whether a state looks more promising than some other.
- Depth-first search
- Breadth-first search
- Uniform-cost search
- Iterative deepening
- A strategy is defined by picking the order of expansion.
- Strategies are evaluated along the following dimensions:
- Completeness: does it always find a solution if one exists?
- Optimality: does it always find the least-cost solution?
- Time complexity: how long does it take to find a solution?
- Space complexity: how much memory is needed to perform the search?
- Time and space complexity are measured in terms of
  - $b$: maximum branching factor of the search tree
  - $d$: depth of the least-cost solution
    - the depth of $s$ is defined as the number of actions from the initial state to $s$.
  - $m$: maximum length of any path in the state space (may be $\infty$)
Number of nodes in a tree with branching factor $b$ and depth $m$: $1 + b + b^2 + \ldots + b^m = O(b^m)$.
- Strategy: expand the deepest node in the fringe.
- Implementation: fringe is a LIFO stack.
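A minimal sketch of depth-first tree search with a LIFO fringe. The function name, the toy tree `TREE` and the goal set are illustrative assumptions. Each fringe entry is the partial plan (path) followed so far.

```python
def depth_first_search(start, is_goal, successors):
    # Fringe of partial plans; a Python list used as a LIFO stack.
    fringe = [[start]]
    while fringe:
        path = fringe.pop()            # most recently added, i.e. deepest, path
        state = path[-1]
        if is_goal(state):
            return path
        # Push children in reverse so that the leftmost child is expanded first.
        for child in reversed(successors(state)):
            fringe.append(path + [child])
    return None


# Toy tree (hypothetical): A has children B and C, B has D and E, C has F and G.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
GOALS = {"E", "C"}

dfs_path = depth_first_search("A", lambda s: s in GOALS, lambda s: TREE.get(s, []))
print(dfs_path)   # ['A', 'B', 'E']
```

In this toy tree, the goal `C` sits at depth 1, yet DFS returns the deeper goal `E` because it lies further to the left.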
Optimality:
- No, DFS finds the leftmost solution, regardless of depth or cost.
Time complexity:
- May generate the whole tree (or a good part of it, regardless of $d$). Therefore $O(b^m)$, which might be much greater than the size of the state space!
Space complexity:
- Only store siblings on path to root, therefore $O(bm)$.
- When all the descendants of a node have been visited, the node can be removed from memory.
- Strategy: expand the shallowest node in the fringe.
- Implementation: fringe is a FIFO queue.
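A minimal sketch of breadth-first tree search with a FIFO fringe, using `collections.deque` from the standard library. The function name, the toy tree `TREE` and the goal set are illustrative assumptions.

```python
from collections import deque


def breadth_first_search(start, is_goal, successors):
    # Fringe of partial plans; a deque used as a FIFO queue.
    fringe = deque([[start]])
    while fringe:
        path = fringe.popleft()        # oldest, i.e. shallowest, path
        state = path[-1]
        if is_goal(state):
            return path
        for child in successors(state):
            fringe.append(path + [child])
    return None


# Toy tree (hypothetical): A has children B and C, B has D and E, C has F and G.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
GOALS = {"E", "C"}

bfs_path = breadth_first_search("A", lambda s: s in GOALS, lambda s: TREE.get(s, []))
print(bfs_path)   # ['A', 'C']
```

With goals at depth 1 (`C`) and depth 2 (`E`), the FIFO order makes the search return the shallowest one.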
Completeness:
- If the shallowest goal node is at some finite depth $d$, BFS will eventually find it after generating all shallower nodes (provided $b$ is finite).
Optimality:
- The shallowest goal is not necessarily the optimal one.
- BFS is optimal only if the path cost is a non-decreasing function of the depth of the node.
Time complexity:
- If the solution is at depth $d$, then the total number of nodes generated before finding this node is $b+b^2+b^3+...+b^d = O(b^d)$.
Space complexity:
- The number of nodes to maintain in memory is the size of the fringe, which will be the largest at the last tier. That is, $O(b^d)$.
Idea: get DFS's space advantages with BFS's time/shallow solution advantages.
- Run DFS with depth limit 1.
- If no solution, run DFS with depth limit 2.
- If no solution, run DFS with depth limit 3.
- ...
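The loop above can be sketched as follows. This is a minimal illustration: the function names, the toy tree and the `max_limit` cap (added only so that the sketch always terminates) are assumptions.

```python
def depth_limited_search(path, is_goal, successors, limit):
    # Recursive DFS that refuses to go deeper than `limit` actions below path[-1].
    state = path[-1]
    if is_goal(state):
        return path
    if limit == 0:
        return None
    for child in successors(state):
        found = depth_limited_search(path + [child], is_goal, successors, limit - 1)
        if found is not None:
            return found
    return None


def iterative_deepening_search(start, is_goal, successors, max_limit=50):
    # Run DFS with depth limit 1, then 2, then 3, ... until a solution appears.
    for limit in range(1, max_limit + 1):
        found = depth_limited_search([start], is_goal, successors, limit)
        if found is not None:
            return found
    return None


# Toy tree (hypothetical): A has children B and C, B has D and E, C has F and G.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}

ids_path = iterative_deepening_search(
    "A", lambda s: s in {"E", "C"}, lambda s: TREE.get(s, [])
)
print(ids_path)   # ['A', 'C']
```

Only the current path is kept in memory at any time, as in DFS, while the increasing limit ensures that the shallowest goal is found first, as in BFS.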
- Strategy: expand the cheapest node in the fringe.
- Implementation: fringe is a priority queue, using the cumulative cost $g(n)$ from the initial state to node $n$ as priority.
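A minimal sketch of uniform-cost tree search, using `heapq` from the standard library as the priority queue. The function name is hypothetical, the distances are assumed from the standard textbook version of the Romania map, and, as a simplification, roads are kept one-way towards Bucharest so that the toy search tree is finite.

```python
import heapq

# One-way fragment of the Romania map (assumed distances in km).
GRAPH = {
    "Arad": {"Sibiu": 140},
    "Sibiu": {"Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Bucharest": 211},
    "Rimnicu Vilcea": {"Pitesti": 97},
    "Pitesti": {"Bucharest": 101},
    "Bucharest": {},
}


def uniform_cost_search(start, goal, graph):
    # Fringe entries are (g, path), so the heap pops the cheapest path first.
    fringe = [(0, [start])]
    while fringe:
        g, path = heapq.heappop(fringe)
        state = path[-1]
        # Goal test at expansion time, not at generation time.
        if state == goal:
            return g, path
        for child, cost in graph[state].items():
            heapq.heappush(fringe, (g + cost, path + [child]))
    return None


ucs_cost, ucs_path = uniform_cost_search("Arad", "Bucharest", GRAPH)
print(ucs_cost, ucs_path)
# 418 ['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest']
```

Bucharest is first generated through Fagaras with cost 450, but that entry stays in the fringe until the cheaper route through Pitesti (cost 418) has been popped.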
Completeness:
- Yes, if step costs are all such that $c(s,a,s') \geq \epsilon > 0$. (Why?)
Optimality:
- Yes, since UCS expands nodes in order of their optimal path cost.
Time complexity:
- Assume $C^*$ is the cost of the optimal solution and that step costs are all $\geq \epsilon$.
- The "effective depth" is then roughly $C^*/\epsilon$.
- The worst-case time complexity is $O(b^{C^*/\epsilon})$.
Space complexity:
- The number of nodes to maintain is the size of the fringe, so as many as in the last tier: $O(b^{C^*/\epsilon})$.
One of the issues of UCS is that it explores the state space in every direction, without exploiting information about the (plausible) location of the goal node.
Informed search strategies aim to solve this problem by expanding nodes in the fringe in decreasing order of desirability.
- Greedy search
- A*
A heuristic (or evaluation) function $h(n)$ is:
- a function that estimates the cost of the cheapest path from node $n$ to a goal state;
  - $h(n) \geq 0$ for all nodes $n$;
  - $h(n) = 0$ for a goal state;
- designed for a particular search problem.
- Strategy: expand the node $n$ in the fringe for which $h(n)$ is the lowest.
- Implementation: fringe is a priority queue, using $h(n)$ as priority.
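A minimal sketch of greedy tree search on a fragment of the Romania map. The function name is hypothetical. The road distances and the straight-line distances to Bucharest used as $h$ are assumed from the standard textbook version of this map, and roads are kept one-way towards Bucharest so that the toy search tree is finite.

```python
import heapq

# One-way fragment of the Romania map (assumed distances in km).
GRAPH = {
    "Arad": {"Sibiu": 140},
    "Sibiu": {"Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Bucharest": 211},
    "Rimnicu Vilcea": {"Pitesti": 97},
    "Pitesti": {"Bucharest": 101},
    "Bucharest": {},
}

# Straight-line distance to Bucharest (assumed values), used as h(n).
H_SLD = {
    "Arad": 366, "Sibiu": 253, "Fagaras": 176,
    "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0,
}


def greedy_search(start, goal, graph, h):
    # Fringe entries are (h, g, path): ordered by h only, g is just bookkeeping.
    fringe = [(h[start], 0, [start])]
    while fringe:
        _, g, path = heapq.heappop(fringe)
        state = path[-1]
        if state == goal:
            return g, path
        for child, cost in graph[state].items():
            heapq.heappush(fringe, (h[child], g + cost, path + [child]))
    return None


greedy_cost, greedy_path = greedy_search("Arad", "Bucharest", GRAPH, H_SLD)
print(greedy_cost, greedy_path)
# 450 ['Arad', 'Sibiu', 'Fagaras', 'Bucharest']
```

From Sibiu, Fagaras looks closer to the goal than Rimnicu Vilcea (176 against 193), so the search commits to it and returns a 450 km route, although a 418 km route exists in the same graph.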
.center[At best, greedy search takes you straight to the goal.
At worst, it is like a badly-guided BFS.]
Completeness:
- No, unless we prevent cycles (more on this later).
Optimality:
- No, e.g. the path via Sibiu and Fagaras is 32 km longer than the path through Rimnicu Vilcea and Pitesti.
Time complexity: $O(b^m)$, unless we have a good heuristic function.
Space complexity: $O(b^m)$, unless we have a good heuristic function.
- A* was first proposed in 1968 to improve robot planning.
- Goal was to navigate through a room with obstacles.
- Uniform-cost orders by path cost, or backward cost $g(n)$.
- Greedy orders by goal proximity, or forward cost $h(n)$.
- A* combines the two algorithms and orders by the sum $$f(n) = g(n) + h(n)$$
- $f(n)$ is the estimated cost of the cheapest solution through $n$.
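A minimal sketch of A* tree search, ordering the fringe by $f(n) = g(n) + h(n)$. The function name is hypothetical. The road distances and the straight-line distances to Bucharest used as $h$ are assumed from the standard textbook version of the Romania map, and roads are kept one-way towards Bucharest so that the toy search tree is finite.

```python
import heapq

# One-way fragment of the Romania map (assumed distances in km).
GRAPH = {
    "Arad": {"Sibiu": 140},
    "Sibiu": {"Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Bucharest": 211},
    "Rimnicu Vilcea": {"Pitesti": 97},
    "Pitesti": {"Bucharest": 101},
    "Bucharest": {},
}

# Straight-line distance to Bucharest (assumed values), used as h(n).
H_SLD = {
    "Arad": 366, "Sibiu": 253, "Fagaras": 176,
    "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0,
}


def a_star_search(start, goal, graph, h):
    # Fringe entries are (f, g, path) with f = g + h(last state of path).
    fringe = [(h[start], 0, [start])]
    while fringe:
        _, g, path = heapq.heappop(fringe)
        state = path[-1]
        # Goal test at expansion time, so a cheaper route can still overtake.
        if state == goal:
            return g, path
        for child, cost in graph[state].items():
            g_child = g + cost
            heapq.heappush(fringe, (g_child + h[child], g_child, path + [child]))
    return None


astar_cost, astar_path = a_star_search("Arad", "Bucharest", GRAPH, H_SLD)
print(astar_cost, astar_path)
# 418 ['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest']
```

Bucharest via Fagaras enters the fringe with $f = 450$, but Pitesti ($f = 417$) is expanded first and yields Bucharest with $f = 418$, which is popped before the 450 entry.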
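A heuristic $h$ is admissible if $$0 \leq h(n) \leq h^*(n)$$ where $h^*(n)$ is the true cost of the cheapest path from $n$ to a goal.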
.caption[The Manhattan distance is admissible]
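Admissibility of the Manhattan distance can be checked numerically on a small grid. The sketch below assumes unit step costs and moves restricted to the four directions NSEW. The grid size, the wall positions and the function names are hypothetical. True costs to the goal are obtained with a breadth-first sweep from the goal and compared with the heuristic in every reachable cell.

```python
from collections import deque


def manhattan(p, q):
    # Sum of horizontal and vertical offsets between two grid cells.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def true_distances(goal, walls, width, height):
    # BFS from the goal: with unit step costs this gives the true cost to the goal.
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in walls and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist


goal = (3, 0)
walls = {(1, 0), (1, 1)}          # hypothetical 4x3 grid with a short wall
h_star = true_distances(goal, walls, width=4, height=3)

# The heuristic never overestimates the true cost in any reachable cell.
admissible = all(manhattan(cell, goal) <= cost for cell, cost in h_star.items())
print(manhattan((0, 0), goal), h_star[(0, 0)], admissible)   # 3 7 True
```

Walls can only lengthen the true path, never shorten it below the wall-free Manhattan distance, which is why the inequality holds.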
- $A$ is an optimal goal node
- $B$ is a suboptimal goal node
- $h$ is admissible
Let $n$ be an ancestor of $A$ (possibly $A$ itself) that is on the fringe while $B$ is on the fringe.
- $f(n) \leq f(A)$:
  - $f(n) = g(n) + h(n)$ (by definition)
  - $f(n) \leq g(A)$ (admissibility of $h$)
  - $f(A) = g(A) + h(A) = g(A)$ ($h=0$ at a goal)
- $f(A) < f(B)$:
  - $g(A) < g(B)$ ($B$ is suboptimal)
  - $f(A) < f(B)$ ($h=0$ at a goal)
- Therefore, $f(n) \leq f(A) < f(B)$ and $n$ expands before $B$.
- Assume $f$-costs are non-decreasing along any path.
- We can define contour levels $t$ in the state space, that include all nodes $n$ for which $f(n) \leq t$.
Greedy search
A* finds the shortest path.
Most of the work in solving hard search problems optimally is in finding admissible heuristics.
Admissible heuristics can be derived from the exact solutions to relaxed problems, where new actions are available.
- If $h_1$ and $h_2$ are both admissible and if $h_2(n) \geq h_1(n)$ for all $n$, then $h_2$ dominates $h_1$ and is better for search.
- Given any admissible heuristics $h_a$ and $h_b$, $$h(n) = \max(h_a(n), h_b(n))$$ is also admissible and dominates $h_a$ and $h_b$.
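Both claims about the maximum follow in one step each. Writing $h^*(n)$ for the true cost of the cheapest path from $n$ to a goal, admissibility of $h_a$ and $h_b$ gives
$$h_a(n) \leq h^*(n) \;\text{ and }\; h_b(n) \leq h^*(n) \;\Rightarrow\; \max(h_a(n), h_b(n)) \leq h^*(n),$$
so $h$ is admissible, and by definition of the maximum
$$\max(h_a(n), h_b(n)) \geq h_a(n) \;\text{ and }\; \max(h_a(n), h_b(n)) \geq h_b(n),$$
so $h$ dominates both.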
- Assuming an episodic environment, an agent can learn good heuristics by playing the game many times.
- Each optimal solution $s^*$ provides training examples from which $h(n)$ can be learned.
- Each example consists of a state $n$ from the solution path and the actual cost $g(s^*)$ of the solution from that point.
- The mapping $n \to g(s^*)$ can be learned with supervised learning algorithms.
  - Linear models, neural networks, etc.
The failure to detect repeated states can turn a linear problem into an exponential one. It can also lead to non-terminating searches.
Redundant paths and cycles can be avoided by keeping track of the states that have been explored. This amounts to growing a tree directly on the state-space graph.
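A minimal sketch of graph search, here built on a breadth-first fringe. The function name and the small cyclic graph are hypothetical. The only change with respect to tree search is the `explored` set, which prevents a state from entering the fringe twice.

```python
from collections import deque


def breadth_first_graph_search(start, goal, graph):
    fringe = deque([[start]])
    explored = {start}                 # states that have already been generated
    while fringe:
        path = fringe.popleft()
        state = path[-1]
        if state == goal:
            return path
        for child in graph[state]:
            if child not in explored:  # skip repeated states
                explored.add(child)
                fringe.append(path + [child])
    return None


# Small undirected graph with cycles (hypothetical).
CYCLIC = {
    "S": ["A", "B"],
    "A": ["S", "B", "G"],
    "B": ["S", "A"],
    "G": ["A"],
}

print(breadth_first_graph_search("S", "G", CYCLIC))   # ['S', 'A', 'G']
print(breadth_first_graph_search("S", "Z", CYCLIC))   # None
```

Without the `explored` set, the second call would bounce between `S`, `A` and `B` forever. With it, each state is generated at most once and the search stops when the fringe is empty.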
- Completeness is fine.
- Optimality is tricky. Maybe we found the wrong one!
- We start at $S$ and $G$ is a goal state.
- Which path does graph search find?
Consequences of consistent heuristics:
- $f(n)$ is non-decreasing along any path.
- $h(n)$ is admissible.
- With a consistent heuristic, graph-search A* is optimal.
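The first consequence can be checked in one line, assuming the usual definition of consistency, $h(n) \leq c(n, a, n') + h(n')$ for every successor $n'$ of $n$ reached by action $a$ (stated here as an assumption, since the definition does not appear above):
$$f(n') = g(n') + h(n') = g(n) + c(n, a, n') + h(n') \geq g(n) + h(n) = f(n).$$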
- Task environment?
- performance measure, environment, actuators, sensors?
- Type of environment?
- Search problem?
- initial state, actions, transition model, goal test, path cost?
- Good heuristic?
A* in action
- Problem formulation usually requires abstracting away real-world details to define a state space that can feasibly be explored.
- Variety of uninformed search strategies (DFS, BFS, UCS, Iterative deepening).
- Heuristic functions estimate costs of shortest paths. Good heuristics can dramatically reduce search cost.
- Greedy best-first search expands lowest $h$, which turns out to be incomplete and not always optimal.
- A* search expands lowest $f=g+h$. With an admissible heuristic (consistent, for graph search), this strategy is complete and optimal.
- Graph search can be exponentially more efficient than tree search.
The end.