Distances

Depth-first search readily identifies all the vertices of a graph that can be reached from a designated starting point, and finds explicit paths to vertices summarized in a search tree. However, these paths may not be the most economical ones, and here we discuss how to find the shortest paths in graphs.

The distance between two nodes is the length of the shortest path between them. Path lengths allow us to discuss quantitatively how separated different vertices are.

To get a concrete feel for this notion, consider a physical manifestation of a graph with balls and strings. If you lift the ball for vertex $s$ high enough, the other balls that get pulled up with it are precisely reachable from $s$. To find their distances from $s$, measure how far below $s$ they hang.

The shortest paths are the paths that are taut when one vertex is held up. On the other hand, edges that play no role in any shortest paths involving the vertex being held up remain slack.

Breadth-first search

In the figure above, the lifting of $s$ partitions the graphs into layers, the nodes at a distance 0, 1, and 2 from $s$. A convenient way to compute distances from $s$ to other vertices is to proceed layer by layer. Once we have picked out nodes at a distance $0,1,2,\cdots,d$, the ones at $d+1$ are easily determined.

Breadth-first search (BFS) implements an iterative algorithm in which two layers are active at any given time: some layer $d$, which has been fully identified, and $d+1$, which is being discovered by scanning neighbors of layer $d$.

algorithm BFS(G, s):
	// Input: Graph G = (V,E), directed or undirected; vertex s in V
	// Output: For all vertices u reachable from s, dist(u) is set as the distance from s to u
	
	for all u in V:
		dist(u) = infinity

	dist(s) = 0
	Q = [s] (queue containing just S)
	while Q is not empty:
		u = eject(Q)
		for all edges (u, v) in E:
			if dist(v) = infinity:
				inject(Q, v)
				dist(v) = dist(u) + 1

<aside> 🕟 The overall running time of this algorithm is $O(|V|+|E|)$, for exactly the same reason as DFS. There are $2|V|$ queue operations, and over the course of execution, the loop looks at each edge either once or twice, taking $O(|E|)$ time.

</aside>

Comparing BFS and DFS

As we saw in the previous unit, DFS makes deep incursions into a graph, retreating only when it runs out of new nodes. This strategy gives it the wonderful properties we discussed, but it also means it can take a long and convoluted route to a nearby vertex.

BFS makes sure to visit vertices in increasing order of their distance from the starting point. This is a broader, shallower search, and achieved with largely the same code as DFS, except with a queue replacing a stack.

Notice one stylistic difference from DFS — BFS does not restart searches in other connected components. Nodes not reachable from $s$ are ignored.

Dijkstra's algorithm

BFS treats all edges as having the same length, but this is rarely the case in applications where shortest paths are to be found. Consider a scenario where you want to find the shortest public transit route home. Picking the right route is a shortest-path problem where the length of each edge is important. From now on, we will deal with scenarios where edge lengths are not the same across every edge, and we will annotate every edge $e\in E$ with a length $l_e$. If $e=(u,v)$< we will also write $l(u,v)$ or $l_{uv}$.

These $l_e$'s do not have to correspond to physical lengths — they can denote various properties like time or money.