Computer Club - Group C: Graphs Basics - Prerequisite for Class 2

Topics:-

Trees
Graphs
Adjacency matrix, Adjacency list
DFS(via stack) - simple, modified( coloring, time-in time-out)
BFS (via queue) - simple

Trees

We will start with trees. Basically a tree is a graph without loops. A more formal definition of a tree is that it is a connected acyclic graph. This simply means that there are no cycles in the graph and every node is connected to at least one other node in the graph.

For example, if you wanted to start a family tree (a hierarchical organization of children to parents, starting from one child) you would not be able to store more than one parent per child. If we want to do it via nodes, our node structure will look something like this :

struct node

     struct node* mother, father;
     char* name;                 
}                            
struct node* originalChild;

Here we can see that every node has a mother and father. And since node is a recursive structure definition, every mother has mother and father, and every father has a mother and father, and so on.
Now just define the node structure is not enough. We need to take care about certain conditions. For example, it might be possible to form a loop. And a tree clearly cannot have a loop. If there's a loop it would mean a father of a child is also the son of that child!

Our example above is known as a binary tree, since it only has two node references. There could be a situation easily where the tree requires more than two node references, for example in an organizational hierarchy, you can have a manager who manages many people then the CEO manages many managers.

Graphs

A tree only allows a node to have children, and there cannot be any loops in the tree, with a more general graph we can represent many different situations. A very common example used is flight paths between cities. If there is a flight between city A and city B there is an edge between the cities. The cost of the edge can be the length of time that it takes for the flight, or perhaps the amount of fuel used.
The way that we will represent this is to have a concept of a node (or vertex) that contains links to other nodes, and the data associated with that node. So for our flight path example we might have the name of the airport as the node data, and for every flight leaving that city we have an element in neighbors that points to the destination.

struct node

                  vector<struct node*> neighbors;

int data;               
}

    list nodes;

cost(X, Y) := if (X.neighbors contains Y) return X.neighbors[Y];
           else "Not possible"

This is a very general way to represent a graph. It allows us to have multiple edges from one node to another and it is a very compact representation of a graph as well. However the downside is that it is usually more difficult to work with than other representations (such as the array method discussed below).

Representing a graph and key concepts

Graphs can represent many different types of systems, from a two-dimensional grid to a map of the internet that shows how long it takes data to move from computer A to computer B.

A graph consists of two components, nodes and edges.

A node (or vertex) is a discrete position in the graph.

An edge (or connection) is a link between two vertices that can be either directed or undirected and may have a cost associated with it.

An undirected edge means that there is no restriction on the direction you can travel along the edge.
So for example, if there were an undirected edge from A to B you could move from A to B or from B to A.
A directed edge only allows travel in one direction.
So if there were a directed edge from A to B you could travel from A to B, but not from B to A.

A graph G = {V, E} is defined as a set of vertices, V, and a collection of edges, E. An edge can then be defined as (u, v) where u and v are elements of V.

There are a few technical terms related to graphs:

Order – The number of vertices in a graph
Size – The number of edges in a graph
Degree of a vertex -- v is the number if edges which contain v i.e., the number of edges that are incident on v.
Multiple Edges -- edges that connect the same pair of distinct vertices.
Self-loop -- an edge which connects a vertex to itself.
Simple Graph -- A graph without multiple edges and without loops is called a . Mostly we will deal with simple graphs only
Walk -- list of vertices and edges ( v₀ e₁ v₁ e₂ . . . . e_k v_k ), such that for 1<=i<=k, the edge e_i connects v_i-1 and v_i .
Trail -- a walk with no repeated edge.
Simple Path -- a walk with no repeated vertices (hence no repeated edges).
Cycle -- a path in which no vertex is repeated, except the first and the last vertex, which are same.

Array representation( Adjacency Matrix)

This approach is used when the graphs are relatively small, with a small number of nodes and edges. When this is the case, we can use a 2-D array of integers, where the element array[i][j] represents the edge cost from node i to j.
If the connection from i to j is not possible, we use some sort of sentinel value (usually a very large or small value, like -1 or the maximum integer).

For example, the following connection matrix:

    
    A  B  C
A   0  1  5
B  -1  0  1
C  -1 -1  0

Would mean that node A has a 0 weight connection to itself, a 1 weight connection to node B and 5 weight connection to node C. Node B on the other hand has no connection to node A, a 0 weight connection to itself, and a 1 weight connection to C. Node C is connected to nobody. This graph would look like this if you were to draw it:

This representation is very convenient for graphs that do not have multiple edges between each node, and allows us to simplify working with the graph.

Adjacency List :-

Try to implement it using vector<vector<int> > . Try questions on sites to see if working properly. This implementation is very important.

DFS

The basic concept is to visit a node, then push all of the nodes to be visited onto the stack. To find the next node to visit we simply pop a node of the stack, and then push all the nodes connected to that one onto the stack as well and we continue doing this until all nodes are visited.
It is a key property of the Depth First search that we not visit the same node more than once, otherwise it is quite possible that we will recurse infinitely. We do this by marking the node as we visit it.
Depth-first: visit all neighbors of a neighbor before visiting your other neighbors

First visit all nodes reachable from node s (ie visit neighbors of s and their neighbors)
Then visit all (unvisited) nodes that are neighbors of s
Repeat until all nodes are visited

Iterative DFS : -

           dfs(node start)

           {
               stack<node> s;
               s.push(start);
               while (s.empty() == false)

               {
                   top = s.top();
                   s.pop();

                   if (top is not marked as visited)

                   {
                       check for termination condition (have we reached the node we want to?)
                       mark top as visited;
                       add all of top's neighbors to the stack.
                   }
               }
           }

Recursive DFS : -

           dfs(node current)

           {
               mark current as visited;
               visit all of current's unvisited neighbors by calling dfs(neighbor)
           }

Modifications to give to give more functionalities:-

Mark nodes as to their status

White: not yet seen (initial color)
Gray: seen, but not finished

Node u is finished if we have visited all nodes reachable from u

Black: finished

Maintain start and finish time stamps at each node

These stamps give the time (ie step in the algorithm) when each node is first seen and when it is finished

    time := 0;   -- Global!
    DFS(G)
        for each vertex u in G loop
            u.color := white
        end loop

        for each vertex u in G loop
            if u.color = white then
                DFS-Visit(G, u)
            end if
        end loop
    end DFS

    DFS-Visit(G, u)
        u.d := ++time
        u.color := gray
        -- process vertex u, as needed
        for each vertex v adjacent to u loop
            -- process edge (u, v), as needed
            if v.color = white
                DFS-Visit(G, v)
            end if
        end loop
        u.color := black
        u.f := ++time
    end DFS-Visit

BFS

The Breadth First search is an extremely useful searching technique. It differs from the depth-first search in that it uses a queue to perform the search, so the order in which the nodes are visited is quite different.
If all of the edges in a graph are unweighted (or the same weight) then the first time a node is visited is the shortest path to that node from the source node. This comes about naturally from the FIFO property of the queue.
We do not want to visit the same node twice, or we will lose the property that when a node is visited it is the quickest path to that node from the source. Notice that we mark a vertex visited as we push it into the queue, not as we pop it.
Breadth-first Algorithm:

Depth-first: visit all neighbors before visiting neighbors of your neighbors
Start from node s.
Visit all neighbors of node s.
Then visit all of their neighbors, if not already visited
Continue until all nodes visited

Intuition of Implementation

Keep a queue of nodes to visit
When visiting node u, put neighbors of u on the queue, if neighbor not yet visited
Queue contains all nodes that have been seen, but not yet visited

The basic structure of a breadth first search will look this:

     void bfs(node start)

     {
         queue<node> s;
         s.push(start);
         mark start as visited
         while (s.empty() == false)

         {
             top = s.front();
             s.pop();
             check for termination condition (have we reached the node we want to?)
             add all of top's unvisited neighbors to the queue
             mark all of top's unvisited neighbors as visited
         }
      }

Common graph algoriths uses a breadth-first approach

Search all nodes for a node containing a given value
Find length of shortest path from node s to all other nodes
Find shortest path from node s to all other nodes

Breadth First Search - Code
Problem: find length of shortest path from s to each node

Let u.d represent length of shortest path from nodes to node u
Remember: length is number of edges from s to u

BFS(V, E, s)
    -- Initialize all nodes as unvisited
    for each node u loop
        u.d := -1 -- setting all vertices unreachable(distance = -1)
    end loop

    -- Mark first node as seen 
    -- Put first node on queue of nodes to visit
    s.d := 0  -- the distance of s from itself is 0
    enqueue(Q, s)

    -- While there are more nodes to visit
    while not isEmpty(Q) loop
        u := dequeue(Q) -- dequeues and returns front

        -- Process vertex u, as needed
        for each vertex v adjacent to u loop
            -- Process edge (u, v), as needed
            if v.d = -1 then     -- Not yet visited
                v.d := u.d + 1   -- How many times per node?
                enqueue(Q, v)
            end if
        end loop

    end loop

For in-depth knowledge , you can refer to these :-

Link1 Link2 Link3

Practice :

Write you own implementation of graphs using Adjacency List.
Write Codes of BFS, DFS and calculate their complexity.
Try these questions to check your implementations - Questions
Read about Connected Component in a graph

Computer Club - Group C

Wednesday, August 12, 2015

Graphs Basics - Prerequisite for Class 2