chapter4

Graph concepts

Graphs

  • \(G = (V,E)\)
  • V = verticies, nodes
  • \(E \subseteq V \times V\) = unordered pairs of verticies = edges
  • an edge from a node to itself is a loop
  • a loop-less graph is a simple graph
  • edge is incident with its nodes
  • digraph uses ordered pairs of vertices for edges

Subgraphs

  • \(V_H \subseteq V \land E_H \subseteq E\)
  • A subgraph has vertices and edges that are in a graph

Degree

  • degree of node is the number of edges incident with it
  • Probability mass function = probability that node has degree k = \(f(k) = P(X=k) = \frac{N_k}{n} \)
  • \(N_k\) is the number of nodes with degree k, n is total number of nodes

Path and Distance

  • walk is an ordered sequence of vertices, an edge between every pair
  • Number of edges is the distance (number of hops)
  • trail: distinct edges
  • path: distinct vertices
  • closed path with length > 3 is a cycle
  • Minimum length path is the "shortest path"
  • minimum distance is called the distance \(d(x,y)\)
  • if no path exists between two vertices, distance is \(d(x,y) = \infty\)

Connectedness

  • nodes are connected if path exists between them
  • graphs are connects if path exists between all pairs of vertices
  • component is the maximal (biggest) connected subgraph. If graph has one component it is connected, other it is disconnected
  • Directed graph is strongly connected if ordered path exists between all vertices

Adjacency Matrix

  • graph can be represented as an n x n symmetric binary adjacency matrix
  • \(A(i,j) = \begin{cases}1 \text{ if $v_i$ is adjacent to $v_j$} \\ 0 \text{ otherwise }\end{cases}\)
  • Directed graph is not symmetric
  • weighted graph = \(A(i,j) = \begin{cases}w_{ij} \text{ if $v_i$ is adjacent to $v_j$} \\ 0 \text{ otherwise }\end{cases}\)
  • threshold can converted weighted graph to binary graph

Graphs from Data Matrix

  • Graph is created from data where edges have weight \(w_{ij} = sim(x_i,x_j)\)
  • \(sim(x_i,x_j) = \) similarity between \(x_i\) and \(x_j\)
  • \(sim(x_i,x_j = \exp \left\{-\frac{\| x_i-x_j\|}{2\sigma^2}\right\}\)
  • \(\sigma\) is the spread parameter

Topological Attributes

Degree

  • \(d_i = \sum\limits_jA(i,j)\)
  • average degree: \(\mu_d = \frac{\sum_jd_i}{n}\)
  • Same for indegree and outdegree

Average Path Length

  • \(\mu_L = \frac{\sum_{i}\sum_{j>i}d(v_i, v_j)}{\begin{pmatrix}n \\2 \end{pmatrix}} = \frac{2}{n(n-1)}\sum\limits_i\sum\limits_{j>i}d(v_i, v_j)\)

Eccentricity

  • Maximum distance between two nodes
  • \(e(v_i) = \max\limits_j\{d(v_i, v_j)\}\)

Radius and Diameter

  • radius is the minimum eccentricity of a graph
  • \(r(G) = \min\limits_i\{e(v_i)\} = \min\limits_i\left\{\max\limits_j\{d(v_i, v_j)\}\right\}\)
  • Diameter is the maximum eccentricity in a graph
  • \(d(G) = \max\limits_i\{e(v_i)\} = \max\limits_{i,j}\{d(v_i, v_j)\}\)

Clustering Coefficient

  • measure of the density of edges neighboring a vertex
  • \(C(v_i) = \frac{\text{no. of edges in $G_i$}}{\text{maximum number of edges in $G_i$}} = \frac{m_i}{\left(\begin{smallmatrix}n_i \\ 2 \end{smallmatrix}\right)} = \frac{2 \cdot m_i}{n_i(n_i-1)}\)
  • The clustering coefficient of a graph is the average clustering coefficient over all of the nodes

Efficiency

  • defined as \(\frac{1}{d(v_i,v_j)}\)
  • if not connected, \(d(v_i, v_j) = \infty\) and the efficiency is 0
  • Efficiency of a graph is the average efficiency over all pairs of nodes.

Centrality Analysis

Basic Centralities

Degree Centrality

  • higher the degree, more important or central the vertex is

Eccentricity Centrality

  • Less eccentric, the more central the vertex is
  • \(c(v_i) = \frac{1}{e(v_i)\)

Closeness Centrality

  • sum of all the distances from a node, higher the distance, less central
  • \(c(v_i) = \frac{1}{\sum_jd(v_i, v_j)\)
  • node with the smallest total distance (most central) is the median node

Betweenness Centrality

  • how many shortest paths between all pairs of nodes include the vertex
  • \(\gamma_{jk}(v_i) = \frac{n_{jk}(v_i)}{n_{jk}\)
  • betweeeness centrality = \(c(v_i) = \displaystyle\sum\limits_{j\ne i} \displaystyle\sum\limits_{\begin{smallmatrix}k \ne i \\ k > j\end{smallmatrix}} \gamma_{jk}\)

Web Centralities

Prestige

  • eigenvector centrality
  • more links to node, higher the centrality
  • also depends on the prestige of linked nodes

PageRank

  • webpages(Nodes) connected by hyperlinks(edges)
  • Rank is assigned based on probability of landing on a page

Normalized Prestige

Hub and Authority Scores

Random Jumps

Graph Models

Small World Property

Scale-free Property

Clustering Effect

Average Degree

Degree Distribution

Clustering Coefficient

Diameter

Watts-Strogatz Small-world Graph Model

Clustering COefficient and Diameter of Regular Graph

Random Perturbation of Regular Graph edge rewiring

edge shortcuts

Properties of Watts-Strogatz Graphs Degree Distribution

Clustering Coefficient

Diameter

Barabasi-Albert Scale-free Model

Initilization

Growth and Peferential Attachment

Degree Distribution

Discrete Approach

Continous Approach

Clustering Coefficient and Diameter