Traditional ML pipeline
- Design features for nodes/links/graphs
- Obtain features of training data
- Train an ML model
- Apply the model
Traditional Features
To classify nodes. Useful for predicting influential nodes in the network.
Node Degree
- the number of edges incident to the node
- treats all neighbouring nodes equally
- limitation: does not capture the importance of the neighbouring nodes
- improvement: Node Centrality
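A minimal Python sketch of the degree feature, assuming an undirected graph given as an edge list. The function name `node_degrees` and the toy graph are illustrative, not from these notes.

```python
from collections import defaultdict

def node_degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)

# Hypothetical toy graph: "a" is linked to b, c, d; c and d are also linked.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d")]
degrees = node_degrees(edges)
print(degrees)
```

Note that "c" and "d" get the same degree regardless of how important their neighbours are, which is the limitation listed above.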
Node Centrality
- takes node importance into consideration
- some examples:
- eigenvector centrality: a node is important if it is surrounded by important nodes.
- betweenness centrality: a node is important if it lies on many shortest paths between other nodes.
- closeness centrality: a node is important if it has small shortest path lengths to all other nodes.
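The three centrality measures above could be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference implementation: the graph is undirected, unweighted and connected; eigenvector centrality is approximated by power iteration on A + I (a shift chosen here to avoid oscillation on bipartite graphs); closeness is taken as 1 / (sum of shortest-path lengths); betweenness uses Brandes' algorithm without normalization. All function names are hypothetical.

```python
import math
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def eigenvector_centrality(adj, iterations=100):
    """Power iteration on (A + I), which has the same leading eigenvector as A."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x

def bfs_distances(adj, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(adj):
    """1 / (sum of shortest-path lengths to all other reachable nodes)."""
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = 1.0 / total if total > 0 else 0.0
    return result

def betweenness_centrality(adj):
    """Brandes' algorithm: shortest paths through each node, endpoints excluded."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy graph: the path a - b - c - d - e.
adj = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])
eigen = eigenvector_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
print(betweenness["c"], closeness["c"])
```

On this path graph all three measures rank the middle node "c" highest, even though "b", "c" and "d" all have degree 2.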
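Clustering Coefficients
- measures how connected the node’s neighbours are to each other
- equivalently, counts the triangles that touch the node, relative to the number of pairs of neighbours
- generalization (graphlet degree vector): counts the number of graphlets that touch the node
- characterizes the topology of the neighbourhood of the node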
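As a rough sketch, the clustering coefficient of a node can be computed as the number of links among its neighbours divided by the number of neighbour pairs. The code assumes an undirected graph stored as adjacency sets; the names and the convention of returning 0.0 for nodes with fewer than two neighbours are choices made here for illustration.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangles_at(adj, v):
    """Number of triangles touching v = number of linked neighbour pairs."""
    return sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])

def clustering_coefficient(adj, v):
    """Fraction of neighbour pairs of v that are themselves linked."""
    k = len(adj[v])
    if k < 2:
        return 0.0  # convention assumed here: no neighbour pairs, no clustering
    return triangles_at(adj, v) / (k * (k - 1) / 2)

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to a.
adj = build_adjacency([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")])
cc = {v: clustering_coefficient(adj, v) for v in adj}
print(cc)
```

Here "a" has three neighbours but only one linked pair (b, c), so its coefficient is 1/3, while "b" and "c" each score 1.0. Triangles are the simplest graphlet; counting larger graphlets follows the same idea but is not sketched here.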
To predict new links based on existing links.
Distance-based Feature
- measures the shortest-path distance between two nodes
- limitation: does not capture the degree of neighbourhood overlap
- improvement: Local Neighbourhood Overlap
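The distance feature can be sketched with a breadth-first search, assuming an unweighted, undirected graph. The toy graph is hypothetical and chosen to show the limitation above: two node pairs at the same distance but with different neighbourhood overlap.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_path_length(adj, source, target):
    """Number of edges on a shortest path, or None if target is unreachable."""
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                if w == target:
                    return dist[w]
                queue.append(w)
    return None

# p and q share two neighbours (m1, m2); p and r share only one (m1).
adj = build_adjacency([("p", "m1"), ("p", "m2"), ("q", "m1"),
                       ("q", "m2"), ("r", "m1")])
d_pq = shortest_path_length(adj, "p", "q")
d_pr = shortest_path_length(adj, "p", "r")
print(d_pq, d_pr)
```

Both pairs are at distance 2, so this feature cannot tell them apart even though (p, q) overlap more strongly.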
Local Neighbourhood Overlap
- some examples:
- number of common neighbours
- Jaccard’s coefficient
- Adamic-Adar index
- limitation: the metric is always zero when the two nodes share no common neighbours, even if they are connected through longer paths
- improvement: Global Neighbourhood Overlap
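The three local overlap measures could be sketched as below, assuming an undirected graph stored as adjacency sets and two distinct nodes u and v. Function names and the toy graph are illustrative only. Adamic-Adar is taken as the sum of 1 / log(degree) over the common neighbours.

```python
import math

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbours(adj, u, v):
    """Set of nodes adjacent to both u and v."""
    return adj[u] & adj[v]

def jaccard_coefficient(adj, u, v):
    """|N(u) & N(v)| / |N(u) | N(v)|, or 0.0 if both neighbourhoods are empty."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar_index(adj, u, v):
    """Sum of 1 / log(degree) over common neighbours of distinct nodes u, v.
    A common neighbour of two distinct nodes has degree >= 2, so log > 0."""
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])

# Hypothetical toy graph: p and q share m1 and m2; s hangs off r.
adj = build_adjacency([("p", "m1"), ("p", "m2"), ("q", "m1"),
                       ("q", "m2"), ("r", "m1"), ("r", "s")])
cn_pq = len(common_neighbours(adj, "p", "q"))
jc_pq = jaccard_coefficient(adj, "p", "q")
aa_pq = adamic_adar_index(adj, "p", "q")
cn_ps = len(common_neighbours(adj, "p", "s"))
print(cn_pq, jc_pq, aa_pq, cn_ps)
```

The pair (p, s) is connected through the path p - m1 - r - s, yet every local measure returns zero for it because the two nodes share no neighbour.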
Global Neighbourhood Overlap
- some examples:
- Katz index: counts the walks of all lengths between two nodes, with longer walks discounted (computed using powers of the adjacency matrix)
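A small sketch of the Katz index using explicit matrix powers. It assumes the usual discounted form, S = sum over l >= 1 of beta^l A^l, truncated at a maximum walk length; `beta`, `max_length` and the function names are parameters chosen here for illustration. The infinite sum only converges when beta is smaller than 1 divided by the largest eigenvalue of A.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def katz_index(adjacency, beta=0.1, max_length=10):
    """Truncated Katz scores: sum of beta^l * A^l for l = 1..max_length."""
    n = len(adjacency)
    scores = [[0.0] * n for _ in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    weight = 1.0
    for _ in range(max_length):
        power = mat_mul(power, adjacency)  # A^l: entry (i, j) counts walks of length l
        weight *= beta                     # beta^l
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
    return scores

# Hypothetical toy graph: the path 0 - 1 - 2 - 3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
katz = katz_index(A)
print(katz[0][1], katz[0][2], katz[0][3])
```

Nodes 0 and 3 have no common neighbour, yet their Katz score is positive because a walk of length 3 connects them; closer pairs receive larger scores.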
To identify similar graphs.
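Graph Kernels
- measure the similarity between two graphs by comparing bag-of-features representations
Graphlet Kernel
- bag of graphlets: represents a graph by the counts of the graphlets it contains
- limitation: computationally expensive, since counting graphlets is costly
Weisfeiler-Lehman Kernel
- uses color refinement, as in the Weisfeiler-Lehman graph isomorphism test, run for a fixed K steps so that each color summarizes a node’s K-hop neighbourhood
- bag of colors: represents a graph by the counts of the colors it contains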
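The bag-of-colors idea can be sketched as follows. Assumptions made here: every node starts with the same color, each refinement step recolors a node from its own color and the sorted colors of its neighbours, a dictionary (`palette`) shared by both graphs maps these signatures to integer color ids, and the kernel value is the inner product of the two color-count vectors. All names are hypothetical.

```python
from collections import Counter

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def wl_color_counts(adj, num_iterations, palette):
    """Bag of colors collected over the initial coloring and every refinement step."""
    colors = dict.fromkeys(adj, 0)  # color 0: the shared initial color
    counts = Counter(colors.values())
    for _ in range(num_iterations):
        new_colors = {}
        for v in adj:
            signature = (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            # Same signature -> same id, in either graph; ids start at 1.
            new_colors[v] = palette.setdefault(signature, len(palette) + 1)
        colors = new_colors
        counts.update(colors.values())
    return counts

def wl_kernel(adj1, adj2, num_iterations=2):
    """Inner product of the two graphs' color-count vectors."""
    palette = {}
    phi1 = wl_color_counts(adj1, num_iterations, palette)
    phi2 = wl_color_counts(adj2, num_iterations, palette)
    return sum(phi1[c] * phi2[c] for c in phi1.keys() & phi2.keys())

# Hypothetical toy graphs: a triangle and a 3-node path.
triangle = build_adjacency([("a", "b"), ("b", "c"), ("a", "c")])
path = build_adjacency([("a", "b"), ("b", "c")])
k_same = wl_kernel(triangle, triangle, num_iterations=1)
k_diff = wl_kernel(triangle, path, num_iterations=1)
print(k_same, k_diff)
```

The triangle matches itself more strongly than it matches the path, because after one refinement step only the path's middle node shares a color with the triangle's nodes.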
Off-the-shelf ML models (trained on the features above)
- Random Forest
- Neural Network