Quantifying Network Structure: Basic Properties, Centralities, Communities IV124 Josef Spurný & Eva Výtvarová Faculty of Informatics, Masaryk University March 17, 2023 Tools Networkx Python library basic functions for manipulation and analysis basic visualization using matplotlib http://networkx.github.io/ Useful for: interactive work using ipython scripting batch processing reproducibility! igraph http://igraph.org/ as an alternative J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 2 / 30 Tools Gephi multiplatform graphic app (Java) robust visualization many measures and plugins time-varying and dynamic graphs Useful for: interactive exploratory analysis visualization Cytoscape http://www.cytoscape.org/, UCINET http://bit.ly/1zkNUk6, yEd http://bit.ly/1piw0j0 as alternatives J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 3 / 30 Tools Gephi – file formats GEXF file format Gexf . net A hello world ! f i l e J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 4 / 30 Basic properties Connected Components, Histogram Is a network fully connected? strongly connected components weekly connected components Histogram – an approximate representation of the distribution of numerical data J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 5 / 30 Basic properties Node Degree # edges connecting a node with others # non-zero elements in a row (a column) in an adjacency matrix in-degree/out-degree in oriented (directed) networks Interpretation? Average node degree For an undirected network k = 2|E| |V| every edge contributes to two nodes Degree distribution P(k) probability that a random node has a degree of k an average is not enough description parameter in real networks J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 6 / 30 Basic properties Paths and distance A path in a network a sequence of edges connecting two nodes path length in unweighted networks: # edges path length in weighted networks: depends on the semantics A distance of two nodes d a length of the shortest path there can be more than one shortest path d = ∞ if two nodes are unconnected A network diameter D the longest distance between any two nodes in a network J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 7 / 30 Basic properties Paths and distance II. Computing distance Unweighted network breadth-first search Weighted network Dijkstra algorithm Average path length d all-to-all Floyd-Warshall algorithm Interpretation efficiency of spreading e.g. information network integration J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 8 / 30 Basic properties Path length – examples J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 9 / 30 Node Importance Node Importance Types of questions we are interested in: which individuals are key for disease spreading? how to target attacks on a network? how to improve spreading information in an organization? which web pages are more valuable than others? which people have the highest influence on forming group opinion? ... J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 10 / 30 Node Importance Centrality as Node Importance The importance of a node depends on: its attributes location in the network A choice of a suitable measure depends on: research question semantics of particular network of interest J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 11 / 30 Degree Centrality Node Degree as Centrality Node with high degree: is highly connected to the rest of the network has a direct impact on a large number of other nodes (neighbours) In directed network: in-degree and out-degree substantial difference in interpretation! Node degree does not indicate the importance of its neighbours J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 12 / 30 Degree Centrality Node Degree: Example I. World Trade Network directed network degree refers to the number of business partners in-degree: import out-degree: export An evolution of nodes with the highest degree centrality reflects structural changes in world trade higher overall connectedness (lesser differences) changes in the composition of the most central group J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 13 / 30 Degree Centrality World Trade Network1 in-degree out-degree 1960 1. 0.6438 UK 1. 0.5987 USA 2. 0.5954 Netherlands 2. 0.5861 UK 3. 0.5866 France 3. 0.5740 France 2000 1. 0.8920 USA 1. 0.8636 USA 1. 0.8920 Germany 1. 0.8636 UK 3. 0.8808 UK 1. 0.8636 France 1 De Benedictis, L., & Tajoli, L. (2011) J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 14 / 30 Degree Centrality Node Degree: Example II.2 Protein network of Helicobacter pylori bacteria undirected network, edges represent known physical interaction (catalysis, signalization...) elimination of a specific protein has known effects Network Robustness relatively high tolerance to random mutations removal of high node degree proteins is fatal correlation between node degree and severity of consequences r = 0.75 2 Jeong, Hawoong, et al. (2001) J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 15 / 30 Path-based Centrality Shortest Path and Centrality Even nodes with a low degree may be important J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 16 / 30 Path-based Centrality Shortest Path and Centrality Even nodes with a low degree may be important J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 17 / 30 Path-based Centrality Closeness Centrality "To be in the center of action" inversely proportional to avg shortest path to other nodes advantageous position for spreading information in terms of influencing other nodes Definition Cc(i) = N j=1 d(i, j) −1 normalized C′ c(i) = Cc(i) N−1 J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 18 / 30 Path-based Centrality Betweenness Centrality J. Spurný, E. Výtvarová ·Observations ·March 17, 2023 19 / 30 Path-based Centrality Betweenness Centrality Represents brokerage nodes connecting clusters advantageous position for spreading information Definition Cb(i) = j