7. Trees, Graphs, Networks
vis.stanford.edu
linkanalysisnow.com
www.ibiblio.org
champagnewhisky.com
Relationships Between Data
• Hierarchical relationships
• Interconection,continuity
• Derivation (sequence)
• Shared classification
• Similarity of values
• Similarity of attributes
www.caida.org
Visualization of Hierarchical Structures
• Hierarchical structures = trees
• Algorithms for visualization:
– Space-filling – maximal usage of screen space
– Non-space-filling
blog.revolutionanalytics.com
Space-filling Methods
• Two most common types:
– Rectangular layout
• Treemaps
– Radial layout
• Sunburst view
www.oreillynet.com
www.cc.gatech.edu
Rectangular Layout
• Treemaps
– Basic form - recursive splitting of rectangle
alternatively by horizontal and vertical lines
– Other variants, e.g.:
• Squarified treemaps
• Nested treemaps
hcil2.cs.umd.edu
www.mathworks.com
Treemap - Pseudocode
Naming:
Width = width of the rectangle
Height = height of the rectangle
Node = root node of the tree
Position = position of rectangle (e.g., [0, 0])
Orientation = orientation of the split – alternating
horizontal and vertical split
treemap(Node n, Orientation o, Position orig, Width
w, Height h)
treemap(Node n, Orientation o, Position orig, Width w,
Height h)
if n is the leaf node (has no child nodes)
draw_rectangle(orig, w, h);
return;
for each child node of the n (child_i)
get number of leaf nodes in the subtree;
sum up the total number of the leaf nodes;
compute the percentage of leaf nodes in each subtree
(percent_i);
if orientation is horizontal
for each subtree
compute the offset of the origin based on the
origin and the width (offset_i);
treemap(child_i, vertical, orig+offset_i,
w*percent_i, h);
else
for each subtree
compute the offset of the origin based on the
origin and the height (offset_i);
treemap(child_i, horizontal, orig+offset_i, w,
h*percent_i);
Radial Layout
• Sunburst displays
• Root of the hierarchy placed in the center of
the radial view, individual layers of hierarchy
represented by concentric rings
• Rings are split based on the number of nodes
at the given level
• Radial techniques display inner nodes as well
as leaves
Sunburst - Pseudocode
Naming:
Start = start angle of the node (initially 0)
End = end angle of the node (initially 360)
Origin = position of the center of the radial view
(e.g., 0, 0])
Level = current level of hierarchy (initially 0)
Width = width of each ring – based on maximal depth
of the hierarchy and the size of display
sunburst(Node n, Start st, End en, Level l)
sunburst(Node n, Start st, End en, Level l)
if n is the leaf node (has no child nodes)
draw_radial_arc(Origin, st, en, l*Width,
(l+1)*Width);
return;
for each child nod of the n (child_i)
get number of leaf nodes in the subtree;
sum up the total number of the leaf nodes;
compute the percentage of leaf nodes in each
subtree (percent_i);
for each subtree
compute the initial/end angle based on the size
of subtrees, their order and the range of
angles;
sunburst(child_i, st_i, en_i, l+1);
Color in Hierarchical Techniques
• Highlighting of many attributes, e.g.:
– Values in nodes
– Enhancement of hierarchical relationships (similar
color for parent and child nodes)
• Additional properties displayed by symbols,
marks and labels placed into rectangular or
radial segments
protempore.net
Non-Space-Filling Methods
• Node-link diagrams are the most common
• Diagrams dependent mostly on two factors:
– Degree of the nodes (number of branches, e.g.,
binary trees)
– Depth
swamitra.blogspot.com
Design of the Algorithms for
Rendering Node-Link Graphs
• Three categories of rules:
– Rendering conventions – shape and curvature of
edges, placement of the nodes into fixed grid,
neighbors at the same vertical position, …
– Restrictions– placement of the nodes at given
position, displaying nodes in close proximity,
orientation of edges, …
– Aesthetics– minimization of edge intersections,
keeping the width/height ratio of the graph,
minimization of the overall graph area, minimization
of the edge length, minimization of the edge
bending, minimization of the number of different
angles and curves, effort to keep the symmetry
Design of the Algorithms for
Rendering Trees
1. Divide the rendering area into slices of the same
height – the number of slices is derived from the
depth of the tree
2. For each level of the tree, determine how many
nodes need to be drawn
3. Divide each slice into rectangles of same size –
derived from the number of nodes at the given
level
4. Draw each node into the corresponding rectangle
5. Draw connecting line leading from the center of
the bottom edge of each node to the center of
the top edge of the its child nodes
Optimalisation
• Improve the usage of the screen space
• Examples:
– Each level divided by the number of leaf nodes in the
corresponding subtree
– Uniform layout of the leaf nodes + centering of their
parent nodes
– Adding additional gaps between neighboring nodes
that do not have common parent (i.e., are not siblings)
– Reorganization of the tree in order to improve the
symmetry and balance
– Root node at the center of the screen, child nodes
arranged radially around it
Application of Third Dimension
• For large trees, complemented by rotation,
translation and zooming
• E.g., cone trees – child nodes are uniformly
radially placed around parent node
– Primary parameters are
radius and translation
distance
www.dcc.uchile.cl
Arbitrary Graphs/Networks
• Trees are just one category of graphs –
connected unweighted acyclic graphs
• Other graph representations exist, e.g., graphs
with weighted edges, undirected graphs, cyclic
graphs, disjoint graphs, …
• We will focus on arbitrary graphs – their
structure is not known. Two different
approaches:
– Node-link diagrams
– Matrix displays
Node-Link Diagrams
• Force-directed graphs – strings between nodes
• Links between nodes are iteratively adjusted,
until the stress values is minimalized
• Example:https://vega.github.io/vega/examples
/force-directed-layout/
http://www.absint.com/aisee/gallery.htm
Planar Graphs
• Edges are not intersecting
• Very popular
– Long history, many studies
– Edge intersections complicate the interpretation of
the graph, it is better to avoid them
– Sparse – according to Euler’s formula for n nodes
there are at most 3n – 6 edges
• Graphs with intersections can be transformed to
planar graphs by introduction of „dummy“ nodes
at intersection points – after transforming the
graph to planar layout, the dummy nodes are
removed
Planar Graphs
• We have to set a strategy, by which to decide
if the graph is planar – algorithms are either
complex or computationally expensive
• Simplification:
– Graph is planar if all its connected components
are planar
– Connected graph is planar only if all its
biconnected components are planar
• We therefore only need algorithm for
detection of planarity of biconnected graph
Algorithm for Detection of Planarity
of Biconnected Graph
• Divide and conquer approach
• If the graph contains cycle and does not contain any
other cycle sharing an edge of the original cycle, then
after removal of this cycle only paths (“pieces”) starting
and ending in one of the cycle vertices remain
• If two pieces intersect, one must be drawn inside of the
cycle and one outside
• If we create a graph of paths, where the intersecting
paths are divided by an edge and the graph is bipartite
(separable into two sets of nodes, such that within the
nodes of one set no edge exists), then the original
graph is planar
Algorithm for Detection of Planarity
of Biconnected Graph
Algorithm for Detection of Planarity
of Biconnected Graph
• If the graph still contains cycles after the
removal of the edges of the original cycle, it
means that one or more pieces contain cycle
• Create subgraph containing this piece and the
parts of original cycle containing the
endpoints of the piece
• Recursively call the algorithm for detection of
planarity
Example
Algorithm for Detection of Planarity
of Biconnected Graph
• If the graph is not planar, we can planarize it:
1. Identify the largest planar subgraph of the
original graph
2. Place remaining nodes into graph faces in
a way that minimizes edge intersections
3. For each edge intersection split the
crossing edges into two parts and create a
“dummy” node at the intersection point
Algorithm for Detection of Planarity
of Biconnected Graph
• Separating cycle is such a cycle that generates
at least two pieces
Given biconnected graph G and separating cycle C:
1. Compute all pieces of G relative to C.
2. For each piece P, which is not a simple path
(i.e., contains a cycle):
a. Create graph G’ from piece P and cycle C.
b. Create cycle C’ containing a path through P and
a part of cycle C connecting the ends of the P.
c. Apply this algorithm to (G’, C’). If the result
if not planar, then also G is not planar.
3. Compute the graph I from the adjusted pieces
of G.
4. If I is not bipartite, G is not planar; else
G is planar.
Rendering Planar Graphs
• Visibility approach – 2 steps:
– Visibility step – create a representation, where
each node is rendered as horizontal segment and
each edge as vertical line connecting
corresponding segments of the nodes
– Replacement step – each segment corresponding
to a node „shrinks“ into a point and each vertical
connector is replaced by polyline
Rendering Planar Graphs
Matrix Representation of Graphs
• Adjacency matrix = grid of NxN dimensions (N
is the number of nodes)
– Binary or containing values of forces or weights of
edges
– Overcomes the problem of edge intersections
– The strategy for organisation of rows and columns
is crucial for revealing interesting patters and
structures of the graph
Matrix Representation of Graphs
• Variety of algorithms for row and column
reorganisation
– User driven / automatic
– Finding the optimal solution is NP-complete
problem, therefore the application various
heuristics is necessary
en.wikipedia.org
Labelling
• Necessary for
understandingwhat the
graph represents
• When rendering trees and
graphs the labelling is
difficult, because the
graphs may contain large
amount of nodes and
labelling of edges may also
be required
labelwallpapers.blogspot.com
Labelling
• For small amount of different labels it is better to
use color, size, or shape of the node or color,
thickness and the style of the edge line
• If the number of different labels is over 5 or 6 –
use textual labels
• Small graphs – inserted directly into the node
(rectangular or oval shape of node) – size of the
nodes based on the longest text
• Unified placement of labels for edges:
– Vertical – everything left or everything right to the edge
– Horizontal – everything above or everything below the
edge
Labeling
• If the amount of different labels is too large,
showing them all at once is not effective. We
can use different strategies:
– Showing labels only in the neighbourhood of
current cursor position
– Various deformations of the visualization, so we
can dedicate the given part of the graph larger
amount of screen space
– Rotation of graph in order to reduce overlaps and
occlusions of the labels
Labelling
Labeling
• Showing random subset of labels for short
time interval and then showing different
random subset etc.
• Short-term memory of the user enables
remembering of larger amount of labels than
when using static view
Labelling
• LadislavČmolík, Jiří Bittner, Layout-awareoptimization forinteractivelabellingof
3D models, Computers & Graphics, Volume 34, Issue 4, August 2010, pages 378-
387, ISSN 0097-8493
Trees, Graphs, and Interaction
• Generic types of interaction
– E.g., trailing camera, zooming
– Common for all types of visualization
• Specialized interactions
– E.g., focus + context
– Applicable for a large variety of visualizations, but
primarily developed for visualization of trees and
graphs
Interactions with Virtual Camera
• Common interactions (trailing camera, zoom,
rotation) that are considered to be simple
changes applied to the virtual camera
observing a certain part of the scene
• Operations are directed manually or
automatically (e.g., flights over the scene,
automatic rotation of 3D objects)
Interactions with Graph Elements
• Starts with selection
– We isolate a single or more graph components
• Graphs with disarranged complicated clusters can
be adjusted:
– Select a set of nodes and drag it to the less cluttered
part of the screen
– Select, translate, or change the shape of edges in
order to eliminate their intersections or increase the
aestheticalvalue of the graph
• Problems – in dense cluttered regions it is almost
impossible to select object unambiguously
Interactions with Graph Structure
• Two basic classes of interaction:
1. Changing the structure of the graph (e.g.,
reordering the branches of the tree)
2. Focus+context techniques, where a subset of the
structure is represented in detail and the rest of
the structure just in outline (e.g., fish eye)
• Performed in screen space
• Performed in structure space – suitable for graphs
(enlargement of one branch (see image on the next
slide), highlighting of edges adjacent to a given node, …)
Interactions with Graph Structure
Interactions with Graph Structure
• Selective hiding or removal of the selected
parts of the graph
Text and Document Visualization
giladlotan.coms
joeganley.com
vialab.science.uoit.ca
Definitions
• Corpus = set of documents
• We work with objects inside of corpus –
words, sentences, paragraphs,whole
documents, other corpora as well as images
and videos
• Texts and documents – structured, contain
attributes and metadata
• Systems for data mining from documents –
querying
Levels of Text Representation
• Three levels:
– Lexical
– Syntactic
– Semantic
• Each level requires certain conversion of
unstructured text to some form of structured
data
Lexical Level
• Transformationof string of characters into
sequence of atomic entities = tokens
• Lexical analysers process sequence of
characters with a given set of rules into new
sequence of tokens
• Tokens contain characters , n-grams, words,
lexemes, phrases, …
• Finite state automata
Syntactic Level
• Identificationand annotation of function of
each token
• Assignment of marks – position of sentence,
lexical category, singular or plural, affinity of
tokens, …
• Process of annotation extraction = NER
(named entity recognition)
Semantic Level
• Extraction of the meaning and the relationship
of the findings derived from the structures
identified in syntactic level
• Goal is to define analytic interpretation of the
whole text in the given context or independent
of the context
Vector Space Model (VSM)
• Algebraic model for representation of text
documents
• „Term vector“ = vector, in which each dimension
represent a weight of a given word in a document
• Removal of „stop words“ (the, a, …) for reduction
of noise
• Aggregation of words with the same root
• Example of term vector is a hash table which
maps unique terms to the number of their
occurrences in the document
Vector Space Model
Count-Terms(tokenStream)
1.terms := ∅; /* initialize terms
to empty hash table*/
2. for each token t in tokenStream
3. do if t is not stop word
4. do increment (or initialize
to 1) terms[t];
5.return terms;
Example
• Text example:
There is a great deal of controversy about the safety of
genetically engineered foods. Advocates of biotechnology
often say that the risks are overblown. ‘‘There have been
25,000 trials of genetically modified crops in the world,
now, and not a single incident, or anything dangerous in
these releases,’’ said a spokesman for Adventa Holdings, a
UK biotech firm. During the 2000 presidential campaign,
then-candidate George W. Bush said that ‘‘study after
study has shown no evidence of danger.’’ And Clinton
Administration Agriculture Secretary Dan Glickman said
that ‘‘test after rigorous scientific test’’ had proven the
safety of genetically engineered products.
Example
• The aforementioned paragraph contains 98
string tokens, 74 terms aand 48 terms after
removal of stop words
• Example of term vector generated by the
pseudocode:
VSM – Computation of Weights
• Various methods of weight assignment, most
well knownis term frequencyinverse
document frequency (TfIdf)
• If Tf(w) is term frequency = number of
occurences of a word w in a document, Df(w)
is document frequency = number of
documents, which contain word w, N is
number of documents, then TfIdf(w) is:






=
)(
log*)()(
wDf
N
wTfwTfIdf
VSM – Computation of Weights
• We are interested in words that are frequently
occurring in one document but are not so
common in other documents of the corpus
Compute-TfIdf(documents)
1 termFrequencies ← ∅ // Looks up term count tables fordocument names.
2 documentFrequencies← ∅ // Counts the documents in which a term occurs.
3 uniqueTerms← ∅ // The list of all unique terms.
4 for each document d in documents
5 do docName ← Name(d) // Extract the name of the document.
6 tokenStream ← Tokenize(d) // Generate document token stream.
7 terms ← Count-Terms(tokenStream)// Count the term frequencies.
8 termFrequencies[docName]← terms // Store the term frequencies.
9 for each term t in Keys(terms)
10 do increment (or initialize to 1) documentFrequencies[t]
11 uniqueTerms← uniqueTerms ∪ t
12
13 tfIdfVectorTable ← ∅ // Looks up tf-idf vectors for document names.
14 n ← Length(documents)
15 for each document name docName in Keys(termFrequencies)
16 do tfIdfVector ← create zeroed arrayof length Length(uniqueTerms)
17 terms ← termFrequencies[docName]
18 for each term t in keys(terms)
19 do tf ← terms[t]
20 df ← documentFrequencies[t]
21 tfIdf ← tf ∗ log(n/df)
22 tfIdfVector[index of t in uniqueTerms]← tfIdf
23 tfIdfVectorTable[docName] ← tfIdfVector
24 return tfIdfVectorTable
Zipf’s Law
• In a typical document in a natural language
the frequency of any word is inversely
proportionalto its rank in the frequency table.
Tasks Using Vector Space Model
• Vector Space Model complemented by distance
metric allows for whole set of various tasks:
– TfIdf in combination with VSM – identification of
interesting documents, querying
– Providing information about the meaning of the whole
corpus – searching for patterns,clusters, …
• Visualization pipeline can be well mapped onto
visualization of the documents:
– Dataacquisition (corpus), transformation to vectors,
processing by algorithms based on the tasks,
visualization
Visualization of Individual Documents
• Tag clouds
Visualization of Individual Documents
• WordTree – tree branches represent various
contexts in which the word from the root of
the tree occurs in the document – also
showing frequency of terms
Visualization of Individual Documents
• TextArc
Smith Williams – A History of Science
Visualization of Individual Documents
• Arc Diagrams
–visualizing repeated occurrences in text
• Bach’s minuet in G dur:
Visualization of Individual Documents
• Literature Fingerprints
Visualization of Set of Documents
• The goal is to place similar documents close to
each other and different documents far apart
• Algorithm computes similarity between all
pairs of documents and this then drives their
placement – O(n2) complexity
Visualization of Set of Documents
• Self-Organizing Maps – machine learning
algorithm. Collection of 2D nodes, into which we
place the documents. Each node contains vector
of the same dimensionality as the input vectors of
the training documents
• At the beginning initialize the SOM nodes –
typically randomly. Select random vector,
compute its distance from the other nodes. Assign
weights to the closest nodes (within given radius)
– the closest node has the largest weight. Iterate
over input vectors while decreasing the radius.
Self-Organizing Maps
Self-Organizing Maps - Example
Visualization of Set of Documents
• Themescapes – summary informationabout
corpuses in the form of 3D terrains
• Height and color used for representation of
density of similar
documents
www3.sympatico.ca
Visualization of Set of Documents
• Document cards – represent the semantics of
documents in the form of mixture of images and
key terms
• Terms are acquired by advanced data mining
algorithms
• Images are acquired by using graphical heuristics
• Uses colored histograms of images, which classify
them into individual categories:
– 1st class: photographs/rendered images, 2nd class:
diagrams/sketches/graphs,3rd class: tables
– At least one representativefrom each non-empty
class is shown
Document Cards
Extended Methods for Text Visualization
• Include also metadata:
– Software Visualization
– Search Result Visualization
– Temporal Document Collection Visualizations
– Representing Relationships
Software Visualization
Search Result Visualization
• TileBars – each document of the result set is
represented by rectangle where width indicates
relative length of the document and layered
squares inside correspond to text segments. The
darker the square the higher the frequency of the
queried set of terms
• Compact representation and information about
the structure of the document reflecting relative
length of the document, frequency of the queried
terms and their distribution
Search Result Visualization
Example
Temporal Document Collection
Visualizations
• ThemeRiver – visualization of thematic
changes in time for a set of documents,
vertical thickness corresponds to frequency in
the given time
Temporal Document Collection
Visualizations
• Jigsaw calendar view – visualization and
investigation of text corpuses using calendar
layout
• Newspaper
articles
Representing Relationships
• Jigsaw Graph entity view
www.cs.umd.edu
Representing Relationships
• Jigsaw List view