PV251 Visualization
Autumn 2024
Study material

Lecture 6: Geospatial data visualization

Geospatial data differs from other types of data mainly because it describes objects or phenomena that occur in the real world. Geospatial data appears in many applications, such as credit card payment systems, telephone networks, and censuses. In this lecture, we give an overview of the special features and methods used to visualize geospatial data. This area is often referred to as geovisualization. We will present its most important foundations, such as maps, and introduce visualization techniques for point, line, area, and surface data. The field is covered extensively by GIS (geographic information systems) and cartography, so here we approach it purely from the visualization point of view. After this lecture, we should have a general overview of state-of-the-art visualization techniques for geospatial data and be able to implement and use them.

Large sets of spatial data are often created by accumulating discrete samples of a continuous real-world phenomenon. In many current applications, it is important to analyze and display relationships in data that include geographic locations. Examples are modeling global climate development (e.g., measuring temperature, precipitation, or wind speed), monitoring economic and social indicators (unemployment rate, level of education, etc.), customer behavior analysis, telephone call statistics, credit card payments, and crime statistics. Thanks to the special properties of spatial data, their basic visualization is straightforward: the spatial attributes are mapped directly to the two dimensions of the output device (screen), which yields a map display.

Points, lines, areas

Maps can be considered a representation of the world reduced to points, lines, and areas.
Visualization parameters, such as size, shape, value, texture, color, and orientation, supplement the displayed data with additional information. According to the U.S. Geological Survey, map visualizations are defined as a set of points, lines, and areas that are determined by their position in a coordinate system (spatial properties) and by other "non-spatial" attributes. It follows from this definition that spatial phenomena can be distinguished by their spatial dimension into:
- Point phenomena: have no spatial extent. They can be described as 0-dimensional and specified by a pair (longitude, latitude) and a set of other attributes. Examples are buildings, oil wells, cities, …
- Line phenomena: have a length, but their width carries no data (it is chosen only for display). They can be described as 1-dimensional and specified by a sequence of pairs (longitude, latitude). Examples are large telecommunication networks, roads, and borders between states. Attributes associated with line phenomena may include capacity, traffic congestion, names, …
- Area phenomena: have both length and width. They can be described as 2-dimensional and specified by a set of pairs (longitude, latitude) that enclose a given region. Each pair can again have a set of other attributes associated with it. Examples are lakes or states on a political map.
- Surface phenomena: in addition to length and width, they also have height. They are therefore referred to as 2.5-dimensional and may be specified by a set of triples (longitude, latitude, altitude); each position (longitude, latitude) is associated with an altitude and possibly with further attributes.

Types of maps

Maps can be divided into types based on the properties of the displayed data (qualitative vs. quantitative, discrete vs. continuous) and on the properties of the so-called graphical variables (points, lines, surfaces, volumes).
Examples of the resulting map types are:
- Symbol maps (nominal point data)
- Point maps (ordinal point data)
- Land use maps (nominal area data)
- Choropleth maps: display phenomena using colored or shaded areas, for example, population density
- Line diagrams (nominal or ordinal line data)
- Isoline maps (ordinal surface data)
- Surface maps (ordinal volumetric data)

In addition, the image shows the same data displayed first as a contour map and then as a surface map.

Different types of representations

The examples make clear that the same data can be displayed using different types of maps. By aggregating point data within areas, we can create a choropleth map from a point map. Similarly, a land use map can be created from a symbol map. We can also compute a density surface from a point map and display it as an isoline map or a surface map. If we group point data inside areas and map the number of points in each area to the size of that area, we obtain a so-called cartogram, a type of thematic map. The picture shows a cartogram of the world population.

Exploratory geovisualization

In exploratory geovisualization, the ability to interact with maps is crucial. Unlike in traditional cartography, the classification and mapping of the data are adapted interactively to the user's needs, and interactive queries are supported. These interaction capabilities are provided by a number of current techniques and systems. They allow, for example, linking several maps or combining maps with standard statistical visualizations, such as bar and line charts. Maps can also be combined with more complex techniques for multidimensional visualization, such as parallel coordinates.

Map projection

Map projection plays a key role in the visualization of geospatial data. It deals with mapping positions on the globe (a sphere) to positions on the screen (a planar surface).
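To make the mapping from globe to screen concrete, the following Python sketch implements two cylindrical projections, the equirectangular (plate carrée) projection and the spherical Mercator projection, and then scales the projected coordinates linearly to pixels. The function names, the spherical Earth model, and the clamping of Mercator at ±85° latitude are assumptions made for this illustration, not part of the lecture.

```python
import math


def _check(lon, lat):
    # Longitude λ in [-180, 180] (west negative), latitude φ in [-90, 90] (south negative).
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"invalid coordinates: ({lon}, {lat})")


def equirectangular(lon, lat):
    """Equirectangular (plate carrée) projection: x = λ, y = φ, both in radians."""
    _check(lon, lat)
    return math.radians(lon), math.radians(lat)


def mercator(lon, lat, max_lat=85.0):
    """Spherical Mercator projection: x = λ, y = ln(tan(π/4 + φ/2)).

    y grows without bound towards the poles, so the latitude is clamped to ±max_lat.
    """
    _check(lon, lat)
    phi = math.radians(max(-max_lat, min(max_lat, lat)))
    return math.radians(lon), math.log(math.tan(math.pi / 4 + phi / 2))


def to_screen(x, y, width, height, y_extent):
    """Linearly map projected coordinates to pixel coordinates.

    x is assumed to lie in [-π, π] and y in [-y_extent, y_extent].
    Screen y grows downward, so north ends up at the top of the image.
    """
    px = (x + math.pi) / (2 * math.pi) * width
    py = (y_extent - y) / (2 * y_extent) * height
    return px, py
```

For the equirectangular projection, pass `y_extent = math.pi / 2`; for the clamped Mercator projection, `mercator(0, 85)[1]` gives the matching extent. Mercator is conformal but strongly enlarges areas towards the poles, which is why the poles themselves cannot be shown.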
A map projection is defined as Π: (λ, φ) → (x, y). Longitude (λ) is given in degrees from the interval [-180, 180], where negative values indicate western longitude and positive values eastern longitude. Latitude (φ) is defined similarly on the interval [-90, 90], where negative values represent southern latitude and positive values northern latitude.

Map projections can have different properties:
- Conformal projection: correctly preserves the local angles at each point of the map, which means that it also preserves shapes locally. However, areas are not preserved.
- Equivalent (equal-area) projection: every part of the map covers an area proportional to the area it covers on the sphere. The resulting map distorts shapes and angles. For example, a square on the surface of the globe may be mapped to a rectangle of the same area on the map.
- Equidistant projection: preserves distances from a particular point or along particular lines.
- Gnomonic projection: displays all great circles as straight lines. A great circle divides the sphere into two equally large hemispheres (on the globe, the equator is an example). Because great circles are the shortest paths on a sphere, a gnomonic projection shows the shortest path between two points as a straight line. An entire hemisphere cannot be displayed, because its edges "run off" to infinity.
- Azimuthal projection: preserves directions from the center point. This type of projection usually has radial symmetry; for example, the distortion depends only on the distance from the center point and not on the direction, and concentric circles centered at the center point of the projection are mapped to concentric circles centered at the center point of the map.
- Retroazimuthal projection: the direction from any point A to a fixed location L on the map corresponds to the direction from A to L on the globe.

Map projections: classification according to the type of surface

Map projections can also be classified according to the type of surface onto which the sphere is projected.
The most important types are:
- Cylindrical projection
- Planar projection
- Conical projection

Cylindrical projection

A cylindrical projection projects the surface of a sphere onto a cylinder placed around the sphere: each point of the sphere is projected onto the surrounding cylinder, which is then unrolled into a plane. Cylindrical projections have the advantage of being able to display the whole spherical surface (some of them, such as the Mercator projection, with the exception of the poles). Some cylindrical projections, such as the Mercator projection, preserve local angles and are therefore conformal. Meridians and parallels are usually straight lines perpendicular to each other.

Pseudo-cylindrical projection

Pseudo-cylindrical projections represent the central meridian and each parallel as straight lines; the other meridians are curved.

Planar projection

Planar projections are azimuthal projections that map the surface of a sphere onto a plane tangent to the sphere. The point of tangency corresponds to the center of the projection. Some planar projections are perspective projections.

Conical projection

A conical projection maps the surface of a sphere onto a tangent cone. Parallels are represented by circular arcs centered on the apex of the cone, and meridians by straight lines radiating from that center. Conical projections can be designed to preserve distances from the center of the cone. There are also a number of pseudo-conical projections that preserve, for example, both distances from the pole and distances along the parallels.

Examples of commonly used map projections

We will now list several widely used map projections. The variables used in the projection formulas are listed in the following table.

Visual variables for spatial data

Maps are used in different ways, for example, to find information about a particular location or to obtain general information about spatial patterns across many maps. The mapping of spatial data properties to visual variables must therefore reflect these goals.
The visual variables for spatial data are as follows:
- Size: size of individual symbols, width of lines, …
- Shape: shape of individual symbols or of patterns in lines and areas
- Brightness: brightness of symbols, lines, and areas
- Color: color of symbols, lines, and areas
- Orientation: orientation of individual symbols or of patterns in lines and areas
- Texture: placement of patterns in symbols, lines, and areas
- Perspective height: a perspective 3D view of phenomena, where data values are mapped to the perspective height of individual points, lines, and areas
- Arrangement: arrangement of patterns within individual symbols (for point phenomena), patterns of dots and dashes (for line phenomena), and regular vs. random distribution of symbols (for area phenomena)

The effect of editing the input data on the resulting map

Cartographic design has been studied intensively for several decades, and high-quality guidelines for map design have been created during that time. All of them are based on the results of research on human perception. In addition, the input data is often subject to various modifications (sampling, segmentation, normalization, …), which can significantly affect the resulting visualization.

As an example, consider two visualizations of the same data in the form of a choropleth map. The only difference between them is the choice of class breaks (the boundaries between the individual classes), which has a very significant impact on the generated maps.

Another example is a significant change in the resulting visualization caused only by switching from absolute to relative values. Absolute numbers are visualized on the left, while on the right they are shown relative to the size of the population. Because the population differs greatly between some areas, the relative map may convey quite the opposite impression from the one based on absolute numbers. The visualization also strongly depends on the boundaries of the areas into which the data is aggregated.
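The effect of class breaks and of absolute versus relative values can be reproduced with a few lines of Python. The sketch below is illustrative only: it compares two common classification schemes, equal intervals and quantiles (neither is prescribed by the lecture), on hypothetical counts for six regions, and then repeats the quantile classification on rates per 1,000 inhabitants.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """Inner class breaks for k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Inner class breaks so that each class holds roughly len(values) / k regions."""
    s = sorted(values)
    return [s[i * len(s) // k] for i in range(1, k)]


def classify(value, breaks):
    """Class index 0..k-1; a value equal to a break goes to the upper class."""
    return bisect_right(breaks, value)


# Hypothetical data for six regions (not taken from the lecture).
cases = [5, 8, 12, 40, 45, 300]
population = [1000, 2000, 1500, 100000, 9000, 600000]
rates = [c * 1000 / p for c, p in zip(cases, population)]  # per 1,000 inhabitants

print("equal interval, absolute:", [classify(v, equal_interval_breaks(cases, 3)) for v in cases])
print("quantile, absolute:      ", [classify(v, quantile_breaks(cases, 3)) for v in cases])
print("quantile, per 1,000:     ", [classify(v, quantile_breaks(rates, 3)) for v in rates])
```

With these made-up numbers, the last region is in the highest quantile class by absolute count but in the lowest class by rate, while equal intervals put five of the six regions into the same class.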
The picture shows the well-known case of the London cholera outbreak aggregated using different area boundaries, which leads to different choropleth maps.

Geospatial data visualization

In the next part, we focus on visualization techniques according to the type of data. We consider three basic types of data: point, line, and area data.

Visualization of point data

The first important class of spatial data is point data. Point data is discrete in nature, but it may describe a continuous phenomenon, such as the temperature measured at a given location. Depending on the nature of the input data and the required task, the designer must decide whether to display the data as continuous or discrete, and as smooth or abrupt. The figure illustrates the possible combinations of these decisions. Discrete data is assumed to occur only at certain places, while continuous data is defined everywhere. Smooth data changes gradually, while abrupt data changes suddenly.

Point phenomena can be visualized by placing a symbol at the location where the phenomenon occurs. The simplest visualization of this type is called a point map. A quantitative parameter can be mapped to the size or color of the symbol. The most commonly used symbols in point maps are circles, but squares, columns, etc. can also be used. If the size of a symbol encodes a quantitative parameter of a point, it is necessary to consider how to scale the symbols, because a mathematically correct size does not guarantee that the symbol will also be perceived correctly. The perceived size of a symbol does not necessarily correspond to its actual size, mainly because of the properties of human perception: the perceived size of a symbol depends on its local surroundings. Therefore, there is no global formula that guarantees perceptually correct symbol sizes.
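The following sketch shows the usual starting point for sizing circle symbols: making the circle area, not the radius, proportional to the value. The optional exponent is an assumption added for illustration; values slightly above 0.5 (Flannery's frequently cited 0.57) are sometimes used to compensate for the general underestimation of area differences, but, as noted above, no global formula can account for the influence of each symbol's local surroundings.

```python
def circle_radius(value, max_value, max_radius, exponent=0.5):
    """Radius of a proportional circle symbol.

    exponent=0.5 makes the circle *area* proportional to the value:
        r = max_radius * sqrt(value / max_value)
    A somewhat larger exponent (e.g. about 0.57) increases the contrast
    between large and small symbols as a perceptual compensation.
    """
    if max_value <= 0 or value < 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * (value / max_value) ** exponent
```

For example, a value of 25 with a maximum of 100 and a maximum radius of 20 pixels gives a radius of 10, that is, a quarter of the maximum area. Scaling the radius linearly instead would make the symbol area grow with the square of the value.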
The figure shows an example of how the perceived size of a circle depends on its local surroundings. This phenomenon is called the Ebbinghaus illusion. Similarly, if color is used to represent a quantitative parameter, we must take color perception problems into account.

Point maps are an elegant way to communicate a large amount of information about the relationships between point phenomena in a compact, convenient, and familiar form. However, when a large amount of data with varying density is displayed on a map, symbols overlap in dense areas (e.g., densely populated regions), while sparse areas remain essentially empty (see figure). The figure on the left shows the spatial distribution of the events. When we zoom in on the map, considerable overlap of the data becomes visible (figure on the right). Examples of this type of spatial data are credit card payments, telephone calls, health statistics, environmental records, crime statistics, etc. Note that an analysis may involve several parameters that can be plotted on several maps. If all of these maps present the data in the same form, the individual parameters can be related to each other, and local correlations, dependencies, and other characteristics of the data can be detected.

There are several approaches to displaying "dense" data. One widely used method is 2.5D visualization that aggregates data points into regions. This technique is available in commercial systems, such as Visual Insight's In3D or ESRI's ArcView. An alternative approach that shows more detail displays individual data points as columns on the map, with heights corresponding to their statistical values. This technique is used by systems such as MineSet from Vero Insight and Swift 3D from AT&T. The problem with this method is that the columns overlap for large datasets, so in the end only part of the input data is visible.
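The extent of overplotting in a point map can be quantified by snapping each projected point to an integer pixel and counting collisions. The Python sketch below is a hypothetical helper, not part of any of the systems named above; it assumes that the points have already been projected to screen coordinates.

```python
import math
from collections import Counter


def to_pixel(x, y):
    """Snap projected screen coordinates to an integer pixel position."""
    return math.floor(x), math.floor(y)


def overlap_stats(screen_points):
    """Return (visible, hidden, max_stack) when each point is drawn as one pixel.

    visible:   number of distinct occupied pixels
    hidden:    number of points drawn on top of another point
    max_stack: largest number of points sharing a single pixel
    """
    counts = Counter(to_pixel(x, y) for x, y in screen_points)
    if not counts:
        return 0, 0, 0
    visible = len(counts)
    hidden = len(screen_points) - visible
    return visible, hidden, max(counts.values())
```

A high `hidden` count in dense regions is exactly the situation that aggregation, 3D columns, or the pixel relocation of PixelMaps try to resolve.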
PixelMaps

The PixelMap approach avoids aggregating the data and still resolves the overlaps. The main idea is to relocate pixels that would otherwise overlap. The relocation algorithm recursively divides the dataset into four subsets containing the data points in four equally large subregions. An efficient implementation of this algorithm uses a quadtree-based structure that supports the recursive division.

The division works as follows. We start at the root of the quadtree and in each step divide the data space into four subregions. A subregion can be divided further only if its area (in pixels) is greater than the number of data points (pixels to be drawn) belonging to it. When, after several levels of recursion, only a limited number of data points remain in a subregion, the points are placed using the so-called pixel placement algorithm: the first data item is placed at its correct position, and subsequent data items are placed at the nearest unoccupied positions. The resulting arrangement is locally quasi-random. A drawback of PixelMaps is that in areas with high overlap, the final positions of individual points depend on the order in which the points are stored in the database.

The figure shows four time slots displayed using this type of visualization (0:00 AM, 6:00 AM, 10:00 PM, and 6:00 PM, all in the EST time zone), with the volume of telephone calls in the United States recorded at ten-minute intervals. The visualization intuitively shows how the call volume develops across the time zones, i.e., when people wake up in the given areas or, for example, the decline in calls at lunch time. The visualization reveals both expected and unexpected patterns of behavior, for example, where the largest call centers operating overnight are located.
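The recursive division and pixel placement described above can be sketched in Python as follows. This is a simplified illustration written for this text, not the published PixelMaps implementation: the region is split at its midpoint into four quadrants (an implicit quadtree), a quadrant is processed recursively only if it has at least as many pixels as points, and otherwise, or once at most `leaf_size` points remain, the points are placed one by one at their own pixel or, if that pixel is taken, at the nearest free pixel (found by breadth-first search, i.e., by Manhattan distance). The names `pixelmap` and `leaf_size` are hypothetical. As in the original method, the result depends on the order of the input points.

```python
from collections import deque


def _nearest_free(x, y, region, occupied):
    """Breadth-first search from (x, y) for the closest free pixel inside region.

    BFS on the pixel grid visits pixels in order of Manhattan distance.
    """
    x0, y0, w, h = region
    queue, seen = deque([(x, y)]), {(x, y)}
    while queue:
        cx, cy = queue.popleft()
        if (cx, cy) not in occupied:
            return cx, cy
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if x0 <= nx < x0 + w and y0 <= ny < y0 + h and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    raise ValueError("no free pixel left in region")


def _place_all(items, region, occupied, result):
    """Pixel placement: keep the own position if free, else take the nearest free pixel."""
    for idx, x, y in items:
        pos = _nearest_free(x, y, region, occupied)
        occupied.add(pos)
        result[idx] = pos


def _layout(items, region, occupied, result, leaf_size):
    x0, y0, w, h = region
    if len(items) <= leaf_size or w < 2 or h < 2:
        _place_all(items, region, occupied, result)
        return
    mx, my = x0 + w // 2, y0 + h // 2
    quads = [(x0, y0, w // 2, h // 2), (mx, y0, w - w // 2, h // 2),
             (x0, my, w // 2, h - h // 2), (mx, my, w - w // 2, h - h // 2)]
    groups = [[], [], [], []]
    for item in items:
        _, x, y = item
        groups[(x >= mx) + 2 * (y >= my)].append(item)
    # Subdivide only if every quadrant has at least as many pixels as points.
    if any(len(g) > qw * qh for g, (_, _, qw, qh) in zip(groups, quads)):
        _place_all(items, region, occupied, result)
        return
    for g, q in zip(groups, quads):
        if g:
            _layout(g, q, occupied, result, leaf_size)


def pixelmap(points, width, height, leaf_size=4):
    """Return one distinct pixel per input point (integer (x, y)), in input order."""
    if len(points) > width * height:
        raise ValueError("more points than pixels")
    for x, y in points:
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"point outside the screen: ({x}, {y})")
    result = [None] * len(points)
    items = [(i, x, y) for i, (x, y) in enumerate(points)]
    _layout(items, (0, 0, width, height), set(), result, leaf_size)
    return result
```

For example, `pixelmap([(2, 2)] * 5, 4, 4)` keeps the first point at (2, 2) and moves the other four points to its direct neighbors.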
Line data visualization

The basic idea of visualizing spatial data that describes linear phenomena is to represent them by line segments between two endpoints given by longitude and latitude. Standard line mapping also allows further parameters of the input data to be mapped to line width, line pattern, color, and icons. In addition, start points, end points, and intersections can be mapped to nodes with a specific color, size, shape, and label. The lines do not have to be straight; they can be polylines or splines.

Network maps are used in a wide range of applications. Some approaches show only network connectivity to help understand the structure of the network. Eick and Wills used aggregation, hierarchical information, node positions, and other features to explore large networks that have a hierarchical structure and no natural layout. They used color and shape to encode information about the nodes, and line width and color to encode information about the connections between nodes. The NCSA research group added 3D graphics to these network maps to animate the movement of packets in the Internet backbone network. Becker, Eick, and Wilks described a system known as SeeNet. Its main idea is to involve users in the process and let them interactively control the display, so that they can focus on interesting patterns in the data. For this purpose, the system uses two static network views to visualize geographic relationships. Another interesting system for visualizing large network data is AT&T's Swift-3D system (see figure), which integrates a set of relevant visualization techniques. All of the network display techniques mentioned suffer from overlapping line segments in areas with dense data coverage.

Flow maps

The flow map technique is inspired by graph layout algorithms that minimize edge crossings and the distortion of node positions while maintaining their relative positions.
The algorithm is based on hierarchical clustering of the nodes according to their positions and the flows between them; from these clusters, it computes how the flows are routed. Compared to other computer-generated flow maps (see the image on the left showing the flow of tourists around Berlin), Stanford flow maps produce much "cleaner" graphs (see the image on the right showing migration from California).

Edge clustering also helps reduce ambiguity in line representations. If a hierarchy is defined over the nodes, the corresponding edges can be bundled according to it. Edges between nodes connected through the root of the hierarchy are bent the most, while edges between nodes at the same level of the hierarchy are bent the least. This method can be combined with a number of other visualization techniques, such as traditional maps, tree visualizations, etc. The figures show edge clustering applied to the visualization of IP flow traffic. The visualization shows traffic from external nodes to internal nodes, which are displayed using a treemap. The picture shows the advantage of displaying bundled edges (right) compared to the original display of all edges as straight lines (left).

Visualization of area data

So-called thematic maps are most often used to visualize area phenomena. These maps have several variants. The most popular variant is the choropleth map (Greek choros = area, plethos = multitude), where attribute values are encoded using colored or shaded regions on the map. Choropleth maps assume that the mapped attributes are uniformly distributed within the regions. The picture shows a sample choropleth map of the results of the 2008 presidential election in the United States (Obama vs. McCain). If the attribute is distributed differently from the division into regions, other techniques must be used, such as so-called dasymetric maps. In dasymetric maps, the displayed variable forms areas that are independent of the original regions.
For example, unlike in choropleth maps, the area boundaries derived from the attribute need not match the regions of the map.

Another important type of map is the isarithmic map, which shows contours of continuous phenomena (see the picture showing the density of photographs taken on the island of Mainau). Typical representatives of this type of map are contour and topographic maps. If the contours are derived from real data points (for example, the temperature measured at given locations), the maps are called isometric maps. If the data is measured for areas (for example, administrative regions) and each area's value is assigned to a data point at its center of gravity, the maps are called isopleth maps. One of the main tasks in generating isarithmic maps is interpolating between the data points to obtain smooth contours. This can be achieved, for example, by triangulation.

A more complex and less frequently used mapping technique is the cartogram, where the sizes of regions are scaled to display statistical information. Cartograms have several variants, ranging from continuous cartograms that preserve the topology of the polygonal mesh to noncontinuous cartograms that simply scale each polygon independently or use rectangular or circular approximations of the areas.

Note that area information can also be visualized using discrete points or symbols on the map, for example, symbols proportional to the statistical values, or a point map. We will now focus in more detail on choropleth maps and cartograms.

Choropleth maps

Choropleth maps usually present area phenomena in the form of shaded polygons, each bounded by a closed contour formed by a sequence of points in which the first and last points are identical. Examples are states, regions, parks, etc. Choropleth maps are used to highlight the spatial distribution of one or more geographic attributes.
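Mapping an attribute value to a color ramp is the core step in drawing a choropleth map. The sketch below is a minimal illustration under simplifying assumptions: it rescales the values to [0, 1] and interpolates linearly in RGB between a light and a dark color (white and black by default, i.e., a grayscale map). The function names are hypothetical, and linear RGB interpolation is not perceptually uniform, so practical tools usually rely on carefully designed color schemes instead.

```python
def normalize(values):
    """Min-max rescaling of the values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def interpolate_rgb(c0, c1, t):
    """Linear interpolation between two RGB colors for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def choropleth_colors(values, light=(255, 255, 255), dark=(0, 0, 0)):
    """Map each region's value to a hex color on a light-to-dark ramp."""
    return ["#{:02x}{:02x}{:02x}".format(*interpolate_rgb(light, dark, t))
            for t in normalize(values)]
```

The values passed in should already be normalized where appropriate (for example, per capita), because min-max rescaling does not remove differences in population between the regions.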
When generating choropleth maps, data normalization (see the chapter on preprocessing of input data) and mapping to color or grayscale are usually performed. A problem with choropleth maps is that the most interesting values are often concentrated in densely populated areas, which are represented by small and hardly visible polygons, while less interesting values are spread over sparsely populated areas represented by large, visually dominant polygons. Choropleth maps therefore emphasize areas represented by large polygons, which are usually of lower importance.

Cartograms

Cartograms are generalizations of ordinary thematic maps that try to avoid the problems of choropleth maps by distorting the geography according to the statistical values. Cartograms are a specific type of map transformation in which the sizes of regions are changed based on an input variable associated with the geographic regions. Examples of the use of cartograms are the display of demographic data, election results, or epidemiological data. Cartograms can be divided into several categories.

Noncontinuous cartograms

They fully satisfy the area and shape constraints but do not preserve the topology of the input map. Because the scaled polygons are drawn inside the original polygons, the map remains easy to perceive despite the loss of topology. The drawback is that the original size of each polygon limits its final size, so small polygons cannot be enlarged arbitrarily without scaling (enlarging) the entire map. As a result, important areas are difficult to see, and screen space is used poorly.

Noncontiguous cartograms

They scale all polygons to their target sizes, which exactly meet the area requirements. The individual shapes can be slightly relaxed so that the polygons touch, but do not overlap.
As a consequence, the topology of the map is also strongly relaxed, because the polygons do not maintain their mutual neighborhood relationships. This type of cartogram provides a very good arrangement of the areas, including the preservation of their shapes. However, the global shape and topology of the map are lost, which can impair the perception of the map as a whole.

Circular cartograms

They completely ignore the shapes of the input polygons and represent them by circles. In most cases, the area and topology constraints are also relaxed, so this type of cartogram has the same problems as the noncontiguous cartograms described above.

Continuous cartograms

The last category is continuous cartograms, which, unlike the previous types, fully preserve the topology of the map but relax the given area and shape constraints. In general, a cartogram cannot fully preserve both shape and area, so generating cartograms involves a relatively complex optimization problem that seeks a satisfactory compromise between preserving shape and preserving area. Although continuous cartograms are difficult to generate, the resulting polygonal meshes resemble the original map much more than other computer-generated variants of cartograms. Therefore, in the rest of this section, we deal with continuous cartograms.

Generating cartograms

Creating cartograms manually is very difficult, so automatic methods for generating cartograms by computer are a popular research area. Cartograms can also be considered a general information visualization technique. They offer a way to trade off shape preservation against area preservation by scaling the original polygons according to an external parameter. In so-called population cartograms, more space is allocated to densely populated areas, which are thereby emphasized (because they most likely contain the most interesting data).
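The first step of a population cartogram is to compute a target area for each region. The following Python sketch assumes planar (projected) polygon coordinates given as closed rings (first point equal to last point), distributes the total map area in proportion to a weight such as population, and, as in a simple noncontiguous cartogram, scales each polygon uniformly about its centroid so that it reaches its target area. The function names are illustrative; continuous cartograms require the much harder optimization described above and are not covered by this sketch.

```python
import math


def signed_area(ring):
    """Shoelace formula for a closed ring; positive for counter-clockwise order."""
    return 0.5 * sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:]))


def centroid(ring):
    """Area centroid of a simple closed polygon."""
    a = signed_area(ring)
    pairs = list(zip(ring, ring[1:]))
    cx = sum((x0 + x1) * (x0 * y1 - x1 * y0) for (x0, y0), (x1, y1) in pairs) / (6 * a)
    cy = sum((y0 + y1) * (x0 * y1 - x1 * y0) for (x0, y0), (x1, y1) in pairs) / (6 * a)
    return cx, cy


def target_areas(areas, weights):
    """Split the total map area among regions in proportion to their weights."""
    total_area, total_weight = sum(areas), sum(weights)
    return [total_area * w / total_weight for w in weights]


def scale_polygon(ring, target):
    """Scale a ring uniformly about its centroid so that its area equals target."""
    s = math.sqrt(target / abs(signed_area(ring)))
    cx, cy = centroid(ring)
    return [(cx + (x - cx) * s, cy + (y - cy) * s) for x, y in ring]
```

Uniform scaling preserves each shape exactly, but enlarged polygons may overlap their neighbors, which illustrates the trade-off between area, shape, and topology discussed in this section.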
The image on the left shows a traditional map of the results of the 2000 presidential election in the United States; the image on the right presents the same information using a population cartogram. In the cartogram, the area of each state is adjusted according to its population, which makes it possible to show the close election results in the individual states much more accurately and effectively than in the original choropleth map on the left.

For a cartogram to be effective, the user must be able to grasp its message quickly and, at the same time, recognize its relationship to the original map. This recognition depends in particular on preserving basic properties such as shape, orientation, and adjacency. However, preserving these properties is very difficult for cartograms, and it has been shown that, in general, the cartogram problem cannot be solved exactly. Even if we tolerate errors in the representation of shape and area, we still face a difficult optimization problem, which is why current algorithms for this problem are very time-consuming.

Rectangular cartograms

The main idea of rectangular cartograms is to approximate the regions of a known map by rectangles and to distribute the available screen space so that the area of each rectangle is proportional to its statistical value. To make the information presented by such a cartogram as easy to understand as possible, the rectangles are placed as close as possible to their original positions and next to their original neighbors. The problem can be defined as an optimization problem with a set of constraints and optimization criteria, including area, topology, relative position of the polygons, the shape (aspect ratio) of the rectangles, and empty space. There are different algorithms for solving this problem; one of them is the so-called RecMap algorithm.

Map labeling

Map labeling is the placement of text or image labels near points, lines, and polygons.
Although this seems like a simple and straightforward task, it has been shown not to be: optimal label placement is in general a computationally hard (NP-hard) problem. There are a number of algorithms for it that differ in efficiency and in the quality of their results. Most of them are based on heuristic methods.
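A simple example of such a heuristic is greedy placement with a fixed set of candidate positions. The Python sketch below, written for illustration and not taken from a specific published algorithm, tries four positions around each point (upper right, upper left, lower right, lower left) and keeps the first one that does not overlap a previously placed label; labels that cannot be placed are dropped. The function names and the `gap` parameter are hypothetical.

```python
def overlaps(a, b):
    """True if axis-aligned rectangles (x, y, width, height) overlap with positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def candidates(x, y, w, h, gap=1):
    """Four label boxes around point (x, y) with the y axis pointing up:
    upper right, upper left, lower right, lower left."""
    return [
        (x + gap, y + gap, w, h),
        (x - gap - w, y + gap, w, h),
        (x + gap, y - gap - h, w, h),
        (x - gap - w, y - gap - h, w, h),
    ]


def greedy_labels(points):
    """points: list of (x, y, label_width, label_height).

    Returns one label box per point, or None if every candidate position
    overlaps an already placed label.
    """
    placed, result = [], []
    for x, y, w, h in points:
        box = next(
            (c for c in candidates(x, y, w, h)
             if not any(overlaps(c, p) for p in placed)),
            None,
        )
        if box is not None:
            placed.append(box)
        result.append(box)
    return result
```

The result depends on the processing order, so points are often sorted by importance first; more elaborate heuristics also revisit earlier choices instead of fixing each label greedily.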