1
The Language of Graphics
Leland Wilkinson, Daniel J. Rope, Daniel B. Carr, Matthew A. Rubin
SPSS Inc., 233 South Wacker, Chicago, IL 60606
Abstract
We describe a system, called GPL, that implements a language for quantitative graphics. The
structure of this system differs from existing statistical graphics, visualization, and mapping systems.
Instead of treating a graphics display as a viewer for underlying data, GPL treats data as an
accessory to viewing a graph. GPL is based on the mathematical definition of the graph of a function
and uses that definition to organize data linked to the graph.
To be published in Journal of Computational and Graphical Statistics.
GPL has been renamed nViZn
2
1 Introduction
When thinking about computers, we often associate the word "language" with text. Statisticians
often compute with command languages, such as those in SAS, SPSS, S-Plus, or SYSTAT.
Software developers work in FORTRAN, C, Java, and other programming languages. We may
also think occasionally of visual programming languages. Less frequently do we think that a nontextual
expressive setting, such as a paint program, can constitute a language. In any case, however,
we can benefit from examining the linguistic rules underlying the specification of a problem
on a computer. This is especially true in the world of graphics.
This paper is about the language of statistical graphics. It outlines a framework called the
Graphics Production Library (GPL), which we have developed as a language for presenting
graphics on the World Wide Web. Although GPL is tailored to the Internet, it reflects a number of
ideas that are independent of computing environment. First, it is based on an assumption that statistical
procedures serve graphics; graphics are not ancillary displays of statistical results, but are
means for perceiving statistical relationships directly (Chambers et al., 1983; Cleveland, 1985;
McDonald and Pedersen, 1991). Second, it assumes that graphical elements are "alive"; wherever
possible, graphical features such as points, lines, bars, legends, and axes are connected to data,
metadata, or statistics in a way that allows users to drill-down, link, rotate, filter, brush and zoom
directly in the display (Fisherkeller et al., 1974; Becker and Cleveland, 1987; Stuetzle, 1987;
Velleman, 1988; Tierney, 1991; Swayne et al., 1998). Third, it is based on a formal model of
graphics (Wilkinson, 1999); individual displays are not ad hoc visual arrangements of data, but
reflect instead a quantitative or qualitative model of the variables in the display.
Figure 1 contains a diagram of the temporal model underlying GPL. It is one of several architectural
views of GPL, and is used here to illustrate the issues involved in devising a statistical
graphics language. The shaded rectangles represent functional objects in the system. The rounded
rectangles represent the sets on which these functions operate. This view derives from data-flow
and pipeline models devised for scientific and statistical visualization systems (Buja et al., 1988;
Upson et al., 1989)
The data for the graphic in Figure 1 are described in Carr et al. (1999) as follows. Daly et al.
(1994) developed an analytic method called Parameter-elevation Regressions on Independent
Slopes Model (PRISM) that uses point data and a digital elevation model (DEM) to generate 2.5
minute by 2.5 minute gridded estimates of monthly and annual U.S. climatic parameters. Their
data sets and further details can be found at www.ocs.orst.edu/prism. Carr et al. (1999, 2000) used
those gridded summaries for the time period 1961-1990 to develop graphics characterizing the
spatial variation of climatic parameters within ecoregions. They associated each grid cell with an
Omernik level II ecoregion (Omernik 1987, 1995) using a point-in-polygon matching procedure.
Figure 1 shows panels for three of the Omernik ecoregions (Ozark/Ouachita Appalachian Forests,
Chihuahuan Desert, and West-Central Semi-Arid Praries). The horizontal axis of each panel represents
the average yearly precipitation in millimeters over the three decades. The vertical axis
represents average annual growing degree days, a measure of the number of degrees in daily average
temperature above 50 degrees summed over all days with a daily average temperature above
50. There are 78,766 grid cells underlying Figure 1. (No binning adjustments have been made for
the latitudinal variation in cell area; Carr, Kahn, Sahr and Olsen (1997) discuss equal area gridding
alternatives.)
3
A graphical system like GPL needs to be able to represent these 78,766 data points in real
time, in a Web browser, in a distributed data environment, with instant access to associated metadata
through pop-up annotations and other viewers. The right-most panel of the graphic, for
example, shows a pop-up containing count information when a user clicks on a selected region of
the graphic. Any element in the graphic, including legend items, scale values, labels, smoothers,
etc., can be queried in this manner. GPL also offers real-time controllers and widgets (sliders, buttons,
list boxes, etc.) for transforming, manipulating, and selecting subsets of the underlying data.
The diagram in Figure 1 summarizes how this functionality is accomplished. The remainder
of this paper is a walk through Figure 1. In each of the following sections, we will step successively
through each functional object to see how it operates. At the end, we should be able to see
that a language for expressing the capabilities of a system like GPL differs in important respects
from the languages used in statistical packages, visualization systems, databases, or data mining
systems.
Figure 1. Data flow model for GPL system.
CGraph
Aesthetic
Graphic
Analytic
StatTree
DataSource VarMap
Coordinate
Algebra
VarSet
Statistic Statistic
Geometry Geometry
SGraph SGraph
GGraph GGraph
DataView
4
2 DataViews
The GPL data source is an abstraction. Avoiding concrete data formats and structures encourages
us to define a greater variety of graphs than is customary in relational databases or statistical
packages. Having the graph organize the data, rather than having the data organize the graph,
frees us from having to limit graph types to the particular structures we find in our data sources.
Moreover, an abstract DataView allows us to connect our graph to heterogenous and distributed
sources of data. For example, we can collect the tuples defining 25 points in an XY plot from 25
different Web sites in a live feed to our DataView.
Database client-server graphics and visualization systems attempt to solve these problems
with several ad hoc strategies. First, they can collect heterogeneous data into binary large objects
(BLOBs) inside a database. With this approach, structural organization can be tailored to the
needs of specific object-oriented clients rather than to a relational table or flat file model. This
method places the organizational burden on the database server rather than on the client, however.
Adding new functionality requires reorganization of the data on the heavy-weight server. Second,
database systems can employ bots (software robots) to collect data from distributed sources and
embed them in a single database. This method requires job scheduling to synchronize updates,
however. It is not suitable for streaming data applications such as stock quotes and other dynamic
time series.
In short, if we want a user to be able to explore Web pages, databases, ftp repositories, and
other data sources by interacting directly with a graphic, we must maintain references all the way
back to the original sources. By using abstract interfaces to data, we implement this flexible func-
tionality.
3 Analytics
Analytics involve filtering, recoding, aggregating, segmenting, modeling, or summarizing
data. GPL Analytics are transformations that operate on an object called a StatTree. A StatTree
contains a snapshot of a DataView plus, optionally, the results of dependent analyses. Because
they are transformations, GPL Analytics can be chained. And because their domain is a StatTree,
they offer a relatively high degree of flexibility in a relatively simple object.
A tree structure has many advantages. A StatTree is easily encoded in extensible markup language
(XML) for use as a portable Web resource. Also, a StatTree is serializable (externalizable)
with a simple interface so that it can be passed between separate components in a distributed system.
These capabilities make a StatTree easy to work with in a distributed network environment
containing a variety of protocols.
Figure 2 shows an example of a StatTree. A StatTree is a rooted tree whose nodes hierarchically
alternate between data nodes containing data objects and dependent analysis nodes that
identify analytic methods. The data nodes are represented in Figure 2 by shaded rectangles and
the analytic nodes by clear rectangles. This simple structure allows us to walk a StatTree to locate
a particular analysis or sequence of analyses. We can also determine both the input to an analytic
method (a data object) and its output (one or more data objects).
5
Figure 2. Stat Tree.
Data nodes of the tree contain a data object identifiable by the label of the node. Instances of
this object contain an array of numerical data, an array of associated string data, as well as
optional resources such as case weights and metadata. Thus, a node can contain resources such as
raw data, parameter estimates, fit statistics, confidence intervals, diagnostics, and model comparison
statistics or information measures. All data in a StatTree must be derivable from the data at
the ROOT node. Thus, descendent data nodes are either proper subsets of the ROOT data or are
the results of sequences of analyses on the ROOT data.
A practical consequence of this architecture is that we can annotate graphics with goodnessof-fit
statistics, model expressions, and other metadata from a StatTree without making additional
passes through a data source to compute them. Data passes can be expensive, so it is worth collecting
and persisting relatively cheap calculations even when they are not known in advance to be
needed in a graphic.
Because Analytics can have StatTrees as their input and output, we may collect them in a
transformation chain. Each Analytic adds one or more children to a StatTree. Thus, we can build
graphics from compound analyses (e.g., cluster analyses on principal components), while maintaining
case IDs, weights, and other information we need to perform brushing, linking, and sensitivity
analysis.
Another benefit of transformation-chains is in handling large datasets. An abstract DataView
can be used to hand Analytics chunks of data, one row or table at a time, to be aggregated by rectangular
or hexagonal binning. With binning, we can process datasets with millions of cases, maintain
case weights, and compute weighted statistics on the aggregates. This is how we handled the
bulky ecoregion data in Figure 1. The computationally intensive LOESS smoothers in Figure 1
were computed from pre-aggregated hex-bin data.
Analytics in GPL currently include statistical methods like cluster analysis, regression, multidimensional
scaling, principal components, and singular value decomposition. GPL Analytics
also include organizing methods such as merging StatTree data nodes, reshaping matrices (e.g.,
ROOT
PCA
TREE MEMBERS
CLUSTER
EIGENVALUES EIGENVECTORS SCORESESTIMATESRESIDUALS INFLUENCESFIT COEFFICIENTS
REGRESS
6
triangular-to-rectangular), recoding variables, partitioning variables (e.g., subgrouping), jackknifing,
bootstrapping, and simple random sampling.
4 Var Map
VarMap extracts one data object from a StatTree and outputs a table called a VarSet. A VarSet
is a set of variables, a matrix whose columns are variables and whose rows are instances of values
on those variables. We need VarMap to make a VarSet because Algebra operates on variables, not
on raw data.
VarMap finds the source table to make a VarSet through a simple StatTree addressing mechanism:
a string representing the path from root to node. For example, the path ROOT/PCA/
SCORES/CLUSTER/MEMBERS points to the cluster members data in Figure 2. StatTree paths
encapsulate what has been done to data before graphing. The StatTree path for the graphic in Figure
1 is ROOT/AGGREGATE/AGGREGATION. The AGGREGATION data object contains the
coordinates and counts for the hex bins.
5 Algebra
The graph is a subset of R2. To display G, we
choose a bounded region of R2, F = [xmin,xmax] × [ymin, ymax], and we physically represent the set
of points by choosing a coordinate system and making a graphic with ink or some
other perceivable medium.
Graphics algebra provides a method for specifying F (which we call a frame) when we wish to
construct a graphic based on some function of a set of data. Wilkinson (1999) presents three algebraic
operators called cross (*), nest (/), and blend (+), together with the rules for their use. They
are derived from the set operators product (×), discrete union ( ), and union ( ), respectively.
We use cross to construct a computer-generated graphic of the error function in the example at the
beginning of this section. The algebraic expression for Figure 1 is rainfall*days*region. The
frame for each panel is given by rainfall*days and the frame for the set of three panels is derived
by crossing with region.
It is easy to confuse graphics algebra with command languages or scripts used to construct
statistical charts in some computer packages. GPL does not simply parse graphics algebra; it symbolically
evaluates it. For example, the expression a*(b + c) is equivalent to a*b + a*c; GPL produces
the same graphic when presented with either expression.
With graphics algebra, GPL can build either single graphics or tables of graphics. GPL creates
tables differently from a tables-producing language (TPL), however. TPL formats multi-way
tables by specifying rows, columns, layers, headings, and contents. By contrast, graphics algebra
is indirectly related to the physical appearance of tables. GPL has a separate layout component to
generate a specific table format from its algebraic structure. It is this separation-of-function
between algebra and layout that gives GPL its extraordinary scope. In GPL, a table is not a rectangular
layout of cells; it is a lattice-structured frame that may be arranged on a rectangular grid, on
the circumference of a circle, on the trajectory of a spiral, or on some other geometric object.
G x f x( ),( ) : x R and f x( ) e
x
2
–
=∈{ }=
P F G∩=
∪
7
6 Statistics
At this point in our excursion through Figure 1, we have data and a frame, but we have no
graph. The Statistics component of GPL contains functions that receive a VarSet from Algebra
and output a statistical graph, called an SGraph. The most familiar statistical graphs are location
statistics such as means. Statistical graphs also include confidence regions, smoothers, densities,
directed graphs such as trees, and other functions. The modifier "statistical" is something of a misnomer,
since our requirement for this component is only that Statistics output a unique tuple or set
of tuples or collection of sets of tuples for each input tuple. This more general definition of an
SGraph allows GPL to produce a wide variety of graphics not limited to statistical charts.
Figure 1 indicates that a frame may include more than one SGraph. Our ecoregion example
includes a hex-bin density and a LOESS smoother. Each is computed from the same VarSet. This
architecture is ideally suited for a multi-threaded environment in which certain tasks can be handled
simultaneously. The possibility of more than one SGraph in a frame is one of the obvious
ways GPL differs from a charting program. Standard charts have only one or two hard-wired
graphical elements per frame.
Computing statistical values requires a lot of housekeeping. We must not only handle sample
weights and missing data, but we must also carry along the pointers necessary to link geometric
components to data. If we compute a schematic (box) plot, for example, we need to maintain a list
of cases in the central box, whiskers, and possible outliers so that a user can brush these objects
and link them to other graphs in real-time. If we compute a tessellation, we need to maintain
enough information to compute the perimeter or area of a polygon if requested. Doing this in a
general and efficient way, for linear, order and other statistics, while allowing for missing data and
sample weights, is nontrivial.
It is important not to confuse the hex-bin Statistics element with the hexagon binning Analytic.
If we eliminated the hexagon binning Analytic from our specification, we would have to
compute the hex-bin Statistic from the raw data. This would not have taken more time (since the
same algorithm is involved in both computations), but it would have prevented us from computing
the LOESS smoother in reasonable time. Instead, we pre-computed the bins in an Analytic and
then used the bin counts and locations to compute both the hex-bin and LOESS Statistics. The
same trade-offs exist for other Statistics and Analytics. Where we choose to locate the computations
depends on the functionality we want in our application.
7 Geometry
We now have a graph we can draw. Since a graph is a set of points, we could represent each
point by a spot of ink or pixel of light. For example, we could render each point in a 2D graph by
placing a spot of ink on a piece of paper, perhaps with an ink-jet printer.
There are two problems with this approach. First, many graphs are infinite, even within the
bounds of a frame. A regression line, for example, is an infinite set of points. We could not plot
every point in the set. We can solve this drawing problem by sampling on a regular or irregular
grid. This is how we draw lines on a computer screen or laser printer, for example. Second, statistical
graphs are not always in the form we expect to see in a chart. For example, some viewers
8
want a point estimate to be represented by a bar or spike rather than a dot. Solving this problem
requires making a graph from a graph, using a composition of functions. We need another transformation
object, called Geometry, in our system.
Geometry converts an SGraph (a statistical graph) to a GGraph (a geometric graph). The
classes of GGraph include point, line, area, bar, histobar, schema, tile, contour, path, and link.
With the point geometry object, we can represent a point estimate of a mean with a dot. With the
bar geometry object, we can represent the same point estimate by locating one end of the bar at
the coordinates of the point.
Sometimes, Geometry cannot produce a GGraph from a particular SGraph it has been given.
Undefined instances (e.g., a histobar of a tree) result in null objects. These instances, some of
which were exposed only when we attempted to code the complete crossing of classes, are surprisingly
rare. As elsewhere in the GPL system, modularity of function and orthogonality of
design increase the potential output of the system. This design strategy also encourages us to think
more broadly about graphical representation.
Figure 3 shows an example. This figure displays the airline ticket sales series from Box and
Jenkins (1976). Four different geometric objects are used to display the series. From bottom to
top, these are point, line, area, and bar. Each highlights a different aspect of the series. The
graphic in Figure 1 employs two Geometrics: line (for the LOESS smoother) and tile (for the hex-
bins).
Figure 3. Point, line, area, and bar geometric elements.
0 50 100 150
Month
0
100
200
300
400
500
600
700
TicketSales
0
100
200
300
400
500
600
700
TicketSales
0
100
200
300
400
500
600
700
TicketSales
0
100
200
300
400
500
600
700
TicketSales
9
8 Coordinates
Most of us are used to seeing graphics in rectangular coordinates. Sometimes, as with pie
charts, we are accustomed to polar coordinates. We rarely expect to see bar charts in spherical
coordinates, however, or time series charts in polar. Geographers and spatial statisticians are more
inclined to transform their viewing space, as a consequence of having to map the sphere to the
plane.
We can generalize this capability by locating coordinate transformations in a separate object
and making them work on most geometric graphs. Coordinates convert one or more geometric
graphs (GGraph) to a composite graph (CGraph). A CGraph embeds one or more geometric
graphs in a single frame and its associated coordinate system. The coordinate system used in Figure
1 is rectangular. It embeds the density and smoother graphics in the frame.
Figure 4. Robinson projection
10
Figure 6 shows a Robinson map projection. This geographic projection is a blending of local
transformations of the sphere. It was designed to give more prominence to countries in the southern
hemisphere. In this example, the North American continent is clipped in a rectangular frame
prior to the transformation, so the result is curved at the left and right edges and straight at the top
and bottom (the Robinson projection maps latitudes to horizontal straight lines). Only objects
whose coordinates are contained in the frame-bounded region are transformed by GPL. This
includes axes and grid lines (not shown), but excludes legends and other annotations.
Note that the pop-up annotation has been designed to work in different coordinate systems.
This is a consequence of the transformation architecture in Figure 1. Most GPL transformations
are invertible, so that location messages can be passed in either direction through the pipeline.
Figure 5. Lensing transformation
11
Figure 5 shows a coordinate transformation used to reveal local detail in time series. The
example is taken from the Bureau of Labor Statistics’ Consumer Price Index (CPI) for 1989-1999.
This fisheye projection is a lensing transformation that expands the plane near the focal point.
When connected to a controller and user tool (e.g., a magnifying glass), this transformation can be
used in exploratory analysis. It is, in effect, a nonlinear scroller that maintains a view of the whole
range of data at all lensing points. In this example, the lens is a 1D coordinate transformation
applied to the horizontal dimension.
9 Aesthetics
At this point in our travel through Figure 1, we have data, a frame, and a graph, but we cannot
see or otherwise perceive this graph. In order to produce the examples we have been viewing, we
need to map tuples in our graph to visually perceivable attributes, called Aesthetics. Aesthetics
convert a composite graph (CGraph) to a perceivable graphic. When we colloquially call a chart a
graph, we are really speaking of the realization of a CGraph. A CGraph is a mathematical graph;
it is not visible or perceivable. A graphic, on the other hand, is perceivable in some sense.
Aesthetics would appear to be a quaint term to use for describing the conversion of a graph to
a graphic. The post-Enlightenment meaning of this term is associated with art and expression. The
classical Greek meaning of the word, however, is perception -- the representation of an idea in
perceivable form. GPL includes a variety of Aesthetics that extend the work of Bertin (1967).
These are position, size, shape, rotation, color, texture, blur, and transparency. GPL Aesthetics
also include an attribute not generally thought of as an aesthetic or visual variable: a label. A label
is a text object glued to an element of a graph. In a graphic, a label functions like a color, texture,
or other attribute to make a graph perceivable to a reader.
The abstraction and localization of Aesthetics in GPL yields some interesting behaviors. First,
GPL can construct tables of numerals or text by using a label attached to an invisible geometric
element such as a point or tile. Second, a brushing event can be attached to any attribute such as a
label, color, rotation, or blur. Using a brush in one frame, for example, can cause points in another
linked frame to show their labels, change their color, rotate, or blur. Third, GPL can be used to
construct graphics that do not even remotely resemble XY plots. These include such images as
appear in Eick (1992) and Rao et al. (1994).
10 Controllers
A Controller is an object that connects a user gesture to a function. For example, a brush is a
controller that wires a user-manipulatable brushing region (usually a rectangular brush tool) to an
Aesthetic through a graph-subsetting function. In subsetting a graph, we select a region that
defines a subset of the values on one or more variables. When we drive this back through the pipeline,
we select a subset of our VarSet. Any Frame that is dependent on the same VarSet will
receive brushing messages identifying elements in the subset and these identifiers will be mapped
to a selected Aesthetic.
GPL has over 30 controllers that allow a programmer to connect functions in the graphics system
to user widgets such as sliders, list boxes, buttons, and modal cursor tools. These controllers
extend the scope of GPL beyond visualization, making it a system for manipulating as well as
12
viewing data. Several of these controllers are apparent in the figures we have reviewed so far. Figure
1, for example, contains a check-box that allows an end-user to panel the display by ecoregion.
Figure 4 contains a tree controller for selecting map regions. This controller can be used to
drill-down a hierarchy, such as world/continent/country/region/state/county/tract. Figure 4 also
contains several other button controllers in the upper left corner. Leftmost is a drill-down controller
for selecting a subset of the map. To its right is a lensing button for zooming into a region of
the map with the fisheye projection. Next is a question button for querying points on the map.
This button was used to activate a pointing tool that produced the pop-up annotation shown for
Nevada. A projection button on the right allows us to select a geographic projection. Finally, Figure
4 contains a list-box controller at the bottom right for selecting a variable to represent by the
brightness Aesthetic. The currently selected variable is INCOME.
Figure 6 shows several additional controllers used to implement a Web-based ordering system
for video cameras. The graphic in the figure shows a momentary state a user encountered in the
ordering process. It is a plot of recording format versus price. The user produced this plot by
selecting recording format in the horizontal axis list-box controller at the upper right-hand corner
of the window. Prior to this point, the user had examined a plot of brand by price. We need to
review what the user did in that previous state in order to understand the current plot.
While looking at the plot of brand by price, the user decided to select all the Canon cameras
for highlighting in red. The user did this with the selection controller in the middle-right region of
the window. Choosing the selection controller allowed the user to turn the Canon camera symbols
red by clicking on them. Next, the user decided to limit the view to cameras costing more than
$400 and less than $2400 by adjusting the yellow range filter controller at the left of the window.
This hid all cameras outside the selected price range. At this point, the user changed the horizontal
axis variable to see how Canons were distributed on recording format.
After seeing the distribution of Canons, the user decided to limit the display to only five formats.
The user did this with the 2D pan-and-zoom controller at the bottom right of the window.
The setting on the controller forced the frame to display only the left half of the horizontal axis,
which includes the five tick marks. Next, the user found a suitable camera by touching the leftmost
Canon symbol with the cursor. The popup annotation controller revealed summary details.
Finally, the user decided to place an order for the camera by double-clicking this symbol. The
pop-up metadata window controller appeared to the right of the display. By clicking on the Purchase
button, the user was taken to the Canon Website to order the camera.
Many of the actions described here resemble what database and data mining engineers call
drill-down methods. These methods are invoked when a user selects subsets of a multidimensional
array of data for closer examination. There is an important functional difference between the GPL
and ordinary drill-down implementations, however. Ordinary drill-down graphics are limited to
pie or bar charts because the variables are defined to be categorical so that they can be indexed in
the database. By contrast, GPL begins with a graphic definition of subsetting; subsets on continuous
or categorical variables are selected directly in \em any \em graphic using sliders, pointers,
lassos, and other tools.
13
Figure 6. Controllers for a Web ordering system.
11 Conclusion
The contrast between different interpretations of drill-down points to the main distinction
between GPL and relational data mining systems that employ graphics. Data mining systems
based on relational databases begin by defining, organizing, and modeling data. In these systems,
graphics are treated as passive views into data. Even when such systems implement graphical controls
for manipulating data, they are defined in terms of the data model underlying the system. In
short, these systems begin by specifying a range and domain for the function mapping elements of
data to a relational table or hierarchical or associational object. All graphics possible in such systems
follow from these definitions.
GPL, in contrast, begins by defining, organizing, and constructing a graphic. As Wilkinson
(1999) stated, "These definitions are embedded in the mathematical history that determined the
evolution of statistical charts and maps." In short, we begin by considering what is the range and
what is the domain of a graph underlying a graphic. From there, we recurse our definitions until
we reach a specification of data underlying the graphic. For that specification, we construct an
abstract DataView and link the graphic to our data.
14
In an early attempt to develop a statistical metadata standard, Dolby, Clark, and Rogers (1986)
described a language of data. This project was aimed at developing a general language for organizing
data, metadata, and images. While such high-level analyses can sometimes be fruitful, we
believe that tailoring languages to more specific problem domains can sometimes yield more
powerful capabilities. Computer generated graphics have for too long been regarded as views of
pre-determined data structures. It is time to consider the possibility of structuring data to fit the
view rather than structuring the view to fit the data. This effort requires a language of graphics.
References
Becker, R.A., and Cleveland, W.S. (1987). Brushing scatterplots. Technometrics, 29, 127-142.
Bertin, J. (1967). Sémiologie Graphique. Paris: Editions Gauthier-Villars. English translation by
W.J. Berg as Semiology of Graphics, Madison, WI: University of Wisconsin Press, 1983.
Box, G.E.P., and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. San Francisco:
Holden-Day.
Buja, A., Asimov, D., Hurley, C., and McDonald, J.A. (1988). Elements of a viewing pipeline for
data analysis. In W.S. Cleveland, and M.E. McGill (Eds.), Dynamic Graphics for Statistics.
Monterey, CA: Wadsworth, 277-308.
Carr, D.B., Kahn, R., Sahr, K., and Olsen, A.R. (1997). ISEA Discrete Global Grids. Statistical
Computing & Graphics Newsletter, 8, 31-39.
Carr, D.B., Olsen, A.R., Pierson, S.M, and Courbois, J.P. (1999). Boxplot variations in a spatial
context: An Omernik ecoregion and weather example. Statistical Computing & Statistical
Graphics Newsletter, 9, 4-13.
Carr, D.B., Olsen, A.R., Pierson, S.M, and Courbois, J.P. (2000) Using Linked Micromap Plots To
Characterize Omernik Ecoregions. Data Mining and Knowledge Discovery, 4, 43-67.
Chambers, J.M., Cleveland, W.S., Kleiner, B., and Tukey, P.A. (1983). Graphical Methods for
Data Analysis. Monterey, CA: Wadsworth.
Cleveland, W.S. (1985). The Elements of Graphing Data. Summit, NJ: Hobart Press.
Daly, C., Neilson, R.P. and Phillips, D.L. (1994). A Statistical-topographic Model for Mapping
Climatological Precipitation over Mountainous Terrain. Journal of Applied Meteorology,
33, 140-158.
Dolby, J.L., Clark, N., and Rogers, W.H. (1986). The language of data: A general theory of data.
Computer Science and Statistics: Proceedings of the 18th Symposium on the Interface, 96-
103.
Eick, S.G., Steffen, J.L., and Sumner, E.E. (1992). SeeSoft -- a tool for visualizing line oriented
software statistics. IEEE Transactions on Software Engineering}, 18, 957-968.
Fisherkeller, M.A., Friedman, J.H., and Tukey, J.W. (1974). PRIM-9: An interactive multidimensional
data display and analysis system. SLAC-Pub-1408. Stanford, CA: Stanford Linear
Accelerator. Reprinted in W.S. Cleveland, and M.E. McGill (Eds.), Dynamic Graphics for
Statistics. Monterey, CA: Wadsworth, 91-109.
McDonald, J.A., and Pedersen, J. (1991). Geometric abstractions for constrained optimization of
layouts. In A. Buja, and P.A. Tukey (Eds.), Computing and Graphics in Statistics. New
York: Springer-Verlag, 95-105.
Omernik, J.M. (1987). Ecoregions of the coterminous United States. Annals of the Association of
American Geographers}, 77, 118-25.
15
Omernik, J.M. (1995). Ecoregions: a spatial framework for environmental management. In W.S.
Davis, and T.P. Simon (Eds.), Biological Assessment and Criteria: Tools for WaterResource
Planning and Decision Making. Boca Raton, FL: Lewis Publishers, 49-62.
Rao, R., and Card, S.K. (1994). The table lens: Merging graphical and symbolic representations in
an interactive focus+context visualization for tabular information. In Proceedings of the
ACM SIGCHI Conference on Human Factors in Computing Systems, 18, 318-322.
Stuetzle, W. (1987). Plot windows. Journal of the American Statistical Association}, 82, 466-475.
Swayne, D.F., Cook, D., and Buja, A. (1998). XGobi: Interactive Dynamic Data Visualization in
the X Window System. Journal of Computational and Graphical Statistics}, 7, 113-130.
Tierney, L. (1991). LispStat. New York: John Wiley & Sons.
Upson, C., Faulhaber, T., Kamins, D., Schlege, D., Laidlaw, D., Vroom, J., Gurwitz, R., and vanDam,
A. (1989). The application visualization system: A computational environment for
scientific visualization. IEEE Computer Graphics and Applications}, 9, 30-42.
Velleman, P.F. (1988). Data Desk. Ithaca, NY: Data Description Inc.
Wilkinson, L. (1999). The Grammar of Graphics. New York: Springer Verlag.