CHAPTER 11

AN INTRODUCTION TO CONTENT ANALYSIS

Throughout the preceding chapters, techniques and strategies for collecting and organizing data have been discussed. With a partial exception for Chapters 4, 6, and perhaps 7, where limited analytic procedures are mentioned, analysis of data has not yet been extensively discussed. In this chapter the task of analysis is considered at length.

Interviews, field notes, and various types of unobtrusive data are often not amenable to analysis until the information they convey has been condensed and made systematically comparable. An objective coding scheme must be applied to the notes or data. This process is commonly called content analysis.

The instructions in this chapter are intended to assist novice researchers in their attempt to learn the methodological technique(s) for standard content analysis. First, a brief discussion of analysis approaches in qualitative research is outlined. Following this, some general concerns and debates regarding content analysis are presented. Then, a number of procedures for conducting content analysis are discussed. These include consideration of what to count and what to analyze, the nature of levels and units of analysis, and how to effectively employ coding frames. In the next section, the strengths and weaknesses of content analysis as a research technique are discussed, and analytic induction is examined in relation to content analysis procedures. Finally, this chapter will address word crunching, the use of computers in qualitative research.

ANALYSIS OF QUALITATIVE DATA

There are a number of procedures used by qualitative researchers to analyze their data. Miles and Huberman (1994) identify three major approaches to qualitative data analysis: interpretative approaches, social anthropological approaches, and collaborative social research approaches.

Interpretative Approaches

This orientation allows researchers to treat social action and human activity as text. In other words, human action can be seen as a collection of symbols expressing layers of meaning. Interviews and observational data, then, can be transcribed into written text for analysis. How one interprets such a text depends in part on the theoretical orientation taken by the researcher. Thus, a researcher with a phenomenological bent will resist condensing data or framing data by various sorting or coding operations. A phenomenologically oriented researcher might, instead, attempt to uncover or capture the telos (essence) of an account. This approach provides a means for discovering the practical understandings of meanings and actions. Researchers with a more general interpretative orientation (dramaturgists, symbolic interactionists, etc.) are likely to organize or reduce data in order to uncover patterns of human activity, action, and meaning.

Social Anthropological Approaches

Researchers following this orientation often have conducted various sorts of field or case study activities to gather data. In order to accomplish data collection, they have necessarily spent considerable time in a given community, or with a given assortment of individuals in the field. They have participated, indirectly or directly, with many of the individuals residing in or interacting with the study population. This provides the researcher with a special perspective on the material collected during the research, as well as a special understanding of the participants and how these individuals interpret their social worlds.
Analysis of this sort of data can be accomplished by setting information down in field notes and then applying the interpretative style of treating this information as text. Frequently, however, this analytic process requires the analysis of multiple sources of data such as diaries, observations, interviews, photographs, and artifacts. Determining what material to include or exclude, how to order the presentation of substantiating materials, and what to report first or last are analytic choices the researcher must make.

Researchers employing the social anthropological approach usually are interested in the behavioral regularities of everyday life: language and language use, rituals and ceremonies, and relationships. The analytic task, then, is to identify and explain the ways people use or operate in a particular setting; how they come to understand things; and how they account for, take action in, and generally manage their day-to-day life. Many researchers using this approach begin with a conceptual or theoretical frame and then move into the field in order to test or refine this conceptualization.

Collaborative Social Research Approaches

Researchers operating in this research mode work with their subjects in a given setting in order to accomplish some sort of change or action (see Chapter 7 on action research). The analysis of data gathered in such collaborative studies is accomplished with the participation of the subjects, who are seen by the researcher as stakeholders in the situation in need of change or action. Data are collected and then reflexively considered both as feedback to craft action and as information to understand a situation, resolve a problem, or satisfy some sort of field experiment. The actual analytic strategies applied in this effort may be similar to the interpretative and social anthropological approaches.

Given these diverse yet overlapping approaches, you can see certain facets of research that recur during any style of qualitative analysis. Below is a fairly standard set of analytic activities arranged in a general order of sequence:

• Data are collected and made into text (e.g., field notes, transcripts, etc.).
• Codes are analytically developed or inductively identified in the data and affixed to sets of notes or transcript pages.
• Codes are transformed into categorical labels or themes.
• Materials are sorted by these categories, identifying similar phrases, patterns, relationships, and commonalities or disparities.
• Sorted materials are examined to isolate meaningful patterns and processes.
• Identified patterns are considered in light of previous research and theories, and a small set of generalizations is established.

During the remainder of this chapter, these features will be discussed and considered in relationship to content analysis (a brief code sketch of this sequence appears in the next section). In the next section, I will consider the nature of content analysis as a technique.

CONTENT ANALYSIS AS A TECHNIQUE

In content analysis, researchers examine artifacts of social communication. Typically, these are written documents or transcriptions of recorded verbal communications. Broadly defined, however, content analysis is "any technique for making inferences by systematically and objectively identifying special characteristics of messages" (Holsti, 1968, p. 608). From this perspective, photographs, videotape, or any item that can be made into text is amenable to content analysis.
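Before turning to the formal requirements of content analysis, the generic analytic sequence listed above can be made concrete. Below is a minimal sketch in Python using invented transcript fragments, codes, and themes; it illustrates only the bookkeeping of coding and sorting, not the interpretive judgment that produces the codes themselves.

```python
# A sketch of the generic analytic sequence, with hypothetical data.
from collections import defaultdict

# 1. Data are collected and made into text.
transcripts = {
    "interview_01": ["We mostly kept to ourselves at work.",
                     "My supervisor looked the other way."],
    "interview_02": ["Everyone on the floor knew the routine."],
}

# 2. Codes are affixed to segments (assigned here by hand).
coded_segments = [
    ("interview_01", 0, "isolation"),
    ("interview_01", 1, "supervision"),
    ("interview_02", 0, "shared_knowledge"),
]

# 3. Codes are transformed into categorical labels or themes.
themes = {
    "isolation": "workplace relations",
    "supervision": "organizational context",
    "shared_knowledge": "organizational context",
}

# 4. Materials are sorted by these categories.
sorted_by_theme = defaultdict(list)
for doc, line, code in coded_segments:
    sorted_by_theme[themes[code]].append(transcripts[doc][line])

# 5. Sorted materials are examined to isolate patterns.
for theme, excerpts in sorted_by_theme.items():
    print(theme, "->", excerpts)
```

In practice, the codes are revised repeatedly as analysis proceeds; the final step, relating identified patterns to previous research and theory, remains the researcher's own work.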
In this chapter, objective analysis of messages conveyed in the data being analyzed is accomplished by means of explicit rules called criteria of selection, which must be formally established before the actual analysis of data. The criteria of selection used in any given content analysis must be sufficiently exhaustive to account for each variation of message content and must be rigidly and consistently applied so that other researchers or readers, looking at the same messages, would obtain the same or comparable results. This may be considered a kind of reliability of the measures, and a validation of eventual findings (Selltiz et al., 1967). The categories that emerge in the course of developing these criteria should reflect all relevant aspects of the messages and retain, as much as possible, the exact wording used in the statements. They should not be merely arbitrary or superficial applications of irrelevant categories. Holsti (1968, p. 598) explains this type of content analysis procedure: "The inclusion or exclusion of content is done according to consistently applied criteria of selection; this requirement eliminates analysis in which only material supporting the investigator's hypotheses are examined."

CONTENT ANALYSIS: QUANTITATIVE OR QUALITATIVE?

One of the leading debates among users of content analysis is whether analysis should be quantitative or qualitative. Berelson (1952), for example, suggests that content analysis is "objective, systematic, and quantitative." Similarly, Silverman (1993, p. 59) dismisses content analysis from his discussion of qualitative data analysis "because it is a quantitative method." Selltiz et al. (1959, p. 336), however, state that concerns over quantification in content analysis tend to emphasize "the procedures of analysis" rather than the "character of the data available." Selltiz et al. suggest also that heavy quantitative content analysis results in a somewhat arbitrary limitation in the field by excluding all accounts of communications that are not in the form of numbers, as well as those that may lose meaning if reduced to a numeric form (definitions, symbols, detailed explanations, photographs, and so forth). Other proponents of content analysis, notably Smith (1975), suggest that some blend of both quantitative and qualitative analysis should be used. Smith (1975, p. 218) explains that he has taken this position "because qualitative analysis deals with the forms and antecedent-consequent patterns of form, while quantitative analysis deals with duration and frequency of form." Abrahamson (1983, p. 286) suggests that "content analysis can be fruitfully employed to examine virtually any type of communication." As a consequence, content analysis may focus on either quantitative or qualitative aspects of communication messages.

Some authors of methods books have written about the procedure of narrative analysis as distinguishable from the procedure of content analysis (see, for example, Silverman, 1993; Manning & Cullum-Swan, 1994). In narrative analysis, the investigator typically begins with a set of principles and seeks to exhaust the meaning of the text using specified rules and principles, but maintains a qualitative textual approach (Boje, 1991; Heise, 1992; Manning & Cullum-Swan, 1994; Silverman, 1993). In contrast to this allegedly more textual approach, content analysis is suggested to be limited to counts of textual elements.
Thus, the implication is that content analysis is more reductionistic and ostensibly a more positivistic approach. I argue here that content analysis can be effective in qualitative analysis—that "counts" of textual elements merely provide a means for identifying, organizing, indexing, and retrieving data. Analysis of the data, once organized according to certain content elements, should involve consideration of the literal words in the text being analyzed, including the manner in which these words have been offered. In this way, content analysis provides a method for obtaining good access to the words of the text or transcribed accounts offered by subjects (Glassner & Loughlin, 1987). This offers, in turn, an opportunity for the investigator to learn about how subjects or the authors of textual materials view their social worlds. From this perspective, content analysis is not a reductionistic, positivistic approach. Rather, it is a passport to listening to the words of the text and understanding better the perspective(s) of the producer of these words.

This chapter strives for a blend of qualitative and quantitative analysis: the descriptions of quantitative analysis show how researchers can create a series of tally sheets to determine specific frequencies of relevant categories. The references to qualitative analysis show how researchers can examine ideological mind-sets, themes, topics, symbols, and similar phenomena, while grounding such examinations in the data.

Manifest versus Latent Content Analysis

Another controversy concerning the use of content analysis is whether the analysis should be limited to manifest content (those elements that are physically present and countable) or extended to more latent content. In the latter case, the analysis is extended to an interpretive reading of the symbolism underlying the physical data. For example, an entire speech may be assessed for how radical it was, or a novel could be considered in terms of how violent the entire text was. Stated in different words, manifest content is comparable to the surface structure present in the message, and latent content is the deep structural meaning conveyed by the message.

Holsti (1969, p. 598) has tried to resolve this debate: "It is true that only the manifest attributes of text may be coded, but this limitation is already implied by the requirement of objectivity. Inferences about latent meanings of messages are therefore permitted but... they require corroboration by independent evidence." One reasonable interpretation of this passage, and a similar statement made by Berelson (1952, p. 488ff), suggests that although there are some dangers in directly inferring from latent symbolism, it is nonetheless possible to use it (see also Merton, 1968, pp. 366-370, on the use of content analysis in examining propaganda). To accomplish this sort of "deciphering" (Heilman, 1976) of latent symbolic meaning, researchers must first incorporate independent corroborative techniques (for example, agreement between independent coders concerning latent content or some noncontent analytic source). Finally, and especially when latent symbolism may be discussed, researchers should offer detailed excerpts from relevant statements (messages) that serve to document the researchers' interpretations. A safe rule of thumb to follow is the inclusion of at least three independent examples for each interpretation.
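One workable form of the corroboration Holsti calls for is agreement between independent coders. The sketch below, with invented codes and ratings, computes simple percent agreement and Cohen's kappa, which discounts the agreement expected by chance; it illustrates the idea rather than prescribing any particular reliability statistic.

```python
# Two independent coders rate the same six units for latent content.
# The units and the "radical"/"moderate" codes are hypothetical.
coder_a = ["radical", "moderate", "radical", "moderate", "radical", "radical"]
coder_b = ["radical", "moderate", "moderate", "moderate", "radical", "radical"]

n = len(coder_a)

# Simple percent agreement.
agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is the agreement
# expected by chance given each coder's marginal proportions.
categories = set(coder_a) | set(coder_b)
p_e = sum((coder_a.count(c) / n) * (coder_b.count(c) / n) for c in categories)
kappa = (agreement - p_e) / (1 - p_e)

print(f"percent agreement: {agreement:.2f}")  # 0.83
print(f"Cohen's kappa:     {kappa:.2f}")      # 0.67
```

Kappa near 1.0 indicates strong agreement beyond chance; values near zero suggest that the criteria of selection for the latent coding need sharpening before interpretations can be trusted.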
Blending Manifest and Latent Content Analysis Strategies

Perhaps the best resolution of this dilemma about whether to use manifest or latent content is to use both whenever possible. In this case, a given unit of content would receive the same attention from both methods—to the extent that coding procedures (discussed presently) for both the manifest and latent content are reasonably valid and reliable (Babbie, 1998).

By reporting the frequency with which a given concept appears in text, researchers suggest the magnitude of this observation. It is more convincing for their arguments when researchers demonstrate the appearance of a claimed observation in some large proportion of the material under study (e.g., 20 percent, 30 percent, 40 percent, and so on). Researchers must bear in mind, however, that these descriptive statistics—namely, proportions and frequency distributions—do not necessarily reflect the nature of the data or variables. If the theme "positive attitude toward shoplifting" appears 50 times in one subject's interview transcript and 25 times in another subject's, this would not be justification for the researchers to claim that the first subject is twice as likely to shoplift as the second subject. In short, researchers must be cautious not to take or claim magnitudes as findings in themselves. The magnitude for certain observations is presented to demonstrate more fully the overall analysis.

COMMUNICATION COMPONENTS

According to Holsti (1969) and Carney (1972), communications have three major components: the message, the sender, and the audience. The message should be analyzed in terms of explicit themes, relative emphasis on various topics, amount of space or time devoted to certain topics, and numerous other dimensions. Occasionally, messages are analyzed for information about the sender of the communication. According to Chadwick et al. (1984), the linkages between the message content and attributes of the sender are often slight. Nonetheless, some characteristics of the sender may be discernible, especially if numerous examples are available, audible (recorded) messages are examined, or verbatim transcriptions from recordings are used (including literal representations of pauses, mispronounced words, grammatical errors, slang, and other language styles).

Strauss (1987, p. 33) similarly differentiates between what he calls in vivo codes and sociological constructs. In vivo codes are the literal terms used by individuals under investigation, the terms used by the various actors themselves. "In vivo codes tend to be the behaviors or processes which will explain to the analyst how the basic problem of the actors is resolved or processed" (Strauss, 1987, p. 33). In contrast, sociological constructs are formulated by the analyst. Terms and categories such as professional attitude, family oriented, obsessive workaholic, and educationally minded might represent examples of sociological constructs. These constructs, of course, need not derive exclusively from sociology and may come from the fields of education, nursing, psychology, and the like. Strauss (1987, p. 34) explains that these constructs "are based on a combination of the researcher's scholarly knowledge and knowledge of the substantive field under study." The result of using constructs is the addition of certain social scientific meanings that might otherwise be missed in the analysis.
Thus, sociological constructs add breadth and depth to observations by reaching beyond local meanings to broader social scientific ones.

Researchers may additionally use content analysis to assess a message's effects on the audience. The Pornography and Television Violence Commissions tried, for example, to assess the impact that sexual or violent material on television and in movies had on those who watched this genre of entertainment (Commission on Obscenity and Pornography, 1970; Comstock & Rubinstein, 1972). However, making accurate inferences about either the characteristics of the sender or the effects of the message on the audience is often tenuous at best.

WHAT TO COUNT: LEVELS AND UNITS OF ANALYSIS

When using a content analysis strategy to assess written documents, researchers must first decide at what level they plan to sample and what units of analysis will be counted. Sampling may occur at any or all of the following levels: words, phrases, sentences, paragraphs, sections, chapters, books, writers, ideological stance, subject topic, or similar elements relevant to the context. When examining other forms of messages, researchers may use any of the preceding levels or may sample at other conceptual levels more appropriate to the specific message. For example, when examining television programs for violent content, researchers might use segments between commercials as the level of analysis, or they might choose to use the entire television program (excluding commercials) as the level (see, for example, Fields, 1988).

CATEGORY DEVELOPMENT: BUILDING GROUNDED THEORY

Strauss (1987) describes the considerable misconception surrounding the development of grounded theory. The term misconception, as Strauss (1987, p. 55) points out, seems more appropriate than criticism. Misconception implies an inaccurate reading of material pertaining to building grounded theory. On the other hand, criticism connotes more of a challenge to or detraction from the benefits of this process. Central to misconception are the notions that grounded theory is an entirely inductive process, that it does not verify findings, and that it somehow molds the data to the theory rather than the reverse. Strauss (1987, p. 55), in a lengthy note, singles out Miles and Huberman (1983) as illustrating several instrumental misconceptions (brackets in original text contain Strauss's responses):

In Miles and Huberman (1983, p. 57) there is also a misunderstanding about grounded theory technology. The material in my book, written before their publication appeared, runs directly counter to some of their remarks, that: the grounded theory approach has a lot going for it. Data get well molded to the codes that represent them, and we get more of a code-in-use flavor than the generic code-for-many-uses generated by prefabricated start lists.... The tradeoff here is that earlier segments may have different codes than later ones. [They may, in part, of course.] Or to avoid this everything may have to be recoded once a more empirically sculpted scheme emerges. [No.] This means more overall coding time, and longer uncertainty about the coherence of the coding frame. [Probably, but deliberate, in part].

In addition, Miles and Huberman (1983, pp. 63-64) promote the worrisome notion that coding is not an enjoyable task, which suggests that other aspects of the research enterprise are more fun. This text, like Strauss (1987), strongly disagrees.
Coding and other fundamental procedures associated with grounded theory development are certainly hard work and must be taken seriously, but just as many people enjoy finishing a complicated jigsaw puzzle, many researchers find great satisfaction in coding and analysis. As researchers move through the coding process and begin to see the puzzle pieces come together to form a more complete picture, the process can be downright thrilling. Time consuming, tiring, and even laborious as the process is, it is seldom boring!

The categories researchers use in a content analysis can be determined inductively, deductively, or by some combination of both (Strauss, 1987). Abrahamson (1983, p. 286) indicates that an inductive approach begins with the researchers "immersing" themselves in the documents (that is, the various messages) in order to identify the dimensions or themes that seem meaningful to the producers of each message. In a deductive approach, researchers use some categorical scheme suggested by a theoretical perspective, and the documents provide a means for assessing the hypothesis. In many circumstances, the relationship between a theoretical perspective and certain messages involves both inductive and deductive approaches. However, in order to present the perceptions of others (the producers of messages) in the most forthright manner, a greater reliance upon induction is necessary. Nevertheless, as will be shown, induction should not be undertaken to the exclusion of deduction.

The development of inductive categories allows researchers to link or ground these categories to the data from which they derive. Certainly it is reasonable to suggest that insights and general questions about research derive from previous experience with the study phenomena. This may represent personal experience, scholarly experience (having read about it), or previous research undertaken to examine the matter. Researchers, similarly, draw on these experiences in order to propose tentative comparisons that assist in creating various deductions. Experience thus underpins both inductive and deductive reasoning. From this interplay of experience, induction, and deduction, Glaser and Strauss formulate their description of grounded theory. According to Glaser and Strauss (1967, pp. 2-3):

To generate theory... we suggest as the best approach an initial, systematic discovery of the theory from the data of social research. Then one can be relatively sure that the theory will fit and work. And since categories are discovered by examination of the data, laymen involved in the area to which the theory applies will usually be able to understand it, while sociologists who work in other areas will recognize an understandable theory linked with the data of a given area.

What to Count

Seven major elements in written messages can be counted in content analysis: words or terms, themes, characters, paragraphs, items, concepts, and semantics (Berelson, 1952; Berg, 1983; Merton, 1968; Selltiz et al., 1959).

Words. The word is the smallest element or unit used in content analysis. Its use generally results in a frequency distribution of specified words or terms.

Themes. The theme is a more useful unit to count. In its simplest form, a theme is a simple sentence, a string of words with a subject and a predicate. Because themes may be located in a variety of places in most written documents, it becomes necessary to specify (in advance) which places will be searched.
For example, researchers might use only the primary theme in a given paragraph location or, alternatively, might count every theme in a given text under analysis.

Characters. In some studies, characters (persons) are significant to the analysis. In such cases, you count the number of times a specific person or persons are mentioned rather than the number of words or themes.

Paragraphs. The paragraph is infrequently used as the basic unit in content analysis, chiefly because of the difficulties that have resulted in attempting to code and classify the various and often numerous thoughts stated and implied in a single paragraph.

Items. An item represents the whole unit of the sender's message—that is, an item may be an entire book, a letter, speech, diary, newspaper, or even an in-depth interview.

Concepts. The use of concepts as units to count is a more sophisticated type of word counting than previously mentioned. Concepts involve words grouped together into conceptual clusters (ideas) that constitute, in some instances, variables in a typical research hypothesis (Sanders & Pinhey, 1959, p. 191). For instance, a conceptual cluster may form around the idea of deviance. Words such as crime, delinquency, kiting, and fraud might cluster around the conceptual idea of deviance (Babbie, 1998). To some extent, the use of a concept as the unit of analysis leads toward more latent than manifest content.

Semantics. In the type of content analysis known as semantics, researchers are interested not only in the number and type of words used but also in how affected the word(s) may be—in other words, how strong or weak a word (or words) may be in relation to the overall sentiment of the sentence (Sanders & Pinhey, 1959).

Combinations of Elements

In many instances, research requires the use of a combination of several content analytic elements. For example, in my study (Berg, 1983) to identify subjective definitions for Jewish affiliational categories (Orthodox, Conservative, Reform, and Nonpracticing), I used a combination of both item and paragraph elements as a content unit. In order to accomplish a content analysis of these definitions (as items), I lifted every respondent's definitions of each affiliational category verbatim from an interview transcript. Each set of definitions was additionally annotated with the transcript number from which it had been taken. Next, each definition (as an item) was separated into its component definitional paragraph for each affiliational category. An example of this definitional paragraphing is shown below (Berg, 1983, p. 76):

INTERVIEW #60: ORTHODOX

Well, I guess, Orthodox keep kosher in [the] home and away from home. Observe the Sabbath, and, you know . . . , actually if somebody did [those] and considered themselves an Orthodox Jew, to me that would be enough. I would say that they were Orthodox.

INTERVIEW #60: CONSERVATIVE

Conservative, I guess, is the fellow who doesn't want to say he's Reform because it's objectionable to him. But he's a long way from being Orthodox.

INTERVIEW #60: REFORM

Reform is just somebody that, they say they are Jewish because they don't want to lose their identity. But actually I want to be considered a Reform, 'cause I say I'm Jewish, but I wouldn't want to be associated as a Jew if I didn't actually observe any of the laws.
INTERVIEW #60: NONPRACTICING

Well, a Nonpracticing is the guy who would have no temple affiliation, no affiliation with being Jewish at all, except that he considers himself a Jew. I guess he practices in no way, except to himself.

Units and Categories

Content analysis involves the interaction of two processes: specification of the content characteristics (basic content elements) being examined and application of explicit rules for identifying and recording these characteristics. The categories into which you code content items vary according to the nature of the research and the particularities of the data (that is, whether they are detailed responses to open-ended questions, newspaper columns, letters, television transcripts, and so on).

As with all research methods, conceptualization and operationalization necessarily involve an interaction between theoretical concerns and empirical observations. For instance, if researchers wanted to examine newspaper orientations toward changes in a state's seat-belt law (as a potential barometer of public opinion), they might read newspaper articles and/or editorials. As they read each article, the researchers could ask themselves which ones were in favor of and which ones were opposed to changes in the law. Were the articles' positions more clearly indicated by their manifest content or by some undertone? Was the decision to label one article pro or con based on the use of certain terms, on presentation of specific study findings, or because of statements offered by particular characters (for example, celebrities, political figures, and so on)? The answers to these questions allow the researchers to develop inductive categories in which to slot various units of content.

As previously mentioned, researchers need not limit their procedures to induction alone. Both inductive and deductive reasoning may provide fruitful findings. If, for example, investigators are attempting to test hypothetical propositions, their theoretical orientation should suggest empirical indicators of concepts (deductive reasoning). If they have begun with specific empirical observations, they should attempt to develop explanations grounded in the data (grounded theory) and apply these theories to other empirical observations (inductive reasoning).

There are no easy ways to describe specific tactics for developing categories or to suggest how to go about defining (operationalizing) these tactics. To paraphrase Schatzman and Strauss's (1973, p. 12) remark about methodological choices in general, the categorizing tactics worked out—some in advance, some developed later—should be consistent not only with the questions asked and the methodological requirements of science but also with the properties of the phenomena under investigation. Stated succinctly, categories must be grounded in the data from which they emerge (Denzin, 1978; Glaser & Strauss, 1967). The development of categories in any content analysis must derive from inductive inference (to be discussed in detail later) concerning patterns that emerge from the data.

For example, in a study evaluating the effectiveness of a Florida-based delinquency diversion program, I (Berg, 1986) identified several thematic categories from information provided on intake sheets.
By setting up a tally sheet, I managed to use the criminal offenses declared by arresting officers in their general statements to identify two distinct classes of crime, in spite of arresting officers' use of similar-sounding terms. In one class of crime, several similar terms were used to describe what amounted to the same type of crime. In a second class of crime, officers more consistently referred to the same type of crime by a consistent term. Specifically, I found that the words shoplifting, petty theft, and retail theft each referred to essentially the same category of crime involving the stealing of some type of store merchandise, usually not exceeding $3.50 in value. Somewhat surprisingly, the semantically similar term petty larceny was used to describe the taking of cash, whether it was from a retail establishment, a domicile, or an auto. Thus, the data indicated a subtle perceptual distinction made by the officers reporting juvenile crimes.

Recently, Dabney (1993) examined how practicing nurses perceived other nurses who worked while impaired by alcohol or drugs. He developed several thematic categories based on previous studies found in the literature. He was also able to inductively identify several classes of drug diversion described by subjects during the course of interviews. For instance, many subjects referred to stockpiled drugs that nurses commonly used for themselves. These drugs included an assortment of pain killers and mild sedatives stored in a box, a drawer, or some similar container on the unit or floor. These stockpiled drugs accumulated when patients died or were transferred to another hospital unit and this information did not immediately reach the hospital pharmacy.

Classes and Categories

Three major procedures are used to identify and develop classes and categories in a standard content analysis and to discuss findings in research that uses content analysis: common classes, special classes, and theoretical classes.

Common Classes. The first are the common classes of a culture in general. These classes are used by virtually anyone in society to distinguish between and among persons, things, and events (for example, age, gender, mother, father, teacher, and so on). These common classes, as categories, provide lay people with a means of designation in the course of everyday thinking and communicating and a way to engender meaning in their social interactions (see Duncan, 1962; Schatzman & Strauss, 1973; Strauss, 1959). These common classes are essential in assessing whether certain demographic characteristics are related to patterns that may arise during a given data analysis.

Special Classes. Special classes are those labels used by members of certain areas (communities) to distinguish among the things, persons, and events within their limited province (Schatzman & Strauss, 1973). These special classes can be likened to jargonized terms used commonly in certain professions but not by lay people. Alternatively, these special classes may be described as out-group versus in-group classifications. In the case of the out-group, the reference is to labels conventionally used by the greater (host) community or society; as for the in-group, the reference is to conventional terms and labels used among some specified group or that may emerge as theoretical classes.

Theoretical Classes. The theoretical classes are those that emerge in the course of analyzing the data (Schatzman & Strauss, 1973).
In most content analysis, these theoretical classes provide an overarching pattern (a key linkage) that occurs throughout the analysis. Nomenclature that identifies these theoretical classes generally borrows from that used in special classes and, together with analytically constructed labels, accounts for novelty and innovations. According to Schatzman and Strauss (1973), these theoretical classes are special sources of classification because their specific substance is grounded in the data. Because these theoretical classes are not immediately knowable or available to observers until they spend considerable time going over the ways respondents (or messages) in a sample identify themselves and others, it is necessary to retain the special classes throughout much of the analysis.

The next problem to address is how to identify various classes and categories in the data set, which leads to a discussion of open coding.

OPEN CODING

Inexperienced researchers, although they may intellectually understand the process described so far, usually become lost at about this point in the actual process of coding. Some of the major obstacles that cause anguish include the so-called true or intended meaning of the sentence and a desire to know the real motivation behind a subject's clearly identifiable lie. If the researchers can get beyond such concerns, the coding can continue. For the most part, these concerns are actually irrelevant to the coding process, particularly with regard to open coding, the central purpose of which is to open inquiry widely. Although interpretations, questions, and even possible answers may seem to emerge as researchers code, it is important to hold these as tentative at best. Contradictions to such early conclusions may emerge during the coding of the very next document. The most thorough analysis of the various concepts and categories will best be accomplished after all the material has been coded. The solution to the novice investigators' anguish, then, as suggested by Strauss (1987, p. 28), is to "believe everything and believe nothing" while undertaking open coding.

Strauss (1987, p. 30) suggests four basic guidelines when conducting open coding. These are: (1) ask the data a specific and consistent set of questions, (2) analyze the data minutely, (3) frequently interrupt the coding to write a theoretical note, and (4) never assume the analytic relevance of any traditional variable, such as age, sex, or social class, until the data show it to be relevant. A detailed discussion of each of these guidelines follows.

1. Ask the data a specific and consistent set of questions. The most general question researchers must keep in mind is, What study are these data pertinent to? In other words, what was the original objective of the research study? This is not to suggest that the data must be molded to that study. Rather, the original purpose of a study may not be accomplished and an alternative or unanticipated goal may be identified in the data. For example, in Pearson's (1987) evaluation of a New Jersey intensive probation supervision program, the original aim was to demonstrate cost effectiveness. Although objective indicators failed to support the cost effectiveness of the experimental program, several indirect indicators suggested that the program nonetheless was fairly successful. These other measures involved repeated reports from relatives of probationers about changes in attitudes demonstrated by the program participants.
For instance, the wife of one participant reported that her husband had begun to send child-support payments in full and on time. Parents of another program participant reported that their child had begun to show personal responsibility by doing household chores around the home—something the individual had previously never undertaken. Thus, Pearson (1987) points to an unanticipated benefit from the program. This illustration demonstrates the need both to keep the original study aim in mind and to remain open to multiple or unanticipated results that emerge from the data.

2. Analyze the data minutely. Strauss (1987) cautions that researchers should remember that they are conducting an initial coding procedure. As such, it is important to analyze data minutely. Students in qualitative research should remind themselves that in the beginning, more is better. Coding is much like the traditional funnel used by many educators to demonstrate how to write papers. You begin with a wide opening, a broad statement; narrow the statement throughout the body by offering substantial backing; and finally, at the small end of the funnel, present a refined, tightly stated conclusion. In the case of coding, the wide end represents inclusion of many categories, incidents, interactions, and the like. These are coded minutely during open coding. Later, this effort ensures extensive theoretical coverage that will be thoroughly grounded. At a later time, more systematic coding can be accomplished, building from the numerous elements that emerge during this phase of open coding.

The question that arises, of course, is when to stop this open coding process and move on to the speedier, more systematic coding phase. Typically, as researchers minutely code, they eventually saturate the document with repetitious codes. As this occurs and the repetition allows the researchers to move more rapidly through the documents, it is usually safe to conclude that the time has come to move on.

3. Frequently interrupt the coding to write a theoretical note. This third guideline suggested by Strauss (1987) directs researchers closer to grounded theory. Often, in the course of coding, a comment in the document triggers ideas. Researchers should take a moment to jot down a note about these ideas, which may well prove useful later. If they fail to do so, they are very likely to forget the idea. In many instances, researchers find it useful to keep a record of where in each document similar comments, concepts, or categories seem to convey the same elements that originally triggered the theory or hypothesis. For example, during the coding process of a study on adolescents' involvement with alcohol, crime, and drugs, interview transcripts revealed youths speaking about drugs and criminal activities as if they were almost partitioned categories (Carpenter et al., 1988). Notes scribbled during coding later led to theories on drug-crime event sequences and the nexus of drug-crime events.

4. Never assume the analytic relevance of any traditional variable such as age, sex, social class, and so on until the data show it to be relevant. As Strauss (1987, p. 32) indicates, even these more mundane variables must "earn their way into the grounded theory." The point is not to assume that such variables necessarily contribute to some condition; at the same time, it does not mean you are prohibited from intentionally using certain variables deductively. The first guideline, What are the study data pertinent to?, is germane to the coding process. Consequently, if researchers are interested in gender differences, naturally, they
Consequently, if researchers are interested in gender differences, naturally, they AN INTRODUCTION TO CONTENT ANALYSIS 2 5 3 begin by assuming that gender might be analytically relevant, but if the data fail to support this assumption, the researchers must accept this result. CODING FRAMES Content analysis is accomplished through the use of coding frames. The coding frames are used to organize the data and identify findings after open coding has been completed. The first coding frame is often a multileveled process that requires several successive sortings of all cases under examination. Investigators begin with a general sorting of cases into some specified special class. In many ways, this first frame is similar to what Strauss (1987, p. 32) describes as axial coding. According to Strauss (1987) axial coding occurs after open coding is completed and consists of intensive coding around one category. The first sorting approximates Strauss's description of axial coding. An example may better illustrate this process. I (Berg, 1983) began my first sorting by separating all cases into Jewish affiliational categories declared by respondents during an initial telephone contact. Subjects' responses came after being asked in a screening question: "With which of the following do you most closely associate yourself: Reform, Orthodox, Conservative, or Nonpractiring?" (Subjects were consistently asked this question using the preceding affiliational ordering in an attempt to guard against certain acquiescent response sets.) This procedure separated my sample (cases) into four groupings bearing the conventional affiliational titles listed above. After completing this sorting, I carefully read the responses to the identical question asked in the course of each respondent's in-depth interview. Subsequently, each affiliational grouping was subdivided into three groups using the following criteria of selection: 1. The first subdivision in each category consisted of all cases in which respondents' answers to the interview version of the question, "With which of the following ..." (1) were consistent with the response given during the telephone screening and (2) were offered with no qualification or exception. 2. The second subdivision in each category consisted of cases in which respondents qualified their responses with a simple modifier (usually a single adjective), but were otherwise consistent with the response offered on the telephone screening question (for example, "I am a modern Orthodox Jew."). 3. The third subdivision consisted of all cases in which the respondents offered detailed explanations for their affiliational declarations that were also consistent with their telephone screening response. For example, one male respondent explained that just as his father had switched 254 CHAPTER ELEVEN from being an Orthodox to a Conservative affiliate, so too did he make a switch from being a Conservative to a Reform affiliate. His declaration of Reform, however, was consistent with what he had originally declared during the telephone screening. 4. The fourth subdivision consisted of all cases in which the respondents contradicted their original telephone screening question response or indicated that they simply could not determine where they fit in terms of the four conventional affiliational categories. Using the above criteria, I sorted my cases into the indicated subdivisions. 
Following this, and using a sorting process similar to the preceding one, I again subdivided each newly created subgroup to produce a typological scheme containing 16 distinct categories, the overarching or key linkage in every case being the subjective declaration of each respondent (at two distinct iterations of the same question). Having sorted and organized my data, I was ready to interpret the patterns apparent from both the organizational scheme and the details offered in response to interview questions. At this juncture in my analysis, relevant theoretical perspectives were introduced in order to tie the analysis both to established theory and to my own emerging grounded theory (Glaser & Strauss, 1967). These theoretical considerations and sociological constructs led me to analyze several other detailed responses to interview questions. These other questions concerned respondents' involvement in and knowledge of religious symbols and ceremonies. In order to preserve the key linkage throughout the entire analysis process, each subsequent analysis of responses was performed against the newly created typological scheme of subjective identification labels (the 16-category scheme mentioned previously).

Another example of this axial coding or sorting process is offered by Bing (1987), who examined plea bargaining by using an archival strategy. He created a master list containing over 400 articles that examined plea bargaining as represented in 12 major social science journals during the past 5 years (Bing, 1987, pp. 50ff). Following the creation of his master list, Bing sorted his articles, first by manifest theoretical orientation and second by methodological approach. After eliminating categories that contained only a single article and collapsing fundamentally similar theoretical orientations, Bing identified 12 distinct theoretical categories (for example, labeling theory, organizational theory, crime control/due process theory, economic theory, dramaturgical theory, and so forth). The second coding resulted in six distinct methodological approaches, which Bing used to subdivide each of the theoretical categories. Bing established objective criteria for each of the possible theoretical categories. As Bing (1987, pp. 72-73) explains his criteria: "The general focus of the article was used to determine the theoretical orientation [of each article]. In some instances, the author would clearly state the theory; on other occasions, the theoretical approach was lifted based upon statements [offered by the author and] used to characterize the study." In essence, Bing sought to identify theoretical approaches by examining the expression by each author (either directly or indirectly) of a theoretical declaration.

Strauss similarly outlines the coding process. According to Strauss (1987, p. 28), the analyst begins with a procedure he calls open coding. This procedure is described as an unrestricted coding of the data. With open coding, you carefully and minutely read the document line by line and word by word to determine the concepts and categories that fit the data. These concepts, once uncovered, are entirely tentative. As you continue working with and thinking about the data, questions and even some plausible answers also begin to emerge. These questions and answers should lead you to other issues and further questions concerning "conditions, strategies, interactions and consequences" (Strauss, 1987, p. 28).
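Although the judgments involved in open coding are interpretive, the record keeping that supports them (tentative codes attached line by line, plus the theoretical notes Strauss urges) can be represented quite simply. Below is a minimal sketch with hypothetical transcript text, codes, and memo.

```python
from dataclasses import dataclass, field

@dataclass
class CodedLine:
    document: str
    line_no: int
    text: str
    codes: list = field(default_factory=list)   # tentative, revisable
    memos: list = field(default_factory=list)   # theoretical notes

record = CodedLine(
    document="interview_07",
    line_no=42,
    text="We only did that stuff when we weren't working.",
)

# Line-by-line reading assigns tentative concepts...
record.codes += ["partitioning", "context_of_use"]

# ...and coding is interrupted to write a theoretical note.
record.memos.append(
    "Activities described as partitioned spheres; compare across "
    "other transcripts before treating this as a category."
)

print(record.codes)
print(record.memos[0])
```

Keeping codes and memos tied to specific document lines is what later permits retrieval of every excerpt bearing a given code.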
A Few More Words on Analytic Induction

As Robinson (1951) suggests, "Since Znaniecki stated it in 1934, the method of analytic induction has come into important use." The use of analytic induction, however, also has involved a number of refinements—including several variations on its style and purpose. For example, Sutherland and Cressey (1966) refined the method and suggest that it be used in the study of causes of crime. Even before Sutherland (1950), Lindesmith (1947) had discovered the usefulness of an analytic inductive strategy in a study of opiate users. Lindesmith (1952, p. 492) describes analytic induction as follows:

The principle which governs the selection of cases to test a theory is that the chances of discovering a decisive negative case should be maximized. The investigator who has a working hypothesis concerning the data becomes aware of certain areas of critical importance. If his theory is false or inadequate, he knows that its weakness will be more clearly and quickly exposed if he proceeds to the investigation of those critical areas. This involves going out of one's way to look for negative evidence.

Adding further refinements to the method, Glaser and Strauss suggest that analytic induction should combine analysis of data after the coding process with analysis of data while integrating theory. In short, analysis of data is grounded to established theory and is also capable of developing theory. Glaser and Strauss (1967, p. 102) describe their refinements as follows:

We wish to suggest a third approach to the analysis of qualitative data—one that combines, by an analytic procedure of constant comparison, the explicit coding procedures of the first approach [analysis of data after coding] and the style of theory development of the second [the integration of data and theory]. The purpose of the constant comparative method of joint coding and analysis is to generate theory more systematically than allowed by the second approach, by using explicit coding and analytic procedures. While more systematic than the second approach, this method does not adhere completely to the first, which hinders the development of theory because it is designed for provisional testing and not discovering hypotheses.

Glaser and Strauss (1967) suggest that such a joint coding and analysis of data is a more honest way to present findings and analysis. Similarly, Merton (1968, pp. 147-148) discusses the "logical fallacy underlying post factum explanations" and hypothesis testing. Merton states:

It is often the case in empirical social research that data are collected and only then subjected to interpretative comment. . . . Such post factum explanations designed to "explain" observations, differ in logical function from speciously similar procedures where the observational materials are utilized in order to derive fresh hypotheses to be confirmed by new observations. A disarming characteristic of the procedure is that the explanations are indeed consistent with the given set of observations. This is scarcely surprising, in as much as only those post factum hypotheses are selected which do accord with these observations. . . . The method of post factum explanation does not lend itself to nullifiability.

Researchers should make extensive use of Glaser and Strauss's (1967) style of analytic induction and, perhaps more directly, of Strauss's (1987) rearticulation of their position.
According to Strauss (1987, p. 12):

Because of our earlier writing in Discovery (1967) where we attacked speculative theory—quite ungrounded in bodies of data—many people mistakenly refer to grounded theory as "inductive theory" in order to contrast it with, say, the theories of Parsons or Blau. But as we have indicated, all three aspects of inquiry (induction, deduction, and verification) are absolutely essential. ... In fact, it is important to understand that various kinds of experience are central to all these modes of activity—induction, deduction, and verification—that enter into inquiry.

Throughout the analysis, researchers should incorporate all appropriate modes of inquiry. Thus, both logically derived hypotheses and those that have "serendipitously" (Merton, 1968) arisen from the data may find their way into the research.

Interrogative Hypothesis Testing

In order to verify and assess the applicability of a given hypothesis, researchers should use a style of negative case testing suggested by Robinson (1951), Lindesmith (1952), Manheim and Simon (1977), and Denzin (1978). This process of negative case testing essentially involves the following steps:

1. Make a rough hypothesis based on an observation from the data.

2. Conduct a thorough search of all cases to locate negative cases (that is, cases that do not fit the hypothesized relationship).

3. If a negative case is located, either reformulate the hypothesis to account for the negative case, discard the hypothesis, or exclude the negative case.

4. Examine all relevant cases from the sample before determining whether "practical certainty" (Denzin, 1978) in this recommended analysis style is attained.

For example, based on a reading of responses to the open-ended question, "With which of the following do you most closely associate yourself: Conservative, Orthodox, Reform, or Nonpracticing?" I (Berg, 1983) hypothesized that certain groups of persons offered instrumentally oriented answers (that is, oriented to achievement and goals) while other groups offered expressively oriented answers (that is, sentimental, feeling oriented, and symbolic). I further hypothesized that these styles of responses could be linked to particular categories relevant to the analysis of differential involvement with religious activities and subjective affiliational identification. However, after carefully reexamining each case with these hypotheses in mind, I found many negative cases. At each negative juncture, I attempted to reformulate the hypotheses to account for the cases that did not fit. None of the successive formulations was constructed de novo; each was based on some aspect of the preceding hypothetical relationship (see Denzin, 1978, pp. 193-194; Lindesmith, 1947, pp. 9-10). Unfortunately, I soon realized that my hypotheses had become artificial and meaningless, and consequently I abandoned them.

It may be argued that the search for negative cases sometimes neglects contradictory evidence (that is, when a case both affirms and in some way denies a hypothetical relationship) or distorts the original hypothetical relationship (that is, when the observers read into the data whatever relationship they have hypothesized—a variation on post factum hypothesizing). To accomplish content analysis in the style recommended here, researchers must use several safeguards against these potential flaws in analysis. First, whenever numbers of cases allow, examples that illustrate a point should be lifted at random from among the relevant grouped cases.
Second, every assertion made in the analysis should be documented with no fewer than three examples. Third, analytic interpretations should be examined carefully by an independent reader (someone other than the actual researchers) to ensure that their claims and assertions are not derived from a misreading of the data and that they have been documented adequately. Finally, whenever inconsistencies in patterns do emerge, these too should be discussed in order to explain whether they invalidate overall patterns. Failure to mention these inconsistencies in pattern is a less than forthright presentation of the data and analysis.

In effect, the use of the above safeguards avoids what Glaser and Strauss (1967, p. 5) describe as exampling. According to Glaser and Strauss, exampling is finding examples for "dreamed-up, speculative, or logically deduced theory after the idea occurred," rather than allowing the patterns to emerge from the data. For instance, in the course of analyzing responses to the question, "How do you celebrate Chanukkah, if at all?" during an early analysis of the data, I (Berg, 1983) suggested that gift giving was emphasized to a greater extent by some affiliational groups than by others. However, when this section was read by an independent reader, the reader noticed that several negative cases had been presented in evidence of this assertion. What I had originally missed was that the more traditional affiliational group members had described their style of gift giving in the midst of a number of traditional (religious) rituals. On the other hand, many of the nonpracticing affiliational group members had described gift giving as being in competition with an observance of Christmas and thus actually fused their observance of Chanukkah with an observance of Christmas.

STRENGTHS AND WEAKNESSES OF THE CONTENT ANALYSIS PROCESS

Perhaps the most important advantage of content analysis is that it can be virtually unobtrusive (Webb et al., 1981). Content analysis, although useful when analyzing depth interview data, may also be used nonreactively: no one needs to be interviewed, no one needs to fill out lengthy questionnaires, no one must enter a laboratory. Rather, newspaper accounts, public addresses, libraries, archives, and similar sources allow researchers to conduct analytic studies.

An additional advantage is that it is cost effective. Generally, the materials necessary for conducting content analysis are easily and inexpensively accessible. One college student working alone can effectively undertake a content analysis, whereas undertaking a national survey, for instance, might require enormous staff, time, and expense.

A further advantage to content analysis is that it provides a means by which to study processes that occur over long periods of time or that may reflect trends in a society (Babbie, 1998). As examples, you might study the portrayal of women in the media from 1800 to 1993, or you might focus on changing images of women in the media from 1982 to 1992. For instance, McBroom (1992) recently examined women in the clergy as depicted in the Christian Century between 1984 and 1987. McBroom (1992, p. 208) reports:

1984 was a year when the issue of women's ordination gained support in the news media, as indicated by the number of positive references, especially articles, during the year.... The next year, 1985, was a year of transition, as few references were recorded for that year.
The data for the years 1986 and 1987 indicate a growing negative response to the issue of the ordination of women, especially in the negative news reports. The data . . . indicate that, in general, conditions and opportunities for women in the clergy in the United States deteriorated rather than improved during these years.

Thus, using content analysis, McBroom (1992) was able to examine data during individual years as well as over the span of all years under study.

The single serious weakness of content analysis may be in locating unobtrusive messages relevant to the particular research questions. In other words, content analysis is limited to examining already recorded messages. Although these messages may be oral, written, graphic, or videotaped, they must be recorded in some manner in order to be analyzed. Of course, when you undertake content analysis as an analysis tool rather than as a complete research strategy, such a weakness is minimal. For example, if researchers use content analysis to analyze interview data or responses to open-ended questions (on written questionnaires), this weakness is virtually nonexistent.

Another limitation (although some might call it a weakness) of content analysis is that it is ineffective for testing causal relationships between variables. Researchers and their audiences must resist the temptation to infer such relationships. This is particularly true when researchers forthrightly present the proportion or frequency with which a theme or pattern is observed. This kind of information is appropriate for indicating the magnitude of certain responses; it is not appropriate for attaching cause to these presentations.

As with any analytic method, the advantages of content analysis must be weighed against the disadvantages and against alternative research strategies. Although content analysis may be appropriate for some research problems and designs, it is not appropriate in every research situation. It is a particularly beneficial procedure for assessing events or processes in social groups when public records exist. It is likewise helpful in many types of exploratory or descriptive studies. But if you are interested in conducting experimental or causal research, content analysis is virtually useless.

COMPUTERS AND QUALITATIVE ANALYSIS

It is now 35 years since General Inquirer, the first software program designed to assist in the analysis of textual data, became public (Stone et al., 1966; Tesch, 1991). Of course, when General Inquirer came out, small, affordable personal computers did not exist. To use General Inquirer, one needed access to a large mainframe computer and sufficient time to read and digest its book-length instructions. The program operated largely on the basis of counting and numerous calculations; yet it worked exclusively with textual data (Tesch, 1991).
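The style of analysis such early programs supported is easy to reproduce today. Below is a minimal sketch of dictionary-based counting in Python: word frequencies plus a conceptual cluster of the kind described earlier in this chapter. The sample text and the cluster's word list are invented for illustration.

```python
import re
from collections import Counter

text = """The report described the fraud in detail. Delinquency
rates rose, and petty theft was common; fraud charges followed."""

# Words as the unit of analysis: a frequency distribution.
words = re.findall(r"[a-z']+", text.lower())
frequencies = Counter(words)

# Concepts as the unit of analysis: a cluster of words counted together.
deviance_cluster = {"crime", "delinquency", "fraud", "theft"}
deviance_count = sum(frequencies[w] for w in deviance_cluster)

print(frequencies.most_common(3))
print("deviance cluster mentions:", deviance_count)  # 4
```

The earlier caution about magnitudes applies here as well: such counts index and organize the text, but they are not findings in themselves.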