CHAPTER 11

AN INTRODUCTION TO CONTENT ANALYSIS

Throughout the preceding chapters, techniques and strategies for collecting and organizing data have been discussed. With a partial exception for Chapters 4, 6, and perhaps 7, where limited analytic procedures are mentioned, analysis of data has not yet been extensively discussed. In this chapter the task of analysis is considered at length.

Interviews, field notes, and various types of unobtrusive data are often not amenable to analysis until the information they convey has been condensed and made systematically comparable. An objective coding scheme must be applied to the notes or data. This process is commonly called content analysis.

The instructions in this chapter are intended to assist novice researchers in their attempt to learn the methodological technique(s) for standard content analysis. First, a brief discussion of analysis approaches in qualitative research is outlined. Following this, some general concerns and debates regarding content analysis are presented. Then, a number of procedures for conducting content analysis are discussed. These include consideration of what to count and what to analyze, the nature of levels and units of analysis, and how to effectively employ coding frames. In the next section, the strengths and weaknesses of content analysis as a research technique are discussed, and analytic induction is examined in relation to content analysis procedures. Finally, this chapter will address word crunching, the use of computers in qualitative research.

ANALYSIS OF QUALITATIVE DATA

There are a number of procedures used by qualitative researchers to analyze their data. Miles and Huberman (1994) identify three major approaches to qualitative data analysis: interpretative approaches, social anthropological approaches, and collaborative social research approaches.

Interpretative Approaches

This orientation allows researchers to treat social action and human activity as text. In other words, human action can be seen as a collection of symbols expressing layers of meaning. Interviews and observational data, then, can be transcribed into written text for analysis. How one interprets such a text depends in part on the theoretical orientation taken by the researcher. Thus, a researcher with a phenomenological bent will resist condensing data or framing data by various sorting or coding operations. A phenomenologically oriented researcher might, instead, attempt to uncover or capture the telos (essence) of an account. This approach provides a means for discovering the practical understandings of meanings and actions. Researchers with a more general interpretative orientation (dramaturgists, symbolic interactionists, etc.) are likely to organize or reduce data in order to uncover patterns of human activity, action, and meaning.

Social Anthropological Approaches

Researchers following this orientation often have conducted various sorts of field or case study activities to gather data. In order to accomplish data collection, they have necessarily spent considerable time in a given community, or with a given assortment of individuals in the field. They have participated, indirectly or directly, with many of the individuals residing in or interacting with the study population. This provides the researcher with a special perspective on the material collected during the research, as well as a special understanding of the participants and how these individuals interpret their social worlds.
Analysis of this sort of data can be accomplished by setting information down in field notes and then applying the interpretative style of treating this information as text. Frequently, however, this analytic process requires the analysis of multiple sources of data such as diaries, observations, interviews, photographs, and artifacts. Determining what material to include or exclude, how to order the presentation of substantiating materials, and what to report first or last are analytic choices the researcher must make.

Researchers employing the social anthropological approach usually are interested in the behavioral regularities of everyday life: language and language use, rituals and ceremonies, and relationships. The analytic task, then, is to identify and explain the ways people use or operate in a particular setting; how they come to understand things; and how they account for, take action in, and generally manage their day-to-day life. Many researchers using this approach begin with a conceptual or theoretical frame and then move into the field in order to test or refine this conceptualization.

Collaborative Social Research Approaches

Researchers operating in this research mode work with their subjects in a given setting in order to accomplish some sort of change or action (see Chapter 7 on action research). The analysis of data gathered in such collaborative studies is accomplished with the participation of the subjects, who are seen by the researcher as stakeholders in the situation in need of change or action. Data are collected and then reflexively considered both as feedback to craft action and as information to understand a situation, resolve a problem, or satisfy some sort of field experiment. The actual analytic strategies applied in this effort may be similar to the interpretative and social anthropological approaches.

Given these diverse yet overlapping approaches, you can see certain facets of research that recur during any style of qualitative analysis. Below is a fairly standard set of analytic activities arranged in a general order of sequence:

• Data are collected and made into text (e.g., field notes, transcripts, etc.).
• Codes are analytically developed or inductively identified in the data and affixed to sets of notes or transcript pages.
• Codes are transformed into categorical labels or themes.
• Materials are sorted by these categories, identifying similar phrases, patterns, relationships, and commonalities or disparities.
• Sorted materials are examined to isolate meaningful patterns and processes.
• Identified patterns are considered in light of previous research and theories, and a small set of generalizations is established.

During the remainder of this chapter, these features will be discussed and considered in relationship to content analysis (a brief code sketch of this sequence appears in the next section). In the next section, I will consider the nature of content analysis as a technique.

CONTENT ANALYSIS AS A TECHNIQUE

In content analysis, researchers examine artifacts of social communication. Typically, these are written documents or transcriptions of recorded verbal communications. Broadly defined, however, content analysis is "any technique for making inferences by systematically and objectively identifying special characteristics of messages" (Holsti, 1968, p. 608). From this perspective, photographs, videotape, or any item that can be made into text is amenable to content analysis.
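Before turning to the formal requirements of content analysis, the generic analytic sequence listed above can be made concrete. Below is a minimal sketch in Python using invented transcript fragments, codes, and themes; it illustrates only the bookkeeping of coding and sorting, not the interpretive judgment that produces the codes themselves.

```python
# A sketch of the generic analytic sequence, with hypothetical data.
from collections import defaultdict

# 1. Data are collected and made into text.
transcripts = {
    "interview_01": ["We mostly kept to ourselves at work.",
                     "My supervisor looked the other way."],
    "interview_02": ["Everyone on the floor knew the routine."],
}

# 2. Codes are affixed to segments (assigned here by hand).
coded_segments = [
    ("interview_01", 0, "isolation"),
    ("interview_01", 1, "supervision"),
    ("interview_02", 0, "shared_knowledge"),
]

# 3. Codes are transformed into categorical labels or themes.
themes = {
    "isolation": "workplace relations",
    "supervision": "organizational context",
    "shared_knowledge": "organizational context",
}

# 4. Materials are sorted by these categories.
sorted_by_theme = defaultdict(list)
for doc, line, code in coded_segments:
    sorted_by_theme[themes[code]].append(transcripts[doc][line])

# 5. Sorted materials are examined to isolate patterns.
for theme, excerpts in sorted_by_theme.items():
    print(theme, "->", excerpts)
```

In practice, the codes are revised repeatedly as analysis proceeds; the final step, relating identified patterns to previous research and theory, remains the researcher's own work.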
In this chapter, objective analysis of messages conveyed in the data being analyzed is accomplished by means of explicit rules called criteria of selection, which must be formally established before the actual analysis of data. The criteria of selection used in any given content analysis must be sufficiently exhaustive to account for each variation of message content and must be rigidly and consistently applied so that other researchers or readers, looking at the same messages, would obtain the same or comparable results. This may be considered a kind of reliability of the measures, and a validation of eventual findings (Selltiz et al., 1967). The categories that emerge in the course of developing these criteria should reflect all relevant aspects of the messages and retain, as much as possible, the exact wording used in the statements. They should not be merely arbitrary or superficial applications of irrelevant categories. Holsti (1968, p. 598) explains this type of content analysis procedure: "The inclusion or exclusion of content is done according to consistently applied criteria of selection; this requirement eliminates analysis in which only material supporting the investigator's hypotheses are examined."

CONTENT ANALYSIS: QUANTITATIVE OR QUALITATIVE?

One of the leading debates among users of content analysis is whether analysis should be quantitative or qualitative. Berelson (1952), for example, suggests that content analysis is "objective, systematic, and quantitative." Similarly, Silverman (1993, p. 59) dismisses content analysis from his discussion of qualitative data analysis "because it is a quantitative method." Selltiz et al. (1959, p. 336), however, state that concerns over quantification in content analysis tend to emphasize "the procedures of analysis" rather than the "character of the data available." Selltiz et al. suggest also that heavy quantitative content analysis results in a somewhat arbitrary limitation in the field by excluding all accounts of communications that are not in the form of numbers, as well as those that may lose meaning if reduced to a numeric form (definitions, symbols, detailed explanations, photographs, and so forth). Other proponents of content analysis, notably Smith (1975), suggest that some blend of both quantitative and qualitative analysis should be used. Smith (1975, p. 218) explains that he has taken this position "because qualitative analysis deals with the forms and antecedent-consequent patterns of form, while quantitative analysis deals with duration and frequency of form." Abrahamson (1983, p. 286) suggests that "content analysis can be fruitfully employed to examine virtually any type of communication." As a consequence, content analysis may focus on either quantitative or qualitative aspects of communication messages.

Some authors of methods books have written about the procedure of narrative analysis as distinguishable from the procedure of content analysis (see, for example, Silverman, 1993; Manning & Cullum-Swan, 1994). In narrative analysis, the investigator typically begins with a set of principles and seeks to exhaust the meaning of the text using specified rules and principles, but maintains a qualitative textual approach (Boje, 1991; Heise, 1992; Manning & Cullum-Swan, 1994; Silverman, 1993). In contrast to this allegedly more textual approach, content analysis is suggested to be limited to counts of textual elements.
Thus, the implication is that content analysis is more reductionistic and ostensibly a more positivistic approach. I argue here that content analysis can be effective in qualitative analysis—that "counts" of textual elements merely provide a means for identifying, organizing, indexing, and retrieving data. Analysis of the data, once organized according to certain content elements, should involve consideration of the literal words in the text being analyzed, including the manner in which these words have been offered. In this way, content analysis provides a method for obtaining good access to the words of the text or transcribed accounts offered by subjects (Glassner & Loughlin, 1987). This offers, in turn, an opportunity for the investigator to learn about how subjects or the authors of textual materials view their social worlds. From this perspective, content analysis is not a reductionistic, positivistic approach. Rather, it is a passport to listening to the words of the text and understanding better the perspective(s) of the producer of these words.

This chapter strives for a blend of qualitative and quantitative analysis: the descriptions of quantitative analysis show how researchers can create a series of tally sheets to determine specific frequencies of relevant categories. The references to qualitative analysis show how researchers can examine ideological mind-sets, themes, topics, symbols, and similar phenomena, while grounding such examinations in the data.

Manifest versus Latent Content Analysis

Another controversy concerning the use of content analysis is whether the analysis should be limited to manifest content (those elements that are physically present and countable) or extended to more latent content. In the latter case, the analysis is extended to an interpretive reading of the symbolism underlying the physical data. For example, an entire speech may be assessed for how radical it was, or a novel could be considered in terms of how violent the entire text was. Stated in different words, manifest content is comparable to the surface structure present in the message, and latent content is the deep structural meaning conveyed by the message.

Holsti (1969, p. 598) has tried to resolve this debate: "It is true that only the manifest attributes of text may be coded, but this limitation is already implied by the requirement of objectivity. Inferences about latent meanings of messages are therefore permitted but... they require corroboration by independent evidence." One reasonable interpretation of this passage, and a similar statement made by Berelson (1952, p. 488ff), suggests that although there are some dangers in directly inferring from latent symbolism, it is nonetheless possible to use it (see also Merton, 1968, pp. 366-370, on the use of content analysis in examining propaganda). To accomplish this sort of "deciphering" (Heilman, 1976) of latent symbolic meaning, researchers must first incorporate independent corroborative techniques (for example, agreement between independent coders concerning latent content or some noncontent analytic source). Finally, and especially when latent symbolism may be discussed, researchers should offer detailed excerpts from relevant statements (messages) that serve to document the researchers' interpretations. A safe rule of thumb to follow is the inclusion of at least three independent examples for each interpretation.
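One workable form of the corroboration Holsti calls for is agreement between independent coders. The sketch below, with invented codes and ratings, computes simple percent agreement and Cohen's kappa, which discounts the agreement expected by chance; it illustrates the idea rather than prescribing any particular reliability statistic.

```python
# Two independent coders rate the same six units for latent content.
# The units and the "radical"/"moderate" codes are hypothetical.
coder_a = ["radical", "moderate", "radical", "moderate", "radical", "radical"]
coder_b = ["radical", "moderate", "moderate", "moderate", "radical", "radical"]

n = len(coder_a)

# Simple percent agreement.
agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is the agreement
# expected by chance given each coder's marginal proportions.
categories = set(coder_a) | set(coder_b)
p_e = sum((coder_a.count(c) / n) * (coder_b.count(c) / n) for c in categories)
kappa = (agreement - p_e) / (1 - p_e)

print(f"percent agreement: {agreement:.2f}")  # 0.83
print(f"Cohen's kappa:     {kappa:.2f}")      # 0.67
```

Kappa near 1.0 indicates strong agreement beyond chance; values near zero suggest that the criteria of selection for the latent coding need sharpening before interpretations can be trusted.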
Blending Manifest and Latent Content Analysis Strategies

Perhaps the best resolution of this dilemma about whether to use manifest or latent content is to use both whenever possible. In this case, a given unit of content would receive the same attention from both methods—to the extent that coding procedures (discussed presently) for both the manifest and latent content are reasonably valid and reliable (Babbie, 1998).

By reporting the frequency with which a given concept appears in text, researchers suggest the magnitude of this observation. It is more convincing for their arguments when researchers demonstrate the appearance of a claimed observation in some large proportion of the material under study (e.g., 20 percent, 30 percent, 40 percent, and so on). Researchers must bear in mind, however, that these descriptive statistics—namely, proportions and frequency distributions—do not necessarily reflect the nature of the data or variables. If the theme "positive attitude toward shoplifting" appears 50 times in one subject's interview transcript and 25 times in another subject's, this would not be justification for the researchers to claim that the first subject is twice as likely to shoplift as the second subject. In short, researchers must be cautious not to take or claim magnitudes as findings in themselves. The magnitude for certain observations is presented to demonstrate more fully the overall analysis.

COMMUNICATION COMPONENTS

According to Holsti (1969) and Carney (1972), communications have three major components: the message, the sender, and the audience. The message should be analyzed in terms of explicit themes, relative emphasis on various topics, amount of space or time devoted to certain topics, and numerous other dimensions. Occasionally, messages are analyzed for information about the sender of the communication. According to Chadwick et al. (1984), the linkages between the message content and attributes of the sender are often slight. Nonetheless, some characteristics of the sender may be discernible, especially if numerous examples are available, audible (recorded) messages are examined, or verbatim transcriptions from recordings are used (including literal representations of pauses, mispronounced words, grammatical errors, slang, and other language styles).

Strauss (1987, p. 33) similarly differentiates between what he calls in vivo codes and sociological constructs. In vivo codes are the literal terms used by individuals under investigation, the terms used by the various actors themselves. "In vivo codes tend to be the behaviors or processes which will explain to the analyst how the basic problem of the actors is resolved or processed" (Strauss, 1987, p. 33). In contrast, sociological constructs are formulated by the analyst. Terms and categories such as professional attitude, family oriented, obsessive workaholic, and educationally minded might represent examples of sociological constructs. These constructs, of course, need not derive exclusively from sociology and may come from the fields of education, nursing, psychology, and the like. Strauss (1987, p. 34) explains that these constructs "are based on a combination of the researcher's scholarly knowledge and knowledge of the substantive field under study." The result of using constructs is the addition of certain social scientific meanings that might otherwise be missed in the analysis.
Thus, sociological constructs add breadth and depth to observations by reaching beyond local meanings to broader social scientific ones.

Researchers may additionally use content analysis to assess a message's effects on the audience. The Pornography and Television Violence Commissions tried, for example, to assess the impact that sexual or violent material on television and in movies had on those who watched this genre of entertainment (Commission on Obscenity and Pornography, 1970; Comstock & Rubinstein, 1972). However, making accurate inferences about either the characteristics of the sender or the effects of the message on the audience is often tenuous at best.

WHAT TO COUNT: LEVELS AND UNITS OF ANALYSIS

When using a content analysis strategy to assess written documents, researchers must first decide at what level they plan to sample and what units of analysis will be counted. Sampling may occur at any or all of the following levels: words, phrases, sentences, paragraphs, sections, chapters, books, writers, ideological stance, subject topic, or similar elements relevant to the context. When examining other forms of messages, researchers may use any of the preceding levels or may sample at other conceptual levels more appropriate to the specific message. For example, when examining television programs for violent content, researchers might use segments between commercials as the level of analysis, or they might choose to use the entire television program (excluding commercials) as the level (see, for example, Fields, 1988).

CATEGORY DEVELOPMENT: BUILDING GROUNDED THEORY

Strauss (1987) describes the considerable misconception surrounding the development of grounded theory. The term misconception, as Strauss (1987, p. 55) points out, seems more appropriate than criticism. Misconception implies an inaccurate reading of material pertaining to building grounded theory. On the other hand, criticism connotes more of a challenge to or detraction from the benefits of this process. Central to misconception are the notions that grounded theory is an entirely inductive process, that it does not verify findings, and that it somehow molds the data to the theory rather than the reverse. Strauss (1987, p. 55), in a lengthy note, singles out Miles and Huberman (1983) as illustrating several instrumental misconceptions (brackets in original text contain Strauss's responses):

In Miles and Huberman (1983, p. 57) there is also a misunderstanding about grounded theory technology. The material in my book, written before their publication appeared, runs directly counter to some of their remarks, that: the grounded theory approach has a lot going for it. Data get well molded to the codes that represent them, and we get more of a code-in-use flavor than the generic code-for-many-uses generated by prefabricated start lists.... The tradeoff here is that earlier segments may have different codes than later ones. [They may, in part, of course.] Or to avoid this everything may have to be recoded once a more empirically sculpted scheme emerges. [No.] This means more overall coding time, and longer uncertainty about the coherence of the coding frame. [Probably, but deliberate, in part].

In addition, Miles and Huberman (1983, pp. 63-64) promote the worrisome notion that coding is not an enjoyable task, which suggests that other aspects of the research enterprise are more fun. This text, like Strauss (1987), strongly disagrees.
Coding and other fundamental procedures associated with grounded theory development are certainly hard work and must be taken seriously, but just as many people enjoy finishing a complicated jigsaw puzzle, many researchers find great satisfaction in coding and analysis. As researchers move through the coding process and begin to see the puzzle pieces come together to form a more complete picture, the process can be downright thrilling. Time consuming, tiring, and even laborious as the process is, it is seldom boring!

The categories researchers use in a content analysis can be determined inductively, deductively, or by some combination of both (Strauss, 1987). Abrahamson (1983, p. 286) indicates that an inductive approach begins with the researchers "immersing" themselves in the documents (that is, the various messages) in order to identify the dimensions or themes that seem meaningful to the producers of each message. In a deductive approach, researchers use some categorical scheme suggested by a theoretical perspective, and the documents provide a means for assessing the hypothesis. In many circumstances, the relationship between a theoretical perspective and certain messages involves both inductive and deductive approaches. However, in order to present the perceptions of others (the producers of messages) in the most forthright manner, a greater reliance upon induction is necessary. Nevertheless, as will be shown, induction should not be undertaken to the exclusion of deduction.

The development of inductive categories allows researchers to link or ground these categories to the data from which they derive. Certainly it is reasonable to suggest that insights and general questions about research derive from previous experience with the study phenomena. This may represent personal experience, scholarly experience (having read about it), or previous research undertaken to examine the matter. Researchers, similarly, draw on these experiences in order to propose tentative comparisons that assist in creating various deductions. Experience thus underpins both inductive and deductive reasoning. From this interplay of experience, induction, and deduction, Glaser and Strauss formulate their description of grounded theory. According to Glaser and Strauss (1967, pp. 2-3):

To generate theory... we suggest as the best approach an initial, systematic discovery of the theory from the data of social research. Then one can be relatively sure that the theory will fit and work. And since categories are discovered by examination of the data, laymen involved in the area to which the theory applies will usually be able to understand it, while sociologists who work in other areas will recognize an understandable theory linked with the data of a given area.

What to Count

Seven major elements in written messages can be counted in content analysis: words or terms, themes, characters, paragraphs, items, concepts, and semantics (Berelson, 1952; Berg, 1983; Merton, 1968; Selltiz et al., 1959).

Words. The word is the smallest element or unit used in content analysis. Its use generally results in a frequency distribution of specified words or terms.

Themes. The theme is a more useful unit to count. In its simplest form, a theme is a simple sentence, a string of words with a subject and a predicate. Because themes may be located in a variety of places in most written documents, it becomes necessary to specify (in advance) which places will be searched.
For example, researchers might use only the primary theme in a given paragraph location or, alternatively, might count every theme in a given text under analysis.

Characters. In some studies, characters (persons) are significant to the analysis. In such cases, you count the number of times a specific person or persons are mentioned rather than the number of words or themes.

Paragraphs. The paragraph is infrequently used as the basic unit in content analysis, chiefly because of the difficulties that have resulted in attempting to code and classify the various and often numerous thoughts stated and implied in a single paragraph.

Items. An item represents the whole unit of the sender's message—that is, an item may be an entire book, a letter, speech, diary, newspaper, or even an in-depth interview.

Concepts. The use of concepts as units to count is a more sophisticated type of word counting than previously mentioned. Concepts involve words grouped together into conceptual clusters (ideas) that constitute, in some instances, variables in a typical research hypothesis (Sanders & Pinhey, 1959, p. 191). For instance, a conceptual cluster may form around the idea of deviance. Words such as crime, delinquency, kiting, and fraud might cluster around the conceptual idea of deviance (Babbie, 1998). To some extent, the use of a concept as the unit of analysis leads toward more latent than manifest content.

Semantics. In the type of content analysis known as semantics, researchers are interested not only in the number and type of words used but also in how affected the word(s) may be—in other words, how strong or weak a word (or words) may be in relation to the overall sentiment of the sentence (Sanders & Pinhey, 1959).

Combinations of Elements

In many instances, research requires the use of a combination of several content analytic elements. For example, in my study (Berg, 1983) to identify subjective definitions for Jewish affiliational categories (Orthodox, Conservative, Reform, and Nonpracticing), I used a combination of both item and paragraph elements as a content unit. In order to accomplish a content analysis of these definitions (as items), I lifted every respondent's definitions of each affiliational category verbatim from an interview transcript. Each set of definitions was additionally annotated with the transcript number from which it had been taken. Next, each definition (as an item) was separated into its component definitional paragraph for each affiliational category. An example of this definitional paragraphing is shown below (Berg, 1983, p. 76):

INTERVIEW #60: ORTHODOX

Well, I guess, Orthodox keep kosher in [the] home and away from home. Observe the Sabbath, and, you know . . . , actually if somebody did [those] and considered themselves an Orthodox Jew, to me that would be enough. I would say that they were Orthodox.

INTERVIEW #60: CONSERVATIVE

Conservative, I guess, is the fellow who doesn't want to say he's Reform because it's objectionable to him. But he's a long way from being Orthodox.

INTERVIEW #60: REFORM

Reform is just somebody that, they say they are Jewish because they don't want to lose their identity. But actually I want to be considered a Reform, 'cause I say I'm Jewish, but I wouldn't want to be associated as a Jew if I didn't actually observe any of the laws.
INTERVIEW #60: NONPRACTICING

Well, a Nonpracticing is the guy who would have no temple affiliation, no affiliation with being Jewish at all, except that he considers himself a Jew. I guess he practices in no way, except to himself.

Units and Categories

Content analysis involves the interaction of two processes: specification of the content characteristics (basic content elements) being examined and application of explicit rules for identifying and recording these characteristics. The categories into which you code content items vary according to the nature of the research and the particularities of the data (that is, whether they are detailed responses to open-ended questions, newspaper columns, letters, television transcripts, and so on).

As with all research methods, conceptualization and operationalization necessarily involve an interaction between theoretical concerns and empirical observations. For instance, if researchers wanted to examine newspaper orientations toward changes in a state's seat-belt law (as a potential barometer of public opinion), they might read newspaper articles and/or editorials. As they read each article, the researchers could ask themselves which ones were in favor of and which ones were opposed to changes in the law. Were the articles' positions more clearly indicated by their manifest content or by some undertone? Was the decision to label one article pro or con based on the use of certain terms, on presentation of specific study findings, or because of statements offered by particular characters (for example, celebrities, political figures, and so on)? The answers to these questions allow the researchers to develop inductive categories in which to slot various units of content.

As previously mentioned, researchers need not limit their procedures to induction alone. Both inductive and deductive reasoning may provide fruitful findings. If, for example, investigators are attempting to test hypothetical propositions, their theoretical orientation should suggest empirical indicators of concepts (deductive reasoning). If they have begun with specific empirical observations, they should attempt to develop explanations grounded in the data (grounded theory) and apply these theories to other empirical observations (inductive reasoning).

There are no easy ways to describe specific tactics for developing categories or to suggest how to go about defining (operationalizing) these tactics. To paraphrase Schatzman and Strauss's (1973, p. 12) remark about methodological choices in general, the categorizing tactics worked out—some in advance, some developed later—should be consistent not only with the questions asked and the methodological requirements of science but also with the properties of the phenomena under investigation. Stated succinctly, categories must be grounded in the data from which they emerge (Denzin, 1978; Glaser & Strauss, 1967). The development of categories in any content analysis must derive from inductive inference (to be discussed in detail later) concerning patterns that emerge from the data.

For example, in a study evaluating the effectiveness of a Florida-based delinquency diversion program, I (Berg, 1986) identified several thematic categories from information provided on intake sheets.
By setting up a tally sheet, I managed to use the criminal offenses declared by arresting officers in their general statements to identify two distinct classes of crime, in spite of arresting officers' use of similar-sounding terms. In one class of crime, several similar terms were used to describe what amounted to the same type of crime. In a second class of crime, officers more consistently referred to the same type of crime by a consistent term. Specifically, I found that the words shoplifting, petty theft, and retail theft each referred to essentially the same category of crime involving the stealing of some type of store merchandise, usually not exceeding $3.50 in value. Somewhat surprisingly, the semantically similar term petty larceny was used to describe the taking of cash, whether it was from a retail establishment, a domicile, or an auto. Thus, the data indicated a subtle perceptual distinction made by the officers reporting juvenile crimes.

Recently, Dabney (1993) examined how practicing nurses perceived other nurses who worked while impaired by alcohol or drugs. He developed several thematic categories based on previous studies found in the literature. He was also able to inductively identify several classes of drug diversion described by subjects during the course of interviews. For instance, many subjects referred to stockpiled drugs that nurses commonly used for themselves. These drugs included an assortment of pain killers and mild sedatives stored in a box, a drawer, or some similar container on the unit or floor. These stockpiled drugs accumulated when patients died or were transferred to another hospital unit and this information did not immediately reach the hospital pharmacy.

Classes and Categories

Three major procedures are used to identify and develop classes and categories in a standard content analysis and to discuss findings in research that uses content analysis: common classes, special classes, and theoretical classes.

Common Classes. The first are the common classes of a culture in general. These classes are used by virtually anyone in society to distinguish between and among persons, things, and events (for example, age, gender, mother, father, teacher, and so on). These common classes, as categories, provide lay people with a means of designation in the course of everyday thinking and communicating and a way to engender meaning in their social interactions (see Duncan, 1962; Schatzman & Strauss, 1973; Strauss, 1959). These common classes are essential in assessing whether certain demographic characteristics are related to patterns that may arise during a given data analysis.

Special Classes. Special classes are those labels used by members of certain areas (communities) to distinguish among the things, persons, and events within their limited province (Schatzman & Strauss, 1973). These special classes can be likened to jargonized terms used commonly in certain professions but not by lay people. Alternatively, these special classes may be described as out-group versus in-group classifications. In the case of the out-group, the reference is to labels conventionally used by the greater (host) community or society; as for the in-group, the reference is to conventional terms and labels used among some specified group or that may emerge as theoretical classes.

Theoretical Classes. The theoretical classes are those that emerge in the course of analyzing the data (Schatzman & Strauss, 1973).
In most content analysis, these theoretical classes provide an overarching pattern (a key linkage) that occurs throughout the analysis. Nomenclature that identifies these theoretical classes generally borrows from that used in special classes and, together with analytically constructed labels, accounts for novelty and innovations. According to Schatzman and Strauss (1973), these theoretical classes are special sources of classification because their specific substance is grounded in the data. Because these theoretical classes are not immediately knowable or available to observers until they spend considerable time going over the ways respondents (or messages) in a sample identify themselves and others, it is necessary to retain the special classes throughout much of the analysis.

The next problem to address is how to identify various classes and categories in the data set, which leads to a discussion of open coding.

OPEN CODING

Inexperienced researchers, although they may intellectually understand the process described so far, usually become lost at about this point in the actual process of coding. Some of the major obstacles that cause anguish include the so-called true or intended meaning of the sentence and a desire to know the real motivation behind a subject's clearly identifiable lie. If the researchers can get beyond such concerns, the coding can continue. For the most part, these concerns are actually irrelevant to the coding process, particularly with regard to open coding, the central purpose of which is to open inquiry widely. Although interpretations, questions, and even possible answers may seem to emerge as researchers code, it is important to hold these as tentative at best. Contradictions to such early conclusions may emerge during the coding of the very next document. The most thorough analysis of the various concepts and categories will best be accomplished after all the material has been coded. The solution to the novice investigators' anguish, then, as suggested by Strauss (1987, p. 28), is to "believe everything and believe nothing" while undertaking open coding.

Strauss (1987, p. 30) suggests four basic guidelines when conducting open coding. These are: (1) ask the data a specific and consistent set of questions, (2) analyze the data minutely, (3) frequently interrupt the coding to write a theoretical note, and (4) never assume the analytic relevance of any traditional variable, such as age, sex, or social class, until the data show it to be relevant. A detailed discussion of each of these guidelines follows.

1. Ask the data a specific and consistent set of questions. The most general question researchers must keep in mind is, What study are these data pertinent to? In other words, what was the original objective of the research study? This is not to suggest that the data must be molded to that study. Rather, the original purpose of a study may not be accomplished and an alternative or unanticipated goal may be identified in the data. For example, in Pearson's (1987) evaluation of a New Jersey intensive probation supervision program, the original aim was to demonstrate cost effectiveness. Although objective indicators failed to support the cost effectiveness of the experimental program, several indirect indicators suggested that the program nonetheless was fairly successful. These other measures involved repeated reports from relatives of probationers about changes in attitudes demonstrated by the program participants.
For instance, the wife of one participant reported that her husband had begun to send child-support payments in full and on time. Parents of another program participant reported that their child had begun to show personal responsibility by doing household chores around the home—something the individual had previously never undertaken. Thus, Pearson (1987) points to an unanticipated benefit from the program. This illustration demonstrates the need both to keep the original study aim in mind and to remain open to multiple or unanticipated results that emerge from the data.

2. Analyze the data minutely. Strauss (1987) cautions that researchers should remember that they are conducting an initial coding procedure. As such, it is important to analyze data minutely. Students in qualitative research should remind themselves that in the beginning, more is better. Coding is much like the traditional funnel used by many educators to demonstrate how to write papers. You begin with a wide opening, a broad statement; narrow the statement throughout the body by offering substantial backing; and finally, at the small end of the funnel, present a refined, tightly stated conclusion. In the case of coding, the wide end represents inclusion of many categories, incidents, interactions, and the like. These are coded minutely during open coding. Later, this effort ensures extensive theoretical coverage that will be thoroughly grounded. At a later time, more systematic coding can be accomplished, building from the numerous elements that emerge during this phase of open coding.

The question that arises, of course, is when to stop this open coding process and move on to the speedier, more systematic coding phase. Typically, as researchers minutely code, they eventually saturate the document with repetitious codes. As this occurs and the repetition allows the researchers to move more rapidly through the documents, it is usually safe to conclude that the time has come to move on.

3. Frequently interrupt the coding to write a theoretical note. This third guideline suggested by Strauss (1987) directs researchers closer to grounded theory. Often, in the course of coding, a comment in the document triggers ideas. Researchers should take a moment to jot down a note about these ideas, which may well prove useful later. If they fail to do so, they are very likely to forget the idea. In many instances, researchers find it useful to keep a record of where in each document similar comments, concepts, or categories seem to convey the same elements that originally triggered the theory or hypothesis. For example, during the coding process of a study on adolescents' involvement with alcohol, crime, and drugs, interview transcripts revealed youths speaking about drugs and criminal activities as if they were almost partitioned categories (Carpenter et al., 1988). Notes scribbled during coding later led to theories on drug-crime event sequences and the nexus of drug-crime events.

4. Never assume the analytic relevance of any traditional variable such as age, sex, social class, and so on until the data show it to be relevant. As Strauss (1987, p. 32) indicates, even these more mundane variables must "earn their way into the grounded theory." The point is not to assume that such variables necessarily contribute to some condition; at the same time, it does not mean you are prohibited from intentionally using certain variables deductively. The first guideline, What are the study data pertinent to?, is germane to the coding process. Consequently, if researchers are interested in gender differences, naturally, they
Consequently, if researchers are interested in gender differences, naturally, they AN INTRODUCTION TO CONTENT ANALYSIS 2 5 3 begin by assuming that gender might be analytically relevant, but if the data fail to support this assumption, the researchers must accept this result. CODING FRAMES Content analysis is accomplished through the use of coding frames. The coding frames are used to organize the data and identify findings after open coding has been completed. The first coding frame is often a multileveled process that requires several successive sortings of all cases under examination. Investigators begin with a general sorting of cases into some specified special class. In many ways, this first frame is similar to what Strauss (1987, p. 32) describes as axial coding. According to Strauss (1987) axial coding occurs after open coding is completed and consists of intensive coding around one category. The first sorting approximates Strauss's description of axial coding. An example may better illustrate this process. I (Berg, 1983) began my first sorting by separating all cases into Jewish affiliational categories declared by respondents during an initial telephone contact. Subjects' responses came after being asked in a screening question: "With which of the following do you most closely associate yourself: Reform, Orthodox, Conservative, or Nonpractiring?" (Subjects were consistently asked this question using the preceding affiliational ordering in an attempt to guard against certain acquiescent response sets.) This procedure separated my sample (cases) into four groupings bearing the conventional affiliational titles listed above. After completing this sorting, I carefully read the responses to the identical question asked in the course of each respondent's in-depth interview. Subsequently, each affiliational grouping was subdivided into three groups using the following criteria of selection: 1. The first subdivision in each category consisted of all cases in which respondents' answers to the interview version of the question, "With which of the following ..." (1) were consistent with the response given during the telephone screening and (2) were offered with no qualification or exception. 2. The second subdivision in each category consisted of cases in which respondents qualified their responses with a simple modifier (usually a single adjective), but were otherwise consistent with the response offered on the telephone screening question (for example, "I am a modern Orthodox Jew."). 3. The third subdivision consisted of all cases in which the respondents offered detailed explanations for their affiliational declarations that were also consistent with their telephone screening response. For example, one male respondent explained that just as his father had switched 254 CHAPTER ELEVEN from being an Orthodox to a Conservative affiliate, so too did he make a switch from being a Conservative to a Reform affiliate. His declaration of Reform, however, was consistent with what he had originally declared during the telephone screening. 4. The fourth subdivision consisted of all cases in which the respondents contradicted their original telephone screening question response or indicated that they simply could not determine where they fit in terms of the four conventional affiliational categories. Using the above criteria, I sorted my cases into the indicated subdivisions. 
Following this, and using a sorting process similar to the preceding one, I again subdivided each newly created subgroup to produce a typological scheme containing 16 distinct categories, the overarching or key linkage in every case being the subjective declaration of each respondent (at two distinct iterations of the same question). Having sorted and organized my data, I was ready to interpret the patterns apparent from both the organizational scheme and the details offered in response to interview questions. At this juncture in my analysis, relevant theoretical perspectives were introduced in order to tie the analysis both to established theory and to my own emerging grounded theory (Glaser & Strauss, 1967). These theoretical considerations and sociological constructs led me to analyze several other detailed responses to interview questions. These other questions concerned respondents' involvement in and knowledge of religious symbols and ceremonies. In order to preserve the key linkage throughout the entire analysis process, each subsequent analysis of responses was performed against the newly created typological scheme of subjective identification labels (the 16-category scheme mentioned previously).

Another example of this axial coding or sorting process is offered by Bing (1987), who examined plea bargaining by using an archival strategy. He created a master list containing over 400 articles that examined plea bargaining as represented in 12 major social science journals during the past 5 years (Bing, 1987, pp. 50ff). Following the creation of his master list, Bing sorted his articles, first by manifest theoretical orientation and second by methodological approach. After eliminating categories that contained only a single article and collapsing fundamentally similar theoretical orientations, Bing identified 12 distinct theoretical categories (for example, labeling theory, organizational theory, crime control/due process theory, economic theory, dramaturgical theory, and so forth). The second coding resulted in six distinct methodological approaches, which Bing used to subdivide each of the theoretical categories. Bing established objective criteria for each of the possible theoretical categories. As Bing (1987, pp. 72-73) explains his criteria: "The general focus of the article was used to determine the theoretical orientation [of each article]. In some instances, the author would clearly state the theory; on other occasions, the theoretical approach was lifted based upon statements [offered by the author and] used to characterize the study." In essence, Bing sought to identify theoretical approaches by examining the expression by each author (either directly or indirectly) of a theoretical declaration.

Strauss similarly outlines the coding process. According to Strauss (1987, p. 28), the analyst begins with a procedure he calls open coding. This procedure is described as an unrestricted coding of the data. With open coding, you carefully and minutely read the document line by line and word by word to determine the concepts and categories that fit the data. These concepts, once uncovered, are entirely tentative. As you continue working with and thinking about the data, questions and even some plausible answers also begin to emerge. These questions and answers should lead you to other issues and further questions concerning "conditions, strategies, interactions and consequences" (Strauss, 1987, p. 28).
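Although the judgments involved in open coding are interpretive, the record keeping that supports them (tentative codes attached line by line, plus the theoretical notes Strauss urges) can be represented quite simply. Below is a minimal sketch with hypothetical transcript text, codes, and memo.

```python
from dataclasses import dataclass, field

@dataclass
class CodedLine:
    document: str
    line_no: int
    text: str
    codes: list = field(default_factory=list)   # tentative, revisable
    memos: list = field(default_factory=list)   # theoretical notes

record = CodedLine(
    document="interview_07",
    line_no=42,
    text="We only did that stuff when we weren't working.",
)

# Line-by-line reading assigns tentative concepts...
record.codes += ["partitioning", "context_of_use"]

# ...and coding is interrupted to write a theoretical note.
record.memos.append(
    "Activities described as partitioned spheres; compare across "
    "other transcripts before treating this as a category."
)

print(record.codes)
print(record.memos[0])
```

Keeping codes and memos tied to specific document lines is what later permits retrieval of every excerpt bearing a given code.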
A Few More Words on Analytic Induction

As Robinson (1951) suggests, "Since Znaniecki stated it in 1934, the method of analytic induction has come into important use." The use of analytic induction, however, also has involved a number of refinements—including several variations on its style and purpose. For example, Sutherland and Cressey (1966) refined the method and suggest that it be used in the study of causes of crime. Even before Sutherland (1950), Lindesmith (1947) had discovered the usefulness of an analytic inductive strategy in a study of opiate users. Lindesmith (1952, p. 492) describes analytic induction as follows:

The principle which governs the selection of cases to test a theory is that the chances of discovering a decisive negative case should be maximized. The investigator who has a working hypothesis concerning the data becomes aware of certain areas of critical importance. If his theory is false or inadequate, he knows that its weakness will be more clearly and quickly exposed if he proceeds to the investigation of those critical areas. This involves going out of one's way to look for negative evidence.

Adding further refinements to the method, Glaser and Strauss suggest that analytic induction should combine analysis of data after the coding process with analysis of data while integrating theory. In short, analysis of data is grounded to established theory and is also capable of developing theory. Glaser and Strauss (1967, p. 102) describe their refinements as follows:

We wish to suggest a third approach to the analysis of qualitative data—one that combines, by an analytic procedure of constant comparison, the explicit coding procedures of the first approach [analysis of data after coding] and the style of theory development of the second [the integration of data and theory]. The purpose of the constant comparative method of joint coding and analysis is to generate theory more systematically than allowed by the second approach, by using explicit coding and analytic procedures. While more systematic than the second approach, this method does not adhere completely to the first, which hinders the development of theory because it is designed for provisional testing and not discovering hypotheses.

Glaser and Strauss (1967) suggest that such a joint coding and analysis of data is a more honest way to present findings and analysis. Similarly, Merton (1968, pp. 147-148) discusses the "logical fallacy underlying post factum explanations" and hypothesis testing. Merton states:

It is often the case in empirical social research that data are collected and only then subjected to interpretative comment. . . . Such post factum explanations designed to "explain" observations, differ in logical function from speciously similar procedures where the observational materials are utilized in order to derive fresh hypotheses to be confirmed by new observations. A disarming characteristic of the procedure is that the explanations are indeed consistent with the given set of observations. This is scarcely surprising, in as much as only those post factum hypotheses are selected which do accord with these observations. . . . The method of post factum explanation does not lend itself to nullifiability.

Researchers should make extensive use of Glaser and Strauss's (1967) style of analytic induction and, perhaps more directly, of Strauss's (1987) rearticulation of their position.
According to Strauss (1987, p. 12):

Because of our earlier writing in Discovery (1967) where we attacked speculative theory—quite ungrounded in bodies of data—many people mistakenly refer to grounded theory as "inductive theory" in order to contrast it with, say, the theories of Parsons or Blau. But as we have indicated, all three aspects of inquiry (induction, deduction, and verification) are absolutely essential. ... In fact, it is important to understand that various kinds of experience are central to all these modes of activity—induction, deduction, and verification—that enter into inquiry.

Throughout the analysis, researchers should incorporate all appropriate modes of inquiry. Thus, both logically derived hypotheses and those that have "serendipitously" (Merton, 1968) arisen from the data may find their way into the research.

Interrogative Hypothesis Testing

In order to verify and assess the applicability of a given hypothesis, researchers should use a style of negative case testing suggested by Robinson (1951), Lindesmith (1952), Manheim and Simon (1977), and Denzin (1978). This process of negative case testing essentially involves the following steps:

1. Make a rough hypothesis based on an observation from the data.

2. Conduct a thorough search of all cases to locate negative cases (that is, cases that do not fit the hypothesized relationship).

3. If a negative case is located, either reformulate the hypothesis to account for the negative case, discard the hypothesis, or exclude the negative case.

4. Examine all relevant cases from the sample before determining whether "practical certainty" (Denzin, 1978) in this recommended analysis style is attained.

For example, based on a reading of responses to the open-ended question, "With which of the following do you most closely associate yourself: Conservative, Orthodox, Reform, or Nonpracticing?" I (Berg, 1983) hypothesized that certain groups of persons offered instrumentally oriented answers (that is, oriented to achievement and goals) while other groups offered expressively oriented answers (that is, sentimental, feeling oriented, and symbolic). I further hypothesized that these styles of responses could be linked to particular categories relevant to the analysis of differential involvement with religious activities and subjective affiliational identification. However, after carefully reexamining each case with these hypotheses in mind, I found many negative cases. At each negative juncture, I attempted to reformulate the hypotheses to account for the cases that did not fit. None of the successive formulations was constructed de novo; each was based on some aspect of the preceding hypothetical relationship (see Denzin, 1978, pp. 193-194; Lindesmith, 1947, pp. 9-10). Unfortunately, I soon realized that my hypotheses had become artificial and meaningless, and consequently I abandoned them.

It may be argued that the search for negative cases sometimes neglects contradictory evidence (that is, when a case both affirms and in some way denies a hypothetical relationship) or distorts the original hypothetical relationship (that is, when the observers read into the data whatever relationship they have hypothesized—a variation on post factum hypothesizing). To accomplish content analysis in the style recommended here, researchers must use several safeguards against these potential flaws in analysis. First, whenever numbers of cases allow, examples that illustrate a point should be lifted at random from among the relevant grouped cases.
Second, every assertion made in the analysis should be documented with no fewer than three examples. Third, analytic interpretations should be examined carefully by an independent reader (someone other than the actual researchers) to ensure that their claims and assertions are not derived from a misreading of the data and that they have been documented adequately. Finally, whenever inconsistencies in patterns do emerge, these too should be discussed in order to explain whether they invalidate overall patterns. Failure to mention these inconsistencies in pattern is a less than forthright presentation of the data and analysis.

In effect, the use of the above safeguards avoids what Glaser and Strauss (1967, p. 5) describe as exampling. According to Glaser and Strauss, exampling is finding examples for "dreamed-up, speculative, or logically deduced theory after the idea occurred," rather than allowing the patterns to emerge from the data. For instance, in the course of analyzing responses to the question, "How do you celebrate Chanukkah, if at all?" during an early analysis of the data, I (Berg, 1983) suggested that gift giving was emphasized to a greater extent by some affiliational groups than by others. However, when this section was read by an independent reader, the reader noticed that several negative cases had been presented in evidence of this assertion. What I had originally missed was that the more traditional affiliational group members had described their style of gift giving in the midst of a number of traditional (religious) rituals. On the other hand, many of the nonpracticing affiliational group members had described gift giving as being in competition with an observance of Christmas and thus actually fused their observance of Chanukkah with an observance of Christmas.

STRENGTHS AND WEAKNESSES OF THE CONTENT ANALYSIS PROCESS

Perhaps the most important advantage of content analysis is that it can be virtually unobtrusive (Webb et al., 1981). Content analysis, although useful when analyzing depth interview data, may also be used nonreactively: no one needs to be interviewed, no one needs to fill out lengthy questionnaires, no one must enter a laboratory. Rather, newspaper accounts, public addresses, libraries, archives, and similar sources allow researchers to conduct analytic studies.

An additional advantage is that it is cost effective. Generally, the materials necessary for conducting content analysis are easily and inexpensively accessible. One college student working alone can effectively undertake a content analysis, whereas undertaking a national survey, for instance, might require enormous staff, time, and expense.

A further advantage to content analysis is that it provides a means by which to study processes that occur over long periods of time or that may reflect trends in a society (Babbie, 1998). As examples, you might study the portrayal of women in the media from 1800 to 1993, or you might focus on changing images of women in the media from 1982 to 1992. For instance, McBroom (1992) recently examined women in the clergy as depicted in the Christian Century between 1984 and 1987. McBroom (1992, p. 208) reports:

1984 was a year when the issue of women's ordination gained support in the news media, as indicated by the number of positive references, especially articles, during the year.... The next year, 1985, was a year of transition, as few references were recorded for that year.
The data for the years 1986 and 1987 indicate a growing negative response to the issue of the ordination of women, especially in the negative news reports. The data . . . indicate that, in general, conditions and opportunities for women in the clergy in the United States deteriorated rather than improved during these years.

Thus, using content analysis, McBroom (1992) was able to examine data during individual years as well as over the span of all years under study.

The single serious weakness of content analysis may be in locating unobtrusive messages relevant to the particular research questions. In other words, content analysis is limited to examining already recorded messages. Although these messages may be oral, written, graphic, or videotaped, they must be recorded in some manner in order to be analyzed. Of course, when you undertake content analysis as an analysis tool rather than as a complete research strategy, such a weakness is minimal. For example, if researchers use content analysis to analyze interview data or responses to open-ended questions (on written questionnaires), this weakness is virtually nonexistent.

Another limitation (although some might call it a weakness) of content analysis is that it is ineffective for testing causal relationships between variables. Researchers and their audiences must resist the temptation to infer such relationships. This is particularly true when researchers forthrightly present the proportion or frequency with which a theme or pattern is observed. This kind of information is appropriate for indicating the magnitude of certain responses; it is not appropriate for attaching cause to these presentations.

As with any analytic method, the advantages of content analysis must be weighed against the disadvantages and against alternative research strategies. Although content analysis may be appropriate for some research problems and designs, it is not appropriate in every research situation. It is a particularly beneficial procedure for assessing events or processes in social groups when public records exist. It is likewise helpful in many types of exploratory or descriptive studies. But if you are interested in conducting experimental or causal research, content analysis is virtually useless.

COMPUTERS AND QUALITATIVE ANALYSIS

It is now 35 years since General Inquirer, the first software program designed to assist in the analysis of textual data, became public (Stone et al., 1966; Tesch, 1991). Of course, when General Inquirer came out, small, affordable personal computers did not exist. To use General Inquirer, one needed access to a large mainframe computer and sufficient time to read and digest its book-length instructions. The program operated largely on the basis of counting and numerous calculations; yet it worked exclusively with textual data (Tesch, 1991).
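The style of analysis such early programs supported is easy to reproduce today. Below is a minimal sketch of dictionary-based counting in Python: word frequencies plus a conceptual cluster of the kind described earlier in this chapter. The sample text and the cluster's word list are invented for illustration.

```python
import re
from collections import Counter

text = """The report described the fraud in detail. Delinquency
rates rose, and petty theft was common; fraud charges followed."""

# Words as the unit of analysis: a frequency distribution.
words = re.findall(r"[a-z']+", text.lower())
frequencies = Counter(words)

# Concepts as the unit of analysis: a cluster of words counted together.
deviance_cluster = {"crime", "delinquency", "fraud", "theft"}
deviance_count = sum(frequencies[w] for w in deviance_cluster)

print(frequencies.most_common(3))
print("deviance cluster mentions:", deviance_count)  # 4
```

The earlier caution about magnitudes applies here as well: such counts index and organize the text, but they are not findings in themselves.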