Draft section (December 19, 1994) of an article with Greg Feist intended for Psychological Bulletin. Comments welcome: please do not quote without permission.
Michael E. Gorman, University of Virginia, meg3c@virginia.edu
There is neither space nor time in this section to review the history of attempts by psychologists to understand scientific thinking. (See Campbell, 1989 for a good overview.) Rather, we will look at the current state of the literature in this area, making occasional references to classic studies. This literature has grown tremendously in the last decade and, like much of cognitive psychology, it overlaps with cognitive science. It also has strong connections to philosophy of science.
Basically, this literature can be organized in two ways:
1. Conceptually, by using the classic distinction between discovery and justification favored by some philosophers of science (Reichenbach, 1938). Recent cognitive psychological research on science can be said to have begun with the study of justification, or hypothesis-testing, and moved to looking more at discovery, or where scientific ideas come from.
2. Methodologically, based on whether experimental, computational, field, or some combination of these techniques was used. These two organizational categories interact, as we will see below.
Popper's idea that science progresses by falsification inspired three decades of experimental research primarily directed at the context of justification. Wason (1960) developed the "2-4-6 task," in which a subject was given the number triple '2,4,6' and told to discover a rule that determined which triples were correct or incorrect. Subjects conducted experiments by proposing triples; the experimenter told them whether each triple fit the rule, and when they announced a guess about the rule, whether it was correct or not. Wason's goal was to determine whether subjects would falsify their hypotheses about the rule. Subjects typically started with hypotheses like 'even numbers in order' and failed to propose triples that ought to be wrong if their hypotheses were right. (The actual rule was 'numbers ascend in order of magnitude.') Wason attributed this to a 'verification bias.'
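For readers who want the structure of the task in concrete form, it can be sketched in a few lines of code (a minimal sketch; the function names are ours, and the 'evens increasing by two' hypothesis is just one typical starting point):

```python
def target_rule(triple):
    """Wason's actual rule: the numbers ascend in order of magnitude."""
    a, b, c = triple
    return a < b < c

def hypothesis(triple):
    """A typical initial guess: even numbers increasing by two."""
    a, b, c = triple
    return a % 2 == 0 and b == a + 2 and c == b + 2

# Positive tests (triples the hypothesis predicts are correct) can only
# confirm here, because the hypothesis is embedded in the broader rule.
for t in [(2, 4, 6), (8, 10, 12), (20, 22, 24)]:
    assert hypothesis(t) and target_rule(t)

# A negative test (a triple the hypothesis predicts is wrong) can falsify it:
assert not hypothesis((1, 2, 3)) and target_rule((1, 2, 3))
```

The sketch makes Wason's finding concrete: subjects who propose only triples of the first kind can never learn that their hypothesis is too narrow.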
Wason was not particularly interested in simulating scientific reasoning, but a group at Bowling Green took his task in that direction. (See Tweney et al., 1980.) They developed another, more sophisticated simulation in which subjects fired particles at shapes on a computer screen in an effort to discover the laws governing the particles' interaction with these shapes. (See Mynatt et al., 1977, 1978.) To both of these tasks, the 'Bowling Green group' added instructions to disconfirm, in an effort to combat the apparent bias that Wason had discovered. These instructions failed to improve performance.
In contrast, Gorman and Gorman (1984) found that instructions to falsify significantly improved performance on Wason's task when subjects could not ask the experimenter whether any of their guesses about the rule were correct. But if the rule was made even more general than Wason's 'numbers must ascend in order of magnitude'--Gorman, Stafford, and Gorman (1987) used 'the three numbers must be different'--disconfirmatory instructions did not have a positive effect.
This was especially surprising in the light of an analysis by Klayman & Ha (1987), who showed that when the subject's hypothesis is narrower than the target rule, a 'negative test heuristic' is most likely to produce disconfirmation; however, if the problem space described by the hypothesis is broader than the target rule, a negative test heuristic will produce only confirmations. On Wason's task, negative tests are attempts to get triples wrong and positive tests are attempts to get them right. Klayman & Ha emphasized that it is important to distinguish positive and negative test strategies from confirmation and disconfirmation. Gorman's confirmatory instructions emphasized trying to get triples right and his disconfirmatory instructions emphasized trying to get triples wrong. Klayman & Ha would argue that these instructions succeeded on Wason's 'ascending in order of magnitude' rule because they encouraged a negative test heuristic in a situation where the subjects' initial hypotheses were embedded within the target rule. But the same analysis suggests that these instructions should have succeeded on an even more general rule, 'the three numbers must be different.'
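Klayman & Ha's point can be checked by brute force over a small universe of triples (a sketch, with illustrative stand-ins for the hypotheses and rules):

```python
from itertools import product

# All triples drawn from the numbers 1 through 6.
UNIVERSE = list(product(range(1, 7), repeat=3))

def negative_tests_can_disconfirm(hypothesis, rule):
    """A negative test proposes a triple the hypothesis classifies as wrong;
    it disconfirms the hypothesis when the rule classifies that triple as right."""
    return any(rule(t) and not hypothesis(t) for t in UNIVERSE)

evens_up_by_two = lambda t: t[0] % 2 == 0 and t[1] == t[0] + 2 and t[2] == t[1] + 2
ascending = lambda t: t[0] < t[1] < t[2]
all_different = lambda t: len(set(t)) == 3

# Hypothesis narrower than the rule: negative tests can yield disconfirmation.
assert negative_tests_can_disconfirm(evens_up_by_two, ascending)

# Hypothesis broader than the rule: negative tests can only confirm, because
# any triple outside the hypothesis also lies outside the rule.
assert not negative_tests_can_disconfirm(all_different, ascending)
```

The enumeration shows why the logical relationship between the hypothesis and the rule, not the test strategy by itself, determines whether disconfirmation is even possible.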
Gorman's subjects were clearly trying to obtain negative evidence, but they did not know where to find it. Following Tweney et al. (1980), Gorman changed the task from a search for a single rule which would determine which triples were right and wrong to a search for two rules arbitrarily labeled DAX and MED: the DAX rule was "the three numbers must be different" and the MED rule was "two or more numbers must be the same." As in Tweney et al.'s earlier study, this manipulation resulted in greatly improved performance over simply giving subjects instructions to falsify.
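The structural change introduced by the DAX-MED manipulation is easy to see in code (a minimal sketch using the two rules just described):

```python
def dax(triple):
    """DAX rule: the three numbers must be different."""
    return len(set(triple)) == 3

def med(triple):
    """MED rule: two or more numbers must be the same."""
    return not dax(triple)

# The two rules partition the space of triples, so every experiment is
# informative about both at once: evidence for MED is automatically
# evidence against DAX, and 'negative' evidence comes for free.
for t in [(2, 4, 6), (5, 5, 9), (3, 3, 3)]:
    assert dax(t) != med(t)
```

Under this framing a subject never has to decide to seek disconfirmation; probing the boundary of one rule is simply probing the other.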
Gorman (1992) concluded that falsification depended at least in part on what Johnson-Laird (1983) has called a 'mental model' of the task. Subjects who imagined they were searching for two complementary rules automatically searched the boundaries of each; subjects who thought they were searching for a single rule often ended up finding no negative evidence and proposing that the rule must be 'any number.' (For a recent series of experiments that supports this analysis, see Wharton, Cheng, & Wickens, 1993.) These results suggested that the critical relationship in Klayman & Ha's analysis was between the subject's hypothesis and her representation of the target rule.
Farris & Revlin (1989; 1989a) argued that many subjects who appear to be following a disconfirmatory strategy are actually searching for positive instances of a counterfactual hypothesis. For example, a subject who thinks the rule is 'even numbers' may propose 'odd numbers' as a counterfactual hypothesis, then test it with a triple like '3,5,7,' which is a negative test with respect to 'even numbers' but confirmatory with respect to the counterfactual hypothesis 'odd numbers.' Farris and Revlin claim their results show that a counterfactual strategy is superior to a disconfirmatory one. Gorman (1991) pointed out that there was a crucial methodological difference between their work and his: in Farris and Revlin's studies, subjects were given feedback on their rule announcements, and in Gorman's, they were not. (For a review and analysis of this controversy, see Oaksford & Chater, 1994.) In any case, a counterfactual strategy may be a successful way of converting the standard version of the 2-4-6 task to a DAX-MED problem, because a counterfactual hypothesis is roughly equivalent to a hypothesis about the MED rule, and successful DAX-MED subjects pursue positive instances of the MED rule.
Kevin Dunbar (1989) created a computerized molecular genetics laboratory in which subjects were posed a problem similar to the one for which Monod and Jacob won the 1965 Nobel Prize. Dunbar did not intend to have subjects simulate the actual discovery path followed by Monod and Jacob; instead, he wanted "to use a task that involves some real scientific concepts and experimentation to address the cognitive components of the scientific discovery process" (Dunbar, 1989, p. 427).
Subjects were given elementary training in concepts of molecular genetics, using an interactive environment on a Macintosh computer. Then they were allowed to perform experiments with three controller and three enzyme-producing genes; they could vary the amount of nutrient, remove genes, and measure the enzyme output. The mechanism the subjects had to discover was inhibition, whereas the mechanism they had learned in training was activation.
Note that despite Dunbar's protestations to the contrary--"rather than inventing an arbitrary task that embodies certain aspects of science it is possible to give subjects a real scientific task to work with" (p. 427)--this task bears more resemblance to other artificial universes than it does to actual scientific problems. Subjects are given instructions which explain their little universe; these instructions, like the starting triple 2-4-6, bias them towards a hypothesis that is different from the one they are trying to find, and they are able to do a wide variety of mini-experiments to discover the rule--which, although it represents an actual scientific relationship, is as arbitrary to them as the numerical formulas discovered by subjects in the 2-4-6 task. There are none of the potential sources of error that occur in actual genetics experiments and no new techniques to be mastered.
All of which is to say that this simulation is a valuable complement to the reasoning experiments discussed earlier. Indeed, Dunbar relates his findings to the literature on disconfirmation. In this task, all subjects eventually disconfirmed their initial hypotheses about the role of the activator gene--no matter what genes were present or absent, there was always an output. What is interesting is what they did next: six groups re-interpreted activation to mean a search for the gene that facilitated enzyme production, seven searched diligently for an activator gene and eventually gave up, and seven set the goal of explaining their surprising results. Five of the seven groups in this last category actually found the inhibitor gene. Note that Dunbar's results support the thesis that successful disconfirmation depends on how subjects or scientists represent the task.
Scientists are acutely aware of the possibility of error when they design and evaluate experiments, yet most of the simulations of scientific reasoning--whether experimental or computational--do not incorporate this important aspect of science. Apparent anomalies and falsifications are often the results of errors. To guard against this possibility, the 'methodological falsificationist' (Lakatos, 1978) advocates systematic replication. For example, Einstein's theory of special relativity was apparently falsified by the eminent physicist Kaufmann; Einstein himself remained undisturbed, and called for replication. Kaufmann's result was later found to be an error. (See Gorman, 1992.)
Gorman conducted a series of studies to determine whether and under what circumstances the methodological falsificationist's advice worked. (See Gorman, 1989c.) To understand how error can be added to one of the problems that simulate scientific reasoning, let us once again use the 2-4-6 task as an example. In the usual version, every result is 100% reliable and unambiguous. In a possible-error version, Gorman told subjects that anywhere from 0 to 20% of their results might be erroneous, i.e., a triple that was classified as incorrect might be correct and vice versa. Error would occur at random, as determined by a random number generator on a calculator.
Initially, the error rate in Gorman's study was set at 0; subjects were told that error was possible, but encountered no actual errors. Subjects used 'replication plus extension' to eliminate the possibility of error: they proposed triples that were similar to, but not exactly the same as, previous triples in an effort to replicate the current pattern and extend it slightly, e.g., following '2,4,6' with '4,6,8.' This looks much like the positive test heuristic recommended by Klayman & Ha; the difference is that now the strategy is used to eliminate error as well as to corroborate one's hypothesis.
When replication was made much more difficult and costly, replication plus extension interfered with disconfirmation, causing subjects to adopt rules like 'numbers go up by ones' that were subsets of the actual rule (Gorman, 1989c). Note that this is one of the dangers of excessive reliance on the positive-test heuristic.
Changing the rate of error from zero to 20% greatly interfered with subjects' ability to solve even the simplest problems; only 13% of subjects in a 20% error condition solved Wason's classic 'ascending in order of magnitude' rule, as opposed to 79% in a possible-error condition (Gorman, 1989c). These 20% error subjects made repeated attempts to replicate; this finding illustrates that, even on very simple artificial tasks, replication alone is not sufficient to isolate and eliminate errors. Obviously, scientists rely on other kinds of checks in addition to replication, e.g., refinement of procedures. Future experiments should simulate such procedures to investigate their relationship to replication.
Like many of the research traditions in cognitive psychology of science, studies of hypothesis-testing have become focused increasingly on issues raised by the research and not on applications to science. But the findings clearly have implications for falsification: both positive and negative test heuristics can falsify theories; which of these heuristics is most useful depends on how a scientist represents the relationship between the rule she is seeking and her current hypothesis, and also on the amount of error in the data. As we will see under "Naturalistic Studies" below, there have been some recent attempts to apply these laboratory findings in more ecologically valid settings.
Whereas the hypothesis-testing literature drew its original inspiration from Popper, the literature on conceptual change is at least partly inspired by Kuhn (1962). For example, Chi (1992) used a Kuhnian framework to review the literature on conceptual changes in children and adults. She argues that radical conceptual change often occurs before anomaly recognition, whereas most of the hypothesis-testing literature tends to take anomaly recognition for granted--except under error conditions, it is clear when a triple is at variance with a hypothesis. Her own analysis suggests that recognition and resolution of anomalies requires a shift to a new system of categories similar to the kind of paradigm shift made famous by Kuhn.
Similarly, Carey (1992) compared the problems children ages 3 to 5 have in differentiating weight and density with the problems scientists before Black had in differentiating heat and temperature: in both cases, the view before differentiation seems to belong to a different, incommensurable paradigm from the view afterward. Neither Carey nor Chi can completely account for the circumstances that lead to these shifts; both suggest that the science classroom is the place to look for answers, since there one can try a variety of interventions to produce such changes.
Brewer & Samarapungavan (1991) reviewed the literature on whether children construct theories in a manner similar to scientists; these authors concluded "that the child can be thought of as a novice scientist, who adopts a rational approach to dealing with the physical world, but lacks the knowledge of the physical world and experimental methodology accumulated by the institution of science" (p. 210). Their point is that the apparent differences in thinking between children and adults are really due to differences in knowledge, not reasoning strategies. For example, they studied second-graders and showed that those who had a flat-earth mental model could incorporate disconfirmatory information consistent with a Copernican view by transforming their model into a hollow sphere. They then used this new mental model to solve a range of problems about the day/night cycle and the motion of individuals and objects across the earth's surface. (See Vosniadou & Brewer, in press.)
Brewer & Chinn (1991) have also explored the scientific beliefs of adults by giving them brief readings on quantum theory or special relativity and asking them a series of follow-up questions. Both quantum theory and relativity make predictions that conflict with common-sense beliefs about space and time and cause and effect. Some subjects simply rejected the new information, resembling those scientists who cling to the old paradigm. Other subjects showed at least partial assimilation of the new material: they were able to give an answer that corresponded to what they had read, but they "sure didn't believe it" (p. 70). Another move was to interpret the answer in terms of existing beliefs, for example, by treating relativistic phenomena as optical illusions.
Other researchers have related the beliefs of adults to periods in the history of science. McCloskey (1983) found that college students held beliefs about physics that resembled those of Philoponus (6th century) and Buridan (14th century), who thought that a force was required to set a body in motion, and that the force gradually dissipated. Clement (1983) found that freshman engineering students were a little more advanced: protocols of their attempts to solve motion problems resembled Galileo's reasoning in De Motu. These beliefs persist even among advanced physics and engineering students (Clement, 1982). Pittenger (1991) calls for more research on how they could be changed, suggesting that the literature has focused too much on textbook science problems and that we need to know more about how students view a range of naturalistic phenomena.
Another way to study conceptual change is to try to describe the cognitive processes that make experts better problem-solvers than novices in a particular domain. Larkin et al. (1980) argued that experts have assembled and compiled a set of condition-action pairs known as productions. When an expert physicist encountered a familiar problem, the initial information typically triggered a set of productions which rapidly produced the correct equations--the expert had automated much of the problem-solving process and worked forward from the information given. Novices, on the other hand, had to struggle backwards from the unknown solution, trying to find the right equations and quantities; they therefore took much longer even when they were able to find the correct result.
Expert physicists also represent problems very differently from novices; the latter try to use weak or general heuristics like 'working backward' to explore solutions, whereas the former try to re-represent a problem as an instance of a familiar class of problems that can be solved using appropriate equations. Interestingly, novices tended to try to apply equations early, whereas experts reason qualitatively until they arrive at a representation that suggests which set of equations to use (Larkin, 1983). Similarly, Chi, Feltovich, and Glaser (1981) found "that experts tended to categorize problems into types that are defined by the major physics principles that will be used in solution, whereas novices tend to categorize them into types as defined by the entities contained in the problem statement" (p. 150).
Clement (1991) compared the way technical experts and novices solved problems like determining what happens when the width of the coils on a spring is doubled and the suspended weight is held constant. Experts used informal, qualitative reasoning processes to solve these sorts of problems; for example, they often constructed an analogous simpler case, e.g., imagining what happens if the coils are replaced by a U-shaped spring of the same length. They then related the analogy back to the original case. They also used strategies like counterfactual reasoning, but only when pressed to justify their solutions more thoroughly.
Gentner & Gentner (1983) provided further evidence for the importance of analogical reasoning in expert problem-solving. They instructed naive subjects to use either a 'flowing waters' or a 'moving crowd' analogy for electric current; those with the latter training were better at problems involving parallel and series resistors than those with the former. Although this study did not involve experts, it suggests that analogical reasoning can play an important role in converting novices into experts. Maxwell, for example, made an important analogy between continuum dynamics and Faraday's electromagnetic fields (Nersessian & Greeno, 1990); this analogical reasoning involved manipulating abstract models in a way reminiscent of experts working on physics problems. (For a good overview of the role of analogies in scientific thinking, see Thagard's newest book--I'm still waiting for a reference.)
Subjects in these expert-novice comparisons typically work on textbook-style word problems, not hands-on laboratory problems. Indeed, Dee-Lucas and Larkin (1988) compared how novices and experts rated the importance of textbook presentations of work, energy, and fluid statics; novices used category information, like whether the text included definitions, whereas expert physicists relied almost entirely on content. Therefore, findings from the expert-novice literature are especially relevant to educational situations (Reif & Larkin, 1991) but may have less relevance to scientific practice. Green and Gilhooly (1992) argue that "the standard expert-novice contrastive paradigm by requiring use of problems accessible to novices has led to a relative neglect of how experts tackle difficult problems and how experts detect and recover from errors in the face of task difficulty" (p. 67).
Furthermore, Klahr, Fay and Dunbar (1993) point out that many of the studies cited above do not give children or adults the opportunity to design new experiments and formulate and evaluate hypotheses, whereas experiments with simulations like the 2-4-6 task do. To study conceptual change in novices and experts, one needs to look at how they search for new information. Green and Gilhooly (1992) found that good learners adopted an exploratory approach to learning how to use a statistical package on a computer and made good use of negative as well as positive feedback. In a study using a task that permitted children and adults to generate experiments and hypotheses, Klahr, Fay and Dunbar (1993) found that superior adult performance "appears to come from a set of domain-general skills that go beyond the logic of confirmation and disconfirmation and deal with the coordination of search in two spaces" (p. 141).
Perhaps the best work on search spaces in scientific reasoning has been conducted by Klahr, Dunbar, Shrager and others at Carnegie-Mellon (cf. Klahr & Dunbar, 1988; Klahr, Dunbar & Fay, 1990). They asked subjects to learn how a device called a 'Big Trak' functions by conducting experiments. Subjects had to generate and test hypotheses using strategies like confirmation and disconfirmation. The most successful subjects reacted to falsificatory evidence by developing new hypotheses that represented a 'shift in frame' which in turn suggested new areas of the problem space to search for evidence.
For example, in Klahr & Dunbar's study, subjects had to discover the function of a 'RPT' key. Most began with the idea that an instruction like RPT 4 meant 'repeat whatever program had been typed in four times' or 'repeat the last step in the program four times.' Typically, they began with confirmatory results and quickly obtained disconfirmatory information. In order to discover the rule, subjects had to change their representation of the role of the repeat key: it selected the step to be repeated, so that 'RPT 4' meant 'repeat step 4.' Subjects had to realize that the RPT key might serve as a selector, indicating which line was to be repeated, instead of a counter, indicating the number of times something was to be repeated. The shift from a counter to a selector frame directed subjects to a different part of the problem space to search for confirmations and disconfirmations. Klahr & Dunbar referred to this as switching between hypothesis and experimental spaces: a change in the type of hypothesis one is pursuing directs one to look for new kinds of evidence.
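The two frames can be written as competing interpreters for a toy program (a schematic sketch; the command names are invented and the semantics simplified from the actual device):

```python
def rpt_counter(program, n):
    """Counter frame: RPT n repeats the last step n more times."""
    return program + [program[-1]] * n

def rpt_selector(program, n):
    """Selector frame: RPT n repeats step n (counting from 1) once more."""
    return program + [program[n - 1]]

program = ['FORWARD', 'LEFT', 'FIRE']
# A well-chosen experiment discriminates between the frames, because the
# two interpretations predict different continuations for RPT 2.
assert rpt_counter(program, 2) == ['FORWARD', 'LEFT', 'FIRE', 'FIRE', 'FIRE']
assert rpt_selector(program, 2) == ['FORWARD', 'LEFT', 'FIRE', 'LEFT']
```

Until a subject entertains the selector interpretation at all, no experiment will be designed to discriminate between the two predictions; this is the sense in which a shift of frame opens a new region of the experiment space.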
This is akin to how the DAX-MED version transforms the 2-4-6 task: instead of looking for one rule with exceptions, subjects shift to hypotheses consisting of two mutually exclusive rules, and this also changes the type of experiments they conduct and how they interpret the evidence. Indeed, recent research by Vallee-Tourangeau, Austin, and Rankin (in press) suggests that the DAX-MED manipulation may be effective because it encourages subjects to consider a broader range of hypotheses. In Klahr & Dunbar's terms, this amounts to conducting a more thorough search of the hypothesis space. Similarly, the counterfactual strategy involves generating two hypotheses which cover different parts of the problem space and then--depending on the result--iterating through the generation-experimentation cycle again, coordinating hypothesis and experimental searches until the rule is found. This iteration may also lead to consideration of more hypotheses.
In a more recent study using a version of their RPT task, Klahr, Fay and Dunbar (1993) established that third graders and, to a lesser extent, sixth graders had trouble with evidence that disconfirmed counter hypotheses, in part because they could not switch to a selector hypothesis. Klahr, Fay and Dunbar interpret this as a failure to coordinate searches in hypothesis and experiment spaces, but their results also appear to support the idea that younger children had trouble processing falsifications: when children were trying to demonstrate that a counter hypothesis would work, "inconsistencies were interpreted not as disconfirmations, but rather as either errors or temporary failures to demonstrate the desired effect" (p. 140).
Another way to try to understand scientific thinking is to model it on a computer. Beginning with Langley et al. (1987), many of these simulations have emulated aspects of scientific discovery, though a few have overlapped a bit with hypothesis-testing. Some of these programs have been what Cheng (1992) calls 'psychologically plausible,' meaning that their processes resemble those of human problem-solvers; others have focused more on machine learning and are more relevant to the creation of expert systems than to the understanding of scientific thinking. (For a good review, see Shrager & Langley, 1990.) Naturally, we will restrict our coverage to psychologically plausible simulations. Cheng (1992) summarizes this wide range of systems as follows:
Almost all systems have modeled the activities of an individual scientists within a single research program, with an emphasis on inferences to generate or modify theoretical knowledge, often using empirical data that is near perfect (i.e., correct and noise free). When the experimental component is manifest, it is typically in the form of empirical data, although the multiple-process systems have begun to model experiments more fully. Systems often model quantitative or qualitative knowledge alone, although some do combine both. Finally, physics and chemistry have been the most popular modeling domains (p. 222).
Space does not permit detailed coverage of this extensive literature, but let us consider a few examples in the light of Cheng's summary. Langley, Simon, Bradshaw and Zytkow (1987) described a series of computer programs including BACON, which re-discovered Kepler's laws of planetary motion by applying a series of heuristics to columns of numerical data. Qin and Simon (1990) demonstrated that human beings confronted with similar data followed similar processes, establishing the psychological plausibility of BACON. However, in this case, neither subjects nor program knew what they were discovering. Therefore, the performance of BACON is little different from that of a statistical package like SPSS that can be used to find correlations and compute relationships. (See Gorman, 1992.)
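The flavor of BACON's data-driven heuristics can be conveyed by a toy reconstruction (our sketch, not Langley et al.'s actual code): starting from columns of planetary distances d (in astronomical units) and periods p (in years), form ratios and products of existing terms until one is nearly invariant.

```python
# Rounded distance (AU) and period (years) for five planets.
DATA = [(0.387, 0.241), (0.723, 0.615), (1.000, 1.000),
        (1.524, 1.881), (5.203, 11.862)]

def is_constant(values, tol=0.05):
    """A column counts as a law when all its values lie near their mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) / mean < tol for v in values)

# BACON-style search: each new term is a ratio or product of earlier terms.
ratio = [d / p for d, p in DATA]                   # d/p: varies
term2 = [r * d for r, (d, p) in zip(ratio, DATA)]  # d^2/p: still varies
term3 = [t * r for t, r in zip(term2, ratio)]      # d^3/p^2: invariant

assert not is_constant(ratio)
assert not is_constant(term2)
assert is_constant(term3)   # Kepler's third law
```

As in BACON, the search engine needs no astronomical knowledge; the program simply notices which algebraic combination of columns is constant, which is why its performance invites the comparison with a statistical package.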
A number of the programs described by Langley et al. (1987) can also discover qualitative relationships. The most sophisticated of these was developed by Kulkarni and Simon (1988) to closely model a historical case. They used Holmes's (1980) account to reconstruct Krebs' path to the discovery of the ornithine cycle. Then they created a program called Kekada which simulated that path, treating falsifications as surprises that led Kekada to propose new experiments; results of these experiments were provided by the programmers. The authors pointed out that, in terms of actually conducting experiments, Krebs' 'secret weapon' was his ability to slice tissues in a particular way. As Shrager and Langley (1990) argue, these programs--like the hypothesis-testing tasks considered in an earlier section--do not emulate the 'embodied' or 'hands-on' aspects of scientific discovery.
While programs may not be embodied, these simulations have been extended to the role of visualization in discovery. Visualization is an important aspect of scientific problem-solving--experts may store chunks of perceptual knowledge as 'diagrammatic configuration schemas' (Koedinger & Anderson, 1990). Cheng and Simon (in press) developed a program called HUYGENS which can simulate Huygens' discovery of the conservation of momentum. HUYGENS creates law-encoding diagrams that serve a role similar to Koedinger's diagrammatic configuration schemas, in that they encode expert knowledge in an abstract, physical representation. Novices could be taught these sorts of diagrammatic tools as a way of speeding their transition to expert status (Cheng, in press).
To summarize, computer simulations can provide models for data-driven discovery, the use of diagrams in the development of theories, hypothesis-modification in the face of surprises, and anomaly resolution. (For additional examples, see the excellent volume edited by Shrager and Langley, 1990.) These models can then be studied experimentally and used to provide frameworks for observational studies of scientists. Naturalistic studies in turn can provide new cases which will be used in the development of future computational and experimental simulations.
Recently, cognitive scientists have shown a resurgent interest in naturalistic settings (cf. Neisser, 1982). For example, Kolodner (1991) discusses the importance of developing machines that assist, rather than replace, experts; such systems will have to complement the kind of case-based reasoning experts use in naturalistic settings. Klein (1989) explored expert/novice differences by studying fireground commanders and tank commanders; instead of textbook problems, he relied on observations and interviews about actual fires and combat situations. As he notes:
All of our studies involved tasks that people had elected to perform and to master. Our "subjects" were not randomly assigned to these careers. There are advantages and disadvantages to this experimental approach. It gains in ecological validity and richness of observation while losing in rigor and certainty of conclusions. In our view, such studies are an important balance to laboratory paradigms that can carefully control the variables of interest but risk losing perspective about which variables are most important (p. 65).
A recent issue of the journal Cognitive Science was devoted to a debate about "Situated Action," an approach that emphasizes "the role of the environment, the context, the social and cultural setting, and the situation in which actors find themselves..." (Norman, 1993, p. 1). Authors disagreed over the extent to which situated action was inconsistent with, or could complement, the more traditional symbolic-processing view of cognition (Greeno & Moore, 1993; Vera & Simon, 1993). One way to find out more about situated action--or situated cognition, as it is also called--is to conduct more naturalistic studies, looking at "problems arising in the performance of everyday activities, in which the problem is not defined apart from difficulties arising in the activity itself and social and physical interaction enter into both the definition of the problem and the construction of a solution" (Bredo, 1994, p. 23). These kinds of 'real-world' problem situations can potentially be analyzed from a symbolic perspective, but they tend to stretch the computational approach, which might "become ladders to be thrown away after one has climbed them" (Bredo, 1994, p. 34).
Most naturalistic work in cognitive psychology of science has been done with historical cases. (We will discuss an exception below.) Probably the most notable efforts are Gruber and Barrett's (1974) work on Darwin, and Tweney's (1985) and Gooding's (1990a) work on Michael Faraday.
Gruber originally expected to rely on the work of historians in his cognitive analysis of Darwin's development of evolutionary theory, but he found that his Piagetian background enabled him to see patterns in Darwin's activities that had eluded historians. He noticed that Darwin's apparently disparate activities were part of a 'network of enterprises' that had a motivational function: "If someone, for example Charles Darwin, becomes fascinated by science, the work itself draws him on; if he works hard, it is because the task is hard and he is engrossed in it...If he seems ascetic, it is because no jewel is more beautiful than the atom, no luxury cruise more fascinating than a voyage of discovery" (Gruber, 1989, p. 250). To understand a particular aspect of Darwin's work, it helped to understand how it fit into his overall network of enterprises. Gruber's provocative work on Darwin might profit from a dual-space analysis; Darwin's 'observational space' included detailed studies of barnacles, worms and coral and influenced his work in the evolutionary 'hypothesis space' in ways that are worth tracing in detail. Such a study would almost certainly modify or even refute aspects of the dual-space approach.
Tweney (1991) and Gooding (1990a) constructed detailed problem-behavior graphs of Faraday's problem-solving processes, using a method of protocol analysis developed by Newell and Simon (1972). Tweney (1985; 1989) used his own experimental work on confirmation and disconfirmation (see section on hypothesis-testing above) to frame and enrich his account of how Faraday used these strategies. Faraday wrote about the dangers of 'inertia of the mind,' by which he meant 'premature attachment to one's own ideas,' but he also argued that it is important to ignore disconfirmatory evidence when one is dealing with a new hypothesis (Tweney, 1989, p. 355). In general, Faraday followed a kind of 'confirm early, disconfirm late' heuristic: confirm until you have a well-corroborated hypothesis, then try to disconfirm it. For example, his initial attempts to use magnets to induce an electric current produced apparent disconfirmations, but he ignored them--a single confirmation was more powerful than half-a-dozen disconfirmations, especially given the high possibility of error in his initial experiments. When he obtained a more powerful magnet, he was able to reduce the level of noise and obtain consistent confirmations. He later followed this early/late heuristic in a long series of attempts to use gravity to induce an electric current. While he never confirmed this hypothesis, he also never accepted the disconfirmatory evidence as decisive.
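The logic of this heuristic can be caricatured in a few lines of code. The sketch below is purely our own toy formalization--the function name, the `confirm_threshold` parameter and the phase labels are invented for illustration, not drawn from Faraday or Tweney: outcomes are booleans, disconfirmations are discounted during the early, error-prone phase so long as at least one confirmation has accrued, and only in the late phase does a disconfirmation count at face value.

```python
def evaluate(outcomes, confirm_threshold=1, phase="early"):
    """Toy 'confirm early, disconfirm late' rule (our formalization).
    `outcomes` is a list of booleans: True = the experiment confirmed
    the hypothesis, False = an apparent disconfirmation."""
    confirmations = sum(outcomes)
    if phase == "early":
        # High chance of experimental error: a single confirmation
        # outweighs half-a-dozen noisy failures.
        return "retain" if confirmations >= confirm_threshold else "suspend"
    # Late phase: the hypothesis is well corroborated, so actively
    # seek disconfirmation and take any clean failure seriously.
    return "retain" if all(outcomes) else "revise"

# Faraday's early induction experiments: many noisy failures, one success.
early_record = [False, False, False, False, False, True]
print(evaluate(early_record, phase="early"))  # prints "retain"
```

The point of the sketch is only that the same evidence is weighted differently depending on the phase of inquiry; a disconfirmation that is shrugged off early would force revision late.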
In the course of his research, Gooding (1990a) developed modified problem-behavior graphs and replicated many of Faraday's experiments in order to create a thorough, detailed protocol of his discovery of electromagnetic induction, showing the role of mental models and mechanical representations in his processes (Gooding, 1990b). More recently, Gooding and Addis (1990) have developed an entire programming language called Clarity, which allows them to chart Faraday's progress in a rigorous way. The goal is not to develop another discovery program; instead, it is to create a set of tools for cognitive analysis that others could employ to study new cases.
Gruber, Tweney and Gooding all illustrate that cognitive-psychological analyses of scientific discoveries can make a unique contribution that does not duplicate the work of historians and philosophers. Two cognitive studies of historical inventors are also relevant to cognitive psychology of science: Gorman's (In Press) work on Alexander Graham Bell and Bradshaw's (1992) comparison of the Wright brothers and other early airplane inventors. Gorman collaborated with a historian of technology, W. Bernard Carlson, and adopted a framework similar to the one outlined by Tweney (1989). Like Gruber, Gorman found that a cognitive approach forced him to look at primary sources in a different way. For example, it is sometimes argued that Bell 'stole' the idea for his invention from a rival inventor named Elisha Gray, but a cognitive analysis revealed that even though the two inventors produced sketches for devices that were physically similar, they had different mental models of how the devices functioned (Gorman, Mehalik, Carlson, & Oblon, 1993). Recently, Gorman (In Press) has modified Tweney's problem-behavior graphs to show how Bell employed confirmation and disconfirmation to test his hypotheses about the transmission of speech.
Gary Bradshaw has used the dual-space-search approach (see above) to discover why the Wright brothers succeeded when so many rivals failed. For the experiment and hypothesis spaces, Bradshaw substituted design and function spaces, respectively. For example, design-space parameters could include the number, shape and angle of wings; each parameter admits a large number of values, and many airplane inventors got stuck trying endless design variations.
In contrast, the Wright brothers focused on the function space, trying to maximize functions like lift by conducting systematic experiments and even re-calculating the existing values for lift. This approach might suggest that the Wrights were actually working in an experiment space, but Bradshaw's point is that their designs and tests were governed by their analysis of the different functions an airplane would have to perform--this functional breakdown was the key to their success, and explains why they were so far ahead of the competition.
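The contrast Bradshaw draws can be made concrete with a toy sketch. Every parameter value and the `lift` formula below are invented for illustration--they are not Bradshaw's data or the Wrights' actual calculations. The point is structural: blind search of the design space multiplies combinatorially, while fixing a functional requirement first prunes the set of designs worth building and testing.

```python
from itertools import product

# Hypothetical design-space parameters (invented for illustration).
wing_counts = [1, 2, 3]
wing_shapes = ["flat", "cambered", "curved"]
wing_angles = [0, 3, 6, 9]  # degrees of incidence

# Blind design-space search: every combination is a candidate to build.
design_space = list(product(wing_counts, wing_shapes, wing_angles))
print(len(design_space))  # prints 36 -- from just three parameters

# Toy lift function, standing in for the recalculated lift tables.
def lift(count, shape, angle):
    shape_factor = {"flat": 0.5, "cambered": 1.0, "curved": 0.8}[shape]
    return count * shape_factor * (1 + angle / 10)

# Function-space search: state the functional requirement (enough lift)
# and let it prune the design space before anything is built.
enough_lift = [d for d in design_space if lift(*d) >= 2.0]
```

Even in this caricature the functional constraint cuts the candidate set roughly in half; with realistically many parameters, the unpruned space is what trapped the Wrights' rivals.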
Cognitive studies of historical cases can contribute a great deal to our understanding of both science and problem-solving, but they need to be complemented by fine-grained studies of contemporary scientists. However, most case studies of working scientists are done from a sociological perspective that typically disparages the importance of cognitive factors (Latour & Woolgar, 1986).
An exception is Dunbar (In Press), who is conducting a naturalistic study of four molecular biology laboratories, gathering data before, during and after laboratory meetings. He used previous experimental work (cf. Dunbar, 1989) to provide a framework for analyzing protocols of the group interactions, but quickly found surprises.
For example, in terms of hypothesis-testing, unlike many laboratory subjects, scientists quickly modified hypotheses in the face of disconfirmatory evidence. If the disconfirmatory evidence was inconsistent with every hypothesis in the set a scientist was working on, she tended to dismiss it. But in laboratory meetings, the group of scientists often pushed the individual scientist to consider alternative hypotheses. Less experienced scientists were more likely to hold onto hypotheses than experienced ones; indeed, senior scientists displayed a kind of 'falsification bias,' discarding data that could have confirmed their current hypothesis. Dunbar speculated that this falsification bias was a protection against airing hypotheses that might later be proved wrong, a frequent experience for the senior scientists.
The problem with naturalistic cognitive studies of science and invention is the multiplicity of conceptual frameworks, each of which reveals new facets of the processes under investigation but makes comparisons more difficult. In this sense, naturalistic studies mirror the larger literature. Gorman (1992) and Tweney (1989) have tried to provide cognitive frameworks that would accommodate a number of approaches, and Gooding and Addis (1990) have tried to provide a set of tools that could add rigor and serve as a means of translation among different approaches, but as long as cognitive psychologists of science are publishing in different journals and attending different conferences, synthesis will be difficult. This is a problem that applies to psychology of science in general; we will address it more thoroughly at the end of this paper.
Dunbar's research suggests one possible solution: to shift iteratively between what he calls in vitro and in vivo studies, the former referring to experimental simulations and the latter to naturalistic studies. The former allows us to isolate strategies like disconfirmation and study them under controlled conditions, but does not guarantee that results will generalize to actual scientific practice; the latter allows us to explore how such strategies are actually used, but in a rich context where they are confounded with many other variables. The solution is to make these methods complement one another. For example, Dunbar is now conducting an experiment to investigate the circumstances under which falsification bias occurs.
Dunbar's work reminds us that science is frequently a group activity. There have been a few attempts to investigate the social psychology of science; we will turn to these in the next section.
Bradshaw, G. (1992). The Airplane and the Logic of Invention. In R. N. Giere (Ed.), Cognitive Models of Science (pp. 239-250). Minneapolis: University of Minnesota Press.
Bredo, E. (1994). Reconstructing educational psychology: Situated cognition and Deweyan pragmatism. Educational Psychologist, 29(1), 23-35.
Brewer, W. F., & Chinn, C. A. (1992). Entrenched beliefs, inconsistent information, and knowledge change. In L. Birnbaum (Ed.), Proceedings of the 1991 International Conference on the Learning Sciences (pp. 67-73). Charlottesville: Association for the Advancement of Computing in Education.
Brewer, W. F., & Samarapungavan, A. (1991). Children's theories vs. scientific theories: Differences in reasoning or differences in knowledge? In R. R. Hoffman & D. S. Palermo (Eds.), Cognition and the symbolic processes: Applied and ecological perspectives. Hillsdale, NJ: Lawrence Erlbaum Associates.
Campbell, D. T. (1989). Fragments of the fragile history of psychological epistemology and theory of science. In B. Gholson, W. R. Shadish, Jr., R. A. Neimeyer, & A. C. Houts (Eds.), Psychology of Science (pp. 21-46). Cambridge: Cambridge University Press.
Carey, S. (1992). The Origin and Evolution of Everyday Concepts. In R. N. Giere (Ed.), Cognitive Models of Science (pp. 89-128). Minneapolis: University of Minnesota Press.
Cheng, P. C.-H. (1992). Approaches, models and issues in computational scientific discovery. In M. T. Keane & K. J. Gilhooly (Eds.), Advances in the psychology of thinking (pp. 203-236). London:
Cheng, P. C.-H. (In Press). Problem solving and learning with diagrammatic representations in physics. In D. Peterson (Ed.), Alternate representations: An interdisciplinary theme in cognitive science. Intellect Books.
Cheng, P. C.-H., & Simon, H. A. (In Press). Scientific discovery and creative reasoning with diagrams. In S. Smith, T. Ward, & R. Finke (Eds.), The Creative Cognition Approach. Cambridge, MA: MIT Press.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121-152.
Chi, M. T. H. (1992). Conceptual change within and across ontological categories: Examples from learning and discovery in science. In R. N. Giere (Ed.), Cognitive Models of Science (pp. 129-186). Minneapolis: University of Minnesota Press.
Clement, J. (1982). Students preconceptions in introductory mechanics. American Journal of Physics, 50, 66-71.
Clement, J. (1983). A conceptual model discussed by Galileo and used intuitively by physics students. In D. Gentner & A. L. Stevens (Eds.), Mental Models. Hillsdale, NJ: Lawrence Erlbaum Associates.
Clement, J. (1991). Experts and science students: The use of analogies, extreme cases, and physical intuition. In J. F. Voss, D. N. Perkins, & J. W. Segal (Eds.), Informal Reasoning and Education. Hillsdale, NJ: Lawrence Erlbaum Associates.
Dee-Lucas, D., & Larkin, J. H. (1988). Novice rules for assessing importance in scientific texts. , 27, 288-308.
Dunbar, K. (1989). Scientific reasoning strategies in a simulated molecular genetics environment. Program of the Eleventh Annual Conference of the Cognitive Science Society, 426-433.
Dunbar, K. (In Press). How scientists really reason: Scientific reasoning in real-world laboratories. In R. J. Sternberg & J. Davidson (Eds.), The Nature of Insight. Cambridge, MA: MIT Press.
Farris, H., & Revlin, R. (1989). The discovery process: A counterfactual strategy. Social Studies of Science, 19, 497-513.
Farris, H., & Revlin, R. (1989a). Sensible reasoning in two tasks: Rule discovery and hypothesis evaluation. Memory & Cognition, 17(2), 221-232.
Gentner, D., & Gentner, G. R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In D. Gentner & A. L. Stevens (Eds.), Mental Models (pp. 99-129). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gooding, D. (1990a). Experiment and the Making of Meaning: Human Agency in Scientific Observation and Experiment. Dordrecht: Kluwer Academic Publishers.
Gooding, D. (1990b). Mapping experiment as a learning process: How the first electromagnetic motor was invented. Science, Technology and Human Values, 15(2), 165-201.
Gooding, D., & Addis, T. (1990, September 14-17). Towards a dynamical representation of experimental procedures. Paper presented at Rediscovering Skill in Science, Technology and Medicine, Bath, UK.
Gorman, M. E., & Gorman, Margaret E. (1984). A Comparison of disconfirmatory, confirmatory and a control strategy on Wason's 2-4-6 task. Quarterly Journal of Experimental Psychology, 36A, 629-648.
Gorman, M. E., Stafford, A., & Gorman, Margaret E. (1987). Disconfirmation and dual hypotheses on a more difficult version of Wason's 2-4-6 task. Quarterly Journal of Experimental Psychology, 39A, 1-28.
Gorman, M. E. (1989c). Error, falsification and scientific inference: An experimental investigation. Quarterly Journal of Experimental Psychology, 41A, 385-412.
Gorman, M. E. (1991). Counterfactual simulations of science: A response to Farris and Revlin. Social Studies of Science, 21, 561-564.
Gorman, M. E. (1992). Simulating Science: Heuristics, Mental Models and Technoscientific Thinking. Bloomington: Indiana University Press.
Gorman, M. E. (In Press). Confirmation, Disconfirmation and Invention: The Case of Alexander Graham Bell and the Telephone. Thinking and Reasoning, I.
Gorman, M. E., Mehalik, M. M., Carlson, W. B., & Oblon, M. (1993). Alexander Graham Bell, Elisha Gray and the Speaking Telegraph: A Cognitive Comparison. History of Technology, 15, 1-56.
Green, A. J. K., & Gilhooly, K. J. (1992). Empirical advances in expertise research. In M. T. Keane & K. J. Gilhooly (Eds.), Advances in the psychology of thinking. London:
Greeno, J. G., & Moore, J. L. (1993). Situativity and symbols: Response to Vera and Simon. Cognitive Science, 17, 49-59.
Gruber, H., & Barrett, P. H. (1974). Darwin on Man. New York: Dutton.
Gruber, H. E. (1989). Networks of enterprise in creative scientific thought. In B. Gholson, W. R. Shadish, R. A. Neimeyer, & A. C. Houts (Eds.), Psychology of Science: Contributions to Metascience (pp. 246-266). Cambridge: Cambridge University Press.
Holmes, F. L. (1980). Hans Krebs and the discovery of the ornithine cycle. Federation Proceedings, 39, 216-225.
Johnson-Laird, P. N. (1983). Mental Models. Cambridge: Harvard University Press.
Klayman, J., & Ha, Y.-W. (1987). Confirmation, disconfirmation and information in hypothesis testing. Psychological Review, 94, 211-228.
Klahr, D., & Dunbar, K. (1988). Dual space search during scientific reasoning. Cognitive Science, 12, 1-48.
Klahr, D., Dunbar, K., & Fay, A. L. (1990). Designing good experiments to test bad hypotheses. In J. Shrager & Langley, P. (Ed.), Computational Models of Discovery and Theory Formation (pp. 355-402). San Mateo, CA: Morgan Kaufmann Publishers, Inc.
Klahr, D., & Fay, A. L. (1993). Heuristics for Scientific Experimentation: A Developmental Study. Cognitive Psychology, 25, 111-146.
Klein, G. A. (1989). Recognition-primed decisions. In W. B. Rouse (Ed.), Advances in man-machine systems research (pp. 47-92). Greenwich, CT: JAI Press.
Koedinger, K. R., & Anderson, J. R. (1990). Abstract planning and perceptual chunks: Elements of expertise in geometry. Cognitive Science, 14, 511-550.
Kolodner, J. L. (1991). Improving human decision making through case-based decision aiding. AI Magazine (Summer), 52-68.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
Kulkarni, D., & Simon, H. A. (1988). The processes of scientific discovery: The strategies of experimentation. Cognitive Science, 12, 139-175.
Lakatos, I. (1978). The Methodology of Scientific Research Programmes. Cambridge: Cambridge University Press.
Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific Discovery: Computational Explorations of the Creative Processes. Cambridge: MIT Press.
Larkin, J. H., McDermott, J., Simon, D. P., & Simon, H. A. (1980). Expert and novice performance in solving physics problems. Science, 208, 1335-1342.
Larkin, J. (1983). The role of problem representation in physics. In D. Gentner & Stevens, A. L. (Ed.), Mental Models (pp. 75-98). Hillsdale, NJ: Lawrence Erlbaum Associates.
Latour, B., & Woolgar, S. (1986). Laboratory Life: The Construction of Scientific Facts. Princeton: Princeton University Press.
McCloskey, M. (1983). Naive theories of motion. In D. Gentner & Stevens, A. L. (Ed.), Mental Models (pp. 299-324). Hillsdale, NJ: Lawrence Erlbaum Associates.
Mynatt, C. R., Doherty, M. E., & Tweney, R. D. (1977). Confirmation bias in a simulated research environment: An experimental study of scientific inference. Quarterly Journal of Experimental Psychology, 29, 85-95.
Mynatt, C. R., Doherty, M. E., & Tweney, R. D. (1978). Consequences of confirmation and disconfirmation in a simulated research environment. Quarterly Journal of Experimental Psychology, 30, 395-406.
Neisser, U. (1982). Memory Observed. San Francisco: W.H. Freeman.
Nersessian, N. J., & Greeno, J. G. (1990). Multiple abstracted representations in problem solving and discovery in physics. Proceedings of the Cognitive Science Society, 12, 77-84.
Newell, A., & Simon, H. A. (1972). Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall.
Norman, D. A. (1993). Cognition in the head and the world: An introduction to the special issue on situated action. Cognitive Science, 17, 1-6.
Oaksford, M., & Chater, N. (1994). Another look at eliminative and enumerative behaviour in a conceptual task. The European Journal of Cognitive Psychology, 6, 149-169.
Pittenger, J. B. (1991). Cognitive physics and event perception: Two approaches to the assessment of people's knowledge of physics. In R. R. Hoffman & D. S. Palermo (Eds.), Cognition and the symbolic processes: Applied and ecological perspectives. Hillsdale, NJ: Lawrence Erlbaum Associates.
Qin, Y., & Simon, H. A. (1990). Laboratory replication of scientific discovery processes. Cognitive Science, 14, 281-312.
Reichenbach, H. (1938). Experience and Prediction. Chicago: University of Chicago Press.
Reif, F., & Larkin, J. H. (1991). Cognition in scientific and everyday domains: Comparison and learning implications. Journal of Research in Science Teaching, 28(9), 733-760.
Shrager, J., & Langley, P. (1990). Computational Models of Scientific Discovery and Theory Formation. San Mateo, CA: Morgan Kaufmann Publishers, Inc.
Tweney, R. D., Doherty, M. E., Worner, W. J., Pliske, D. B., Mynatt, C. R., Gross, K. A., & Arkkelin, D. L. (1980). Strategies of rule discovery on an inference task. Quarterly Journal of Experimental Psychology, 32, 109-123.
Tweney, R. D. (1985). Faraday's discovery of induction: A cognitive approach. In D. Gooding & F. James (Eds.), Faraday Rediscovered: Essays on the Life and Work of Michael Faraday, 1791-1867. New York: Stockton Press.
Tweney, R. D. (1989). A framework for the cognitive psychology of science. In B. Gholson, W. R. Shadish, Jr., R. A. Neimeyer, & A. C. Houts (Eds.), Psychology of Science. Cambridge: Cambridge University Press.
Tweney, R. D. (1991). Faraday's notebooks: The active organization of creative science. Physics Education, 26, 301-306.
Vallee-Tourangeau, F., Austin, N. G., & Rankin, S. (In Press). The Dax-Med effect in the 2 4 6 induction task: A test of the information-quantity hypothesis, a refutation of the goal-complementarity hypothesis, and a few other points.
Vera, A. H., & Simon, H. A. (1993). Situated action: A symbolic interpretation. Cognitive Science, 17(1), 7-48.
Vosniadou, S., & Brewer, W. F. (In Press). Mental models of the Earth: A study of conceptual change in childhood. Cognitive Psychology.
Wason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. Quarterly Journal of Experimental Psychology, 12, 129-140.
Wharton, C. M., Cheng, P. W., & Wickens, T. D. (1993). Hypothesis-testing strategies: Why two goals are better than one. Quarterly Journal of Experimental Psychology, 46A(4), 743-758.