This posting describes the difference between Exploratory Data Analysis (EDA) and Confirmatory Data Analysis (CDA), a distinction drawn by Tukey (1977). Confirmatory Data Analysis tests hypotheses and produces estimates with a specified precision. Regression analysis, analysis of variance, and hypothesis tests are examples of Confirmatory Data Analysis. It therefore requires hypotheses or assumptions to consider and evaluate.
Exploratory Data Analysis makes few assumptions, and its purpose is to suggest hypotheses and assumptions. Consider the OEM described in the posting on 1/30/2008. The company was receiving customer complaints, and a team wanted to identify and remove their causes. The team asked customers for usage data so that it could calculate defect rates. This started an Exploratory Data Analysis. The team plotted control charts, which identified a high defect rate in October 1991. The investigation established that a supplier had used the wrong raw material. Discussions between the supplier and team members motivated further analysis of the raw material and its composition. The decision to analyze the raw material completed the Exploratory Data Analysis, which drew on both data analysis and the process knowledge of team members. The supplier and the company then conducted a series of designed experiments, which identified an improved raw material composition. With this composition, the defect rate fell from 0.023% to 0.004%. The experimental design and its analysis were Confirmatory Data Analysis. Note that the experimental design required a hypothesis generated by the Exploratory Data Analysis.
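The control-chart step in this example can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual analysis: it assumes a p-chart with 3-sigma limits, and the monthly unit and defect counts are invented, since the posting does not report them. The function name p_chart is likewise hypothetical.

```python
import math

# Hypothetical monthly data: period -> (units in use, defects reported).
# These numbers are made up for illustration; they are not from the case.
monthly = {
    "1991-07": (52000, 11),
    "1991-08": (48000, 9),
    "1991-09": (50000, 12),
    "1991-10": (51000, 31),
    "1991-11": (49000, 10),
    "1991-12": (50000, 8),
}

def p_chart(data):
    """Return the center line and, per period, (rate, lcl, ucl, outside_limits)."""
    total_units = sum(n for n, _ in data.values())
    total_defects = sum(d for _, d in data.values())
    p_bar = total_defects / total_units  # center line: overall defect rate
    rows = {}
    for period, (n, d) in data.items():
        # Binomial standard error for this period's sample size
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)
        ucl = p_bar + 3 * sigma
        rate = d / n
        rows[period] = (rate, lcl, ucl, rate < lcl or rate > ucl)
    return p_bar, rows

p_bar, rows = p_chart(monthly)
flagged = [period for period, row in rows.items() if row[3]]

for period, (rate, lcl, ucl, outside) in rows.items():
    marker = "  <-- outside limits" if outside else ""
    print(f"{period}: {rate:.4%} (limits {lcl:.4%} to {ucl:.4%}){marker}")
```

On this made-up data only October 1991 falls outside the limits. A point like that does not explain anything by itself; it tells the team where to start looking, which is the exploratory role described above.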
Tukey states that EDA is detective work. He uses the criminal justice process as an analogy to illustrate the roles of EDA and CDA. A detective investigating a crime needs both tools and understanding. Detectives and other investigative units search for and produce evidence; juries and judges evaluate the strength of that evidence. In the same way, Exploratory Data Analysis uncovers statements or hypotheses for Confirmatory Data Analysis to consider. Experimental design and regression modeling are more effective if Exploratory Data Analysis uncovers precise statements or hypotheses. Admittedly, one can conduct experiments searching for hypotheses; however, our viewpoint is that preliminary Exploratory Data Analyses may reduce the costs of these experiments.
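To make the "jury" role concrete, here is a hedged sketch of one confirmatory check on the hypothesis that the new raw material composition lowers the defect rate. It uses a pooled two-proportion z-test, which is a simple stand-in and not the designed experiments the supplier and company actually ran. The counts are assumptions chosen only to reproduce the reported rates of 0.023% and 0.004%; the posting does not give sample sizes, and the function name is hypothetical.

```python
import math

def two_proportion_z_test(defects_a, units_a, defects_b, units_b):
    """One-sided pooled z-test of H0: p_a <= p_b against H1: p_a > p_b."""
    p_a = defects_a / units_a
    p_b = defects_b / units_b
    pooled = (defects_a + defects_b) / (units_a + units_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / units_a + 1 / units_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Assumed counts: 23 defects in 100,000 units (0.023%) with the old composition,
# 4 defects in 100,000 units (0.004%) with the new one. Sample sizes are hypothetical.
z, p_value = two_proportion_z_test(23, 100_000, 4, 100_000)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.2g}")
```

With defect counts this small the normal approximation is rough, so an exact test may be preferable in practice. The point of the sketch is only that the confirmatory step starts from a stated hypothesis and quantifies the strength of the evidence for it.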
Exploratory and Confirmatory Data Analyses can both be thought of as parts of statistical thinking. De Mast and Trip (2007) present principles for more effective EDA in quality improvement projects. We will examine results from their paper in future postings. Their paper won the Nelson Award, given to the paper published in the Journal of Quality Technology during 2007 that had the greatest immediate impact for practitioners.
References
- Tukey, John W. (1977). Exploratory Data Analysis, Addison-Wesley Publishing Co.
- de Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.