The purpose of Exploratory Data Analysis (EDA) is to generate hypotheses or clues that guide us in improving quality or process performance. Breyfogle (2003, pp. 10-11) views Six Sigma as a murder mystery in which we use a structured approach to uncover clues that lead us to improve process outputs. These clues are Key Process Input Variables (KPIVs) and process improvement strategies. As an example, he considers the process of traveling to work, where the Key Process Output Variable (KPOV) is the arrival time. Examples of KPIVs are the setting of our alarm clock and our departure time. An alternative process improvement strategy might be a different travel route that is less subject to variation during congested time periods. The route selected is then another KPIV, and the travel time along that route is a function of both the route and the departure time. Exploratory Data Analysis helps us identify these KPIVs.
De Mast and Trip (2007) state that the purpose of EDA from a quality improvement project viewpoint is to identify the dependent (Y) and independent (X) variables that may help understand or solve the quality problem. The dependent Y variables are KPOVs, and the independent X variables are KPIVs. Leitnaker (2000) gives an example of EDA to identify KPIVs. The example is a molding operation where:
- Yields are erratic
- Parts are produced that do not meet specifications
- Shipment schedules are not consistently met
A team studied a molding operation supplying plastic switches to industrial customers for use in assembled control pads. The operation has eight machines, each machine has two molds, and each mold has four cavities. To investigate the process capability, the team took a sample of size 5 from the output of one machine every 4 hours. The following control chart displays the results for a critical dimension.
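The control limits behind a chart of this kind can be sketched as follows, assuming the team used a conventional X-bar chart paired with a range chart for subgroups of size 5. The function names and the measurements are hypothetical and serve only to illustrate the arithmetic; they are not Leitnaker's data.

```python
from statistics import mean

# Standard Shewhart control chart constants for subgroups of size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(subgroups):
    """Return (LCL, centre line, UCL) for the X-bar and R charts."""
    xbars = [mean(s) for s in subgroups]            # subgroup averages
    ranges = [max(s) - min(s) for s in subgroups]   # subgroup ranges
    xbarbar = mean(xbars)                           # grand average
    rbar = mean(ranges)                             # average range
    return {
        "xbar": (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar),
        "range": (D3 * rbar, rbar, D4 * rbar),
        "xbars": xbars,
        "ranges": ranges,
    }


def out_of_control(values, lcl, ucl):
    """Return the indices of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical data: four subgroups of five parts, one subgroup every 4 hours.
subgroups = [
    [10.2, 9.8, 10.1, 10.4, 9.9],
    [10.0, 10.3, 9.7, 10.1, 10.2],
    [9.9, 10.1, 10.0, 9.6, 10.3],
    [10.4, 10.0, 9.8, 10.2, 10.1],
]

limits = xbar_r_limits(subgroups)
lcl, centre, ucl = limits["xbar"]
print("X-bar chart limits:", lcl, centre, ucl)
print("Points outside limits:", out_of_control(limits["xbars"], lcl, ucl))
```

A real study would use many more subgroups (20 to 25 is a common rule of thumb) before the limits are treated as stable.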
The process is in control, and the range chart supports this conclusion, but the variation is large. Next, the team investigated the effect of the cavities and molds on the measured dimension. To do this, they sampled one part from each of the four cavities of the two molds on one machine. Breaking down the data by cavity and mold is an example of stratification. Control charts for the individual cavities and molds showed that all cavities and molds appear to be in control. However, the mold 2 cavities have larger averages than the mold 1 cavities, and the cavity averages increase with cavity number. The following figure clearly shows this pattern.
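Stratification of this kind amounts to grouping the measurements by a candidate input variable and comparing the group averages. The sketch below is a minimal illustration; the helper `stratified_means` and the measurements are hypothetical, constructed only to mimic the pattern described (mold 2 higher than mold 1, averages rising with cavity number).

```python
from collections import defaultdict
from statistics import mean


def stratified_means(rows, key):
    """Group measurements by key(row) and return the average of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {g: mean(values) for g, values in sorted(groups.items())}


# Hypothetical data: (mold, cavity, measured dimension), two sampling times.
rows = [
    (1, 1, 9.7), (1, 2, 9.8), (1, 3, 9.9), (1, 4, 10.0),
    (2, 1, 10.1), (2, 2, 10.2), (2, 3, 10.3), (2, 4, 10.4),
    (1, 1, 9.9), (1, 2, 10.0), (1, 3, 10.1), (1, 4, 10.2),
    (2, 1, 10.3), (2, 2, 10.4), (2, 3, 10.5), (2, 4, 10.6),
]

by_mold = stratified_means(rows, key=lambda r: r[0])
by_cavity = stratified_means(rows, key=lambda r: r[1])
by_mold_cavity = stratified_means(rows, key=lambda r: (r[0], r[1]))

print("Average by mold:", by_mold)
print("Average by cavity:", by_cavity)
print("Average by mold and cavity:", by_mold_cavity)
```

Passing the grouping rule as a function keeps the same helper usable for any other stratification variable, such as machine or shift.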
The figure leads us to identify mold and cavity number as KPIVs. The exploratory data analysis produced a clue that prompted a search for the reasons why the molds and cavities produce different average dimensions. The team can then reduce the variability in the measured dimension by reducing the differences in averages among the molds and cavities.
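One way to see why closing the gaps between mold and cavity averages reduces overall variability is to split the total sum of squares into a between-group part and a within-group part. The sketch below assumes this standard decomposition as the illustration; it is not a method attributed to the cited authors, and the function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def sum_of_squares_split(values, labels):
    """Split the total sum of squares into between-group and within-group parts."""
    grand = mean(values)
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    # Variation of the group averages around the grand average.
    between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in groups.values())
    # Variation of individual parts around their own group average.
    within = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    total = sum((v - grand) ** 2 for v in values)
    return between, within, total


# Hypothetical data: (mold, cavity) label and measured dimension for 16 parts.
labels = [(m, c) for _ in range(2) for m in (1, 2) for c in (1, 2, 3, 4)]
values = [
    9.7, 9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 10.4,
    9.9, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6,
]

between, within, total = sum_of_squares_split(values, labels)
print(f"Share of variation between mold/cavity groups: {between / total:.0%}")
```

In this made-up data set most of the variation sits between the groups, so aligning the mold and cavity averages would remove the bulk of it; what remains is the part-to-part variation within each cavity.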
References
- Breyfogle, F. W. (2003). Implementing Six Sigma. Hoboken, New Jersey, John Wiley & Sons, Inc.
- De Mast, J. and Trip, A. (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.
- Leitnaker, M. G. (2000). Using the Power of Statistical Thinking, Special Publication of the ASQ Statistics Division, Summer 2000.