Process Average and Standard Deviation Calculations for Samples and Populations
Knowledge of certain properties of a subset (sample) can be used to draw conclusions about the properties of the whole set (population). Properties can be of two types, as discussed in earlier chapters:
1. Quantitative (variable). These properties can be observed and recorded in units of measure such as the diameter of shafts. The units are all produced under replicating conditions in production.
2. Qualitative (attribute). These properties can be observed when units are being tested with the same set of gauges or test equipment; for example, the set of all shafts produced under the same conditions, either fitting or not fitting into a tester consisting of a dual set of collars. A shaft with a diameter within specifications should fit into the collar whose diameter is equal to the shaft upper specification limit, and should not fit into the other collar, whose diameter is equal to the lower specification limit.
The sample, of size n, is a random choice of n objects from a population, each independent of the others. As n approaches ∞, the sample values of average and standard deviation become equal to those of the population.
It has been shown in Chapter 3 that variable control charts constitute a distribution of sample averages, with constant sample size n.
This distribution is approximately normal, even if the parent population distribution is not normal. It has also been shown that the standard deviation s of the distribution of sample averages is related to the parent distribution standard deviation σ by the central limit theorem, which states that s = σ/√n (Equation 3.5). The number of samples needed to construct the variable chart control limits was also set at a high level of 20 successive samples to ensure that the population σ will be known.
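The relation s = σ/√n can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes a uniform parent population on [0, 1) (chosen only because it is clearly not normal), and the function name, sample counts, and seed are illustrative.

```python
import random
import statistics

def sample_average_spread(n, num_samples=2000, seed=1):
    """Return (observed std dev of sample averages, sigma / sqrt(n)).

    Assumption: the parent population is uniform on [0, 1), whose
    standard deviation is sqrt(1/12).
    """
    rng = random.Random(seed)
    sigma = (1.0 / 12.0) ** 0.5
    # Draw num_samples samples of constant size n and record each average.
    averages = [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(num_samples)
    ]
    observed = statistics.stdev(averages)
    predicted = sigma / n ** 0.5
    return observed, predicted

for n in (4, 16, 64):
    observed, predicted = sample_average_spread(n)
    print(f"n={n:3d}  observed s={observed:.4f}  sigma/sqrt(n)={predicted:.4f}")
```

The observed spread of the sample averages should track σ/√n closely for each n, shrinking as the sample size grows.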
When the total number in the sample (n) is small, very little can be determined from the sampling distribution unless an assumption is made that the sample comes from a normal distribution. The normal distribution assumes an infinite number of occurrences that are represented by the process average μ and standard deviation σ. The Student’s t distribution is used when n is small. The data needed to construct this distribution are the sample average X̄ and sample standard deviation s, as well as the parent normal distribution average μ:
t = (X̄ − μ) / (s/√n) (5.1)
where t is a random variable having the t distribution with v = n − 1.
v = degrees of freedom (DOF) = n − 1 (5.2)
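As an illustration, the t value and its degrees of freedom can be computed from a small sample. This sketch assumes the usual form of the statistic, t = (X̄ − μ)/(s/√n); the function name and the shaft diameters are hypothetical values chosen only for the example.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return (t, degrees of freedom) for a sample against a population average mu."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t = (x_bar - mu) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical shaft diameters in mm; illustrative data only.
diameters = [10.02, 9.98, 10.05, 10.01, 9.99]
t, dof = t_statistic(diameters, mu=10.00)
print(f"t = {t:.3f} with v = {dof} degrees of freedom")
```

Note that `statistics.stdev` divides by n − 1, which matches the degrees of freedom in Equation 5.2.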
It can be seen from Figure 5.1 that the shape of the t distribution is similar to the normal distribution. Both are bell-shaped and distributed symmetrically around the average. The t distribution average is equal to zero, and the number of degrees of freedom governs each t distribution. The spread of the distribution decreases as the number of degrees of freedom increases. The variance of the t distribution always exceeds 1, but it approaches 1 when the number n approaches infinity. In that limit, the t distribution becomes equal to the normal distribution.
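The behavior of the variance can be made concrete with the standard result that a t distribution with v degrees of freedom has variance v/(v − 2) for v > 2. This formula is not given in the text above; the short sketch below uses it, with an illustrative function name, to show the variance falling toward 1 as v grows.

```python
def t_variance(v):
    """Variance of the t distribution with v degrees of freedom (finite only for v > 2)."""
    if v <= 2:
        raise ValueError("variance is finite only for v > 2")
    return v / (v - 2)

for v in (3, 5, 10, 30, 100, 1000):
    print(f"v={v:5d}  variance={t_variance(v):.4f}")
```

Every value exceeds 1, and the excess shrinks steadily as the degrees of freedom increase.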
The t distribution can be used to determine the area under the curve, called significance or α, given a t value. However, the t distribution is different from the normal distribution in that the number in the sample, or degrees of freedom v, has to be considered. The table output value of the variable t, called t_α, is given for each area α under the t distribution curve to the right of t_α, with v degrees of freedom. Figure 5.2 shows an example of how t_α is related to the significance. The term “significance” is not commonly used; its complement, called confidence, is set to 1 minus significance and expressed as a percent value:
Confidence = (1 − α) × 100%
Table 5.1 shows a selected set of the values of t_α. The t distribution is used in statistics to confirm or refute a particular claim about a sample versus the population average. It is always assumed that the parent distribution of the t distribution is normal. This is not easily verified using the formal methods discussed in Chapter 2, since the sample size is small. In most cases, the graphical plot method of the sample data discussed in Chapter 2 is the only tool available.
Historically, the confidence percentage used has depended on the particular products being made. For commercial products, a 95% confidence level has been considered sufficient, whereas for medical and defense products, which require higher reliability, 99% confidence has been used.
The higher the confidence percentage, the wider the confidence interval, whose endpoints are the confidence limits. For low-volume production data, the confidence limits for the population average μ and standard deviation σ estimates are used to give an estimate of the span of these two variables. The 95% confidence limits can be used for calculating six sigma data (Cpk, defect rates, FTY), whereas higher confidence numbers (99% and 99.9%) can be used as worst-case condition checks on the base calculations.
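The widening of the interval with confidence can be sketched for the population average. This example assumes the usual two-sided form of the limits, X̄ ± t_(α/2) · s/√n, which is not spelled out in this section. The critical values are standard published t table entries for v = 4 (not a reproduction of Table 5.1), and the function name and shaft diameters are hypothetical.

```python
import math
import statistics

# Standard two-sided critical values t_(alpha/2) for v = 4 degrees of freedom,
# from widely published t tables (not copied from Table 5.1).
T_CRITICAL_V4 = {0.95: 2.776, 0.99: 4.604, 0.999: 8.610}

def confidence_limits(sample, confidence):
    """Return (lower, upper) confidence limits for the population average."""
    n = len(sample)
    if n - 1 != 4:
        raise ValueError("this sketch only carries critical values for v = 4")
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    half_width = T_CRITICAL_V4[confidence] * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical shaft diameters in mm; illustrative data only.
diameters = [10.02, 9.98, 10.05, 10.01, 9.99]
for confidence in (0.95, 0.99, 0.999):
    lower, upper = confidence_limits(diameters, confidence)
    print(f"{confidence:.1%} confidence: {lower:.3f} to {upper:.3f} mm")
```

With only five readings, moving from 95% to 99.9% confidence widens the interval considerably, which is why the higher levels serve as worst-case checks rather than base calculations.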