Population & Samples

Many times it isn't possible or practical to analyze an entire population. Therefore, samples are drawn from a portion of the population. There are various techniques for gathering samples, and these should be understood by a Six Sigma project manager.

When there is an attempt to measure the entire population, this is referred to as a census. But a census can be costly, destructive, and time consuming.

There are several methods of sampling. It is important to choose the plan that provides the most reliable information about the entire population. A few common methods are listed below and explained further down the page.

Often, when sample sizes are large enough, the Central Limit Theorem applies: the distribution of the sample means is approximately normal even if the individual data points are not. This opens up the use of several simple parametric hypothesis tests for statistical analysis in the ANALYZE phase.
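The behavior of sample means can be illustrated with a small simulation. This is only a sketch under stated assumptions: the skewed population, the sample size of 30, and the helper name `sample_means` are all hypothetical choices made for illustration.

```python
import random
import statistics

def sample_means(population, n, trials, seed=0):
    """Draw `trials` random samples of size n (with replacement) and return each mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]

# Hypothetical, deliberately skewed population (exponential, mean near 1.0)
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, n=30, trials=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The sample means cluster around the population mean, and their spread
# is close to sigma / sqrt(n)
print("population mean:     ", round(pop_mean, 3))
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(pop_sd / 30 ** 0.5, 3))
```

The last two printed values should be close to each other, which is the practical consequence that the parametric tests rely on.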

It is very important for the GB/BB (Green Belt/Black Belt) to collect enough samples to understand the power of the test, ensure the assumptions are met, and analyze normality, while minimizing resources and destruction of parts.

Random Sampling Methods

  • Simple Random Sampling
  • Stratified Sampling
  • Sequential Sampling

If you were trying to evaluate the average length of every rainbow trout in the freshwater lakes of Minnesota, measuring every fish would not be practical or affordable. A sampling plan would be devised to gather some of the trout and study them. From this, inferences about the population can be made with specified levels of confidence.
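As a rough sketch of what "specified levels of confidence" can look like in practice, the example below computes an approximate confidence interval for a population mean from a sample. The trout lengths are simulated, hypothetical data, the function name `mean_confidence_interval` is illustrative, and the interval uses a normal approximation (a t-based interval would be more appropriate for small samples).

```python
import math
import random
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical data: lengths (cm) of 50 sampled trout, simulated for illustration
rng = random.Random(7)
lengths_cm = [rng.gauss(40.0, 5.0) for _ in range(50)]

low, high = mean_confidence_interval(lengths_cm)
print(f"sample mean: {statistics.fmean(lengths_cm):.1f} cm")
print(f"95% confidence interval: {low:.1f} to {high:.1f} cm")
```

Asking for a higher confidence level (for example `confidence=0.99`) widens the interval, and a larger sample narrows it.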

    Nonrandom Sampling

  • Judgement Sampling
  • Convenience Sampling
  • Snowball Sampling 

    These techniques are not preferred because of the additional risk of sampling error they introduce. The error cannot be calculated, so these results should not be used to make inferences about the population from which the sample was selected.

    Population and a Sample



    The denominator in the standard deviation formula for a population is N; for a sample it is n-1. Dividing by "n-1" makes the sample variance an unbiased estimate of the population variance, and as the sample size grows, the difference between dividing by "n" and by "n-1" becomes negligible.

    Understand the results of the statistical program or calculator being used, in particular which denominator it applies when it reports a standard deviation.
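The two denominators can be compared directly. The sketch below writes out both formulas and checks them against Python's standard `statistics` module, where `pstdev` uses N and `stdev` uses n-1; the data values are hypothetical.

```python
import math
import statistics

def population_sd(values):
    """Standard deviation with N in the denominator."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """Standard deviation with n - 1 in the denominator."""
    xbar = sum(values) / len(values)
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

print(population_sd(data))  # 2.0
print(sample_sd(data))      # about 2.138

# The standard library exposes both versions under different names
print(statistics.pstdev(data), statistics.stdev(data))
```

The sample version is always the larger of the two for the same data, and the gap shrinks as the number of data points increases.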

    The difference between long term and short term samples



    Notations

    Population and Sample


    Descriptive measures that describe a POPULATION are called PARAMETERS.
    Descriptive measures that describe a SAMPLE are called STATISTICS.

    Greek letters are typically used to denote PARAMETERS.




    Random Sampling Methods

    Simple Random Sampling

    Select this plan if every item in the population has an equal chance of being selected and there are no known subgroups within the population. The picture below assumes all the items (x's) are alike, so that any sample drawn from the entire population will represent, and behave similarly to, the rest of the population.

    • The population consists of "N" items.
    • The sample consists of "n" items.
    • All possible samples of size "n" have an equal chance of being selected.

    Statistical inference about the population is only valid when the sampling approach is random. One example of this method is picking names out of a hat. If all the names are in a hat on the exact same medium (none are heavier, bigger, etc.) and each name is entered the same number of times, then each name has the same chance of being selected. This is analogous to the lottery approach.
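The names-in-a-hat idea can be sketched in a few lines using Python's standard `random` module. The list of names and the helper `simple_random_sample` are hypothetical and only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n distinct items so that every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical population: N = 50 names, each entered exactly once
names = [f"name_{i}" for i in range(1, 51)]

picked = simple_random_sample(names, n=5, seed=1)
print(picked)
```

Fixing the `seed` makes the draw repeatable, which helps when the selection has to be documented; leaving it as `None` gives a different draw each run.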

    Random Sample



    Stratified Sampling

    Divide the population into subgroups (strata) of interest and sample either sequentially or randomly within each subgroup. This ensures there is representation from every stratum in the population.

    A subgroup may be data taken at a certain temperature range, on a specific shift, under a certain pressure, from different machine groups, at slower versus higher speeds, or under other differing conditions.
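A minimal sketch of stratified sampling, assuming the subgroup of interest is the production shift and that items are drawn randomly within each subgroup. The record layout, the shift names, and the function `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, n_per_stratum, seed=None):
    """Group records by key(record), then sample randomly within each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical data: 100 parts produced unevenly across three shifts
shifts = ["day"] * 60 + ["evening"] * 30 + ["night"] * 10
records = [{"part_id": i, "shift": s} for i, s in enumerate(shifts)]

sample = stratified_sample(records, key=lambda r: r["shift"], n_per_stratum=3, seed=0)
for shift, parts in sample.items():
    print(shift, [p["part_id"] for p in parts])
```

Taking the same number from each shift guarantees that the small night shift is represented; a simple random sample of 9 parts from all 100 could easily miss it.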




    Sequential Sampling

    Acquire data at specified intervals, such as every hour, every 5th form, or on a particular shift. Ensure the interval does not coincide with a pattern that biases the data toward a specific person, machine, or part each time a data point is collected.
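Interval-based collection, and the pattern risk it carries, can be sketched as follows. The helper `sequential_sample`, the form numbers, and the five-machine rotation are hypothetical examples.

```python
def sequential_sample(items, interval, start=0):
    """Take every `interval`-th item, beginning at index `start`."""
    return items[start::interval]

# Every 5th form out of 20 (forms numbered 1 to 20)
forms = list(range(1, 21))
print(sequential_sample(forms, interval=5, start=4))  # [5, 10, 15, 20]

# Pattern risk: parts come off five machines in strict rotation,
# so an interval of 5 always lands on the same machine.
machines = ["M1", "M2", "M3", "M4", "M5"]
parts = [(i, machines[i % 5]) for i in range(20)]

biased = sequential_sample(parts, interval=5)
mixed = sequential_sample(parts, interval=4)
print({m for _, m in biased})  # {'M1'}
print(sorted({m for _, m in mixed}))  # ['M1', 'M2', 'M3', 'M4', 'M5']
```

Choosing an interval that does not line up with the process cycle, as in the second case, spreads the data points across all the machines.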





    An appropriate and disciplined plan needs to be clearly understood by those collecting the data. Since the collection process can be expensive and time consuming, bias may be introduced by people making educated guesses, predicting data, or collecting only the data that is convenient and simple.

    There are also guidelines for the quantity of samples needed for various types of data. The more data you can obtain, the more likely it is to represent the performance of the entire population (the long-term performance of the process).

    When describing and presenting the data, inform the audience and record the method used to collect the data on the Data Collection Plan.



