Basic Concepts of Quantitative Research

Dr. R. Ouyang

 

Inferential Statistics

 

 Inferential statistics deal with, of all things, inferences: inferences about populations based on the behavior of samples.  Inferential statistics are concerned with determining how likely it is that results based on a sample or samples are the same results that would have been obtained for the entire population.

 

Concept of standard error

            If we randomly select a number of samples from the same population and compute the mean for each, it is likely that each mean will be somewhat different from each other mean, and that none of the means will be identical to the population mean.  This chance variation among the means is referred to as sampling error.  If a difference is found between sample means, the question of interest is whether the difference is a result of sampling error or a reflection of a true difference.

 

Sample size and standard error

            Sampling errors are interesting.  They are approximately normally distributed: most of the sample means will be very close to the population mean, and the number of means that differ considerably from the population mean decreases as the size of the difference increases.

            The standard deviation of the sample means (the standard deviation of sampling errors) is usually referred to as the standard error of the mean.  The standard error of the mean tells us by how much we would expect our sample means to differ if we used other samples from the same population.  According to normal curve percentages, we can say that approximately 68% of the sample means will fall between plus and minus one standard error of the mean, 95% will fall between plus and minus two standard errors, and 99+% will fall between plus and minus three standard errors.
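
            The normal curve percentages above can be checked with a small simulation.  The sketch below is only an illustration: the population (mean 100, standard deviation 15), the sample size of 25, and the number of samples are all made-up values, and the function names are hypothetical.

```python
import random
import statistics

def simulate_sample_means(pop_mean=100.0, pop_sd=15.0, n=25, n_samples=10_000, seed=0):
    """Draw n_samples random samples of size n and return the mean of each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.gauss(pop_mean, pop_sd) for _ in range(n)])
        for _ in range(n_samples)
    ]

means = simulate_sample_means()

# Standard deviation of the sample means = observed standard error of the mean.
se_observed = statistics.pstdev(means)
# Value expected from the population SD and the sample size: 15 / sqrt(25) = 3.
se_expected = 15.0 / 25 ** 0.5

def share_within(k):
    """Fraction of sample means within k standard errors of the population mean."""
    return sum(abs(m - 100.0) <= k * se_expected for m in means) / len(means)

print("observed SE:", round(se_observed, 2), "expected SE:", se_expected)
for k in (1, 2, 3):
    print(f"within {k} SE:", round(share_within(k), 3))
```

            With this many samples, the three printed fractions should land close to 68%, 95%, and 99+%, although the exact figures depend on the random seed.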

            If we know the standard deviation of the population, the standard error of the mean is equal to that standard deviation divided by the square root of the sample size.  When it has to be estimated from a sample, the formula is SE(mean) = SD / square root of (N-1), where SD is the sample standard deviation computed with N in the denominator.  Suppose a sample mean is 80 and the SE of the mean is 1.00.  If we say that the population mean falls between 79 and 81, we have approximately a 68% chance of being correct; if we say that it falls between 78 and 82, we have approximately a 95% chance of being correct; and if we say that it falls between 77 and 83, we have approximately a 99+% chance of being correct.  In other words, the probability of the population mean being less than 77 or larger than 83 is less than 1%.
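
            As a rough sketch of this arithmetic, the code below computes SE(mean) = SD / square root of (N-1) and the three ranges for a mean of 80 and an SE of 1.00.  It assumes that SD in that formula is computed with N in the denominator; under that assumption the result equals the more familiar "SD computed with N-1, divided by the square root of N".  The scores are made up and the function names are hypothetical.

```python
import math
import statistics

def standard_error(scores):
    """SE of the mean as SD / sqrt(N - 1), with SD computed using divisor N."""
    n = len(scores)
    return statistics.pstdev(scores) / math.sqrt(n - 1)

def interval(mean, se, k):
    """Range from k standard errors below the mean to k standard errors above it."""
    return (mean - k * se, mean + k * se)

scores = [72, 75, 78, 80, 81, 83, 85, 86]  # made-up scores for illustration

se_a = standard_error(scores)
# Same quantity written the other way: SD with divisor N - 1, divided by sqrt(N).
se_b = statistics.stdev(scores) / math.sqrt(len(scores))
print("SE two ways:", round(se_a, 4), round(se_b, 4))

# The example from the text: sample mean 80, SE of the mean 1.00.
for k, chance in ((1, "68%"), (2, "95%"), (3, "99+%")):
    print(chance, interval(80, 1.00, k))
```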

            It is obvious now that a smaller standard error indicates less sampling error.  The major factor affecting the standard error of the mean is sample size: as the size of the sample increases, the standard error of the mean decreases.  Another factor affecting the standard error of the mean is the size of the population standard deviation.  If the population standard deviation is large, members of the population are very spread out on the variable of interest, and the sample means will also be very spread out.
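
            Both factors can be seen in a few lines of arithmetic.  This sketch assumes the population standard deviation is known, so the standard error is simply SD divided by the square root of N; the particular values are made up.

```python
import math

def se_of_mean(sd, n):
    """Standard error of the mean for a known population SD and sample size n."""
    return sd / math.sqrt(n)

# Larger samples, same spread: the standard error shrinks.
for n in (25, 100, 400):
    print("SD = 10, N =", n, "-> SE =", se_of_mean(10.0, n))

# Same sample size, more spread in the population: the standard error grows.
for sd in (5.0, 10.0, 20.0):
    print("SD =", sd, ", N = 100 -> SE =", se_of_mean(sd, 100))
```

            Note that quadrupling the sample size only halves the standard error, because the sample size enters through its square root.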

            In a study that compares two sample means, we need an estimate of the standard error of the difference between two means in order to determine whether the difference between those means probably represents a true population difference.

 

Testing the null hypothesis

            When we talk about the difference between two sample means being a true difference, we mean that the difference was caused by the treatment and not by chance.  The chance explanation for the difference is called the null hypothesis.  The null hypothesis says in essence that there is no difference or relationship between parameters in the populations and that any difference or relationship found for the samples is the result of sampling error.  The research hypothesis usually states that one method is expected to be more effective than another.  Rejecting a null hypothesis is more conclusive support for a positive research hypothesis than simply collecting cases that agree with it.  Suppose one hypothesizes that all research textbooks contain a chapter on sampling.  If he or she examines a book and finds that it does contain the chapter, this does not prove the hypothesis, because it is only one book.  On the other hand, if he or she finds a single book that does not contain the chapter, that is enough to disprove the hypothesis.  The result of a study can reject the null hypothesis or not reject the null hypothesis.  If it is rejected, the null hypothesis is PROBABLY false; if it is not rejected, the null hypothesis is PROBABLY true.

 

Test of significance

              In order to test a null hypothesis we need a test of significance and we need to select a probability level that indicates how much risk we are willing to take that the decision we make is wrong.

            At the end of an experimental research study, if there is a difference between the group means, the researcher needs to decide whether the difference is significant, that is, large enough to conclude that it represents a true difference.

            The test of significance is made at a pre-selected probability level and allows the researcher to state whether he or she has rejected the null hypothesis.  The level is usually set at 0.05 or 0.01.  That means that, if the null hypothesis is true, a difference as large as the one observed would be found by chance only 5% or 1% of the time.  There are a number of different tests of significance that can be applied in research studies, such as the t test, analysis of variance, and chi square.

 

Type I and type II errors

            Based on a test of significance, the researcher will either reject or not reject the null hypothesis as a probable explanation for results.  There are four possible situations:

1) The null hypothesis is true (A=B), and the researcher concludes that it is true, no difference between A and B.

 

2) The null hypothesis is false (A not = B), and the researcher concludes that it is false, difference exists between A and B.

 

3) The null hypothesis is true (A=B), and the researcher concludes that it is false, difference exists between A and B.

 

4) The null hypothesis is false (A not = B), and the researcher concludes that it is true, no difference between A and B.

 

            In cases 1 and 2, the researcher is making a correct conclusion.  In cases 3 and 4, however, the researcher is making a wrong conclusion, that is, making an error.  In case 3, the researcher rejects a null hypothesis that is really true, and is making a Type I error.  In case 4, the researcher fails to reject a null hypothesis that is really false, and is making a Type II error.

            The researcher makes the decision to reject or not reject the null hypothesis with a given probability of being wrong.  The probability of rejecting a null hypothesis that is really true is referred to as the significance level or probability level of the test of significance, usually .05 or .01.  If .05 is set, the researcher will have a 5% probability of making a Type I error, whereas if .01 is selected, the researcher will have only a 1% probability of committing a Type I error.  In other words, when the null hypothesis is true, working at .05 the researcher has a 95% chance of making the correct decision; working at .01 the researcher has a 99% chance of making the correct decision.
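
            The link between the probability level and the Type I error rate can be illustrated by simulation.  The sketch below runs many imaginary studies in which the null hypothesis really is true (both groups come from the same population) and counts how often it is rejected anyway.  To keep it short it assumes the population standard deviation is known and uses a z test rather than a t test; the group size, number of studies, and function name are all made up for illustration.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, n_studies=4000, seed=1):
    """Share of studies that reject a true null hypothesis at the given level."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    se_diff = (1.0 / n + 1.0 / n) ** 0.5          # SE of the difference, population SD = 1
    rejections = 0
    for _ in range(n_studies):
        # Both groups are drawn from the same population: A = B is true.
        mean_a = fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
        mean_b = fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
        if abs(mean_a - mean_b) / se_diff >= z_crit:
            rejections += 1  # a Type I error
    return rejections / n_studies

rate_05 = type_i_error_rate(alpha=0.05)
rate_01 = type_i_error_rate(alpha=0.01)
print("Type I error rate at .05:", rate_05)
print("Type I error rate at .01:", rate_01)
```

            The two rates should come out near .05 and .01, with some wobble from the random draws.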

 

Concept of significance level

            The question follows: .01 or .05, which level is better, and which should we select for our research?  Because .01 is smaller than .05, selecting .01 definitely decreases our chances of committing a Type I error; but it increases the probability of committing a Type II error.

            A common misconception among beginning researchers is the notion that if you reject a null hypothesis you have "proven" your research hypothesis.  In fact, rejection of a null hypothesis, or lack of rejection, only supports or does not support a research hypothesis.  If the conclusion is that a significant difference exists between the variables, it does not follow that the difference occurred for the reason you hypothesized in your research.  On the other hand, if the conclusion is not to reject the null hypothesis, that does not mean the research hypothesis is wrong.

 

One-tailed and two-tailed tests

            A two-tailed test allows for the possibility that a difference may occur in either direction; either group mean may be higher than the other (A>B or B>A).  A one-tailed test assumes that a difference can only occur in one direction; the null hypothesis states that one group is not better than another (A not > B).

            Tests of significance are almost always two-tailed.  To select a one-tailed test of significance, the researcher has to be pretty sure that a difference can only occur in one direction.  Of course, that is not very often the case.

            Suppose we are computing significance with a t test at the .05 level.  The two-tailed test allows for the possibility of a positive t and a negative t, so the .05 is split between the two tails (.025 + .025 = .05).
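
            The effect of splitting the probability level between two tails can be seen in the critical values.  The sketch below uses the normal curve as a stand-in for the t distribution, which is an assumption that is only reasonable for large samples; for small samples the values in a t table are somewhat larger.  The function name is hypothetical.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Critical z value for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test puts alpha / 2 in each tail; a one-tailed test puts
    # all of alpha in a single tail.
    return NormalDist().inv_cdf(1 - alpha / tails)

two_tailed = critical_z(0.05, 2)
one_tailed = critical_z(0.05, 1)
print("two-tailed critical value:", round(two_tailed, 3))
print("one-tailed critical value:", round(one_tailed, 3))
```

            The one-tailed critical value is the smaller of the two, which is why a difference in the predicted direction is easier to declare significant with a one-tailed test.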

 

Degrees of freedom

            Degrees of freedom (df) are a function of such factors as the number of subjects and the number of groups.  Suppose I ask you to name any 5 numbers, say "32, 45, 65, 67, 78"; you have five choices, or 5 degrees of freedom.  Now name 5 more numbers, starting "1, 2, 3, 4...", but this time I want the mean of the five numbers to equal 4; the last number must then be 10.  You lost 1 choice, or 1 degree of freedom, in this case, because you had one restriction: the mean must be 4.  For Pearson r, the degrees of freedom are always N-2.
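
            The "lost choice" in this example is just arithmetic: once the mean is fixed, the last number is determined by the others.  A minimal sketch, with a hypothetical function name:

```python
def last_value_for_mean(free_values, target_mean):
    """Return the one remaining value that forces the overall mean to target_mean."""
    n = len(free_values) + 1                # total count, including the forced value
    return target_mean * n - sum(free_values)

# Four numbers are chosen freely; the fifth is dictated by the required mean of 4.
print(last_value_for_mean([1, 2, 3, 4], 4))  # -> 10
```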

 

Parametric and non-parametric tests

            Parametric tests are usually more powerful and generally to be preferred.  However, parametric tests require that certain assumptions be met in order for them to be valid.  Three common assumptions are: normal distribution of data, interval or ratio data, and randomization of sampling.

            If the distribution is extremely skewed, nonparametric tests should be used.  Nonparametric tests make no assumptions about the shape of the distribution.  The advantages of using parametric tests are: 1) they are more powerful, and 2) they can test some hypotheses that cannot be tested with nonparametric tests.

 

t test

            The t test is used to determine whether two means are significantly different at a selected probability level.  The t test involves forming the ratio of the observed difference between the means to the difference expected by chance.  The numerator for a t test is the difference between the sample means X1 and X2, and the denominator is the chance difference that would be expected if the null hypothesis were true, that is, the standard error of the difference between the means.

            The t ratio determines whether the observed difference is sufficiently larger than a difference that would be expected by chance.  The t value from calculation will be compared with the appropriate t table value (depending upon the probability level and the degrees of freedom).  If the calculated t value (in absolute value, for a two-tailed test) is equal to or larger than the table value, then the null hypothesis is rejected.

Independent and non-independent samples

            Independent samples are samples that are randomly formed, that is, formed without any type of matching.

            Nonindependent samples are samples formed by some type of matching.

t test for independent samples

            The t test for independent samples is used to determine whether there is probably a significant difference between the means of two independent samples.

            \[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
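
            A minimal sketch of the formula above, taking SS1 and SS2 to be each group's sum of squared deviations from its own mean and n1 and n2 to be the group sizes.  The two groups of scores are made up for illustration and the function names are hypothetical.

```python
import math
from statistics import fmean

def sum_of_squares(scores):
    """Sum of squared deviations of the scores from their own mean (SS)."""
    m = fmean(scores)
    return sum((x - m) ** 2 for x in scores)

def t_independent(group1, group2):
    """Return (t, df) for two independent samples using the pooled formula."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_variance = (sum_of_squares(group1) + sum_of_squares(group2)) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (fmean(group1) - fmean(group2)) / se_diff, df

group1 = [3, 4, 5, 6, 7]  # made-up scores, mean 5
group2 = [1, 2, 3, 4, 5]  # made-up scores, mean 3
t_value, df = t_independent(group1, group2)
print("t =", round(t_value, 3), "df =", df)
```

            For these scores t works out to 2.0 with 8 degrees of freedom.  The two-tailed table value at the .05 level for 8 degrees of freedom is 2.306, so the null hypothesis would not be rejected.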

 


t test for nonindependent samples

            The t test for nonindependent samples is used to determine whether there is probably a significant difference between the means of two matched or nonindependent samples or between the means for one sample at two different times.

            When samples are nonindependent, the error term of the t test tends to be smaller and therefore there is a higher probability that the null hypothesis will be rejected.

            \[ t = \frac{\bar{D}}{\sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{N}}{N(N-1)}}} \]
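
            A minimal sketch of this formula, reading D as the difference between the two scores in each matched pair, the numerator as the mean of those differences, and N as the number of pairs.  The pretest and posttest scores are made up and the function name is hypothetical.

```python
import math

def t_nonindependent(first, second):
    """Return (t, df) for matched pairs; D is first minus second for each pair."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sum_d = sum(diffs)                       # sum of D
    sum_d_sq = sum(d * d for d in diffs)     # sum of D squared
    mean_d = sum_d / n
    se = math.sqrt((sum_d_sq - sum_d ** 2 / n) / (n * (n - 1)))
    return mean_d / se, n - 1

posttest = [12, 15, 11, 14, 13]  # made-up scores for five subjects
pretest = [10, 12, 10, 11, 12]
t_value, df = t_nonindependent(posttest, pretest)
print("t =", round(t_value, 3), "df =", df)
```

            Here the differences are 2, 3, 1, 3, and 1, so the mean difference is 2 and the degrees of freedom are the number of pairs minus one.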


 

Major problem associated with analyzing gain scores

            Gain scores are the differences between pretest and posttest scores.  The major problem associated with calculating a t value for the difference in gain scores is the lack of equal opportunity to grow: a subject who scores high on the pretest has less room to gain than a subject who scores low.

            If both groups are essentially the same on the pretest, a t test can be applied directly to the posttest scores; if, on the other hand, there is a difference between the two groups on the pretest, the preferred posttest analysis is analysis of covariance.

Simple analysis of variance

            Simple analysis of variance is also called one-way analysis of variance (ANOVA).  ANOVA is used to determine whether there is a significant difference between two or more means at a selected probability level.  An F ratio is computed in the analysis: the ratio of the variance between groups to the variance within groups.  The greater the difference between the group means, the larger the F ratio will be.  To determine whether or not the F ratio is significant, an F table is entered at the place corresponding to the selected probability level and the appropriate degrees of freedom.

            Total sum of squares = between sum of squares + within sum of squares 


                        SS (total) = SS (between) + SS (within)
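
            The partition of the total sum of squares, and the F ratio built from it, can be sketched as follows.  The three groups of scores are made up, the function name is hypothetical, and the F ratio is formed in the usual way as the between-groups mean square divided by the within-groups mean square.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, and F ratio for the groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Between: how far each group mean is from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within: how far each score is from its own group mean.
    ss_within = sum(sum((x - fmean(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        "F": f_ratio,
    }

result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # made-up scores
print(result)
```

            For these scores the between and within sums of squares are 26 and 6, which add up to the total of 32, and the F ratio is 13 with 2 and 6 degrees of freedom.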


           

Multiple comparison of variance

            Based on ANOVA, if the F ratio is determined to be nonsignificant, the party is over.  But if it is significant, and more than two means are involved, multiple comparison procedures are used to determine which means are significantly different from which other means.  The Scheffe test is one of the multiple comparison techniques and is appropriate for making any and all possible comparisons involving a set of means.  The Scheffe test is very conservative: it is unlikely to commit a Type I error, and for that reason it may find no significant difference between any of the given means even though the F for the analysis of variance was significant.

 

            \[ F = \frac{(\bar{X}_1 - \bar{X}_2)^2}{MS_{within}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)(K-1)} \quad \text{with } df = (K-1),\ (N-K) \]
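
            A minimal sketch of this comparison, reading the formula with MS (within), the bracketed sample-size term, and (K-1) all in the denominator, where K is the number of groups.  The means, group sizes, and MS (within) below are made-up values and the function name is hypothetical.

```python
def scheffe_f(mean1, mean2, n1, n2, ms_within, k):
    """Scheffe F for comparing two of k group means."""
    return (mean1 - mean2) ** 2 / (ms_within * (1 / n1 + 1 / n2) * (k - 1))

# Three groups of 3 subjects each, with means 2, 3, and 6 and MS (within) = 1.
f_far = scheffe_f(2, 6, 3, 3, ms_within=1.0, k=3)    # means far apart
f_near = scheffe_f(2, 3, 3, 3, ms_within=1.0, k=3)   # means close together
print("F for means 2 and 6:", round(f_far, 3))
print("F for means 2 and 3:", round(f_near, 3))
```

            Each value is then compared with the F table value for (K-1) and (N-K) degrees of freedom, here 2 and 6.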


 


Analysis of covariance

            Analysis of covariance (ANCOVA) essentially 1) adjusts posttest scores for initial differences on some variable and compares adjusted scores, 2) increases the power of a statistical test by reducing within-group (error) variance.  Although increasing sample size also increases power, the researcher is often limited to samples of a given size because of financial and practical reasons.

 

            ANCOVA is a control technique used both in causal-comparative studies, in which already formed, not necessarily equal groups are involved, and in experimental studies, in which either existing groups or randomly formed groups are involved.

            ANCOVA is a quite complex, lengthy procedure that is hardly ever calculated by hand.  It is usually calculated with computer programs, for the sake of accuracy and sanity.

  ______________________________________________________________________________________

Reference:

Gay, L. R. (1996). Educational research: Competencies for analysis and application.  Upper Saddle River, NJ: Merrill.

 
