Basic Concepts of Quantitative Research

Dr. R. Ouyang

 

Descriptive Statistics

 

Descriptive statistics include the measures of central tendency, measures of variability, measures of relative position, and measures of relationship.

 

Steps involved in constructing a frequency polygon

             The most common method of graphing research data is to construct a frequency polygon.  The steps involved in constructing a frequency polygon are: 1) list all scores and tabulate how many subjects received each score, 2) place all the scores on a horizontal axis, at equal intervals from the lowest score to the highest, 3) place the frequencies of scores at equal intervals on the vertical axis, starting with zero, 4) for each score, find the point where the score intersects with its frequency of occurrence and make a dot, 5) connect all the dots with straight lines.
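
            The five steps above can be sketched in a few lines of Python.  This is only an illustration: the scores are hypothetical (they do not come from the text), and the sketch assumes the scores fall on multiples of 5 so that the horizontal axis has equal intervals.  It prints a rough text version of the points rather than drawing the polygon.

```python
from collections import Counter

# Hypothetical scores for 12 subjects (illustration only, not from the text).
scores = [85, 90, 85, 70, 75, 85, 90, 80, 75, 85, 80, 90]

# Step 1: tabulate how many subjects received each score.
frequencies = Counter(scores)

# Step 2: horizontal axis, equal intervals from the lowest score to the highest.
# Assumption: scores are multiples of 5, so the interval width is 5.
step = 5
x_axis = list(range(min(scores), max(scores) + step, step))

# Steps 3-4: pair each score with its frequency (0 if nobody got that score).
points = [(score, frequencies.get(score, 0)) for score in x_axis]

# Step 5: joining consecutive points with straight lines gives the polygon.
# Here each point is just printed as a row of asterisks.
for score, freq in points:
    print(f"{score:>3} | {'*' * freq} ({freq})")
```

            Passing the points to any charting tool as a line chart would draw the polygon itself.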

 

            In fact, with the development of computer technology, several software packages can now generate a frequency polygon easily, such as the spreadsheet module in ClarisWorks and Excel in Microsoft Office.

 Three measures of central tendency

            Measures of central tendency give the researcher a convenient way of describing a set of data with a single number.  The three most frequently encountered indices of central tendency are the mode, the median, and the mean.  The mode is appropriate for nominal data, the median for ordinal data, and the mean for interval or ratio data.

            Mode:  The mode is the score that is attained by more subjects than any other score.  The mode is not established through calculation; it is determined by looking at a set of scores or at a graph of scores and seeing which score occurs most frequently.

            Median:  The median is that point in a distribution above and below which are 50% of the scores; in other words, the median is the midpoint.  If there is an odd number of scores, the median is the middle score.  In the set of scores 75, 80, 82, 83, 87, the median is 82, the middle score.  If there is an even number of scores, the median is the point halfway between the two middle scores.  In the set of scores 21, 23, 24, 25, 26, and 30, the median is 24.5; for the scores 50, 52, 55, 57, 59, and 61, the median is 56.  The median is only the midpoint of the scores and does not take into account each and every score; it ignores extremely high scores and extremely low scores.

            Mean: The mean is the arithmetic average of the scores and is the most frequently used measure of central tendency.  It is calculated by adding up all of the scores and dividing that total by the number of scores.  In general, the mean is the preferred measure of central tendency.  It is appropriate when the data represent either an interval or a ratio scale and is a more precise, stable index than both the median and the mode.
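
            As a small illustration, the three indices can be computed with Python's standard statistics module.  The first two score sets are the ones used in the median paragraph above; the set used for the mode and the extreme score of 187 are hypothetical values added only for this sketch.

```python
import statistics

odd_scores = [75, 80, 82, 83, 87]        # from the text: odd number of scores
even_scores = [21, 23, 24, 25, 26, 30]   # from the text: even number of scores
mode_scores = [3, 5, 5, 7, 9]            # hypothetical set with a single mode

mode_value = statistics.mode(mode_scores)      # most frequent score
median_odd = statistics.median(odd_scores)     # middle score
median_even = statistics.median(even_scores)   # halfway between 24 and 25
mean_odd = statistics.mean(odd_scores)         # sum of scores / number of scores

# Hypothetical extreme score: the median ignores it, the mean does not.
with_extreme = [75, 80, 82, 83, 187]
median_extreme = statistics.median(with_extreme)
mean_extreme = statistics.mean(with_extreme)

print("mode:", mode_value)
print("median (odd N):", median_odd)
print("median (even N):", median_even)
print("mean:", mean_odd)
print("with extreme score -> median:", median_extreme, "mean:", mean_extreme)
```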

Three measures of variability

            Although measures of central tendency are very useful statistics for describing a set of data, they are not sufficient.  Consider two sets of scores:

            Set A:  79, 79, 79, 80, 81, 81, 81

            Set B:  50, 60, 70, 80, 90, 100, 110

            The mean of both sets of scores is 80 and the median of both is 80, but set A is very different from set B.  In set B, there is much more variation, or variability.  The three most frequently encountered measures of how much variability a set of scores has are the range, the quartile deviation, and the standard deviation.  The range is the only appropriate measure of variability for nominal data; the quartile deviation is the appropriate index of variability for ordinal data.  The standard deviation is often used to measure the variability of interval or ratio data.

            Range:  The range is simply the difference between the highest score and the lowest score in a distribution and is determined by subtraction.  For set A mentioned above, the range is 81 - 79 = 2; for set B, the range is 110 - 50 = 60.
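
            The comparison of set A and set B can be checked directly.  The sketch below uses the two sets from the text; the helper name score_range is made up for this illustration.

```python
import statistics

set_a = [79, 79, 79, 80, 81, 81, 81]
set_b = [50, 60, 70, 80, 90, 100, 110]


def score_range(scores):
    """Range = highest score minus lowest score (hypothetical helper)."""
    return max(scores) - min(scores)


for name, scores in (("A", set_a), ("B", set_b)):
    print(
        f"Set {name}: mean={statistics.mean(scores)}, "
        f"median={statistics.median(scores)}, range={score_range(scores)}"
    )
```

            Both sets share the same mean and median, and only the range separates them.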

            Quartile deviation:  In "research talk", the quartile deviation is one-half of the difference between the upper quartile and the lower quartile in a distribution.  In plain English, the upper quartile is the 75th percentile, the point below which 75% of the scores fall, and the lower quartile is the 25th percentile, the point below which 25% of the scores fall.  By subtracting the lower quartile from the upper quartile and then dividing the result by two, we get a measure of variability.  If the quartile deviation is small the scores are close together, whereas if the quartile deviation is large the scores are more spread out.  The quartile deviation is a more stable measure of variability than the range and is appropriate whenever the median is appropriate.
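
            A minimal sketch of the quartile deviation for set A and set B follows.  It relies on statistics.quantiles from the Python standard library with method="inclusive"; that choice is an assumption of this sketch, since textbooks and software use several slightly different conventions for locating quartiles and may give somewhat different values for small sets.

```python
import statistics

set_a = [79, 79, 79, 80, 81, 81, 81]
set_b = [50, 60, 70, 80, 90, 100, 110]


def quartile_deviation(scores):
    """Half the distance between the upper and lower quartiles.

    Assumption: quartiles are located with the 'inclusive' interpolation
    method; other conventions can give slightly different results.
    """
    lower_q, _median, upper_q = statistics.quantiles(scores, n=4, method="inclusive")
    return (upper_q - lower_q) / 2


qd_a = quartile_deviation(set_a)
qd_b = quartile_deviation(set_b)
print("Quartile deviation, set A:", qd_a)
print("Quartile deviation, set B:", qd_b)
```

            Under this convention the more spread-out set B has a much larger quartile deviation than set A.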

            Standard Deviation:  The standard deviation is the square root of the variance, which is based on the distance of each score from the mean.  It is appropriate when the data represent an interval or ratio scale.  It is the most stable measure of variability and takes into account each and every score.  In essence, the standard deviation summarizes how far each score is from the mean, that is, the result of subtracting the mean from each score.  Steps for calculating the standard deviation from raw scores are:  1) find N, the number of scores, 2) calculate the sum of the scores, 3) square each score and add all the squares to get the sum of the squared scores, 4) square the sum of the scores and divide by N, 5) subtract the result of step 4 from the sum of the squared scores to get the sum of squares (SS), 6) divide the SS by N - 1 to get the variance, 7) take the square root of the variance to get the standard deviation.  A small standard deviation indicates the scores are close together and a large standard deviation indicates that the scores are more spread out.
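
            The raw-score computation of the standard deviation can be sketched as follows for set A and set B.  The function and variable names are chosen for this illustration only, and the sketch divides the sum of squares by N - 1, as the text does.

```python
import math

set_a = [79, 79, 79, 80, 81, 81, 81]
set_b = [50, 60, 70, 80, 90, 100, 110]


def standard_deviation(scores):
    """Raw-score computation of the standard deviation (divides SS by N - 1)."""
    n = len(scores)                               # number of scores
    sum_x = sum(scores)                           # sum of the scores
    sum_x_squared = sum(x ** 2 for x in scores)   # sum of the squared scores
    correction = sum_x ** 2 / n                   # (sum of scores) squared / N
    ss = sum_x_squared - correction               # sum of squares (SS)
    variance = ss / (n - 1)                       # variance
    return math.sqrt(variance)                    # standard deviation


sd_a = standard_deviation(set_a)
sd_b = standard_deviation(set_b)
print(f"SD of set A: {sd_a:.2f}")
print(f"SD of set B: {sd_b:.2f}")
```

            The two sets have the same mean, yet set B's standard deviation is more than twenty times that of set A.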

 

 


Normal distribution

            If a variable is normally distributed, that is, if its scores form a normal curve, the following will be true:

            1) Fifty percent of the scores are above the mean and 50% are below the mean,

            2) The mean, the median, and the mode are the same,

            3) Most scores are near the mean.  The farther from the mean a score is, the fewer the number of the subjects who attained that score,

            4) The same number, or percentage, of scores is between the mean and plus one standard deviation (X + 1 SD) as is between the mean and minus one standard deviation (X - 1 SD), and similarly for X + or - 2 SD, and X + or - 3 SD.
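
            Properties 1 and 4 can be checked numerically with statistics.NormalDist from the Python standard library.  The sketch uses a standard normal curve (mean 0, SD 1) as an assumption; the symmetry holds for any mean and standard deviation.

```python
from statistics import NormalDist

# Assumption: a standard normal curve (mean 0, SD 1) stands in for any
# normally distributed variable.
curve = NormalDist(mu=0, sigma=1)

# Property 1: half of the scores fall below the mean.
below_mean = curve.cdf(0)

# Property 4: equal proportions on each side of the mean.
mean_to_plus_1sd = curve.cdf(1) - curve.cdf(0)
mean_to_minus_1sd = curve.cdf(0) - curve.cdf(-1)

# Proportion of scores within 1, 2, and 3 SD of the mean.
within = {k: curve.cdf(k) - curve.cdf(-k) for k in (1, 2, 3)}

print(f"below the mean: {below_mean:.2%}")
print(f"mean to +1 SD: {mean_to_plus_1sd:.2%}, mean to -1 SD: {mean_to_minus_1sd:.2%}")
for k, share in within.items():
    print(f"within {k} SD of the mean: {share:.2%}")
```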

            When a distribution is not normal, it is said to be skewed.  If the extreme scores are at the lower end of the distribution, the distribution is said to be negatively skewed; if the extreme scores are at the upper, or higher, end of the distribution, the distribution is said to be positively skewed.   

Two measures of relationship

            The two most frequently used correlational analyses are the rank difference correlation coefficient, usually referred to as the Spearman rho, and the product moment correlation coefficient, usually referred to as the Pearson r.

            The Spearman rho:  The Spearman rho is appropriate when the data represent an ordinal scale (although it may be used with interval data).  It is the measure of relationship to use when the median and quartile deviation are the appropriate measures of central tendency and variability.

            The Pearson r:  The Pearson r is the most appropriate measure of correlation when the sets of data to be correlated represent either interval or ratio scales.  The relationship is expressed by a correlation coefficient, which is a number between -1.00 and +1.00.  The sign shows the direction of the relationship, and a coefficient near .00 indicates little or no relationship.
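
            Both coefficients can be sketched in plain Python.  The reading and math scores below are hypothetical, the function names are made up for this illustration, and the Spearman rho uses the rank difference formula 1 - 6 * (sum of d squared) / (N * (N squared - 1)), which assumes there are no tied scores.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Rank 1 = lowest score. Assumption: no tied scores."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Rank difference correlation coefficient (no-ties formula)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical scores for five students (illustration only).
reading = [40, 55, 62, 70, 85]
math_scores = [35, 50, 45, 65, 80]

r = pearson_r(reading, math_scores)
rho = spearman_rho(reading, math_scores)
inverse = pearson_r([1, 2, 3], [3, 2, 1])  # perfectly inverse relationship

print(f"Pearson r: {r:.2f}")
print(f"Spearman rho: {rho:.2f}")
print(f"Pearson r for a perfectly inverse relationship: {inverse:.2f}")
```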

 


 


Four measures of relative position

            Measures of relative position indicate where a score is in relation to all other scores in the distribution.  They permit us to tell how well an individual has performed as compared to all other individuals.  If a student scores 40 in reading and 35 in math, it does not necessarily mean he did better in reading; 40 may be the lowest score on the reading test, and 35 may be the highest score on the math test.

            The two most frequently used types of measures of relative position are percentile ranks and standard scores (z scores, T scores, and stanines).

            Percentile Ranks:  A percentile rank indicates the percentage of scores that fall at or below a given score.  If a score of 65 corresponds to a percentile rank of 80, the 80th percentile, this means that 80% of the scores in the distribution are at or below 65.
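
            A percentile rank under the "at or below" definition can be sketched as follows.  The ten scores are hypothetical, chosen so that a score of 65 lands at the 80th percentile as in the example above; the function name is made up for this illustration.

```python
def percentile_rank(scores, score):
    """Percentage of scores that fall at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


# Hypothetical distribution of ten scores (illustration only).
scores = [40, 45, 50, 52, 55, 58, 60, 65, 70, 75]

rank_of_65 = percentile_rank(scores, 65)
print(f"A score of 65 has a percentile rank of {rank_of_65:.0f}")
```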

            A standard score is a derived score that expresses how far a given raw score is from some reference point, typically the mean, in terms of standard deviation units.  The most commonly reported and used standard scores are z scores, T scores, and stanines.

            z scores:  A z score is the most basic standard score; it expresses how far a score is from the mean in terms of standard deviation units.  If a score is exactly on the mean, its z score is 0; if the score is exactly 1 standard deviation above the mean, its z score is +1; if the score is exactly 2 standard deviations below the mean, its z score is -2.

 

 

                        Raw score    Mean    SD        z    Percentile
            Reading            50      60    10    -1.00          16th
            Math               40      30    10    +1.00          84th

           

            z = (X - mean) / SD, where X is the raw score
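
            The reading and math rows of the table can be reproduced with a short sketch.  The z scores follow directly from the formula; the percentile column is recovered here under the assumption that the scores are normally distributed, using statistics.NormalDist from the Python standard library.  The function name is made up for this illustration.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd


# (raw score, mean, SD) for each test, taken from the table in the text.
tests = {"Reading": (50, 60, 10), "Math": (40, 30, 10)}

results = {}
for name, (raw, mean, sd) in tests.items():
    z = z_score(raw, mean, sd)
    # Assumption: scores are normally distributed, so the percentile is the
    # area under the normal curve below z.
    percentile = round(100 * NormalDist().cdf(z))
    results[name] = (z, percentile)
    print(f"{name}: z = {z:+.2f}, percentile = {percentile}")
```

            Although the raw reading score is higher, the math score is the better performance relative to the other test takers.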

            T scores:  A T score is nothing more than a z score expressed in a different form:  T = 10 * z + 50.

            Stanines:  Stanines are standard scores that divide a distribution into nine parts.  Stanine stands for "standard nine."  Stanine equivalencies are derived using the formula 2 * z + 5 and rounding the resulting values to the nearest whole number.  Like percentiles, stanines are very frequently reported in norms tables for standardized tests.  They may be used as a criterion for selecting students for special programs.
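
            The two conversions can be sketched as below, using the z scores of -1.00 and +1.00 from the reading and math example.  Two details are assumptions of this sketch rather than statements from the text: halves are rounded upward (Python's built-in round would round some halves downward), and results are limited to the range 1 to 9 so that very extreme z scores still map to a valid stanine.

```python
import math


def t_score(z):
    """T = 10 * z + 50."""
    return 10 * z + 50


def stanine(z):
    """2 * z + 5, rounded to the nearest whole number.

    Assumptions of this sketch: halves round upward, and the result is
    limited to the range 1..9.
    """
    rounded = math.floor(2 * z + 5 + 0.5)
    return max(1, min(9, rounded))


for label, z in (("Reading", -1.0), ("Math", 1.0)):
    print(f"{label}: z = {z:+.2f}, T = {t_score(z):.0f}, stanine = {stanine(z)}")
```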

________________________________________________________________________________________

Reference:

Gay, L. R. (1996). Educational research: Competencies for analysis and application.  Upper Saddle River, NJ: Merrill.

 
