DATA ANALYSIS – GENERATING DESCRIPTIVE STATISTICS
1. Checking the accuracy of data entry
The very first thing you should check is that you have entered your data correctly, as no
matter how careful you are, mistakes can be made. This ‘cleaning’ of the data also starts
the process of examining the data. One of the most common errors is entering the wrong
code for a particular response, for instance ‘11’ instead of ‘1’ (i.e. the value entered doesn’t
match a code you’ve set up), and the quickest way of checking for this is to look at the
frequency distribution (or summary of response) for each variable.
In SPSS, descriptive statistics, such as frequency distribution information, are accessed
from the ‘analyse’ drop-down menu. There are various descriptive statistics options
but frequency distribution tables are generated using ‘frequencies’. When you select this
option, SPSS produces a dialogue box asking you which variables in your database you want
frequency information about. You need to select the variable(s) of interest and ask for the
relevant information. SPSS then generates the frequency distribution table(s) requested
in an SPSS output file that appears on the screen. For each variable
requested, you will be able to see the different response categories listed in the first column
of the table and the number of responses for that category in the second column. A quick
glance down the first column will reveal whether there are any illegal responses. Tables
12.2 and 12.3 show frequency distributions for two of my variables, ‘gender’ and ‘task
motivation (science)’ scale scores.
Table 12.2 Frequency distribution for gender

                    Frequency   Per cent   Valid per cent   Cumulative per cent
Valid     Boys            710       41.2             43.0                  43.0
          Girls           942       54.7             57.0                 100.0
          Total          1652       95.9            100.0
Missing   9.00             71        4.1
Total                    1723      100.0

If you don’t have any illegal responses, as here, you can move on to other descriptive
analyses. If you want to save the frequency distribution data (which you might if you
planned to copy and paste the table into a report), you can save the output file in the
same way you would save any other file. If you have some rogue responses, you will
need to go back to your data to find them. This can be done using ‘find’ in ‘data view’
(much as you would find and replace in Word or other packages). By looking at the
identifier for the case identified, you can go back to the original data (for instance, the
original questionnaire) to amend the response.
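
The same check can be sketched outside SPSS. The short Python example below is only an illustration: the case identifiers and entries are invented, and the codebook (1 and 2 for boys and girls, 9 for missing) is an assumption based on the values shown in Table 12.2. It counts how often each code occurs and then lists the cases whose code is not in the codebook, which are the ones to look up in the original questionnaires.

```python
from collections import Counter

# Assumed codebook for 'gender': 1 and 2 are taken to mean boy and girl,
# and 9 is the missing-value code shown in Table 12.2.
LEGAL_GENDER_CODES = {1, 2, 9}

# Invented data: (case identifier, entered code). Case 104 contains the
# kind of keying slip described above ('11' instead of '1').
entries = [(101, 1), (102, 2), (103, 2), (104, 11), (105, 9), (106, 1)]


def frequency_table(rows):
    """Count how often each code occurs, like a frequency distribution."""
    return Counter(code for _, code in rows)


def rogue_cases(rows, legal_codes):
    """Return identifiers of cases whose code is not in the codebook."""
    return [case_id for case_id, code in rows if code not in legal_codes]


freq = frequency_table(entries)
for code in sorted(freq):
    flag = "" if code in LEGAL_GENDER_CODES else "  <-- illegal code"
    print(f"{code:>4} {freq[code]:>4}{flag}")

print("Cases to check against the questionnaires:",
      rogue_cases(entries, LEGAL_GENDER_CODES))
```

Keeping the case identifier next to each entry is what makes it possible to trace a rogue value back to its source.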
2. Producing variable summaries
While the frequency distribution provides information about all the responses for a
particular variable, quite often we want to give a more succinct summary, especially for
variables measured at an interval level. For instance, knowing that the task motivation
(science) scale scores in my study varied from 8 to 30 does not really help paint a picture
of the overall response to this scale. If I told you a particular student scored 18, you
wouldn’t know without looking carefully at Table 12.3 whether this seemed typical or
not. We really need to know two things to make judgements about individual scores.
Firstly: what is a typical response? For this, we would normally calculate the mean score
(obtained by adding up every individual score and dividing the sum by the number of
scores included). Secondly: how much variation is there in the responses? If there is a
small amount of variation, then most people are recording similar responses. A larger
variation implies that scores that differ markedly from the average (lower or higher)
are not that unusual. The commonly used measure for this is the standard deviation,
which is a measure of how far scores typically lie from the mean score.
Table 12.3 Frequency distribution for task motivation (science)

                      Frequency   Per cent   Valid per cent   Cumulative per cent
Valid     8.00                1         .1               .1                    .1
          9.00                1         .1               .1                    .1
          10.00               3         .2               .2                    .4
          11.00               1         .1               .1                    .4
          13.00               4         .2               .3                    .7
          14.00               6         .3               .4                   1.1
          15.00               9         .5               .6                   1.8
          16.00               8         .5               .6                   2.3
          17.00              14         .8              1.0                   3.3
          18.00              31        1.8              2.2                   5.5
          19.00              32        1.9              2.2                   7.7
          20.00              52        3.0              3.6                  11.3
          21.00              66        3.8              4.6                  16.0
          22.00              88        5.1              6.2                  22.1
          23.00             111        6.4              7.8                  29.9
          24.00             131        7.6              9.2                  39.1
          25.00             144        8.4             10.1                  49.2
          26.00             167        9.7             11.7                  60.9
          27.00             159        9.2             11.1                  72.0
          28.00             167        9.7             11.7                  83.7
          29.00             126        7.3              8.8                  92.5
          30.00             107        6.2              7.5                 100.0
          Total            1428       82.9            100.0
Missing   System¹           295       17.1
Total                      1723      100.0

Note: ¹ Missing system responses appear because students had not supplied answers for one or more of the
questionnaire items that make up the task motivation (science) scale.
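
As a rough check on the two summary measures just described, the mean and standard deviation can be recomputed directly from the frequencies in Table 12.3. The Python sketch below is illustrative only; the function names are my own, and dividing by n - 1 (the sample standard deviation) is an assumption about how the published figure was calculated.

```python
import math

# Valid responses from Table 12.3: scale score -> number of students.
FREQUENCIES = {
    8: 1, 9: 1, 10: 3, 11: 1, 13: 4, 14: 6, 15: 9, 16: 8, 17: 14,
    18: 31, 19: 32, 20: 52, 21: 66, 22: 88, 23: 111, 24: 131,
    25: 144, 26: 167, 27: 159, 28: 167, 29: 126, 30: 107,
}


def mean_from_frequencies(freqs):
    """Add up every score (score x count) and divide by the number of scores."""
    n = sum(freqs.values())
    return sum(score * count for score, count in freqs.items()) / n


def sd_from_frequencies(freqs):
    """Sample standard deviation (assumed divisor: n - 1)."""
    n = sum(freqs.values())
    mean = mean_from_frequencies(freqs)
    squared = sum(count * (score - mean) ** 2 for score, count in freqs.items())
    return math.sqrt(squared / (n - 1))


n = sum(FREQUENCIES.values())
mean = mean_from_frequencies(FREQUENCIES)
sd = sd_from_frequencies(FREQUENCIES)
print(f"N = {n}, mean = {mean:.4f}, standard deviation = {sd:.5f}")
```

With the n - 1 divisor, the result agrees with the figures SPSS reports for this scale (N = 1428, mean 24.9965, standard deviation 3.60574).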
Summary statistics including the mean and standard deviation (usually referred to in
textbooks as descriptive statistics) are obtained using the ‘descriptives’ option of
descriptive statistics. Summary statistics for ‘gender’, ‘task motivation (science)’ and
‘level of cognitive development’ are shown in Table 12.4.
Returning to the question of whether 18 is a low score on the task motivation
(science) scale, knowing that the mean score for this scale is 25.0 (rounded off) and that
the standard deviation is 3.6 suggests that a score of 18 is unusually low. It lies almost
two standard deviations below the mean, whereas most people are scoring between 21.4
(3.6 below the mean of 25) and 28.6 (3.6 above it).
Table 12.4 also illustrates an important point to bear in mind when using a computer
program. As gender is a nominal-level variable, it makes no sense to talk about the
average or mean gender, as clearly my participants were either boys or girls. Yet SPSS will
calculate this value if asked. You always need to ask yourself the question: is what I’m
asking the computer to calculate sensible? In simple terms, if you put rubbish in, you
will get rubbish out!
Table 12.4 Descriptive statistics for gender, task motivation (science) and level of cognitive development

                                    N   Minimum   Maximum      Mean   Std deviation
Gender                           1652      1.00      2.00    1.5702          .49519
Task motivation (science)        1428      8.00     30.00   24.9965         3.60574
Level of cognitive development   1723      2.00      9.00    6.1977         2.18647

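
The reasoning about a score of 18 can be made concrete with a little arithmetic. The Python sketch below is an illustration, not part of the SPSS procedure: it uses the frequencies from Table 12.3 and the mean and standard deviation from Table 12.4 to express 18 as a number of standard deviations from the mean, and to count how many students fall within one standard deviation of it. The last few lines show why the ‘mean gender’ is not meaningful; they assume the codes are 1 for boys and 2 for girls.

```python
# Valid responses from Table 12.3: scale score -> number of students.
FREQUENCIES = {
    8: 1, 9: 1, 10: 3, 11: 1, 13: 4, 14: 6, 15: 9, 16: 8, 17: 14,
    18: 31, 19: 32, 20: 52, 21: 66, 22: 88, 23: 111, 24: 131,
    25: 144, 26: 167, 27: 159, 28: 167, 29: 126, 30: 107,
}
MEAN, SD = 24.9965, 3.60574  # task motivation (science), from Table 12.4


def z_score(score, mean, sd):
    """How many standard deviations a score lies from the mean."""
    return (score - mean) / sd


def share_between(freqs, low, high):
    """Proportion of valid responses with low <= score <= high."""
    n = sum(freqs.values())
    return sum(count for score, count in freqs.items() if low <= score <= high) / n


z = z_score(18, MEAN, SD)
within_one_sd = share_between(FREQUENCIES, MEAN - SD, MEAN + SD)
at_or_below_18 = share_between(FREQUENCIES, min(FREQUENCIES), 18)

print(f"z-score for 18: {z:.2f}")
print(f"share within one SD of the mean: {within_one_sd:.1%}")
print(f"share scoring 18 or lower: {at_or_below_18:.1%}")

# The 'mean gender' is only arithmetic on arbitrary labels. Assuming the
# codes are 1 = boy and 2 = girl, it simply restates the proportion of girls.
boys, girls = 710, 942  # from Table 12.2
mean_gender = (boys * 1 + girls * 2) / (boys + girls)
print(f"mean of gender codes: {mean_gender:.4f}")
print(f"proportion of girls:  {girls / (boys + girls):.4f}")
```

On these figures, 18 sits about 1.94 standard deviations below the mean, roughly 68 per cent of students score within one standard deviation of it, and only 5.5 per cent score 18 or lower, which matches the cumulative per cent column of Table 12.3.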
3. Producing graphs
It is often easier to get a sense of how frequency distributions look by plotting graphs
rather than looking at tables or summary statistics. For instance, you have seen the
frequency distribution table and know that the mean score on the task motivation
(science) scale was 25.0, while scores ranged from 8 to 30 and there was a relatively
small amount of variation as the standard deviation was 3.6. If you think about these
figures carefully, they suggest that scores were bunched up at the higher end of the
scale. This can be seen much more clearly by plotting a histogram, which is a graph of
the frequency distribution.
In SPSS, graphs can be produced through the ‘graph’ drop-down menu, which gives
you the option of producing a histogram. The histogram for the task motivation (science)
scale is shown in Figure 12.1 and, as expected, students’ scores are bunched up towards
the top of the scale. This scale is about learning things so it is hardly surprising that most
students say they are motivated by this, given that they know their answers are going to
be scrutinized by someone else.
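
To see the bunching without SPSS, a crude text histogram can be drawn from the frequencies in Table 12.3. This Python sketch is only illustrative; the function name and the bar width of 50 characters are arbitrary choices of mine.

```python
# Valid responses from Table 12.3: scale score -> number of students.
FREQUENCIES = {
    8: 1, 9: 1, 10: 3, 11: 1, 13: 4, 14: 6, 15: 9, 16: 8, 17: 14,
    18: 31, 19: 32, 20: 52, 21: 66, 22: 88, 23: 111, 24: 131,
    25: 144, 26: 167, 27: 159, 28: 167, 29: 126, 30: 107,
}


def text_histogram(freqs, width=50):
    """Return one line per score, with a bar scaled to the largest count."""
    tallest = max(freqs.values())
    lines = []
    for score in range(min(freqs), max(freqs) + 1):
        count = freqs.get(score, 0)  # a score of 12 has no responses
        bar = "#" * round(count / tallest * width)
        lines.append(f"{score:>2} | {bar} {count}")
    return lines


print("\n".join(text_histogram(FREQUENCIES)))
```

The longest bars appear between 24 and 29, with a long thin tail towards the low scores.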
A second type of graph that is useful when conducting a descriptive analysis of
data, which is also an option from the graph menu, is a scatter diagram. This shows
the relationship between two interval-level variables. For instance, I was very interested
in knowing whether there was a relationship between students’ cognitive
development scores and their task motivation scores. I expected students who said
they were motivated to try hard to learn new things to develop cognitively. A scatter
diagram would help me decide whether I was right about this relationship and this is
shown in Figure 12.2.
[Figure 12.1: histogram. Horizontal axis: Task motivation (science), 10.00 to 30.00. Vertical axis: Frequency, 0 to 200. Mean = 24.9965, Std Dev = 3.60574, N = 1,428]
Figure 12.1 Histogram showing the distribution of response to the task motivation (science) scale
[Figure 12.2: scatter diagram. Horizontal axis: Task motivation (science), 5.00 to 30.00. Vertical axis: Level of cognitive development, 2.00 to 9.00]
Figure 12.2 Scatter diagram showing the relationship between task motivation (science) and level of
cognitive development
Each circle represents one or more students’ scores on the variables in question. For
instance, the circle at the bottom right of the graph represents students who scored 30 (the
maximum value) on the task motivation (science) scale (so are highly motivated to learn
new things), but at the same time got the lowest score of 2 on the level of cognitive
development test. So, for these students, being very keen to learn new things does not appear
to be associated with helping them to develop cognitively. In general, if there were a positive
association, which statisticians call a positive correlation, between task motivation and level
of cognitive development, the graph would show a series of circles falling around an
imaginary line sloping from the bottom left to the top right of the graph (i.e. sloping upwards).
Here, we have a general swirl of dots filling most of the graph, suggesting no relationship
(or no correlation) between task motivation and level of cognitive development. This is not
what I expected. Note that it is also possible to have a negative correlation between two
variables. For instance, you would expect a negative correlation between alienation and
cognitive development, which would appear as a band of circles following an imaginary line
from the top left to the bottom right (i.e. sloping downwards).
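
The three patterns described above (sloping upwards, sloping downwards and a general swirl) correspond to correlation coefficients near +1, near -1 and near 0. The Python sketch below computes the Pearson coefficient by hand for three small sets of scores. All of the numbers are invented for illustration and are not taken from my study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(spread_x * spread_y)


# Invented scores for eight students, for illustration only.
motivation = [12, 15, 18, 20, 23, 25, 28, 30]
rises_with = [2, 3, 3, 5, 6, 6, 8, 9]   # circles sloping upwards
falls_with = [9, 8, 8, 6, 5, 5, 3, 2]   # circles sloping downwards
unrelated = [6, 2, 9, 4, 4, 9, 2, 6]    # a general swirl

for label, ys in [("positive", rises_with),
                  ("negative", falls_with),
                  ("none", unrelated)]:
    print(f"{label:>8}: r = {pearson_r(motivation, ys):+.2f}")
```

The sign of the coefficient gives the direction of the imaginary line, and its size shows how closely the circles cluster around that line.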
DATA ANALYSIS – INFERENTIAL STATISTICS
Having explored the data and gained a sense of the relationships between different
variables, it is now time to get to the crux of the matter and answer the original
research questions. I was interested in the relationship between motivation and level
of cognitive development. I predicted that students scoring highly on the task motivation
scale (the desire to learn and master new things) would have overall higher
scores on the test of cognitive development than students with low scores on the task
motivation scale. Similarly, I would expect that students with high scores on the
alienation scale (actively disrupting learning) would have overall lower scores on the
test of cognitive development than students who have low scores on the alienation
scale. I also thought that girls would have higher scores on the task motivation scale
and lower scores on the alienation scale than boys, and that girls’ level of cognitive
development would be higher.
In order to assess whether these hypotheses are correct, we need to run a number of
statistical tests. Essentially, I am asking two different types of question:
1. Is there a relationship between two variables (for instance, between task motivation
and level of cognitive development)?
2. Is there a difference between two groups on a given variable (for instance, between
boys and girls on task motivation)?
Each type of question requires a specific statistical test, and these are outlined below.
1. Tests of correlation
The first type of question relates to relationships or correlations between variables,
therefore the appropriate statistical test in this case is a test of correlation. The logic
behind this type of test is that we look at the actual relationships found (so we need
to calculate a particular entity to assess this) and then make a judgement as to how
likely this result would be, if in fact there wasn’t a relationship between the variables
of interest. In essence, we are judging the likelihood that the results we found were
a fluke, i.e. in reality, motivation and cognitive development aren’t related in any way;
it just happened in this sample of schools that there was some type of relationship.
We have to take this approach because we simply don’t know whether there
is in fact a relationship or not, and we have to make the best judgement we can, based on
the data we have, on the balance of probabilities. These probabilities can only be calculated
by starting from a position that the variables are not related. A slightly different starting
point is taken when we are looking at differences between groups, which is described
below. However, it is the case that the calculations conducted in any statistical test involve
calculating probabilities to enable the person running the test to make a judgement call.
This is why the outcomes of the calculations made are referred to as inferential statistics
and the tests themselves are often called significance tests.
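
One way to get a feel for this logic is a permutation test, sketched below in Python. This is not the calculation that SPSS performs; it is a swapped-in technique that makes the ‘start from no relationship’ idea visible. The scores are invented, and the function names, the number of shuffles and the random seed are arbitrary choices. Shuffling one variable breaks any real link between the two, so the share of shuffles that produce a correlation at least as strong as the observed one estimates how likely the result would be as a fluke.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(spread_x * spread_y)


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Count the observed arrangement itself so the estimate is never zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Invented scores for eight students, for illustration only.
motivation = [12, 15, 18, 20, 23, 25, 28, 30]
related = [2, 3, 3, 5, 6, 6, 8, 9]
unrelated = [6, 2, 9, 4, 4, 9, 2, 6]

p_related = permutation_p_value(motivation, related)
p_unrelated = permutation_p_value(motivation, unrelated)
print(f"related scores:   p = {p_related:.3f}")
print(f"unrelated scores: p = {p_unrelated:.3f}")
```

A small value means the observed relationship would rarely arise by chance alone. A large value means a fluke cannot be ruled out.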
The entity calculated in a test of correlation to quantify the relationship between
the two variables of interest is a correlation coefficient. The statistical test enables me to
judge whether this is significant (i.e. the balance of probabilities is that there is a
relationship between motivation and cognitive development). At this point, I have several choices
of correlation coefficients to calculate, dependent on the measurement level and
distribution of my variables. If you intend to use inferential statistics, you will need to read up on
this in more detail, as all I am doing here is giving an introduction to this area. Specifically,
you can choose to do either a parametric or non-parametric test. The former is generally
preferred because it is more powerful and sensitive to your data; however, it also makes
certain assumptions about your data. For reasons there isn’t the space here to explain, my
data are acceptable for a parametric test, hence I need to calculate the appropriate
correlation coefficient, a Pearson correlation coefficient, and then look at the significance test
results. In SPSS, this procedure is conducted in the ‘correlate’ option of the analyse menu.
The results for the test of correlation between level of cognitive development and task
motivation (science) scores are shown in Table 12.5.