Non-experimental
In this approach, you don’t do your own experiment; you can’t control any variables. You
just look at what’s already happened and try to understand it. This means that you take
the variable showing an effect (the dependent variable) and try to find the variable(s) (the
independent variable(s)) which could have a causative relationship with it.
This approach is very common in education, and is also useful when ethics preclude
an experimental approach. For example, it would not be ethical to run an experiment
deliberately subjecting pupils to something detrimental to their learning, but studying
the outcomes of such a situation (which already exists) would be perfectly ethical. Two
examples may help make things clear.
At a particular school, the governors suddenly realized that the proportion of pupils
gaining five GCSEs at A–C (‘5 A–C’) had been dropping over the past 15 years. At their
meeting, they came up with a number of variables which could be responsible. They found
that as the number of teachers at the school went down each year, so did ‘5 A–C’ (we can
say that ‘5 A–C’ correlates significantly with the number of teachers at the school – you’ll
learn more about correlations in Chapter 12). One of them suggested an explanation: the
presence of fewer teachers leads to less variety of ideas for teaching and learning, and
hence the drop in attainment.
Taking A Quantitative Approach
But was the change in teacher numbers the cause of the change in ‘5 A–C’? Look at Figure
11.1. Actually, it may be the other way around (reverse causation). For example, ‘5 A–C’
may have dropped one year, and encouraged parents who are interested in academic success
to send their children elsewhere. With fewer pupils, the numbers on roll go down and
the same number of teachers is no longer required.
Let’s look at a different example, which tries to explain observed differences between
two groups. The head teacher at school A knows that different tutor groups have different
levels of truancy. He uses a statistical test to confirm that tutor groups led by women have
less truancy than those led by men. Is he justified in concluding that female form tutors are
better?
Not necessarily. At this school, the extent of truancy happens to vary with age (older children
truant more), the Year 9 tutor team is composed mainly of women and the Year 11
tutor team is composed mainly of men. Hence, we have a third variable – year group – which
affects both truancy levels and the sex of the tutor (Figure 11.2), creating a plausibly causal,
but in fact artifactual, relationship between a tutor’s sex and truancy levels.
So it is important to realize that finding a relationship does not mean that it is a causal
relationship, and that causation could happen in either direction. In fact, sometimes a
relationship may be pure coincidence – there is a well-documented correlation between stork
population size and human birth rate!
To some extent, you can control the effect of potentially confounding variables by how
you choose your sample, and by what you measure. For example, the head teacher above
could do the following:
• Randomly choose, and compare, equal numbers of male- and female-led tutor groups
from each year group. This does reduce the sample size, but removes the effect of age.
• Compare multiple pairs of tutor groups, matched as closely as possible by the children’s
socio-economic status and other potentially relevant variables, with the only
difference being the tutors’ sex.
• Decide which variables are likely to affect truancy levels, and measure them at the
same time as collecting the rest of the data. Some clever statistics (beyond the scope
of this book) can remove the effect of the extra variables, and ‘uncover’ any effect of
the tutor’s sex.
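To see how a third variable can manufacture a relationship, here is a minimal Python sketch of the truancy example. The figures are invented purely for illustration (they are not data from school A), and the function names are our own. It compares mean truancy by tutor sex across the whole school, and then again within each year group, which is the idea behind the first strategy above.

```python
from collections import defaultdict

# Hypothetical tutor groups: (year_group, tutor_sex, truancy_sessions_per_term).
# Year 9 is led mainly by women, Year 11 mainly by men, as in the example.
tutor_groups = [
    (9, "F", 20), (9, "F", 22), (9, "F", 18), (9, "M", 20),
    (11, "M", 60), (11, "M", 62), (11, "M", 58), (11, "F", 60),
]


def mean(values):
    return sum(values) / len(values)


def mean_truancy_by_sex(records):
    """Mean truancy for each tutor sex, ignoring year group."""
    by_sex = defaultdict(list)
    for _year, sex, truancy in records:
        by_sex[sex].append(truancy)
    return {sex: mean(values) for sex, values in by_sex.items()}


def mean_truancy_by_year_and_sex(records):
    """Mean truancy for each (year group, tutor sex) combination."""
    by_cell = defaultdict(list)
    for year, sex, truancy in records:
        by_cell[(year, sex)].append(truancy)
    return {cell: mean(values) for cell, values in by_cell.items()}


overall = mean_truancy_by_sex(tutor_groups)
within_year = mean_truancy_by_year_and_sex(tutor_groups)

print("Whole school:", overall)
print("Within year groups:", within_year)
```

With these made-up numbers, female-led groups average 30 sessions and male-led groups 50 across the whole school, yet within each year group the two averages are identical (20 in Year 9, 60 in Year 11). The apparent effect of the tutor’s sex comes entirely from year group.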
Figure 11.1 Making sense of a correlation between two variables
(Diagram: either variable A, the number of staff, causes a change in variable B, ‘5 A–C’, or variable B causes a change in variable A.)
Even with these strategies, it is still difficult to know you’ve accounted for all potentially
confounding variables. As such, your interpretation of the data is vulnerable to
your own biases, and it is often possible to come up with contradictory explanations for
the same set of data. You should discuss such explanations when you write up your
research, pointing out the limitations of your derived ideas. Using other sources of data
to shed more light on such alternatives would also be sensible. In an ideal world, explanations
and hypotheses would be tested by experiment at a later date, but that’s often
not possible. The benefits and potential pitfalls of the non-experimental approach are
summarized in Box 11.5 and Box 11.6.
SCHOOL-BASED RESEARCH
Figure 11.2 Making sense of a correlation between three variables
(Diagram: a change in variable C, the year team, causes a change in both variable A, the tutor’s sex, and variable B, the truancy level, so there is a relationship between variable A and variable B.)
BOX 11.5 Benefits of a non-experimental approach
1 It is the ‘next best thing’ where an experimental approach is impossible.
2 It is good for initial exploration of a data set and for generating hypotheses,
which can be experimentally tested later.
3 It avoids using an artificial intervention, examining only ‘natural’ variation in the data.
LOOKING AT OTHER PEOPLE’S DATA
Looking at data with a healthily sceptical eye is always a good idea! In the rest of the chapter,
I’ll help you to examine the story of pupil and school performance data.
League tables
When league tables were first introduced in 1992, exam results were used to decide if a
school was effective – on the assumption that better results mean better schooling. Ofsted
based their assessment of school performance on raw exam results, and teachers were
(and are) expected to use pupils’ attainment data to support their own promotion applications.
So what’s the problem?
BOX 11.6 Pitfalls of a non-experimental approach
1. It is difficult to control variables or get truly random samples.
2. It is difficult to identify all the variables (including the one(s) which actually have
a causative effect) that are potentially relevant to an observed phenomenon.
3. Cause and effect are not always obvious; it is easy to assume one variable
causes another if it fits in with your prior ideas.
Activity 9.1
Many commentators have criticized ‘exam results’ as not being a valid measure
of school effectiveness, and suggested that they should not be used to compare
schools. Think about the following questions.
• Why do ‘exam results’ only give a limited picture of school ‘effectiveness’?
• What other variables could affect exam results, which are unrelated to the
school itself?
• Suggest reasons why it is not valid to reward secondary science teachers on
the basis of their classes’ exam results.
Value-added
In 2004, the Department for Education and Skills published a value-added measure; it
was billed as a more valid measure of the ‘value’ that a school ‘adds’, taking into account
prior attainment. The aim was to help parents make valid comparisons between schools.
For primary schools, data were reported in relation to a figure of 100.
Let’s look at how they worked out value-added between Key Stage 2 and Key Stage 3,
using SATs results as a measure of attainment in each case, but comparing pupils with
similar levels of prior attainment. Here’s a simplified explanation of it:
• All the pupils in the country who performed very similarly (e.g. within half a level) at
Key Stage 2 are put in order of numerical result for their Key Stage 3 results.
• For each set of pupils, the median result (the middle one) is found, and every pupil is
given a value-added score relative to that result. If someone does better than the median
pupil, they get a positive score. If someone does worse, they get a negative score.
• Hence, for schools with a low-attaining intake, it should be easier to see the ‘value’ they
are ‘adding’, even if their Key Stage 3 results are much lower than, for example, another
school which selects on prior attainment.
Activity 9.1 (Continued)
Hopefully, you’ve realized that there’s an enormous number of variables which
could affect exam results. Research suggests that 76 per cent of variation in
GCSE scores is determined by pupils’ prior attainment (which itself is related to
their own prior education and socio-economic characteristics). This means that
the influence of a secondary school on GCSE scores is pretty limited, and the
best way for schools to improve
Activity 9.2
Explain why this ought to be a more valid measure of school effectiveness, and
what you think it still lacks.
To assess the whole school’s performance, you average the results of all the pupils in the
school and add the result to 100. Hence, a score of 100 reflects ‘average’ value-added, 98
reflects ‘below average’ value-added and 102 reflects ‘above average’ value-added.
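The simplified calculation described above can be sketched in a few lines of Python. This is only an illustration of the steps as explained here, not the Department’s actual method: the pupils, the two prior-attainment bands (‘low’ and ‘high’) and the school names are all invented, and the function names are our own.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical pupils: (school, Key Stage 2 band, Key Stage 3 result).
pupils = [
    ("North", "low", 33), ("North", "low", 35), ("North", "high", 54),
    ("South", "low", 27), ("South", "low", 31),
    ("South", "high", 50), ("South", "high", 52),
]


def pupil_value_added(records):
    """Score each pupil relative to the median KS3 result of their KS2 band."""
    results_by_band = defaultdict(list)
    for _school, band, ks3 in records:
        results_by_band[band].append(ks3)
    band_median = {band: median(r) for band, r in results_by_band.items()}
    return [(school, ks3 - band_median[band]) for school, band, ks3 in records]


def school_value_added(records):
    """Average the pupils' scores for each school and add the result to 100."""
    scores_by_school = defaultdict(list)
    for school, score in pupil_value_added(records):
        scores_by_school[school].append(score)
    return {school: 100 + mean(s) for school, s in scores_by_school.items()}


school_scores = school_value_added(pupils)
print(school_scores)
```

With these made-up pupils, the median Key Stage 3 result is 32 in the low band and 52 in the high band, so North comes out at 102 and South at 98.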
Activity 9.3
A local journalist is looking at the value-added scores for the schools in her
county. She realizes that half of the schools have value-added scores below 100
and publishes the story with the following headline: ‘Scandal! Letting down our
children – half of Smedshire’s schools are below average’. What is wrong with
this headline?
But how do you interpret these scores? If the average (mean) value-added
score for a school is a long way below the mean value (100) for all the schools,
then you may be pretty confident that there’s a problem. If it’s really way above
the mean, then you know something is going right. But how far apart do schools’
scores need to be to know that they are really significantly different?
Activity 9.4
Look at Figure 11.3. For each school, it shows the mean and confidence intervals.
Which two of the following is it possible to say with 95 per cent confidence?
• School A is significantly better than school B.
• School D is significantly worse than school B.
• School C is significantly better than the national average.
• There is a significant difference between school A and school C.
Statisticians work this out by calculating something called confidence limits above and
below the mean for each school. These are often not quoted in the media, but without them, you
can’t interpret league tables properly.
• You can only say that two schools really have significantly different value-added scores if
the confidence intervals (between the upper and lower limit) do not overlap.
• You can only say that a school is above the national average if the school mean is
above the national mean, and if the school confidence interval does not include the
national mean.
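The two rules above are easy to express in code. The sketch below assumes you already have the lower and upper 95 per cent confidence limits for each school; the schools P, Q and R and their limits are invented for illustration, and the function names are our own.

```python
# Hypothetical (lower, upper) confidence limits for each school's value-added
# score, expressed relative to a national average of 0.
NATIONAL_AVERAGE = 0.0
schools = {
    "P": (0.4, 1.2),
    "Q": (-0.3, 0.6),
    "R": (-1.1, -0.2),
}


def significantly_different(first, second):
    """True only if the two confidence intervals do not overlap."""
    return first[1] < second[0] or second[1] < first[0]


def significantly_above(interval, reference):
    """True if the whole interval lies above the reference value."""
    return interval[0] > reference


def significantly_below(interval, reference):
    """True if the whole interval lies below the reference value."""
    return interval[1] < reference


print(significantly_different(schools["P"], schools["Q"]))  # intervals overlap
print(significantly_different(schools["P"], schools["R"]))  # no overlap
print(significantly_above(schools["Q"], NATIONAL_AVERAGE))  # interval includes 0
```

Here school P and school R can be said to differ, but neither can be separated from school Q, and school Q cannot be said to be above or below the national average.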
However, it gets even more complicated. The way in which confidence limits are calculated
means that:
• if you calculate the mean and confidence interval for a small school with few pupils,
the limits are likely to be relatively far apart
• if you calculate the mean and confidence interval for a school with a lot of pupils, the
limits are likely to be relatively close together.
These differences make it difficult to validly compare schools of different sizes, and even
more difficult when comparing particular subjects or classes with even fewer pupils.
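A short sketch shows why the number of pupils matters so much. It assumes the common normal approximation, in which the 95 per cent confidence limits lie about 1.96 standard errors either side of the mean; the spread of scores (a standard deviation of 10) and the group sizes are invented for illustration.

```python
from math import sqrt


def ci_half_width(sd, n, z=1.96):
    """Distance from the mean to each 95% confidence limit.

    Uses the normal approximation: z * sd / sqrt(n).
    """
    return z * sd / sqrt(n)


# Hypothetical: both groups have the same spread of pupil scores (sd = 10).
small_class = ci_half_width(sd=10, n=25)
large_school = ci_half_width(sd=10, n=400)

print(f"25 pupils:  mean ± {small_class:.2f}")
print(f"400 pupils: mean ± {large_school:.2f}")
```

With the same spread of scores, the limits for 25 pupils sit about 3.92 either side of the mean, but only about 0.98 either side for 400 pupils. Sixteen times as many pupils makes the interval four times narrower.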
Figure 11.3 Understanding confidence limits 1
(Chart: mean and confidence limits for schools A, B, C and D, plotted against the national average on a scale from −1.5 to 1.5.)
Figure 11.4 Understanding confidence limits 2
(Chart: mean and confidence limits for schools A, B and C, plotted against the national average on a scale from −1.5 to 1.5.)
Contextual value-added
Using value-added scores still assumes that all changes in pupil performance are determined
by what happens in the school. But what about the other variables that could affect
performance? Is it possible to tease apart their effects from the effects of the school?
Initial attempts to do this adjusted for the number of pupils on free school meals as an
index of socio-economic status. Then in 2006 the government developed a more complicated
statistical model. They tried to identify relevant variables, and remove the effect of each,
one by one, using a statistical technique called multi-level modelling, to provide a way of predicting
‘expected’ attainment for each pupil. The deviation from that ‘expected’ attainment then
provides a contextual value-added (CVA) score. The variables included in calculating the
model can change year-on-year, but commonly have included most of the following:
• pupil prior attainment
• gender
• Special Educational Needs
• first language
• measures of pupil mobility
• age
• an indicator of whether the pupil is ‘in care’
• ethnicity
• Free School Meals
• Income Deprivation Affecting Children Index (IDACI)
• the average and range of prior attainment within the school (KS2–3, KS2–4 and KS3–4
only).
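The idea of ‘expected’ attainment, and of scoring each pupil by how far they deviate from it, can be illustrated with a much simpler model than the one the government used. The sketch below fits an ordinary straight line to a single predictor (prior attainment); it is not multi-level modelling and it ignores every other variable in the list above. The pupils’ scores are invented and the function names are our own.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical pupils: prior attainment and later attainment.
prior = [20, 24, 28, 32, 36]
later = [30, 37, 40, 47, 50]

slope, intercept = fit_line(prior, later)

# 'Expected' attainment for each pupil, given prior attainment alone.
expected = [intercept + slope * x for x in prior]

# Deviation from expectation: positive means better than expected.
deviation = [actual - exp for actual, exp in zip(later, expected)]

for x, actual, exp, dev in zip(prior, later, expected, deviation):
    print(f"prior {x}: expected {exp:.1f}, actual {actual}, deviation {dev:+.1f}")
```

A real CVA-type model does the same kind of thing with many predictors at once, and allows for the fact that pupils are grouped within schools.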
This sounds like the ‘holy grail’ for assessing school effectiveness, but there are some problems
with this model in particular. The way in which the CVA model handles missing data
could generate bias, and some variables may also interact to affect CVA in nonlinear
ways (e.g. income deprivation may affect the attainment of younger pupils
more than that of older pupils). That said, CVA is just one of
a number of value-added measures which have emerged commercially since 1991,
including Fischer Family Trust scores, and models derived by the Centre for
Evaluation and Monitoring (CEM) at the University of Durham (examples include
MIDYIS, Yellis and ALIS). There are differences in the population data used to derive
these models, and again in some of the variables included, but each is based on a
CVA-type approach.
Activity 9.5
Look at Figure 11.4. It shows mean and confidence intervals for three schools.
School A has 2000 pupils, school B has 500 pupils, and school C has 2000 pupils.
Which of the following can you say with 95 per cent confidence?
• School A is significantly below the national average.
• School B is significantly above the national average.
• School C is significantly above the national average.
• There is no significant difference between school A and school B.
• There is no significant difference between school B and school C.
• There is no significant difference between school A and school C.
The data provided by these kinds of models have been used as part of accountability
procedures (e.g. via Ofsted), to direct school improvement planning (e.g. for schools,
teachers or pupils), and by parents when choosing a school for their child. You’ll probably
recognize many of the names given above. Indeed, Ofsted compelled schools to
show evidence that they use CVA data in school improvement, providing it to schools
via an online platform called RAISEonline, and CVA data are published for access by
parents. However, using the same data, and hence the same model, for each of these
purposes can be problematic, and the variables to be included in such models may need
to differ, depending on the use each model is put to.
For example, if CVA is to be a predictor of future attainment (as it is for students in
many of the CEM models), then every factor should be included in the model to
make such a prediction as accurate as possible. On the other hand, if CVA is to be
used as a focus for school improvement, then it is more sensible to include only
those variables which the school actually has control over. Likewise, if CVA is to be
used for accountability, then all stakeholders should be able to understand the meaning
of the indices produced, and hence complex multi-level modelling like this may
be inappropriate.
Given the above, the apparent precision and ‘authority’ of value-added data like CVA
or Fischer Family Trust scores can mislead parents, teachers and children into trusting
such data unquestioningly. There are two good examples of when this becomes problematic
with respect to how parents interpret CVA data.
1. We’ve already seen how parents can assume that a greater value-added score implies
a better school, without looking at the confidence intervals. However, given that CVA
models use data from pupils who have been in the school over, say, the last
seven years, they may not actually provide insights for parents into future school
effectiveness (after all, if your child is just starting in a school you want to know how
it will perform in seven years’ time). Instead, they reflect the performance of the
school over previous years, and do not take into account any uncertainty associated
with predicting school performance into the future. When statisticians introduce
such prediction uncertainty into CVA-type models, schools’ predicted performance
is so similar, it becomes almost impossible to differentiate between schools on the
basis of their CVA score.
2. Although a CVA score for a school may provide a measure of school effectiveness,
controlled for the range of factors included in the model, when creating a model to help
parents choose a school, it is important not to control for school-level variables. This is because
a parent wants to know whether a particular school (with all its contextual features)
will provide a better education for their child (with his/her own characteristics) than another
school. If the school-level factors are associated with achievement, then this is part of
the effect the parent is interested in. Hence, using raw CVA data would not be appropriate.
Parents and others instinctively base their judgements on figures they understand, and
continuing to publish raw data about school performance is misleading. However, it is important
to ensure parents and teachers realize that even value-added measures like CVA
still provide a narrow metric of school or teacher effectiveness, and that such data should
really be used only as a starting point for conversation, rather than as concrete evidence of
school or teacher effectiveness.