Setting a Context
Innovation and Growth of
Large Scale Assessments
Irwin Kirsch
Educational Testing Service
February 18, 2013
Overview
• Setting a context
• Growth in Large Scale Assessments (LSA)
• Features of Large Scale Assessments (LSA)
• Growing importance of CBA
• Innovations in recent LSA (PIAAC and PISA)
• Future areas for innovation
Setting a Context
• Until relatively recently, educational data were not
collected in a consistent or standardized manner.
• In 1958, a group of scholars representing various
disciplines met at UNESCO in Hamburg, Germany to
discuss issues surrounding the evaluation of schools
and students through the systematic collection of
data relating to knowledge, skills and attitudes.
• Their meeting led to a feasibility study of
13-year-olds in 12 countries covering 5 content
areas, and to the creation of the legal entity known
as the IEA in 1967.
Setting a Context
• Back in the United States the Commissioner of
Education, Francis Keppel, invited Ralph Tyler in
1963 to develop a plan for the periodic
assessment of student learning.
• Planning meetings were held in 1963 and 1964
and a technical advisory committee formed in
1965.
• In April 1969, NAEP first assessed in-school
17-year-olds in citizenship, science and writing.
Setting a Context
• Tyler’s vision for NAEP was that it would focus on
what groups of students know and can do rather
than on what score an individual might receive on a
test.
• The assessment would be based on identified
objectives whose specifications would be
determined by subject matter experts.
• Reports would be based on the performance of
selected groups, not individuals, who responded
correctly to the exercises and would not rely on
grade-level norms.
Setting a Context
• Prior to IEA and NAEP, there were no
assessment programs to measure students or
adults as a group.
• The primary focus of educational testing had
been on measuring individual differences in
achievement rather than on students’ learning.
• And, the data that were collected dealt
primarily with the inputs to education rather
than the yield of education.
Setting a Context
• Interpretations would be limited to the set of items
used in each assessment. This basic approach to large
scale assessments remained in place through all of the
1970s.
• In the 1980s, programs beginning with NAEP began to
use item response theory (IRT) to allow for the creation
of scales and the broadening of inferences to include
items not included in the assessment.
• New methodology involving marginal estimation was
developed to optimize the reporting of proficiency
distributions based on complex designs such as BIB
spiraling. This approach remains in use today.
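The IRT scaling described above can be sketched with the two-parameter logistic model. This is an illustrative sketch only, not the operational scaling model used by these programs, and the parameter values below are made up:

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic (2PL) item response function: the
    probability that a respondent with proficiency theta answers an
    item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Once items and respondents sit on a common scale, inferences can
# extend to items a respondent never saw: any (theta, a, b) triple
# yields a predicted success probability.
print(round(p_correct(0.0, a=1.0, b=0.0), 2))  # 0.5
```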
Growth and Expansion
… not being satisfied with assertions or self
reports
… in response to policy makers and researchers
wanting to know more
… asking more challenging questions
… and creating both the need and opportunity for
new methodological and technological developments
Growth and Expansion
• Number of assessments
• Participation of countries
• Populations who are surveyed
• Domains / Constructs that are measured
• Methodology
• Modes
Growth and Expansion Overview
Large-Scale International Surveys
School-Based: PIRLS, TIMSS, PISA
Adults: IALS, ALL, PIAAC, STEP
Growth and Expansion
(Diagram: Life skills, Curriculum, Measurement)
Features of Large Scale Assessments (LSA)
• LSA are primarily concerned with the accuracy
of estimating the distribution of a group of
respondents rather than individuals.
• In this way, the focus is on providing
information that can inform policy and further
research.
• They differ from individual testing in key ways.
Features of Large Scale Assessments (LSA)
• Extensive framework development
• Sampling
• Weighting
• Use of Complex Assessment Designs
• IRT Modeling
• Population Modeling
• Connection to background variables
• Increasing reliance on CBA
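The sampling and weighting features above can be sketched as a design-weighted group estimate, the building block of LSA reporting. This is a minimal sketch with made-up scores and weights, not the actual estimation machinery of any survey:

```python
def weighted_mean(scores, weights):
    """Design-weighted group mean: each respondent's score counts in
    proportion to the number of people in the population that
    respondent represents (the sampling weight)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Two sampled respondents: the first represents three times as many
# people in the population as the second, so the group estimate is
# pulled toward the first score.
scores = [250.0, 300.0]
weights = [3.0, 1.0]
print(weighted_mean(scores, weights))  # 262.5
```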
Growing Importance of
Computer Based Assessments
• Until very recently all large scale national and
international assessments were paper based
assessments with some optional computer based
components.
• PIAAC (2012) was the first large scale survey of adult
skills in which the primary mode of delivery was
computer, with paper and pencil becoming the option.
• In 2015, PISA will also use computers as the primary
mode of delivery, with paper and pencil becoming an
option for countries.
Why is a Computer Delivered Assessment
Important for Surveys such as PIAAC and PISA?
• Better reflects the ways in which students & adults
access, use and communicate information
• Enables surveys like PIAAC and PISA to broaden
the range of skills that can be measured;
• Allows these surveys to take better advantage of
both operational and measurement efficiencies that
technology can provide
Goals of the PIAAC 2012 and PISA 2015
Assessment Designs
• Establish the comparability of inferences across
countries, across assessments and across modes
• Broaden what can be measured by both
extending the existing constructs and by being
able to introduce new constructs
• Reduce random and systematic error through the
use of more complex designs, automated scoring;
use of timing information; and the use of
adaptive testing
PIAAC Main Study
Cognitive Assessment Design
(Flowchart: respondents are routed by ICT use reported in the background questionnaire)
• Respondents with no computer experience, or who fail the CBA Core (Stage 1: ICT; Stage 2: 3 literacy + 3 numeracy tasks), take the paper path: a paper CORE (4 literacy + 4 numeracy tasks), then random assignment to LITERACY (20 tasks) or NUMERACY (20 tasks), plus READING COMPONENTS
• Respondents who pass both CBA Core stages are randomly assigned to the computer-based modules: Literacy and Numeracy (each with Stage 1: 9 tasks and Stage 2: 11 tasks, in either order) and Problem Solving in Technology-Rich Environments (PS in TRE)
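The adaptive routing in the PIAAC design can be sketched as a branching function. This is a simplified, hypothetical illustration of the routing idea, not PIAAC's operational algorithm:

```python
import random

def route_respondent(has_computer_experience, passes_cba_core):
    """Simplified sketch of multi-stage routing: respondents without
    computer experience, or who fail the computer-based core, take the
    paper instruments; the rest are randomly assigned to
    computer-based modules. Module labels are illustrative."""
    if not has_computer_experience or not passes_cba_core:
        return "paper: literacy or numeracy (20 tasks) + reading components"
    return random.choice([
        "CBA: literacy stage 1 (9 tasks) + stage 2 (11 tasks)",
        "CBA: numeracy stage 1 (9 tasks) + stage 2 (11 tasks)",
        "CBA: problem solving in technology-rich environments",
    ])

print(route_respondent(False, False))  # paper path
```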
Average Proficiency Scores
By Domain and Subgroups

Subgroup          Literacy   Numeracy
No ICT               225        209
Failed CBA Core      255        236
Refused CBA          265        249
CBA                  281        274
PSTRE                281
Cumulative Distribution of Numeracy
Proficiency by Subgroups
(Figure: cumulative proportion, 0.00 to 1.00, plotted against numeracy proficiency, 75 to 425, for the subgroups No ICT, Failed CBA core, Refused CBA, and CBA)
Percentage of
Item-by-Country Interactions

Domain     Interactions   Pairs
Literacy        8%        146 out of 1748 (76 items x 23 countries)
Numeracy        7%        118 out of 1748 (76 items x 23 countries)
PSTRE           3%        8 out of 280 (14 items x 20 countries)

* Literacy and numeracy interactions go across modes and time
Number of Unique Parameters for
Each Country - Numeracy
(Bar chart: number of numeracy items, 0 to 80, per country, split into unique versus general parameters)
Maintaining and Improving
Measurement of Trends
• Proposal for PISA 2015 is to enhance and
stabilise the measurement of trend data
• Refocus the balance between random and
systematic errors
Maintaining and Improving
Measurement of Trends
(Figure: construct coverage in the current PISA design by major and minor domains, contrasted with the recommended approach for measuring trends in PISA 2015 and beyond. Width conveys the relative number of students who respond to each item within the domain. Height of the bars represents the proportion of items measured in each assessment cycle by domain; the reduced height of the bars for the minor domains represents the reduction of items in that domain and therefore the degree to which construct coverage has been reduced.)
The recommended approach stabilizes trend through reducing bias by including all items in each minor domain while reducing the number of students responding to each item.
Maintaining and Improving Measurement of Trends
(Figure: domain rotation across the 2006, 2009, 2012, 2015, 2018 and 2021 cycles, showing the impact over cycles of new items versus trend items, and of new items reflecting a new construct versus new items reflecting the old construct. When Scientific Literacy rotates from a major domain with new items to a minor domain, a new trend line begins from a construct point of view.)
Future Innovations
• Introduction of new item types
• Use of fully automated scoring
• More flexible use of languages
• Development of research around process
information contained in log files
• Introduction of more complex psychometric
models
• Development of derivative products
Summary
• Large scale international assessments continue
to grow in importance
• Computer based assessments are now feasible
and will become the standard for development
and delivery …
• better reflect the ways in which people now access,
use and communicate information
• add efficiency and quality to the data
• introduce innovation that broadens what can
be measured and reported
Questions and Discussion
The design for PIAAC was able to …
• Broaden what was measured;
• Demonstrate high comparability among
countries, over time and across modes;
• Introduce multi-stage adaptive testing;
• Include the use of timing information to
better distinguish between omit and not
reached items;
• Demonstrate an improvement in the
quality of the data that was collected
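The timing-based distinction between omitted and not-reached items can be sketched as a simple classification rule. This is an illustrative rule only; the 5-second threshold and the function itself are made up, not PIAAC's actual procedure:

```python
def classify_missing(was_displayed, seconds_on_item, min_engagement=5.0):
    """Hypothetical sketch: an item the respondent never saw is
    'not reached' (excluded from scoring), while an item displayed
    long enough to have been read but left unanswered is an 'omit'
    (typically treated as incorrect). Threshold is illustrative."""
    if not was_displayed:
        return "not reached"
    return "omit" if seconds_on_item >= min_engagement else "not reached"

print(classify_missing(True, 12.0))  # omit
print(classify_missing(False, 0.0))  # not reached
```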