Anderson, Lorin W.. Classroom Assessment : Enhancing the Quality of Teacher Decision Making, Taylor & Francis Group, 2002. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/univ-people-ebooks/detail.action?docID=362333.
“What makes a good teacher?” This question has been debated at least since
formal schooling began, if not long before. It is a difficult question to answer
because, as Rabinowitz and Travers (1953) pointed out almost a half-century
ago, the good teacher “does not exist pure and serene, available for scientific
scrutiny, but is instead a fiction of the minds of men” (p. 586). Some have
argued that good teachers possess certain traits, qualities, or characteristics.
These teachers are understanding, friendly, responsible, enthusiastic,
imaginative, and emotionally stable (Ryans, 1960). Others have suggested that
good teachers interact with their students in certain ways and use particular
teaching practices. They give clear directions, ask higher order questions, give
feedback to students, and circulate among students as they work at their desks,
stopping to provide assistance as needed (Brophy & Good, 1986). Still others
have argued that good teachers facilitate learning on the part of their students.
Not only do their students learn, but they also are able to demonstrate their
learning on standardized tests (Medley, 1982). What each of us means when we
use the phrase good teacher, then, depends primarily on what we value in or
about teachers.
Since the 1970s, there has been a group of educators and researchers who
have argued that the key to being a good teacher lies in the decisions that
teachers make:
Any teaching act is the result of a decision, whether conscious or
unconscious, that the teacher makes after the complex cognitive
processing of available information. This reasoning leads to the
hypothesis that the basic teaching skill is decision making.
(Shavelson, 1973, p. 18; emphasis added)
In addition to emphasizing the importance of decision making, Shavelson made
a critically important point. Namely, teachers make their decisions “after the
complex cognitive processing of available information.” Thus, there is an
essential link between available information and decision making. Using the
terminology of educational researchers, information is a necessary, but not
sufficient condition for good decision making. In other words, without
information, good decisions are difficult. Yet simply having the information
does not mean that good decisions are made. As Bussis, Chittenden, and Amarel
(1976) noted:
Decision-making is invariably a subjective, human activity
involving value judgments…placed on whatever evidence is
available…. Even when there is virtual consensus of the “facts of
the matter,” such facts do not automatically lead to decisions
regarding future action. People render decisions; information
does not. (p. 19)
As we see throughout this book, teachers have many sources of information they
can use in making decisions. Some are better than others, but all are typically
considered at some point in time. The critical issue facing teachers, then, is what
information to use and how to use it to make the best decisions possible in the
time available. Time is important because many decisions need to be made
before we have all the information we would like to have.
UNDERSTANDING TEACHERS’ DECISIONS
The awareness that a decision needs to be made is often stated in the form of a
should question (e.g., “What should I do in this situation?”). Here are some
examples of the everyday decisions facing teachers:
1. Should I send a note to Barbara’s parents informing them that she
constantly interrupts the class and inviting them to a conference to
discuss the problem?
2. Should I stop this lesson to deal with the increasing noise level in the
room or should I just ignore it, hoping it will go away?
3. What should I do to get LaKeisha back on task?
4. Should I tell students they will have a choice of activities tomorrow if
they complete their group projects by the end of the class period?
5. What grade should I give Jorge on his essay?
6. Should I move on to the next unit or should I spend a few more days
reteaching the material before moving on?
Although all of these are should questions, they differ in three important ways.
First, the odd-numbered questions deal with individual students, whereas the
even-numbered questions deal with the entire class. Second, the first two
questions deal with classroom behavior, the second two questions with student
effort, and the third two questions with student achievement. Third, some of the
decisions (e.g., Questions 2, 3, and, perhaps, 6) must be made on the spot,
whereas for others (e.g., Questions 1, 4, and, to a certain extent, 5) teachers have
more time to make their decisions. These should questions (and their related
decisions), then, can be differentiated in terms of (a) the focus of the decision
(individual student or group), (b) the basis for the decision (classroom behavior, effort, or achievement), and (c) the timing of the decision (immediate or longer term). This structure of teacher decision making is shown in Fig. 1.1.
Virtually every decision that teachers make concerning their students can be
placed in one of the cells of Fig. 1.1. For example, the first question concerns
the classroom behavior of an individual student, and the teacher can take
some time to make the decision. This question, then, would be placed in the cell
corresponding with classroom behavior (the basis for the decision) of an
individual student (the focus of the decision), with a reasonable amount of time
to make the decision (the timing of the decision). In contrast, the sixth question
concerns the achievement of a class of students and requires the teacher to make
a rather immediate decision. This question, then, would be placed in the cell
corresponding with achievement (the basis for the decision) of a class of
students (the focus of the decision), with some urgency attached to the making
of the decision (the timing of the decision).
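The three dimensions of Fig. 1.1 can also be written out as a small data structure. The following Python sketch is an illustration only and is not part of the original figure; the names Focus, Basis, Timing, and classify are our own assumptions. It encodes the three rules stated above (odd-numbered questions concern individual students, each pair of questions shares a basis, and Questions 2, 3, and perhaps 6 call for immediate decisions) and places each of the six example questions in one of the 2 x 3 x 2 = 12 cells.

```python
from enum import Enum
from itertools import product


class Focus(Enum):
    INDIVIDUAL = "individual student"
    GROUP = "group"


class Basis(Enum):
    BEHAVIOR = "classroom behavior"
    EFFORT = "effort"
    ACHIEVEMENT = "achievement"


class Timing(Enum):
    IMMEDIATE = "immediate"
    LONGER_TERM = "longer term"


# Every cell of the structure: 2 foci x 3 bases x 2 timings = 12 cells.
CELLS = list(product(Focus, Basis, Timing))

# The text names Questions 2, 3, and (perhaps) 6 as on-the-spot decisions.
IMMEDIATE_QUESTIONS = {2, 3, 6}


def classify(question_number: int) -> tuple[Focus, Basis, Timing]:
    """Place one of the six example questions (1-6) in a cell."""
    if not 1 <= question_number <= 6:
        raise ValueError("only the six example questions are covered")
    # Odd-numbered questions concern an individual; even-numbered, the class.
    focus = Focus.INDIVIDUAL if question_number % 2 == 1 else Focus.GROUP
    # Questions 1-2 -> behavior, 3-4 -> effort, 5-6 -> achievement.
    basis = list(Basis)[(question_number - 1) // 2]
    timing = (Timing.IMMEDIATE if question_number in IMMEDIATE_QUESTIONS
              else Timing.LONGER_TERM)
    return focus, basis, timing


if __name__ == "__main__":
    for number in range(1, 7):
        focus, basis, timing = classify(number)
        print(f"Question {number}: {focus.value}, {basis.value}, {timing.value}")
```

Treating Question 6 as immediate follows the worked example in the text; a teacher who has more time for that decision would simply move it to the longer term cell.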
UNDERSTANDING HOW TEACHERS MAKE DECISIONS
On what basis do teachers make decisions? They have several possibilities.
First, they can decide to do what they have always done:
• “How should I teach these students? I should teach them the way I’ve
always taught them. I put a couple of problems on the overhead
projector and work them for the students. Then I give them a worksheet
containing similar problems and tell them to complete the worksheet
and to raise their hands if they have any trouble.”
• “What grade should I assign Billy? Well, if his cumulative point total
exceeds 92, he gets an ‘A.’ If not, he gets a lower grade in accordance
with his cumulative point total. I tell students about my grading scale at
the beginning of the year.”
Teachers who choose to stay with the status quo tend to do so because they
believe what they are doing is the right thing to do, they have become
comfortable doing it, or they cannot think of anything else to do. Decisions that
require us to change often cause a great deal of discomfort, at least initially.
Second, teachers can make decisions based on real and practical constraints,
such as time, materials and equipment, state mandates, and personal frustration:
• “How much time should I spend on this unit? Well, if I’m going to
complete the course syllabus, I will need to get to Macbeth by February
at the latest. That means I can’t spend more than three weeks on this
unit.”
• “How should I teach my students? I would love to incorporate
computer technology. But I only have two computers in my classroom.
What can I do with two computers and 25 students? So I think I’ll just
stay with the ‘tried and true’ until we get more computers.”
• “What can I do to motivate Horatio? I could do a lot more if it weren’t
for those state standards. I have to teach this stuff because the state says
I have to, whether he is interested in learning it or not.”
• “Where does Hortense belong? Anywhere but in my class. I’ve tried
everything I know…talked with the parents…talked with the guidance
counselor. I just need to get her out of my class.”
Although maintaining the status quo and operating within existing constraints
are both viable decision-making alternatives, this is a book about making
decisions based on information about students. At its core, assessment means
gathering information about students that can be used to aid teachers in the
decision-making process.
SOURCES OF INFORMATION
It seems almost trivial to point out that different decisions require different
information. Nonetheless, this point is often forgotten or overlooked by far too
many teachers and administrators. How do teachers get the information about
students that they need to make decisions? In general, they have three
alternatives. First, they can examine information that already exists, such as
information included in students’ permanent files. These files typically include
students’ grades, standardized test scores, health reports, and the like. Second,
teachers can observe students in their natural habitats—as students sit in their
classrooms, interact with other students, read on their own, complete written
work at their desks or tables, and so on. Finally, they can assign specific tasks to
students (e.g., ask them questions, tell them to make or do something) and see
how well they perform these tasks. Let us consider each of these alternatives.
Existing Information
After the first year or two of school, a great deal of information is contained in a
student’s permanent file. Examples include:
• health information (e.g., immunizations, handicapping conditions,
chronic diseases);
• transcripts of courses taken and grades earned in those courses;
• written comments made by teachers;
• standardized test scores;
• disciplinary referrals;
• correspondence between home and school;
• participation in extracurricular activities;
• portions of divorce decrees pertaining to child custody and visitation
rights; and
• arrest records.
This information can be used to make a variety of decisions. Information that a
child is a diabetic, for example, can help a teacher make the proper decision
should the child begin to exhibit unusual behavior. Information about child
custody enables an administrator to make the right decision when a noncustodial
parent comes to school to pick up the child. Information about a child’s grades
can be used to determine whether the child should be placed on the Principal’s
List or Honor Roll. Information about previous disciplinary referrals typically
provides the basis for determining the proper punishment following an incident
of misbehavior. Information obtained from standardized test scores is used in
many schools to decide whether a child should be placed in a class for gifted and
talented students or whether the child is in need of academic assistance.
Although all of these examples pertain to individual students, it is possible
and, in many cases, desirable to combine (or aggregate) the data to provide
information about groups of students. How many students are on the Principal’s
List or Honor Roll? Are there the same numbers of boys and girls? Have these
numbers (or percentages) changed over the past several years? How many
disciplinary referrals have occurred this year? Are the numbers of referrals the
same for Whites, Blacks, Hispanics, Asians, and so on? Are the numbers of
referrals increasing, decreasing, or staying the same? How many students score
in the highest quarter nationally on the standardized tests… in the lowest
quarter…above the national average? Are the scores the same for Whites,
Blacks, Hispanics, Asians, and so on? For boys and girls? On average, are the
scores increasing, decreasing, or remaining the same?
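As a rough illustration of what aggregating such data involves, the following Python sketch tallies Honor Roll membership by gender and by year. The records, the field names, and the function honor_roll_counts are invented for illustration only; they are not data from any school or from this book.

```python
from collections import Counter

# Hypothetical records (invented for illustration; not real data).
RECORDS = [
    {"year": 2000, "gender": "girl", "honor_roll": True},
    {"year": 2000, "gender": "boy", "honor_roll": False},
    {"year": 2000, "gender": "boy", "honor_roll": True},
    {"year": 2001, "gender": "girl", "honor_roll": True},
    {"year": 2001, "gender": "girl", "honor_roll": True},
    {"year": 2001, "gender": "boy", "honor_roll": False},
]


def honor_roll_counts(records, key):
    """Count Honor Roll students, grouped by the given field."""
    return Counter(r[key] for r in records if r["honor_roll"])


if __name__ == "__main__":
    print(honor_roll_counts(RECORDS, "gender"))
    print(honor_roll_counts(RECORDS, "year"))
```

The same grouping idea applies to disciplinary referrals or test-score quarters; only the field being counted changes.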
Interestingly, some administrators and teachers are concerned about the use
of the information contained in students’ permanent records to make decisions.
Specifically, they are concerned that this information may bias the person
accessing it. Because of this bias, for example, a student may be improperly
labeled as a troublemaker and treated accordingly. Alternatively, a teacher may
bring a bias to the information contained in a student’s permanent file. Such a
teacher may search through the file for information supporting his or her
perception that the student is incapable of learning the material being covered in
class. As we shall see throughout this book, the problems of the
misinterpretation and misuse of information are serious indeed. However, these
problems are no more likely to occur with information contained in students’
permanent files than with any other information source.
Naturalistic Observations
There is ample evidence that many of the immediate decisions that teachers
make are based on their observations of students in their classrooms (Clark &
Peterson, 1986). In fact, the available evidence suggests that teachers make
some type of decision every 2 minutes they are in their classrooms and rely
heavily on this observational information to do so (Fogarty, Wang, & Creek,
1982; Jackson, 1968). The logic of this decision-making process is as follows:
1. If my students are not disruptive, they are complying with the
classroom rules.
2. If my students are paying attention, they are probably learning.
3. So, if my students are not disruptive and are paying attention, then I
shall continue to teach the way I am teaching since my instruction is
probably effective.
However, if students are engaged in disruptive behavior or are not paying
attention, then there is a need to do something different. Yet to do something
different requires that a decision about what to do differently be made and carried out. If, for example, a fair number of students have puzzled looks on
their faces, the teacher may decide to go back over the material one more time.
If only 3 of the 25 students in the class have completed a written assignment, the
teacher may decide to give them 10 more minutes to complete it. If Dennis
seems to be daydreaming, the teacher may decide to call on Dennis to answer a
question. Finally, if Denise is sitting at her desk with her hand raised, the teacher
may decide to walk to her desk and give her some help.
When making decisions about groups of students, not every student need be
included in the decision-making process. For example, the aforementioned
puzzled looks may be on the faces of only five or six students. About 30 years
ago, Dahloff (1971) suggested that teachers used steering groups to help them
make decisions about the pace of instruction. These groups, typically composed
of four or five students, literally steer the pacing through the curriculum units. If
it appears that this group of students got it, the teacher moves on. If not, he or
she reviews the material one more time or tries a different approach to get the
material across. Quite obviously, the pace of instruction is directly related to the
academic composition of the steering group. Including higher achieving students
in the steering group results in more rapid pacing; the reverse is true for groups
composed primarily of lower achieving students.
Naturalistic observations are an important and quite reasonable way for
teachers to get the information they need to make decisions because teachers are
constantly engaged in observation. In addition, the feedback they receive from
observations is immediate, unlike test data that may require days, weeks, or
months (in the case of statewide achievement tests or commercial norm-referenced
tests) to process. However, information obtained via naturalistic
observation can be misleading. The puzzled looks may be a ploy on the part of
students to stop the teacher from moving forward. The reason that only 3 of the
25 students have completed the assignment may be that the rest of the students
do not know how to do the work. In this case, giving them 10 additional minutes
without some instructional intervention would be a waste of time. Dennis may
be concentrating, not daydreaming. Denise may be stretching, not raising her
hand for assistance.
Assessment Tasks
Following tradition, if teachers want to know whether their students have
learned what they were supposed to learn, how students feel about what they are
learning, how they perceive their classroom environment, and so on, they
administer quizzes, tests, or questionnaires. These assessment instruments
typically contain a series of items (e.g., questions to be answered, incomplete
sentences to be completed, matches to be made between entries in one column
and those in another). In some cases, the instrument may contain a single item.
In these cases, this item often requires that the student produce an extended response (e.g., write an essay about…; demonstrate that…). To simplify matters,
we refer to all of the items included on these instruments, regardless of their
structure, format, or number, as assessment tasks. Occasionally, when needed to
maintain the flow of the writing, “item” also will be used.
Different tasks may require different responses from students. The nature of
the required response is inherent in the verb included in the task description
(“Write an essay about…”) or in the directions given to students about the tasks
(“Circle the option that…”). In general, these verbs ask students to perform
some action (e.g., write, demonstrate) or select from among possible responses
to the task (e.g., circle, choose). Not surprisingly, the first set of tasks is referred
to as performance tasks, whereas the second set of tasks is referred to as
selection tasks.
Which tasks we should use to get the information we need to make a decision
depends primarily on the type of information we need to make the decision. For
example, if we need information about how well students have memorized the
authors of a series of novels, it seems reasonable to use a selection task—
specifically, one presented to students in a matching format (with titles of novels
listed in one column and novelists listed in another). However, if we need
information about how well students can explain a current event (e.g.,
nationwide increases or decreases in school violence) in terms of various
historical and contemporary factors, a performance task (e.g., a research report)
may be more appropriate. Finally, suppose we want information about how well
students like school. In this case, either selection tasks (such as those included
on traditional attitude scales) or performance tasks (such as a written response to
the prompt “Write a brief essay describing the things you like most and least
about this school”) could be used. This last example illustrates that assessment
tasks are not limited to what traditionally has been termed the cognitive domain.
This is an important point, one that reappears throughout this book.
However, contrary to what you might read elsewhere, certain forms of
assessment tasks are no better or worse than others. There are people who
relentlessly bash multiple-choice tests. There are those who advocate
performance assessment with what can only be termed religious zeal. Of course,
there are those who believe that any form of standardized testing is bad. Based
on my 30 years of experience, I have learned an important lesson (one that you
hopefully will learn in much less time). Assessment tasks are like tools in a
carpenter’s toolbox. Like a good carpenter, a good assessor has a variety of tools
that he or she learns to use well to accomplish the intended purpose(s). Just as a
carpenter may strive to build the best house possible, a teacher should strive to
make the best decisions possible. One important element of good decision
making is the quality of information on which the decision is based.
Before we move to a discussion of the quality of information obtained via
assessment, however, one final comment about assessment tasks is in order.
Many assessment tasks look much like what may be termed learning tasks.
Consider the following task.
You find the following artifacts during an archeological dig.
[Pictures of six artifacts are shown here]. Determine the likely
purpose and origin of each artifact. Considering all six artifacts,
describe the likely traits of the people who made or used them.
(Adapted from http://www.relearning.org/.)
This certainly is a task. Specifically, the students are to examine the six artifacts
and, based on this examination, (a) determine their likely purpose and origin,
and (b) describe the likely traits of the people who made or used them. Yet, is
this an assessment task? Unfortunately, you cannot answer this question by
looking at it, no matter how closely, carefully, or often. To answer this question,
you have to know or infer why the task was given to the students in the first
place. If the task were given to help students learn how to determine the
purposes and origins of artifacts, and how to determine the likely traits of the
people who made or used them, then it is a learning task (because it is intended
to help students learn). In contrast, if it is given to see how well students have
learned how to determine the purposes and origins of artifacts, and the likely
traits of the people who made or used them after some period of instruction, then
it would be an assessment task.
The confusion between learning tasks and assessment tasks looms large in
many classrooms because tasks are such an integral part of classroom
instruction. If you enter almost any classroom, you are likely to see students
completing worksheets, solving problems contained in textbooks, constructing
models of theaters or atoms, or engaging in experiments. Because they are
assigned to students, these tasks are often called assignments (which is
shorthand for assigned tasks).
On the surface, the issue here is quite simple. Whatever they are called, are
they given to promote or facilitate learning or are they given to assess how well
learning has occurred? In reality, however, the issue is quite complex. Teachers
often assess student learning while students are engaged in learning tasks. In this
situation, the task serves both learning (for the students) and assessment (for the
teacher) purposes.
Consider the archeological dig example previously mentioned. Suppose for a
moment that it truly is a learning task. That is, the task is intended to help
students learn how to examine historical artifacts in terms of their purposes and
origins, as well as the traits of the people who made or used them. Suppose
further that students are to work on this task in pairs. As they work, the teacher
circulates among the students visually monitoring their progress or lack thereof.
As problems are noted via this observational assessment, the teacher stops and
offers suggestions, hints, or clues. At the end of the class period, the teacher collects the assignment, reads through what the students have written, writes
comments, and offers suggestions for improvement. At the start of the next class
period, the teacher gives the assignment back to the students and tells them to
revise their work based on the feedback he or she has provided them.
Some would argue this is the perfect blend of instruction and assessment
because the task serves both purposes: learning and assessment. Others have
argued that the link between instruction and assessment is so tight in this
situation that there is no independent assessment of whether the intended
learning actually occurred (Anderson et al., 2001). Because teachers often
provide assistance to students as they work on learning tasks, the quality of
students’ performance on learning tasks is influenced by the students as well as
their teachers. In other words, when assessments are made based on student
performance on learning tasks (rather than specifically designated assessment
tasks), teachers are simultaneously assessing the quality of student learning and
their own teaching.
THE QUALITY OF INFORMATION
Before there were classroom assessment books, there were tests and
measurement books. If you were to read these tests and measurement books, you
would find chapters, sections of chapters, or, occasionally, multiple chapters
written about validity, reliability, and objectivity. Unfortunately, the chapter titles
in these books are sometimes misleading. For example, you may find a chapter
entitled “Test Validity.” The title suggests that validity is inherent in the tests
themselves. This simply is not true. Validity pertains to the test scores. Stated
somewhat differently, validity is an indicator of the quality of the information
obtained by administering a test to a student or group of students.
All three concepts—validity, reliability, and objectivity—have to do with the
quality of the information obtained from tests or other assessment instruments or
methods. Because these concepts have long historical standing in the field, we
review each of them briefly. The focus of this brief review is on their practical
application to classroom assessment. To aid in the discussion, we rely on two
examples: one concerning individual student achievement and the other
concerning individual student effort.
Validity
In general terms, validity is the extent to which the information obtained from an
assessment instrument (e.g., test) or method (e.g., observation) enables you to
accomplish the purpose for which the information was collected. In terms of
classroom assessment, the purpose is to inform a decision. For example, a teacher wants to decide on the grade to be assigned to a student or a teacher
wants to know what he or she should do to get a student to work harder.
To simplify the grading example, let us assume that we are assigning a grade
based on a student’s performance on a single test. Let us further assume that the
test represents a unit of material that requires about 3 weeks to complete. Finally,
let us assume that we want the grade to reflect how well the student has
achieved the stated unit objectives. What are the validity issues in this example?
First, we want to make sure that the items on the test (i.e., the assessment tasks)
are directly related to the unit objectives. This is frequently termed content
validity. Second, we want to make sure that the proportions of the items related
to the various objectives correspond with the emphasis given to those objectives
in the unit. This is frequently referred to as instructional validity. Third, we want
to assign the grade based on how well the students have mastered the objectives,
not based on how well they perform relative to other students. In common
parlance, we want to make a criterion-referenced, not a norm-referenced
decision. To do this, we need defensible performance standards. The greater the
content validity, the greater the instructional validity, and the more defensible the
performance standards, the greater the overall validity of the information
obtained from the test for the purpose of assigning a grade to a student.
Let us move to the effort example. By phrasing the question the way we did
(i.e., What should I do to get this student to work harder?), we have already
made one decision—namely, that the student in question does not work very
hard (or certainly as hard as you, the teacher, would like). On what basis did we
arrive at this determination? Typically, the information used to make this
decision comes from naturalistic observations. “The student is easily distracted.”
“The student neither completes nor turns in homework.” The initial validity
question in this case concerns the validity of the inference about lack of effort
made based on the observational information.
Effort is what psychologists refer to as a construct. That is, it is a hypothetical
or constructed idea that helps us make sense of what we see and hear. From an
assessment point of view, constructs must be linked with indicators.
Distractibility and no homework are negative indicators. That is, they are
indicators of a lack of effort. The issue before us is whether they are valid
indicators of a lack of effort. To address this issue, we must consider alternative
hypotheses. Perhaps distractibility is a function of some type of neurological
disorder. Similarly, no homework may be a function of a lack of the knowledge
needed to complete the homework. These alternative hypotheses must be
examined and subsequently ruled out if we are to accept that distractibility and
no homework are valid indicators of a lack of effort. This is the process of
construct validity.
Yet this is only part of the validity puzzle. The question raised by the teacher
is what he or she should do to get the student to work harder. To answer this
question, we need additional information—information not available by means of observation. Specifically, we need to know why the student is not putting
forth sufficient effort. We explore the why question in greater detail throughout
the book.
Reliability
Reliability is the consistency of the information obtained from one or more assessments.
Some writers equate reliability with dependability, which conjures up
a common-sense meaning of the term. A reliable person is a dependable one—a
person who can be counted on in a variety of situations and at various times.
Similarly, reliable information is information that is consistent across tasks,
settings, times, and/or assessors. Because the issue of consistency across assessors
typically falls under the heading of objectivity, it is discussed a bit later.
Let us return to the grading example introduced in the validity section.
Continuing with our assumptions, let us suppose that the test is a mathematics
test and consists of a single item (If .2x+7.2=12, what is the value of x?).
Furthermore, suppose that the major objective of the unit was for students to
learn to solve for unknowns in number sentences (content validity) and that all
but 10% of the time devoted to the unit focused on learning to solve for unknowns
in number sentences (instructional validity). Finally, suppose the grading was
pass or fail. That is, if the student arrives at the right answer (i.e., x=24), he or
she is assigned a grade of pass (regardless of how many other students got the
item right). Otherwise, a grade of fail is assigned.
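For readers who want to check the arithmetic, the single item can be worked through as follows. This Python sketch is ours and is only an illustration; the function names solve_linear and pass_fail are assumptions. It isolates x in a*x + b = c and then applies the pass or fail rule described above. Coefficients are passed as strings so that exact fractions, rather than rounded decimals, are used.

```python
from fractions import Fraction


def solve_linear(a, b, c):
    """Solve a*x + b = c for x using exact rational arithmetic."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("coefficient of x must be nonzero")
    # Subtract b from both sides, then divide both sides by a.
    return (c - b) / a


def pass_fail(student_answer, a, b, c):
    """Criterion-referenced rule: pass only if the answer is exactly right."""
    correct = solve_linear(a, b, c)
    return "pass" if Fraction(student_answer) == correct else "fail"


if __name__ == "__main__":
    # The item from the text: .2x + 7.2 = 12
    print(solve_linear("0.2", "7.2", "12"))     # 24
    print(pass_fail("24", "0.2", "7.2", "12"))  # pass
    print(pass_fail("2.4", "0.2", "7.2", "12")) # fail
```

Note that the rule looks only at the student's own answer; how other students did on the item plays no part in the grade.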
Items such as this one would have a high degree of validity for the purpose of
assigning a student a grade. Yet where does a single-item test stand in terms of
reliability? How much confidence would you place in the results of a single-item
test? Hopefully, not a great deal. The use of multiple items permits us to
investigate consistency across tasks. Typically, the greater the number of tasks
(obviously to some limit), the greater the reliability of the information we obtain
from a student’s performance on those tasks, which in turn is used to make
inferences about that student’s achievement on the unit objectives.
We may be interested in other types of consistency of student task
performance. For example, we can readminister the test in 2 weeks to examine
the consistency of the information over time. We may arrange for two
administrative conditions, one in the regular classroom and the other in the
library, to see whether the information remains consistent across settings.
Finally, we can have two different tests, each composed of a random sample of
all possible items that could be written for this objective (e.g., .4x−6=−2; 15x
+10=160; 1.8x−18=0; 3x+2=38; −6x−6=−60; 24x+24=480). By administering
the two tests, we could determine whether students’ scores are the same on the
two tests or whether a student’s score depends on the particular test that was
administered. All of these are ways to examine the reliability of the information
we gather when we assess students.
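The two-test idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the item bank holds only the six example items listed above, the function names (answer, make_two_forms, score) are ours, and the student responses in the usage section are hypothetical.

```python
import random
from fractions import Fraction

# The six example items from the text, written as (a, b, c) for a*x + b = c.
ITEM_BANK = [
    ("0.4", "-6", "-2"),
    ("15", "10", "160"),
    ("1.8", "-18", "0"),
    ("3", "2", "38"),
    ("-6", "-6", "-60"),
    ("24", "24", "480"),
]


def answer(item):
    """Exact solution of a*x + b = c for one item."""
    a, b, c = (Fraction(value) for value in item)
    return (c - b) / a


def make_two_forms(bank, seed=0):
    """Randomly split the bank into two non-overlapping forms of equal size."""
    shuffled = list(bank)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def score(form, responses):
    """Number of items on the form that the student answered correctly."""
    return sum(
        1 for item in form
        if item in responses and Fraction(responses[item]) == answer(item)
    )


if __name__ == "__main__":
    form_a, form_b = make_two_forms(ITEM_BANK)
    # Hypothetical student who answers every item correctly.
    responses = {item: answer(item) for item in ITEM_BANK}
    print(score(form_a, responses), score(form_b, responses))
```

If a student's score on one form differs markedly from the score on the other, the score depends on which test happened to be administered, which is exactly the inconsistency described above.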
Turning to the effort example, the initial determination that the student was
lacking effort can be subjected to the same type of examination. We can observe
the student in different classes (settings), learning different subjects (tasks), and
in the morning and the afternoon (time). By comparing the different results, we
may learn that the student is distractible and fails to turn in completed
homework only when he or she is in my class. In other words, I may be a source
of the problem. If so, this increased understanding moves me toward answering
the primary question: What can I do to get this student to work harder?
Unreliability is a problem because it creates errors. More specifically,
unreliability decreases the precision with which the measurement is made. Quite
clearly, imprecise measurement is not a good thing. In classroom assessment,
however, as the previous example shows, a lack of reliability (i.e., consistency)
may be informative. Knowing what accounts for the inconsistency of the
information that we collect may provide the understanding we need to begin to
solve the problem and arrive at a defensible decision.
Most of the time, however, having reliable information is a good thing. When
grades need to be assigned, having consistency of student performance across
assignments and tests makes the decision to assign a grade of, say, “B” more
defensible. How do we justify a grade of “B” on a report card if the student has
received grades of “A” on all homework and three quizzes and a grade of “C”
on the two projects? We can argue that it is best to compute an average when
confronted with such inconsistency, but that does not deal with the inconsistency
per se. The inconsistency can result from differential validity (i.e., the quizzes
and projects are measuring different things) or a lack of reliability (assuming the
same thing is being measured). Regardless of the cause of inconsistency, it must
be dealt with in some way.
Reliability of assessment information is particularly important when
potentially life-altering decisions are made about students. For example, a
decision to classify a student as a special education student must be based on
information that is remarkably consistent over tasks, time, settings, and
observers/assessors.
The point here is that, like validity, the reliability of information must be
examined in the context of the purpose for which the information is gathered. A
single case of documented sexual harassment by one student of another is
generally sufficient grounds for suspension, expulsion, and, perhaps, criminal
prosecution. For other qualities and characteristics (e.g., laziness), however,
information taken from several occasions and situations is needed.
Objectivity
In the field of tests and measurement, objectivity means that the scores assigned by different people to students’ responses to items included on a quiz, test, homework assignment, and so on are identical or, at the very least, highly similar. If a student is given a multiple-choice test that has an accompanying answer key, then anyone using the answer key to score the tests should arrive at the same score. Hence, multiple-choice tests (along with true-false tests, matching tests, and most short answer tests) are referred to as objective tests.
Once again, as in the case of test validity, this is a bit of a misnomer. It is the
scores on the tests, not the tests per se, that are objective.
The importance of objectivity in assessment can perhaps best be understood
if we consider the alternative. For example, suppose that a student’s score on a
particular test depends more on the person scoring the test than it does on the
student’s responses to the assessment tasks. Quite clearly, this is not an
acceptable condition.
As might be expected, concerns for objectivity are particularly acute with
performance tasks (e.g., essays, projects). However, there are several ways to
increase the objectivity of scoring the responses made by students to these
assessment tasks. The three major ways are to:
• use a common set of scoring or evaluation criteria;
• use a scoring rubric and so-called anchors to enhance the meaning of
each criterion (see chap. 4); and
• provide sufficient training to those responsible for doing the scoring.
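Whether these steps have worked can be checked by comparing the scores two trained scorers assign to the same set of responses. The sketch below is one possible check, not a method taken from the text: it reports the proportion of exact agreement and Cohen's kappa, a standard chance-corrected agreement statistic. The rubric scores are hypothetical.

```python
from collections import Counter


def exact_agreement(rater_1, rater_2):
    """Proportion of responses given the same score by both raters."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    return matches / len(rater_1)


def cohens_kappa(rater_1, rater_2):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_1)
    p_observed = exact_agreement(rater_1, rater_2)
    counts_1 = Counter(rater_1)
    counts_2 = Counter(rater_2)
    # Chance agreement: probability both raters pick the same category
    # if each assigned scores independently at their own base rates.
    p_chance = sum(
        (counts_1[c] / n) * (counts_2[c] / n)
        for c in set(counts_1) | set(counts_2)
    )
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Hypothetical rubric scores (1 to 4) given to ten essays by two teachers.
    teacher_a = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
    teacher_b = [4, 3, 2, 2, 4, 1, 3, 3, 4, 2]
    print(f"Exact agreement: {exact_agreement(teacher_a, teacher_b):.2f}")
    print(f"Cohen's kappa:   {cohens_kappa(teacher_a, teacher_b):.2f}")
```

Low agreement would suggest that the criteria, the anchors, or the training need further attention before the scores are used.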
Within the larger assessment context, the concept of objectivity can, and
probably should, be replaced by the concept of corroboration (see chap. 7). This
is quite evident in the sexual harassment example mentioned earlier. Although it
only takes one instance of sexual harassment to render a decision of school
suspension or expulsion, we must be certain that the information we have
concerning that instance is corroborated by others.
ISSUES IN THE ASSESSMENT OF STUDENTS
Because assessment is linked to decision making, and because an increasing
number of decisions made about students have serious, long-term consequences,
teachers must take assessment seriously. In this section, we consider three issues
teachers should be aware of and must address in some fashion: (a) ethics of
assessment, (b) preparing students for assessment, and (c) standardization and
accommodation.
The Ethics of Assessment
Ethical matters pertain to all types of assessment by virtue of the fact that
information about individual students is collected. In fact, because teaching is
partly a moral enterprise, ethics pertain to virtually every aspect of classroom
life. Both the National Education Association and the American Federation of Teachers have issued ethical standards for teachers’ relations with their students.
Four of these are particularly applicable to formal assessment.
First, strive to obtain information that is highly valid, reliable, and objective
before making important decisions that affect students. For example, a semester
grade should be based on a student’s performance on assessment tasks that are
clearly linked with important learning objectives. Similarly, referring a student
for additional testing or for placement in a particular program requires that
assessment information is obtained in multiple situations and on several
occasions.
Second, recognize the limitations inherent in making decisions based on
information obtained from assessments. Despite all efforts to secure the most
valid, reliable, and objective information, errors occur and mistakes are made.
Hence, multiple sources of information generally provide a more defensible
basis for making important decisions about students. Corroboration, consistency,
and concern for consequences are three Cs that are extremely useful in good
decision making (see chap. 7).
Third, do not use information obtained from assessments to demean or
ridicule students. Although this should go without saying, it is important to keep
in mind. In an attempt to motivate a student to do better in the future, a teacher
may make statements that can be construed as demeaning by the students (e.g.,
“Do you enjoy always getting the lowest score in the class?”). Also, out of
frustration, teachers may make comments they later regret. Following a
classroom observation, I remember asking one teacher if she was unaware that
one student slept through the entire lesson. Her response, overheard by several
students, was, “It’s better to let sleeping dogs lie.”
Fourth, do not disclose assessment information about students unless
disclosure serves a compelling professional purpose or is required by law.
Students have the right of privacy. The Family Educational Rights and Privacy
Act (FERPA), enacted in 1974, was intended to protect students and their
parents from having their personal and academic records made public without
their consent. FERPA requires that:
• Student records must be kept confidential among the teacher, student, and student’s parents or guardian.
• Written consent must be obtained from the student’s parents before disclosing that student’s records to a third party.
• Parents must be allowed to challenge the accuracy of the information kept in their children’s records.
• Students who have reached the age of 18 must be accorded the same rights formerly granted to their parents.
It is important to note that the term records as used in the legislation includes
such things as hand-written notes, grade books, computer printouts, taped
interviews, and performances (Gallagher, 1998).
Preparing Students for Assessment
To obtain the best information possible, students should be prepared for each
formal assessment—that is, an assessment consisting of a set of assessment
tasks. A set of guidelines that can help teachers prepare students for formal
assessments is shown in Table 1.1.
The first guideline is that students should be made aware of the assessment.
They should know when it will take place as well as its purpose, structure,
format, and content. Students should be given sufficient information to answer
the basic journalistic questions: who, what, where, why, when, and how.
The second guideline concerns the emotional tone established for the
assessment—a tone quite often set by teachers. Students should be told to do
their best (on assessments of achievements) or reminded of the importance of
indicating their true beliefs and feelings (e.g., on affective assessments related to
effort). In addition, they should be reminded of the importance of the assessment
in their own lives. Does one half of their grade depend on their performance?
Are their chances of entering the college of their choice affected by their
performance?
The third through fifth guidelines are concerned with the way in which
students approach the assessment once it begins. They should read and follow
the directions. They should pace themselves, periodically checking their
progress relative to the available time. They should skip over those items they
do not know or understand, returning to them if they have sufficient time.
TABLE 1.1 Guidelines for Preparing Students for Formal Assessments
1. Announce the assessment in advance and inform students of the purpose, structure, format, and content of the assessment instrument or procedure.
2. Approach the assessment positively, yet honestly.
3. Remind students to pay careful attention to the directions and follow the directions exactly.
4. Tell students to pace themselves so they complete the entire assessment.
5. Tell students to skip over items if they do not know the answer and come back to them later, time permitting. If they do get back to those items, tell them to make educated guesses for as many items as possible in the time remaining. (By an educated guess, I mean that students are certain that one or more of the choices of answers is definitely wrong.)
6. For essay tests, tell students to plan and organize their essays before writing them.
7. Tell students to be in good physical and mental condition for the assessment (e.g., to get a good night’s sleep, to eat a good breakfast). If it is a high-stakes assessment, send a note home reminding parents/guardians about the upcoming assessment.
The sixth guideline concerns essay examinations, while the seventh concerns
high-stakes tests. Students should prepare an outline of their major points prior
to writing the essay. If the assessment is high stakes (i.e., the results of the
assessment are likely to have serious consequences for the students), sufficient
rest and nutrition should be encouraged, even to the point of reminding parents
of this.
Some readers might suggest that these guidelines are more important for
secondary school students than for elementary school students. Although this
may have been the case in the past, it is no longer true. Critical decisions are
made about students based on assessment information obtained as early as first
grade. Is this student ready for school? The assessment needed to make this
decision typically focuses on achievement and effort. Does this student have
attention deficit-hyperactivity disorder (ADHD)? The assessment needed to
make this decision focuses primarily on classroom behavior and, to a lesser
extent, effort. Should this student be placed in a special education program?
Depending on the program being considered, this assessment may focus on
achievement, effort, or classroom behavior. An increasing number of decisions
that affect a child’s future, both in and out of school, are made in the early
grades. We need to prepare students for these decisions, and we need to get them
(the decisions) right.
Standardization and Accommodation
There seems to be a great deal of confusion about the term standardization.
Some educators would lead you to believe that the term applies only to high-stakes
tests. Others would have you believe that it applies only to so-called
norm-referenced tests. Still others would have you believe that only commercial
testing companies have standardized tests. In point of fact, the term standardized
applies to virtually all formal assessments.
Standardization simply means that the same set of assessment tasks is given
to all students in the same order under the same assessment conditions. This
definition is consistent with the dictionary definition of standardized—that is,
uniform. Using this definition, most teacher-made tests and quizzes are
standardized, as are most structured observation forms and student self-report
instruments.
There are at least two practical reasons for standardization. First, only one
instrument needs to be prepared. This saves time at the front end of the
assessment process. Second, standardization permits large-group administration
of instruments. This saves time at the back end of the assessment process.
At the same time, however, there have been an increasing number of calls for
non-standardization. Most of these fall within the general category of
accommodation. Let us be clear from the outset. Accommodation deals with the
how, not the what, of assessment. This critical distinction between how and what has been upheld by two federal rulings (Anderson v. Banks, 1981;
Brookhart v. Illinois Board of Education, 1982).
To better understand this distinction, let us look at the kinds of
accommodations that are appropriate. A list of possible accommodations is
shown in Table 1.2. These accommodations can be understood in one of two
ways. First, we can examine them in terms of whether they focus on the
administrative conditions, the way in which the tasks are presented to the
students, or the way in which students respond to the tasks. Within this
framework, Accommodations 1 to 3 address the administrative conditions,
Accommodations 4 to 6 address the presentation of tasks, and Accommodations
7 and 8 address the responses to the tasks.
Second, and perhaps more important, the accommodations can be examined
in terms of the problem they are trying to solve. When examined in this way, the
following pattern emerges:
• Auditory difficulties (Accommodation 6);
• Visual difficulties (Accommodations 4, 5, 6, and 8);
• Time constraint difficulties (Accommodation 1);
• Behavioral/anxiety difficulties (Accommodations 2, 4, and 7); and
• Memory difficulties (Accommodation 3).
TABLE 1.2 Possible Student Accommodations
1. Provide extra time for the assessment.
2. Change the setting of the assessment to cut down on distractions.
3. Make the assessment “open book,” including notes.
4. Read directions orally and give students ample opportunity to ask questions.
5. Read questions to students.
6. Use a special edition of the assessment (e.g., large print, audiotapes).
7. Give examples of how to respond to the tasks.
8. Permit students to respond orally, audiotaping their responses.
This way of examining the issue of accommodation returns us to one of the
major points made earlier in this chapter. Solving problems requires that we
move beyond description of the problem to an explanation of its causes. Using a
large-print edition of the assessment or reading the directions and/or questions to
students is only appropriate for students with visual or reading difficulties. It is
unlikely to help those students whose problems stem from auditory, time
constraint, behavioral/anxiety, or memory difficulties.
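The matching of accommodations to the hypothesized cause of a problem can be written out as a simple lookup. The sketch below only restates Table 1.2 and the bulleted pattern above in Python; the dictionary and function names are hypothetical, and the lookup is an illustration rather than a decision tool.

```python
# Accommodations as numbered in Table 1.2.
ACCOMMODATIONS = {
    1: "Provide extra time for the assessment.",
    2: "Change the setting of the assessment to cut down on distractions.",
    3: "Make the assessment 'open book,' including notes.",
    4: "Read directions orally and give students ample opportunity to ask questions.",
    5: "Read questions to students.",
    6: "Use a special edition of the assessment (e.g., large print, audiotapes).",
    7: "Give examples of how to respond to the tasks.",
    8: "Permit students to respond orally, audiotaping their responses.",
}

# Problem-to-accommodation pattern as listed in the text.
DIFFICULTY_TO_ACCOMMODATIONS = {
    "auditory": [6],
    "visual": [4, 5, 6, 8],
    "time constraint": [1],
    "behavioral/anxiety": [2, 4, 7],
    "memory": [3],
}


def suggest_accommodations(difficulty):
    """Return (number, description) pairs listed for a hypothesized difficulty."""
    numbers = DIFFICULTY_TO_ACCOMMODATIONS.get(difficulty.lower())
    if numbers is None:
        raise KeyError(f"no accommodations listed for {difficulty!r}")
    return [(n, ACCOMMODATIONS[n]) for n in numbers]


if __name__ == "__main__":
    for number, description in suggest_accommodations("behavioral/anxiety"):
        print(f"{number}. {description}")
```

Organizing the lookup by cause rather than by accommodation mirrors the point of the paragraph above: the accommodation is chosen after the cause of the difficulty has been hypothesized, not before.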
One final matter concerning non-standardization must be addressed before
moving to the next chapter. In addition to matching the hypothesized cause of
the problem, the reasonableness of some accommodations depends on the nature
of the assessment per se. If we are assessing reading comprehension, then
reading the items to students leads to an invalid assessment. If accommodations
are to be made, then the what of assessment needs to be changed: Reading
comprehension would have to be replaced with “understanding text” (Gredler,
1999, p. 254). If we are assessing students’ ability to recall mathematical or
scientific formulas, then the accommodation of giving students the formulas to
“lessen reliance on memory” (Airasian, 1997, p. 204) would be inappropriate.
However, if we are assessing students’ ability to apply mathematical or scientific
formulas, then this accommodation would be quite appropriate.
Unfortunately, the topic of accommodation is discussed primarily in the
context of special education. As Phillips (1994) noted, however, allowing all
students access to useful accommodations may be fair to low-achieving students
as well. Although by law they are not entitled to supportive accommodations,
they often would benefit by having this opportunity to demonstrate their
capabilities within traditional, standardized assessment settings.
ASSESSMENT AND DECISION MAKING
As mentioned earlier in this chapter, high-quality information does not
guarantee that the wisest of decisions are made. At the same time, however, wise
decisions generally require high-quality information. So what is the relationship
between assessment and decision making? Where does evaluation fit into this
picture? Once again, the answers to these questions depend on the type of
decision that needs to be made.
In addition to the difference among decisions shown in Fig. 1.1, decisions can
also be differentiated in terms of what might be termed straightforward
decisions and what might be termed problematic decisions. Straightforward
decisions are those that can reasonably be made based on the information
available at the time. Decisions pertaining to the grading of students tend to be
straightforward decisions. Other rather straightforward decisions are how to
arrange classrooms, where to focus time and effort, and whether to seek advice
and counsel from others.
Problematic decisions, in contrast, are those that typically require information
beyond the initial information available. Decisions as to how best to motivate
students tend to be problematic decisions. We may make a fairly straightforward
inference that a student is unmotivated based on observational data. The student
does not come to class and, when in class, he or she sleeps. The student rarely
hands in assignments and the assignments turned in are of poor quality. In
common parlance, we have identified the problem. However, what we do to solve the problem is not at all straightforward. How do we go about arranging
conditions in which the student would likely be more motivated? Other
problematic decisions include:
• What should I teach students in the limited time available?
• How much time should I spend on a particular unit or topic?
• What should I do to help students who are having serious and
continuous difficulty learning?
Quite clearly, no decision falls neatly into either category. There is some overlap
between them (i.e., a gray area, so to speak). In addition, there are decisions that
are straightforward to a point and then become problematic. However, the
dichotomy between straightforward and problematic decisions is a useful place
to start the discussion of the relationships among assessment, decision making,
and evaluation.
Assessment and Straightforward Decisions
If a teacher decides that 94 points earned during a semester results in a grade of
“A” being assigned, and if a particular student earns 94 points, then the decision
to give the student a grade of “A” is straightforward. (We are, of course,
assuming that the 94 points are earned on the basis of reasonably valid, reliable,
and objective assessment instruments.) Similarly, suppose a teacher examines the
item response summary for a particular commercial mathematics test
administered to his or her students. The teacher notices that his or her students
do far worse on the computation items than on the concept and problem-solving
items. A reasonably straightforward decision may be to spend more time and
effort on mathematical computation next year.
Although these decisions are rather straightforward, it is important to note
that there is some gray area even in these decisions. In terms of grading, what do
you do with a student who earns 93.5 points? Should you round a 93.5 to a 94
(hence, an “A”) or is a “miss as good as a mile” (hence, a “B”)? What about
assigning an A− or a B+? Similarly, with respect to the use of the data from the
item response summary, do I want to shift my limited instructional time away
from conceptual understanding and problem solving to computation (which is
quite reasonable if my desire is to improve student test performance) or do I
want to stay the course (because I believe that conceptual understanding and
problem solving are far more important than computational accuracy)?
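The grading example shows that even a fixed cutoff hides a policy choice. The sketch below, which assumes the 94-point cutoff from the text and a hypothetical function name, makes the rounding policy an explicit parameter so that the gray area is decided deliberately rather than by accident. Python's built-in round() rounds halves to the nearest even number, so a small helper is used here to round halves upward.

```python
import math


def round_half_up(points: float) -> int:
    """Round to the nearest whole point, with .5 always rounding up."""
    return math.floor(points + 0.5)


def earns_a(points: float, cutoff: int = 94, round_up: bool = False) -> bool:
    """Apply the cutoff for an "A" under an explicit rounding policy.

    round_up=False treats a miss as "as good as a mile";
    round_up=True rounds the total to the nearest point first.
    """
    effective = round_half_up(points) if round_up else points
    return effective >= cutoff


if __name__ == "__main__":
    for total in (94, 93.5, 93.4):
        strict = earns_a(total)
        lenient = earns_a(total, round_up=True)
        print(f"{total} points: strict policy A={strict}, rounding policy A={lenient}")
```

Neither policy is the correct one; the point is only that the choice between them should be made, and announced, before the points are tallied.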
Thus, although these decisions are classified as straightforward, they still
require some thought. Almost without exception, decisions involve a choice of
one alternative over another. The road not taken is not necessarily the wrong
road.
Assessment and Problematic Decisions
Calfee and Masuda (1997) suggested that much of classroom assessment is
applied social science research. Social science research studies begin with the
formulation of questions (Questions) and move to the collection of the data
needed to answer the questions (Data Collection). Once data are collected, there
is the need to interpret the data so that the questions can be answered (Data
Interpretation). Ultimately, there is the “job of deciding what to do with the
interpreted findings, the bottom line” (p. 82) (Decision Making). Virtually every
problematic decision mentioned earlier requires a research-oriented or inquiry-based
approach.
As mentioned earlier, assessment begins when teachers pose “should” questions
(Questions). To answer these questions, teachers need information (Data
Collection). To reiterate, the type of assessment depends primarily on the nature
of information that is needed to answer the question. Next, teachers must make
sense of the information. This is what researchers mean by the phrase Data
Interpretation. Once teachers make sense of the information they have available
to them, they have to decide what to do with it (Decision Making). More
specifically, what actions should I take based on the answer to the question I
posed? From an assessment perspective, this is, in fact, the bottom line—when
the decision is made and resultant actions are planned and implemented.
Often, as in the case of the motivation example, the resultant action is to
collect more information. It is important to point out, however, that it is not
more of the same information. Rather, the initial question is likely modified
based on the initial information, and the new information helps the teacher
answer the modified question. The research or inquiry cycle—Question, Data
Collection, Data Interpretation, and Decision Making—begins again.
Assessment and Evaluation
Thus far in this initial chapter, we have emphasized purpose, information, and
decision making. To restate a previously raised question, “Where does
evaluation fit into all of this?” Evaluation is a specific kind of decision.
Specifically, evaluation requires that a judgment be made about the worth or
value of something. Typically, evaluative decisions involve the concept of
goodness: good behavior, good work, and good learning. Interestingly, this trio
of goods underlies the traditional elementary school report card. When behavior
has been good, the student receives a high mark in Conduct. When work has
been good, the student receives a high mark in Effort. Finally, when learning has
been good, the student receives a high mark in Achievement.
Evaluation, then, requires some standard (or standards) of goodness. Is Paul’s
behavior good enough for me to praise it? Is Paula’s effort good enough or do I
need to get on her about it? Is Paulene’s learning good enough to justify a grade
of “A”? Performance standards, whether they pertain to conduct, effort, or achievement, are generally responses to the question, “How much is good
enough?”
CLOSING COMMENT
Decision making is a critical component of effective teaching. In fact, it may be,
as Shavelson (1973) argued, the basic teaching skill. Although good information
does not necessarily produce wise decisions, having access to good information
is certainly an asset for the decision maker. In the remainder of this book, the
emphasis is on the use of classroom assessment to enable teachers to improve their
decision-making capabilities.