Екатерина Щёголева - M.Ed. - Advanced Teaching at the Elementary, Middle School and Secondary School specialization


Anderson, Lorin W. Classroom Assessment: Enhancing the Quality of Teacher Decision Making. Chapter One: Introduction to Classroom Assessment

March 20, 2025


Anderson, Lorin W. Classroom Assessment: Enhancing the Quality of Teacher Decision Making, Taylor & Francis Group, 2002. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/univ-people-ebooks/detail.action?docID=362333.

Created from univ-people-ebooks on 2025-03-20 10:04:33.

Contents

UNDERSTANDING TEACHERS’ DECISIONS
UNDERSTANDING HOW TEACHERS MAKE DECISIONS
SOURCES OF INFORMATION

“What makes a good teacher?” This question has been debated at least since

formal schooling began, if not long before. It is a difficult question to answer

because, as Rabinowitz and Travers (1953) pointed out almost a half-century

ago, the good teacher “does not exist pure and serene, available for scientific

scrutiny, but is instead a fiction of the minds of men” (p. 586). Some have

argued that good teachers possess certain traits, qualities, or characteristics.

These teachers are understanding, friendly, responsible, enthusiastic,

imaginative, and emotionally stable (Ryans, 1960). Others have suggested that

good teachers interact with their students in certain ways and use particular

teaching practices. They give clear directions, ask higher order questions, give

feedback to students, and circulate among students as they work at their desks,

stopping to provide assistance as needed (Brophy & Good, 1986). Still others

have argued that good teachers facilitate learning on the part of their students.

Not only do their students learn, but they also are able to demonstrate their

learning on standardized tests (Medley, 1982). What each of us means when we

use the phrase good teacher, then, depends primarily on what we value in or

about teachers.

Since the 1970s, there has been a group of educators and researchers who

have argued that the key to being a good teacher lies in the decisions that

teachers make:

Any teaching act is the result of a decision, whether conscious or

unconscious, that the teacher makes after the complex cognitive

processing of available information. This reasoning leads to the

hypothesis that the basic teaching skill is decision making.

(Shavelson, 1973, p. 18; emphasis added)

In addition to emphasizing the importance of decision making, Shavelson made

a critically important point. Namely, teachers make their decisions “after the

complex cognitive processing of available information.” Thus, there is an

essential link between available information and decision making. Using the

terminology of educational researchers, information is a necessary, but not

sufficient condition for good decision making. In other words, without

information, good decisions are difficult. Yet simply having the information

does not mean that good decisions are made. As Bussis, Chittenden, and Amarel

(1976) noted:

Decision-making is invariably a subjective, human activity

involving value judgments…placed on whatever evidence is

available…. Even when there is virtual consensus of the “facts of

the matter,” such facts do not automatically lead to decisions

regarding future action. People render decisions; information

does not. (p. 19)

As we see throughout this book, teachers have many sources of information they

can use in making decisions. Some are better than others, but all are typically

considered at some point in time. The critical issue facing teachers, then, is what

information to use and how to use it to make the best decisions possible in the

time available. Time is important because many decisions need to be made

before we have all the information we would like to have.

UNDERSTANDING TEACHERS’ DECISIONS

The awareness that a decision needs to be made is often stated in the form of a

should question (e.g., “What should I do in this situation?”). Here are some

examples of the everyday decisions facing teachers:

1. Should I send a note to Barbara’s parents informing them that she

constantly interrupts the class and inviting them to a conference to

discuss the problem?

2. Should I stop this lesson to deal with the increasing noise level in the

room or should I just ignore it, hoping it will go away?

3. What should I do to get LaKeisha back on task?

4. Should I tell students they will have a choice of activities tomorrow if

they complete their group projects by the end of the class period?

5. What grade should I give Jorge on his essay?

6. Should I move on to the next unit or should I spend a few more days

reteaching the material before moving on?

Although all of these are should questions, they differ in three important ways.

First, the odd-numbered questions deal with individual students, whereas the

even-numbered questions deal with the entire class. Second, the first two

questions deal with classroom behavior, the second two questions with student

effort, and the third two questions with student achievement. Third, some of the

decisions (e.g., Questions 2, 3, and, perhaps, 6) must be made on the spot,

whereas for others (e.g., Questions 1, 4, and, to a certain extent, 5) teachers have

more time to make their decisions. These should questions (and their related

decisions), then, can be differentiated in terms of (a) the focus of the decision

(individual student or group), (b) the basis for the decision (classroom behavior, effort, or achievement), and (c) the timing of the decision (immediate or longer term). This structure of teacher decision making is shown in Fig. 1.1.

Virtually every decision that teachers make concerning their students can be

placed in one of the cells of Fig. 1.1. For example, the first question concerns

the classroom behavior of an individual student, and the teacher can take

some time to make the decision. This question, then, would be placed in the cell

corresponding with classroom behavior (the basis for the decision) of an

individual student (the focus of the decision), with a reasonable amount of time

to make the decision (the timing of the decision). In contrast, the sixth question

concerns the achievement of a class of students and requires the teacher to make

a rather immediate decision. This question, then, would be placed in the cell

corresponding with achievement (the basis for the decision) of a class of

students (the focus of the decision), with some urgency attached to the making

of the decision (the timing of the decision).
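The three dimensions described above can be sketched as a small data structure. This is only an illustration: the names `Focus`, `Basis`, `Timing`, `Decision`, and `cell` are assumptions introduced here, not the author's notation, and the timing assigned to Questions 5 and 6 follows the text's hedged wording ("to a certain extent," "perhaps").

```python
from dataclasses import dataclass
from enum import Enum


class Focus(Enum):
    INDIVIDUAL = "individual student"
    GROUP = "group"


class Basis(Enum):
    BEHAVIOR = "classroom behavior"
    EFFORT = "effort"
    ACHIEVEMENT = "achievement"


class Timing(Enum):
    IMMEDIATE = "immediate"
    LONGER_TERM = "longer term"


@dataclass(frozen=True)
class Decision:
    question: int
    focus: Focus
    basis: Basis
    timing: Timing


# The six example questions, coded the way the text describes them.
DECISIONS = [
    Decision(1, Focus.INDIVIDUAL, Basis.BEHAVIOR, Timing.LONGER_TERM),
    Decision(2, Focus.GROUP, Basis.BEHAVIOR, Timing.IMMEDIATE),
    Decision(3, Focus.INDIVIDUAL, Basis.EFFORT, Timing.IMMEDIATE),
    Decision(4, Focus.GROUP, Basis.EFFORT, Timing.LONGER_TERM),
    Decision(5, Focus.INDIVIDUAL, Basis.ACHIEVEMENT, Timing.LONGER_TERM),
    Decision(6, Focus.GROUP, Basis.ACHIEVEMENT, Timing.IMMEDIATE),
]


def cell(decision):
    """Return the (basis, focus, timing) cell that a decision falls into."""
    return (decision.basis.value, decision.focus.value, decision.timing.value)
```

Calling `cell(DECISIONS[0])` places the first question in the classroom behavior, individual student, longer term cell, matching the example in the paragraph above. Each of the six questions lands in a different cell.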

UNDERSTANDING HOW TEACHERS MAKE DECISIONS

On what basis do teachers make decisions? They have several possibilities.

First, they can decide to do what they have always done:

• “How should I teach these students? I should teach them the way I’ve

always taught them. I put a couple of problems on the overhead

projector and work them for the students. Then I give them a worksheet

containing similar problems and tell them to complete the worksheet

and to raise their hands if they have any trouble.”

• “What grade should I assign Billy? Well, if his cumulative point total

exceeds 92, he gets an ‘A.’ If not, he gets a lower grade in accordance

with his cumulative point total. I tell students about my grading scale at

the beginning of the year.”

Teachers who choose to stay with the status quo tend to do so because they

believe what they are doing is the right thing to do, they have become

comfortable doing it, or they cannot think of anything else to do. Decisions that

require us to change often cause a great deal of discomfort, at least initially.

Second, teachers can make decisions based on real and practical constraints,

such as time, materials and equipment, state mandates, and personal frustration:

• “How much time should I spend on this unit? Well, if I’m going to

complete the course syllabus, I will need to get to Macbeth by February

at the latest. That means I can’t spend more than three weeks on this

unit.”

• “How should I teach my students? I would love to incorporate

computer technology. But I only have two computers in my classroom.

What can I do with two computers and 25 students? So I think I’ll just

stay with the ‘tried and true’ until we get more computers.”

• “What can I do to motivate Horatio? I could do a lot more if it weren’t

for those state standards. I have to teach this stuff because the state says

I have to, whether he is interested in learning it or not.”

• “Where does Hortense belong? Anywhere but in my class. I’ve tried

everything I know…talked with the parents…talked with the guidance

counselor. I just need to get her out of my class.”

Although maintaining the status quo and operating within existing constraints

are both viable decision-making alternatives, this is a book about making

decisions based on information about students. At its core, assessment means

gathering information about students that can be used to aid teachers in the

decision-making process.

SOURCES OF INFORMATION

It seems almost trivial to point out that different decisions require different

information. Nonetheless, this point is often forgotten or overlooked by far too

many teachers and administrators. How do teachers get the information about

students that they need to make decisions? In general, they have three

alternatives. First, they can examine information that already exists, such as

information included in students’ permanent files. These files typically include

students’ grades, standardized test scores, health reports, and the like. Second,

teachers can observe students in their natural habitats—as students sit in their

classrooms, interact with other students, read on their own, complete written

work at their desks or tables, and so on. Finally, they can assign specific tasks to

students (e.g., ask them questions, tell them to make or do something) and see

how well they perform these tasks. Let us consider each of these alternatives.

Existing Information

After the first year or two of school, a great deal of information is contained in a

student’s permanent file. Examples include:

• health information (e.g., immunizations, handicapping conditions,

chronic diseases);

• transcripts of courses taken and grades earned in those courses;

• written comments made by teachers;

• standardized test scores;

• disciplinary referrals;

• correspondence between home and school;

• participation in extracurricular activities;

• portions of divorce decrees pertaining to child custody and visitation

rights; and

• arrest records.

This information can be used to make a variety of decisions. Information that a

child is a diabetic, for example, can help a teacher make the proper decision

should the child begin to exhibit unusual behavior. Information about child

custody enables an administrator to make the right decision when a noncustodial

parent comes to school to pick up the child. Information about a child’s grades

can be used to determine whether the child should be placed on the Principal’s

List or Honor Roll. Information about previous disciplinary referrals typically

provides the basis for determining the proper punishment following an incident

of misbehavior. Information obtained from standardized test scores is used in

many schools to decide whether a child should be placed in a class for gifted and

talented students or whether the child is in need of academic assistance.

Although all of these examples pertain to individual students, it is possible

and, in many cases, desirable to combine (or aggregate) the data to provide

information about groups of students. How many students are on the Principal’s

List or Honor Roll? Are there the same numbers of boys and girls? Have these

numbers (or percentages) changed over the past several years? How many

disciplinary referrals have occurred this year? Are the numbers of referrals the

same for Whites, Blacks, Hispanics, Asians, and so on? Are the numbers of

referrals increasing, decreasing, or staying the same? How many students score

in the highest quarter nationally on the standardized tests… in the lowest

quarter…above the national average? Are the scores the same for Whites,

Blacks, Hispanics, Asians, and so on? For boys and girls? On average, are the

scores increasing, decreasing, or remaining the same?
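As a rough sketch of what aggregating individual records into group-level information might look like, consider the following. The records, field layout, and function names are entirely hypothetical and stand in for whatever a school's permanent files actually contain.

```python
from collections import Counter

# Hypothetical records:
# (student_id, gender, on_honor_roll, disciplinary_referrals)
RECORDS = [
    ("s01", "girl", True, 0),
    ("s02", "boy", False, 2),
    ("s03", "girl", True, 1),
    ("s04", "boy", True, 0),
    ("s05", "boy", False, 3),
]


def honor_roll_by_gender(records):
    """Count Honor Roll students in each gender group."""
    return Counter(gender for _, gender, on_roll, _ in records if on_roll)


def referrals_by_gender(records):
    """Sum disciplinary referrals within each gender group."""
    totals = Counter()
    for _, gender, _, referrals in records:
        totals[gender] += referrals
    return totals
```

The same pattern extends to the other questions in the paragraph (for example, grouping by year to see whether the numbers are increasing or decreasing), provided the records carry the relevant field.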

Interestingly, some administrators and teachers are concerned about the use

of the information contained in students’ permanent records to make decisions.

Specifically, they are concerned that this information may bias the person

accessing it. Because of this bias, for example, a student may be improperly

labeled as a troublemaker and treated accordingly. Alternatively, a teacher may

bring a bias to the information contained in a student’s permanent file. Such a

teacher may search through the file for information supporting his or her

perception that the student is incapable of learning the material being covered in

class. As we shall see throughout this book, the problems of the

misinterpretation and misuse of information are serious indeed. However, these

problems are no more likely to occur with information contained in students’

permanent files than with any other information source.

Naturalistic Observations

There is ample evidence that many of the immediate decisions that teachers

make are based on their observations of students in their classrooms (Clark &

Peterson, 1986). In fact, the available evidence suggests that teachers make

some type of decision every 2 minutes they are in their classrooms and rely

heavily on this observational information to do so (Fogarty, Wang, & Creek,

1982; Jackson, 1968). The logic of this decision-making process is as follows:

1. If my students are not disruptive, they are complying with the

classroom rules.

2. If my students are paying attention, they are probably learning.

3. So, if my students are not disruptive and are paying attention, then I

shall continue to teach the way I am teaching since my instruction is

probably effective.

However, if students are engaged in disruptive behavior or are not paying

attention, then there is a need to do something different. Yet to do something

different requires that a decision about what to do differently be made and carried out. If, for example, a fair number of students have puzzled looks on

their faces, the teacher may decide to go back over the material one more time.

If only 3 of the 25 students in the class have completed a written assignment, the

teacher may decide to give them 10 more minutes to complete it. If Dennis

seems to be daydreaming, the teacher may decide to call on Dennis to answer a

question. Finally, if Denise is sitting at her desk with her hand raised, the teacher

may decide to walk to her desk and give her some help.

When making decisions about groups of students, not every student need be

included in the decision-making process. For example, the aforementioned

puzzled looks may be on the faces of only five or six students. About 30 years

ago, Dahloff (1971) suggested that teachers used steering groups to help them

make decisions about the pace of instruction. These groups, typically composed

of four or five students, literally steer the pacing through the curriculum units. If

it appears that this group of students got it, the teacher moves on. If not, he or

she reviews the material one more time or tries a different approach to get the

material across. Quite obviously, the pace of instruction is directly related to the

academic composition of the steering group. Including higher achieving students

in the steering group results in more rapid pacing; the reverse is true for groups

composed primarily of lower achieving students.

Naturalistic observations are an important and quite reasonable way for

teachers to get the information they need to make decisions because teachers are

constantly engaged in observation. In addition, the feedback they receive from

observations is immediate, unlike test data that may require days, weeks, or

months (in the case of statewide achievement tests or commercial norm-referenced

tests) to process. However, information obtained via naturalistic

observation can be misleading. The puzzled looks may be a ploy on the part of

students to stop the teacher from moving forward. The reason that only 3 of the

25 students have completed the assignment may be that the rest of the students

do not know how to do the work. In this case, giving them 10 additional minutes

without some instructional intervention would be a waste of time. Dennis may

be concentrating, not daydreaming. Denise may be stretching, not raising her

hand for assistance.

Assessment Tasks

Following tradition, if teachers want to know whether their students have

learned what they were supposed to learn, how students feel about what they are

learning, how they perceive their classroom environment, and so on, they

administer quizzes, tests, or questionnaires. These assessment instruments

typically contain a series of items (e.g., questions to be answered, incomplete

sentences to be completed, matches to be made between entries in one column

and those in another). In some cases, the instrument may contain a single item.

In these cases, this item often requires that the student produce an extended response (e.g., write an essay about…; demonstrate that…). To simplify matters,

we refer to all of the items included on these instruments, regardless of their

structure, format, or number, as assessment tasks. Occasionally, when needed to

maintain the flow of the writing, “item” also will be used.

Different tasks may require different responses from students. The nature of

the required response is inherent in the verb included in the task description

(“Write an essay about…”) or in the directions given to students about the tasks

(“Circle the option that…”). In general, these verbs ask students to perform

some action (e.g., write, demonstrate) or select from among possible responses

to the task (e.g., circle, choose). Not surprisingly, the first set of tasks is referred

to as performance tasks, whereas the second set of tasks is referred to as

selection tasks.

Which tasks we should use to get the information we need to make a decision

depends primarily on the type of information we need to make the decision. For

example, if we need information about how well students have memorized the

authors of a series of novels, it seems reasonable to use a selection task—

specifically, one presented to students in a matching format (with titles of novels

listed in one column and novelists listed in another). However, if we need

information about how well students can explain a current event (e.g.,

nationwide increases or decreases in school violence) in terms of various

historical and contemporary factors, a performance task (e.g., a research report)

may be more appropriate. Finally, suppose we want information about how well

students like school. In this case, either selection tasks (such as those included

on traditional attitude scales) or performance tasks (such as a written response to

the prompt “Write a brief essay describing the things you like most and least

about this school”) could be used. This last example illustrates that assessment

tasks are not limited to what traditionally has been termed the cognitive domain.

This is an important point, one that reappears throughout this book.

However, contrary to what you might read elsewhere, certain forms of

assessment tasks are no better or worse than others. There are people who

relentlessly bash multiple-choice tests. There are those who advocate

performance assessment with what can only be termed religious zeal. Of course,

there are those who believe that any form of standardized testing is bad. Based

on my 30 years of experience, I have learned an important lesson (one that you

hopefully will learn in much less time). Assessment tasks are like tools in a

carpenter’s toolbox. Like a good carpenter, a good assessor has a variety of tools

that he or she learns to use well to accomplish the intended purpose(s). Just as a

carpenter may strive to build the best house possible, a teacher should strive to

make the best decisions possible. One important element of good decision

making is the quality of information on which the decision is based.

Before we move to a discussion of the quality of information obtained via

assessment, however, one final comment about assessment tasks is in order.

Many assessment tasks look much like what may be termed learning tasks.

Consider the following task.

You find the following artifacts during an archeological dig.

[Pictures of six artifacts are shown here]. Determine the likely

purpose and origin of each artifact. Considering all six artifacts,

describe the likely traits of the people who made or used them.

(Adapted from http://www.relearning.org/.)

This certainly is a task. Specifically, the students are to examine the six artifacts

and, based on this examination, (a) determine their likely purpose and origin,

and (b) describe the likely traits of the people who made or used them. Yet, is

this an assessment task? Unfortunately, you cannot answer this question by

looking at it, no matter how closely, carefully, or often. To answer this question,

you have to know or infer why the task was given to the students in the first

place. If the task were given to help students learn how to determine the

purposes and origins of artifacts, and how to determine the likely traits of the

people who made or used them, then it is a learning task (because it is intended

to help students learn). In contrast, if it is given to see how well students have

learned how to determine the purposes and origins of artifacts, and the likely

traits of the people who made or used them after some period of instruction, then

it would be an assessment task.

The confusion between learning tasks and assessment tasks looms large in

many classrooms because tasks are such an integral part of classroom

instruction. If you enter almost any classroom, you are likely to see students

completing worksheets, solving problems contained in textbooks, constructing

models of theaters or atoms, or engaging in experiments. Because they are

assigned to students, these tasks are often called assignments (which is

shorthand for assigned tasks).

On the surface, the issue here is quite simple. Whatever they are called, are

they given to promote or facilitate learning or are they given to assess how well

learning has occurred? In reality, however, the issue is quite complex. Teachers

often assess student learning while students are engaged in learning tasks. In this

situation, the task serves both learning (for the students) and assessment (for the

teacher) purposes.

Consider the archeological dig example previously mentioned. Suppose for a

moment that it truly is a learning task. That is, the task is intended to help

students learn how to examine historical artifacts in terms of their purposes and

origins, as well as the traits of the people who made or used them. Suppose

further that students are to work on this task in pairs. As they work, the teacher

circulates among the students visually monitoring their progress or lack thereof.

As problems are noted via this observational assessment, the teacher stops and

offers suggestions, hints, or clues. At the end of the class period, the teacher collects the assignment, reads through what the students have written, writes

comments, and offers suggestions for improvement. At the start of the next class

period, the teacher gives the assignment back to the students and tells them to

revise their work based on the feedback he or she has provided them.

Some would argue this is the perfect blend of instruction and assessment

because the task serves both purposes: learning and assessment. Others have

argued that the link between instruction and assessment is so tight in this

situation that there is no independent assessment of whether the intended

learning actually occurred (Anderson et al., 2001). Because teachers often

provide assistance to students as they work on learning tasks, the quality of

students’ performance on learning tasks is influenced by the students as well as

their teachers. In other words, when assessments are made based on student

performance on learning tasks (rather than specifically designated assessment

tasks), teachers are simultaneously assessing the quality of student learning and

their own teaching.

THE QUALITY OF INFORMATION

Before there were classroom assessment books, there were tests and

measurement books. If you were to read these tests and measurement books, you

would find chapters, sections of chapters, or, occasionally, multiple chapters

written about validity, reliability, and objectivity. Unfortunately, the chapter titles

in these books are sometimes misleading. For example, you may find a chapter

entitled “Test Validity.” The title suggests that validity is inherent in the tests

themselves. This simply is not true. Validity pertains to the test scores. Stated

somewhat differently, validity is an indicator of the quality of the information

obtained by administering a test to a student or group of students.

All three concepts—validity, reliability, and objectivity—have to do with the

quality of the information obtained from tests or other assessment instruments or

methods. Because these concepts have long historical standing in the field, we

review each of them briefly. The focus of this brief review is on their practical

application to classroom assessment. To aid in the discussion, we rely on two

examples: one concerning individual student achievement and the other

concerning individual student effort.

Validity

In general terms, validity is the extent to which the information obtained from an

assessment instrument (e.g., test) or method (e.g., observation) enables you to

accomplish the purpose for which the information was collected. In terms of

classroom assessment, the purpose is to inform a decision. For example, a teacher wants to decide on the grade to be assigned to a student or a teacher

wants to know what he or she should do to get a student to work harder.

To simplify the grading example, let us assume that we are assigning a grade

based on a student’s performance on a single test. Let us further assume that the

test represents a unit of material that requires about 3 weeks to complete. Finally,

let us assume that we want the grade to reflect how well the student has

achieved the stated unit objectives. What are the validity issues in this example?

First, we want to make sure that the items on the test (i.e., the assessment tasks)

are directly related to the unit objectives. This is frequently termed content

validity. Second, we want to make sure that the proportions of the items related

to the various objectives correspond with the emphasis given to those objectives

in the unit. This is frequently referred to as instructional validity. Third, we want

to assign the grade based on how well the students have mastered the objectives,

not based on how well they perform relative to other students. In common

parlance, we want to make a criterion-referenced, not a norm-referenced

decision. To do this, we need defensible performance standards. The greater the

content validity, the greater the instructional validity, and the more defensible the

performance standards, the greater the overall validity of the information

obtained from the test for the purpose of assigning a grade to a student.
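One way to make the first two checks concrete is to tag each test item with the objective it assesses and compare the share of items per objective with the share of instructional emphasis. The sketch below assumes that emphasis can be expressed as a fraction of unit time. The objective labels, the numbers, and the function names are hypothetical.

```python
from collections import Counter


def item_proportions(item_objectives):
    """Share of test items written for each objective."""
    counts = Counter(item_objectives)
    total = len(item_objectives)
    return {objective: n / total for objective, n in counts.items()}


def emphasis_gaps(item_objectives, unit_emphasis):
    """Item share minus instructional emphasis, per objective.

    A positive gap means the test over-represents the objective relative
    to the emphasis it received; a negative gap means under-representation.
    An item tagged with an objective that is absent from unit_emphasis
    (a content validity problem) shows up with an emphasis of 0.
    """
    shares = item_proportions(item_objectives)
    objectives = set(shares) | set(unit_emphasis)
    return {
        obj: round(shares.get(obj, 0.0) - unit_emphasis.get(obj, 0.0), 2)
        for obj in objectives
    }


# Hypothetical 10-item test, each item tagged with its objective.
ITEMS = ["solve"] * 6 + ["graph"] * 3 + ["estimate"] * 1
# Hypothetical share of unit time devoted to each objective.
EMPHASIS = {"solve": 0.5, "graph": 0.3, "estimate": 0.2}
```

Gaps near zero for every objective suggest the test mirrors the unit's emphasis. The third issue, defensible performance standards, is a matter of judgment that a calculation like this does not address.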

Let us move to the effort example. By phrasing the question the way we did

(i.e., What should I do to get this student to work harder?), we have already

made one decision—namely, that the student in question does not work very

hard (or certainly not as hard as you, the teacher, would like). On what basis did we

arrive at this determination? Typically, the information used to make this

decision comes from naturalistic observations. “The student is easily distracted.”

“The student neither completes nor turns in homework.” The initial validity

question in this case concerns the validity of the inference about lack of effort

made based on the observational information.

Effort is what psychologists refer to as a construct. That is, it is a hypothetical

or constructed idea that helps us make sense of what we see and hear. From an

assessment point of view, constructs must be linked with indicators.

Distractibility and no homework are negative indicators. That is, they are

indicators of a lack of effort. The issue before us is whether they are valid

indicators of a lack of effort. To address this issue, we must consider alternative

hypotheses. Perhaps distractibility is a function of some type of neurological

disorder. Similarly, no homework may be a function of a lack of the knowledge

needed to complete the homework. These alternative hypotheses must be

examined and subsequently ruled out if we are to accept that distractibility and

no homework are valid indicators of a lack of effort. This is the process of

construct validity.

Yet this is only part of the validity puzzle. The question raised by the teacher

is what he or she should do to get the student to work harder. To answer this

question, we need additional information—information not available by means of observation. Specifically, we need to know why the student is not putting

forth sufficient effort. We explore the why question in greater detail throughout

the book.

Reliability

Reliability is the consistency of the information obtained from one or more assessments.

Some writers equate reliability with dependability, which conjures up

a common-sense meaning of the term. A reliable person is a dependable one—a

person who can be counted on in a variety of situations and at various times.

Similarly, reliable information is information that is consistent across tasks,

settings, times, and/or assessors. Because the issue of consistency across assessors

typically falls under the heading of objectivity, it is discussed a bit later.

Let us return to the grading example introduced in the validity section.

Continuing with our assumptions, let us suppose that the test is a mathematics

test and consists of a single item (If 0.2x + 7.2 = 12, what is the value of x?).

Furthermore, suppose that the major objective of the unit was for students to

learn to solve for unknowns in number sentences (content validity) and that all

but 10% of the time devoted to the unit focused on learning to solve for unknowns

in number sentences (instructional validity). Finally, suppose the grading was

pass or fail. That is, if the student arrives at the right answer (i.e., x=24), he or

she is assigned a grade of pass (regardless of how many other students got the

item right). Otherwise, a grade of fail is assigned.
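For reference, the single item on this test works out as follows. The steps are added here for illustration and confirm the answer given in the text.

```latex
\begin{align*}
0.2x + 7.2 &= 12 \\
0.2x &= 12 - 7.2 = 4.8 \\
x &= \frac{4.8}{0.2} = 24
\end{align*}
```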

Items such as this one would have a high degree of validity for the purpose of

assigning students a grade. Yet where does a single-item test stand in terms of

reliability? How much confidence would you place in the results of a single-item

test? Hopefully, not a great deal. The use of multiple items permits us to

investigate consistency across tasks. Typically, the greater the number of tasks

(obviously to some limit), the greater the reliability of the information we obtain

from a student’s performance on those tasks, which in turn is used to make

inferences about that student’s achievement on the unit objectives.
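
One common way to put numbers on this idea is the Spearman-Brown formula from classical test theory, which the text itself does not invoke. The following is a minimal sketch, assuming the items are parallel (equally good measures of the same objective) and assuming a hypothetical single-item reliability of 0.30:

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Predicted reliability of a test built from n parallel items."""
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


# Hypothetical single-item reliability; this value is an assumption,
# not a figure from the text.
R_SINGLE = 0.30

for n in (1, 5, 10, 20, 40):
    print(f"{n:>2} items -> predicted reliability {spearman_brown(R_SINGLE, n):.2f}")
```

Under these assumptions the predicted reliability rises quickly at first and then levels off, which matches the caveat that adding tasks helps only "to some limit."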

We may be interested in other types of consistency of student task

performance. For example, we can readminister the test in 2 weeks to examine

the consistency of the information over time. We may arrange for two

administrative conditions, one in the regular classroom and the other in the

library, to see whether the information remains consistent across settings.

Finally, we can have two different tests, each composed of a random sample of

all possible items that could be written for this objective (e.g., .4x−6=−2; 15x

+10=160; 1.8x−18=0; 3x+2=38; −6x−6=−60; 24x+24=480). By administering

the two tests, we could determine whether students’ scores are the same on the

two tests or whether a student’s score depends on the particular test that was

administered. All of these are ways to examine the reliability of the information

we gather when we assess students.
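
The two-test comparison can be sketched in a few lines. The six items are the ones listed above; the split into two three-item forms and the student responses are hypothetical, chosen only to illustrate how scores on the two forms might be compared:

```python
from fractions import Fraction

# Items of the form a*x + b = c, taken from the examples in the text.
ITEMS = [("0.4", "-6", "-2"), ("15", "10", "160"), ("1.8", "-18", "0"),
         ("3", "2", "38"), ("-6", "-6", "-60"), ("24", "24", "480")]


def solve(a, b, c):
    """Solve a*x + b = c exactly using rational arithmetic."""
    return (Fraction(c) - Fraction(b)) / Fraction(a)


answer_key = [solve(*item) for item in ITEMS]

# Assumed split into two forms; the text calls for random samples of items.
form_a_key, form_b_key = answer_key[:3], answer_key[3:]


def score(responses, key):
    """Number of responses that match the answer key."""
    return sum(Fraction(r) == k for r, k in zip(responses, key))


# Hypothetical responses from one student on each form.
student_form_a = ["10", "10", "9"]
student_form_b = ["12", "9", "19"]

print([int(x) for x in answer_key])
print(score(student_form_a, form_a_key), score(student_form_b, form_b_key))
```

If a student's score differs noticeably between the two forms, the score depends on which form was administered, which is exactly the inconsistency this check is meant to reveal.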

Turning to the effort example, the initial determination that the student was

lacking effort can be subjected to the same type of examination. We can observe

the student in different classes (settings), learning different subjects (tasks), and

in the morning and the afternoon (time). By comparing the different results, we

may learn that the student is distractible and fails to turn in completed

homework only when he or she is in my class. In other words, I may be a source

of the problem. If so, this increased understanding moves me toward answering

the primary question: What can I do to get this student to work harder?

Unreliability is a problem because it creates errors. More specifically,

unreliability decreases the precision with which the measurement is made. Quite

clearly, imprecise measurement is not a good thing. In classroom assessment,

however, as the previous example shows, a lack of reliability (i.e., consistency)

may be informative. Knowing what accounts for the inconsistency of the

information that we collect may provide the understanding we need to begin to

solve the problem and arrive at a defensible decision.

Most of the time, however, having reliable information is a good thing. When

grades need to be assigned, having consistency of student performance across

assignments and tests makes the decision to assign a grade of, say, “B” more

defensible. How do we justify a grade of “B” on a report card if the student has

received grades of “A” on all homework and three quizzes and a grade of “C”

on the two projects? We can argue that it is best to compute an average when

confronted with such inconsistency, but that does not deal with the inconsistency

per se. The inconsistency can result from differential validity (i.e., the quizzes

and projects are measuring different things) or a lack of reliability (assuming the

same thing is being measured). Regardless of the cause of inconsistency, it must

be dealt with in some way.

Reliability of assessment information is particularly important when

potentially life-altering decisions are made about students. For example, a

decision to classify a student as a special education student must be based on

information that is remarkably consistent over tasks, time, settings, and

observers/assessors.

The point here is that, like validity, the reliability of information must be

examined in the context of the purpose for which the information is gathered. A

single case of documented sexual harassment by one student of another is

generally sufficient grounds for suspension, expulsion, and, perhaps, criminal

prosecution. For other qualities and characteristics (e.g., laziness), however,

information taken from several occasions and situations is needed.

Objectivity

In the field of tests and measurement, objectivity means that the scores assigned by different people to students’ responses to items included on a quiz, test, homework assignment, and so on are identical or, at the very least, highly similar. If a student is given a multiple-choice test that has an accompanying answer key, then anyone using the answer key to score the tests should arrive at the same score. Hence, multiple-choice tests (along with true-false tests, matching tests, and most short answer tests) are referred to as objective tests.

Once again, as in the case of test validity, this is a bit of a misnomer. It is the

scores on the tests, not the tests per se, that are objective.

The importance of objectivity in assessment can perhaps best be understood

if we consider the alternative. For example, suppose that a student’s score on a

particular test depends more on the person scoring the test than it does on the

student’s responses to the assessment tasks. Quite clearly, this is not an

acceptable condition.
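
One simple way to check whether scores depend on the scorer is to have two people score the same responses and compute how often they agree. The sketch below is an illustration only: the function name, the 0 to 4 rubric, and the scores are all hypothetical, and percent agreement is just one of several possible indices.

```python
def agreement(scores_a, scores_b, tolerance=0):
    """Share of responses on which two scorers differ by at most `tolerance` points."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical essay scores (0-4 rubric) from two scorers for the same five students.
scorer_1 = [4, 3, 2, 4, 1]
scorer_2 = [4, 2, 2, 3, 1]

print(agreement(scorer_1, scorer_2))               # identical scores only
print(agreement(scorer_1, scorer_2, tolerance=1))  # "highly similar" scores
```

The `tolerance` parameter mirrors the definition given earlier: scores that are identical or, at the very least, highly similar.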

As might be expected, concerns for objectivity are particularly acute with

performance tasks (e.g., essays, projects). However, there are several ways to

increase the objectivity of scoring the responses made by students to these

assessment tasks. The three major ways are to:

• use a common set of scoring or evaluation criteria;

• use a scoring rubric and so-called anchors to enhance the meaning of

each criterion (see chap. 4); and

• provide sufficient training to those responsible for doing the scoring.

Within the larger assessment context, the concept of objectivity can, and

probably should, be replaced by the concept of corroboration (see chap. 7). This

is quite evident in the sexual harassment example mentioned earlier. Although it

only takes one instance of sexual harassment to render a decision of school

suspension or expulsion, we must be certain that the information we have

concerning that instance is corroborated by others.

ISSUES IN THE ASSESSMENT OF STUDENTS

Because assessment is linked to decision making, and because an increasing

number of decisions made about students have serious, long-term consequences,

teachers must take assessment seriously. In this section, we consider three issues

teachers should be aware of and must address in some fashion: (a) ethics of

assessment, (b) preparing students for assessment, and (c) standardization and

accommodation.

The Ethics of Assessment

Ethical matters pertain to all types of assessment by virtue of the fact that

information about individual students is collected. In fact, because teaching is

partly a moral enterprise, ethics pertain to virtually every aspect of classroom

life. Both the National Education Association and the American Federation of Teachers have issued ethical standards for teachers’ relations with their students.

Four of these are particularly applicable to formal assessment.

First, strive to obtain information that is highly valid, reliable, and objective

before making important decisions that affect students. For example, a semester

grade should be based on a student’s performance on assessment tasks that are

clearly linked with important learning objectives. Similarly, referring a student

for additional testing or for placement in a particular program requires that

assessment information is obtained in multiple situations and on several

occasions.

Second, recognize the limitations inherent in making decisions based on

information obtained from assessments. Despite all efforts to secure the most

valid, reliable, and objective information, errors occur and mistakes are made.

Hence, multiple sources of information generally provide a more defensible

basis for making important decisions about students. Corroboration, consistency,

and concern for consequences are three Cs that are extremely useful in good

decision making (see chap. 7).

Third, do not use information obtained from assessments to demean or

ridicule students. Although this should go without saying, it is important to keep

in mind. In an attempt to motivate a student to do better in the future, a teacher

may make statements that can be construed as demeaning by the students (e.g.,

“Do you enjoy always getting the lowest score in the class?”). Also, out of

frustration, teachers may make comments they later regret. Following a

classroom observation, I remember asking one teacher if she was unaware that

one student slept through the entire lesson. Her response, overheard by several

students, was, “It’s better to let sleeping dogs lie.”

Fourth, do not disclose assessment information about students unless

disclosure serves a compelling professional purpose or is required by law.

Students have the right of privacy. The Family Educational Rights and Privacy

Act (FERPA), enacted in 1974, was intended to protect students and their

parents from having their personal and academic records made public without

their consent. FERPA requires that:

• Student records must be kept confidential among the teacher, student, and student’s parents or guardian.

• Written consent must be obtained from the student’s parents before disclosing that student’s records to a third party.

• Parents must be allowed to challenge the accuracy of the information kept in their children’s records.

• Students who have reached the age of 18 must be accorded the same rights formerly granted to their parents.

It is important to note that the term records as used in the legislation includes

such things as hand-written notes, grade books, computer printouts, taped

interviews, and performances (Gallagher, 1998).

Preparing Students for Assessment

To obtain the best information possible, students should be prepared for each

formal assessment—that is, an assessment consisting of a set of assessment

tasks. A set of guidelines that can help teachers prepare students for formal

assessments is shown in Table 1.1.

The first guideline is that students should be made aware of the assessment.

They should know when it will take place as well as its purpose, structure,

format, and content. Students should be given sufficient information to answer

the basic journalistic questions: who, what, where, why, when, and how.

The second guideline concerns the emotional tone established for the

assessment—a tone quite often set by teachers. Students should be told to do

their best (on assessments of achievements) or reminded of the importance of

indicating their true beliefs and feelings (e.g., on affective assessments related to

effort). In addition, they should be reminded of the importance of the assessment

in their own lives. Does one half of their grade depend on their performance?

Are their chances of entering the college of their choice affected by their

performance?

The third through fifth guidelines are concerned with the way in which

students approach the assessment once it begins. They should read and follow

the directions. They should pace themselves, periodically checking their

progress relative to the available time. They should skip over those items they

do not know or understand, returning to them if they have sufficient time.

TABLE 1.1 Guidelines for Preparing Students for Formal Assessments

1. Announce the assessment in advance and inform students of the purpose, structure, format, and content of the assessment instrument or procedure.

2. Approach the assessment positively, yet honestly.

3. Remind students to pay careful attention to the directions and follow the directions exactly.

4. Tell students to pace themselves so they complete the entire assessment.

5. Tell students to skip over items if they do not know the answer and come back to them later, time permitting. If they do get back to those items, tell them to make educated guesses for as many items as possible in the time remaining. (By an educated guess, I mean that students are certain that one or more of the choices of answers is definitely wrong.)

6. For essay tests, tell students to plan and organize their essays before writing them.

7. Tell students to be in good physical and mental condition for the assessment (e.g., to get a good night’s sleep, to eat a good breakfast). If it is a high-stakes assessment, send a note home reminding parents/guardians about the upcoming assessment.

The sixth guideline concerns essay examinations, while the seventh concerns

high-stakes tests. Students should prepare an outline of their major points prior

to writing the essay. If the assessment is high stakes (i.e., the results of the

assessment are likely to have serious consequences for the students), sufficient

rest and nutrition should be encouraged, even to the point of reminding parents

of this.

Some readers might suggest that these guidelines are more important for

secondary school students than for elementary school students. Although this

may have been the case in the past, it is no longer true. Critical decisions are

made about students based on assessment information obtained as early as first

grade. Is this student ready for school? The assessment needed to make this

decision typically focuses on achievement and effort. Does this student have

attention deficit-hyperactivity disorder (ADHD)? The assessment needed to

make this decision focuses primarily on classroom behavior and, to a lesser

extent, effort. Should this student be placed in a special education program?

Depending on the program being considered, this assessment may focus on

achievement, effort, or classroom behavior. An increasing number of decisions

that affect a child’s future, both in and out of school, are made in the early

grades. We need to prepare students for these decisions, and we need to get them

(the decisions) right.

Standardization and Accommodation

There seems to be a great deal of confusion about the term standardization.

Some educators would lead you to believe that the term applies only to high-stakes

tests. Others would have you believe that it applies only to so-called

norm-referenced tests. Still others would have you believe that only commercial

testing companies have standardized tests. In point of fact, the term standardized

applies to virtually all formal assessments.

Standardization simply means that the same set of assessment tasks is given

to all students in the same order under the same assessment conditions. This

definition is consistent with the dictionary definition of standardized—that is,

uniform. Using this definition, most teacher-made tests and quizzes are

standardized, as are most structured observation forms and student self-report

instruments.

There are at least two practical reasons for standardization. First, only one

instrument needs to be prepared. This saves time at the front end of the

assessment process. Second, standardization permits large-group administration

of instruments. This saves time at the back end of the assessment process.

At the same time, however, there have been an increasing number of calls for

non-standardization. Most of these fall within the general category of

accommodation. Let us be clear from the outset. Accommodation deals with the

how, not the what, of assessment. This critical distinction between how and what has been upheld by two federal rulings (Anderson v. Banks, 1981;

Brookhart v. Illinois Board of Education, 1982).

To better understand this distinction, let us look at the kinds of

accommodations that are appropriate. A list of possible accommodations is

shown in Table 1.2. These accommodations can be understood in one of two

ways. First, we can examine them in terms of whether they focus on the

administrative conditions, the way in which the tasks are presented to the

students, or the way in which students respond to the tasks. Within this

framework, Accommodations 1 to 3 address the administrative conditions,

Accommodations 4 to 6 address the presentation of tasks, and Accommodations

7 and 8 address the responses to the tasks.

Second, and perhaps more important, the accommodations can be examined

in terms of the problem they are trying to solve. When examined in this way the

following pattern emerges:

• Auditory difficulties (Accommodation 6);

• Visual difficulties (Accommodations 4, 5, 6, and 8);

• Time constraint difficulties (Accommodation 1);

• Behavioral/anxiety difficulties (Accommodations 2, 4, and 7); and

• Memory difficulties (Accommodation 3).

TABLE 1.2 Possible Student Accommodations

1. Provide extra time for the assessment.

2. Change the setting of the assessment to cut down on distractions.

3. Make the assessment “open book,” including notes.

4. Read directions orally and give students ample opportunity to ask questions.

5. Read questions to students.

6. Use a special edition of the assessment (e.g., large print, audiotapes).

7. Give examples of how to respond to the tasks.

8. Permit students to respond orally, audiotaping their responses.

This way of examining the issue of accommodation returns us to one of the

major points made earlier in this chapter. Solving problems requires that we

move beyond description of the problem to an explanation of its causes. Using a

large-print edition of the assessment or reading the directions and/or questions to

students is only appropriate for students with visual or reading difficulties. It is

unlikely to help those students whose problems stem from auditory, time

constraint, behavioral/anxiety, or memory difficulties.
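
The matching of accommodations to hypothesized causes can be written down as a simple lookup. The sketch below only restates Table 1.2 and the pattern listed above; the dictionary and function names are hypothetical and chosen for illustration:

```python
# Accommodations as numbered in Table 1.2.
ACCOMMODATIONS = {
    1: "Provide extra time for the assessment.",
    2: "Change the setting of the assessment to cut down on distractions.",
    3: "Make the assessment 'open book,' including notes.",
    4: "Read directions orally and give students ample opportunity to ask questions.",
    5: "Read questions to students.",
    6: "Use a special edition of the assessment (e.g., large print, audiotapes).",
    7: "Give examples of how to respond to the tasks.",
    8: "Permit students to respond orally, audiotaping their responses.",
}

# Hypothesized cause of the problem -> accommodation numbers, as listed in the text.
BY_DIFFICULTY = {
    "auditory": [6],
    "visual": [4, 5, 6, 8],
    "time constraint": [1],
    "behavioral/anxiety": [2, 4, 7],
    "memory": [3],
}


def suggest(difficulty: str) -> list[str]:
    """Return the accommodations the text associates with a hypothesized cause."""
    return [ACCOMMODATIONS[n] for n in BY_DIFFICULTY[difficulty]]


for line in suggest("visual"):
    print(line)
```

The lookup is keyed by cause rather than by accommodation, which reflects the point of the passage: the explanation of the problem comes first, and the accommodation follows from it.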

One final matter concerning non-standardization must be addressed before

moving to the next chapter. In addition to matching the hypothesized cause of

the problem, the reasonableness of some accommodations depends on the nature

of the assessment per se. If we are assessing reading comprehension, then

reading the items to students leads to an invalid assessment. If accommodations

are to be made, then the what of assessment needs to be changed: Reading

comprehension would have to be replaced with “understanding text” (Gredler,

1999, p. 254). If we are assessing students’ ability to recall mathematical or

scientific formulas, then the accommodation of giving students the formulas to

“lessen reliance on memory” (Airasian, 1997, p. 204) would be inappropriate.

However, if we are assessing students’ ability to apply mathematical or scientific

formulas, then this accommodation would be quite appropriate.

Unfortunately, the topic of accommodation is discussed primarily in the

context of special education. As Phillips (1994) noted, however, allowing all

students access to useful accommodations may be fair to low-achieving students

as well. Although by law they are not entitled to supportive accommodations,

they often would benefit by having this opportunity to demonstrate their

capabilities within traditional, standardized assessment settings.

ASSESSMENT AND DECISION MAKING

As mentioned earlier in this chapter, high-quality information does not

guarantee that the wisest of decisions are made. At the same time, however, wise

decisions generally require high-quality information. So what is the relationship

between assessment and decision making? Where does evaluation fit into this

picture? Once again, the answers to these questions depend on the type of

decision that needs to be made.

In addition to the difference among decisions shown in Fig. 1.1, decisions can

also be differentiated in terms of what might be termed straightforward

decisions and what might be termed problematic decisions. Straightforward

decisions are those that can reasonably be made based on the information

available at the time. Decisions pertaining to the grading of students tend to be

straightforward decisions. Other rather straightforward decisions are how to

arrange classrooms, where to focus time and effort, and whether to seek advice

and counsel from others.

Problematic decisions, in contrast, are those that typically require information

beyond the initial information available. Decisions as to how best to motivate

students tend to be problematic decisions. We may make a fairly straightforward

inference that a student is unmotivated based on observational data. The student

does not come to class and, when in class, he or she sleeps. The student rarely

hands in assignments and the assignments turned in are of poor quality. In

common parlance, we have identified the problem. However, what we do to solve the problem is not at all straightforward. How do we go about arranging

conditions in which the student would likely be more motivated? Other

problematic decisions include:

• What should I teach students in the limited time available?

• How much time should I spend on a particular unit or topic?

• What should I do to help students who are having serious and

continuous difficulty learning?

Quite clearly, no decision falls neatly into either category. There is some overlap

between them (i.e., a gray area, so to speak). In addition, there are decisions that

are straightforward to a point and then become problematic. However, the

dichotomy between straightforward and problematic decisions is a useful place

to start the discussion of the relationships among assessment, decision making,

and evaluation.

Assessment and Straightforward Decisions

If a teacher decides that 94 points earned during a semester results in a grade of

“A” being assigned, and if a particular student earns 94 points, then the decision

to give the student a grade of “A” is straightforward. (We are, of course,

assuming that the 94 points are earned on the basis of reasonably valid, reliable,

and objective assessment instruments.) Similarly, suppose a teacher examines the

item response summary for a particular commercial mathematics test

administered to his or her students. The teacher notices that his or her students

do far worse on the computation items than on the concept and problem-solving

items. A reasonably straightforward decision may be to spend more time and

effort on mathematical computation next year.

Although these decisions are rather straightforward, it is important to note

that there is some gray area even in these decisions. In terms of grading, what do

you do with a student who earns 93.5 points? Should you round a 93.5 to a 94

(hence, an “A”) or is a “miss as good as a mile” (hence, a “B”)? What about

assigning an A− or a B+? Similarly, with respect to the use of the data from the

item response summary, do I want to shift my limited instructional time away

from conceptual understanding and problem solving to computation (which is

quite reasonable if my desire is to improve student test performance) or do I

want to stay the course (because I believe that conceptual understanding and

problem solving are far more important than computational accuracy)?
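
The grading gray area can be made concrete with a small sketch. The 94-point cutoff comes from the example above; the function name and the `round_half_up` switch are hypothetical, and the function deliberately handles only the A/B boundary discussed here:

```python
import math

A_CUTOFF = 94  # cutoff from the example in the text


def letter_grade(points: float, round_half_up: bool) -> str:
    """Return "A" or "B" for the example; other grades are out of scope here."""
    if round_half_up:
        # Explicit half-up rounding, so 93.5 becomes 94.
        points = math.floor(points + 0.5)
    return "A" if points >= A_CUTOFF else "B"


print(letter_grade(94, round_half_up=False))    # straightforward case
print(letter_grade(93.5, round_half_up=True))   # rounding policy
print(letter_grade(93.5, round_half_up=False))  # "a miss is as good as a mile" policy
```

The code applies whichever policy it is given; choosing between the two policies remains the teacher's decision.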

Thus, although these decisions are classified as straightforward, they still

require some thought. Almost without exception, decisions involve a choice of

one alternative over another. The road not taken is not necessarily the wrong

road.

Assessment and Problematic Decisions

Calfee and Masuda (1997) suggested that much of classroom assessment is

applied social science research. Social science research studies begin with the

formulation of questions (Questions) and move to the collection of the data

needed to answer the questions (Data Collection). Once data are collected, there

is the need to interpret the data so that the questions can be answered (Data

Interpretation). Ultimately, there is the “job of deciding what to do with the

interpreted findings, the bottom line” (p. 82) (Decision Making). Virtually every

problematic decision mentioned earlier requires a research-oriented or inquiry-based

approach.

As mentioned earlier, assessment begins when teachers pose “should” questions

(Questions). To answer these questions, teachers need information (Data

Collection). To reiterate, the type of assessment depends primarily on the nature

of information that is needed to answer the question. Next, teachers must make

sense of the information. This is what researchers mean by the phrase Data

Interpretation. Once teachers make sense of the information they have available

to them, they have to decide what to do with it (Decision Making). More

specifically, what actions should I take based on the answer to the question I

posed? From an assessment perspective, this is, in fact, the bottom line—when

the decision is made and resultant actions are planned and implemented.

Often, as in the case of the motivation example, the resultant action is to

collect more information. It is important to point out, however, that it is not

more of the same information. Rather, the initial question is likely modified

based on the initial information, and the new information helps the teacher

answer the modified question. The research or inquiry cycle—Question, Data

Collection, Data Interpretation, and Decision Making—begins again.

Assessment and Evaluation

Thus far in this initial chapter, we have emphasized purpose, information, and

decision making. To restate a previously raised question, “Where does

evaluation fit into all of this?” Evaluation is a specific kind of decision.

Specifically, evaluation requires that a judgment be made about the worth or

value of something. Typically, evaluative decisions involve the concept of

goodness: good behavior, good work, and good learning. Interestingly, this trio

of goods underlies the traditional elementary school report card. When behavior

has been good, the student receives a high mark in Conduct. When work has

been good, the student receives a high mark in Effort. Finally, when learning has

been good, the student receives a high mark in Achievement.

Evaluation, then, requires some standard (or standards) of goodness. Is Paul’s

behavior good enough for me to praise it? Is Paula’s effort good enough or do I

need to get on her about it? Is Paulene’s learning good enough to justify a grade

of “A?” Performance standards—whether they pertain to conduct, effort, or achievement—are generally responses to the question, “How much is good

enough?”

CLOSING COMMENT

Decision making is a critical component of effective teaching. In fact, it may be,

as Shavelson (1973) argued, the basic teaching skill. Although good information

does not necessarily produce wise decisions, having access to good information

is certainly an asset for the decision maker. In the remainder of this book, the

emphasis is on the use of classroom assessment to enable teachers to improve their

decision-making capabilities.