Wednesday, April 02, 2008

New NAEP Comparison Reports Published...Our Question: Why?

The National Assessment of Educational Progress (NAEP) has published two reports that will undoubtedly be used by policymakers (and pundits) to discern something about the quality of (a) state tests, (b) state standards, (c) public education in each state, or (d) all of the above.

Here are the links to the reports:

Comparison Between NAEP and State Mathematics Assessment Results: 2003

Comparison Between NAEP and State Reading Assessment Results: 2003

I went to a session on these comparisons at AERA, and I am glad to say that the first chapter of the report (I've only looked at the math report so far) goes through each of the caveats discussed at that session. A good portion of the first chapter is devoted to all the things this report cannot tell us.

I have to ask this: If so much time has to be spent telling the reader how NOT to use the report, why publish the report at all? It is as if the NAEP folks expected that the reports would not be read in their entirety and that the findings would be misused....

Here are just a few of the many, many caveats in that first chapter of the math report (in the report authors' own words):

  • This report does not address questions about the content, format, or conduct of state assessments, as compared to NAEP. The only information presented in this report concerns the results of testing.
  • The only inference that can be made with assurance is that the schools where students achieve high NAEP mathematics scores are the same schools where students achieve high state assessment mathematics scores. (This is the only reliable finding, yet they printed a 2-volume set of books on the comparisons of math scores?)
  • This report does not necessarily represent all students in each state. It is based only on NAEP and state assessment scores in schools that participated in NAEP. (Approximately 100 schools in each state participate.)
  • NAEP results are for grades 4 and 8, and they are compared to state assessment results for the same grade, an adjacent grade, or a combination of grades. (!!!)
  • This report does not address questions about NAEP and state assessment of individual variation of students' mathematics scores within demographic groups within schools.
  • For most states, this report does not address comparisons of average test scores.
  • The only comparisons in this report are between percentages of students meeting mathematics standards, as measured by NAEP and state assessments. (A footnote documents that even this standard of judgment isn't adhered to for all states.)
  • Comparisons between percentages meeting different standards on two different tests (e.g., proficient as defined by NAEP and proficient as defined by the state assessment) are meaningless. (Okay, once you write that sentence, why don't you just stop writing the report? A quick sketch of why this is true follows the list.)
  • Finally, this report is not an evaluation of state assessments.
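
To see why that one caveat is fatal, here is a back-of-the-envelope sketch in Python. The cut scores below are invented for illustration (they are not the actual NAEP or state cuts), but the point holds for any two tests that draw their proficiency lines in different places: the gap in "percent proficient" measures the cut scores, not the students.

    # A minimal sketch with invented cut scores -- NOT the actual NAEP
    # or state cuts. One population of students, two proficiency lines.
    from statistics import NormalDist

    achievement = NormalDist(mu=0.0, sigma=1.0)  # one population of true scores

    demanding_cut = 0.8   # hypothetical NAEP-style cut: 0.8 SD above the mean
    lenient_cut = -0.3    # hypothetical state-style cut: 0.3 SD below the mean

    pct_demanding = 1 - achievement.cdf(demanding_cut)
    pct_lenient = 1 - achievement.cdf(lenient_cut)

    print(f"'Proficient' on the demanding test: {pct_demanding:.0%}")  # about 21%
    print(f"'Proficient' on the lenient test:   {pct_lenient:.0%}")    # about 62%

Identical students, a roughly 40-point gap in "percent proficient." The gap tells you where each bar was set and nothing else.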
The report also lists non-content factors that may depress correlations between NAEP percentages and state percentages. These include, but are not limited to:
    • differences in grades tested (yes, that's right, these reports compare NAEP scores of 4th or 8th graders to state scores of students in grades 3, 5, 7, or 9);
    • small numbers of students tested (by NAEP or the state assessment) in some small schools. Here the report authors toss a small bone in the direction of the huge problem of the unreliability of percentages (the sketch after this list shows how quickly small samples turn percentages into noise);
    • extremely high or extremely low standards, which would depress variability and thus correlations;
    • differences between the NAEP and state tests in accommodations for students with disabilities and English language learners;
    • differences in motivational contexts -- ya think???
    • differences in the time of year of testing.
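
Since the report only lists these factors, here is a rough simulation (all numbers invented, and the "state test" is deliberately identical to the "NAEP test") of how two of them, small per-school samples and an extremely low cut score, drag down the correlation between NAEP percentages and state percentages even when both tests measure exactly the same thing.

    # A rough simulation with invented numbers. Requires Python 3.10+
    # for statistics.correlation.
    import random
    from statistics import correlation

    random.seed(1)

    def pct_above(school_mean, cut, n):
        """Percent of n simulated students in one school scoring above a cut."""
        return 100 * sum(random.gauss(school_mean, 1.0) > cut for _ in range(n)) / n

    # Roughly 100 schools per state participate in NAEP.
    school_means = [random.gauss(0.0, 0.5) for _ in range(100)]

    # Moderate cuts, comfortable samples: percentages track the schools well.
    naep = [pct_above(m, cut=0.0, n=200) for m in school_means]
    state = [pct_above(m, cut=0.0, n=200) for m in school_means]
    print(f"moderate cuts, 200 students/school: r = {correlation(naep, state):.2f}")

    # Small per-school samples add noise to every percentage; r drops.
    naep = [pct_above(m, cut=0.0, n=25) for m in school_means]
    state = [pct_above(m, cut=0.0, n=25) for m in school_means]
    print(f"moderate cuts, 25 students/school:  r = {correlation(naep, state):.2f}")

    # An extremely low cut pushes most schools near 100 percent, squashing
    # the variability on one axis; r drops again.
    naep = [pct_above(m, cut=0.0, n=200) for m in school_means]
    state = [pct_above(m, cut=-2.5, n=200) for m in school_means]
    print(f"one extremely low cut:              r = {correlation(naep, state):.2f}")

Both "tests" here are the same test. Every drop in r comes from sampling noise or the placement of the cut line, not from anything about what the schools teach.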
Expect to see these caveats completely ignored as these reports get a lot of press and policymaker attention.

1 comment:

Anonymous said...

I'm so glad someone reads these reports! Thanks for your insight on NAEP; at least it reinforces that -- even though I'm not a statistician or psychometrist -- I have a clue. ... wsp