Norm-referenced test

A norm-referenced test is a type of test, assessment, or evaluation in which the tested individual is compared to a sample of his or her peers (referred to as a "normative sample"). The term "normative assessment" refers to the process of comparing one test-taker to his or her peers.

Other types
Alternative to normative testing, tests can be ipsative, that is, the individual assessment is compared to him- or her-self through time.

By contrast, a test is criterion-referenced when provision is made for translating the test score into a statement about the behavior to be expected of a person with that score. The same test can be used in both ways. Robert Glaser originally coined the terms "norm-referenced test" and "criterion-referenced test".

Standards-based education reform is based on the belief that public education should establish what every student should know and be able to do. Students should be tested against a fixed yardstick, rather than against each other or sorted into a mathematical bell curve. By assessing that every student must pass these new, higher standards, education officials believe that all students will achieve a diploma that prepares them for success in the 21st century.

Common use
Most state achievement tests are criterion referenced. In other words, a predetermined level of acceptable performance is developed and students pass or fail in achieving or not achieving this level. Tests that set goals for students based on the average student's performance are norm-referenced tests. Tests that set goals for students based on a set standard (e.g., 80 words spelled correctly) are criterion-referenced tests.

Many college entrance exams and nationally used school tests use norm-referenced tests. The SAT, Graduate Record Examination (GRE), and Wechsler Intelligence Scale for Children (WISC) compare individual student performance to the performance of a normative sample. Test-takers cannot "fail" a norm-referenced test, as each test-taker receives a score that compares the individual to others that have taken the test, usually given by a percentile. This is useful when there is a wide range of acceptable scores that is different for each college. For example one estimate of the average SAT score for Harvard University is 2200 out of 2400 possible. The average for Indiana University is 1650.

By contrast, nearly two-thirds of US high school students will be required to pass a criterion-referenced high school graduation examination. One high fixed score is set at a level adequate for university admission whether the high school graduate is college bound or not. Each state gives its own test and sets its own passing level, with states like Massachusetts showing very high pass rates, while in Washington State, even average students are failing, as well as 80 percent of some minority groups. This practice is opposed by many in the education community such as Alfie Kohn as unfair to groups and individuals who don't score as high as others.

Advantages and limitations
An obvious disadvantage of norm-referenced tests is that it cannot measure progress of the population of a whole, only where individuals fall within the whole. Thus, only measuring against a fixed goal can be used to measure the success of an educational reform program which seeks to raise the achievement of all students against new standards which seek to assess skills beyond choosing among multiple choices. However, while this is attractive in theory, in practice the bar has often been moved in the face of excessive failure rates, and improvement sometimes occurs simply because of familiarity with and teaching to the same test.

With a norm-referenced test, grade level was traditionally set at the level set by the middle 50 percent of scores. By contrast, the National Children's Reading Foundation believes that it is essential to assure that virtually all of our children read at or above grade level by third grade, a goal which cannot be achieved with a norm referenced definition of grade level.

Critics of criterion-referenced tests point out that judges set bookmarks around items of varying difficulty without considering whether the items actually are compliant with grade level content standards or are developmentally appropriate. Thus, the original 1997 sample problems published for the WASL 4th grade mathematics contained items that were difficult for college educated adults, or easily solved with 10th grade level methods such as similar triangles.

The difficulty level of items themselves, as are the cut-scores to determine passing levels are also changed from year to year. Pass rates also vary greatly from the 4th to the 7th and 10th grade graduation tests in some states.

One of the faults of No Child Left Behind is that each state can choose or construct its own test which cannot be compared to any other state. A Rand study of Kentucky results found indications of artificial inflation of pass rates which were not reflected in increasing scores in other tests such as the NAEP or SAT given to the same student populations over the same time.

Graduation test standards are typically set at a level consistent for native born 4 year university applicants. An unusual side effect is that while colleges often admit immigrants with very strong math skills who may be deficient in english, there is no such leeway in high school graduation tests, which usually require passing all sections, including language. Thus, it is not unusual for institutions like the University of Washington to admit strong Asian American or Latino students who did not pass the writing portion of the state WASL test, but such students would not even receive a diploma once the testing requirement is in place.

Although the tests such as the WASL are intended as a minimal bar for high school, 27 percent of 10th graders applying for Running Start in Washington State failed the math portion of the WASL. These students applied to take college level courses in high school, and achieve at a much higher level than average students. The same studyconcluded the level of difficulty was comparable to, or greater than that of tests intended to place students already admitted to the college.

A norm referenced test has none of these problems because it does not seek to enforce any expectation of what all students should know or be able to do other than what actual students demonstrate. Present levels of performance and inequity are taken as fact, not as defects to be removed by a redesigned system. Goals of student performance are not raised every year until all are proficient. Scores are not required to show continuous improvement through Total Quality Management systems.

A rank-based system only produces data which tell which average students perform at an average level, which students do better, and which students do worse. This contradicts the fundamental beliefs, whether optimistic or simply unfounded, that all will perform at one uniformly high level in a standards based system if enough incentives and punishments are put into place. This difference in beliefs underlies the most significant differences between a traditional and a standards based education system.