Making Sense of Testing Terminology
While no child should ever be seen as just a number on a test, psycho-educational evaluations can reveal important information about a person’s learning strengths and weaknesses.
Here’s a brief glossary of important terms you may see in an educational or psychological evaluation report.
Types of Tests
Ability tests measure a person’s potential to learn. IQ tests, such as the Wechsler Intelligence Scales, are one type of ability test. Ability tests may be given 1:1 or in a group setting. Nonverbal ability tests can be used for children who are not native English speakers or who are nonverbal. When the Scholastic Aptitude Test (SAT) is taken by children 13 and younger, the results are generally viewed as a measure of a student’s ability to think critically and solve problems at a level of instruction that they have not yet received.
Achievement tests measure what content a student has learned at a particular grade-level or within a subject area. Achievement tests may be given 1:1 or in a group setting, with children filling in “bubble” answer sheets.
Criterion Referenced Test
Criterion referenced tests, such as the Iowa Test of Basic Skills or the Terra Nova, are a type of achievement test that measures whether a student has mastered material at one specific grade level. These tests tend to be exhaustive in the content they cover and are generally administered over the course of several days.
Norm Referenced Test
Norm referenced tests, such as the Woodcock-Johnson Tests of Achievement, evaluate achievement in a broader manner. Students are allowed to test as high (or as low) as they can go, sampling just a few items of knowledge at each grade level. These tests take much less time to administer.
Formative assessments are informal ways that teachers check to see that a child is learning throughout the days or weeks of instruction. A formative assessment may be a quiz, an exit slip, or another on-the-spot Q&A method. The goal of a formative assessment is to see if the child needs some type of reteaching to help them fully learn the subject matter.
Summative assessments are another method of checking for understanding, but they occur at the end of a unit of study. Traditionally, teachers have used an end-of-chapter test, a midterm, or a final exam as a summative assessment. However, a term paper, an oral report, or a digital presentation can also serve as a summative assessment.
Establishing a basal means that a test-taker answered a set number of questions correctly at the beginning of a sub-test. Once a basal is established, it is presumed that the test-taker would have answered all the easier questions with 100% accuracy. Basals are used to minimize the amount of time spent testing and to prevent testing fatigue. In some cases, the basal is established by beginning with the first question in the sub-test. More often, the test manual will dictate which test question the examinee should start with, based on age or grade.
The term ceiling has more than one meaning. A ceiling is reached when a test-taker has answered a specific number of questions incorrectly in a row. The number of consecutive wrong answers may range from 3 to 6, depending on the test publisher’s protocols. At that point, the sub-test ends. Ceiling also refers to when a test-taker has reached the maximum number of questions available to answer in a sub-test on a Norm Referenced Test. In a case like this, it is important to do a full error analysis of the sub-tests. A younger student who correctly answers all the questions in a sub-test will have a different learning profile than an older student who answers all the questions but gets half of them wrong.
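The first meaning of ceiling can be sketched in a few lines of Python. This is only an illustration, not any publisher's scoring procedure: the function name `administer_until_ceiling`, the list of True/False responses, and the default of 3 consecutive misses are assumptions made for the example.

```python
def administer_until_ceiling(responses, ceiling_run=3):
    """Walk through answers in order and stop at the ceiling.

    `responses` is a list of booleans (True = correct), easiest item first.
    `ceiling_run` is the number of consecutive misses that ends the sub-test
    (hypothetical default; real tests set this in their manuals).
    Returns (items_administered, correct_count, ceiling_reached).
    """
    consecutive_misses = 0
    correct_count = 0
    for item_number, is_correct in enumerate(responses, start=1):
        if is_correct:
            correct_count += 1
            consecutive_misses = 0  # a correct answer resets the run
        else:
            consecutive_misses += 1
            if consecutive_misses == ceiling_run:
                return item_number, correct_count, True
    # The test-taker ran out of questions before missing enough in a row.
    return len(responses), correct_count, False


answers = [True, True, False, True, False, False, False, True]
print(administer_until_ceiling(answers))
```

When the last value returned is False, the test-taker reached the end of the available questions, which is the second meaning of ceiling described above.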
Raw Score
The actual number of correct answers on a sub-test. Classroom teachers often report Raw Scores on quizzes, but you will rarely see them in an individualized testing score report.
Mean
The average score. The Mean score on most individualized standardized achievement and ability tests is 100.
Standard Score
Standard Scores statistically transform a student’s Raw Score into a number that can be compared across groups of students.
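As a rough illustration of that transformation, the sketch below assumes the common scale with a Mean of 100 and a Standard Deviation of 15. Real tests look scores up in age- or grade-based norm tables rather than applying a single formula, so `norm_mean` and `norm_sd` (the average and spread of Raw Scores in the comparison group) are hypothetical inputs.

```python
def standard_score(raw_score, norm_mean, norm_sd, scale_mean=100, scale_sd=15):
    """Convert a Raw Score to a Standard Score (simplified sketch).

    norm_mean and norm_sd describe the Raw Scores of the comparison group;
    both are made-up values here, not taken from any real test.
    """
    # How many Standard Deviations the Raw Score sits above or below the group average.
    z = (raw_score - norm_mean) / norm_sd
    # Re-express that distance on the 100/15 reporting scale.
    return scale_mean + scale_sd * z


# A Raw Score of 34 when the comparison group averages 28 with a spread of 6.
print(standard_score(34, norm_mean=28, norm_sd=6))  # 115.0
```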
Percentile Rank
A score ranging from 1 to 99 that indicates how a student’s score compares to others who have taken the test. Generally, a student earning a Standard Score of 100 ranks at the 50th Percentile. The higher the Percentile Rank, the stronger the student’s ability or achievement level relative to the comparison group.
Composite Score
A cumulative score that reports a student’s performance on a select set of sub-tests. Composite Scores are generally reported as Standard Scores and are more than just an average of the sub-tests.
Standard Deviation (SD)
A statistical term that describes how widely scores spread out from the Mean across a sample of people who have taken the test. On tests with a Mean of 100, Standard Deviations are generally reported in 15-point increments. Occasionally you will see 10-point SD breakdowns. See how Standard Deviations look on the bell curve.
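To show how Standard Deviations relate to Percentile Ranks on the bell curve, here is a minimal sketch. It assumes scores follow a perfect normal distribution with a Mean of 100 and an SD of 15; published tests use their own norm tables, so actual reported percentiles can differ slightly. The function name `percentile_rank` is made up for this example.

```python
from statistics import NormalDist

# Idealized bell curve: Mean of 100, Standard Deviation of 15 (an assumption).
BELL_CURVE = NormalDist(mu=100, sigma=15)


def percentile_rank(standard_score):
    """Approximate Percentile Rank for a Standard Score, kept within 1 to 99."""
    percent_at_or_below = BELL_CURVE.cdf(standard_score) * 100
    return max(1, min(99, round(percent_at_or_below)))


# Two SDs below the Mean, one below, the Mean, one above, two above.
for score in (70, 85, 100, 115, 130):
    print(score, percentile_rank(score))
```

Under these assumptions, a score one Standard Deviation above the Mean (115) lands near the 84th Percentile, and a score two Standard Deviations above (130) lands near the 98th.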
Grade Equivalent (GE)
Often a misunderstood score, GEs do not state that a student is able to perform work at a certain grade level. Instead, a Grade Equivalent score tells us that a student answered questions correctly at the same rate as students in that particular grade. In other words, a 5th grade student earning a 9th grade GE is not necessarily expected to achieve at the 9th grade level in that academic subject. Rather, the student correctly answered the same percentage of questions as 9th grade students did when taking the 5th grade test. Read more about Grade Equivalent scores, with this excellent example of two students of different ages earning the same GE.
Age Equivalent (AE)
Similar to Grade Equivalents. An AE score may be reported instead of a GE for very young children, adults no longer attending school, homeschoolers, or those with asynchronous development.
Stanine
A statistical way of describing test scores that breaks scores into nine groups, each covering a set range of Percentiles. The 5th Stanine includes scores that fall within the 40th through 59th Percentile.
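The nine groups can be sketched as a simple lookup. The cut points below follow the commonly cited split of 4%, 7%, 12%, 17%, 20%, 17%, 12%, 7%, and 4% of test-takers; a given publisher's table may draw the lines slightly differently, and the function name `stanine_from_percentile` is invented for this example.

```python
from bisect import bisect_right

# Lowest Percentile Rank that falls in Stanines 2 through 9 (commonly cited values).
STANINE_CUT_POINTS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile):
    """Return the Stanine (1 to 9) for a Percentile Rank between 1 and 99."""
    if not 1 <= percentile <= 99:
        raise ValueError("Percentile Rank must be between 1 and 99")
    # Count how many cut points the percentile has reached, then shift to 1-9.
    return bisect_right(STANINE_CUT_POINTS, percentile) + 1


print(stanine_from_percentile(40))  # 5
print(stanine_from_percentile(59))  # 5
print(stanine_from_percentile(60))  # 6
```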
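Standard Error of Measurement (SEM)
Offers a range of scores within which a test-taker’s true score is likely to fall. The range is usually reported at a 90-95% level of confidence, which leaves a 5-10% chance that the true score falls outside it.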
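A score range of this kind can be approximated as the reported score plus or minus a multiple of the SEM. The sketch below assumes measurement error is normally distributed and uses a 95% confidence level by default; the SEM value of 3 points is made up for illustration, since each test publishes its own.

```python
from statistics import NormalDist


def score_band(observed_score, sem, confidence=0.95):
    """Return (low, high) bounds around a score for a given confidence level."""
    # Number of SEMs needed on each side, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z * sem
    return observed_score - margin, observed_score + margin


low, high = score_band(100, sem=3)  # hypothetical SEM of 3 points
print(f"{low:.1f} to {high:.1f}")  # 94.1 to 105.9
```

A lower confidence level gives a narrower range: with the same inputs and `confidence=0.90`, the band shrinks to roughly 95 to 105.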
Normal Curve Equivalent (NCE)
Similar to Percentile Ranks, NCEs measure a student’s performance on a particular test on a scale of 1 to 99. NCEs are generally only reported on group achievement tests.
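The difference from Percentile Ranks is that NCEs are spaced in equal steps along the bell curve, while the two scales agree at 1, 50, and 99. A minimal sketch of the usual conversion follows, assuming normally distributed scores and the standard NCE scaling of 21.06 points per Standard Deviation; the function name `nce_from_percentile` is invented for this example.

```python
from statistics import NormalDist

NCE_POINTS_PER_SD = 21.06  # scaling chosen so NCEs of 1, 50, and 99 match those percentiles


def nce_from_percentile(percentile):
    """Approximate Normal Curve Equivalent for a Percentile Rank from 1 to 99."""
    if not 1 <= percentile <= 99:
        raise ValueError("Percentile Rank must be between 1 and 99")
    # Distance from the Mean in Standard Deviations on an idealized bell curve.
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + NCE_POINTS_PER_SD * z


for pr in (1, 25, 50, 75, 99):
    print(pr, round(nce_from_percentile(pr), 1))
```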