Computerized classification test

Overview
A computerized classification test (CCT) is, as its name suggests, a test administered by computer for the purpose of classifying examinees. The most common CCT is a mastery test that classifies examinees as "Pass" or "Fail," but the term also covers tests that classify examinees into more than two categories. Although the term can refer to any computer-administered test used for classification, it usually refers to tests that are interactively administered or of variable length, similar to computerized adaptive testing (CAT). Like CAT, a variable-length CCT can accomplish the goal of the test (accurate classification) with a fraction of the number of items used in a conventional fixed-form test.

A CCT requires several components:

1. An item bank calibrated with a psychometric model selected by the test designer

2. A starting point

3. An item selection algorithm

4. A termination criterion and scoring procedure

The starting point is not a topic of contention; research on CCT primarily investigates the application of different methods for the other three components. Note that the termination criterion and scoring procedure are separate in CAT but coincide in CCT, because the test is terminated when a classification is made. A CAT therefore requires five components to be specified, rather than the four listed above.

An introduction to CCT is found in Thompson (2006), with more detail in Thompson (2007) and a book by Parshall, Spray, Kalohn and Davey (2006). A bibliography of published CCT research is found below.

How a CCT Works
A CCT is very similar to a CAT. Items are administered one at a time to an examinee. After the examinee responds to an item, the computer scores it and determines whether the examinee can be classified yet. If so, the test is terminated and the examinee is classified. If not, another item is administered. This process repeats until the examinee is classified or another stopping condition is met (all items in the bank have been administered, or a maximum test length is reached).
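The loop described above can be sketched in a few lines of Python. This is only an illustrative skeleton: the function name run_cct and its hooks (select_item, get_response, classify) are hypothetical names introduced here, and the toy classification rule in the usage example is a placeholder, not one of the termination criteria discussed later.

```python
def run_cct(bank, select_item, get_response, classify, max_items):
    """Administer items one at a time until a classification is made.

    All arguments are hypothetical hooks supplied by the caller:
      bank         list of item identifiers
      select_item  (remaining_items, history) -> next item to administer
      get_response item -> scored response (1 = correct, 0 = incorrect)
      classify     history -> category label, or None if still undecided
      max_items    maximum test length
    """
    remaining = list(bank)
    history = []  # (item, scored response) pairs, in order of administration
    while remaining and len(history) < max_items:
        item = select_item(remaining, history)
        remaining.remove(item)
        history.append((item, get_response(item)))
        decision = classify(history)
        if decision is not None:
            return decision, len(history)
    # Bank exhausted or maximum length reached without a classification.
    return None, len(history)


def toy_classify(history):
    """Placeholder rule: three correct means Pass, three incorrect means Fail."""
    correct = sum(score for _, score in history)
    incorrect = len(history) - correct
    if correct >= 3:
        return "Pass"
    if incorrect >= 3:
        return "Fail"
    return None


result = run_cct(
    bank=["i1", "i2", "i3", "i4", "i5"],
    select_item=lambda remaining, history: remaining[0],
    get_response=lambda item: 1,  # simulated examinee who answers everything correctly
    classify=toy_classify,
    max_items=5,
)
print(result)  # ('Pass', 3)
```

In this sketch an unclassified examinee is returned as None; an operational test would need some forced decision at that point, which is a design choice outside the scope of the sketch.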

Psychometric Model
Two approaches are available for the psychometric model of a CCT: classical test theory (CTT) and item response theory (IRT). Classical test theory assumes a state model because it is applied by determining item parameters for a sample of examinees determined to be in each category. For instance, several hundred "masters" and several hundred "nonmasters" might be sampled to determine item difficulty and discrimination for each group, but doing so requires that a distinct set of people belonging to each group can be readily identified. IRT, on the other hand, assumes a trait model: the knowledge or ability measured by the test is a continuum. The classification groups must then be defined more or less arbitrarily along the continuum, for example by using a cutscore to demarcate masters and nonmasters, but the specification of item parameters assumes a trait model.

There are advantages and disadvantages to each. CTT offers greater conceptual simplicity. More importantly, CTT requires fewer examinees in the sample for calibration of item parameters to be used eventually in the design of the CCT, making it useful for smaller testing programs. See Frick (1992) for a description of a CTT-based CCT. Most CCTs, however, utilize IRT. IRT offers greater specificity, but the most important reason may be that the design of a CCT (and a CAT) is expensive, and is therefore more likely done by a large testing program with extensive resources. Such a program would likely use IRT.
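To make the trait model concrete, the following sketch shows one common IRT item response function and a cutscore placed on the theta continuum. The choice of the three-parameter logistic (3PL) model is an assumption made for illustration; the text above does not prescribe a particular IRT model, and the function and parameter names are hypothetical.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model (assumed here).

    theta: examinee ability on the continuous trait
    a: item discrimination, b: item difficulty, c: lower asymptote (guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def classify_by_cutscore(theta, cutscore):
    """Split the continuum into two groups at an (arbitrarily chosen) cutscore."""
    return "master" if theta >= cutscore else "nonmaster"


# An examinee whose ability equals the item difficulty has a 50% chance
# of answering correctly when there is no guessing (c = 0).
print(p_correct(theta=0.0, a=1.0, b=0.0))  # 0.5
print(classify_by_cutscore(theta=0.4, cutscore=0.0))  # master
```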

Starting point
A CCT must have a specified starting point to enable certain algorithms. If the sequential probability ratio test is used as the termination criterion, it implicitly assumes a starting ratio of 1.0 (equal probability of the examinee being a master or nonmaster). If the termination criterion is a confidence interval approach, a starting point on theta must be specified. Usually this is 0.0, the center of the distribution, but it could also be drawn at random from a distribution if the parameters of the examinee distribution are known. Previous information about an individual examinee, such as their score the last time they took the test (if retaking), may also be used.
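The options for a starting point on theta can be sketched as a small helper. The function name starting_theta and its arguments are hypothetical, and the use of a normal distribution for the random draw is an assumption; the text only requires that the parameters of the examinee distribution be known.

```python
import random

# The SPRT starts from a likelihood ratio of 1.0, i.e. a log ratio of 0.0.
SPRT_START_LOG_RATIO = 0.0


def starting_theta(prior_score=None, population=None, rng=random):
    """Pick an initial theta for a confidence interval CCT.

    prior_score: theta from a previous attempt, if the examinee is retaking
    population:  (mean, sd) of the examinee distribution, if known
                 (a normal distribution is assumed for this sketch)
    rng:         object with a gauss(mean, sd) method, e.g. random.Random(seed)
    """
    if prior_score is not None:
        return prior_score
    if population is not None:
        mean, sd = population
        return rng.gauss(mean, sd)
    return 0.0  # default: center of the theta distribution


print(starting_theta())                 # 0.0
print(starting_theta(prior_score=0.7))  # 0.7
```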

Item Selection
In a CCT, items are selected for administration throughout the test, unlike the traditional method of administering a fixed set of items to all examinees. While this is usually done item by item, it can also be done in groups of items known as testlets (Luecht & Nungester, 1996; Vos & Glas, 2000).

Methods of item selection fall into two categories: cutscore-based and estimate-based. Cutscore-based methods (also known as sequential selection) maximize the information provided by the item at the cutscore, or cutscores if there is more than one, regardless of the ability of the examinee. Estimate-based methods (also known as adaptive selection) maximize information at the current estimate of examinee ability, regardless of the location of the cutscore. Both work efficiently, but the efficiency depends in part on the termination criterion employed. Because the sequential probability ratio test only evaluates probabilities near the cutscore, cutscore-based item selection is more appropriate for it. Because the confidence interval termination criterion is centered on the examinee's ability estimate, estimate-based item selection is more appropriate for it: the test makes a classification when the confidence interval is small enough to lie completely above or below the cutscore (see below). The confidence interval is smaller when the standard error of measurement is smaller, and the standard error of measurement is smaller when there is more information at the theta level of the examinee.
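The difference between the two categories comes down to the theta value at which information is maximized. The sketch below assumes Fisher information under the 3PL model as the information measure; the Item type, the function names, and the three-item bank are hypothetical and serve only to illustrate the contrast.

```python
import math
from typing import NamedTuple


class Item(NamedTuple):
    name: str
    a: float        # discrimination
    b: float        # difficulty
    c: float = 0.0  # lower asymptote (guessing)


def information(item, theta):
    """Fisher information of a 3PL item at theta (model assumed for this sketch)."""
    p = item.c + (1 - item.c) / (1 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * ((1 - p) / p) * ((p - item.c) / (1 - item.c)) ** 2


def select_item(remaining, target_theta):
    """Return the remaining item with maximum information at target_theta."""
    return max(remaining, key=lambda item: information(item, target_theta))


bank = [Item("easy", 1.0, -1.0), Item("medium", 1.0, 0.0), Item("hard", 1.0, 1.0)]
cutscore = 0.0
theta_hat = 1.2  # current ability estimate of a strong examinee

# Cutscore-based selection targets the cutscore, whatever the examinee's ability.
print(select_item(bank, cutscore).name)   # medium
# Estimate-based selection targets the current ability estimate.
print(select_item(bank, theta_hat).name)  # hard
```

The same selection function serves both approaches; only the target value passed to it changes.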

Termination criterion
Three termination criteria are commonly used for CCTs: Bayesian decision theory, a confidence interval approach, and the sequential probability ratio test. Bayesian decision theory methods offer great flexibility by presenting an infinite choice of loss/utility structures and evaluation considerations, but also introduce greater arbitrariness. A confidence interval approach calculates a confidence interval around the examinee's current theta estimate at each point in the test, and classifies the examinee when the interval falls completely within a region of theta that defines a classification. This was originally known as adaptive mastery testing (Kingsbury & Weiss, 1983), but it does not necessarily require adaptive item selection, nor is it limited to the two-classification mastery testing situation. The sequential probability ratio test (Reckase, 1983) frames the classification problem as a test between two hypotheses: that the examinee's theta equals a specified point above the cutscore, or that it equals a specified point below the cutscore.
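The confidence interval and sequential probability ratio test criteria can each be sketched as a small decision function. Several details here are assumptions not stated in the text: the 3PL response model, the half-width delta that places the two hypothesized points around the cutscore, the nominal error rates alpha and beta, Wald's (1947) approximate decision bounds, and the z value of 1.96. All function and parameter names are hypothetical.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """3PL probability of a correct response (model assumed for this sketch)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


def sprt_decision(responses, cutscore, delta=0.3, alpha=0.05, beta=0.05):
    """SPRT for a two-category test.

    responses: list of ((a, b, c), score) pairs, score 1 = correct, 0 = incorrect
    Tests theta = cutscore + delta against theta = cutscore - delta.
    """
    lo, hi = cutscore - delta, cutscore + delta
    log_ratio = 0.0  # a ratio of 1.0 before any item is administered
    for (a, b, c), score in responses:
        p_hi, p_lo = p_correct(hi, a, b, c), p_correct(lo, a, b, c)
        if score == 1:
            log_ratio += math.log(p_hi / p_lo)
        else:
            log_ratio += math.log((1 - p_hi) / (1 - p_lo))
    if log_ratio >= math.log((1 - beta) / alpha):  # Wald's upper bound
        return "Pass"
    if log_ratio <= math.log(beta / (1 - alpha)):  # Wald's lower bound
        return "Fail"
    return None  # keep testing


def ci_decision(theta_hat, sem, cutscore, z=1.96):
    """Classify once the interval theta_hat +/- z * sem clears the cutscore."""
    if theta_hat - z * sem > cutscore:
        return "Pass"
    if theta_hat + z * sem < cutscore:
        return "Fail"
    return None  # interval still contains the cutscore


item = (1.0, 0.0, 0.0)  # hypothetical item: a = 1, b = 0, c = 0
print(sprt_decision([(item, 1)] * 9, cutscore=0.0))   # None
print(sprt_decision([(item, 1)] * 10, cutscore=0.0))  # Pass
print(ci_decision(theta_hat=0.8, sem=0.3, cutscore=0.0))  # Pass
print(ci_decision(theta_hat=0.3, sem=0.3, cutscore=0.0))  # None
```

In the SPRT example each correct response to this item adds 0.3 to the log ratio, so ten consecutive correct responses are needed to cross the upper bound of log(19), about 2.94.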

A bibliography of CCT research
Armitage, P. (1950). Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis. Journal of the Royal Statistical Society, 12, 137-144.

Braun, H., Bejar, I.I., and Williamson, D.M. (2006). Rule-based methods for automated scoring: Application in a licensing context. In Williamson, D.M., Mislevy, R.J., and Bejar, I.I. (Eds.) Automated scoring of complex tasks in computer-based testing. Mahwah, NJ: Erlbaum.

Dodd, B. G., De Ayala, R. J., & Koch, W. R. (1995). Computerized adaptive testing with polytomous items. Applied Psychological Measurement, 19, 5-22.

Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249-261.

Eggen, T. J. H. M., & Straetmans, G. J. J. M. (2000). Computerized adaptive testing for classifying examinees into three categories. Educational and Psychological Measurement, 60, 713-734.

Epstein, K. I., & Knerr, C. S. (1977). Applications of sequential testing procedures to performance testing. Paper presented at the 1977 Computerized Adaptive Testing Conference, Minneapolis, MN.

Ferguson, R. L. (1969). The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction. Unpublished doctoral dissertation, University of Pittsburgh.

Frick, T. W. (1989). Bayesian adaptation during computer-based tests and computer-guided exercises. Journal of Educational Computing Research, 5, 89-114.

Frick, T. W. (1990). A comparison of three decisions models for adapting the length of computer-based mastery tests. Journal of Educational Computing Research, 6, 479-513.

Frick, T. W. (1992). Computerized adaptive mastery tests as expert systems. Journal of Educational Computing Research, 8, 187-213.

Huang, C.-Y., Kalohn, J.C., Lin, C.-J., and Spray, J. (2000). Estimating Item Parameters from Classical Indices for Item Pool Development with a Computerized Classification Test. (Research Report 2000-4). Iowa City, IA: ACT, Inc.

Jacobs-Cassuto, M.S. (2005). A Comparison of Adaptive Mastery Testing Using Testlets With the 3-Parameter Logistic Model. Unpublished doctoral dissertation, University of Minnesota, Minneapolis, MN.

Jiao, H., & Lau, A. C. (2003). The Effects of Model Misfit in Computerized Classification Test. Paper presented at the annual meeting of the National Council of Educational Measurement, Chicago, IL, April 2003.

Jiao, H., Wang, S., & Lau, C. A. (2004). An Investigation of Two Combination Procedures of SPRT for Three-category Classification Decisions in Computerized Classification Test. Paper presented at the annual meeting of the American Educational Research Association, San Antonio, April 2004.

Kalohn, J. C., & Spray, J. A. (1999). The effect of model misspecification on classification decisions made using a computerized test. Journal of Educational Measurement, 36, 47-59.

Kingsbury, G.G., & Weiss, D.J. (1979). An adaptive testing strategy for mastery decisions. Research report 79-05. Minneapolis: University of Minnesota, Psychometric Methods Laboratory.

Kingsbury, G.G., & Weiss, D.J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 237-254). New York: Academic Press.

Lau, C. A. (1996). Robustness of a unidimensional computerized testing mastery procedure with multidimensional testing data. Unpublished doctoral dissertation, University of Iowa, Iowa City IA.

Lau, C. A., & Wang, T. (1998). Comparing and combining dichotomous and polytomous items with SPRT procedure in computerized classification testing. Paper presented at the annual meeting of the American Educational Research Association, San Diego.

Lau, C. A., & Wang, T. (1999). Computerized classification testing under practical constraints with a polytomous model. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.

Lau, C. A., & Wang, T. (2000). A new item selection procedure for mixed item type in computerized classification testing. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, Louisiana.

Lewis, C., & Sheehan, K. (1990). Using Bayesian decision theory to design a computerized mastery test. Applied Psychological Measurement, 14, 367-386.

Lin, C.-J. & Spray, J.A. (2000). Effects of item-selection criteria on classification testing with the sequential probability ratio test. (Research Report 2000-8). Iowa City, IA: ACT, Inc.

Linn, R. L., Rock, D. A., & Cleary, T. A. (1972). Sequential testing for dichotomous decisions. Educational & Psychological Measurement, 32, 85-95.

Luecht, R. M. (1996). Multidimensional Computerized Adaptive Testing in a Certification or Licensure Context. Applied Psychological Measurement, 20, 389-404.

Parshall, C. G., Spray, J. A., Kalohn, J. C., & Davey, T. (2006). Practical considerations in computer-based testing. New York: Springer.

Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 237-254). New York: Academic Press.

Rudner, L. M. (2002). An examination of decision-theory adaptive testing procedures. Paper presented at the annual meeting of the American Educational Research Association, April 1-5, 2002, New Orleans, LA.

Sheehan, K., & Lewis, C. (1992). Computerized mastery testing with nonequivalent testlets. Applied Psychological Measurement, 16, 65-76.

Spray, J. A. (1993). Multiple-category classification using a sequential probability ratio test (Research Report 93-7). Iowa City, Iowa: ACT, Inc.

Spray, J. A., Abdel-fattah, A. A., Huang, C., and Lau, C. A. (1997). Unidimensional approximations for a computerized test when the item pool and latent space are multidimensional (Research Report 97-5). Iowa City, Iowa: ACT, Inc.

Spray, J. A., & Reckase, M. D. (1987). The effect of item parameter estimation error on decisions made using the sequential probability ratio test (Research Report 87-17). Iowa City, IA: ACT, Inc.

Spray, J. A., & Reckase, M. D. (1994). The selection of test items for decision making with a computerized adaptive test. Paper presented at the Annual Meeting of the National Council for Measurement in Education (New Orleans, LA, April 5-7, 1994).

Spray, J. A., & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test. Journal of Educational & Behavioral Statistics, 21, 405-414.

Thompson, N. A. (2007). A Practitioner’s Guide for Variable-length Computerized Classification Testing. Practical Assessment Research & Evaluation, 12(1). Available online: http://pareonline.net/getvn.asp?v=12&n=1

Vos, H. J. (1998). Optimal sequential rules for computer-based instruction. Journal of Educational Computing Research, 19, 133-154.

Vos, H. J. (1999). Applications of Bayesian decision theory to sequential mastery testing. Journal of Educational and Behavioral Statistics, 24, 271-292.

Wald, A. (1947). Sequential analysis. New York: Wiley.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361-375.

Weissman, A. (2004). Mutual information item selection in multiple-category classification CAT. Paper presented at the Annual Meeting of the National Council for Measurement in Education, San Diego, CA.

Weitzman, R. A. (1982a). Sequential testing for selection. Applied Psychological Measurement, 6, 337-351.

Weitzman, R. A. (1982b). Use of sequential testing to prescreen prospective entrants into military service. In D. J. Weiss (Ed.), Proceedings of the 1982 Computerized Adaptive Testing Conference. Minneapolis, MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, 1982.

Websites with further information
Measurement Decision Theory by Lawrence Rudner

CAT Central by David J. Weiss