**The Science**

The REASZON™ assessment uses a Bayesian Computer Adaptive Testing algorithm to reduce the number of items needed to assess critical reasoning, and to better measure an individual’s critical reasoning level. This algorithm requires calibration on a large sample of test takers. The calibration procedure determines the relation between item characteristics (e.g., complexity of the puzzle, number of decoy mirrors) and the distribution of critical reasoning. In traditional tests, which rely on a large number of premade items, this calibration determines the difficulty of each item. For the REASZON™ assessment, items are uniquely generated for each test taker from a set of item parameters that determine the form and difficulty of each item. The Bayesian nature of the calibration algorithm allows REASZON™ to generalize item generation to any combination of the item parameters.

Once the REASZON™ test is calibrated, the computer adaptive nature of the test allows for fast measurement of an individual’s critical reasoning, because each item is generated at the individual’s current expected critical reasoning score. This lets the assessment quickly reach an optimal difficulty for each individual, and then refine that individual’s score by slightly varying the difficulty of the generated items.
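The adaptive loop described above can be illustrated with a minimal sketch. It is not the REASZON™ algorithm: it assumes a Rasch (1PL) response model, a standard normal prior on a discrete grid, and a simulated test taker, and every function name in it is hypothetical. It shows only the general idea of generating each item at the current expected score and updating the estimate after each response.

```python
import math
import random


def p_correct(theta, difficulty):
    """Probability of a correct answer under a Rasch (1PL) model (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def make_prior(lo=-4.0, hi=4.0, steps=161):
    """Discretized standard normal prior over ability, normalized to sum to 1."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    weights = [math.exp(-0.5 * t * t) for t in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def update(grid, weights, difficulty, correct):
    """Bayes update of the ability posterior after one scored item."""
    likelihood = [
        p_correct(t, difficulty) if correct else 1.0 - p_correct(t, difficulty)
        for t in grid
    ]
    posterior = [w * l for w, l in zip(weights, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]


def expected_ability(grid, weights):
    """Posterior mean of ability: the current expected score."""
    return sum(t * w for t, w in zip(grid, weights))


def simulate_test(true_ability, n_items=20, seed=0):
    """Simulate an adaptive test for a hypothetical test taker."""
    rng = random.Random(seed)
    grid, weights = make_prior()
    estimate = expected_ability(grid, weights)
    for _ in range(n_items):
        # Generate the next item at the current expected score.
        difficulty = estimate
        # Simulated response; a real test would score the test taker's answer.
        correct = rng.random() < p_correct(true_ability, difficulty)
        weights = update(grid, weights, difficulty, correct)
        estimate = expected_ability(grid, weights)
    return estimate
```

Calling `simulate_test(1.0)` returns the final ability estimate for a simulated test taker whose true ability is one standard deviation above the mean. A correct answer moves the estimate up and an incorrect answer moves it down, so the item difficulty tracks the test taker from item to item.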

The current calibration sample is an internationally representative sample of 2,639 individuals who were enrolled in Dr. Neta’s critical reasoning Coursera MOOC (“Think Again: How to Reason and Argue”). This sample was approximately 55% male, with ages ranging from 17 to 77 (average age of 40). Approximately 51% of the calibration sample had completed a bachelor’s degree or above. Using this sample, REASZON™ was calibrated and the resulting population critical reasoning distribution was normalized to a standard normal distribution.

Standardization against the calibration sample allows individual scores to be compared to the population curve, which yields percentile scores (e.g., a test taker who scored in the 70th percentile reasoned better than 70% of the population).

Standardization also allows other population-level curves to be calculated, so REASZON™ can report comparisons against a variety of sub-populations (e.g., a test taker scored in the 70th percentile of computer science majors and in the 80th percentile overall).
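As an illustration of how a standardized score maps to a percentile, the sketch below uses the standard normal curve for the overall population and a second normal curve for a sub-population. The sub-population mean and standard deviation are hypothetical values chosen only to reproduce the example above; they are not REASZON™ data, and the function name is an assumption.

```python
from statistics import NormalDist


def percentile(score, population=NormalDist(0.0, 1.0)):
    """Percentage of the given population scoring below `score`."""
    return 100.0 * population.cdf(score)


# Overall population: standard normal, as produced by the calibration.
overall = NormalDist(0.0, 1.0)

# Hypothetical sub-population curve (illustrative values, not real data).
cs_majors = NormalDist(0.3172, 1.0)

score = 0.8416  # a standardized critical reasoning score
overall_pct = percentile(score, overall)      # close to the 80th percentile
cs_major_pct = percentile(score, cs_majors)   # close to the 70th percentile
```

The same score lands at a lower percentile in the sub-population because, in this made-up example, that group's curve is centered higher than the overall curve.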

Finally, for interpretability, overall percentile scores are converted into a numerical score that ranges from 700 to 1300, with 1000 representing the 50th percentile.

In addition to calibration, REASZON™ was compared against a commonly used measure of critical reasoning, ETS HEighten, as well as several academic outcomes of interest.

In our first validation study, REASZON™ was administered alongside ETS HEighten to 182 business school students. GPA, SAT Verbal and SAT Math scores were collected, along with demographic information such as race, gender and age.

General linear models were fit predicting GPA, SAT Verbal and SAT Math scores from REASZON™ score and demographics, and separately from ETS HEighten score and demographics. Interactions between REASZON™ score and demographics were examined, and none were found to be significant. This suggests that REASZON™ is unbiased by demographics in terms of its predictive ability. Furthermore, the total predictive ability in the form of R^{2}, a scale-free indicator of model fit, was examined for both the REASZON™ model and the HEighten model. For GPA, the REASZON™ model had an R^{2} of .172, while the HEighten model had an R^{2} of .1767, indicating that HEighten explained no more than 0.5 percentage points more variance in GPA than REASZON™, a trivial difference. For SAT Math, REASZON™ had an R^{2} of .4127, while HEighten had an R^{2} of .4175, again suggesting that HEighten explained no more than 0.5 percentage points more variance in SAT Math scores. Finally, for SAT Verbal, REASZON™ had an R^{2} of .2832, while HEighten had an R^{2} of .5831, which suggests that HEighten explains roughly 30 percentage points more variance in SAT Verbal scores.
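The differences in explained variance follow directly from the reported R^{2} values. The short sketch below redoes that arithmetic; the R^{2} figures are the ones reported above, while the variable and function names are illustrative only.

```python
# R^2 values as reported in the validation study.
r_squared = {
    "GPA": {"REASZON": 0.172, "HEighten": 0.1767},
    "SAT Math": {"REASZON": 0.4127, "HEighten": 0.4175},
    "SAT Verbal": {"REASZON": 0.2832, "HEighten": 0.5831},
}


def variance_gap_points(table):
    """HEighten R^2 minus REASZON R^2, in percentage points of variance."""
    return {
        outcome: round(100.0 * (models["HEighten"] - models["REASZON"]), 2)
        for outcome, models in table.items()
    }


gaps = variance_gap_points(r_squared)
for outcome, gap in gaps.items():
    print(f"{outcome}: HEighten explains {gap} more percentage points of variance")
```

The gaps for GPA and SAT Math are each under half a percentage point, while the SAT Verbal gap is just under 30 percentage points.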

This set of results suggests that REASZON™ performs comparably to HEighten with regard to GPA and SAT Math, while it underperforms with regard to SAT Verbal. This result is not unexpected: ETS HEighten relies on complex vignettes and therefore has a significant verbal comprehension component. REASZON™ does not rely on complex vignettes, and instead uses a limited vocabulary (if, then, else) and a simple set of concepts. This difference in test construction is likely the reason for the difference in performance on SAT Verbal. But it is precisely this difference that allows REASZON™ to measure critical reasoning skill in a way that avoids bias introduced by differences in reading comprehension and culturally specific background knowledge.