Psychometrics, with its blend of psychological theory and statistical methodology, has several terms that frequently cause confusion when communicating with broader audiences. Here are a few examples of such terms, along with explanations of why they might cause confusion.
Reliability vs. Validity
Why Confusing: Confusion often arises because the concept of validity is broader than how it’s often contrasted with reliability, especially in contemporary psychometric theory. Validity encompasses all evidence and rationale supporting the interpretation and use of test scores, including aspects of reliability. In this more holistic view, reliability can be seen as a subset of validity, as consistent (reliable) measurement is necessary for a test to be considered valid for a given purpose.
The American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014) in their “Standards for Educational and Psychological Testing” emphasize this broader conception of validity. They argue that validity is not just about the test itself but involves the validity of inferences made from test scores, encompassing everything from test content and response processes to internal structure, relations to other variables, and consequences of testing.
In this context, validity is an overarching concern that includes:
- Content Validity: Does the test adequately cover the construct it’s intended to measure?
- Criterion-related Validity: How well do test scores predict or correlate with relevant outcomes or criteria?
- Construct Validity: Does the test accurately reflect the theoretical construct it’s supposed to measure? This encompasses evidence from various sources, including the internal structure of the test (which relates to reliability), relationships with other variables, and the implications of the testing process (consequential validity).
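Criterion-related validity, for instance, is often summarized as a simple correlation between test scores and a criterion. A minimal sketch in Python, using entirely hypothetical test scores and an illustrative criterion (e.g., job performance ratings); the data and variable names are made up for demonstration:

```python
from statistics import mean

# Hypothetical data: six examinees' test scores and a relevant criterion.
# Both the scores and the criterion are illustrative, not from a real study.
test_scores = [55, 62, 48, 70, 66, 59]
criterion = [3.1, 3.8, 2.5, 4.4, 4.0, 3.3]

def pearson_r(x, y):
    """Pearson correlation, the usual index of criterion-related validity."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

print(round(pearson_r(test_scores, criterion), 3))  # → 0.993
```

A correlation this high would be unusually strong for real validity evidence; typical criterion validities are far more modest.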
Thus, while reliability focuses on the consistency and repeatability of test scores, the broader concept of validity integrates reliability as a necessary condition for valid inferences from those scores. This expanded view of validity indeed subsumes reliability, acknowledging that a test must be reliable to be valid, but also requires much more to ensure that test scores are used and interpreted appropriately. The distinction between reliability and validity in the narrower sense often serves pedagogical purposes but can oversimplify the rich and complex considerations involved in validating test score interpretations and uses.
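The narrower sense of reliability, consistency of scores, is routinely estimated with internal-consistency indices such as Cronbach's alpha. A minimal sketch, assuming a hypothetical 3-item Likert scale answered by five respondents (the data are invented for illustration):

```python
from statistics import pvariance

# Hypothetical item responses: 5 respondents x 3 items (Likert 1-5).
# The scale and data are illustrative, not from a real instrument.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 4, 5],
    [3, 3, 4],
    [1, 2, 2],
]

def cronbach_alpha(rows):
    """Cronbach's alpha: a common internal-consistency (reliability) estimate."""
    k = len(rows[0])                                 # number of items
    items = list(zip(*rows))                         # columns = items
    item_vars = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(r) for r in rows])    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

print(round(cronbach_alpha(responses), 3))  # → 0.929
```

A high alpha like this says only that the items hang together consistently; it says nothing by itself about whether the scores support the intended interpretation, which is the broader validity question.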
Which “Scale” Is It?
Why Confusing: The term “scale” in psychometrics often causes confusion due to its multiple meanings and applications within and outside the field of measurement.
The ambiguity arises when “scale” is used interchangeably with terms like “test,” “questionnaire,” “inventory,” or “measure,” each of which can have slightly different meanings or connotations in various contexts. And the confusion can deepen when attempting to discern whether someone is referring to the instrument itself, the scores derived from it, or perhaps a subscale embedded within it, with distinct response formats and measurement types. Are we discussing the raw scale scores, or the derived scores? Is the focus on a specific subscale, or a composite score aggregating various facets? Are we discussing the standardized scores?
Here are some examples of potential confusions related to the term “scale” in psychometrics:
- Scale vs. Test vs. Questionnaire: These terms are often used interchangeably, but they can have nuanced differences. A “scale” might refer to a specific instrument measuring a single construct, or to a subscale of a test, while a “test” or a “questionnaire” might contain a mixture of scales and other types of questions.
- Raw Scale Scores vs. Derived Scores: When discussing scale scores, confusion can arise between raw scores (the original unprocessed responses) and derived scores (transformed or standardized scores).
- Subscales vs. Composite Scores: Scales can contain subscales, which are smaller components designed to measure specific facets of a construct. The confusion may arise when discussing whether the focus is on a particular subscale or the overall composite score, especially when they have different implications or interpretations.
- Instrument vs. Scores: In discussions about research instruments, confusion may arise when the focus shifts between the instrument itself (the tool used to collect data) and the scores generated from it (the data output). Researchers may inadvertently switch between these perspectives.
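The score-related distinctions above can be made concrete in a short sketch. Assuming two hypothetical subscales of the same instrument (the names, raw scores, and the simple summed composite are all invented for illustration; real instruments may weight or standardize subscales before combining):

```python
from statistics import mean, pstdev

# Hypothetical raw scores on two illustrative subscales of one instrument.
anxiety_raw = [12, 18, 9, 15, 21]
avoidance_raw = [7, 14, 10, 5, 16]

def to_z(scores):
    """Derived scores: standardize raw scores to z-scores (mean 0, SD 1)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

def to_t(z):
    """A further derivation: T-scores (mean 50, SD 10) from z-scores."""
    return [50 + 10 * v for v in z]

# Composite score: here simply the sum of the two subscale raw scores.
composite_raw = [a + b for a, b in zip(anxiety_raw, avoidance_raw)]

print(to_t(to_z(anxiety_raw)))  # T-scores for one subscale
print(composite_raw)            # composite across both subscales
```

Saying “the scale score went up” could refer to any of these: a raw subscale score, a standardized T-score, or the composite. Naming which one is meant removes most of the ambiguity.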
Clear communication and precise terminology are always helpful. But as always, communicating with diverse audiences requires adjusting that terminology.