Friday, September 20, 2013

What Gets Lost? Risks of Translating Psychological Models and Legal Requirements to Computer Code

The genesis for this posting is the article "Technologies of Compliance: Risk and Regulation in a Digital Age," authored by Kenneth A. Bamberger and found at 88 Texas L. Rev. 669 (2010). This posting adapts portions of the article to the issue of job applicant assessments and intersperses elements of workforce analytics to provide examples of the risks and challenges raised in the Bamberger article.

Workforce analytic systems are powerful tools, but they pose real perils. They force computer programmers to attempt to interpret psychological models, legal requirements and managerial logic; they mask the uncertainty of the very hazards with which lawmakers and regulators are concerned; they skew decisionmaking through an “automation bias” as a substitute for sound judgment; and their lack of transparency thwarts oversight and accountability.

Lost In Translation

The hiring assessment functionality of workforce analytics contains three divergent logic systems: legal, psychological, and managerial. The legal logic system derives, in part, from the Americans with Disabilities Act (ADA), its accompanying regulations, and related caselaw. The psychological logic system derives primarily from the five-factor model of personality, or Big Five, as it has evolved over the past 20-25 years. The managerial logic derives from the implementation of the employer's human resource function. Technology in the form of workforce analytics then attempts to tie these three logic systems together to create an automated assessment program that purports to determine applicant "suitability" or "fit."

Information technology is not value-neutral, but embodies bias inherent in both its social and organizational context and its form. It is not infinitely plastic, but, through its systematization, trends towards inflexibility. It is not merely a transparent tool of intentional organizational control, but in turn shapes organizational definitions, perceptions, and decision structures. In addition to controlling the primary risks it seeks to address, then, it can raise—and then mask—different sorts of risk in its implementation.

For example, many workforce analytic companies use the Big Five model in creating their personality assessments. As its name implies, the Big Five looks at five traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism, with each trait conceptualized on an axis from low to high (e.g., low neuroticism, high neuroticism). The Big Five, operating under various names, existed for a number of decades prior to its "rebirth" in the early 1990s, when it was embraced by organizational psychologists.

Since the late 1990s, workforce analytic companies like Unicru (now owned by Kronos) have adopted the Big Five for use in their job applicant assessment programs. The workforce analytic companies have created "model" psychological profiles and tested applicants against those profiles. In general, applicants receive a green, yellow, or red score on the basis of a 50/25/25 cutoff. Applicants scoring red are generally not interviewed, let alone hired.
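To make the mechanics concrete, here is a minimal sketch, in Python, of how such a traffic-light cutoff might be applied. It is not any vendor's actual algorithm; it assumes an applicant's Big Five responses have already been reduced to a single percentile-style fit score against the model profile, and it reads the 50/25/25 cutoff as top half green, next quarter yellow, bottom quarter red. The function and variable names are hypothetical.

def traffic_light(fit_percentile):
    """Bucket a 0-100 'fit' percentile into green/yellow/red bands (illustrative only)."""
    if fit_percentile >= 50:
        return "green"    # assumed top 50%: typically advanced to interview
    if fit_percentile >= 25:
        return "yellow"   # assumed next 25%: may receive further review
    return "red"          # assumed bottom 25%: generally not interviewed

for score in (82.0, 40.0, 12.5):
    print(score, "->", traffic_light(score))

The point of the sketch is how little the code itself reveals: the cutoffs and the construction of the fit score carry all of the legal and psychological freight, yet neither is visible in the output the hiring manager sees.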

The use of technology systems to hardwire workforce analytics raises a number of fundamental issues regarding the translation of legal mandates, psychological models and business practices into computer code and the resulting distortions. These translation distortions arise from the organizational and social context in which translation occurs; choices “embody biases that exist independently, and usually prior to the creation of the system.” And they arise as well from the nature of the technology itself “and the attempt to make human constructs amenable to computers.”

These distortions are compounded when psychological models, legal standards and managerial processes are turned over to programmers for translation into predictive algorithms and computer code. These programmers may know nothing of the psychological models, legal standards and management processes. Some are employees of separate IT divisions within firms; many are employees of third-party systems vendors. Wherever they work, their translation efforts are colored by their own disciplinary assumptions, the technical constraints of requirements engineering, and limits arising from the cost and capacity of computing.

Managerial processes may be poor vehicles for capturing nuance in legal policy and psychological models, especially in a context like employment discrimination where regulators have eschewed rules for standards and where the interpretation of those standards by regulators and psychological professionals may change over time. For example, the ADA prohibits pre-employment medical examinations, and the psychological tests used by workforce analytic companies may be considered medical examinations (please see ADA, FFM and DSM). Employers and workforce analytic companies have interpreted the medical examination requirement as prohibiting the use of tests that are designed to diagnose mental illnesses.

This interpretation creates two significant risks for employers and workforce analytic companies. First, the legal standard does not speak to "tests designed to diagnose mental illnesses"; rather, it asks "whether the test is designed to reveal an impairment of physical or mental health such as those listed in the Diagnostic and Statistical Manual of Mental Disorders." "Designed to reveal" is semantically and substantively different from "designed to diagnose" and, as set out in Employment Tests are Designed to Reveal an Impairment, Big Five-based tests are designed to reveal impairments through their screening-out process. As depicted in the graphic, tests designed to diagnose are a subset of the overall category of tests designed to reveal an impairment. Using the "designed to diagnose" category as the proxy for medical examinations puts employers and workforce analytic companies at significant risk of violating the ADA medical examination prohibition and the confidential medical information safeguards under the ADA. It may also result in other claims against the workforce analytic companies by job applicants, employers, and insurers. The mistaken use of the "designed to diagnose" category as a proxy for medical examinations could be considered a design defect in the product liability arena or negligence in a tort claim.


The second significant risk for employers and workforce analytic companies arises from their failure to account for the evolution of the Big Five from a non-clinical model to a clinical model. In her seminal review of the personality disorder literature published in 2007, Dr. Lee Anna Clark stated that "the five-factor model of personality is widely accepted as representing the higher-order structure of both normal and abnormal personality traits." A clear sign of this evolution comes with the publication of the most current volume of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), published in May 2013, where the model used to diagnose many personality disorders is based on the Big Five.

Consequently, even if the standard for defining a medical examination were focused solely on the use of a test that diagnoses a mental illness, the five-factor model has now evolved into a diagnostic tool used by the psychiatric community to define mental impairments, including personality disorders, of the kind set out in the DSM-5. The failure of employers and workforce analytic companies to account for the evolutionary development of the five-factor model puts them at significant risk due to their belief that the five-factor model is not a diagnostic tool, a belief that time and scientific advances have now overturned.

Automation Bias

While computer code and predictive-analytic methods might be accessible to programmers, they remain opaque to users—for whom, often, only the outcomes remain visible. In the case of job applicants, even this information (the assessment outcome) is not visible to them; results are not disclosed by employers or workforce analytic companies. Programmers "code[] layer after layer of policies and other types of rules" that managers and directors cannot hope to understand or unwind.

Human judgment is subject to an automation bias, which fosters a tendency to "disregard or not search for contradictory information in light of a computer-generated solution that is accepted as correct." Such bias has been found to be most pronounced when computer technology fails to flag a problem.

In a recent study from the medical context, researchers compared the diagnostic accuracy of two groups of experienced mammogram readers (radiologists, radiographers, and breast clinicians)—one aided by a Computer Aided Detection (CAD) program and the other lacking access to the technology. The study revealed that the first group was almost twice as likely as the second group to miss signs of cancer when the CAD program did not flag the concerning presentation.

Automation bias may be found in the algorithms created and used by workforce analytic companies to provide "insights" to their employer customers. For example, Kenexa, an IBM company, has determined that distance from work, commute time and frequency of household moves all have a correlation with attrition in call-center and fast-food jobs. Applicants who live more than five miles from work, have a lengthy commute or have moved more frequently are scored down by the algorithms, making them less desirable candidates.

Painting with the broad brush of distance from work, commute time and moving frequency may result in well-qualified applicants being excluded. The Kenexa insights are generalized correlations; they say nothing about any particular applicant.
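A purely illustrative sketch of how such generalized correlations might be turned into a score reduction appears below. The actual Kenexa models are proprietary; the five-mile figure comes from the insight described above, while the commute and moving thresholds and the penalty weights are invented for illustration.

def attrition_penalty(distance_miles, commute_minutes, moves_last_5_years):
    """Return a score reduction based on population-level attrition correlations (illustrative only)."""
    penalty = 0.0
    if distance_miles > 5:           # lives more than five miles from work (figure from the text above)
        penalty += 0.10
    if commute_minutes > 45:         # hypothetical "lengthy commute" threshold
        penalty += 0.10
    if moves_last_5_years >= 3:      # hypothetical "frequent mover" threshold
        penalty += 0.10
    return penalty

# The penalty encodes only the group-level correlation; it knows nothing about this applicant.
print(attrition_penalty(distance_miles=7.2, commute_minutes=55, moves_last_5_years=3))

Note that nothing in such a calculation asks why the applicant lives where she lives or why she has moved; the correlation is applied to every applicant alike.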

What are the risks of employers slavishly adhering to the results of the algorithm? Part of the answer comes from identifying the groups of people who have longer commutes and move more frequently than others: lower-income persons who, according to the U.S. Census, are disproportionately African-American and Hispanic.

Through the application of these "insights," many low-income persons are electronically redlined, meaning employers will pass over qualified applicants because they live (or don't live) in certain areas, or because they have moved. The reasons for moving do not matter, whether it is to find a better school for their children, to escape domestic violence, the elimination of mass transit in their community, or a job loss caused by a company shutdown (please see From What Distance is Discrimination Acceptable?).

An employer who does not look past the simple results of the assessment algorithms not only harms itself by failing to consider well-qualified applicants, but also puts itself at risk of employment discrimination claims by classes of persons (e.g., African-American, Hispanic) protected by federal and state employment laws.
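Such disparate impact claims are commonly assessed against the EEOC's "four-fifths" rule of thumb: a selection rate for a protected group that is less than 80 percent of the rate for the most-selected group is treated as evidence of adverse impact. The short sketch below, using hypothetical pass counts, shows how quickly a generalized "insight" can translate into a measurable disparity.

def selection_rate(passed, applied):
    """Fraction of a group's applicants who receive a passing (green) score."""
    return passed / applied

# Hypothetical pass counts for two applicant groups of 1,000 each.
protected_rate = selection_rate(passed=180, applied=1000)
comparison_rate = selection_rate(passed=350, applied=1000)

impact_ratio = protected_rate / comparison_rate
print(f"impact ratio = {impact_ratio:.2f}")  # 0.51 here; a ratio below 0.80 suggests adverse impact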

Institutionalized (Mis)Understanding

Institutionalization of workforce management practices might permit evolutionary improvements in existing measurements, but it masks areas where risk types are ignored or analysis is insufficient and where more revolutionary, paradigm-shifting advances might be warranted.

These understandings (or misunderstandings) can be institutionalized across the field of workforce analytics. As workforce analytic practices are disseminated through the industry by professional groups, workforce analytic practitioners, management scholars, and third-party technology vendors and consultants, they standardize an approach that other firms adopt, seeking legitimacy.

Workforce analytic systems, designed in part to mitigate risks, have now become sources of risk themselves. They create the perception of stability through probabilistic reasoning and the experience of accuracy, reliability, and comprehensiveness through automation and presentation. But in so doing, technology systems draw organizational attention away from uncertainty and partiality. They can embed, and then justify, self-interested assumptions and hypotheses.

Moreover, they shroud opacity—and the challenges for oversight that opacity presents—in the guise of legitimacy, providing the allure of shortcuts and safe harbors for actors both challenged by resource constraints and desperate for acceptable means to demonstrate compliance with legal mandates and market expectations.

The technical language of workforce analytic systems obscures the accountability of the decisions they channel. Programming and mathematical idiom can shield layers of embedded assumptions from high-level firm decisionmakers charged with meaningful oversight and can mask important concerns with a veneer of transparency. This problem is compounded in the case of regulators outside the firm, who frequently lack the resources or vantage to peer inside buried decision processes and must instead rely on the resulting conclusions about risks and safeguards offered them by the parties they regulate.

Risks of a Technological Monoculture

Technology-based workforce analytic systems proliferate, in part, because policy makers have rejected rule-based mandates in favor of regulatory principles that rely on the exercise of context-specific judgment by regulated entities for their implementation. Yet workforce analytic technology can turn each of these regulatory choices on its head. The need to translate psychological, legal and managerial logic into a fourth distinct logic of computer code and quantitative analytics creates the possibility that legal choices will be skewed by the biases inherent in that process.

Such biases introduce several risks: that choices will be shaped both by assumptions divorced from sound management and by incentives unrelated to public ends (e.g., hiring discrimination leading to larger income support payments such as SSDI and SSI); that the rule-bound nature of code will substitute one-time technological "fixes" for ongoing human oversight and assessment (e.g., the failure to recognize the evolution of the Big Five into a diagnostic tool); and that the standardization of risk-assessment approaches will eliminate variety, and therefore robustness, in workforce analytic efforts, developing systemic risks of which individual actors may not be aware.

Systemic risks have developed because there is a technological "monoculture" in the workforce analytic industry. The problems are analogous to those of the biological domain. A deeply entrenched standard prevents the introduction of technological ideas that deviate too far from accepted norms. This means that the industry may languish with inefficient or non-optimal solutions to problems, even though efficient, optimal, and technically feasible solutions exist. The technical feasibility of these superior solutions is not important; they are excluded because they are incompatible with the status quo technology.

In a diverse population, any particular weakness or vulnerability is likely confined to only a small segment of the whole population, making population-wide catastrophes extremely unlikely. In a homogeneous population, any vulnerability is manifested by everyone, creating a risk of total extinction. In the case of workforce analytics, one successful challenge to the Big Five-based model, whether as an illegal medical examination or as a screen-out of persons with disabilities, introduces systemic risk to all customers of that workforce analytics company: one loss will lead to multiple challenges, along the lines of the asbestos litigation (please see The Next Asbestos? The Next FLSA?). Systemic risk is not limited to the ecosystem of the workforce analytic company being challenged, but extends to all companies that market or utilize Big Five-based assessments.

The potential costs are enormous. If the assessment is an illegal medical examination, then each applicant has a claim based on the use of an illegal medical examination. Some employers use the assessments to screen millions of applicants each year; each applicant is a potential plaintiff. Further, if the test is a medical examination, then each applicant has a claim for the misuse of confidential medical information (if the test is a medical examination, the applicant responses are confidential medical information). Not only does that lead to claims based on privacy violations, but all systems, solutions and databases that incorporate the information obtained from the assessments will need to be "sanitized." In a very real sense, the data may be the virus and the costs of "cleansing" those systems may well dwarf the very significant damages payable to applicants (please see When the First Domino Falls: Consequences to Employers of Embracing Workforce Assessment Solutions).
