Employment Testing: Failing to Make the Grade: September 2013

Wednesday, September 25, 2013

Risks to Kroger Shareholders

There is material risk to Kroger shareholders and other stakeholders that the company's use of the Kronos assessments over the past 7+ years has created millions of potential plaintiffs, including every person who has applied for a position at one of the many Kroger companies.

Kroger is being investigated by the Equal Employment Opportunity Commission (EEOC) as a consequence of multiple charges filed with the EEOC alleging non-compliance with labor and employment laws, including the Americans with Disabilities Act (ADA). The investigation has been ongoing for more than five years and has generated a number of district court and appellate court decisions as Kroger and the employment assessment company it uses, Kronos, have sought to avoid disclosing information about the assessment and its impact on persons protected by the ADA.

These actions by Kroger and Kronos have generally been unsuccessful; the appellate court decisions have sided with the EEOC and have ordered Kronos to disclose massive amounts of information, including:

Any and all documents and data constituting or related to validation studies or validation evidence pertaining to Kronos assessment tests purchased by Kroger, including but not limited to such studies or evidence as they relate to the use of the tests as personnel selection or screening instruments, even if created or performed for other customer(s).
The user’s manual and instructions for the use of assessment tests used by Kroger.
Any and all documents (if any) related to Kroger, including but not limited to correspondence, notes, and data files, relating to Kroger; its use of the assessment test; results, ratings, or scores of individual test takers; and any validation efforts made thereto.
Any and all documents discussing, analyzing or measuring potential adverse impact on persons with disabilities.
Any and all documents related to any and all job analysis performed by any person or entity related to any or all position(s) at Kroger.

See Kroger and Kronos: Chaos and Disorder.

Claims and Damages

Claims against Kroger include:

Claims that the Kronos assessment is an illegal pre-employment medical examination whose use is prohibited by the ADA.
Claims that responses to the Kronos assessment, as a medical examination, are confidential medical information, subject to strict safeguards that Kroger has been recklessly ignoring.
Claims that the Kronos assessment illegally screens out persons with mental disabilities in violation of the ADA.
Claims that Kroger failed to ensure that the Kronos assessment results accurately reflect the skills, aptitude and other factors that the assessment purports to measure, rather than reflecting an applicant’s impairment.

As a medical examination, the Kronos assessment may not lawfully be administered by Kroger to any job applicant before a conditional offer of employment has been made to that applicant. Each time the Kronos assessment is so administered to any such applicant, Kroger is violating Title I of the ADA and subjecting itself to claims for back pay, front pay, compensatory damages and injunctive relief. Each person who has been subjected to Kroger’s illegal medical examination in violation of Title I of the ADA is entitled to seek compensatory and punitive damages of $600,000, in addition to back pay, front pay and recovery of attorneys’ fees.

An employer utilizing the Kronos assessment, like Kroger, is at risk for punitive damages where, as held by the court in Kolstad v. American Dental Association, the “employer has engaged in intentional discrimination and has done so with malice or reckless indifference to the federally protected rights of an aggrieved individual.”

Kroger’s unlawful use of the Unicru assessment subjects the company to material and publicly disclosable liability and its shareholders to a potentially significant diminution in the value of their shares.

Ignoring EEOC Guidance

Reckless indifference may be shown by a variety of methods, including an employer’s failure to comply with the EEOC guidance for testing and selection, pursuant to which employers are encouraged to:

Administer tests without regard to race, color, national origin, sex, religion, age, or disability;
Ensure that tests are properly validated;
Ensure that tests are job-related;
Ensure that tests are appropriate for the employer's purpose;
If selection procedures screen out a protected group, determine whether there is an equally effective alternative selection procedure that has less adverse impact and adopt that alternative procedure;
Keep abreast of changes in job requirements and update the test specifications or selection procedures accordingly; and
Ensure that tests and selection procedures are not adopted casually by managers who know little about these processes.

In a court filing submitted several years ago in connection with its ongoing EEOC litigation, Kronos stated that “[n]o adverse impact or validation studies have been performed by Kronos, or to Kronos’ knowledge, by or for Kroger with respect to potential adverse impact on individuals with disabilities.” This filing was served on both the EEOC and Kroger.

Consequently, Kroger cannot confirm that the Kronos assessments have been administered without regard to disability, nor can it confirm that the Kronos assessments have been properly validated. Even more recklessly, Kroger has known that the Kronos assessments have not been validated with respect to potential adverse impact on individuals with disabilities and yet has continued to use the unvalidated assessment for years.

Disregarding Industry Standards

Reckless indifference may also be shown by an employer’s failure to meet industry standards. The general counsel of the Equal Employment Advisory Council, an employer association comprised of more than 300 major corporations and staffed by experienced lawyers and HR professionals with in-depth knowledge in handling EEO and affirmative action compliance issues, set out the following industry standards in a statement made to the EEOC in 2007:

Tests are to be based on "objective" criteria
Tests are attractive screening methods "when administered properly"
A "carefully selected" test that is "properly validated" can provide a great deal of relevant information when "[u]sed in conjunction with other sources of information"
A few "basic principles that EEAC member companies strive to apply:"

Ensure each employment test has been properly validated.
Ensure that the validity study is current and properly documented.
Avoid overreliance on representations made by test manufacturers regarding test validity and suitability for a particular job.
Conduct periodic audits of employment selection testing procedures to monitor for

possible disparate impact
significant changes in jobs
outdated validity studies
other potential problems

As set out in the many documents filed in the ongoing litigation among the EEOC, Kroger and Kronos, few, if any, of these basic industry principles have been followed by Kroger. For example:

Kroger was unable to provide the EEOC with any validation studies, whether current or not or properly documented or not;
There was no information provided that demonstrated periodic audits by Kroger of the Kronos assessment to monitor for possible disparate impact or outdated validity studies;
Kroger utilizes the Kronos assessment, on its own, as the basis for rejecting applicants; the assessment was not “used in conjunction with other sources of information”; and
Kroger seemingly relied solely on the representations of Kronos regarding test validity and suitability.

Kroger’s knowing failure to meet many of the basic principles set out by the industry, and its turning a blind eye towards the known discriminatory impact of the Kronos assessment, are demonstrably reckless.

He Said, She Said

A key element in both the EEOC and EEAC guidance is the responsibility of the employer (Kroger) to independently review the assessment and avoid reliance on the representations of the assessment provider. Did Kroger conduct such an independent review? It would appear not.

A letter to the EEOC dated February 9, 2009, sent by Kroger’s counsel, states, “Unicru, the company that developed the assessment [and subsequently acquired by Kronos], has informed … [Kroger] that the assessment has been fully and appropriately validated, and that there is no disparate impact.” That statement admits a reliance on Kronos's representations and indicates no independent validation effort on the part of Kroger.

The quoted sentence from the February 9, 2009 letter of Kroger's counsel is puzzling, given the Kronos statement mentioned previously that it is “impossible” to measure disparate impact on people who have disabilities. Which is it? How can Kronos assure Kroger that there is no disparate impact, as Kroger’s counsel stated to the EEOC, if, according to Kronos, it is impossible to measure disparate impact?

Ignoring Supreme Court Precedent

In a petition to revoke an EEOC subpoena filed on October 16, 2008, Kronos claims that there is “no known method … to ascertain adverse impact against the entire generic category of disabilities.” As noted above, such a claim is a red herring. Given the wealth of scholarship and data concerning the interrelationship between the five-factor model (FFM) of personality and diagnosing mental illness (please see ADA, FFM and DSM), it seems more likely that Kronos was unwilling to spend the money to determine the impact of the Unicru assessment on persons with disabilities. Either that, or Kronos knows what the results would demonstrate: that the assessment is a medical examination and its administration by employers, including Kroger, results in disparate treatment of, and has a disparate impact on, persons with mental illness.

As evidenced by the scholarship on the use of the FFM to diagnose mental illness previously discussed, there are no barriers to testing the use of the FFM in the Kronos assessment to determine its impact on persons with mental illness. As noted by the authors of “Veiled Barriers: Pre-Employment Testing and Disability” published by the Journal of Rehabilitation Administration in 2006:

[S]tudies of sub-groups, such as individuals with mental illnesses or cognitive impairments could be conducted to determine the potential, and perhaps likelihood for, pre-employment test results unfairly penalizing these individuals in the employee selection and hiring stages …

In Albemarle Paper Company v. Moody, the Supreme Court addressed a case in which an employer implemented a test (Wonderlic) on the theory that a certain verbal intelligence was called for by the increasing sophistication of the plant's operations. The company made no attempt to validate the test for job-relatedness, and simply adopted the national "norm" score as a cut-off point for new job applicants. The Supreme Court cited the Standards of the American Psychological Association and pointed out that a test should be validated on people as similar as possible to those to whom it will be administered. The Court further stated that differential studies should be conducted on minority groups wherever feasible.

Substitute (i) Kroger for Albemarle Paper Company and (ii) Kronos assessment for the Wonderlic test in the Albemarle Paper Company decision and one sees few, if any, differences. Like Albemarle Paper Company, Kroger made no attempt to validate the test for job-relatedness and arbitrarily adopted cut-off points. Like Albemarle Paper Company, neither Kroger nor Kronos conducted validation studies on persons with mental illness.

Kroger’s failure to follow Supreme Court precedent that (i) requires validation of an employment test (Unicru assessment), (ii) prohibits arbitrary cut-off points, and (iii) requires differential studies of the impact of the test on protected classes (persons with mental illness)? Reckless.

Friday, September 20, 2013

What Gets Lost? Risks of Translating Psychological Models and Legal Requirements to Computer Code

The genesis for this posting is the article "Technologies of Compliance: Risk and Regulation in a Digital Age" authored by Kenneth A. Bamberger and found at 88 Texas L. Rev. 669 (2010). This posting takes portions of the article, modified to address the issue of job applicant assessments, and intersperses information on elements of workforce analytics to provide examples of the risks and challenges raised in the Bamberger article.

Workforce analytic systems are powerful tools, but they pose real perils. They force computer programmers to attempt to interpret psychological models, legal requirements and managerial logic; they mask the uncertainty of the very hazards with which lawmakers and regulators are concerned; they skew decisionmaking through an “automation bias” as a substitute for sound judgment; and their lack of transparency thwarts oversight and accountability.

Lost In Translation

The hiring assessment functionality of workforce analytics contains three divergent logic systems: legal, psychological and managerial. The legal logic system derives, in part, from the Americans with Disabilities Act (ADA) and its accompanying regulations and related caselaw. The psychological logic system derives primarily from the five-factor model of personality, or Big Five, as it has evolved over the past 20-25 years. The managerial logic system derives from the implementation of the human resource function of the employer. Technology in the form of workforce analytics then attempts to tie these three logic systems together in order to create an automated assessment program that attempts to determine applicant "suitability" or "fit."

Information technology is not value-neutral, but embodies bias inherent in both its social and organizational context and its form. It is not infinitely plastic, but, through its systematization, trends towards inflexibility. It is not merely a transparent tool of intentional organizational control, but in turn shapes organizational definitions, perceptions, and decision structures. In addition to controlling the primary risks it seeks to address, then, it can raise—and then mask—different sorts of risk in its implementation.

For example, many workforce analytic companies utilize the Big Five model in creating their personality assessments. As its name implies, the Big Five looks at five traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism, with each trait conceptualized on an axis from low to high (e.g., low neuroticism, high neuroticism). The Big Five, operating under various names, existed for a number of decades prior to its "rebirth" in the early 1990s, when it was embraced by organizational psychologists.

Since the late 1990s workforce analytics companies like Unicru (now owned by Kronos) have adopted the Big Five for use in their job applicant assessment program. The workforce analytic companies have created "model" psychological profiles and tested applicants against those profiles. In general, applicants receive either green, yellow or red scores on the basis of a 50/25/25 cutoff. Applicants scoring red are generally not interviewed, let alone hired.
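To make the cut-off mechanics concrete, the following is a minimal sketch of percentile banding. It is not the Unicru or Kronos scoring method, which is not public. It assumes that "50/25/25" means the top 50% of scores are green, the next 25% yellow and the bottom 25% red; the function name, the reference scores and that mapping are all assumptions made for illustration.

```python
from bisect import bisect_left


def band_for(score, reference_scores, green_share=0.50, yellow_share=0.25):
    """Return 'green', 'yellow' or 'red' from a score's percentile rank.

    ASSUMPTION: top 50% -> green, next 25% -> yellow, bottom 25% -> red.
    The source gives only the 50/25/25 split, not which share maps to which color.
    """
    ranked = sorted(reference_scores)
    # Fraction of reference scores that fall strictly below this score.
    percentile = bisect_left(ranked, score) / len(ranked)
    red_share = 1.0 - green_share - yellow_share
    if percentile >= 1.0 - green_share:
        return "green"
    if percentile >= red_share:
        return "yellow"
    return "red"


# Hypothetical reference group: 100 prior applicants with scores 0..99.
reference = list(range(100))
for s in (80, 30, 25, 24):
    print(s, band_for(s, reference))
```

Under these assumptions, two applicants whose scores differ by a single point (24 and 25) land on opposite sides of the red line, and only one of them is likely to be interviewed.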

The use of technology systems to hardwire workforce analytics raises a number of fundamental issues regarding the translation of legal mandates, psychological models and business practices into computer code and the resulting distortions. These translation distortions arise from the organizational and social context in which translation occurs; choices “embody biases that exist independently, and usually prior to the creation of the system.” And they arise as well from the nature of the technology itself “and the attempt to make human constructs amenable to computers.”

These distortions are compounded when psychological models, legal standards and managerial processes are turned over to programmers for translation into predictive algorithms and computer code. These programmers may know nothing of the psychological models, legal standards and management processes. Some are employees of separate IT divisions within firms; many are employees of third-party systems vendors. Wherever they work, their translation efforts are colored by their own disciplinary assumptions, the technical constraints of requirements engineering, and limits arising from the cost and capacity of computing.

Managerial processes may be poor vehicles for capturing nuance in legal policy and psychological models, especially in a context like employment discrimination where regulators have eschewed rules for standards and where the interpretation of those standards by regulators and psychological professionals may change over time. For example, the ADA prohibits pre-employment medical examinations, and the psychological tests used by workforce analytic companies may be considered medical examinations (please see ADA, FFM and DSM). Employers and workforce analytic companies have interpreted the medical examination requirement as prohibiting the use of tests that are designed to diagnose mental illnesses.

This interpretation creates two significant risks for employers and workforce analytic companies. First, the legal standard does not speak to "tests designed to diagnose mental illnesses;" rather, it is "whether the test is designed to reveal an impairment of physical or mental health such as those listed in the Diagnostic and Statistical Manual of Mental Disorders." "Designed to reveal" is semantically and substantively different from "designed to diagnose" and, as set out in Employment Tests are Designed to Reveal an Impairment, Big Five-based tests are designed to reveal impairments by their screening out process. As depicted in the graphic, tests designed to diagnose are a subset of the overall category of tests designed to reveal an impairment. Using the "designed to diagnose" category as the proxy for medical examinations puts employers and workforce analytic companies at significant risk of violating the ADA medical examination prohibition and the confidential medical information safeguards under the ADA. It may also result in other claims against the workforce analytic companies by job applicants, employers and insurers. The mistaken use of the "designed to diagnose" category as a proxy for medical examinations could be considered a design defect in the product liability arena or negligence in a tort claim.

The second significant risk for employers and workforce analytic companies arises from their failure to account for the evolution of the Big Five model from non-clinical model to clinical model. In her seminal review of the personality disorder literature published in 2007, Dr. Lee Anna Clark stated that “the five-factor model of personality is widely accepted as representing the higher-order structure of both normal and abnormal personality traits.” A clear sign of this evolution comes with the publication of the most current edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), published in May 2013, where the model used to diagnose many personality disorders is based on the Big Five.

Consequently, even if the standard for defining a medical examination was focused solely on the use of a test that diagnosed a mental illness, the five-factor model has now evolved into a diagnostic tool used by the psychiatric community to define mental impairments, including personality disorders, of the kind set out in the DSM-5. The failure of employers and workforce analytic companies to account for the evolutionary development of the five-factor model puts them at significant risk due to their belief that the five-factor model is not a diagnostic tool - a belief that time and scientific advances have now overturned.

Automation Bias

While computer code and predictive-analytic methods might be accessible to programmers, they remain opaque to users, for whom, often, only the outcomes remain visible. Job applicants cannot see even the assessment outcome, because results are not disclosed by employers or workforce analytic companies. Programmers “code[] layer after layer of policies and other types of rules” that managers and directors cannot hope to understand or unwind.

Human judgment is subject to an automation bias, which fosters a tendency to “disregard or not search for contradictory information in light of a computer-generated solution that is accepted as correct.” Such bias has been found to be most pronounced when computer technology fails to flag a problem.

In a recent study from the medical context, researchers compared the diagnostic accuracy of two groups of experienced mammogram readers (radiologists, radiographers, and breast clinicians)—one aided by a Computer Aided Detection (CAD) program and the other lacking access to the technology. The study revealed that the first group was almost twice as likely to miss signs of cancer if the CAD did not flag the concerning presentation than the second group that did not rely on the program.

Automation bias may be found in the algorithms created and used by workforce analytic companies to provide "insights" to their employer customers. For example, Kenexa, an IBM company, has determined that distance from work, commute time and frequency of household moves all have a correlation with attrition in call-center and fast-food jobs. Applicants who live more than five miles from work, have a lengthy commute or have moved more frequently are scored down by the algorithms, making them less desirable candidates.

Painting with the broad brush of distance from work, commute time and moving frequency may result in well-qualified applicants being excluded. The Kenexa insights are generalized correlations; they say nothing about any particular applicant.
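The effect of such a rule on an individual applicant can be sketched in a few lines. This is not Kenexa's algorithm, which is not described in enough detail to reproduce. Only the five-mile threshold comes from the text above; the `Applicant` fields, the penalty weights and both sample applicants are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    qualification_score: float  # hypothetical 0-100 measure of job qualifications
    miles_from_work: float
    moves_last_five_years: int


def attrition_adjusted_score(a, distance_limit=5.0,
                             distance_penalty=20.0, move_penalty=5.0):
    """Score an applicant down for distance and for each recent move.

    ASSUMPTION: the penalty sizes are invented; only the five-mile
    threshold is taken from the text.
    """
    score = a.qualification_score
    if a.miles_from_work > distance_limit:
        score -= distance_penalty
    score -= move_penalty * a.moves_last_five_years
    return score


# Two hypothetical applicants: A is better qualified but lives farther away
# and has moved twice; B is less qualified but lives nearby.
a = Applicant("A", qualification_score=90, miles_from_work=8, moves_last_five_years=2)
b = Applicant("B", qualification_score=70, miles_from_work=2, moves_last_five_years=0)
print(attrition_adjusted_score(a), attrition_adjusted_score(b))
```

With these made-up weights the better-qualified applicant ends up ranked below the less-qualified one, solely because of where that applicant lives and how often that applicant has moved.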

What are the risks of employers slavishly adhering to the results of the algorithm? Part of the answer comes from identifying the groups of people who have longer commutes and move more frequently than others: lower-income persons who, according to the U.S. Census, are disproportionately African-American and Hispanic.

Through the application of these “insights,” many low-income persons are electronically redlined, meaning employers will pass over qualified applicants because they live (or don’t live) in certain areas, or because they have moved. The reasons for moving do not matter, whether it is to find a better school for their children, to escape domestic violence, to respond to the elimination of mass transit in their community, or to cope with job loss due to a company shutdown (please see From What Distance is Discrimination Acceptable?).

An employer who does not look past the simple results of the assessment algorithms not only harms itself by failing to consider well-qualified applicants but also puts itself at risk for employment discrimination claims by classes of persons (e.g., African-American, Hispanic) protected by federal and state employment laws.

Institutionalized (Mis)Understanding

Institutionalization of workforce management practices might permit evolutionary improvements in existing measurements, but it masks areas where risk types are ignored or analysis is insufficient and where more revolutionary, paradigm-shifting advances might be warranted.

These understandings (or misunderstandings) can be institutionalized across the field of workforce analytics. As workforce analytic practices are disseminated through the industry by professional groups, workforce analytic practitioners, management scholars, and third-party technology vendors and consultants, they standardize an approach that other firms adopt, seeking legitimacy.

Workforce analytic systems, designed in part to mitigate risks, have now become sources of risk themselves. They create the perception of stability through probabilistic reasoning and the experience of accuracy, reliability, and comprehensiveness through automation and presentation. But in so doing, technology systems draw organizational attention away from uncertainty and partiality. They can embed, and then justify, self-interested assumptions and hypotheses.

Moreover, they shroud opacity—and the challenges for oversight that opacity presents—in the guise of legitimacy, providing the allure of shortcuts and safe harbors for actors both challenged by resource constraints and desperate for acceptable means to demonstrate compliance with legal mandates and market expectations.

The technical language of workforce analytic systems obscures the accountability of the decisions they channel. Programming and mathematical idiom can shield layers of embedded assumptions from high-level firm decisionmakers charged with meaningful oversight and can mask important concerns with a veneer of transparency. This problem is compounded in the case of regulators outside the firm, who frequently lack the resources or vantage to peer inside buried decision processes and must instead rely on the resulting conclusions about risks and safeguards offered them by the parties they regulate.

Risks of a Technological Monoculture

Technology-based workforce analytic systems proliferate, in part, because policy makers have rejected rule-based mandates in favor of regulatory principles that rely on the exercise of context-specific judgment by regulated entities for their implementation. Yet workforce analytic technology can turn each of these regulatory choices on its head. The need to translate psychological, legal and managerial logic into a fourth distinct logic of computer code and quantitative analytics creates the possibility that legal choices will be skewed by the biases inherent in that process.

Such biases introduce several risks: that choices will be shaped both by assumptions divorced from sound management and incentives unrelated to public ends (e.g., hiring discrimination leading to larger income support payments - SSDI, SSI); that the rule-bound nature of code will substitute one-time technological “fixes” for ongoing human oversight and assessment (e.g., failure to recognize the evolution of the Big Five into a diagnostic tool); and that the standardization of risk-assessment approaches will eliminate variety, and therefore robustness, in workforce analytic efforts, developing systemic risks of which individual actors may not be aware.

Systemic risks have developed because there is a technological "monoculture" in the workforce analytic industry. The problems are analogous to those of the biological domain. A deeply entrenched standard prevents the introduction of technological ideas that deviate too far from accepted norms. This means that the industry may languish with inefficient or non-optimal solutions to problems, even though efficient, optimal, and technically feasible solutions exist. The technical feasibility of these superior solutions is not important; they are excluded because they are incompatible with the status quo technology.

In a diverse population, any particular weakness or vulnerability is likely confined to only a small segment of the whole population, making population-wide catastrophes extremely unlikely. In a homogeneous population, any vulnerability is manifested by everyone, creating a risk of total extinction. In the case of workforce analytics, one successful challenge to the Big Five-based model, whether as an illegal medical examination or as a test that screens out persons with disabilities, introduces systemic risk to all customers of that workforce analytics company - one loss will lead to multiple challenges, along the lines of the asbestos litigation (please see The Next Asbestos? The Next FLSA?). Systemic risk is not limited to the ecosystem of the workforce analytic company being challenged, but extends to all companies that market or utilize Big Five-based assessments.

The potential costs are enormous. If the assessment is an illegal medical examination, then each applicant has a claim based on the use of an illegal medical examination. Some employers use the assessments to screen millions of applicants each year; each applicant is a potential plaintiff. Further, if the test is a medical examination, then each applicant has a claim for the misuse of confidential medical information (if the test is a medical examination, the applicant responses are confidential medical information). Not only does that lead to claims based on privacy violations, but all systems, solutions and databases that incorporate the information obtained from the assessments will need to be "sanitized." In a very real sense, the data may be the virus and the costs of "cleansing" those systems may well dwarf the very significant damages payable to applicants (please see When the First Domino Falls: Consequences to Employers of Embracing Workforce Assessment Solutions).