Employment Testing: Failing to Make the Grade: 2013

Tuesday, December 31, 2013

A Brave New, Though Likely Illegal, World

According to a TechCrunch article, the Prophecy Sciences employment assessment "seeks to analyze the unique blend of chemical reactions, electrical impulses, reflexes and behaviors that make you who you are ..." In addition to testing accuracy and response time, the test monitors job applicants' eye movement, pulse rate and electrodermal activity.

The Americans with Disabilities Act (ADA) prohibits employers from administering pre-employment medical examinations. Guidance by the Equal Employment Opportunity Commission defines medical examination under the ADA by reference to seven factors, any one of which may be sufficient to determine that a test is a medical examination. One of those factors is whether the test measures an applicant's physiological responses to performing a task. The Prophecy Sciences test measures pulse rate, eye movement and electrodermal activity (physiological responses) as applicants perform the task of taking the test.

The EEOC guidance states:

[I]f an employer measures an applicant's physiological or biological responses to performance, the test would be medical.

Example: A messenger service tests applicants' ability to run one mile in 15 minutes. At the end of the run, the employer takes the applicants' blood pressure and heart rate. Measuring the applicant's physiological responses makes this a medical examination.

Even if one were to disregard the medical examination problem, how does this test work with persons who have physical disabilities? Would blindness have an impact on the eye movement measure? How would limb paralysis affect the skin sensors?

How would Stephen Hawking perform on this test?

Monday, December 23, 2013

Employment Assessments: 21st Century Snake Oil?

The Oxford English Dictionary defines snake oil as "a quack remedy or panacea." The origins of snake oil as a derogatory phrase trace back to the latter half of the 19th century, which saw a dramatic rise in the popularity of "patent medicines." Despite the name, most patent medicines were not officially patented. They were medicines with questionable effectiveness whose contents were usually kept secret.

By the middle of the 19th century the manufacture of patent medicines had become a major industry in America. Often high in alcoholic content and fortified with cocaine, morphine or opium, many of these concoctions were advertised for infants and children. Some level of exoticism in the contents of the preparation was deemed desirable by their promoters and nearly any scientific discovery could inspire a key ingredient or principle in a patent medicine.

From the beginning, some physicians and medical societies were critical of patent medicines. They argued that the remedies did not cure illnesses, discouraged the sick from seeking legitimate treatments, and created negative consequences like alcohol and drug dependency.

By the end of the 19th century, Americans favored laws to force manufacturers to disclose the remedies' ingredients and use more realistic language in their advertising. These laws met with fierce resistance from the manufacturers. Finally, with strong support from President Theodore Roosevelt, a Pure Food and Drug Act was passed by Congress in 1906, paving the way for public health action against unlabeled or unsafe ingredients and misleading advertising.

Are there any similarities between the sales and marketing of snake oil/patent medicine and the sales and marketing of employment assessments? Do the assessments have questionable effectiveness? Are their contents kept secret? Do they market scientific discoveries, in the form of buzzwords, as inspiring key ingredients? Is there public action challenging the assessments?

Questionable Effectiveness?

According to a 2012 study by Oracle and Development Dimensions International (DDI), a global human resources consulting firm whose expertise includes designing and implementing selection systems, more than 250 staffing directors and over 2,000 new hires from 28 countries provided the following perspectives on their organization’s selection processes (the following are excerpts from the study):

[O]nly 41 percent of staffing directors report that their pre-employment assessments are able to predict better hires.
Only half of staffing directors rate their systems as effective, and even fewer view them as aligned, objective, flexible, efficient, or integrated.
[T]he actual process for making a hiring decision is less effective than a coin toss.

In a 2007 article titled, “Reconsidering the Use of Personality Tests in Employment Contexts”, co-authored by six current or former editors of academic psychological journals, Dr. Kevin Murphy, Professor of Psychology at Pennsylvania State University and Editor of the Journal of Applied Psychology (1996-2002), states:

The problem with personality tests is … that the validity of personality measures as predictors of job performance is often disappointingly low. A couple of years ago, I heard a SIOP talk by Murray Barrick … He said, “If you took all the … [factors], measured well, you corrected for everything using the most optimistic corrections you could possibly get, you could account for about 15% of the variance in performance [between projected and actual performance].” … You are saying that if you take normal personality tests, putting everything together in an optimal fashion and being as optimistic as possible, you’ll leave 85% of the variance unaccounted for. The argument for using personality tests to predict performance does not strike me as convincing in the first place.
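
To put the 15% figure in more familiar terms, the share of variance explained is the square of a correlation coefficient. The short sketch below assumes a simple linear relationship between test score and job performance (an assumption made for illustration; the quoted passage does not specify the model) and converts the figure accordingly.

```python
import math

def correlation_from_variance_explained(r_squared: float) -> float:
    """Return the correlation magnitude implied by a share of variance explained."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must be between 0 and 1")
    return math.sqrt(r_squared)

variance_explained = 0.15  # the "about 15%" figure quoted above
r = correlation_from_variance_explained(variance_explained)
unexplained = 1.0 - variance_explained

print(f"implied correlation: {r:.2f}")
print(f"variance left unexplained: {unexplained:.0%}")
```

Under that assumption, even the most optimistic reading corresponds to a correlation of a little under 0.4, with 85% of the variation in performance left unexplained.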

Secret Contents?

Using terms like patent-pending, proprietary and trade secret, employment assessment companies claim that they cannot disclose information about their assessment processes. Are these claims legitimate or, like the Wizard of Oz, are the claims used to mask the lack of relevant and legal substance behind the assessments? Or is it a bit of both?

Even assuming that the confidentiality claims by the assessment companies are appropriate, are there no circumstances in which the companies must disclose information regarding how their assessments are developed and implemented and the results of the assessment usage across a broad population of job applicants? There are.

In a 2012 decision, a federal appeals court stated the Americans with Disabilities Act (ADA) prohibits employment tests when such tests screen out or tend to screen out disabled people and the use of the test is not job-related for the position in question and consistent with business necessity. The court went on to order that the employer and testing company, Kroger and Kronos, respectively, provide the Equal Employment Opportunity Commission (EEOC) with:

Any and all documents and data constituting or related to validation studies or validation evidence pertaining to Kronos assessment tests purchased by Kroger, including but not limited to such studies or evidence as they relate to the use of the tests as personnel selection or screening instruments, even if created or performed for other customer(s);
The user’s manual and instructions for the use of assessment tests used by Kroger;
Any and all documents (if any) related to Kroger, including but not limited to correspondence, notes, and data files, relating to Kroger; its use of the assessment test; results, ratings, or scores of individual test takers; and any validation efforts made thereto; and
Any and all documents discussing, analyzing or measuring potential adverse impact on persons with disabilities.

So, it's quick and easy for the regulatory agency (EEOC) to obtain the necessary information, right? No. The EEOC investigation leading to the appeals court decision referenced above has been ongoing for more than six years. It has generated a number of district court and appellate court decisions. None of those decisions have addressed substantive claims of discrimination. They are all decisions relating to the unwillingness of Kroger and Kronos to provide information to the EEOC that would allow the EEOC to determine whether the Kronos assessments illegally discriminated against persons with disabilities.

Please see Kroger and Kronos: Chaos and Disorder.

Negative Consequences?

Employment personality tests discriminate against applicants with mental illness. These applicants are ready, willing and able to work, but are being illegally screened out from employment consideration by personality tests that use a non-validated stereotype of the capabilities of persons with mental illness. Please see What Are The Issues, ADA, FFM and DSM, and Employment Assessments Are Designed to Reveal an Impairment.

Employment discrimination on the basis of mental illness affects all demographic groups. Mental illness is no respecter of age, gender, geography, income, occupation, military status, race, religion or sexual orientation. Persons with mental illness include military veterans returning to the civilian workforce, new and expectant mothers, LGBTs and young adults. Please see Tests Discriminate Against Returning Veterans, Tests Discriminate Against New and Expectant Mothers and Employment Tests Discriminate Against LGBTs.

The illegal screening out of applicants with mental illness has come at a cost of tens of billions of dollars to taxpayers and the U.S. Treasury. The prevalence of mental disorders has generally remained unchanged over the past 15 years and substantially increased rates of treatment should have resulted in a decline in the percentage of persons receiving disability awards who are diagnosed with mental illness. Sadly, no. People with psychiatric impairments constitute the largest and most rapidly growing subgroup of income support program awards (SSDI, SSI). Please see Costing Taxpayers Billions of Dollars Each Year.

Every year since 1999, more Americans have killed themselves than the year before, making suicide the nation’s greatest untamed cause of death. Being unemployed is associated with a 2-3X increase in the relative risk of death by suicide, compared with being employed. Given that more than 90% of persons who attempt suicide have mental illnesses, a tool like personality testing that illegally excludes persons with mental illness from employment consideration leads to an increase both in perceived burdensomeness and thwarted belongingness/social alienation, two critical elements tied to the risk of suicide. Please see Does the Rising Use of Employment Personality Tests Contribute to An Increase in Suicides?

Marketing by Scientific Buzzwords?

New entrants in the assessment field include ConnectCubed, Good Co., Evolv, Knack and Prophecy Sciences. They compete with incumbents like Kenexa (IBM), Kronos, SHL, Success Factors (SAP), and Taleo (Oracle). Marketing claims include:

Our games are fun, but our technology is rock solid. We design and develop our games using state-of-the-art behavioral science, then we use data-mining tools and massive amounts of data to validate and compute the Knacks you earn. (Knack)

We use a powerful combination of cognitive games, biometric signals, and machine learning algorithms to compile actionable insights about you and your teammates. (Prophecy Sciences)

We have combined decades of academic and business research with sophisticated statistical models to create our Proprietary Psychometric Algorithm. (Good Co.)

Evolv’s patent-pending technology platform unifies and supplements existing data from current systems, then utilizes that dataset to identify fact-based workforce insights that drive measurable ROI. (Evolv)

Kronos helps organizations find value in big data with enhanced analytics. (Kronos)

Public Action?

The EEOC has been attempting to investigate the use of Kronos assessment by Kroger for more than six years. As noted previously, that investigation has resulted in a number of court decisions, not on the substance of the claims of alleged discrimination, but on the requirement of Kronos and Kroger to provide the EEOC with relevant information regarding the assessment and its usage.

That investigation has evolved into a systemic investigation by the EEOC. Systemic investigations involve pattern or practice, policy, and/or class cases where the alleged discrimination has a broad impact on an industry, profession, company, or geographic area. As stated by the court in the 2012 appellate decision referenced previously, it is “a proper inquiry for the EEOC to seek information about how these tests work, including information about the types of characteristics they screen out….”

In connection with systemic investigations, the EEOC’s enforcement tools include issuing broad information requests and subpoenas on employers that are named as respondents in EEOC charges, particularly when the EEOC suspects systemic discrimination, and filing pattern or practice class lawsuits in federal court. For some employers, the potential class size can be measured in the millions of plaintiffs.

The EEOC's systemic investigation of Kroger and Kronos, as well as its investigation of other employers and their assessment companies, is consistent with the EEOC's implementation of its Strategic Enforcement Plan (SEP) for 2013-2016. The first national priority of the SEP is “eliminating systemic barriers in recruitment and hiring.” The SEP goes on to state that “people with disabilities continue to confront discriminatory policies and practices at the recruitment and hiring stages. These include … the use of screening tools (e.g., pre-employment tests …).”

Disability discrimination, including employment assessment litigation, is a significant element of the EEOC's enforcement activities. As shown in the chart below, ADA claims constituted the largest percentage of the EEOC’s yearly litigation filing activity for FY 2013 - almost half of all cases.

The Importance of This Issue

The long-term fiscal stability of the United States of America depends, in part, on ensuring that Americans with disabilities have meaningful opportunities to contribute to our collective well-being and on eliminating outdated policies that keep people in cycles of poverty and dependency.

More than two decades after the passage of the ADA, the unemployment rate for Americans with disabilities stubbornly remains nearly double that of people without disabilities, while their rate of labor force participation has continued to be abysmally low. Figures from the Bureau of Labor Statistics show that labor force participation for workers with disabilities was 20.3 percent, while the total for workers without disabilities was 69.1 percent—more than three times higher.

There are many benefits of employment—work enhances skills such as communication, socialization, academics, physical health, and community skills; it factors into how one is perceived by society; it promotes economic well-being (reducing government expenditures on income support programs, Medicare and Medicaid); it leads to greater opportunity for upward mobility; and it contributes to greater self-esteem.

Tuesday, December 17, 2013

Better Get While the Gettin's Good

On December 11, 2013, Reuters reported that the two private equity companies that took human resources management software firm Kronos Inc. private in 2007 are looking to sell the company. Hellman & Friedman LLC and JMI Equity are exploring a sale of Kronos, which could be valued at more than $4 billion. Interested purchasers are reported to include TPG, KKR and Bain.

The Reuters article states that Hellman & Friedman and JMI have taken advantage of Kronos' strong cash flow to draw more than $1.5 billion in dividends from Kronos, and so have already earned twice the $752.9 million they committed as equity when they agreed to acquire the company in 2007. In November 2013, the two companies had Kronos borrow to pay themselves a $490 million dividend.
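
A quick check of the "earned twice" claim, using only the two figures reported above. Because the dividends are reported as "more than $1.5 billion," the result is a floor rather than an exact multiple.

```python
# Figures as reported in the Reuters article cited above.
equity_committed = 752.9e6  # equity committed in the 2007 acquisition
dividends_drawn = 1.5e9     # "more than $1.5 billion", so treated as a minimum

# Multiple of the original equity already returned through dividends alone.
multiple = dividends_drawn / equity_committed
print(f"dividends returned at least {multiple:.2f}x the equity committed")
```

The multiple comes out at roughly 1.99, consistent with the article's statement that the owners have already earned about twice their equity before any sale.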

Who should be interested in a potential sale of Kronos by Hellman & Friedman and JMI, other than the sellers, potential buyers and Kronos employees? The hundreds of employers that are customers of the Kronos talent acquisition and employee assessment services.

Why should those employers be interested? The risks to those employers from the ongoing systemic investigation by the Equal Employment Opportunity Commission (EEOC) of several Kronos customers, an investigation focused on whether the Kronos assessment services violate the Americans with Disabilities Act (ADA) by illegally screening out persons with disabilities.

What are EEOC systemic investigations? Systemic investigations involve pattern or practice, policy, and/or class cases where the alleged discrimination has a broad impact on an industry, profession, company, or geographic area. In connection with systemic investigations, the EEOC’s enforcement tools include issuing broad information requests and subpoenas on employers that are named as respondents in EEOC charges, particularly when the EEOC suspects systemic discrimination, and filing pattern or practice class lawsuits in federal court.

What are the risks to employers? Systemic investigations by the EEOC and class action claims by job applicants for damages and injunctive relief. For some employers, the potential class size can be measured in the millions of plaintiffs. Employers have primary liability under the ADA, but Kronos has indemnified many of its employer customers. If Kronos does not have the financial resources, however, the indemnification is illusory.

What is Kronos?

Kronos is a U.S.-based workforce management software and services company. According to the company, tens of thousands of organizations in more than 100 countries - including more than half of the Fortune 1000 - use Kronos.

In August 2006, Kronos acquired Unicru, Inc., a company specializing in software used to assess and hire hourly workers. At the time of the acquisition by Kronos, Unicru had as customers for its assessment (the Unicru assessment) more than 140 leading companies and brands, including SuperValu, Kroger, Toys "R" Us, Best Buy, CVS, Borders, Lowe's, Caribou Coffee, and Marquis Healthcare.

The Unicru assessment consists of a number of statements, to which an applicant must answer “strongly disagree,” “disagree,” “agree,” or “strongly agree.” It includes statements such as: “You have confidence in yourself”; "You try to sense what others are thinking and feeling”; “You always say whatever is on your mind”; and “It is easy for you to feel what others are feeling.”

The systemic investigation of Kronos assessment customers, including Kroger, arose from a charge filed with the EEOC more than six years ago by a Kroger job applicant. The charge led to an investigation that has been ongoing for more than six years and has generated a number of district court and appellate court decisions as Kronos has unsuccessfully sought to avoid disclosing information about the Unicru assessment and its impact on persons protected by the ADA.

Cloning Employees and Institutionalizing Biased Hiring Practices

According to Kronos, the Unicru assessment is an artificial intelligence test that uses neural networks to “learn” the characteristics of a customer’s “best” employees. As stated by Kronos’ Chief Scientist and the developer of the Unicru assessment, Dr. David Scarborough, in chillingly Orwellian terms, "[o]ur system allows you to clone your best, most reliable people."

First used for engineering and industrial applications during the mid-1980s, neural networks evolved from early artificial intelligence research. Modeled on the function of the human brain, a neural network attempts to imitate human reasoning. Large amounts of data are fed into the network, which looks for relationships and reaches conclusions.

"There are a couple of dangers," states Jai Shekhawat, CEO of Chicago-based Fieldglass Inc., which develops software for managing workers. "Is something a correlation--a predictor--or merely a coincidence? At best, [these methods] are complementary to human judgment, not a substitute for it."

Notwithstanding such dangers, Kronos customers like Kroger are substituting this “coincidence” for human judgment. Based on the prospective employee's answers on the application, the Unicru assessment categorizes the applicant as red, green or yellow. Red is usually an automatic discard or, as Dr. Scarborough stated, “[m]anagers are strongly discouraged from hiring first quartile (“red”) applicants …”
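
The scoring rules behind the color ratings are not public, so the following is only a hypothetical sketch of how quartile-based banding of this kind could work. The one detail taken from the statement above is that "red" corresponds to the first (bottom) quartile; the yellow/green cutoff, the function names and the idea of ranking an applicant against a reference pool of scores are all assumptions made for illustration.

```python
def percentile_rank(score: float, reference_scores: list[float]) -> float:
    """Percentage of reference scores strictly below the applicant's score."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)

def color_band(percentile: float) -> str:
    """Map a percentile rank to a color band (cutoffs partly hypothetical)."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 25.0:
        return "red"     # first quartile, per the quoted statement
    if percentile < 50.0:
        return "yellow"  # assumed cutoff, not from the source
    return "green"       # assumed cutoff, not from the source

# Hypothetical reference pool: scores 0..99, one applicant per score.
reference_pool = [float(s) for s in range(100)]
for applicant_score in (10.0, 30.0, 80.0):
    pct = percentile_rank(applicant_score, reference_pool)
    print(applicant_score, color_band(pct))
```

Whatever the real cutoffs are, a scheme of this shape guarantees that a fixed share of applicants is rated red regardless of their actual ability to do the job, because the bands are defined relative to other test takers rather than to job requirements.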

There is no evidence that the Unicru assessment determines whether an employer’s hiring practices are biased or discriminatory. For example, if the Unicru assessment had been utilized fifty years ago, many companies’ “best” employees would have the personality traits of white males – persons of color, women and those with disabilities need not have applied.

The Unicru assessment embeds and industrializes existing stigma, bias and discrimination in the hiring process. As stated by Cynthia Dwork and Deirdre K. Mulligan in a recent Stanford Law Review article:

While automated decisionmaking systems “may reduce the impact of biased individuals, they may also normalize the far more massive impacts of system-level biases and blind spots.” Rooting out biases and blind spots in big data depends on our ability to constrain, understand, and test the systems that use such data to shape information, experiences, and opportunities.

As a “blind” tool that “learns” from the employer, the Unicru assessment replicates the existing bias of the employer and applies it on a massive scale. All applicants have their test responses fed through a discriminatory filter that is the Unicru assessment (a filter that is biased both on its own and in conjunction with its “learned” behavior).

Illegal Medical Examination

The ADA prohibits the use of pre-employment medical examinations. At the pre-offer stage, an employer is only entitled to ask about an applicant's ability to perform the essential functions of the job. The ADA's prohibition against pre-employment examinations seeks to ensure that the applicant's disability is not considered prior to the assessment of the applicant's qualifications.

EEOC guidance provides a seven-factor test for analyzing whether a test or procedure qualifies as a “medical examination,” including:

whether the test is designed to reveal an impairment of physical or mental health such as those listed in the Diagnostic and Statistical Manual of Mental Disorders (“DSM”); and
whether the test is interpreted by a health care professional.

According to the guidance, the presence of any one of the seven factors is enough to support a finding that the test is a medical examination. The Unicru assessment meets both of the factors listed above.

Since the Unicru assessment is based on the five-factor model (FFM) of personality, it meets the first factor listed above. As set out in previous posts - ADA, FFM and DSM and Employment Assessments are Designed to Reveal an Impairment - assessments based on the FFM are designed to reveal an impairment of mental health, such as those listed in the DSM.

As to the second factor, whether the test is interpreted by a health care professional, the individuals who developed the Unicru assessment are psychologists, most of whom are members of the APA. In developing the assessment, the psychologists established the rules by which it is to be interpreted (i.e., how the responses to the questions are to be scored, including whether the applicant receives a green, yellow or red rating).

According to the APA Model Act for State Licensure of Psychologists, “[t]he practice of psychology includes … (a) psychological testing and the evaluation or assessment of personal characteristics, such as intelligence; personality; cognitive, physical, and/or emotional abilities; … [and] (f) provision of direct services to … groups for the purpose of enhancing … organizational effectiveness, using psychological principles, methods, and/or procedures … for making decisions about the individual, such as selection …”

EEOC guidance states that psychologists are among the “variety of health professionals [that] may provide documentation regarding psychiatric disabilities” for ADA purposes. Accordingly, the psychologists who developed the Unicru assessment are "health care providers" for purposes of the ADA.

The CVS Example

In July 2011, CVS and the Rhode Island Civil Liberties Union (ACLU) entered into a voluntary settlement addressing the ACLU’s complaint challenging CVS’s use of a pre-hire questionnaire that the ACLU claimed could have a discriminatory impact on people with certain mental impairments or disorders.

The CVS questionnaire contained statements to which applicants were required to respond, including: “You change from happy to sad without any reason,” “You get angry more often than nervous,” “Your moods are steady from day to day,” and “There’s no use having close friends; they always let you down.”

Responding to a complaint filed by the ACLU, the Rhode Island Commission for Human Rights had issued a finding in February 2011 that there was "probable cause" to believe that the questionnaire used by CVS violated state anti-discrimination laws that bar employers from eliciting information that pertains to job applicants' mental or physical disabilities.

Although employers may legally ask questions designed to help determine an applicant’s personality or aptitude for a job, the ACLU’s complaint argued that questions found in the CVS pre-offer assessment “could have the effect of discriminating against applicants with certain mental impairments or disorders, and go beyond merely measuring general personality traits.”

Pursuant to the settlement agreement, CVS agreed to permanently remove the questions at issue from its online application.

Systemic Risk to Employers

The success of workforce science companies in developing employment personality and assessment tests over the past twenty years has created "systemic risk" for their employer customers. If one employer has violated the law and subjected itself to significant liability as a consequence of its use of an assessment provided by a workforce science company, then all customers of that company are similarly at risk. Workforce science companies provide their services to thousands of employers, including many of the largest employers in the U.S.

The lack of diversity in the psychological model underlying many of the personality tests offered by workforce science companies (the five-factor model of personality or Big Five) also means that if one workforce science company's Big Five-based personality test is found to be an illegal medical examination under the Americans with Disabilities Act (ADA), all workforce science companies that use the Big Five (and, more importantly, their customers) are similarly at risk.

There are multiple risks to employers arising from the use of personality tests and workforce assessments, including:

Claims under the ADA and the Rehabilitation Act of 1973 that the personality tests are illegal medical examinations or that they illegally screen out persons with mental illness (as set out above);
Claims under the ADA and the Rehabilitation Act of 1973 that the employer fails to select and administer the assessment in the most effective manner to ensure that the assessment results accurately reflect the skills, aptitude or whatever other factor that the assessment purports to measure, rather than reflecting an applicant’s impairment;
Claims that employers and workforce assessment companies fail to properly safeguard confidential medical information obtained from the personality tests and illegally use that confidential medical information in violation of the ADA; and
Claims under Title VII of the Civil Rights Act that the workforce analytics have a disparate impact on the hiring of blacks and Hispanics.

As to the potential size of the plaintiff classes for the claims listed above, they range from a percentage of all applicants (in the case of claims that the tests illegally screen out persons with mental illness and claims of disparate impact under Title VII) to all applicants over the past 12 months (in the case of claims that the personality test is an illegal medical examination) to all applicants, employees and ex-employees over a longer period of time (in the case of claims that employers and workforce assessment companies failed to safeguard confidential medical information).

For some employers, the potential class size can be measured in the millions of plaintiffs. Consistent with the 2011 Supreme Court decision in Wal-Mart Stores, Inc. v. Dukes, plaintiffs in a class action suit predicated on the use of personality tests and workforce analytics will be challenging a uniform, company-wide practice. The uniform use of testing by an employer demonstrates that "there are questions of law or fact common to the class," or commonality, as required by the rules governing class actions.

Illusory Indemnification?

A key element in continuing to use Kronos assessment services is Kronos' ability to indemnify its employer customers. As noted above, the success of workforce assessment companies in marketing personality tests and workforce analytics over the past twenty years has created "systemic risk" for their customers. If one employer has violated the law and subjected itself to significant liability as a consequence of its use of a solution provided by a workforce assessment company, then all customers of that workforce assessment company are similarly at risk.

Even assuming workforce assessment companies are willing to provide indemnification to all customers, those employers need to independently assess whether the workforce assessment companies and their insurers have adequate resources to indemnify all customers.

As Kenexa, an employment assessment company, consistently noted in its annual 10-K risk factor disclosures prior to its December 2012 acquisition by IBM:

The failure of our solutions to comply with employment laws may require us to indemnify our customers, which may harm our business. Some of our customer contracts contain indemnification provisions that require us to indemnify our customers against claims of non-compliance with employment laws related to hiring. To the extent these claims are successful and exceed our insurance coverages, these obligations would have a negative impact on our cash flow, results of operation and financial condition.

Similarly, customers of Kronos might be concerned about Kronos' ability to fulfill its indemnification obligations. As noted above, Kronos' current owners have paid themselves significant dividends during their ownership tenure, including causing the company to borrow to pay a $490 million dividend earlier this year.

The current owners of Kronos are also delaying substantive interaction with the EEOC in connection with its systemic investigation of Kronos customers, including the more than five years of litigation over the EEOC's information requests, while at the same time looking to sell Kronos. It may be possible that Hellman & Friedman LLC and JMI Equity end up with more than $5 billion (the more than $1.5 billion in dividends already drawn plus a sale valued at more than $4 billion) from a $752.9 million investment, while leaving the new owner with the contingent indemnification liabilities. Kronos, under the new owner, may not have sufficient resources to cover the indemnification claims of its employer customers.

* * * * *

"Better Get While the Gettin's Good," the title of this post, is a lyric from Creedence Clearwater Revival's song Up Around the Bend. The song's first verse reads:

There's a place up ahead and I'm goin'
Just as fast as my feet can fly
Come away, come away if you're goin',
Leave the sinkin' ship behind.

The question is whether the owners of Kronos Inc. are trying to get while the gettin's good by selling the company and leaving that sinking ship behind.

Friday, November 29, 2013

Employment Testing: Hot Button Issue for EEOC and OFCCP

On October 30, 2013, the U.S. Department of Labor announced that federal construction contractor M.C. Dean Inc. had settled allegations that it failed to provide equal employment opportunity to 381 African American, Hispanic and Asian American workers who applied for jobs at the company's Dulles headquarters. A review by the department's Office of Federal Contract Compliance Programs determined that the contractor used a set of selection procedures, including invalid tests, which unfairly kept qualified minority candidates from securing jobs as apprentices and electricians.

"Our nation was built on the principles of fair play and equal opportunity, and artificial barriers that keep workers from securing good jobs violate those principles," said OFCCP Director Patricia A. Shiu. "I am pleased that this settlement will provide remedies to the affected workers and that M.C. Dean has agreed to invest significant resources to improve its hiring practices so that this never happens again."

Under the terms of the agreement, M.C. Dean will pay $875,000 in back wages and interest to 272 African American, 98 Hispanic and 11 Asian American job applicants who were denied employment in 2010. The contractor will also extend 39 job offers to the class members as opportunities become available. Additionally, M.C. Dean has agreed to undertake extensive self-monitoring measures and personnel training to ensure that all of its employment practices fully comply with Executive Order 11246, which prohibits federal contractors and subcontractors from discriminating in employment on the bases of race, color and national origin.

As stated in the Affirmative Action & OFCCP Law Advisor:

This settlement provides (at least) two lessons to all federal contractors. First, the OFCCP is digging deeper than just the overall applicant-to-hire adverse impact analyses. Where there is overall applicant-to-hire adverse impact in the hiring process, the Agency will analyze each stage (screen, test, interview, offer, etc.) in the hiring processes for adverse impact. Second, where there is adverse impact at the testing stage, employers must evaluate the validity of their “tests.” In these cases, OFCCP will request and send the validation materials to its Industrial-Organization Psychologist for review, so it must be able to withstand scrutiny, including whether the test has been (i) validated recently, (ii) validated for the employer’s specific position, and (iii) that there are not less discriminatory methods for achieving the same predictive results of job performance.

In particular, employers who are using employment tests that have never been validated, have not been validated for the specific position for which they are being used, have not been validated for their specific company, have not been reviewed by someone other than the testing vendor who created the test, or have not been revalidated as the position changed over time may not realize they may be “at risk” in these audits.

In short, own each step of your hiring process – even if a third-party testing vendor created and/or administers your test, the employer will be held accountable if the test causes adverse impact and is not properly validated. Employers need to get in front of these testing issues by analyzing the test’s potential adverse impact and existing validation to minimize exposure during audits. Notably, this has also become a “hot button” for EEOC, so taking a close look at your tests can help minimize exposure to both OFCCP and EEOC claims.
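The stage-by-stage analysis described above can be sketched in a few lines of Python. The sketch assumes the "four-fifths" rule of thumb from the federal Uniform Guidelines on Employee Selection Procedures, under which a group's selection rate below 80 percent of the highest group's rate is generally treated as evidence of adverse impact. The applicant counts, group labels, stage names and function names are hypothetical and are not drawn from the M.C. Dean matter. Agencies may also rely on statistical significance tests rather than this ratio alone.

```python
from typing import Dict, List, Tuple

# Hypothetical counts: stage -> group -> (entered stage, passed stage).
STAGES: Dict[str, Dict[str, Tuple[int, int]]] = {
    "screen": {"group_a": (200, 160), "group_b": (100, 78)},
    "test": {"group_a": (160, 96), "group_b": (78, 26)},
    "interview": {"group_a": (96, 48), "group_b": (26, 12)},
}

FOUR_FIFTHS = 0.8  # rule-of-thumb threshold, assumed for this sketch


def impact_ratios(groups: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return each group's selection rate divided by the highest group's rate."""
    rates = {g: passed / entered for g, (entered, passed) in groups.items()}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}


def flagged_stages(
    stages: Dict[str, Dict[str, Tuple[int, int]]]
) -> List[Tuple[str, str]]:
    """List (stage, group) pairs whose impact ratio falls below four-fifths."""
    flags = []
    for stage, groups in stages.items():
        for group, ratio in impact_ratios(groups).items():
            if ratio < FOUR_FIFTHS:
                flags.append((stage, group))
    return flags


if __name__ == "__main__":
    for stage, group in flagged_stages(STAGES):
        print(f"Possible adverse impact at the {stage} stage for {group}")
```

With these made-up numbers only the test stage is flagged, which is the stage whose validation an employer would then need to be able to defend.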

Wednesday, November 27, 2013

On Not Dying Young: Fatal Illness or Flawed Algorithm?

Lukas F. Hartman, in a November 26, 2013 posting titled "Why 23andMe has the FDA worried: It wrongly told me I might die young," demonstrates the need for skepticism and oversight of many algorithm-based decision models.

23andMe is one of many companies to offer at-home genetic testing; in September it reported that its database had reached 400,000 people. Scientists have raised questions about the accuracy of the tests, and in May 2011 a Dutch study claimed the tests were inaccurate and offered little to no benefit to consumers. 23andMe’s $99 Saliva Collection Kit and Personal Genome Service (PGS) claims to test saliva, to provide data that shows users how their genetics may impact their health and explores their personal ancestry. The company is backed by Google.

The US Food and Drug Administration (FDA) recently ordered 23andMe to “immediately discontinue” the marketing of a genetic screening service, after the company failed to send the agency information that supports its marketing claims. “FDA is concerned about the public health consequences of inaccurate results from the PGS device; the main purpose of compliance with FDA’s regulatory requirements is to ensure that the tests work,” FDA official Alberto Gutierrez wrote in the letter, which was dated 22 November 2013 and addressed to 23andMe co-founder Anne Wojcicki.

An Unwelcome Surprise

Mr. Hartman signed up for 23andMe in November 2010. He sent them his saliva and received a web login to his genome in return.

23andMe extracts a sort of gene soup from a person's saliva and pours it on a DNA microarray chip made by a company called Illumina. These chips are covered with thousands of little testing probes. A probe is made up of a lump of molecules to which the matching pieces of a person's DNA naturally attach. These molecules are designed so that they light up when a match occurs. Hundreds of thousands of chemical tests run in parallel on the chip. The result is an image that is scanned by a computer and compared to a database of so-called SNPs, “snips.” According to Wikipedia, these “single nucleotide polymorphisms” make up about 90% of all genetic variation in the human genome. So when 23andMe detects a SNP variation in a person's genome it means that in a base pair of that person's DNA there is a difference from the so-called “reference genome.”

To sum it up, 23andMe compares hundreds of thousands of scanned SNPs to its database, which is constantly updated in response to new scientific studies and sources. The website then shows you nicely designed, ready-to-ingest interpretations of your genetic variations manifesting in health risks. Every time they have new updates for “Health Risks” or “Inherited Conditions,” you’ll receive an email.

Everything went well for a long time. There were no special surprises. But some weeks ago there was, suddenly, an unnerving update in Mr. Hartman's inherited conditions report. He clicked the link and a warning appeared: users must specifically agree before they can view potentially unnerving, life-changing results. He clicked OK and was forwarded to the result. It said:

Has two mutations linked to limb-girdle muscular dystrophy. A person with two of these mutations typically has limb-girdle muscular dystrophy.

Mr. Hartman let that sink in for a moment. He had never heard of this illness before. “Some people with limb-girdle muscular dystrophy lose the ability to walk and suffer from serious disability,” said the page, showing Mr. Hartman an image of a smiling physical therapist treating a smiling patient. What 23andMe didn’t spell out—but Wikipedia did—was that LGMD potentially ends with death.

Coding Error or Genetic Condition?

Mr. Hartman downloaded his 23andMe data and poked at it with a text editor. He read cryptic articles about genetic engineering and installed a genome analysis tool, “Promethease,” which can import 23andMe raw data, among other formats, and which, in contrast to 23andMe, reports even the very unnerving findings. Someone had found a bug in Mr. Hartman and he tried to reproduce it.

Technically speaking, 23andMe detected two SNP variations in Mr. Hartman's genome called rs28933693 and rs28937900. So he tried to find out more about these mutations. When you look up “rs28933693” in SNPedia, a kind of Wikipedia for SNPs, you’ll find a link to an entry in OMIM (Online Mendelian Inheritance in Man). The entry features medical study excerpts concerning some LGMD patients who all had the same so-called homozygous mutation in a certain gene location.

To understand the meaning of this you have to recall that humans are diploid organisms: We have two copies of each chromosome, one inherited from the mother, another from the father. A heterozygous mutation affects only one of the two copies; a homozygous mutation means that the same location of both copies differs in the same way.

Being diploid is a good thing; it means that we potentially have a backup of every critical function of our body. So if a piece of a person's DNA encodes a critical enzyme and this code is “broken” on one of the chromosome copies, it could well be intact on the other. If you’re out of luck and both of your parents are “carriers” of exactly the same mutation, the inherited condition may manifest in you. This was the case with the LGMD patients mentioned in the study Mr. Hartman stumbled upon. Both of their copies of the respective chromosome region are mutated in the same (homozygous) way, which triggers the muscular dystrophy. This very rarely happens, but it happens.

After researching tensely for some hours, Mr. Hartman looked more closely at the data that 23andMe provided as a download. Yes, he really had two mutations. They were not on the same gene, however, but on two different genes. By rare chance, both of these mutations are statistically linked to LGMD, but to two different versions of LGMD. So he didn’t have a homozygous mutation, but two unrelated heterozygous ones. The web programmers at 23andMe had added those two mutations together into one homozygous mutation in their code. And so the algorithm switched to red alert.

Mr. Hartman sent a support request to 23andMe including his research and conclusions (this would be called a “bug report” in software engineering). After a few days of waiting, 23andMe confirmed the bug and apologized. So the bug was not inside Mr. Hartman, but in the algorithm. An algorithm can be fixed easily, unlike someone's genetic code.
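The kind of error described above can be illustrated with a short Python sketch. This is not 23andMe's code, which has not been published; it simply assumes the flaw was a tally of mutation copies across genes rather than within each gene. The gene labels, the `Variant` type and both function names are hypothetical, and counting copies per gene is itself a simplification of how recessive conditions are assessed.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    rsid: str
    gene: str         # placeholder gene label, for illustration only
    risk_copies: int  # chromosome copies carrying the variant: 0, 1 or 2


# Two heterozygous variants on two different genes, as in Mr. Hartman's data.
# The gene labels are placeholders, not the actual genes involved.
VARIANTS = [
    Variant("rs28933693", "GENE_A", 1),
    Variant("rs28937900", "GENE_B", 1),
]


def flawed_flag(variants: List[Variant]) -> bool:
    """Tally copies across all genes: two unrelated carriers look homozygous."""
    return sum(v.risk_copies for v in variants) >= 2


def per_gene_flag(variants: List[Variant]) -> bool:
    """Flag only when both affected copies fall within the same gene."""
    copies: Dict[str, int] = defaultdict(int)
    for v in variants:
        copies[v.gene] += v.risk_copies
    return any(total >= 2 for total in copies.values())


if __name__ == "__main__":
    print("Flawed tally raises alert:", flawed_flag(VARIANTS))
    print("Per-gene tally raises alert:", per_gene_flag(VARIANTS))
```

Under these assumptions the flawed tally raises an alert for a person who merely carries two unrelated mutations, while the per-gene tally does not.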

False Positives, False Negatives and the Risks of Automation Bias

Human judgment is subject to an automation bias which, as discussed in a 2010 law review article, fosters a tendency to disregard or not search for contradictory information in light of a computer-generated solution that is accepted as correct. Such bias has been found to be most pronounced when computer technology fails to flag a problem.

In a study from the medical context, researchers compared the diagnostic accuracy of two groups of experienced mammogram readers (radiologists, radiographers, and breast clinicians)—one aided by a Computer Aided Detection (CAD) program and the other lacking access to the technology. The study revealed that the first group was almost twice as likely to miss signs of cancer if the CAD did not flag the concerning presentation than the second group that did not rely on the program.

The false positive for limb-girdle muscular dystrophy that 23andMe reported to Mr. Hartman is clearly problematic, but the risk of a false negative when combined with automation bias is potentially catastrophic.

BRCA1 Gene

Consider, for illustrative purposes, coding errors resulting in false negatives for the breast cancer 1, early onset (BRCA1) gene. BRCA1 is part of a complex that repairs double-strand breaks in DNA. The strands of the DNA double helix are continuously breaking from damage. Sometimes one strand is broken, and sometimes both strands are broken simultaneously. BRCA1 is part of a protein complex that repairs DNA when both strands are broken.

Researchers have identified more than 1,000 mutations in the BRCA1 gene, many of which are associated with an increased risk of cancer. Researchers believe that the defective BRCA1 protein is unable to help fix DNA damage, leading to mutations in other genes. These mutations can accumulate and may allow cells to grow and divide uncontrollably to form a tumor.

Women who have inherited a defective BRCA1 gene have risks for breast and ovarian cancer that are so high and seem so selective that many women with BRCA1 mutations choose to have prophylactic surgery. Why? Bilateral prophylactic mastectomy has been shown to reduce the risk of breast cancer by at least 95 percent in women who have a mutation in the BRCA1 gene. A woman receiving a false negative for a BRCA1 mutation would not consider prophylactic surgery. Why should she? As far as she knows, she has no BRCA1 mutation.

A false negative creates a false sense of security and restricts a woman's right to choose. To choose whether to have prophylactic surgery; to choose to have more intense monitoring; to choose alternative therapies; to choose life.

* * * * *

The potential and pitfalls of an increasingly algorithmic world raise the question of whether legal and policy changes are needed to regulate our changing environment. Should we regulate, or further regulate, algorithms in certain contexts? What would such regulation look like? Is it even possible? What ill effects might regulation itself cause? Given the ubiquity of algorithms, do they, in a sense, regulate us?

Monday, November 18, 2013

Do We Regulate Algorithms, or Do Algorithms Regulate Us?

The genesis for this posting comes from the following articles:

"Consumer Protection in Cyberspace" by Oscar H. Gandy, Jr.;
"Governing Algorithms: A Provocation Piece" by Solon Barocas, Sophie Hood and Malte Ziewitz;
"It's Not Privacy, and It's Not Fair" by Cynthia Dwork and Deirdre K. Mulligan; and
"The Relevance of Algorithms" by Tarleton Gillespie.

This posting includes portions of the articles and modifies them to address issues relating to big data and the use of algorithmic decisionmaking in the area of pre-employment assessments and workforce optimization.

Embedding Bias

Every step in the big data pipeline raises concerns: the privacy implications of amassing, connecting, and using personal information, the implicit and explicit biases embedded in both datasets and algorithms, and the individual and societal consequences of the resulting classifications and segmentation.

While many companies and government agencies foster an illusion that classification is (or should be) an area of absolute algorithmic rule—that decisions are neutral, organic, and even automatically rendered without human intervention—reality is a far messier mix of technical and human curating. Data isn't something that's abstract and value-neutral. Data only exists when it's collected, and collecting data is a human activity. And in turn, the act of collecting and analyzing data changes (one could even say "interprets") us.

Both the datasets and the algorithms reflect choices, among others, about data, connections, inferences, interpretation, and thresholds for inclusion that advance a specific purpose. Like maps that represent the physical environment in varied ways to serve different needs (mountaineering, sightseeing, or shopping), classification systems are neither neutral nor objective, but are biased toward their purposes. They reflect the explicit and implicit values of their designers. Assumptions are embedded in a data model upon its creation. Data sources are shaped through ‘washing’, integration, and algorithmic calculations until they are commensurate enough to allow a data set to be created.

Errors are not only possible, but they are likely to occur at each stage in the process of assessment that proceeds from identification to its conclusion in a discriminatory act. Error is inherent in the nature of the processes through which reality is represented as digitally encoded data. Some of these errors will be random, but most will reflect the biases inherent in the theories, the goals, the instruments, and the institutions that govern the collection of data in the first place.

Clear Windshield or Rearview Mirror?

The decisions made by the users of sophisticated analytics determine the provision, denial, enhancement, or restriction of the opportunities that citizens and consumers face both inside and outside formal markets.

Algorithms embody a profound deference to precedent; they draw on the past to act on (and enact) the future. The apparent omniscience of big data may in truth be nothing more than misdirection. Instead of offering a clear windshield, the big data phenomenon may be more like a big rear-view mirror telling us nothing about the future.

Does this deference to precedent result in a self-reinforcing and self-perpetuating system, where individuals are forever burdened by a history that they are encouraged to repeat and from which they are unable to escape? Does deference to past patterns augment path dependence, reduce individual choice, and result in cumulative disadvantage?

Already burdened segments of the population can become further victimized through the use of sophisticated algorithms in support of the identification, classification, segmentation, and targeting of individuals as members of analytically constructed groups. In creating these groups, the algorithms rely upon generalizations that lead to viewing people as members of populations, or categories, or groups, rather than as individuals (e.g., persons who live more than X miles from a jobsite).

Shrouding Opacity In The Guise of Legitimacy

Workforce analytic systems, designed in part to mitigate risks for employers, have now become sources of material risk. The systems create the perception of stability through probabilistic reasoning and the experience of accuracy, reliability, and comprehensiveness through automation and presentation. But in so doing, technology systems draw organizational attention away from uncertainty and partiality. They embed, and then justify, self-interested assumptions and hypotheses.

Moreover, they shroud opacity—and the challenges for oversight that opacity presents—in the guise of legitimacy, providing the allure of shortcuts and safe harbors for actors both challenged by resource constraints and desperate for acceptable means to demonstrate compliance with legal mandates and market expectations.

The technical language of workforce analytic systems obscures the accountability of the decisions they channel. Programming and mathematical idiom can shield layers of embedded assumptions from high-level firm decisionmakers charged with meaningful oversight and can mask important concerns with a veneer of transparency. This problem is compounded in the case of regulators outside the firm, who frequently lack the resources or vantage to peer inside buried decision processes and must instead rely on the resulting conclusions about risks and safeguards offered them by the parties they regulate.

Do We Regulate Algorithms, or Do Algorithms Regulate Us?

Can an algorithm be agnostic? Algorithms may be rule-based mechanisms that fulfill requests, but they are also governing agents that are choosing between competing, and sometimes conflicting, data objects.

We regulate markets, and market behavior, out of concerns for equity, as well as out of concern for efficiency. The fact that the impacts of design flaws are inequitably distributed is at least one basis for justifying regulatory intervention.

The regulatory challenge is to find ways to internalize the many external costs generated by the rapidly expanding use of analytics. That is, to find ways to force the providers and users of discriminatory technologies to pay the full social costs of their use. Requirements to warn, or otherwise inform users and their customers about the risks associated with the use of these systems should not absolve system producers of their own responsibility for reducing or mitigating the harms. This is part of imposing economic burdens or using incentives as tools to shape behavior most efficiently and effectively.