Employment Testing: Failing to Make the Grade: July 2013

Monday, July 29, 2013

Workforce Science: A Critical Look at Big Data and the Selection and Management of Employees

This post takes a critical look at the application of "Big Data" to company human resources; specifically, the selection and management of employees.

The post focuses on one company, Evolv, that describes itself as "a workforce science software company that harnesses big data, predictive analytics and cloud computing to help businesses improve workplace productivity and profitability." Evolv was selected not because it is sui generis; rather, it is emblematic of numerous companies, from start-ups to well-established firms, that market "workforce science" to employers.

According to Evolv, workforce science:

[I]dentifies the characteristics of the most qualified, productive employees within an hourly workforce throughout the employee lifecycle. By using objective, data-driven methodologies and machine learning, Evolv enables operational and financial executives to make better business decisions that result in millions of dollars in savings each year.

The Evolv graphic below is intended to illustrate the process of workforce science.

The steps listed in this flowchart serve as the titles of the sub-headings below.

New and Existing Data:

Companies capture and store workforce data.

Questions arise concerning the nature of this data, its accuracy and usefulness. Most companies have vast amounts of HR data (employee demographics, performance ratings, talent mobility data, training completed, age, academic history, etc.) but they are in no position to use it. According to Bersin by Deloitte, an HR research and consultancy organization, only 20% of companies believe that the data they capture now (let alone historically) is highly credible and reliable for decision-making in their own organization.

The complexity of working with myriad data types and myriad, often incompatible, systems was underscored by Dat Tran of the U.S. Department of Veterans Affairs at the 2013 MIT Chief Data Officer and Information Quality Symposium. "The VA does not have an integrated data environment; we have myriad systems and databases, and enterprise data standards do not exist. There is no 360-degree view of the customer," Tran said in a discussion of the obstacles facing an agency dealing with 11 petabytes of data and 6.3 million patients.

"Bad data" is data that has not been collected accurately or consistently, or that has been defined differently from person to person, group to group and company to company. In the recruitment and hiring context, unproctored online tests allow an applicant to take the test anywhere and anytime. That freedom creates conditions ripe for obtaining "bad" data. As stated by Jim Beaty, PhD, Chief Science Officer at testing company Previsor:

Applicants who want to cheat on the test can employ a number of strategies to beat the test, including logging in for the test multiple times to practice or get the answers, colluding with another person while completing the test, or hiring a test proxy to take the test.

And what about the accuracy of test responses from those who are hired? Analyzing a sample of over 31,000 employees, Evolv found that employees who said they were most likely to follow the rules left the job on average 10% earlier, were 3% less likely to close a sale and were actually not particularly good at following rules.

Decision-makers increasingly face computer-generated information and analyses that could be collected and analyzed in no other way. Precisely for that reason, going behind that output is out of the question, even if one has good cause to be suspicious. In short, the computer analysis becomes a credible reference point although based on poor data.

Supplement

Psychometric tools gather predictive data on the workforce throughout the employment lifecycle

Psychometrics is the field of study concerned with the theory and technique of psychological measurement, which includes the measurement of knowledge, abilities, attitudes, personality traits, and educational measurement. The field is primarily concerned with the construction and validation of measurement instruments such as questionnaires, tests, and personality assessments.

To what extent are psychometric tools accurate predictors of behavior or performance?

According to a 2012 study by Oracle and Development Dimensions International (DDI), a global human resources consulting firm whose expertise includes designing and implementing selection systems, more than 250 staffing directors and over 2,000 new hires from 28 countries provided the following perspectives on their organization’s selection processes (the following are excerpts from the study):

[O]nly 41 percent of staffing directors report that their pre-employment assessments are able to predict better hires.
Only half of staffing directors rate their systems as effective, and even fewer view them as aligned, objective, flexible, efficient, or integrated.
[T]he actual process for making a hiring decision is less effective than a coin toss.

In a 2007 article titled, “Reconsidering the Use of Personality Tests in Employment Contexts”, co-authored by six current or former editors of academic psychological journals, Dr. Kevin Murphy, Professor of Psychology at Pennsylvania State University and Editor of the Journal of Applied Psychology (1996-2002), states:

The problem with personality tests is … that the validity of personality measures as predictors of job performance is often disappointingly low. A couple of years ago, I heard a SIOP talk by Murray Barrick … He said, “If you took all the … [factors], measured well, you corrected for everything using the most optimistic corrections you could possibly get, you could account for about 15% of the variance in performance [between projected and actual performance].” … You are saying that if you take normal personality tests, putting everything together in an optimal fashion and being as optimistic as possible, you’ll leave 85% of the variance unaccounted for. The argument for using personality tests to predict performance does not strike me as convincing in the first place.
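To put the 15% figure in more familiar terms, the short sketch below converts a share of variance explained into the correlation it implies. It assumes that "variance accounted for" is meant in the usual statistical sense (the squared correlation between predicted and actual performance); the function name is illustrative and not drawn from the article quoted above.

```python
import math


def correlation_from_variance_explained(r_squared: float) -> float:
    """Return the correlation implied by a share of variance explained."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must be between 0 and 1")
    return math.sqrt(r_squared)


variance_explained = 0.15  # the optimistic figure quoted above
r = correlation_from_variance_explained(variance_explained)
unexplained = 1.0 - variance_explained

print(f"implied correlation: {r:.2f}")
print(f"variance left unexplained: {unexplained:.0%}")
```

Under that assumption, even the most optimistic case corresponds to a correlation of roughly 0.39 between test-based predictions and actual job performance.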

Cleanse and Upload

Structured and Unstructured Data Is Aggregated

Data isn't something that's abstract and value-neutral. Data only exists when it's collected, and collecting data is a human activity. And in turn, the act of collecting and analyzing data changes (one could even say "interprets") us.

Workforce science requires enormous amounts of historic or legacy data. This data has to be consolidated from a number of disparate source systems within each company, each with their specific data environment and particular brand of business logic. That data consolidation then must be replicated across hundreds or thousands of companies.

Structured data refers to data that is identifiable because it is organized in a structure. The most common form of structured data is a database in which specific information is stored in columns and rows (much like an Excel spreadsheet). Structured data is understood by computers and is also efficiently organized for human readers.

Unstructured data refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. This results in irregularities and ambiguities that make it difficult to understand using traditional computer programs as compared to structured data.

Unstructured data consists of two basic categories: textual objects (based on written or printed language, such as emails or Word documents) and bitmap objects (non-language based, such as image, video or audio files).

There are many types of techniques that need to be put together in a complex data processing flow utilizing unstructured data. These techniques include:

information extraction (to produce structured records from text or semi-structured data)
cleansing and normalization (to be able to even compare string values of the same type, such as a dollar amount or a job title)
entity resolution (to link records that correspond to the same real-world entity or that are related via some other type of semantic relationship)
mapping (to bring the extracted and linked records to a uniform schematic representation)
data fusion (to merge all the related facts into one integrated, clean object)
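The techniques listed above can be sketched as a toy pipeline. Everything in the sketch is hypothetical: the three raw text lines, the job-title alias table, the decision to match people on name alone and the rule that the highest pay value wins are all assumptions made for illustration, not a description of how Evolv or any other vendor processes data.

```python
import re
from collections import defaultdict

# Hypothetical raw text from two source systems (illustrative only).
RAW_TEXT = [
    "Name: Ann Lee; Title: Sr. Sales Assoc.; Pay: $12.50",
    "Name: ann lee; Title: Senior Sales Associate; Pay: 12.5 USD",
    "Name: Bob Ray; Title: Cashier; Pay: $9",
]

# Hypothetical lookup table used to normalize job titles.
TITLE_ALIASES = {"sr. sales assoc.": "senior sales associate"}


def extract(line: str) -> dict:
    """Information extraction: turn 'Key: value; ...' text into a record."""
    fields = (part.split(":", 1) for part in line.split(";"))
    return {key.strip().lower(): value.strip() for key, value in fields}


def normalize(record: dict) -> dict:
    """Cleansing and normalization: make values of the same type comparable."""
    title = record["title"].lower()
    pay = float(re.search(r"\d+(?:\.\d+)?", record["pay"]).group())
    return {
        "name": " ".join(record["name"].lower().split()),
        "title": TITLE_ALIASES.get(title, title),
        "pay": pay,
    }


def resolve(records: list) -> dict:
    """Entity resolution: group records that refer to the same person.

    Matching on the normalized name alone is a deliberate simplification.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["name"]].append(record)
    return groups


def fuse(group: list) -> dict:
    """Mapping and data fusion: one record per person in a uniform schema."""
    return {
        "employee_name": group[0]["name"],
        "job_title": group[0]["title"],
        "hourly_pay_usd": max(r["pay"] for r in group),  # arbitrary tie-break
        "source_count": len(group),
    }


normalized = [normalize(extract(line)) for line in RAW_TEXT]
employees = [fuse(group) for group in resolve(normalized).values()]
for employee in employees:
    print(employee)
```

Even in a toy like this, each step embeds a judgment call: which titles count as the same job, which records count as the same person and which value survives when sources disagree.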

Assumptions are embedded in a data model upon its creation. Data sources are shaped through ‘washing’, integration, and algorithmic calculations in order to be commensurate to an acceptable level that allows a data set to be created. By the time the data are ready to be used, they are already ‘at several degrees of remove from the world.’

Data is never raw; it’s always structured according to somebody’s predispositions and values. The end result looks disinterested, but, in reality, there are value choices all the way through, from construction to interpretation.

Analyze and Predict

Data Analyzed Using Machine Learning and Predictive Algorithms

The theory of big data is to have no theory, at least about human nature. One just gathers huge amounts of information, observes the patterns and estimates probabilities about how people will act in the future. One does not address causality.

In linear systems, cause and effect are much easier to pinpoint. However, the world around us is a complex system in which multiple variables often push an outcome to occur. Nigel Goldenfeld, a professor of physics at the University of Illinois, sums it up best: “For every event that occurs, there are a multitude of possible causes, and the extent to which each contributes to the event is not clear.”

Algorithms and big data are powerful tools. Wisely used, they can help match the right people with the right jobs. But they must be designed and used by humans, or they can go very wrong. As David Brooks wrote in the New York Times:

Data creates bigger haystacks. This is a point Nassim Taleb, the author of “Antifragile,” has made. As we acquire more data, we have the ability to find many, many more statistically significant correlations. Most of these correlations are spurious and deceive us when we’re trying to understand a situation. Falsity grows exponentially the more data we collect.
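The haystack point can be illustrated with a small simulation. The sketch below generates columns of pure random noise, which by construction have no real relationship to one another, and counts how many pairs nonetheless clear a conventional 5% significance cutoff. The sample size, seed and cutoff approximation are assumptions chosen for illustration.

```python
import math
import random


def pearson(x: list, y: list) -> float:
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_variables: int, num_rows: int = 100, seed: int = 0) -> int:
    """Count 'significant' pairwise correlations among pure-noise columns."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(num_rows)]
               for _ in range(num_variables)]
    # Approximate two-sided 5% cutoff for |r| when the true correlation is 0.
    cutoff = 1.96 / math.sqrt(num_rows)
    hits = 0
    for i in range(num_variables):
        for j in range(i + 1, num_variables):
            if abs(pearson(columns[i], columns[j])) > cutoff:
                hits += 1
    return hits


for k in (10, 20, 40, 80):
    pairs = k * (k - 1) // 2
    print(f"{k:>3} noise variables, {pairs:>5} pairs, "
          f"{count_spurious(k)} pass the 5% cutoff")
```

In this setup the number of pairs to test grows with the square of the number of variables, so roughly 5% of an ever larger haystack will look "significant" despite there being no needle at all.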

There’s a saying in artificial intelligence circles that techniques like machine learning can very quickly get you 80% of the way to solving just about any (real world) problem, but going beyond 80% is extremely hard, maybe even impossible. The Netflix Challenge is a case in point: hundreds of the best researchers in the world worked on the problem for 2 years and the winning team got a 10% improvement over Netflix’s in-house algorithm.

A corollary of the above saying is that it is very rare for startup companies to have a competitive advantage because of their machine learning algorithms. If a worldwide concerted effort can only improve Netflix’s algorithm by 10%, how likely is it that 4 people in a startup’s R&D department will make a significant breakthrough? Modern machine learning algorithms are the product of thousands of academics and billions of dollars of R&D and are generally only improved upon at the margins by individual companies.

Some of the best and brightest organizations have recognized that improvement, if any, in machine learning comes from outside the organization. Facebook, Ford, GE and other companies have run contests for data-science challenges on Kaggle, while NASA and other government agencies, as well as the Harvard Business School, have taken the crowdsource route on Topcoder.

Abhishek Shivkumar of IBM Watson Labs has listed the top ten problems for machine learning in 2013. These problems include churn prediction, truth and veracity, scalability and intelligent learning. This doesn’t mean machine learning isn’t ever useful; it just means one needs to apply it to contexts that are fault tolerant: for example, online ad targeting, ranking search results, recommendations, and spam filtering. Applying machine learning concepts in the context of people’s livelihoods (and, potentially, lives) is problematic, not just for the individual applicant or employee but also for the employer and its potential liability exposure.

Big Data Networking

Results Are Benchmarked Against Big Data Network

According to Evolv, its network

extracts learnings and insight from the millions of real time talent data points – from across the Evolv client base, or Network – streaming in and out of the Evolv platform every single day, week and month.

Some of the real time data points streaming in and out of the Evolv Network are data from the use of pre-employment personality tests administered to applicants and employees. As set out in the posts "What Are The Issues?", "ADA, FFM and DSM" and "Employment Assessment Are Designed to Reveal An Impairment", tests utilizing the Five-Factor Model of personality may be considered illegal medical examinations under the Americans with Disabilities Act (ADA).

Consequently, information obtained from those tests is confidential medical information, the use of which is subject to strict limits. Regulations require that confidential information be kept on separate forms and in separate files and it may not be intermingled with other information - i.e., shared with third parties on the Evolv network.

Not only are employers subject to claims by applicants and employees alleging breach of the confidentiality provisions of the ADA, but the use of confidential medical information in assessment, hiring and other human resource functions may also have created a virus that has "infected" a variety of databases, applications and software solutions utilized by the employer.

Costs to employers from the illegal use of confidential medical information include damages payable to applicants and employees, defense transaction costs (i.e., legal fees), and costs to "sanitize" infected databases, applications and software solutions. As set out in the Damages and Indemnification Challenges for Employers post, even if companies like Evolv are willing to provide indemnification to all customers, those customers will have to determine whether the company and its insurers have adequate resources to indemnify all customers.

Evaluate

Analyzed Data Reveals Insights That Drive Workforce Performance and Retention

Act

The Impact of Insights Are Quantified and Used to Inform Decision-making

As noted in a prior post, prejudice does not rise from malice or hostile animus alone. It may result as well from insensitivity caused by simple want of careful, rational reflection.

For example, take two insights from Evolv:

Living in close proximity to the job site and having access to reliable transportation are correlated with reduced attrition and better performance; and
Referred employees have 10% longer tenure than non-referred employees and demonstrate approximately equal performance.

An employer confronted with these two insights might well determine that (i) applicants living beyond a certain distance from the job site (e.g., a retail store) should be excluded from employment consideration and (ii) preference in hiring should be extended to applicants referred by existing employees.

Painting with the broad brush of distance from job site will result in well-qualified applicants being excluded, applicants who might have ended up being among the longest tenured of employees. Remember that the Evolv insight is a generalized correlation (i.e., persons living closer to the job site tend to have longer tenure than persons living farther from the job site). The insight says nothing about any particular applicant.
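The gap between a group-level correlation and an individual prediction can be made concrete with a small simulation. The tenure figures below are invented for illustration (the Evolv insight as quoted gives no numbers for distance): the "near" group averages one month longer tenure than the "far" group, but individuals in both groups vary widely.

```python
import random

rng = random.Random(42)

# Hypothetical tenure distributions in months (illustrative numbers only):
# the "near" group has the higher average, with wide individual spread.
near = [max(0.0, rng.gauss(11.0, 6.0)) for _ in range(5000)]
far = [max(0.0, rng.gauss(10.0, 6.0)) for _ in range(5000)]

mean_near = sum(near) / len(near)
mean_far = sum(far) / len(far)

# How often does a randomly drawn "far" applicant outlast a "near" one?
trials = 20000
far_wins = sum(rng.choice(far) > rng.choice(near) for _ in range(trials))
share_far_wins = far_wins / trials

print(f"mean tenure near: {mean_near:.1f} months")
print(f"mean tenure far:  {mean_far:.1f} months")
print(f"far applicant outlasts near applicant in {share_far_wins:.0%} of pairs")
```

Under these assumed numbers the group averages differ, yet a large share of head-to-head comparisons still favor the applicant who lives farther away. A blanket distance cutoff would discard all of those applicants.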

As a consequence, employers will pass over qualified applicants solely because they live (or don't live) in certain areas. Not only does the employer do a disservice to itself and the applicant, it also increases its risk of employment litigation, with its consequent costs. How?

A recent New York Times article, "In Climbing Income Ladder, Location Matters," reads, in part:

Her nearly four-hour round-trip [job commute] stems largely from the economic geography of Atlanta, which is one of America’s most affluent metropolitan areas yet also one of the most physically divided by income. The low-income neighborhoods here often stretch for miles, with rows of houses and low-slung apartments, interrupted by the occasional strip mall, and lacking much in the way of good-paying jobs

The dearth of good-paying jobs in low-income neighborhoods means that residents of those neighborhoods have a longer commute. The 2010 Census showed that poverty rates are much higher for blacks and Hispanics. Consequently, hiring decisions predicated on distance, intentionally or not, discriminate against certain races.

Similarly, an employer extending a hiring preference to referrals of existing employees may be further exacerbating the discriminatory impact of its hiring process. Those referrals tend to be persons from the same neighborhoods and socioeconomic backgrounds of existing employees, meaning that workforce diversity, broadly considered, will decline.

With the huge amounts of "bad" data that get generated and stored daily, the failure to understand how to leverage the data in a practical way that has business benefit will increasingly lead to shaky insights and faulty decision-making, with significant costs to applicants, employees, employers and society.

Optimize

Closed-Loop Optimization Constantly Analyzes and Refines Insights

According to Evolv, "closed-loop optimization is the process of using Big Data analytics to determine the outcomes of the assessments and other data collected, and then using the knowledge gained to make ever more effective assessments." Click on this link for an Evolv video that describes the closed-loop optimization process.

The challenge in using a closed-loop optimization process for hiring and employment decisions is that those decisions do not fit within a closed loop. Take for example the Evolv insight that living in close proximity to the job site is correlated with reduced attrition and better performance. Over time, the closed-loop optimization process for that insight means that a growing percentage of the workforce lives in close proximity to the job site. Excellent: less attrition and better performance across job sites.

That closed loop, however, does not account for factors like the element of time and the relative immobility of persons and companies. Businesses tend to be clustered; they are not evenly spread throughout the geography. If all businesses in a particular area focus on hiring applicants in close proximity, costs will increase (greater demand for the same number of applicants), employee turnover will increase (since the number of geographically-proximate employees changes slowly) and profitability will decrease (higher wage costs combined with greater turnover).

When two variables, A and B, are found to be correlated, there are several possibilities:

A causes B
B causes A
A causes B at the same time as B causes A (a self-reinforcing system)
Some third factor causes both A and B
The correlation is simple coincidence

It is wrong to assume any one of these possibilities. Evolv, however, assumes that A (proximity to job site) causes B (reduced attrition and better performance). Therefore, employers should hire applicants who live closer to the job site.

The correlation could also demonstrate that B (reduced attrition and better performance) is caused by C (proximity of the job site to applicants' homes). Instead of being a hiring insight, the correlation might function better as a job site location insight. Given the relative immobility of persons and companies, locating a job site (call center, etc.) close to communities with high numbers of lower-income persons could lead to a more sustainable competitive advantage.

As David Brooks wrote, "Data struggles with context. Human decisions are not discrete events. They are embedded in sequences and contexts. ... Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel."

Executives and managers frequently hear about some new software billed as the “next big thing.” They call the software provider and say, “We heard you have a great tool and we’d like a demonstration.” The software is certainly seductive with its bells and whistles, but its effectiveness and usefulness depend upon the validity of the information going in and how the people actually work with it over time. Having a tool is great, but remember that a fool with a tool is still a fool (and sometimes a dangerous fool).

Wednesday, July 24, 2013

Big Data and Employment Testing: Correlation Is Not Causation

Imagine you are watching at a railway station. More and more people arrive until the platform is crowded, and then — hey presto — along comes a train. Did the people cause the train to arrive (A causes B)? Did the train cause the people to arrive (B causes A)? No, they both depended on a railway timetable (C caused both A and B).

Some 60% of American workers earn hourly wages. Of these, about half change jobs each year. So firms that employ lots of unskilled workers, such as supermarkets, home improvement stores and fast-food chains, have to vet millions of applications every year. Making the process more efficient could yield big payoffs.

Algorithms and big data are powerful tools. Wisely used, they can help match the right people with the right jobs. But they must be designed and used by humans, otherwise they can go terribly wrong.

Big Data

In the right hands, Big Data is a tool that can yield insight, help determine which paths and alternatives are more likely to be successful, and lead to improved conditions. But Big Data is just one of a set of tools that can be used to develop successful paths and alternatives. It is best used in conjunction with other tools: intuition, inductive reasoning and statistical analysis, to name a few.

One reason Big Data is at the forefront today is that the advent of new tools, very large data flows and advanced computing techniques has created real opportunities to manage and use huge data sets.

The theory of big data is to have no theory, at least about human nature. One just gathers huge amounts of information, observes the patterns and estimates probabilities about how people will act in the future. One does not address causality.

As the authors of Big Data state, “Contrary to conventional wisdom, such human intuiting of causality does not deepen our understanding of the world." Instead, they aim to stand back nonjudgmentally and observe linkages: “Correlations are powerful not only because they offer insights, but also because the insights they offer are relatively clear. These insights often get obscured when we bring causality back into the picture.”

But are correlations relatively clear? The authors of Freakonomics discuss correlation and causation in the video below; specifically, the view of medical professionals in the first half of the 20th century that polio was caused by ice cream consumption (since disproved).

When two variables, A and B, are found to be correlated, there are several possibilities:

A causes B
B causes A
A causes B at the same time as B causes A (a self-reinforcing system)
Some third factor causes both A and B
The correlation is simple coincidence

It’s wrong to assume any one of these possibilities. Correlation is a (perhaps strong) hint that there may be a relationship, but identifying the exact nature of that relationship requires more, such as controlled experiments or proper statistical analysis. One needs to examine all of the variables that may influence the relationship and look for evidence supporting or rejecting the influence of each. One also needs to find a mechanism that explains any causal relationship.
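The "third factor" case, like the railway timetable above, is easy to reproduce in a simulation. In the sketch below, C drives both A and B while A and B have no effect on each other. The variable names, sample size and noise levels are assumptions for illustration. The sketch compares the raw correlation between A and B with the correlation that remains once C has been accounted for (a simple partial correlation computed from regression residuals).

```python
import math
import random


def pearson(x: list, y: list) -> float:
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def residuals(y: list, x: list) -> list:
    """What is left of y after a least-squares straight-line fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]


rng = random.Random(7)
n = 5000
c = [rng.gauss(0, 1) for _ in range(n)]   # the third factor (the "timetable")
a = [ci + rng.gauss(0, 1) for ci in c]    # driven by C, not by B
b = [ci + rng.gauss(0, 1) for ci in c]    # driven by C, not by A

raw = pearson(a, b)
partial = pearson(residuals(a, c), residuals(b, c))

print(f"correlation of A and B:                  {raw:.2f}")
print(f"correlation of A and B, controlling for C: {partial:.2f}")
```

A and B look clearly related until C is taken into account, at which point the relationship largely disappears. An analyst who never measured C would have no way to see this from the correlation alone.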

Bad Data

"Bad data" is data that has not been collected accurately or consistently or the data has been defined differently from person to person, group to group and company to company. The huge amount of "bad" data that is regularly served up for analysis may make it irresponsible for one to just "stand back nonjudgmentally and observe linkages," especially in the pre-employment testing process.

In the recruitment and hiring context, unproctored online personality tests allow an applicant to take the test anywhere and anytime. That freedom creates conditions ripe for obtaining "bad" data. As stated by Jim Beaty, PhD, Chief Science Officer at testing company Previsor:

Applicants who want to cheat on the test can employ a number of strategies to beat the test, including logging in for the test multiple times to practice or get the answers, colluding with another person while completing the test, or hiring a test proxy to take the test.

And what about the accuracy of test responses from those who are hired? Analyzing a sample of over 31,000 employees, a data analytic company's researchers found that employees who said they were most likely to follow the rules left the job on average 10% earlier, were 3% less likely to close a sale and were actually not particularly good at following rules.

Bad data exposes a vexing problem for employers. Applicants and employees seek to tell employers what they believe employers want to hear, and employers tend to ask questions that lead applicants and employees to answer these questions in the “right” way.

A Simple Want of Careful, Rational Reflection

As noted in a prior post, prejudice rises not from malice or hostile animus alone. It may result as well from insensitivity caused by simple want of careful, rational reflection.

For example, take two insights from Evolv, a data analytics company:

Living in close proximity to the job site and having access to reliable transportation—are correlated with reduced attrition and better performance; and
Referred employees have 10% longer tenure than non-referred employees and demonstrate approximately equal performance.

An employer confronted with these two insights might well determine that (i) applicants living beyond a certain distance from the job site (e.g., a retail store) should be excluded from employment consideration and (ii) preference in hiring should be extended to applicants referred by existing employees. Such a determination may end up being penny wise and pound foolish.

Painting with the broad brush of distance from job site will result in well-qualified applicants being excluded, applicants who might have ended up being among the longest tenured of employees. Remember that the Evolv insight is a generalized correlation (i.e., the pool of persons living closer to the job site tend to have longer tenure than the pool of persons living farther from the job site). The insight says nothing about any particular applicant or employee.

As a consequence, employers will pass over qualified applicants solely because they live (or don't live) in certain areas. Not only does the employer do a disservice to itself and the applicant, it also increases its risk of employment litigation, with its consequent costs (attorney fees, damages, reputational harm, etc.). How?

A recent New York Times article, "In Climbing Income Ladder, Location Matters," reads, in part:

Her nearly four-hour round-trip [job commute] stems largely from the economic geography of Atlanta, which is one of America’s most affluent metropolitan areas yet also one of the most physically divided by income. The low-income neighborhoods here often stretch for miles, with rows of houses and low-slung apartments, interrupted by the occasional strip mall, and lacking much in the way of good-paying jobs

The dearth of good-paying jobs in low-income neighborhoods means that residents of those neighborhoods have a longer commute. As to the demographic makeup of low-income families, the 2010 Census showed that poverty rates are much higher for blacks and Hispanics. Consequently, hiring decisions predicated on distance, intentionally or not, discriminate against certain protected classes.
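One common way to check whether a facially neutral screen such as a distance cutoff falls more heavily on one group is to compare selection rates. The sketch below uses invented applicant counts and applies the "four-fifths" rule of thumb from the federal Uniform Guidelines on Employee Selection Procedures, which this post does not itself cite. The group labels, counts and function names are hypothetical, and clearing or failing the threshold is not a legal conclusion.

```python
def selection_rate(selected: int, applicants: int) -> float:
    """Share of a group's applicants who pass the screen."""
    return selected / applicants


def adverse_impact_ratio(rates: dict) -> dict:
    """Each group's selection rate divided by the highest group's rate."""
    top = max(rates.values())
    return {group: rate / top for group, rate in rates.items()}


# Hypothetical counts: applicants per group, and how many of them live
# within the employer's distance cutoff (illustrative numbers only).
applicants = {"group_a": 400, "group_b": 300}
within_radius = {"group_a": 280, "group_b": 120}

rates = {g: selection_rate(within_radius[g], applicants[g]) for g in applicants}
ratios = adverse_impact_ratio(rates)

# Rule-of-thumb threshold: a ratio below 0.8 is commonly treated as a
# signal of possible adverse impact that warrants closer review.
flagged = [g for g, ratio in ratios.items() if ratio < 0.8]

for g in applicants:
    print(f"{g}: selection rate {rates[g]:.0%}, impact ratio {ratios[g]:.2f}")
print("flagged for review:", flagged)
```

With these made-up numbers, the distance screen passes 70% of one group and 40% of the other, an impact ratio of about 0.57, well below the 0.8 rule of thumb, even though the screen never mentions race or ethnicity.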

Similarly, an employer extending a hiring preference to referrals of existing employees may be further exacerbating the discriminatory impact of its hiring process. Those referrals tend to be persons from the same neighborhoods and socioeconomic backgrounds of existing employees, meaning that workforce diversity, broadly considered, will decline.

A Fool With A Tool Is Still A Fool

Most companies have vast amounts of HR data (employee demographics, performance ratings, talent mobility data, training completed, age, academic history, etc.) but they are in no position to use it. According to Bersin by Deloitte, an HR research and consultancy organization, only 20% of companies believe that the data they capture now is highly credible and reliable for decision-making in their own organization.

Research shows that the average large company has more than 10 different HR applications and that its core HR system is over 6 years old. So it will take significant effort and resources (read funding) to bring this data together and make sense of it.

As stated by Jim Stikeleather on the Harvard Business Review blog:

Machines don't make the essential and important connections among data and they don't create information. Humans do. Tools have the power to make work easier and solve problems. A tool is an enabler, facilitator, accelerator and magnifier of human capability, not its replacement or surrogate. That's what the software architect Grady Booch had in mind when he uttered that famous phrase: "A fool with a tool is still a fool."

Understand that expertise is more important than the tool. Otherwise the tool will be used incorrectly and generate nonsense (logical, properly processed nonsense, but nonsense nonetheless).

Although data does give rise to information and insight, they are not the same. Data's value to business relies on human intelligence, on how well managers and leaders formulate questions and interpret results. More data doesn't mean you will get "proportionately" more information. In fact, the more data you have, the less information you gain as a proportion of the data (concepts of marginal utility, signal to noise and diminishing returns).
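One way to read the diminishing-returns point is through the precision of a simple average. Assuming independent observations, the standard error of a mean shrinks with the square root of the sample size, so each additional record buys less precision than the one before. The sketch below is only that textbook relationship, with a hypothetical spread of 1.0. It is not a claim about any particular HR data set.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of a sample mean for n independent observations."""
    return std_dev / math.sqrt(n)


std_dev = 1.0  # hypothetical spread of the quantity being measured
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: standard error = {standard_error(std_dev, n):.4f}")
```

Going from 100 to 10,000 records is 100 times the data but only 10 times the precision, and that is before accounting for the data quality problems discussed above.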

Saturday, July 20, 2013

Pre-Employment Testing: Failing to Make the Grade

What Are the Issues?

When people apply for a job online these days, they are increasingly being asked to take personality tests even before they exchange an e-mail or have a phone interview with a hiring manager. Such tests are being used by companies as a way to prune the job applications they receive.

The problem is that these assessments may also be used to illegally screen out job seekers with mental disabilities.

As a result, too many people living with mental disabilities who are willing and able to work remain unemployed or underemployed. Not only does the United States economy experience the indirect loss in productivity and tax revenue arising from the unemployment and underemployment of persons with mental disabilities, there is a direct, rising and material cost to the U.S. Treasury associated with income support payments, like Social Security Disability Insurance and Supplemental Security Income. (Please see the post Costing Taxpayers Billions of Dollars Each Year).

Who are these persons with mental disabilities?

They are our sons and daughters, our mothers and fathers, our friends and colleagues. Mental illness is no respecter of age, race, gender, faith, sexual orientation, occupation, social position, education or wealth. Anyone can develop a mental illness.

They are:

A soldier returning from Afghanistan looking to enter the civilian workforce, who suffers from PTSD as a result of combat
A mother of a young child looking to support her family, who is recovering from post-partum depression
A recent college graduate looking to start his career, who has been diagnosed with bipolar disorder

They are us. An estimated 26.2% of Americans ages 18 and older - about one in four adults - suffer from a diagnosable mental disorder in a given year.

At some point during his or her lifetime, the average American adult has a 28.8% chance of developing an anxiety disorder, a 24.8% chance of developing an impulse-control disorder, and a 20.8% chance of developing a mood disorder.

How widespread is pre-employment testing?

There are hundreds of companies that offer pre-employment assessments and/or implementation services to employers. One of the larger assessment companies, Kenexa (recently acquired by IBM), assesses more than 20 million persons a year. Other large assessment companies include Kronos (through its acquisition of Unicru), Oracle (through its acquisition of Taleo), SAP (through its acquisition of SuccessFactors) and SHL.

A mid-sized company in the retail business may have 50,000 assessments per month or 600,000 per year. Large “big box” employers and fast food companies may have 1-2 million assessments per year.

Personality tests are “growing like wildfire,” said Josh Bersin, president and CEO of Bersin & Associates, an Oakland, Calif., research firm. Bersin estimated that this kind of pre-hire testing has been growing by as much as 20 percent annually in the past few years. Industries that are flooded with resumes such as retail, food service and hospitality are among the ones that use such tests most often, he said.

“A lot of work has been done over the years on how personality tests impact gender, race or age bias, but I don’t know if anyone has done enough research yet on mental disabilities,” says Bersin. “The medical community is starting to redefine what these diagnoses are, and the laws may not have caught up.”

Are pre-employment assessments legal?

The policy of the Equal Employment Opportunity Commission (EEOC) is that pre-employment testing, including personality testing, is acceptable as long as the test is not a "medical examination" as defined by the Americans with Disabilities Act (ADA).

EEOC guidance provides a seven-factor test for analyzing whether a test or procedure qualifies as a “medical examination,” including whether the test is designed to reveal an impairment of physical or mental health such as those listed in the Diagnostic and Statistical Manual of Mental Disorders (DSM).

According to the guidance, the presence of any one of the seven factors is enough to support a finding that the test is a medical examination.

How do the assessments screen out persons with mental disabilities?

A key component of the assessment process is a computer-administered personality test based on the five factor model of personality, or FFM, a coordinate system that maps which personality traits go together in people’s descriptions or ratings of one another. The FFM describes personality in terms of five broad factors:

Openness: inventive and curious vs. consistent and cautious.
Conscientiousness: efficient and organized vs. easy-going and careless.
Extraversion: outgoing and energetic vs. solitary and reserved.
Agreeableness: friendly and compassionate vs. cold and unkind.
Neuroticism: sensitive and nervous vs. secure and confident.

The majority of personality disorders are characterized by significant positive relations with Neuroticism and significant negative relations with Extraversion, Agreeableness and Conscientiousness. Consequently, applicants who take the Assessment and have low scores on Openness, Conscientiousness, Extraversion and Agreeableness and high scores on Neuroticism are likely not to be offered employment (or even interviewed).

Based on an applicant’s responses to the online test, the assessment categorizes the applicant as red, green or yellow. In many cases, green gets an applicant an automatic follow-up interview. Red is usually an automatic discard. A red or yellow score on the Assessment does not necessarily mean that an applicant has a mental disability, but a person who has a mental disability is likely to receive a red or yellow score on the assessment and will be denied consideration for employment.
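
To make the mechanism concrete, here is a minimal Python sketch of a traffic-light screen of the kind described above. It is an illustration under stated assumptions, not any vendor's actual scoring method: the 0-100 trait scale, the equal-weight composite, the cut-offs (GREEN_CUTOFF, YELLOW_CUTOFF) and all function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-offs on a 0-100 composite scale. Vendors' real
# cut-offs are not given in the source; these are assumptions.
GREEN_CUTOFF = 60.0
YELLOW_CUTOFF = 45.0


@dataclass(frozen=True)
class TraitScores:
    """An applicant's position on each FFM axis, assumed to run 0-100."""
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float


def composite(scores: TraitScores) -> float:
    """Average the five traits, counting Neuroticism in reverse.

    Assumption: the screen rewards high O, C, E and A and low N.
    Equal weighting is a simplification made for this sketch.
    """
    favourable = [
        scores.openness,
        scores.conscientiousness,
        scores.extraversion,
        scores.agreeableness,
        100.0 - scores.neuroticism,  # high Neuroticism lowers the composite
    ]
    return sum(favourable) / len(favourable)


def classify(scores: TraitScores) -> str:
    """Map a composite score to the red/yellow/green bands."""
    value = composite(scores)
    if value >= GREEN_CUTOFF:
        return "green"
    if value >= YELLOW_CUTOFF:
        return "yellow"
    return "red"


# Two made-up applicants.
applicant_a = TraitScores(70, 80, 65, 75, 30)
applicant_b = TraitScores(40, 35, 25, 45, 85)

print(classify(applicant_a))  # green
print(classify(applicant_b))  # red
```

In this sketch the second applicant is discarded purely because of where the trait scores fall. No diagnosis is ever requested, yet a profile of high Neuroticism and low scores on the other traits lands in the red band.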

Some have argued that assessments are designed to measure “normal” personalities and/or “stable” personality traits. That argument fails because, as a dimensional model, the FFM determines each applicant’s position along the axis of each of the five traits. Those five traits are the common measuring rod for all persons, including persons with mental disabilities. By its design and structure, an assessment based on the FFM measures all aspects of a personality, including both normal and abnormal personality traits.

For more information, please see the ADA, FFM and DSM post.

Are the tests designed to intentionally screen out persons with mental disabilities?

Perhaps.

One assessment company, Clearfit, lists “27.2 days/yr. lost productivity for depressed workers” as one of the impacts to employers if personality is not taken into account in the hiring process.

In any event, courts have held that the intent is irrelevant. What is relevant is that the use of the FFM as the basis of an assessment means that the assessment was designed to reveal mental impairments by identifying and rejecting those applicants who do not fall within the “green” parameters of the five traits.

Are there any studies addressing the impact of assessment tests on persons with mental disabilities?

There has been no public disclosure of any studies addressing the impact of FFM-based assessments on persons with mental disabilities. One prominent assessment company, Kronos, has claimed in a court document that there is “no known method … to ascertain adverse impact against the entire generic category of disabilities.” Kronos goes on to claim that “the diverse nature of disabilities (e.g., blindness, paraplegia, deafness, severe mental illness) makes an analysis of a selection device’s adverse impact on ‘people who have disabilities’ impossible.” Such claims are unsupportable.

As stated in a 2004 article published by the Journal of Rehabilitation Administration, “studies of sub-groups, such as individuals with mental illnesses or cognitive impairments could be conducted to determine the potential, and perhaps likelihood for, pre-employment test results unfairly penalizing these individuals in the employee selection and hiring stages …”

Such studies are, in fact, mandated. In a 1975 decision, the Supreme Court addressed a case in which an employer implemented a test on the theory that a certain verbal intelligence was called for by the increasing sophistication of the plant’s operations. The Court held that a test should be validated on people as similar as possible to those to whom it will be administered. The Court further stated that differential studies should be conducted on minority groups wherever feasible.
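
As an illustration of what a sub-group study could look like at its simplest, the Python sketch below compares the rate at which two groups of applicants pass a screen. It borrows the "four-fifths" rule of thumb from the federal Uniform Guidelines on Employee Selection Procedures, which this post does not cite and which was written with race, sex and ethnic groups in mind; applying it to disability is an assumption made here. Every count is invented for the example and the function names are hypothetical.

```python
def selection_rate(passed: int, applicants: int) -> float:
    """Share of a group's applicants who pass the screen."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return passed / applicants


def impact_ratio(focal_rate: float, reference_rate: float) -> float:
    """Focal group's selection rate relative to the reference group's."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return focal_rate / reference_rate


# Invented counts, for illustration only; not data from any study.
reference = selection_rate(passed=600, applicants=1000)  # 0.60
focal = selection_rate(passed=90, applicants=250)        # 0.36

ratio = impact_ratio(focal, reference)
FOUR_FIFTHS = 0.8  # rule-of-thumb threshold (assumed to apply here)

print(f"impact ratio: {ratio:.2f}")
print("below four-fifths threshold:", ratio < FOUR_FIFTHS)
```

A real study would also need adequate sample sizes and a test of statistical significance; the point of the sketch is only that the comparison itself is an ordinary calculation once sub-group pass rates are collected.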

What's Up?

Does the Rising Use of Employment Personality Tests Contribute to An Increase in Suicides?

As stated in a 2013 Newsweek cover story:

Every year since 1999, more Americans have killed themselves than the year before, making suicide the nation’s greatest untamed cause of death. In much of the world, it’s among the only major threats to get significantly worse in this century than in the last.

There has been an almost 20 percent rise in the annual suicide rate, a 30 percent jump in the sheer number of people who died, at least 400,000 casualties in a decade—about the same toll as World War II and Korea combined.

In 2013, America is likely to reach a grim milestone: the 40,000th death by suicide, the highest annual total on record. In November 2012, a study led by Ian Rockett, an epidemiologist at West Virginia University, showed that suicide had become the leading cause of “injury death” in America. As the CDC noted again this spring, suicide outpaces the rate of death on the road—and for that matter anywhere else people accidentally harm themselves.

Figure: Public Health Burden of Suicidal Behavior in 2008, adults 18 and older. All rates per 100,000 population. Source: CDC's National Vital Statistics System.

Interpersonal Theory of Suicidal Behavior

The interpersonal theory of suicidal behavior holds that an individual will die by suicide if he or she has both the desire for suicide and the capability to act on that desire. According to the theory, suicidal desire results from the convergence of two interpersonal states, perceived burdensomeness and thwarted belongingness:

Perceived burdensomeness is the view that one’s existence burdens family, friends, and/or society. This view produces the idea that “my death will be worth more than my life to family, friends, society, etc.” – a view, it is important to emphasize, that represents a potentially fatal misperception. Past research, though not designed to test the interpersonal-psychological theory, nonetheless has documented an association between higher levels of perceived burdensomeness and suicidal ideation.
Thwarted belongingness is the experience that one is alienated from others, not an integral part of a family, circle of friends, or other valued group. A persuasive case can be made that, of all the risk factors for suicidal behavior, ranging from the molecular to the cultural levels, the strongest and most uniform support has emerged for indices related to social isolation.

Desire alone is not sufficient to result in death by suicide--a third component must be present: the acquired capability for suicide, which develops from repeated exposure and habituation to painful and provocative events. These experiences often include previous self-injury, but can also include other experiences, such as repeated accidental injuries; numerous physical fights; and occupations like physician and front-line soldier in which exposure to pain and injury, either directly or vicariously, is common.

The diagram below sets out a visual representation of the interpersonal theory of suicide. The intersection of perceived burdensomeness and thwarted belongingness creates a desire for suicide. The intersection of that area and the capability for suicide creates the circumstances for suicide or near-lethal suicide attempt.

Expanding Use of Personality Tests and Potential Impact on Suicides

Studies have shown that being unemployed is associated with a twofold to threefold increased relative risk of death by suicide, compared with being employed. Given that more than 90% of persons who attempt suicide have mental illnesses, a process like personality testing that results in persons with mental illnesses being excluded from employment consideration can lead to an increase in both (i) perceived burdensomeness and (ii), as a consequence of not being employed, thwarted belongingness/social alienation. This happens at both the individual and the aggregate level and may cause an increase in suicides.

To date, employment personality assessments have been used primarily in the context of recruitment and hiring for entry-level positions. Assessment use is expanding to a variety of employment actions, including promotion, leadership development, training, retention, succession planning, outplacement, and restructuring.

The possible knock-on impacts of the broadening use of assessments include the potential that existing employees with mental illness (whether diagnosed or not), many of whom became employees prior to the widespread use of assessments in the hiring process, will find themselves unemployed and, as a consequence, experience greater feelings of burdensomeness and thwarted belongingness. This, in turn, may lead to more suicides.

Some Questions and Answers

What are pre-employment assessment tests?

There are a variety of pre-employment assessments, including intelligence tests, personality tests, job fit tests, interest inventories and work skills tests. Personality tests are designed to measure an individual’s emotional, motivational, interpersonal and attitudinal characteristics, as opposed to abilities. Although personality tests were originally designed for use by psychologists and psychiatrists in clinical settings to diagnose and treat mental illnesses, the increasing use of such tests by employers has spawned an entire industry focusing on developing job-specific personality testing.

Are pre-employment personality assessments effective?

In a 2012 study by Oracle and Development Dimensions International (DDI), a global human resources consulting firm whose expertise includes designing and implementing selection systems, more than 250 staffing directors and over 2,000 new hires from 28 countries gave their perspectives on their organizations’ selection processes (the following are excerpts from the study):

[O]nly 41 percent of staffing directors report that their pre-employment assessments are able to predict better hires.
Only half of staffing directors rate their systems as effective, and even fewer view them as aligned, objective, flexible, efficient, or integrated.
[T]he actual process for making a hiring decision is less effective than a coin toss.

In a 2007 article titled, “Reconsidering the Use of Personality Tests in Employment Contexts”, co-authored by six current or former editors of psychological journals, Dr. Kevin Murphy, Professor of Psychology at Pennsylvania State University and Editor of the Journal of Applied Psychology (1996-2002), states:

The problem with personality tests is … that the validity of personality measures as predictors of job performance is often disappointingly low. … The argument for using personality tests to predict performance does not strike me as convincing in the first place.

What are examples of test questions?

Set out below are examples of questions from an assessment test. Applicants are to respond to each statement with one of five choices: Strongly Disagree, Slightly Disagree, Not Sure/In Between, Slightly Agree, and Strongly Agree.

1. I don’t mind changes in my daily routine.
2. Others consider me a good teammate.
3. I hardly ever finish things on time.
4. Even if they are correct, I find criticism from others difficult to take.
5. I find unexpected changes to be frustrating.
6. It bothers me when people ask me to help them get their work done.
7. At work, I sometimes don’t finish things on time.
8. I prefer things to stay the same and not change.
9. I usually won’t go out of my way to help someone else.
10. Unexpected problems at work cause me great frustration.
11. I believe that others have good intentions.
12. I don’t always see things through.
13. I complete tasks before being told to do them.
14. I do not get emotional in stressful situations.
15. I can change course, if necessary.
16. I am willing to help other people, even if I am very busy.
17. I do everything I say I will do.
18. I jump into action before others.
19. I am not easily stressed.
20. I can handle criticism without getting upset.
21. I am put off by unexpected events.
22. I only help others if I have extra time at the end of the day.
23. I rarely finish doing things before they are actually due (such as paying bills, finishing work).
24. I am the first person to volunteer for new projects.
25. I am easily stressed.
26. I dislike the unknown.
27. I only offer my assistance to others if my own workload is complete.
28. I prefer to have the same structured schedule every day.
29. At work, you simply can’t help everyone and get your own job done.
30. I get frustrated at work when there are too many demands on my time.
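
A minimal Python sketch of how responses to statements like these could be turned into numbers. The 1-5 coding, the choice of reverse-keyed items and the grouping of items 3, 7, 12, 13, 17 and 23 into a single "follow-through" scale are assumptions made for illustration; nothing above discloses how the test is actually scored.

```python
# Response options from the text, coded 1-5 (the numeric coding is assumed).
LIKERT = {
    "Strongly Disagree": 1,
    "Slightly Disagree": 2,
    "Not Sure/In Between": 3,
    "Slightly Agree": 4,
    "Strongly Agree": 5,
}

# Hypothetical scale: statements about finishing what one starts.
FOLLOW_THROUGH_ITEMS = [3, 7, 12, 13, 17, 23]
# Negatively worded statements, where agreeing counts against the applicant.
REVERSE_KEYED = {3, 7, 12, 23}


def item_score(item: int, response: str) -> int:
    """Convert one response to 1-5, flipping reverse-keyed items."""
    raw = LIKERT[response]
    return 6 - raw if item in REVERSE_KEYED else raw


def scale_score(responses: dict[int, str]) -> float:
    """Mean item score across the scale; raises KeyError if an item is missing."""
    scores = [item_score(i, responses[i]) for i in FOLLOW_THROUGH_ITEMS]
    return sum(scores) / len(scores)


# Made-up answers from one applicant.
answers = {
    3: "Slightly Agree",        # reverse-keyed -> 2
    7: "Strongly Agree",        # reverse-keyed -> 1
    12: "Slightly Agree",       # reverse-keyed -> 2
    13: "Slightly Disagree",    # -> 2
    17: "Not Sure/In Between",  # -> 3
    23: "Strongly Agree",       # reverse-keyed -> 1
}

print(scale_score(answers))
```

Under these assumptions, an applicant who candidly agrees with item 7 ("At work, I sometimes don’t finish things on time") is marked down on the scale, and a low scale score then feeds whatever cut-off the employer applies.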