Employment Testing: Failing to Make the Grade: Kenexa


Thursday, October 9, 2014

Big Data's Disparate Impact - Excerpts and Annotations

This posting is based on, and excerpts are taken from, "Big Data's Disparate Impact" by Solon Barocas and Andrew D. Selbst. Their article addresses the potential for disparate impact in the data mining process and points to different places within the process where a disproportionately adverse impact on protected classes may result from innocent choices on the part of the data miner. Excerpts from the article are set out below in normal typeface. Please note that footnotes from the article are not included in the excerpts set out below. Annotations that further illuminate issues raised in the article are indented and italicized. Readers are strongly encouraged to read the article by Messrs Barocas and Selbst.

* * * * * * *

"Big Data's Disparate Impact" introduces the computer science literature on data mining and proceeds through the various steps of solving a problem this way:

defining the target variable,
labeling and collecting the training data,
feature selection, and
making decisions on the basis of the resulting model.

Each of these steps creates possibilities for a final result that has a disproportionately adverse impact on protected classes, whether by specifying the problem to be solved in ways that affect classes differently, failing to recognize or address statistical biases, reproducing past prejudice, or considering an insufficiently rich set of factors. Even in situations where data miners are extremely careful, they can still effect discriminatory results with models that, quite unintentionally, pick out proxy variables for protected classes.

To be sure, data mining is a very useful construct. It even has the potential to be a boon to those who would not discriminate, by formalizing decision-making processes and thus limiting the influence of individual bias.

Data mining in such an instance addresses the issue of the "rogue recruiter," a recruiter who is biased, whether intentionally or not, against certain protected classes. Employers and testing companies argue that replacing the rogue recruiter with an algorithm-based decision model will eliminate the biased hiring practices of that recruiter.

But where data mining does perpetuate discrimination, society does not have a ready answer for what to do about it.

The simple fact that hiring decisions are made "by computers" does not mean the decisions are free of bias. Human judgment is subject to automation bias, which fosters a tendency to disregard, or not search for, contradictory information in light of a computer-generated solution that is accepted as correct. Such bias has been found to be most pronounced when computer technology fails to flag a problem.

The use of technology systems to hardwire workforce analytics raises a number of fundamental issues regarding the translation of legal mandates, psychological models and business practices into computer code and the resulting distortions. These translation distortions arise from the organizational and social context in which translation occurs; choices embody biases that exist independently, and usually prior to the creation of the system. And they arise as well from the nature of the technology itself and the attempt to make human constructs amenable to computers. (Please see What Gets Lost? Risks of Translating Psychological Models and Legal Requirements to Computer Code.)

Defining the Target Variable and Class Labels

In contrast to those traditional forms of data analysis that simply return records or summary statistics in response to a specific query, data mining attempts to locate statistical relationships in a dataset. In particular, it automates the process of discovering useful patterns, revealing regularities upon which subsequent decision-making can rely. The accumulated set of discovered relationships is commonly called a “model,” and these models can be employed to automate the process of classifying entities or activities of interest, estimating the value of unobserved variables, or predicting future outcomes.

[B]y exposing so-called “machine learning” algorithms to examples of the cases of interest, the algorithm “learns” which related attributes or activities can serve as potential proxies for those qualities or outcomes of interest. In the machine learning and data mining literature, these states or outcomes of interest are known as “target variables.”

The proper specification of the target variable is frequently not obvious, and it is the data miner’s task to define it. In doing so, data miners must translate some amorphous problem into a question that can be expressed in more formal terms that computers can parse. In particular, data miners must determine how to solve the problem at hand by translating it into a question about the value of some target variable.

This initial step requires a data miner to “understand[] the project objectives and requirements from a business perspective [and] then convert[] this knowledge into a data mining problem definition.” Through this necessarily subjective process of translation, though, data miners may unintentionally parse the problem and define the target variable in such a way that protected classes happen to be subject to systematically less favorable determinations.

Kenexa, an employment assessment company purchased by IBM in December 2012, believes that a lengthy commute raises the risk of attrition in call-center and fast-food jobs. It asks applicants for these jobs to describe their commute by picking options ranging from "less than 10 minutes" to "more than 45 minutes."

The longer the commute, the lower their recommendation score for these jobs, says Jeff Weekley, who oversees the assessments. Applicants also can be asked how long they have been at their current address and how many times they have moved. People who move more frequently "have a higher likelihood of leaving," Mr. Weekley said.

Are there any groups of people who might live farther from the work site and may move more frequently than others? Yes: lower-income persons, who are disproportionately women, Black, Hispanic and mentally ill (all protected classes). They can't afford to live where the jobs are, and they move more frequently because they cannot afford their housing or have lost their jobs. Not only are these protected classes poorly paid, many are electronically redlined from hiring consideration.

As a consequence of Kenexa's "insights," its clients will pass over qualified applicants solely because they live (or don't live) in certain areas. Not only does the employer do a disservice to itself and the applicant, it increases the risk of employment litigation, with its consequent costs. (Please see From What Distance is Discrimination Acceptable?)
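One common way to gauge that litigation risk is to compare selection rates across groups. Under the four-fifths rule of thumb in the federal Uniform Guidelines on Employee Selection Procedures, a group's selection rate below 80% of the highest group's rate is generally regarded as evidence of adverse impact (it is a screening heuristic, not a final legal test). The sketch below applies a hypothetical commute-based score to a made-up applicant pool; the commute bands, cutoff, and group labels are assumptions for illustration and are not Kenexa's actual model.

```python
"""Minimal sketch: checking a hypothetical commute screen against the four-fifths rule.

All applicant data below is invented for illustration.
"""

from collections import defaultdict

# Hypothetical commute bands, ordered from shortest to longest.
COMMUTE_BANDS = ["<10", "10-20", "20-30", "30-45", ">45"]


def commute_score(band: str) -> int:
    """Hypothetical rule: the longer the commute, the lower the score (4 down to 0)."""
    return len(COMMUTE_BANDS) - 1 - COMMUTE_BANDS.index(band)


def selection_rates(applicants, cutoff):
    """Share of each group whose score meets the cutoff."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for group, band in applicants:
        total[group] += 1
        if commute_score(band) >= cutoff:
            passed[group] += 1
    return {g: passed[g] / total[g] for g in total}


def adverse_impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    best = max(rates.values())
    return {g: r / best for g, r in rates.items()}


# Hypothetical applicant pool: group B tends to live farther from the job site.
applicants = (
    [("A", "<10")] * 30 + [("A", "10-20")] * 40 + [("A", "30-45")] * 30
    + [("B", "<10")] * 10 + [("B", "20-30")] * 30 + [("B", ">45")] * 60
)

rates = selection_rates(applicants, cutoff=2)  # pass = commute under 30 minutes
ratios = adverse_impact_ratios(rates)
for group in sorted(rates):
    flag = "below 4/5" if ratios[group] < 0.8 else "ok"
    print(group, round(rates[group], 2), round(ratios[group], 2), flag)
```

Nothing in the score mentions group membership; the disparity comes entirely from where the hypothetical applicants live.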

[W]here employers turn to data mining to develop ways of improving and automating their search for good employees, they face a number of crucial choices. Like [the term] creditworthiness, the definition of a good employee is not a given. “Good” must be defined in ways that correspond to measurable outcomes: relatively higher sales, shorter production time, or longer tenure, for example.

When employers use data mining to find good employees, they are, in fact, looking for employees whose observable characteristics suggest, based on the evidence that an employer has assembled, that they would meet or exceed some monthly sales threshold, that they would perform some task in less than a certain amount of time, or that they would remain in their positions for more than a set number of weeks or months. Rather than drawing categorical distinctions along these lines, data mining could also estimate or predict the specific numerical value of sales, production time, or tenure period, enabling employers to rank rather than simply sort employees.

These may seem like eminently reasonable things for employers to want to predict, but they are, by necessity, only part of an array of possible ways of defining what “good” means. An employer may attempt to define the target variable in a more holistic way—by, for example, relying on the grades that prior employees have received in annual reviews, which are supposed to reflect an overall assessment of performance. These target variable definitions simply inherit the formalizations involved in preexisting assessment mechanisms, which in the case of human-graded performance reviews, may be far less consistent.

As previously noted, Kenexa defines a "good" employee as a function, in part, of job tenure. It then uses a number of proxies (distance from job site, length of time at current address, and number of moves) to predict "job tenure."

Painting with the broad brush of distance from job site, commute time and moving frequency results in well-qualified applicants being excluded, applicants who might have ended up being among the longest tenured of employees. The Kenexa findings are generalized correlations (i.e., persons living closer to the job site tend to have longer tenure than persons living farther from the job site). The insights say nothing about any particular applicant.
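The gap between a group-level correlation and an individual prediction can be made concrete. In the hypothetical records below, applicants living closer to the job site do stay longer on average, yet a commute cutoff still rejects two people who would have been among the longest-tenured employees. The commute limit, tenure threshold, and data are all assumptions for illustration.

```python
"""Minimal sketch: a group-level correlation applied to individuals.

The records are hypothetical. Each pairs a commute time with the tenure the
person would actually have had, which no screener can observe in advance.
"""

# (commute_minutes, months_of_tenure_in_hindsight); illustrative values only
applicants = [
    (5, 30), (8, 4), (12, 26), (15, 22), (18, 3),
    (40, 3), (50, 2), (55, 36), (60, 28), (70, 1),
]

COMMUTE_LIMIT = 30  # hypothetical screening cutoff, in minutes
LONG_TENURE = 24    # hypothetical definition of a "long-tenured" employee, in months


def mean(values):
    return sum(values) / len(values)


near = [t for c, t in applicants if c <= COMMUTE_LIMIT]
far = [t for c, t in applicants if c > COMMUTE_LIMIT]

# At the group level, the correlation holds: nearby applicants stay longer on average.
print("mean tenure, near:", mean(near))
print("mean tenure, far:", mean(far))

# At the individual level, the rule still screens out people who would have stayed.
wrongly_rejected = [(c, t) for c, t in applicants if c > COMMUTE_LIMIT and t >= LONG_TENURE]
print("long-tenure applicants rejected:", wrongly_rejected)
```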

The general lesson to draw from this discussion is that the definition of the target variable and its associated class labels will determine what data mining happens to find. While critics of data mining have tended to focus on inaccurate classifications (false positives and false negatives), as much danger, if not more, resides in the definition of the class label itself and the subsequent labeling of examples from which rules are inferred. While different choices for the target variable and class labels can seem more or less reasonable, valid concerns with discrimination enter at this stage because the different choices may have a greater or lesser adverse impact on protected classes.
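To illustrate how the choice of target variable alone can change who is selected, the sketch below ranks the same hypothetical candidates under two definitions of a "good" employee: longest predicted tenure and highest predicted monthly sales. The candidates, groups, and predicted values are invented; the point is only that the two definitions yield different selected pools and a different share of one group.

```python
"""Minimal sketch: the same candidates ranked under two target-variable definitions.

All names, groups, and numbers are hypothetical.
"""

candidates = [
    # (name, group, predicted_tenure_months, predicted_monthly_sales)
    ("c1", "A", 30, 40), ("c2", "A", 28, 55), ("c3", "A", 26, 35),
    ("c4", "B", 12, 70), ("c5", "B", 10, 65), ("c6", "B", 8, 45),
]


def top_k(key_index, k=3):
    """Return the names of the k candidates with the highest value at key_index."""
    ranked = sorted(candidates, key=lambda c: c[key_index], reverse=True)
    return [c[0] for c in ranked[:k]]


def group_share(names, group):
    """Fraction of the selected names that belong to `group`."""
    members = {c[0] for c in candidates if c[1] == group}
    return sum(n in members for n in names) / len(names)


by_tenure = top_k(2)  # "good" = long tenure
by_sales = top_k(3)   # "good" = high sales
print("good = long tenure:", by_tenure, "share of group B:", group_share(by_tenure, "B"))
print("good = high sales: ", by_sales, "share of group B:", group_share(by_sales, "B"))
```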

Training Data

As described above, data mining learns by example. Accordingly, what a model learns depends on the examples to which it has been exposed. The data that function as examples are known as training data: quite literally the data that train the model to behave in a certain way. The character of the training data can have meaningful consequences for the lessons that data mining happens to learn.

Discriminatory training data leads to discriminatory models. This can mean two rather different things, though:

If data mining treats cases in which prejudice has played some role as valid examples from which to learn a decision-making rule, that rule may simply reproduce the prejudice involved in these earlier cases; and
If data mining draws inferences from a biased sample of the populations to which the inferences are expected to generalize, any decisions that rest on these inferences may systematically disadvantage those who are under- or over-represented in the dataset.

Labeling Examples

The unavoidably subjective labeling of examples can skew the resulting findings in such a way that any decisions taken on the basis of those findings will characterize all future cases along the same lines, even if such characterizations would seem plainly erroneous to analysts who looked more closely at the individual cases. For all their potential problems, though, the labels applied to the training data must serve as ground truth.

The kinds of subtle mischaracterizations that happened during training will be impossible to detect when evaluating the performance of a model, because the training is taken as a given at that point. Thus, decisions taken on the basis of discoveries that rest on haphazardly labeled data or data labeled in a systematically, though unintentionally, biased manner will seem valid.

So long as prior decisions affected by some form of prejudice serve as examples of correctly rendered determinations, data mining will necessarily infer rules that exhibit the same prejudice.
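A minimal sketch of how a learner inherits prejudice from its labels, assuming a deliberately simple "model" that memorizes the majority historical decision for each combination of features. The historical records and the neighborhood feature are hypothetical. Because past decisions are treated as ground truth, two equally qualified applicants receive different predictions based only on where they live.

```python
"""Minimal sketch: a learner that treats biased past decisions as ground truth.

The "model" is a lookup of the majority historical label for each combination
of features. Data and feature names are hypothetical.
"""

from collections import Counter, defaultdict

# (qualified, neighborhood, hired): past decisions by a biased recruiter who
# rejected most qualified applicants from the "north" neighborhood.
history = (
    [(True, "south", True)] * 45 + [(True, "south", False)] * 5
    + [(True, "north", True)] * 10 + [(True, "north", False)] * 40
    + [(False, "south", False)] * 30 + [(False, "north", False)] * 30
)


def train(rows):
    """Learn the majority label for each (qualified, neighborhood) combination."""
    votes = defaultdict(Counter)
    for qualified, area, hired in rows:
        votes[(qualified, area)][hired] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


model = train(history)

# Two new applicants with identical qualifications receive different predictions.
print("qualified, south ->", model[(True, "south")])
print("qualified, north ->", model[(True, "north")])
```

A more sophisticated learner would not fix this: any method that fits the historical labels well will fit the recruiter's prejudice as well.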

An employer currently subject to an EEOC investigation states that it identified “a pool of existing employees” that Kronos, a third-party assessment provider, used to create a customized assessment for the employer. The employer's reliance on that employee sample is flawed because people with mental disabilities are severely underrepresented in the existing workforce:

According to the 2010 Kessler Foundation/NOD Survey of Employment of Americans with Disabilities, conducted by Harris Interactive, the employment gap between people with and without disabilities has remained significant over the past 25+ years.

According to a 2013 report of the Senate HELP Committee, Unfinished Business: Making Employment of People with Disabilities A National Priority, only 32% of working age people with disabilities participate in the labor force, as compared with 77% of working age people without disabilities. For people with mental illnesses, rates are even lower.

The employment rate for people with serious mental illness is less than half the 33% rate for other disability groups (Anthony, Cohen, Farkas, & Gagne, 2002).

Surveys have found that only 10% - 15% of people with serious mental illness receiving community treatment are competitively employed (Henry, 1990; Lindamer et al., 2003; Pandiani & Leno, 2011; Rosenheck et al., 2006; Salkever et al., 2007).

In Albemarle Paper Company v. Moody, 422 US 405 (1975), in which an employer implemented a test on the theory that a certain verbal intelligence was called for by the increasing sophistication of the plant's operations, the Supreme Court cited the Standards of the American Psychological Association and pointed out that a test should be validated on people as similar as possible to those to whom it will be administered. The Court further stated that differential studies should be conducted on minority groups/protected classes wherever feasible.

The use of the employer's own workforce to develop and benchmark its assessment is flawed because people with mental disabilities are severely underrepresented in the employer's workforce and the overall U.S. workforce.

Not only can data mining inherit prior prejudice through the mislabeling of examples, it can also reflect current prejudice through the ongoing behavior of users taken as inputs to data mining.

This is what Latanya Sweeney discovered in a study that found that Google queries for black-sounding names were more likely to return contextual (i.e., key-word triggered) advertisements for arrest records than those for white-sounding names.

Sweeney confirmed that the companies paying for these ads had not set out to focus on black-sounding names; rather, the fact that black-sounding names were more likely to trigger such advertisements seemed to be an artifact of the algorithmic process that Google employs to determine which advertisements to deliver alongside the results for certain queries. Although the details of the process by which Google computes the so-called “quality score” according to which it ranks advertisers’ bids are not fully known, one important factor is the predicted likelihood, based on historical trends, that users will click on an advertisement.

As Sweeney points out, the process “learns over time which ad text gets the most clicks from viewers of the ad” and promotes that advertisement in its rankings accordingly. Sweeney posits that this aspect of the process could result in the differential delivery of advertisements that reflect the kinds of prejudice held by those exposed to the advertisements. In attempting to cater to the preferences of users, Google will unintentionally reproduce the existing prejudices that inform users’ choices.

A similar situation could conceivably arise on websites that recommend potential employees to employers, as LinkedIn does through its Talent Match feature. If LinkedIn determines which candidates to recommend on the basis of the demonstrated interest of employers in certain types of candidates, Talent Match will offer recommendations that reflect whatever biases employers happen to exhibit. In particular, if LinkedIn’s algorithm observes that employers disfavor certain candidates that are members of a protected class, Talent Match may decrease the rate at which it recommends these types of candidates to employers. The recommendation engine would learn to cater to the prejudicial preferences of employers.
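The feedback dynamic described for Google's ad ranking, and hypothesized for Talent Match, can be sketched as a simple loop. The sketch assumes, purely for illustration, that the engine allots each round's recommendations in proportion to how often employers accepted each group's candidates in the previous round, and that employers accept equally qualified group B candidates less often. Neither the allocation rule nor the acceptance rates describe LinkedIn's or Google's actual systems.

```python
"""Minimal sketch of a recommendation feedback loop.

Hypothetical assumption: the engine allots next round's recommendations in
proportion to how often employers accepted each group's candidates in the
previous round. This is NOT a description of LinkedIn's actual algorithm.
"""

# Hypothetical employer acceptance rates for equally qualified candidates;
# the gap between them stands in for employer bias.
ACCEPT_RATE = {"A": 0.5, "B": 0.3}
SLOTS = 100  # recommendations shown per round


def simulate(rounds, start_share_b=0.5):
    """Return group B's share of recommendations at the start of each round."""
    share_b = start_share_b
    history = []
    for _ in range(rounds):
        history.append(share_b)
        shown = {"A": SLOTS * (1 - share_b), "B": SLOTS * share_b}
        accepted = {g: shown[g] * ACCEPT_RATE[g] for g in shown}
        # The engine "learns" from acceptances and reallocates accordingly.
        share_b = accepted["B"] / (accepted["A"] + accepted["B"])
    return history


for round_no, share in enumerate(simulate(5), start=1):
    print(f"round {round_no}: group B share of recommendations = {share:.2f}")
```

Under these assumptions group B's share shrinks every round, even though the engine never sees group membership as such; it only optimizes for acceptances.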

Data Collection

Organizations that do not or cannot observe different populations in a consistent way and with equal coverage will amass evidence that fails to reflect the actual incidence and relative proportion of some attribute or activity in the under- or over-observed group. Consequently, decisions that depend on conclusions drawn from this data may discriminate against members of these groups.

The data might suffer from a variety of problems: the individual records that a company maintains about a person might have serious mistakes, the records of the entire protected class of which this person is a member might also have similar mistakes at a higher rate than other groups, and the entire set of records may fail to reflect members of protected classes in accurate proportion to others. In other words, the quality and representativeness of records might vary in ways that correlate with class membership (e.g., institutions might maintain systematically less accurate, precise, timely, and complete records). Even a dataset with individual records of consistently high quality can suffer from statistical biases that fail to represent different groups in accurate proportions. Much attention has focused on the harms that might befall individuals whose records in various commercial databases are error-ridden, but far less consideration has been paid to the systematic disadvantage that members of protected classes may suffer from being miscounted and the resulting biases in their representation in the evidence base.

Recent scholarship has begun to stress this point. Jonas Lerman, for example, worries about “the nonrandom, systemic omission of people who live on big data’s margins, whether due to poverty, geography, or lifestyle, and whose lives are less ‘datafied’ than the general population’s.” Kate Crawford has likewise warned, “because not all data is created or even collected equally, there are ‘signal problems’ in big-data sets—dark zones or shadows where some citizens and communities are ... underrepresented.” Errors of this sort may befall historically disadvantaged groups at higher rates because they are less involved in the formal economy and its data-generating activities.

Crawford points to Street Bump, an application for Boston residents that takes advantage of accelerometers built into smartphones to detect when drivers ride over potholes (sudden movement that suggests a broken road automatically prompts the phone to report the location to the city).

While Crawford praises the cleverness and cost-effectiveness of this passive approach to reporting road problems, she rightly warns that whatever information the city receives from this application will be biased by the uneven distribution of smartphones across populations in different parts of the city. In particular, systematic differences in smartphone ownership will very likely result in the underreporting of road problems in the poorer communities where protected groups disproportionately congregate. If the city were to rely on this data to determine where it should direct its resources, it would only further underserve these communities. Indeed, the city would discriminate against those who lack the capacity to report problems as effectively as wealthier residents with smartphones.
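A back-of-the-envelope version of Crawford's point, with invented figures: two neighborhoods have the same number of potholes, but smartphone ownership differs, so expected reports differ. If repair crews are assigned in proportion to reports (an assumed policy, not Boston's actual practice), the poorer neighborhood receives half as many crews for the same road damage.

```python
"""Minimal sketch: how uneven smartphone ownership distorts passive reporting.

All figures are hypothetical. Expected reports are modeled as actual potholes
times the share of drivers carrying a reporting-capable phone.
"""

neighborhoods = {
    # name: (actual_potholes, share_of_drivers_with_smartphone)
    "wealthier": (100, 0.8),
    "poorer": (100, 0.4),
}
REPAIR_CREWS = 30  # hypothetical crews, assigned in proportion to reports

reports = {name: potholes * phone_share for name, (potholes, phone_share) in neighborhoods.items()}
total_reports = sum(reports.values())
crews = {name: REPAIR_CREWS * r / total_reports for name, r in reports.items()}

for name in neighborhoods:
    actual = neighborhoods[name][0]
    print(f"{name}: actual={actual}, reported={reports[name]:.0f}, crews={crews[name]:.0f}")
```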

A similar dynamic could easily apply in an employment context if members of protected classes are unable to report their interest in and qualification for jobs listed online as easily or effectively as others due to systematic differences in Internet access.

Zappos has launched a new careers site and removed all job postings. Instead of applying for jobs, persons interested in working at Zappos will need to enroll in a social network run by the company, called Zappos Insiders. The social network will allow them to network with current employees by digital Q&As, contests and other means in hopes that Zappos will tap them when jobs come open.

"Zappos Insiders will have unique access to content, Google Hangouts, and discussions with recruiters and hiring teams. Since the call-to-action is to become an Insider versus applying for a specific opening, we will capture more people with a variety of skill sets that we can pipeline for current or future openings," said Michael Bailen, Zappos’ head of talent acquisition.

In response to a question, “How can I stand out from the pack and stay front-and-center in the Zappos Recruiters’ minds?” on the Zappos' Insider site, the company lists six ways to stand out, including: using Twitter, Facebook, Instagram, Pinterest and Google Hangouts; participating in TweetChats; following Zappos’ employees on various social media platforms; and, reaching out to Zappos’ “team ambassadors.”

For the most part, all of the foregoing activities require broadband internet access and devices (tablets, smartphones, etc.) that run on those access networks. A number of protected classes will be challenged by both the broadband access and social media participation requirements:

As noted in a Pew Research Internet Project report, African Americans have long been less likely than whites to have high-speed broadband access at home, and that continues to be the case. Today, African Americans trail whites by seven percentage points when it comes to overall internet use (87% of whites and 80% of blacks are internet users), and by twelve percentage points when it comes to home broadband adoption (74% of whites and 62% of blacks have some sort of broadband connection at home).

The gap between whites and blacks when it comes to traditional measures of internet and broadband adoption is pronounced. Specifically, older African Americans, as well as those who have not attended college, are significantly less likely to go online or to have broadband service at home compared to whites with a similar demographic profile.

According to the Pew Research Internet Project, even among those persons who have broadband access, the percentage of those using social media sites varies significantly by age.

Social media participation is not solely a function of age. "Social media is transforming how we engage with customers, employees, jobseekers and other stakeholders," said Kathy Martinez, Assistant Secretary of Labor for Disability Employment Policy. "But when social media is inaccessible to people with disabilities, it excludes a sizeable segment of our population."

Persons with disabilities, whether physical, mental, or developmental (e.g., sight or hearing loss, paralysis), face challenges accessing social media. Each of the social media platforms promoted by Zappos (Twitter, Facebook, Instagram, Pinterest, and Google Hangouts) has a different level of support for people with disabilities (e.g., closed captions or real-time captions for content that uses sound or voice). (Please see Zappos: The Future of Hiring and Hiring Discrimination?)

To ensure that data mining reveals patterns that obtain for more than the particular sample under analysis, the sample must share the same probability distribution as the data that would be gathered from all cases across both time and population. In other words, the sample must be proportionally representative of the entire population, even though the sample, by definition, does not include every case.

If a sample includes a disproportionate representation of a particular class (more or less than its actual incidence in the overall population), the results of an analysis of that sample may skew in favor of or against the over- or under-represented class. While the representativeness of the data is often simply assumed, this assumption is rarely justified, and is “perhaps more often incorrect than correct.”
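The effect of an unrepresentative sample on an estimate can be shown with simple arithmetic. In the hypothetical sample below, group X makes up 90% of the sample but only half of the population, so the pooled estimate overstates the population rate. Recombining the per-group rates using known population shares (post-stratification, one common correction when those shares are known) recovers the population figure. The groups and counts are illustrative assumptions.

```python
"""Minimal sketch: estimating a population rate from an unrepresentative sample.

Hypothetical setup: two groups each make up half the population, but group X
is heavily over-represented in the sample.
"""

population_share = {"X": 0.5, "Y": 0.5}
# Sample counts: (number sampled, number with the attribute of interest)
sample = {"X": (900, 540), "Y": (100, 20)}

# Naive estimate: pool the sample as if it were representative.
n_total = sum(n for n, _ in sample.values())
naive = sum(k for _, k in sample.values()) / n_total

# Per-group rates, recombined using the known population shares (post-stratification).
group_rate = {g: k / n for g, (n, k) in sample.items()}
reweighted = sum(population_share[g] * group_rate[g] for g in sample)

print(f"sample share of Y: {sample['Y'][0] / n_total:.0%} (population: 50%)")
print(f"naive estimate:      {naive:.2f}")
print(f"reweighted estimate: {reweighted:.2f}")
```

Reweighting only helps when the under-represented group is sampled at all and its records are accurate; it cannot repair the data-quality problems described earlier.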

Feature Selection

Organizations—and the data miners that work for them—also make choices about what attributes they observe and what they subsequently fold into their analyses. Data miners refer to the process of settling on the specific string of input variables as “feature selection.” Members of protected classes may find that they are subject to systematically less accurate classifications or predictions because the details necessary to achieve equally accurate determinations reside at a level of granularity and coverage that the features fail to achieve.

This problem stems from the fact that data are by necessity reductive representations of an infinitely more specific real-world object or phenomenon. At issue, really, is the coarseness and comprehensiveness of the criteria that permit statistical discrimination and the uneven rates at which different groups happen to be subject to erroneous determinations. Crucially, these erroneous and potentially adverse outcomes are artifacts of statistical reasoning rather than prejudice on the part of decision-makers or bias in the composition of the dataset. As Frederick Schauer explains, decision-makers that rely on statistically sound but nonuniversal generalizations “are being simultaneously rational and unfair” because certain individuals are “actuarially saddled” by statistically sound inferences that are nevertheless inaccurate.

To take an obvious example, hiring decisions that consider credentials tend to assign enormous weight to the reputation of the college or university from which an applicant has graduated, despite the fact that such credentials may communicate very little about the applicant’s job-related skills and competencies. If equally competent members of protected classes happen to graduate from these colleges or universities at disproportionately low rates, decisions that turn on the credentials conferred by these schools, rather than some set of more specific qualities that more accurately sort individuals, will incorrectly and systematically discount these individuals.
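A sketch of the credential example, using hypothetical records in which both groups are equally competent (80% of each) but group B holds the elite degree at a much lower rate. A screen that hires only degree holders looks "rational" here, since every hire is competent, yet it screens out a far larger share of the competent group B candidates.

```python
"""Minimal sketch: a coarse proxy (an elite degree) standing in for competence.

All records are hypothetical. "competent" is the quality the employer actually
cares about; "elite_degree" is the coarse feature the screen relies on.
"""

# (group, competent, elite_degree)
candidates = (
    [("A", True, True)] * 60 + [("A", True, False)] * 20 + [("A", False, False)] * 20
    + [("B", True, True)] * 20 + [("B", True, False)] * 60 + [("B", False, False)] * 20
)


def false_negative_rate(group):
    """Share of competent candidates in `group` whom the degree screen rejects."""
    competent = [c for c in candidates if c[0] == group and c[1]]
    rejected = [c for c in competent if not c[2]]
    return len(rejected) / len(competent)


for group in ("A", "B"):
    print(f"group {group}: competent candidates screened out = {false_negative_rate(group):.0%}")
```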

Kenexa, an assessment company owned by IBM and used by hundreds of employers, believes that a lengthy commute raises the risk of attrition in call-center and fast-food jobs. It asks applicants for those jobs to describe their commute by picking options ranging from "less than 10 minutes" to "more than 45 minutes." According to Kenexa’s Jeff Weekley, in a September 20, 2012 article in The Wall Street Journal, “The longer the commute, the lower their recommendation score for these jobs.” Applicants are also asked how long they have been at their current address and how many times they have moved. People who move more frequently "have a higher likelihood of leaving," Mr. Weekley said.

A 2011 study by the Center for Public Housing found that poor and near-poor families tend to move more frequently than the general population. A wide range of often complex forces appears to drive this mobility:

the formation and dissolution of households;
an inability to afford one’s housing costs;
the loss of employment;
lack of quality housing; or
the search for a safer neighborhood.

According to the U.S. Census, lower-income persons are disproportionately female, black, Hispanic, and mentally ill.

Painting with the broad brush of distance from work, commute time and moving frequency results in well-qualified applicants being excluded from employment consideration. Importantly, the workforce insights of companies like Kenexa are based on data correlations - they say nothing about a particular person.

The application of these insights means that many low-income persons are electronically redlined. Employers do not even interview, let alone hire, qualified applicants because they live in certain areas or because they have moved. The reasons for moving do not matter, even if it was to find a better school for their children, to escape domestic violence or due to job loss from a plant shutdown.

When Clayton County, Georgia killed its bus system in 2010, it had nearly 9,000 daily riders. Many of those riders used the service to commute to their jobs. The transit shutdown increased commuting times (as people found other ways to get to work) and led to more residential moves (as people relocated closer to their jobs to shorten their commutes). Through no fault of their own, the former bus riders' longer commutes and more frequent moves made them less attractive job candidates to the many employers that use companies like Kenexa.
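To illustrate how an outside event such as the bus shutdown can push the same person below a screening cutoff, here is a sketch of a location-based score. The point values, move penalty, and cutoff are hypothetical and are not Kenexa's actual model; the applicant's qualifications never change, only the commute band and the number of recent moves.

```python
"""Minimal sketch: an outside shock lowering a location-based screening score.

The scoring weights, cutoff, and applicant details are hypothetical; they are
not Kenexa's actual model.
"""

from dataclasses import dataclass, replace

COMMUTE_POINTS = {"<10": 4, "10-20": 3, "20-30": 2, "30-45": 1, ">45": 0}
MOVE_PENALTY = 1  # hypothetical points lost per recent move
CUTOFF = 3        # hypothetical minimum score to be recommended


@dataclass(frozen=True)
class Applicant:
    commute_band: str
    recent_moves: int


def score(a: Applicant) -> int:
    return COMMUTE_POINTS[a.commute_band] - MOVE_PENALTY * a.recent_moves


before = Applicant(commute_band="10-20", recent_moves=0)
# After the bus system closes: either a longer commute by other means, or a move closer to work.
after_commute = replace(before, commute_band=">45")
after_move = replace(before, recent_moves=1)

for label, a in [("before shutdown", before), ("longer commute", after_commute), ("moved closer", after_move)]:
    s = score(a)
    print(f"{label}: score={s}, recommended={s >= CUTOFF}")
```

Under this hypothetical scheme both coping strategies, commuting farther or moving closer, cost the applicant the recommendation.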

Making Decisions on the Basis of the Resulting Model

Cases of decision-making that do not artificially introduce discriminatory effects into the data mining process may nevertheless result in systematically less favorable determinations for members of protected classes. Situations of this sort are possible when the criteria that are genuinely relevant in making rational and well-informed decisions also happen to serve as reliable proxies for class membership. In other words, the very same criteria that correctly sort individuals according to their predicted likelihood of excelling at a job (as formalized in some fashion) may also sort individuals according to class membership.

For example, employers may find, in conferring greater attention and opportunities to employees that they predict will prove most competent at some task, that they subject members of protected groups to consistently disadvantageous treatment because the criteria that determine the attractiveness of employees happen to be held at systematically lower rates by members of these groups. Decision-makers do not necessarily intend this disparate impact because they hold prejudicial beliefs; rather, their reasonable priorities as profit-seekers unintentionally recapitulate the inequality that happens to exist in society. Furthermore, this may occur even if proscribed criteria have been removed from the dataset, the data are free from latent prejudice or bias, the data are especially granular and diverse, and the only goal is to maximize classificatory or predictive accuracy.

The problem stems from what researchers call “redundant encodings”: cases in which membership in a protected class happens to be encoded in other data. This occurs when a particular piece of data or certain values for that piece of data are highly correlated with membership in specific protected classes. The fact that these data may hold significant statistical relevance to the decision at hand explains why data mining can result in seemingly discriminatory models even when its only objective is to ensure the greatest possible accuracy for its determinations. If there is a disparate distribution of an attribute, a more precise form of data mining will be more likely to capture it as such. Better data and more features will simply expose the exact extent of inequality.
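A minimal sketch of a redundant encoding, using hypothetical data. The protected attribute is removed before training, and the "model" learns only a historical success rate for each ZIP code. Because residence is highly correlated with group membership in this invented data, accepting applicants from high-success ZIP codes still produces a 90% selection rate for group A and a 10% rate for group B. The ZIP codes, 0.5 threshold, and counts are assumptions for illustration.

```python
"""Minimal sketch of a redundant encoding.

The protected attribute is dropped before training, yet a correlated feature
(a hypothetical ZIP code) carries the same information. All data is invented.
"""

from collections import defaultdict

# (zip_code, protected_group, successful_outcome); group is NOT given to the model.
records = (
    [("11111", "A", True)] * 70 + [("11111", "A", False)] * 20
    + [("11111", "B", True)] * 7 + [("11111", "B", False)] * 3
    + [("22222", "B", True)] * 36 + [("22222", "B", False)] * 54
    + [("22222", "A", True)] * 4 + [("22222", "A", False)] * 6
)


def train(rows):
    """Learn the historical success rate for each ZIP code (group is ignored)."""
    totals, successes = defaultdict(int), defaultdict(int)
    for zip_code, _group, success in rows:
        totals[zip_code] += 1
        successes[zip_code] += success
    return {z: successes[z] / totals[z] for z in totals}


rate_by_zip = train(records)
accept = {z for z, r in rate_by_zip.items() if r >= 0.5}  # hypothetical threshold


def selection_rate(group):
    """Share of a group's members whose ZIP code the model accepts."""
    members = [r for r in records if r[1] == group]
    return sum(r[0] in accept for r in members) / len(members)


print("accepted ZIP codes:", accept)
print("selection rate, A:", selection_rate("A"))
print("selection rate, B:", selection_rate("B"))
```

Within each ZIP code the two groups succeed at similar rates in this data, so the model is "accurate"; the disparity comes from where each group lives.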

Data mining could also breathe new life into traditional forms of intentional discrimination because decision-makers with prejudicial views can mask their intentions by exploiting each of the mechanisms enumerated above. Stated simply, any form of discrimination that happens unintentionally can be orchestrated intentionally as well.

For instance, decision-makers could knowingly and purposefully bias the collection of data to ensure that mining suggests rules that are less favorable to members of protected classes. They could likewise attempt to preserve the known effects of prejudice in prior decision-making by insisting that such decisions constitute a reliable and impartial set of examples from which to induce a decision-making rule. And decision-makers could intentionally rely on features that only permit coarse-grain distinction-making—distinctions that result in avoidable and higher rates of erroneous determinations for members of a protected class.

Because data mining holds the potential to infer otherwise unseen attributes, including those traditionally deemed sensitive, it can furnish methods by which to determine indirectly individuals’ membership in protected classes and to unduly discount, penalize, or exclude such people accordingly. In other words, data mining could grant decision-makers the ability to distinguish and disadvantage members of protected classes without access to explicit information about individuals’ class membership. It could instead help to pinpoint reliable proxies for such membership and thus place institutions in the position to automatically sort individuals into their respective class without ever having to learn these facts directly.
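A minimal, hypothetical Python sketch of how a proxy can stand in for class membership: the neighborhoods, group labels and records are invented. The code learns, from past records in which group membership happened to be known, which group is most common in each neighborhood, and then assigns new applicants to a group from their neighborhood alone. The accuracy is measured on the same records the proxy was learned from, so it is an optimistic figure.

```python
from collections import Counter, defaultdict

# Hypothetical historical records in which class membership was recorded.
# Each record is (neighborhood, group).
HISTORY = [
    ("north", "A"), ("north", "A"), ("north", "A"), ("north", "B"),
    ("south", "B"), ("south", "B"), ("south", "B"), ("south", "A"),
]

def learn_proxy(records):
    """Map each neighborhood to the group most often observed there."""
    counts = defaultdict(Counter)
    for neighborhood, group in records:
        counts[neighborhood][group] += 1
    return {hood: c.most_common(1)[0][0] for hood, c in counts.items()}

def proxy_accuracy(proxy, records):
    """Share of records whose group the proxy guesses correctly."""
    hits = sum(proxy[hood] == group for hood, group in records)
    return hits / len(records)

if __name__ == "__main__":
    proxy = learn_proxy(HISTORY)
    print(proxy)                           # {'north': 'A', 'south': 'B'}
    print(proxy_accuracy(proxy, HISTORY))  # 0.75
    # New applicants supply only a neighborhood, yet each is assigned a group.
    print([proxy[h] for h in ["south", "north", "south"]])  # ['B', 'A', 'B']
```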

Friday, June 13, 2014

Exacerbating Long-Term Unemployment: Big Data and Employment Assessments

A recent Brookings Institution paper states that the “diverse and varied set of characteristics [of the long-term unemployed] implies that a broad array of policies will be needed to substantially lower the long-term unemployment rate and stem labor force withdrawal, as concentrating on any single occupation, industry, demographic group or region is unlikely to have a substantial impact reducing long-term unemployment by itself.” Please see On the Margins of the Labor Market.

There is, however, a common employment factor that can be linked to numerous occupations, industries, demographic groups and regions: online job application processes that require individuals (i) to provide "location-based information" (i.e., distance from job site, commute time, household relocation) and (ii) to complete personality assessments. The screening elements in these processes exclude or penalize persons with lower socioeconomic status, who are disproportionately Black, Hispanic, mentally ill, or less well-educated. With the exception of persons with mental illness, these are the same groups that the recent Brookings Institution paper found to make up a disproportionate share of the long-term unemployed.

Jobs that were once filled on the basis of work history and interviews are now left to personality tests, data analysis and algorithms. The new hiring tools are part of a broader effort to gather and analyze employee data. Use of online assessments has grown exponentially over the past 10-15 years, with assessment companies like Kronos now holding databases of information on hundreds of millions of job applicants and employees. To provide a sense of scale, one major big box retailer processes more than nine million job applications a year.

Personality tests are “growing like wildfire,” said Josh Bersin, president and CEO of Bersin & Associates, an Oakland, Calif., research firm. Bersin estimated that this kind of pre-hire testing has been growing by as much as 20 percent annually in the past few years. Industries that are flooded with resumes such as retail, food service and hospitality are among the ones that use such tests most often, he said.

Employment Redlining: Location-Based Discrimination

Kenexa, an assessment company purchased by IBM in December 2012 for $1.3 billion, will test tens of millions of applicants this year for thousands of clients. Kenexa believes that a lengthy commute raises the risk of attrition in call-center and fast-food jobs, so it asks applicants for those jobs to describe their commute by picking options ranging from "less than 10 minutes" to "more than 45 minutes." The longer the commute, the lower their recommendation score for these jobs, said Jeff Weekley, who oversees the assessments. Applicants also can be asked how long they have been at their current address and how many times they have moved. People who move more frequently "have a higher likelihood of leaving," Mr. Weekley said.

Painting with the broad brush of distance from job site, commute time and moving frequency results in otherwise well-qualified applicants being excluded, applicants who might have ended up being among the longest tenured of employees. The Kenexa findings are generalized correlations; the insights say nothing about any particular applicant. Please see From What Distance is Discrimination Acceptable.

Are there any groups of people who might live farther from the work site and move more frequently than others? Yes: lower-income persons, who are disproportionately women, Black, Hispanic, or mentally ill. They often cannot afford to live where the jobs are, and they move more frequently because they cannot afford housing or have lost a job.
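The mechanism can be sketched in a few lines of Python. This is a hypothetical illustration, not Kenexa's scoring model, which is proprietary: the two intermediate commute buckets, the penalty points, the one-point deduction per move, the cutoff of 8 and both applicant pools are invented. The two pools have the same total base score, yet the commute and moving penalties produce very different pass rates. The last function computes the ratio of selection rates used in the "four-fifths" rule of thumb of the Uniform Guidelines on Employee Selection Procedures, under which a ratio below 0.8 is generally regarded by federal enforcement agencies as evidence of adverse impact.

```python
# All weights below are invented for illustration; they are NOT Kenexa's.
COMMUTE_PENALTY = {
    "less than 10 minutes": 0,
    "10 to 20 minutes": 1,      # hypothetical intermediate bucket
    "20 to 45 minutes": 2,      # hypothetical intermediate bucket
    "more than 45 minutes": 3,
}
CUTOFF = 8  # hypothetical minimum recommendation score

def recommendation_score(base_score: int, commute: str, moves: int) -> int:
    """Deduct points for a longer commute and one point per recent move."""
    return base_score - COMMUTE_PENALTY[commute] - moves

def pass_rate(pool) -> float:
    """Share of (base_score, commute, moves) applicants at or above CUTOFF."""
    passed = sum(recommendation_score(*a) >= CUTOFF for a in pool)
    return passed / len(pool)

def impact_ratio(focal_rate: float, reference_rate: float) -> float:
    """Selection-rate ratio; below 0.8 fails the four-fifths rule of thumb."""
    return focal_rate / reference_rate

# Hypothetical pools with identical total base scores (38 each).
NEAR_JOBS = [(10, "less than 10 minutes", 0), (9, "10 to 20 minutes", 0),
             (9, "20 to 45 minutes", 1), (10, "10 to 20 minutes", 1)]
FAR_FROM_JOBS = [(10, "more than 45 minutes", 1), (9, "20 to 45 minutes", 0),
                 (10, "10 to 20 minutes", 0), (9, "more than 45 minutes", 2)]

if __name__ == "__main__":
    near, far = pass_rate(NEAR_JOBS), pass_rate(FAR_FROM_JOBS)
    print(near, far)                          # 0.75 0.25
    print(round(impact_ratio(far, near), 2))  # 0.33
```

No question about ability to do the job differs between the pools; only where applicants live and how often they have moved does.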

Spatial Mismatch and its Institutionalization

An NBER study published in April 2014, "Job Displacement and the Duration of Joblessness: The Role of Spatial Mismatch," finds that better job accessibility significantly decreases the duration of joblessness among lower-paid displaced workers. Blacks, females, and older workers are more sensitive to job accessibility than other subpopulations.

The so-called “spatial mismatch hypothesis,” which originally grew out of research on the effects of segregated housing markets, has been debated among economists and social scientists since the 1960s. But while there’s general agreement that “job accessibility” has some impact on unemployment duration, researchers have disagreed about how important it is and for which groups of workers.

Although the study was limited to the 2000-05 period, its conclusion — that “a worker with locally inferior access to jobs is likely to have worse labor market outcomes” — could help explain the current situation. What we know for sure is that as of March 2014, more than a third (35.7%) of all unemployed Americans had been out of work for more than 26 weeks, according to the BLS. Blacks and Asians are most likely to experience extended joblessness: Last month, 44% of unemployed blacks and about as many unemployed Asians had been out of work longer than 26 weeks, versus a third of unemployed whites and 32% of unemployed Hispanics. Please see Long-Term Unemployment and its Costs.

With the "location-based" scoring "insights" provided by companies like Kenexa, spatial mismatch has been institutionalized over the past 5-10 years. If a job applicant has a long commute, whether because effective mass transit is lacking where the applicant lives or because the applicant lacks personal transportation, that applicant may never be interviewed, let alone offered a job.

Mental Illness and Socioeconomic Status

One of the most consistently replicated findings in the social sciences has been the negative relationship of socioeconomic status (SES) with mental illness: The lower the SES of an individual is, the higher is his or her risk of mental illness.

As an example, for the period from 2005-2010, the Centers for Disease Control found that among adults 20–44 and 45–64 years of age, depression was five times as high for those below poverty, about three times as high for those with family income at 100%–199% of poverty, and 60% higher for those with income at 200%–399% of poverty compared with those at 400% or more of the poverty level.

According to a 2001 study, lower-income Americans had a higher prevalence of one or more psychiatric disorders (51% vs 28%): mood disorders (33% vs 16%), anxiety disorders (36% vs 11%), and eating disorders (10% vs 7%). Because lower-income applicants are more likely both to live far from jobs or move often and to have a mental illness, pre-employment assessments that use these location-based "insights" disproportionately screen out persons with mental illness.

Mental Illness and Disability

The prevalence of mental disorders in the U.S. population remained unchanged between 1990 and 2003. In that same interval, the rate of treatment of mental illness substantially increased—which in turn should have contributed to improved work-readiness among individuals coping with mental illness. The combination of the prevalence of mental disorders remaining unchanged and substantially increased rates of treatment should have resulted in a decline in the percentage of persons receiving SSDI awards who are diagnosed with mental illness. That has not been the case. Please see Costing Taxpayers Billions of Dollars Each Year.

People with psychiatric impairments constitute the largest and most rapidly growing subgroup of Social Security disability beneficiaries. In 2011, 47.5 percent of persons receiving SSI and 31.0 percent of persons receiving SSDI had a mental disorder. These percentages keep growing, in part because beneficiaries with psychiatric impairments are generally younger than other beneficiaries when they become ill and therefore remain on the Social Security rolls much longer.

Some analysts contend that rising disability awards for mental illness reflect a “broken” system that provides benefits to those who should not receive them; others point out that income support makes it easier for persons with mental illness to live in the community. These conflicting conclusions reflect an ongoing debate over whether increasing awards for mental illness represent a policy success because they reach needy individuals or a policy failure because the increased awards reflect moral hazard.

The income support programs may be working as designed, but those programs did not anticipate the widespread use of potentially illegal pre-employment assessments and the resulting material increase in the absolute number and percentage of unemployed persons with mental disabilities seeking SSDI and SSI benefits.

* * * * *

Persistently high long-term unemployment has significant implications for families, government budgets, and the country’s overall economic and social health. The high rate of long-term unemployment has had a direct impact on the federal budget by prompting the extension of normal unemployment benefits, ratcheting up spending on other government safety-net programs (including, indirectly, SSDI, SSI and Medicare) and reducing taxable wages. Martin Feldstein, in a recent article in the Wall Street Journal, draws on the Brookings Institution paper to suggest that those who have been out of work for six months or more do not affect wage inflation and that, since the unemployment rate among those out of work for less than six months was only 4.1%, wage inflation may soon begin to rise more rapidly.

The growing and widespread use of employment assessments and applicant data collection processes over the past ten years has likely had an impact on the growth of the long-term unemployed in the U.S. labor market. Persons with lower socioeconomic status, disproportionately Black, Hispanic, persons with mental illness, and the less well-educated, risk becoming a permanent underclass of the unemployed and underemployed.

Some of the most profound challenges revealed by the recent White House Report "Big Data: Seizing Opportunities, Preserving Values" concern how big data analytics may lead to disparate inequitable treatment, particularly of disadvantaged groups, or create such an opaque decision-making environment that individual autonomy is lost in an impenetrable set of algorithms. Please see White House: Big Data's Role in Employment Discrimination.

Workforce assessment systems, designed in part to mitigate risks for employers, have become sources of material risk, both to job applicants and employers. The systems create the perception of stability through probabilistic reasoning and the experience of accuracy, reliability, and comprehensiveness through automation and presentation. But in so doing, technology systems draw attention away from uncertainty and partiality. Moreover, they shroud opacity—and the challenges for oversight that opacity presents—in the guise of legitimacy, providing the allure of shortcuts and safe harbors for actors both challenged by resource constraints and desperate for acceptable means to demonstrate compliance with legal mandates and market expectations.

Thursday, January 16, 2014

Gut Check: How Intelligent Is Artificial Intelligence?

In 6 Ways to Create a Smarter Workforce in 2014, Tim Geisert, Kenexa's Chief Marketing Officer, writes:

Use science, precision and data to hire the right people for the job the first time. According to a 2012 IBM study, 71 percent of CEOs surveyed cited human capital as their greatest source of sustained economic value. So, why does HR continue to rely on gut instinct alone to make such important decisions?

Perhaps Mr. Geisert should have spoken with Rudy Karsan, one of Kenexa's founders and its CEO, who wrote in Listening to Your Gut Feeling:

On the big decisions I have gone against my gut on a couple of occasions and it’s been a train wreck. My gut is made up of my instinct, my faith, my intuition, my experiences and data that is currently inaccessible because it’s tucked away in the deep recesses of my brain.

Then again, perhaps Mr. Karsan should have spoken with Troy Kanter, President of Human Capital Management for Kenexa, who stated in a press release:

Now, instead of making hiring decisions based on 'gut' feelings and personal likes and dislikes, hiring managers and HR can select candidates based on objective data, which also prevents potential legal ramifications and mitigates risk in the hiring process.

So, two Kenexa executives believe that going with your gut is a bad idea, but the most senior Kenexa executive believes that going against your gut is an accident waiting to happen. Who's right?

What Does Data Tell Us?

The benefits provided by the use of pre-employment assessments, whether called workforce science, talent analytics or any other name, should be readily apparent and quantifiable. For example, has the rising use of pre-employment assessments over the past 10-15 years resulted in greater employee engagement?

Gallup has measured employee engagement since 2000 and it defines “engaged” employees as those who are involved in, enthusiastic about, and committed to their work and contribute to their organization in a positive manner. The 2013 Gallup report shows that 70% of American workers are “not engaged” or “actively disengaged” and are emotionally disconnected from their workplaces.

Is the employee engagement data from the 2013 report an anomaly? No. As shown in the following chart taken from the 2013 Gallup report, there has been little change in workplace engagement levels since 2000.

Contrast the lack of change in employee engagement with the marketing of pre-employment assessments, like this selection from the Kronos website:

Your employees are the face of your brand and the most vital asset of your business. They drive your productivity and profitability. What’s more important than selecting the right ones? Take the guesswork out of employee selection with industry-specific, behavioral-based assessments and interview guides [from Kronos].

Gallup’s research shows that employee engagement is strongly connected to business outcomes essential to an organization’s financial success, including productivity, profitability, and customer satisfaction. Yet, as the report states, "workplace engagement levels have hardly budged since Gallup began measuring them in 2000."

Brain vs Computer

In "Thinking In Silicon," a December 2013 article in the MIT Technology Review, Tom Simonite writes:

Picture a person reading these words on a laptop in a coffee shop. The machine made of metal, plastic, and silicon consumes about 50 watts of power as it translates bits of information—a long string of 1s and 0s—into a pattern of dots on a screen. Meanwhile, inside that person’s skull, a gooey clump of proteins, salt, and water uses a fraction of that power not only to recognize those patterns as letters, words, and sentences but to recognize the song playing on the radio.

All today’s computers, from smartphones to supercomputers, have just two main components: a central processing unit, or CPU, to manipulate data, and a block of random access memory, or RAM, to store the data and the instructions on how to manipulate it. The CPU begins by fetching its first instruction from memory, followed by the data needed to execute it; after the instruction is performed, the result is sent back to memory and the cycle repeats. Even multicore chips that handle data in parallel are limited to just a few simultaneous linear processes.

Brains compute in parallel as the electrically active cells inside them, called neurons, operate simultaneously and unceasingly. Bound into intricate networks by threadlike appendages, neurons influence one another’s electrical pulses via connections called synapses. When information flows through a brain, it processes data as a fusillade of spikes that spread through its neurons and synapses. You recognize the words in this paragraph, for example, thanks to a particular pattern of electrical activity in your brain triggered by input from your eyes. Crucially, neural hardware is also flexible: new input can cause synapses to adjust so as to give some neurons more or less influence over others, a process that underpins learning. In computing terms, it’s a massively parallel system that can reprogram itself.
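The sequential fetch-and-execute cycle described in the quoted passage can be illustrated with a toy machine. This is a hypothetical Python sketch, not a model of any real CPU: the instruction format and the opcodes ADD, MUL and HALT are invented. Instructions and data share one memory list, and the loop handles exactly one instruction per cycle: fetch it, fetch its operands, compute, write the result back, repeat.

```python
def run(memory: list) -> list:
    """Execute a toy program stored in `memory`, one instruction per cycle.

    Each instruction is a tuple (opcode, src_a, src_b, dest) whose last three
    fields are memory addresses; data values live in the same list.
    """
    pc = 0  # program counter: address of the next instruction
    while True:
        opcode, src_a, src_b, dest = memory[pc]  # fetch the instruction
        pc += 1
        if opcode == "HALT":
            return memory
        a, b = memory[src_a], memory[src_b]      # fetch the data it needs
        if opcode == "ADD":
            result = a + b
        elif opcode == "MUL":
            result = a * b
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        memory[dest] = result                    # send the result back to memory

if __name__ == "__main__":
    program = [
        ("ADD", 4, 5, 6),   # memory[6] = memory[4] + memory[5]
        ("MUL", 6, 5, 6),   # memory[6] = memory[6] * memory[5]
        ("HALT", 0, 0, 0),
        0,                  # unused cell
        2, 3, 0,            # data cells at addresses 4, 5 and 6
    ]
    print(run(program)[6])  # (2 + 3) * 3 = 15
```

Nothing in this loop can adjust its own wiring in response to new input, which is the contrast the passage draws with synapses.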

Okay, but what about computing at the bleeding edge, like the "cognitive computing" of IBM's Watson?

Not So Elementary

According to a January 9, 2014 article in CIO.com, IBM says cognitive computing systems like Watson are capable of understanding the subtleties, idiosyncrasies, idioms and nuance of human language by mimicking how humans reason and process information.

Whereas traditional computing systems are programmed to calculate rapidly and perform deterministic tasks, IBM says cognitive systems analyze information and draw insights from the analysis using probabilistic analytics. In effect, they continuously reprogram themselves based on what they learn from their interactions with data.

Said IBM CEO Ginni Rometty, "In 2011, we introduced a new era [of computing] to you. It is cognitive. It was a new species, if I could call it that. It is taught, not programmed. It gets smarter over time. It makes better judgments over time." "It is not a super search engine," she adds. "It can find a needle in a haystack, but it also understands the haystack."

This "new species" of computing has its challenges. According to "IBM Struggles to Turn Watson Computer Into Big Business," a recent Wall Street Journal article:

Watson is having more trouble solving real-life problems than "Jeopardy" questions, according to a review of internal IBM documents and interviews with Watson's first customers.

For example, Watson's basic learning process requires IBM engineers to master the technicalities of a customer's business—and translate those requirements into usable software. The process has been arduous.

Klaus-Peter Adlassnig is a computer scientist at the Medical University of Vienna and the editor-in-chief of the journal Artificial Intelligence in Medicine. The problem with Watson, as he sees it, is that it’s essentially a really good search engine that can answer questions posed in natural language. Over time, Watson does learn from its mistakes, but Adlassnig suspects that the sort of knowledge Watson acquires from medical texts and case studies is “very flat and very broad.” In a clinical setting, the computer would make for a very thorough but cripplingly literal-minded doctor—not necessarily the most valuable addition to a medical staff.

As Hector J. Levesque, a professor at the University of Toronto and a founding member of the American Association for Artificial Intelligence, recently wrote:

"As a field, I believe that we tend to suffer from what might be called serial silver bulletism, defined as follows:

the tendency to believe in a silver bullet for AI, coupled with the belief that previous beliefs about silver bullets were hopelessly naïve.

We see this in the fads and fashions of AI research over the years: first, automated theorem proving is going to solve it all; then, the methods appear too weak, and we favour expert systems; then the programs are not situated enough, and we move to behaviour-based robotics; then we come to believe that learning from big data is the answer; and on it goes."

Similarly, assessment companies have marketed the benefits of "science, precision and data" over the past fifteen years under the guise of neural networks, artificial intelligence, big data and deep learning, yet what has changed? Employee engagement levels have hardly budged and employee turnover remains a continuing and expensive challenge for employers.

The more things change, the more they remain the same, or, in deference to Monsieur Levesque, "plus ça change, plus c'est la même chose."

Monday, December 23, 2013

Employment Assessments: 21st Century Snake Oil?

The Oxford English Dictionary defines snake oil as "a quack remedy or panacea." The origins of snake oil as a derogatory phrase trace back to the latter half of the 19th century, which saw a dramatic rise in the popularity of "patent medicines." Despite the name, most patent medicines were not officially patented. They were medicines with questionable effectiveness whose contents were usually kept secret.

By the middle of the 19th century the manufacture of patent medicines had become a major industry in America. Often high in alcoholic content and fortified with cocaine, morphine or opium, many of these concoctions were advertised for infants and children. Some level of exoticism in the contents of the preparation was deemed desirable by their promoters and nearly any scientific discovery could inspire a key ingredient or principle in a patent medicine.

From the beginning, some physicians and medical societies were critical of patent medicines. They argued that the remedies did not cure illnesses, discouraged the sick from seeking legitimate treatments, and created negative consequences like alcohol and drug dependency.

By the end of the 19th century, Americans favored laws to force manufacturers to disclose the remedies' ingredients and use more realistic language in their advertising. These laws met with fierce resistance from the manufacturers. Finally, with strong support from President Theodore Roosevelt, Congress passed the Pure Food and Drug Act in 1906, paving the way for public health action against unlabeled or unsafe ingredients and misleading advertising.

Are there any similarities between the sales and marketing of snake oil/patent medicine and the sales and marketing of employment assessments? Do the assessments have questionable effectiveness? Are their contents kept secret? Do they market scientific discoveries, in the form of buzzwords, as inspiring key ingredients? Is there public action challenging the assessments?

Questionable Effectiveness?

According to a 2012 study by Oracle and Development Dimensions International (DDI), a global human resources consulting firm whose expertise includes designing and implementing selection systems, more than 250 staffing directors and over 2,000 new hires from 28 countries provided the following perspectives on their organization’s selection processes (the following are excerpts from the study):

[O]nly 41 percent of staffing directors report that their pre-employment assessments are able to predict better hires.
Only half of staffing directors rate their systems as effective, and even fewer view them as aligned, objective, flexible, efficient, or integrated.
[T]he actual process for making a hiring decision is less effective than a coin toss.

In a 2007 article titled, “Reconsidering the Use of Personality Tests in Employment Contexts”, co-authored by six current or former editors of academic psychological journals, Dr. Kevin Murphy, Professor of Psychology at Pennsylvania State University and Editor of the Journal of Applied Psychology (1996-2002), states:

The problem with personality tests is … that the validity of personality measures as predictors of job performance is often disappointingly low. A couple of years ago, I heard a SIOP talk by Murray Barrick … He said, “If you took all the … [factors], measured well, you corrected for everything using the most optimistic corrections you could possibly get, you could account for about 15% of the variance in performance [between projected and actual performance].” … You are saying that if you take normal personality tests, putting everything together in an optimal fashion and being as optimistic as possible, you’ll leave 85% of the variance unaccounted for. The argument for using personality tests to predict performance does not strike me as convincing in the first place.
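The arithmetic behind Barrick's figure is worth spelling out. Assuming the standard reading of "variance accounted for" as a squared (multiple) correlation, 15% of variance corresponds to a correlation of only about 0.39 between the combined personality measures and performance, leaving about 85% of the variance unexplained:

```latex
R^2 \approx 0.15
\quad\Longrightarrow\quad
R \approx \sqrt{0.15} \approx 0.39,
\qquad
1 - R^2 \approx 0.85
```

And that is the best case, after the most optimistic corrections.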

Secret Contents?

Using terms like patent-pending, proprietary and trade secret, employment assessment companies claim that they cannot disclose information about their assessment processes. Are these claims legitimate or, like the Wizard of Oz, are the claims used to mask the lack of relevant and legal substance behind the assessments? Or is it a bit of both?

Even assuming that the confidentiality claims by the assessment companies are appropriate, are there no circumstances in which the companies must disclose information regarding how their assessments are developed and implemented and the results of the assessment usage across a broad population of job applicants? There are.

In a 2012 decision, a federal appeals court stated that the Americans with Disabilities Act (ADA) prohibits employment tests that screen out or tend to screen out disabled people unless the use of the test is job-related for the position in question and consistent with business necessity. The court went on to order that the employer and testing company, Kroger and Kronos, respectively, provide the Equal Employment Opportunity Commission (EEOC) with:

Any and all documents and data constituting or related to validation studies or validation evidence pertaining to Kronos assessment tests purchased by Kroger, including but not limited to such studies or evidence as they relate to the use of the tests as personnel selection or screening instruments, even if created or performed for other customer(s);
The user’s manual and instructions for the use of assessment tests used by Kroger;
Any and all documents (if any) related to Kroger, including but not limited to correspondence, notes, and data files, relating to Kroger; its use of the assessment test; results, ratings, or scores of individual test takers; and any validation efforts made thereto; and
Any and all documents discussing, analyzing or measuring potential adverse impact on persons with disabilities.

So, it's quick and easy for the regulatory agency (EEOC) to obtain the necessary information, right? No. The EEOC investigation leading to the appeals court decision referenced above has been ongoing for more than six years. It has generated a number of district court and appellate court decisions. None of those decisions has addressed substantive claims of discrimination; all of them concern the unwillingness of Kroger and Kronos to provide information that would allow the EEOC to determine whether the Kronos assessments illegally discriminated against persons with disabilities.

Please see Kroger and Kronos: Chaos and Disorder.

Negative Consequences?

Employment personality tests discriminate against applicants with mental illness. These applicants are ready, willing and able to work, but are being illegally screened out from employment consideration by personality tests that use a non-validated stereotype of the capabilities of persons with mental illness. Please see What Are The Issues, ADA, FFM and DSM, and Employment Assessments Are Designed to Reveal an Impairment.

Employment discrimination on the basis of mental illness affects all demographic groups. Mental illness is no respecter of age, gender, geography, income, occupation, military status, race, religion or sexual orientation. Persons with mental illness include military veterans returning to the civilian workforce, new and expectant mothers, LGBTs and young adults. Please see Tests Discriminate Against Returning Veterans, Tests Discriminate Against New and Expectant Mothers and Employment Tests Discriminate Against LGBTs.

The illegal screening out of applicants with mental illness has come at a cost of tens of billions of dollars to taxpayers and the U.S. Treasury. The prevalence of mental disorders has remained generally unchanged over the past 15 years while rates of treatment have substantially increased, a combination that should have reduced the percentage of disability awards going to persons diagnosed with mental illness. Sadly, it has not. People with psychiatric impairments constitute the largest and most rapidly growing subgroup of income support program awards (SSDI, SSI). Please see Costing Taxpayers Billions of Dollars Each Year.

Every year since 1999, more Americans have killed themselves than the year before, making suicide the nation’s greatest untamed cause of death. Being unemployed is associated with a 2-3X increase in the relative risk of death by suicide, compared with being employed. Given that more than 90% of persons who attempt suicide have mental illnesses, a tool like personality testing that illegally excludes persons with mental illness from employment consideration leads to an increase both in perceived burdensomeness and thwarted belongingness/social alienation, two critical elements tied to the risk of suicide. Please see Does the Rising Use of Employment Personality Tests Contribute to An Increase in Suicides?

Marketing by Scientific Buzzwords?

New entrants in the assessment field include ConectCubed, Good Co., Evolv, Knack and Prophesy Sciences. They compete with incumbents like Kenexa (IBM), Kronos, SHL, SuccessFactors (SAP), and Taleo (Oracle). Marketing claims include:

Our games are fun, but our technology is rock solid. We design and develop our games using state-of-the-art behavioral science, then we use data-mining tools and massive amounts of data to validate and compute the Knacks you earn. (Knack)

We use a powerful combination of cognitive games, biometric signals, and machine learning algorithms to compile actionable insights about you and your teammates. (Prophesy Sciences)

We have combined decades of academic and business research with sophisticated statistical models to create our Proprietary Psychometric Algorithm. (Good Co.)

Evolv’s patent-pending technology platform unifies and supplements existing data from current systems, then utilizes that dataset to identify fact-based workforce insights that drive measurable ROI. (Evolv)

Kronos helps organizations find value in big data with enhanced analytics. (Kronos)

Public Action?

The EEOC has been attempting to investigate Kroger's use of Kronos assessments for more than six years. As noted previously, that investigation has resulted in a number of court decisions, not on the substance of the claims of alleged discrimination, but on whether Kronos and Kroger must provide the EEOC with relevant information regarding the assessment and its usage.

That investigation has evolved into a systemic investigation by the EEOC. Systemic investigations involve pattern or practice, policy, and/or class cases where the alleged discrimination has a broad impact on an industry, profession, company, or geographic area. As the court stated in the 2012 appellate decision referenced previously, it is “a proper inquiry for the EEOC to seek information about how these tests work, including information about the types of characteristics they screen out….”

In connection with systemic investigations, the EEOC’s enforcement tools include issuing broad information requests and subpoenas to employers that are named as respondents in EEOC charges, particularly when the EEOC suspects systemic discrimination, and filing pattern or practice class lawsuits in federal court. For some employers, the potential class size can be measured in the millions of plaintiffs.

The EEOC's systemic investigation of Kroger and Kronos, as well as its investigation of other employers and their assessment companies, is consistent with the EEOC's implementation of its Strategic Enforcement Plan (SEP) for 2013-2016. The first national priority of the SEP is “eliminating systemic barriers in recruitment and hiring.” The SEP goes on to state that “people with disabilities continue to confront discriminatory policies and practices at the recruitment and hiring stages. These include … the use of screening tools (e.g., pre-employment tests …)."

Disability discrimination, including employment assessment litigation, is a significant element of the EEOC's enforcement activities. As shown in the chart below, ADA claims made up the largest share of the EEOC’s litigation filings in FY 2013: almost half of all cases.

The Importance of This Issue

The long-term fiscal stability of the United States of America depends, in part, on ensuring that Americans with disabilities have meaningful opportunities to contribute to our collective well-being and on eliminating outdated policies that keep people in cycles of poverty and dependency.

More than two decades after the passage of the ADA, the unemployment rate for Americans with disabilities stubbornly remains nearly double that of people without disabilities, while their rate of labor force participation has continued to be abysmally low. Figures from the Bureau of Labor Statistics show that labor force participation for workers with disabilities was 20.3 percent, while the rate for workers without disabilities was 69.1 percent, more than three times as high.

There are many benefits of employment—work enhances skills such as communication, socialization, academics, physical health, and community skills; it factors into how one is perceived by society; it promotes economic well-being (reducing government expenditures on income support programs, Medicare and Medicaid); it leads to greater opportunity for upward mobility; and it contributes to greater self-esteem.