Employment Testing: Failing to Make the Grade: machine learning


Monday, August 25, 2014

Sound and Fury, Signifying Nothing

Incorporating elements of gamification, big data, machine learning, and predictive human analytics, Knack is a veritable buzzword oasis. According to Knack, their games are designed to test cognitive skills that employers might want, drawing on some of the latest scientific research. These range from pattern recognition to emotional intelligence, risk appetite and adaptability to changing situations.

John Funge, Knack's CTO, states that "we have used our games to infer cognitive ability, conscientiousness, leadership potential, creativity as well as predict how people would perform as surgeons, management consultants, and innovators." In an Economist article, Chris Chabris, a Knack executive, states that games have huge advantages over traditional recruitment tools, such as personality tests, which can easily be outwitted by an astute candidate. Many more things can be tested quickly and performance can't be faked on Knack's games, he says.

Gary Halfteck, Knack's founder and CEO, says playing a video game can be a better representation of who you are and your skill sets than an employer might get in a one-on-one conversation. "As people, we make many decisions that are biased, whether it's consciously or subconsciously, and we have no good tools to assess and evaluate, let alone predict, what one's potential is," he says.

If Knack's CEO admits that people make many decisions that are biased, what prevents the people at Knack from being biased in the creation, development and implementation of their games? Further, what prevents employers using Knack from being held liable for the biases of those games? The answer to both questions: Nothing.

Algorithmic Illusion

While many companies foster the illusion that scoring and classification are areas of absolute algorithmic rule (that decisions are neutral, organic, and even automatically rendered without human intervention), reality is a far messier mix of technical and human curation. Both the datasets and the algorithms used to analyze them reflect choices about, among other things, connections, inferences, and interpretation.

The recent White House report, “Big Data: Seizing Opportunities, Preserving Values,” found that “while big data can be used for great social good, it can also be used in ways that perpetuate social harms or render outcomes that have inequitable impacts, even when discrimination is not intended.”

The fact sheet accompanying the White House report warns:

As more decisions about our commercial and personal lives are determined by algorithms and automated processes, we must pay careful attention that big data does not systematically disadvantage certain groups, whether inadvertently or intentionally. We must prevent new modes of discrimination that some uses of big data may enable, particularly with regard to longstanding civil rights protections in housing, employment, and credit.

Some of the most profound challenges revealed by the White House Report concern how data analytics may lead to disparate inequitable treatment, particularly of disadvantaged groups, or create such an opaque decision-making environment that individual autonomy is lost in an impenetrable set of algorithms. Please see Knack Testing Illegal Under ADA?

Systemic Risk

Workforce assessment systems like Knack's games, designed in part to mitigate risks for employers, are becoming sources of material risk, both to job applicants and employers. The systems create the perception of stability through probabilistic reasoning and the experience of accuracy, reliability, and comprehensiveness through automation and presentation. But in so doing, technology systems draw attention away from uncertainty and partiality.

While Knack's approach may help reduce an employer's hiring costs and may reduce the impact of overtly biased or discriminatory behavior, the inclusion of one or more potentially "defective components" in the assessments means that employers face the risk that a finding of bias or discrimination in a Knack assessment used by one employer will put all employers that use the assessment at risk. Please see When the First Domino Falls: Consequences to Employers of Embracing Workforce Assessment Solutions.

These "defective components" in assessments may be either design defects (e.g., the adoption and use of certain personality models) or manufacturing defects (e.g., coding errors in the assessment software). The latter is analogous to the coding error at 23andMe that resulted in notices going out to some customers informing them that they had a chronic and life-shortening condition when they did not. Please see On Not Dying Young: Fatal Illness or Flawed Algorithm?

Each day an employer continues to use the Knack assessment, there are more potential plaintiffs with claims against that employer. Labor and employment laws like Title VII and the ADA permit an employer to use a third party like Knack to assess job applicants. The use of a third party, however, does not insulate an employer from claims arising from the use of the assessment. Under those laws, an employer is responsible (and liable) for any failure on the part of an assessment or assessment provider to comply with the provisions of those laws.

No Silver Bullet

Just as concerns about scoring systems are heightened, their human element is diminishing. Although software engineers initially identify the correlations and inferences programmed into algorithms, machine learning, predictive analytics, and big data promise to eliminate the human “middleman” at some point in the process.

As Hector J. Levesque, a professor at the University of Toronto and a founding member of the American Association for Artificial Intelligence, wrote:

"As a field, I believe that we tend to suffer from what might be called serial silver bulletism, defined as follows:

the tendency to believe in a silver bullet for AI, coupled with the belief that previous beliefs about silver bullets were hopelessly naïve.

We see this in the fads and fashions of AI research over the years: first, automated theorem proving is going to solve it all; then, the methods appear too weak, and we favour expert systems; then the programs are not situated enough, and we move to behaviour-based robotics; then we come to believe that learning from big data is the answer; and on it goes."

Similarly, employment assessment companies like Knack have marketed the benefits of science, precision and data for the past fifteen years under the guise of neural networks, artificial intelligence, big data and deep learning, yet what has changed? Employee engagement levels have hardly budged, and employee turnover remains a continuing and expensive challenge for employers. Please see Gut Check: How Intelligent is Artificial Intelligence?

Monday, July 29, 2013

Workforce Science: A Critical Look at Big Data and the Selection and Management of Employees

This post takes a critical look at the application of "Big Data" to company human resources; specifically, the selection and management of employees.

The post focuses on one company, Evolv, that describes itself as "a workforce science software company that harnesses big data, predictive analytics and cloud computing to help businesses improve workplace productivity and profitability." Evolv was selected not because it is sui generis; rather, it is emblematic of numerous companies, from start-ups to well-established companies that market "workforce science" to employers.

According to Evolv, workforce science:

[I]dentifies the characteristics of the most qualified, productive employees within an hourly workforce throughout the employee lifecycle. By using objective, data-driven methodologies and machine learning, Evolv enables operational and financial executives to make better business decisions that result in millions of dollars in savings each year.

The Evolv graphic below is intended to illustrate the process of workforce science.

The steps listed in this flowchart serve as the titles of the sub-headings below.

New and Existing Data:

Companies capture and store workforce data.

Questions arise concerning the nature of this data, its accuracy and its usefulness. Most companies have vast amounts of HR data (employee demographics, performance ratings, talent mobility data, training completed, age, academic history, etc.), but they are in no position to use it. According to Bersin by Deloitte, an HR research and consultancy organization, only 20% of companies believe that the data they capture now (let alone historically) is highly credible and reliable for decision-making in their own organizations.

The complexity of working with myriad data types and myriad, often incompatible, systems was underscored by Dat Tran of the U.S. Department of Veterans Affairs at the 2013 MIT Chief Data Officer and Information Quality Symposium. "The VA does not have an integrated data environment; we have myriad systems and databases, and enterprise data standards do not exist. There is no 360-degree view of the customer," Tran said in a discussion of the obstacles facing an agency dealing with 11 petabytes of data and 6.3 million patients.

"Bad data" is data that has not been collected accurately or consistently, or that has been defined differently from person to person, group to group and company to company. In the recruitment and hiring context, unproctored online tests allow an applicant to take the test anywhere and anytime. That freedom creates conditions ripe for obtaining "bad" data. As stated by Jim Beaty, PhD, Chief Science Officer at testing company Previsor:

Applicants who want to cheat on the test can employ a number of strategies to beat the test, including logging in for the test multiple times to practice or get the answers, colluding with another person while completing the test, or hiring a test proxy to take the test.

And what about the accuracy of test responses from those who are hired? Analyzing a sample of over 31,000 employees, Evolv found that employees who said they were most likely to follow the rules left the job on average 10% earlier, were 3% less likely to close a sale, and were actually not particularly good at following rules.

Decision-makers increasingly face computer-generated information and analyses that could be collected and analyzed in no other way. Precisely for that reason, going behind that output to check it is effectively out of the question, even when there is good cause for suspicion. In short, the computer analysis becomes a credible reference point even though it rests on poor data.

Supplement

Psychometric tools gather predictive data on the workforce throughout the employment lifecycle

Psychometrics is the field of study concerned with the theory and technique of psychological measurement, which includes the measurement of knowledge, abilities, attitudes, personality traits, and educational measurement. The field is primarily concerned with the construction and validation of measurement instruments such as questionnaires, tests, and personality assessments.

To what extent are psychometric tools accurate predictors of behavior or performance?

According to a 2012 study by Oracle and Development Dimensions International (DDI), a global human resources consulting firm whose expertise includes designing and implementing selection systems, more than 250 staffing directors and over 2,000 new hires from 28 countries provided the following perspectives on their organization’s selection processes (the following are excerpts from the study):

[O]nly 41 percent of staffing directors report that their pre-employment assessments are able to predict better hires.
Only half of staffing directors rate their systems as effective, and even fewer view them as aligned, objective, flexible, efficient, or integrated.
[T]he actual process for making a hiring decision is less effective than a coin toss.

In a 2007 article titled, “Reconsidering the Use of Personality Tests in Employment Contexts”, co-authored by six current or former editors of academic psychological journals, Dr. Kevin Murphy, Professor of Psychology at Pennsylvania State University and Editor of the Journal of Applied Psychology (1996-2002), states:

The problem with personality tests is … that the validity of personality measures as predictors of job performance is often disappointingly low. A couple of years ago, I heard a SIOP talk by Murray Barrick … He said, “If you took all the … [factors], measured well, you corrected for everything using the most optimistic corrections you could possibly get, you could account for about 15% of the variance in performance [between projected and actual performance].” … You are saying that if you take normal personality tests, putting everything together in an optimal fashion and being as optimistic as possible, you’ll leave 85% of the variance unaccounted for. The argument for using personality tests to predict performance does not strike me as convincing in the first place.
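Barrick's figure can be restated as a correlation. Assuming that "variance accounted for" means the coefficient of determination R² of a single, optimally weighted composite of the personality factors (an interpretation, not something the quote spells out), the implied correlation between predicted and actual performance, and the share of variance left unexplained, are roughly:

```latex
R^2 \approx 0.15
\quad\Longrightarrow\quad
r = \sqrt{R^2} \approx \sqrt{0.15} \approx 0.39,
\qquad
1 - R^2 \approx 0.85
```

In other words, even under the most optimistic corrections, a validity coefficient of about 0.39 leaves roughly 85% of the differences in performance unexplained.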

Cleanse and Upload

Structured and Unstructured Data Is Aggregated

Data isn't something that's abstract and value-neutral. Data only exists when it's collected, and collecting data is a human activity. And in turn, the act of collecting and analyzing data changes (one could even say "interprets") us.

Workforce science requires enormous amounts of historic or legacy data. This data has to be consolidated from a number of disparate source systems within each company, each with its own data environment and particular brand of business logic. That data consolidation then must be replicated across hundreds or thousands of companies.

Structured data refers to data that is identifiable because it is organized in a structure. The most common form of structured data is a database, in which specific information is stored in rows and columns (a spreadsheet such as Excel is a familiar example of the same layout). Structured data is understood by computers and is also efficiently organized for human readers.

Unstructured data refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. This results in irregularities and ambiguities that make it difficult to understand using traditional computer programs as compared to structured data.

Unstructured data consists of two basic categories: textual objects (based on written or printed language, such as emails or Word documents) and bitmap objects (non-language based, such as image, video or audio files).

Processing unstructured data requires putting many techniques together in a complex data processing flow. These techniques include:

information extraction (to produce structured records from text or semi-structured data)
cleansing and normalization (to be able to even compare string values of the same type, such as a dollar amount or a job title)
entity resolution (to link records that correspond to the same real-world entity or that are related via some other type of semantic relationship)
mapping (to bring the extracted and linked records to a uniform schematic representation)
data fusion (to merge all the related facts into one integrated, clean object)
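The steps listed above can be sketched in a few lines of Python. The records, field names, abbreviation table and merge rules below are all hypothetical, invented for illustration rather than taken from Evolv or any real HR system; the point of the sketch is that every normalization and matching rule is a human choice that shapes the resulting data set.

```python
import re
from dataclasses import dataclass

# Hypothetical raw records from two source systems (illustrative only).
RAW_RECORDS = [
    {"source": "hris", "name": "Smith, Jane", "title": "Sr. Sales Assoc.", "pay": "$42,000"},
    {"source": "ats", "name": "Jane Smith", "title": "senior sales associate", "pay": "42000 USD"},
    {"source": "ats", "name": "Raj Patel", "title": "Cashier", "pay": "$31,500.00"},
]

# Hypothetical title-abbreviation table; a real one would be far larger.
TITLE_ABBREVIATIONS = {"sr.": "senior", "assoc.": "associate"}


@dataclass(frozen=True)
class Employee:
    name: str
    title: str
    annual_pay: float


def normalize_name(raw: str) -> str:
    """Cleansing: turn 'Last, First' or 'First Last' into lowercase 'first last'."""
    if "," in raw:
        last, first = [part.strip() for part in raw.split(",", 1)]
        raw = f"{first} {last}"
    return " ".join(raw.lower().split())


def normalize_title(raw: str) -> str:
    """Normalization: lowercase the title and expand known abbreviations."""
    return " ".join(TITLE_ABBREVIATIONS.get(w, w) for w in raw.lower().split())


def normalize_pay(raw: str) -> float:
    """Normalization: strip currency symbols, codes and thousands separators."""
    return float(re.sub(r"[^0-9.]", "", raw))


def resolve_and_fuse(records):
    """Entity resolution by normalized name (a deliberately naive rule), then
    fusion of each group into one Employee, flagging conflicting values."""
    groups = {}
    for rec in records:
        groups.setdefault(normalize_name(rec["name"]), []).append(rec)
    fused, conflicts = [], []
    for key, recs in groups.items():
        titles = {normalize_title(r["title"]) for r in recs}
        pays = {normalize_pay(r["pay"]) for r in recs}
        if len(titles) > 1 or len(pays) > 1:
            conflicts.append(key)
        # Fusion choice: alphabetically first title, highest pay figure.
        fused.append(Employee(key, sorted(titles)[0], max(pays)))
    return fused, conflicts


if __name__ == "__main__":
    employees, conflicts = resolve_and_fuse(RAW_RECORDS)
    for employee in employees:
        print(employee)
    print("conflicting records:", conflicts)
```

Both the entity-resolution rule (match on normalized name) and the fusion rule (keep the highest pay) are judgment calls: two different employees named Jane Smith would be silently merged, and a different rule could produce a different "clean" record.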

Assumptions are embedded in a data model when it is created. Data sources are shaped through ‘washing’, integration, and algorithmic calculation until they are commensurate enough to be combined into a single data set. By the time the data are ready to be used, they are already ‘at several degrees of remove from the world.’

Data is never raw; it’s always structured according to somebody’s predispositions and values. The end result looks disinterested, but, in reality, there are value choices all the way through, from construction to interpretation.

Analyze and Predict

Data Analyzed Using Machine Learning and Predictive Algorithms

The theory of big data is to have no theory, at least about human nature. One just gathers huge amounts of information, observes the patterns and estimates probabilities about how people will act in the future. One does not address causality.

In linear systems, cause and effect are much easier to pinpoint. However, the world around us is a complex system in which multiple variables often push an outcome to occur. Nigel Goldenfeld, a professor of physics at the University of Illinois, sums it up best: “For every event that occurs, there are a multitude of possible causes, and the extent to which each contributes to the event is not clear.”

Algorithms and big data are powerful tools. Wisely used, they can help match the right people with the right jobs. But they must be designed and used by humans, or they can go very wrong. As David Brooks wrote in the New York Times:

Data creates bigger haystacks. This is a point Nassim Taleb, the author of “Antifragile,” has made. As we acquire more data, we have the ability to find many, many more statistically significant correlations. Most of these correlations are spurious and deceive us when we’re trying to understand a situation. Falsity grows exponentially the more data we collect.

There’s a saying in artificial intelligence circles that techniques like machine learning can very quickly get you 80% of the way to solving just about any (real-world) problem, but going beyond 80% is extremely hard, maybe even impossible. The Netflix Prize is a case in point: hundreds of the best researchers in the world worked on the problem for nearly three years, and the winning team achieved a 10% improvement over Netflix’s in-house algorithm.

A corollary of this saying is that startup companies rarely have a competitive advantage because of their machine learning algorithms. If a worldwide, concerted effort could improve Netflix’s algorithm by only 10%, how likely is a four-person R&D department at a startup to achieve a significant breakthrough? Modern machine learning algorithms are the product of thousands of academics and billions of dollars of R&D, and individual companies generally improve on them only at the margins.

Some of the best and brightest organizations have recognized that improvement, if any, in machine learning comes from outside the organization. Facebook, Ford, GE and other companies have run contests for data-science challenges on Kaggle, while NASA and other government agencies, as well as the Harvard Business School, have taken the crowdsource route on Topcoder.

Abhishek Shivkumar of IBM Watson Labs has listed the top ten problems for machine learning in 2013. These problems include churn prediction, truth and veracity, scalability, and intelligent learning. This doesn’t mean machine learning is never useful; it just means one needs to apply it in contexts that are fault tolerant, for example online ad targeting, search result ranking, recommendations, and spam filtering. Applying machine learning in the context of people’s livelihoods (and, potentially, lives) is problematic, not just for the individual applicant or employee but also for the employer and its potential liability exposure.

Big Data Networking

Results Are Benchmarked Against Big Data Network

According to Evolv, its network

extracts learnings and insight from the millions of real time talent data points – from across the Evolv client base, or Network – streaming in and out of the Evolv platform every single day, week and month.

Some of the real-time data points streaming in and out of the Evolv Network come from pre-employment personality tests administered to applicants and employees. As set out in the posts "What Are The Issues?", "ADA, FFM and DSM" and "Employment Assessment Are Designed to Reveal An Impairment", tests utilizing the Five-Factor Model of personality may be considered illegal medical examinations under the Americans with Disabilities Act (ADA).

Consequently, information obtained from those tests is confidential medical information, the use of which is subject to strict limits. Regulations require that confidential medical information be kept on separate forms and in separate files, and that it not be intermingled with other information, for example by sharing it with third parties on the Evolv network.

Not only are employers subject to claims by applicants and employees alleging breach of the confidentiality provisions of the ADA, but the use of confidential medical information in assessment, hiring and other human resource functions may also have created a virus that has "infected" a variety of databases, applications and software solutions utilized by the employer.

Costs to employers from the illegal use of confidential medical information include damages payable to applicants and employees, defense transaction costs (such as legal fees), and costs to "sanitize" infected databases, applications and software solutions. As set out in the "Damages and Indemnification Challenges for Employers" post, even if companies like Evolv are willing to provide indemnification to all customers, those customers will have to determine whether the company and its insurers have adequate resources to indemnify all customers.

Evaluate

Analyzed Data Reveals Insights That Drive Workforce Performance and Retention

Act

The Impact of Insights Are Quantified and Used to Inform Decision-making

As noted in a prior post, prejudice does not rise from malice or hostile animus alone. It may result as well from insensitivity caused by simple want of careful, rational reflection.

For example, take two insights from Evolv:

Living in close proximity to the job site and having access to reliable transportation are correlated with reduced attrition and better performance; and
Referred employees have 10% longer tenure than non-referred employees and demonstrate approximately equal performance.

An employer confronted with these two insights might well determine that (i) applicants living beyond a certain distance from the job site (e.g., a retail store) should be excluded from employment consideration and (ii) preference in hiring should be extended to applicants referred by existing employees.

Painting with the broad brush of distance from job site will result in well-qualified applicants being excluded, applicants who might have ended up being among the longest tenured of employees. Remember that the Evolv insight is a generalized correlation (i.e., persons living closer to the job site tend to have longer tenure than persons living farther from the job site). The insight says nothing about any particular applicant.
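A short simulation makes the gap between a group-level correlation and an individual prediction concrete. The tenure figures below are invented for illustration (near-site workers averaging 24 months, far-site workers 20 months, both with a standard deviation of 10 months); they are not Evolv data. Under these assumed numbers, roughly a third of far-site workers still outlast the median near-site worker, and a distance cutoff would have excluded all of them.

```python
import random
import statistics


def far_vs_near(n=5000, seed=1):
    """Hypothetical tenures in months: near-site mean 24, far-site mean 20,
    both with standard deviation 10, clipped at zero. Returns the two group
    means and the share of far-site workers who outlast the median near-site
    worker."""
    rng = random.Random(seed)
    near = [max(0.0, rng.gauss(24, 10)) for _ in range(n)]
    far = [max(0.0, rng.gauss(20, 10)) for _ in range(n)]
    near_median = statistics.median(near)
    share = sum(t > near_median for t in far) / n
    return statistics.mean(near), statistics.mean(far), share


if __name__ == "__main__":
    near_mean, far_mean, share = far_vs_near()
    print(f"near mean {near_mean:.1f}, far mean {far_mean:.1f}, "
          f"far workers beating near median: {share:.0%}")
```

The correlation is real in this toy data, yet it says very little about which individual far-site applicant will stay longest.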

As a consequence, employers will pass over qualified applicants solely because they live (or don't live) in certain areas. Such an employer not only does a disservice to itself and the applicant but also increases its risk of employment litigation, with the attendant costs. How?

A recent New York Times article, "In Climbing Income Ladder, Location Matters," reads, in part:

Her nearly four-hour round-trip [job commute] stems largely from the economic geography of Atlanta, which is one of America’s most affluent metropolitan areas yet also one of the most physically divided by income. The low-income neighborhoods here often stretch for miles, with rows of houses and low-slung apartments, interrupted by the occasional strip mall, and lacking much in the way of good-paying jobs.

The dearth of good-paying jobs in low-income neighborhoods means that residents of those neighborhoods have longer commutes. The 2010 Census showed that poverty rates are much higher for Black and Hispanic Americans, so those groups are more likely to live in such low-income neighborhoods. Consequently, hiring decisions predicated on distance, intentionally or not, can discriminate against applicants of certain races.
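One common way to check whether a screen like a distance cutoff falls more heavily on one group is the "four-fifths" rule of thumb in the federal Uniform Guidelines on Employee Selection Procedures (29 CFR 1607.4(D)): a group whose selection rate is less than 80% of the highest group's rate is generally regarded as showing evidence of adverse impact. The sketch below applies that ratio to invented applicant counts; the group labels and numbers are hypothetical, and the rule is a screening heuristic, not a complete legal test.

```python
from fractions import Fraction


def impact_ratios(applicants, selected):
    """Each group's selection rate divided by the highest group's rate."""
    rates = {g: Fraction(selected[g], applicants[g]) for g in applicants}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}


def flag_adverse_impact(applicants, selected, threshold=Fraction(4, 5)):
    """Groups whose impact ratio falls below the four-fifths rule of thumb."""
    ratios = impact_ratios(applicants, selected)
    return sorted(g for g, r in ratios.items() if r < threshold)


if __name__ == "__main__":
    # Hypothetical counts of applicants who pass a distance screen, by group.
    applicants = {"group_a": 200, "group_b": 200}
    passed_screen = {"group_a": 60, "group_b": 36}
    for group, ratio in impact_ratios(applicants, passed_screen).items():
        print(group, float(ratio))  # group_a 1.0, group_b 0.6
    print(flag_adverse_impact(applicants, passed_screen))  # ['group_b']
```

Exact fractions are used so that a ratio sitting right at 0.8 is not misclassified by floating-point rounding.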

Similarly, an employer extending a hiring preference to referrals of existing employees may further exacerbate the discriminatory impact of its hiring process. Those referrals tend to come from the same neighborhoods and socioeconomic backgrounds as existing employees, meaning that workforce diversity, broadly considered, will decline.

With the huge amounts of "bad" data that are generated and stored daily, failing to understand how to use that data in a practical way that benefits the business will increasingly lead to shaky insights and faulty decision-making, with significant costs to applicants, employees, employers and society.

Optimize

Closed-Loop Optimization Constantly Analyzes and Refines Insights

According to Evolv, "closed-loop optimization is the process of using Big Data analytics to determine the outcomes of the assessments and other data collected, and then using the knowledge gained to make ever more effective assessments."

The challenge in using a closed-loop optimization process for hiring and employment decisions is that those decisions do not fit within a closed loop. Take, for example, the Evolv insight that living in close proximity to the job site is correlated with reduced attrition and better performance. Over time, the closed-loop optimization process for that insight means that a growing percentage of the workforce lives in close proximity to the job site. Excellent: less attrition and better performance across job sites.

That closed loop, however, does not account for factors like the element of time and the relative immobility of persons and companies. Businesses tend to be clustered; they are not evenly spread throughout the geography. If all businesses in a particular area focus on hiring applicants in close proximity, costs will increase (greater demand for the same number of applicants), employee turnover will increase (since the number of geographically proximate employees changes slowly) and profitability will decrease (higher wage costs combined with greater turnover).
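A deliberately simplified toy loop shows how the narrowing happens and why the loop cannot see what lies outside it. This is not Evolv's actual process, which the post does not describe in detail; the groups, starting estimates and "true" retention rates are hypothetical. Each round the model hires only from the group it currently rates higher and, because outcomes are observed only for people actually hired, updates only that group's estimate.

```python
def run_closed_loop(rounds=10, hires_per_round=100):
    """Toy greedy closed loop (hypothetical numbers throughout)."""
    # Starting 12-month retention estimates from a small early sample in which
    # near-site hires happened to look slightly better.
    estimate = {"near": 0.62, "far": 0.60}
    # Assumed true retention: identical for both groups.
    true_rate = {"near": 0.70, "far": 0.70}
    observed = {"near": 50, "far": 50}  # size of the early sample per group
    hires = {"near": 0, "far": 0}
    for _ in range(rounds):
        chosen = max(estimate, key=estimate.get)
        hires[chosen] += hires_per_round
        # Use the expected number retained instead of random draws so the
        # result is deterministic; only the hired group yields new outcomes.
        retained = true_rate[chosen] * hires_per_round
        n_old = observed[chosen]
        estimate[chosen] = (estimate[chosen] * n_old + retained) / (n_old + hires_per_round)
        observed[chosen] = n_old + hires_per_round
    return estimate, hires


if __name__ == "__main__":
    estimate, hires = run_closed_loop()
    print(estimate)  # near-site estimate climbs toward 0.70; far-site stays at 0.60
    print(hires)     # every hire comes from the near-site group
```

Every hire ends up coming from the near-site group, and the far-site estimate never moves off its early, slightly unlucky value, even though far-site hires would have done just as well in this toy world.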

When two variables, A and B, are found to be correlated, there are several possibilities:

A causes B
B causes A
A causes B at the same time as B causes A (a self-reinforcing system)
Some third factor causes both A and B
The correlation is simple coincidence

It is wrong to assume any one of these possibilities without further evidence. Evolv, however, assumes that A (proximity to job site) causes B (reduced attrition and better performance) and concludes that employers should therefore hire applicants who live closer to the job site.

The correlation could equally be read as showing that B (reduced attrition and better performance) is caused by C (the proximity of the job site to applicants' homes). Instead of being a hiring insight, the correlation might function better as a job site location insight. Given the relative immobility of persons and companies, locating a job site (a call center, etc.) close to communities with high numbers of lower-income persons could lead to a more sustainable competitive advantage.

As David Brooks wrote, "Data struggles with context. Human decisions are not discrete events. They are embedded in sequences and contexts. ... Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel."

Executives and managers frequently hear about some new software billed as the “next big thing.” They call the software provider and say, “We heard you have a great tool and we’d like a demonstration.” The software is certainly seductive with its bells and whistles, but its effectiveness and usefulness depend upon the validity of the information going in and how the people actually work with it over time. Having a tool is great, but remember that a fool with a tool is still a fool (and sometimes a dangerous fool).