Monday, July 29, 2013

Workforce Science: A Critical Look at Big Data and the Selection and Management of Employees

This post takes a critical look at the application of "Big Data" to company human resources; specifically, the selection and management of employees.

The post focuses on one company, Evolv, which describes itself as "a workforce science software company that harnesses big data, predictive analytics and cloud computing to help businesses improve workplace productivity and profitability."  Evolv was selected not because it is sui generis; rather, it is emblematic of numerous companies, from start-ups to well-established firms, that market "workforce science" to employers.

According to Evolv, workforce science:
[I]dentifies the characteristics of the most qualified, productive employees within an hourly workforce throughout the employee lifecycle. By using objective, data-driven methodologies and machine learning, Evolv enables operational and financial executives to make better business decisions that result in millions of dollars in savings each year. 
The Evolv graphic below is intended to illustrate the process of workforce science.

The steps listed in this flowchart serve as the titles of the sub-headings below.

New and Existing Data:
Companies capture and store workforce data.

Questions arise concerning the nature of this data, its accuracy and its usefulness. Most companies have vast amounts of HR data (employee demographics, performance ratings, talent mobility data, training completed, age, academic history, etc.), but they are in no position to use it. According to Bersin by Deloitte, an HR research and consultancy organization, only 20% of companies believe that the data they capture now (let alone historically) is highly credible and reliable for decision-making in their own organizations.

The complexity of working with myriad data types and myriad, often incompatible, systems was underscored by Dat Tran of the U.S. Department of Veterans Affairs at the 2013 MIT Chief Data Officer and Information Quality Symposium. "The VA does not have an integrated data environment; we have myriad systems and databases, and enterprise data standards do not exist. There is no 360-degree view of the customer," Tran said in a discussion of the obstacles facing an agency dealing with 11 petabytes of data and 6.3 million patients.  

"Bad data" is data that has not been collected accurately or consistently, or that has been defined differently from person to person, group to group and company to company. In the recruitment and hiring context, unproctored online tests allow an applicant to take the test anywhere and at any time. That freedom creates conditions ripe for obtaining "bad" data. As stated by Jim Beaty, PhD, Chief Science Officer at testing company Previsor:
Applicants who want to cheat on the test can employ a number of strategies to beat the test, including logging in for the test multiple times to practice or get the answers, colluding with another person while completing the test, or hiring a test proxy to take the test.
And what about the accuracy of test responses from those who are hired? Analyzing a sample of over 31,000 employees, Evolv found that employees who said they were most likely to follow the rules left the job on average 10% earlier, were 3% less likely to close a sale, and were actually not particularly good at following rules.

Decision-makers increasingly face computer-generated information and analyses that could be collected and analyzed in no other way. Precisely for that reason, going behind that output to verify it is often out of the question, even when one has good cause to be suspicious. In short, the computer analysis becomes a credible reference point even when it is based on poor data.

Psychometric tools gather predictive data on the workforce throughout the employment lifecycle

Psychometrics is the field of study concerned with the theory and technique of psychological measurement, which includes the measurement of knowledge, abilities, attitudes, personality traits, and educational measurement. The field is primarily concerned with the construction and validation of measurement instruments such as questionnaires, tests, and personality assessments.

To what extent are psychometric tools accurate predictors of behavior or performance?

In a 2012 study by Oracle and Development Dimensions International (DDI), a global human resources consulting firm whose expertise includes designing and implementing selection systems, more than 250 staffing directors and over 2,000 new hires from 28 countries provided the following perspectives on their organizations' selection processes (excerpts from the study):
  • [O]nly 41 percent of staffing directors report that their pre-employment assessments are able to predict better hires.
  • Only half of staffing directors rate their systems as effective, and even fewer view them as aligned, objective, flexible, efficient, or integrated. 
  • [T]he actual process for making a hiring decision is less effective than a coin toss.

In a 2007 article titled, “Reconsidering the Use of Personality Tests in Employment Contexts”, co-authored by six current or former editors of academic psychological journals, Dr. Kevin Murphy, Professor of Psychology at Pennsylvania State University and Editor of the Journal of Applied Psychology (1996-2002), states:

The problem with personality tests is … that the validity of personality measures as predictors of job performance is often disappointingly low. A couple of years ago, I heard a SIOP talk by Murray Barrick … He said, “If you took all the … [factors], measured well, you corrected for everything using the most optimistic corrections you could possibly get, you could account for about 15% of the variance in performance [between projected and actual performance].” … You are saying that if you take normal personality tests, putting everything together in an optimal fashion and being as optimistic as possible, you’ll leave 85% of the variance unaccounted for. The argument for using personality tests to predict performance does not strike me as convincing in the first place.
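The 15% figure in the quote is a statement about variance explained (R-squared). A quick back-of-the-envelope check of the arithmetic behind it:

```python
import math

# Sketch of the arithmetic behind the quote: if an optimally combined
# personality measure explains 15% of the variance in job performance,
# the implied multiple correlation R is modest, and 85% of the variance
# remains unaccounted for.
variance_explained = 0.15
R = math.sqrt(variance_explained)

print(round(R, 2))             # 0.39
print(1 - variance_explained)  # 0.85
```

In other words, even under the "most optimistic corrections," the implied correlation between the combined personality factors and performance is only about 0.39.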

Cleanse and Upload
Structured and Unstructured Data Is Aggregated

Data isn't something that's abstract and value-neutral. Data only exists when it's collected, and collecting data is a human activity. And in turn, the act of collecting and analyzing data changes (one could even say "interprets") us. 

Workforce science requires enormous amounts of historic or legacy data. This data has to be consolidated from a number of disparate source systems within each company, each with its own data environment and particular brand of business logic. That data consolidation must then be replicated across hundreds or thousands of companies.

Structured data refers to data that is identifiable because it is organized in a structure. The most common form of structured data is a database where specific information is stored in columns and rows (e.g., a spreadsheet such as Excel). Structured data can be readily processed by computers and is also efficiently organized for human readers.

Unstructured data refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. This results in irregularities and ambiguities that make it difficult to understand using traditional computer programs as compared to structured data.

Unstructured data consists of two basic categories: textual objects (based on written or printed language, such as emails or Word documents) and bitmap objects (non-language-based, such as image, video or audio files).
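To make the distinction concrete, here is a minimal, hypothetical sketch: the same facts held as a structured record and as unstructured text, with a naive extraction step turning the latter into the former. The field names and the regular expression are invented for illustration only.

```python
import re

# A structured record: fields with fixed meaning, ready for querying.
structured_row = {"name": "J. Smith", "title": "Sales Associate", "wage_usd": 12.5}

# Unstructured text: the same facts, but a program must first extract them.
unstructured_note = "J. Smith was hired as a Sales Associate at $12.50/hr."

# A naive information-extraction step recovers a structured record.
match = re.search(
    r"(?P<name>[A-Z]\. \w+) was hired as an? (?P<title>[\w ]+) at \$(?P<wage>\d+\.\d+)/hr",
    unstructured_note,
)
record = {
    "name": match.group("name"),
    "title": match.group("title"),
    "wage_usd": float(match.group("wage")),
}
print(record)  # {'name': 'J. Smith', 'title': 'Sales Associate', 'wage_usd': 12.5}
```

Real-world text is, of course, far messier than this one sentence, which is precisely why unstructured data produces the "irregularities and ambiguities" described above.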

Making use of unstructured data requires many techniques, combined in a complex data processing flow. These techniques include:
  • information extraction (to produce structured records from text or semi-structured data)
  • cleansing and normalization (to be able to even compare string values of the same type, such as a dollar amount or a job title)
  • entity resolution (to link records that correspond to the same real-world entity or that are related via some other type of semantic relationship)
  • mapping (to bring the extracted and linked records to a uniform schematic representation)
  • data fusion (to merge all the related facts into one integrated, clean object)
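A toy sketch of how the cleansing, mapping, entity-resolution and fusion steps above might fit together once extraction has produced structured records. The records, field names and matching rules are all invented for illustration; a real pipeline would be far more elaborate.

```python
# Two source systems describe the same person with different schemas and
# string conventions (all values here are illustrative assumptions).
source_a = {"employee": "Jane Doe", "job": "Sr. Cashier", "pay": "21,000 USD"}
source_b = {"name": "DOE, JANE", "title": "senior cashier", "salary": "$21000"}

def to_uniform(rec, key_map):
    """Mapping: bring a source's fields to one uniform schema."""
    return {key_map.get(k, k): v for k, v in rec.items()}

def normalize_name(n):
    """Cleansing/normalization: make name strings comparable."""
    if "," in n:
        last, first = (p.strip() for p in n.split(",", 1))
        n = f"{first} {last}"
    return n.title()

def normalize_pay(p):
    """Cleansing/normalization: '21,000 USD' and '$21000' -> 21000."""
    return int("".join(ch for ch in p if ch.isdigit()))

a = to_uniform(source_a, {"employee": "name", "job": "title", "pay": "salary"})
b = to_uniform(source_b, {})
for rec in (a, b):
    rec["name"] = normalize_name(rec["name"])
    rec["salary"] = normalize_pay(rec["salary"])

# Entity resolution: link records that refer to the same real-world person.
same_entity = a["name"] == b["name"] and a["salary"] == b["salary"]

# Data fusion: merge the linked facts into one integrated, clean object.
fused = {**b, **a} if same_entity else None
print(fused)  # {'name': 'Jane Doe', 'title': 'Sr. Cashier', 'salary': 21000}
```

Even in this toy version, every step embeds judgment calls: which source "wins" during fusion, what counts as the same name, which fields must agree before two records are linked.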
Assumptions are embedded in a data model upon its creation. Data sources are shaped through ‘washing’, integration, and algorithmic calculations in order to be commensurate to an acceptable level that allows a data set to be created. By the time the data are ready to be used, they are already ‘at several degrees of remove from the world.’

Data is never raw; it’s always structured according to somebody’s predispositions and values. The end result looks disinterested, but, in reality, there are value choices all the way through, from construction to interpretation.

Analyze and Predict
Data Analyzed Using Machine Learning and Predictive Algorithms

The theory of big data is to have no theory, at least about human nature. One just gathers huge amounts of information, observes the patterns and estimates probabilities about how people will act in the future. One does not address causality.

In linear systems, cause and effect are much easier to pinpoint. The world around us, however, is a complex system in which multiple variables often push an outcome to occur. Nigel Goldenfeld, a professor of physics at the University of Illinois, sums it up best: “For every event that occurs, there are a multitude of possible causes, and the extent to which each contributes to the event is not clear.”

Algorithms and big data are powerful tools. Wisely used, they can help match the right people with the right jobs. But they must be designed and used by humans, or they can go very wrong. As David Brooks wrote in the New York Times:
Data creates bigger haystacks. This is a point Nassim Taleb, the author of “Antifragile,” has made. As we acquire more data, we have the ability to find many, many more statistically significant correlations. Most of these correlations are spurious and deceive us when we’re trying to understand a situation. Falsity grows exponentially the more data we collect. 
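Brooks's "bigger haystacks" point can be demonstrated with a small simulation (the sample size, number of predictors and significance cutoff are illustrative assumptions): correlating a purely random outcome against hundreds of purely random predictors still yields dozens of nominally "significant" correlations.

```python
import random

random.seed(0)

# A purely random "outcome" and 500 purely random "predictors":
# none of them has any real relationship to the outcome.
n_samples, n_predictors = 100, 500
outcome = [random.gauss(0, 1) for _ in range(n_samples)]

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

spurious = 0
for _ in range(n_predictors):
    noise = [random.gauss(0, 1) for _ in range(n_samples)]
    if abs(corr(noise, outcome)) > 0.2:  # roughly a p < 0.05 cutoff at n = 100
        spurious += 1

print(spurious)  # on the order of 5% of 500: dozens of spurious "findings"
```

Every one of those "findings" is noise, yet each would pass a naive significance screen; the more predictors collected, the more such haystacks accumulate.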
There’s a saying in artificial intelligence circles that techniques like machine learning can very quickly get you 80% of the way to solving just about any (real world) problem, but going beyond 80% is extremely hard, maybe even impossible. The Netflix Challenge is a case in point: hundreds of the best researchers in the world worked on the problem for 2 years and the winning team got a 10% improvement over Netflix’s in-house algorithm.

A corollary of the above saying is that it is very rare for startup companies to have a competitive advantage because of their machine learning algorithms.  If a worldwide concerted effort could only improve Netflix’s algorithm by 10%, how likely is it that four people in a startup’s R&D department will achieve a significant breakthrough?  Modern machine learning algorithms are the product of thousands of academics and billions of dollars of R&D, and are generally improved upon only at the margins by individual companies.

Some of the best and brightest organizations have recognized that improvement, if any, in machine learning comes from outside the organization. Facebook, Ford, GE and other companies have run contests for data-science challenges on Kaggle, while NASA and other government agencies, as well as the Harvard Business School, have taken the crowdsourcing route on Topcoder.

Abhishek Shivkumar of IBM Watson Labs has listed the top ten problems for machine learning in 2013. These problems include churn prediction, truth and veracity, scalability and intelligent learning. This doesn’t mean machine learning isn’t ever useful; it just means one needs to apply it in contexts that are fault tolerant: for example, online ad targeting, ranking search results, recommendations, and spam filtering.  Applying machine learning in the context of people’s livelihoods (and, potentially, lives) is problematic, not just for the individual applicant or employee but also for the employer and its potential liability exposure.

Big Data Networking
Results Are Benchmarked Against Big Data Network

According to Evolv, its network
extracts learnings and insight from the millions of real time talent data points – from across the Evolv client base, or Network – streaming in and out of the Evolv platform every single day, week and month.  
Some of the real time data points streaming in and out of the Evolv Network are data from the use of pre-employment personality tests administered to applicants and employees. As set out in the posts What Are The Issues?, ADA, FFM and DSM and Employment Assessment Are Designed to Reveal An Impairment, tests utilizing the Five-Factor Model of personality may be considered illegal medical examinations under the Americans with Disabilities Act (ADA).

Consequently, information obtained from those tests is confidential medical information, the use of which is subject to strict limits. Regulations require that confidential information be kept on separate forms and in separate files and it may not be intermingled with other information - i.e., shared with third parties on the Evolv network.

Not only are employers subject to claims by applicants and employees alleging breach of the confidentiality provisions of the ADA, the use of confidential medical information in assessment, hiring and other human resource functions may have created a virus that has "infected" a variety of databases, applications and software solutions utilized by the employer.

Costs to employers from the illegal use of confidential medical information include damages payable to applicants and employees, defense transaction costs (i.e., legal fees), and costs to "sanitize" infected databases, applications and software solutions. As set out in the Damages and Indemnification Challenges for Employers post, even if companies like Evolv are willing to provide indemnification to all customers, those customers will have to determine whether the company and its insurers have adequate resources to indemnify all customers.

Analyzed Data Reveals Insights That Drive Workforce Performance and Retention


The Impact of Insights Are Quantified and Used to Inform Decision-making

As noted in a prior post, prejudice does not rise from malice or hostile animus alone. It may result as well from insensitivity caused by simple want of careful, rational reflection.

For example, take two insights from Evolv:

  1. Living in close proximity to the job site and having access to reliable transportation are correlated with reduced attrition and better performance; and
  2. Referred employees have 10% longer tenure than non-referred employees and demonstrate approximately equal performance.
An employer confronted with these two insights might well determine that (i) applicants living beyond a certain distance from the job site (e.g., a retail store) should be excluded from employment consideration and (ii) preference in hiring should be extended to applicants referred by existing employees.

Painting with the broad brush of distance from job site will result in well-qualified applicants being excluded, applicants who might have ended up being among the longest tenured of employees. Remember that the Evolv insight is a generalized correlation (i.e., persons living closer to the job site tend to have longer tenure than persons living farther from the job site). The insight says nothing about any particular applicant.

As a consequence, employers will pass over qualified applicants solely because they live (or don't live) in certain areas. Not only does the employer do a disservice to itself and the applicant, it also increases the risk of employment litigation, with its consequent costs. How?

A recent New York Times article, "In Climbing Income Ladder, Location Matters," reads, in part:

Her nearly four-hour round-trip [job commute] stems largely from the economic geography of Atlanta, which is one of America’s most affluent metropolitan areas yet also one of the most physically divided by income. The low-income neighborhoods here often stretch for miles, with rows of houses and low-slung apartments, interrupted by the occasional strip mall, and lacking much in the way of good-paying jobs.
The dearth of good-paying jobs in low-income neighborhoods means that residents of those neighborhoods have a longer commute. The 2010 Census showed that poverty rates are much higher for blacks and Hispanics. Consequently, hiring decisions predicated on distance, intentionally or not, discriminate against certain races.

Similarly, an employer extending a hiring preference to referrals of existing employees may be further exacerbating the discriminatory impact of its hiring process. Those referrals tend to be persons from the same neighborhoods and socioeconomic backgrounds as existing employees, meaning that workforce diversity, broadly considered, will decline.

With the huge amounts of "bad" data that are generated and stored daily, the failure to understand how to leverage data in a practical way that has business benefit will increasingly lead to shaky insights and faulty decision-making, with significant costs to applicants, employees, employers and society.

Closed-Loop Optimization Constantly Analyzes and Refines Insights

According to Evolv, "closed-loop optimization is the process of using Big Data analytics to determine the outcomes of the assessments and other data collected, and then using the knowledge gained to make ever more effective assessments." Click on this link for an Evolv video that describes the closed-loop optimization process.

The challenge in using a closed-loop optimization process for hiring and employment decisions is that those decisions do not fit within a closed loop. Take, for example, the Evolv insight that living in close proximity to the job site is correlated with reduced attrition and better performance. Over time, the closed-loop optimization process for that insight means that a growing percentage of the workforce lives in close proximity to the job site. Excellent. Less attrition and better performance across the job site.

That closed loop, however, does not account for factors like the element of time and the relative immobility of persons and companies. Businesses tend to be clustered; they are not spread evenly throughout the geography. If all businesses in a particular area focus on hiring applicants in close proximity, costs will increase (greater demand for the same pool of applicants), employee turnover will increase (since the supply of geographically proximate workers changes slowly) and profitability will decrease (higher wage costs combined with greater turnover).

When two variables, A and B, are found to be correlated, there are several possibilities:
  • A causes B
  • B causes A
  • A causes B at the same time as B causes A (a self-reinforcing system)
  • Some third factor causes both A and B
  • The correlation is simple coincidence

It is wrong to assume any one of these possibilities without further evidence. Evolv, however, assumes that A (proximity to job site) causes B (reduced attrition and better performance), and therefore that employers should hire applicants who live closer to the job site.

The correlation could also demonstrate that B (reduced attrition and better performance) is caused by C (proximity of the job site to applicants' homes). Instead of a hiring insight, the correlation might function better as a job-site location insight. Given the relative immobility of persons and companies, locating a job site (call center, etc.) close to communities with high numbers of lower-income persons could lead to a more sustainable competitive advantage.
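The "third factor" case is easy to demonstrate with a toy simulation (all numbers invented for illustration): a hidden common cause drives both A and B, and the two end up strongly correlated even though neither causes the other.

```python
import random

random.seed(1)

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hidden common cause C (say, neighborhood) drives both A (proximity)
# and B (tenure); A and B never influence each other directly.
C = [random.gauss(0, 1) for _ in range(1000)]
A = [c + random.gauss(0, 0.5) for c in C]
B = [c + random.gauss(0, 0.5) for c in C]

print(round(corr(A, B), 2))  # strongly positive, with no A->B or B->A link
```

A hiring rule built on the A-B correlation would "work" in the data while completely mistaking where the effect comes from, which is exactly the risk in treating a proximity correlation as a hiring insight rather than a location insight.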

As David Brooks wrote, "Data struggles with context. Human decisions are not discrete events. They are embedded in sequences and contexts. ... Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel."

Executives and managers frequently hear about some new software billed as the “next big thing.” They call the software provider and say, “We heard you have a great tool and we’d like a demonstration.” The software is certainly seductive with its bells and whistles, but its effectiveness and usefulness depend upon the validity of the information going in and how the people actually work with it over time. Having a tool is great, but remember that a fool with a tool is still a fool (and sometimes a dangerous fool).
