The genesis for this posting comes from the following articles:
This posting includes portions of the articles and modifies them to address issues relating to big data and the use of algorithmic decisionmaking in the area of pre-employment assessments and workforce optimization.
Embedding Bias
Every step in the big data pipeline raises concerns: the privacy implications of amassing, connecting, and using personal information; the implicit and explicit biases embedded in both datasets and algorithms; and the individual and societal consequences of the resulting classifications and segmentation.
While many companies and government agencies foster an illusion that classification is (or should be) an area of absolute algorithmic rule—that decisions are neutral, organic, and even automatically rendered without human intervention—reality is a far messier mix of technical and human curating.
Data isn't something that's abstract and value-neutral. Data only exists when it's collected, and collecting data is a human activity. And in turn, the act of collecting and analyzing data changes (one could even say "interprets") us.
Both the datasets and the algorithms reflect choices, among others, about data, connections, inferences, interpretation, and thresholds for inclusion that advance a specific purpose. Like maps that represent the physical environment in varied ways to serve different needs—mountaineering, sightseeing, or shopping—classification systems are neither neutral nor objective, but are biased toward their purposes. They reflect the explicit and implicit values of their designers. Assumptions are embedded in a data model upon its creation. Data sources are shaped through cleaning ('washing'), integration, and algorithmic calculation until they are commensurable enough for a data set to be created.
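To make the point concrete, here is a minimal, hypothetical Python sketch of how even a simple screening pipeline hard-codes such choices; the column names, imputation rule, and cutoff values are illustrative assumptions, not drawn from any actual vendor system.

```python
import pandas as pd

# Hypothetical applicant records; every column is the product of a prior
# decision about what to collect and how to encode it.
applicants = pd.DataFrame({
    "applicant_id": [101, 102, 103, 104],
    "assessment_score": [71, 88, 64, 90],
    "tenure_months_last_job": [48.0, 7.0, None, 22.0],
})

# "Washing" the data: an unknown tenure is imputed to zero, silently
# treating a missing history as a short one.
applicants["tenure_months_last_job"] = applicants["tenure_months_last_job"].fillna(0)

# Thresholds for inclusion: these cutoffs are design decisions, not facts.
MIN_ASSESSMENT_SCORE = 70   # who counts as "qualified"
MIN_TENURE_MONTHS = 12      # who counts as "stable"

applicants["passes_screen"] = (
    (applicants["assessment_score"] >= MIN_ASSESSMENT_SCORE)
    & (applicants["tenure_months_last_job"] >= MIN_TENURE_MONTHS)
)

print(applicants[["applicant_id", "passes_screen"]])
```

Each constant and each imputation step encodes an assumption about who matters and what counts; all of it disappears behind a single pass/fail column.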
Errors are not only possible but likely at each stage of the assessment process, from identification through to its conclusion in a discriminatory act. Error is inherent in the nature of the processes through which reality is represented as digitally encoded data. Some of these errors will be random, but most will reflect the biases inherent in the theories, the goals, the instruments, and the institutions that govern the collection of data in the first place.
Clear Windshield or Rearview Mirror?
The decisions made by the users of sophisticated analytics determine the provision, denial, enhancement, or restriction of the opportunities that citizens and consumers face both inside and outside formal markets.
Algorithms embody a profound deference to precedent; they draw on the past to act on (and enact) the future. The apparent omniscience of big data may in truth be nothing more than misdirection. Instead of offering a clear windshield, the big data phenomenon may be more like a big rear-view mirror telling us nothing about the future.
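A toy illustration of this deference to precedent, assuming a Python scikit-learn environment and invented features and labels, is a model trained on historical hiring decisions: whatever biases those decisions contained become the yardstick against which new applicants are measured.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical historical data: [years_experience, attended_elite_school].
# The labels record only what past decisionmakers did, biases included.
X_past = np.array([
    [2, 1], [5, 1], [7, 1], [3, 1],
    [4, 0], [6, 0], [2, 0], [8, 0],
])
y_past = np.array([1, 1, 1, 1, 0, 0, 1, 0])  # 1 = hired, 0 = rejected

model = LogisticRegression().fit(X_past, y_past)

# A new applicant is scored by resemblance to previously hired people;
# the model cannot reward anything the past did not already reward.
new_applicant = np.array([[6, 0]])
print(model.predict_proba(new_applicant))  # [P(reject), P(hire)]
```

The output looks forward-looking, but every coefficient is a compressed summary of the rear-view mirror.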
Does this deference to precedent result in a self-reinforcing and self-perpetuating system, where individuals are forever burdened by a history that they are encouraged to repeat and from which they are unable to escape? Does deference to past patterns augment path dependence, reduce individual choice, and result in cumulative disadvantage?
Already burdened segments of the population can become further victimized through the use of sophisticated algorithms in support of the identification, classification, segmentation, and targeting of individuals as members of analytically constructed groups. In creating these groups, the algorithms rely upon generalizations that lead to viewing people as members of populations, categories, or groups, rather than as individuals (e.g., persons who live more than X miles from a jobsite).
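As a hypothetical sketch of such a rule (the cutoff and the applicant records below are invented for illustration), a distance-based screen evaluates people solely as members of a constructed category, never as individuals:

```python
# Invented cutoff: the value of X is a design choice, not an observed fact.
MAX_DISTANCE_MILES = 25

applicants = [
    {"name": "A", "miles_from_jobsite": 3.2},
    {"name": "B", "miles_from_jobsite": 41.0},   # excluded as a group member,
    {"name": "C", "miles_from_jobsite": 26.5},   # regardless of individual fit
]

# Everyone beyond the cutoff is treated identically, whatever their
# circumstances, reliability, or willingness to relocate.
eligible = [a for a in applicants if a["miles_from_jobsite"] <= MAX_DISTANCE_MILES]
print([a["name"] for a in eligible])  # ['A']
```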
Shrouding Opacity in the Guise of Legitimacy
Workforce analytic systems, designed in part to mitigate risks for employers, have now become sources of material risk. The systems create the perception of stability through probabilistic reasoning and the experience of accuracy, reliability, and comprehensiveness through automation and presentation. But in so doing, technology systems draw organizational attention away from uncertainty and partiality. They embed, and then justify, self-interested assumptions and hypotheses.
Moreover, they shroud opacity—and the challenges for oversight that opacity presents—in the guise of legitimacy, providing the allure of shortcuts and safe harbors for actors both challenged by resource constraints and desperate for acceptable means to demonstrate compliance with legal mandates and market expectations.
The technical language of workforce analytic systems obscures the accountability of the decisions they channel. Programming and mathematical idiom can shield layers of embedded assumptions from high-level firm decisionmakers charged with meaningful oversight and can mask important concerns with a veneer of transparency. This problem is compounded in the case of regulators outside the firm, who frequently lack the resources or the vantage point to peer inside buried decision processes and must instead rely on the resulting conclusions about risks and safeguards offered to them by the parties they regulate.
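A minimal sketch of how presentation can bury assumptions, using invented weights and field names, is a report function that exposes only its conclusions: the reviewer, inside or outside the firm, sees a tidy score and a flag, while the choices that produced them stay in the source code.

```python
import math

# Invented weights and threshold; nothing in the report reveals them.
RISK_WEIGHTS = {"employment_gaps": 0.6, "commute_miles": 0.02, "assessment_score": -0.03}
FLAG_THRESHOLD = 0.5   # chosen by the system's designer, not by a regulator

def screening_report(candidate: dict) -> dict:
    """Return only the conclusions, not the assumptions that produced them."""
    raw = sum(weight * candidate[field] for field, weight in RISK_WEIGHTS.items())
    risk_score = round(1 / (1 + math.exp(-raw)), 2)   # squashed to a tidy 0..1 score
    return {
        "candidate_id": candidate["candidate_id"],
        "risk_score": risk_score,
        "flagged": risk_score >= FLAG_THRESHOLD,
    }

# The reviewer sees only this dictionary; the weights and cutoff stay buried.
print(screening_report({
    "candidate_id": 7, "employment_gaps": 1, "commute_miles": 30, "assessment_score": 85,
}))
```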
Do We Regulate Algorithms, or Do Algorithms Regulate Us?
Can an algorithm be agnostic? Algorithms may be rule-based mechanisms that fulfill requests, but they are also governing agents that are choosing between competing, and sometimes conflicting, data objects.
The potential and pitfalls of an increasingly algorithmic world raise the question of whether legal and policy changes are needed to regulate our changing environment. Should we regulate, or further regulate, algorithms in certain contexts? What would such regulation look like? Is it even possible? What ill effects might regulation itself cause? Given the ubiquity of algorithms, do they, in a sense, regulate us?
We regulate markets, and market behavior, out of concern for equity as well as for efficiency. The fact that the impacts of design flaws are inequitably distributed is at least one basis for justifying regulatory intervention.
The regulatory challenge is to find ways to internalize the many external costs generated by the rapidly expanding use of analytics; that is, to find ways to force the providers and users of discriminatory technologies to pay the full social costs of their use. Requirements to warn, or otherwise inform, users and their customers about the risks associated with the use of these systems should not absolve system producers of their own responsibility for reducing or mitigating those harms. This is part of imposing economic burdens, or using incentives, as tools to shape behavior most efficiently and effectively.