Big Data, it seems, is suddenly very big. Among the social scientists with whom I spend time, newly massive, deep-tissue-massaged bodies of data have found currency. As a research tool, the emergent technique seems to promise a rehabilitation of conventional, sometimes dismayingly narrow, quantitative analysis because it involves not just MORE raw material but also unprecedentedly nuanced software. So, unlike old “Small Data” projects, the empiricism of Big Data research feels rooted in an especially flexible and expansive kind of inquiry. As more and more media, public and private institutions, and cultural enterprises of all kinds operate on-line, the idea that our research subject (manipulated data) and our method (manipulating data) should coincide is seductive. But perhaps caution is advised.
I recently attended a social science workshop in which the taxonomic, counting, and graphing choices being made with Big Data seemed to be tripping along with a minimum of criticality and reflexivity. Not one among the sociologists, anthropologists, and cultural historians attending suggested that the new scale of data-collection and warp speed of data-crunching might hold totalizing risks for the analyst. In the bigger-data-sets-are-better atmosphere, Foucault’s point that in rendering a subject knowable we reproduce power seemed lost amidst the intoxicating possibility of…the comprehensive. That this feature of Big Data holds profoundly political implications became clear to me when I read a piece in yesterday’s New York Times by Matt Richtel on the role of Big Data in enhancing inclusion in STEM.
“I Was Discovered by An Algorithm” is not about the social sciences per se, but it is about the use of extraordinarily large data sets for ostensibly value-neutral purposes. The article introduces readers to “work-force science,” a new-ish field in which human resources personnel mine massive amounts of data to determine both which sorts of qualifications and which individuals may best suit a particular job category or position. In the case of computing professions, the growth of on-line code sharing and programming provides a ready-made body of data that can reveal, proponents say, unrecognized talent. This system supposedly corrects for social biases triggered by our faces or resumes to expand hiring pools and individuals’ opportunities alike.
But the notion of hidden STEM talent is one I’ve long been concerned about, and its mention here alerted me to a conservative deployment of Big Data. Defining the problem as one of unrecognized talent is a way of seeing under-representation in STEM without asking questions about opportunities…about discrimination in education that might preclude an individual’s development of technical interests. Nor does it let us ask about the inherent oppressions of segmented industrial labor, a system that minimizes workers’ chances to learn and grow through work. To me, such searches for promising but as-yet-unrecognized STEM workers have presented a seemingly inclusive agenda that manages systematically to ignore such structural inequities.
Consider the framing of data-driven STEM hiring described in Richtel’s piece. Vivienne Ming, chief scientist at the start-up firm Gild, approaches the mining of Big Data as a way to evade the biases traditionally found in hiring, including those of gender and race, and the presumptions we make about one another based on university attended or jobs previously held. The main case covered in the article is that of a young programmer who never attended college but who, once in range of Gild’s “automated vacuum and filter for talent” (as Ming calls it), was revealed to possess exceptional capacities. He got the job. To Ming, this approach to recruitment lets the firm “put everything in,” and then lets the “data speak for itself.”
But of course, data can’t speak for itself; it speaks only for those who have given it meaning. Despite Ming’s articulated concern with inclusion, per Gild’s algorithm (and their Nike-esque catchphrase, “Know Who’s Good”), it is only success along existing standards of technical efficacy and productivity that identifies the outstanding programmer. Automating this determination may be great for the firm, but it hardly constitutes a significant push-back against discriminatory conditions. There are doubts expressed in the article about this HR approach, but these are themselves telling about the obfuscatory power of meritocratic logic in industry. Some observers worry that subjective features such as a candidate’s “people skills” are occluded with this kind of data-based hiring. Others, such as those at Gild who are eager to home in on prospective employees’ most specialized technical skills, want more finely grained objective tools. But the superficial differences between these complaints are deceiving. Both thoroughly detach hiring criteria from the social and political conditions in which those criteria arise and which those criteria faithfully reproduce.
I have lately been reading a remarkable book on industrial personnel practices by professor of management Barbara Townley, which considers “power, ethics and the subject at work” from a Foucauldian vantage point. She reminds us that the field of human resources has always been about constructing the individual as an object of knowledge, not about “uncovering” some essential self in the prospective employee. Work-force science, predicated on letting data “speak for itself,” seems exquisitely suited to (in Townley’s phrase) “render organizations and their participants calculable arenas,” and to do so unceasingly “in service to the profitability and productivity of the organization.” To claim, as Ming does, that the largest bodies of data ever deployed for HR purposes will somehow transcend the foundational values of corporate HR seems like selective logic. Personally, I will now be mining Townley’s work for ways to understand the social instrumentalities of Big Data.