Towards Fair Data for All: DataWorks at Georgia Tech

By guest columnist CARL DISALVO, associate professor at the School of Interactive Computing at the Georgia Institute of Technology, with BETSY DISALVO from Georgia Tech, and BEN SHAPIRO from Georgia State University

When people talk about the roles and responsibilities of higher education in the 21st century, those conversations often focus on the challenges of educating students for changing work environments and the ever-increasing role of technology in these environments. This is certainly part of what colleges and universities do, but not everything.

Carl DiSalvo, Betsy DiSalvo, Ben Shapiro (left to right)

Higher education institutions are expanding their mission and offerings to engage and educate learners other than the traditional full-time student, outside of the familiar classroom environment. And for some, there is a return to seeing colleges and universities as part of an ecology of civic institutions and organizations that make up the fabric of our local democracies. It is in this context that we have created and managed DataWorks.

DataWorks is part of the Constellations Center for Equity in Computing at the Georgia Institute of Technology, College of Computing. Through DataWorks, we hire young adults and train them in basic data science skills, such as cleaning and formatting data, using tools ranging from standard spreadsheets to custom scripts in programming languages such as Python.

One of the intentions of DataWorks is to broaden participation in data science. There is a tendency to assume that data science work is done by graduate engineers or that it is largely unskilled labor. These assumptions are wrong, and through DataWorks we hope to demonstrate a more pluralistic approach to what data science is or could be. The young adults at the heart of DataWorks come from communities that are underrepresented in technology fields. The overwhelming majority of tech jobs go to upper-middle-class white cis men. Through DataWorks, we are trying to counter this pattern, at least to a small extent. We also hope to provide pathways to careers for DataWorks employees beyond the program.

Customers bring projects to DataWorks, and through these projects, employees gain hands-on experience working with data and refine or create datasets that support customer work. DataWorks employees are full-time, with reasonable compensation and benefits that reflect efforts to create more fair and equitable work practices around entry-level data work.

There are many ways to think of DataWorks. In some ways, it’s a workforce development program. In other ways, it’s a platform for teaching data science skills outside the classroom to workers rather than students, and for studying how data literacy develops in the workplace. It is also a chance to explore and experience what a college or university could be, in addition to the departments and professors who conduct research and award degrees. This is an opportunity to consider another way for colleges and universities to embrace their status as civic institutions.

We created DataWorks because we noticed that local governments, nonprofits, and small businesses wanted to use data, but the data they needed wasn’t available or wasn’t in accessible formats. Some of our projects have therefore focused on making this data accessible and usable.

For example, working with the nonprofit Center for Civic Innovation in Atlanta, we took 10 years of Zoning Review Board and Board of Zoning Adjustment records and transformed the data from static PDF files into structured data sets. Why bother? Well, those records contain information about those boards’ voting patterns and city development patterns that, as PDF files, remained inaccessible: they could not be searched or compared. Now, professionals and community members can study these voting and development patterns to gain insights and make decisions that support Atlanta communities.

In this way, DataWorks, and by extension Georgia Tech, provides a valuable service to the City of Atlanta and a local nonprofit organization. Different from the extractive nature of so much academic research and engagement, projects such as these hope to contribute to the civic ecologies in which we are embedded, to better allocate Georgia Tech’s resources toward issues of local concern.

Through DataWorks, we also hope to set an example of what fair data work environments can be. Elsewhere in this industry, much data work happens as “on-demand” work, outside of conventional work structures. For example, powering the artificial intelligence and machine learning that underpins digital services often requires people to do the manual labor of labeling images to be algorithmically classified and processed. Just as there is a lot to worry about in what these algorithms do, there is also a lot to worry about in how the data behind these algorithms is created.

Too often, on-demand work, particularly involving the labeling, cleaning and formatting of data, exploits workers. But what if data-driven work environments put the work and growth of workers at the heart of the organization? Colleges and universities, especially those that are public, are well placed to host such environments. If we view our commitment to learning as lifelong, and therefore our commitment to learners extends beyond traditional students, then establishing and maintaining safe and fair environments – whether in a classroom or a learning program – is part of our institutional responsibility.

As an experiment, we don’t know if DataWorks will thrive or in what form. We’ve been working on it for two years, but we all know it’s been a weird two years. One thing is certain: workers are developing skills that will serve them whatever form DataWorks takes. These are the technical skills of working with data, taking inaccessible or unstructured data and transforming it so that it is useful and usable. Workers also develop critical perspectives on technology as they encounter and manage common data limitations and biases.

Part of the nature of experiments is that we don’t know the outcome. Not-knowing is essential to inquiry – we learn through experiences, we find contours, limits and potentials. Whether or not DataWorks thrives, and in what form, is a contribution in itself and helps us understand the limits of what a public college or university is or could be. The idea of a public college or university as a way to truly serve the community, to redistribute its resources, to be a model of fair work, is hopeful. It is an aspiration worth pursuing.

Notes to readers: This column was coordinated by Serve-Learn-Sustain at the Georgia Institute of Technology. This material is based on work supported by the National Science Foundation under grant number 1951818.
