Nine students — three graduate and six undergraduate — are participating in the Social and Decision Analytics Laboratory’s Data Science for the Public Good program in the National Capital Region. The Social and Decision Analytics Laboratory is a leading laboratory in the Biocomplexity Institute of Virginia Tech. The 10-week fellowship program connects aspiring data scientists to communities that can benefit from their expertise.
“While classwork is heavy on numbers, being able to work directly with people at Arlington County government is eye-opening. It gives you a broader context and changes your world view,” said Adrienne Rogers, of Blacksburg, Virginia, a rising senior majoring in statistics and sociology at Virginia Tech. “It’s not just about numbers anymore, but how to actually apply those numbers to finding solutions to problems.”
During a recent roundtable, the students emphasized the value of the program, citing several advantages: personal interaction with local government agencies; collaborative work in cross-disciplinary teams; and engagement in all phases of the data cycle, from data discovery and acquisition, through profiling and evaluating the data for completeness, uniqueness, and consistency, to developing and conducting the corresponding statistical analyses.
“Our overarching objective is to equip new generations of scientists with skills they need to provide policymakers and government leaders with data analysis support that can inform difficult decisions related to healthcare, education, and social justice,” said Sallie Keller, professor of statistics and director of the Social and Decision Analytics Laboratory.
Data Science for the Public Good combines disciplines such as statistics, data science, and the social and behavioral sciences to address complex issues. The program is also vertically integrated: students collaborate with project stakeholders at all levels, including undergraduates; graduate students; postdoctoral associates; research faculty; and local, state, and federal agency leadership.
Keller said that hands-on experience is an extremely important component of the fellowship. “Early engagement with the community helps reinforce and solidify their commitment and gives them a good foundation of skills that are required for successful careers at the intersection of data science and public policy,” she said.
There are three undergraduate students from Virginia Tech in the program. In addition to Rogers, they are Mark Almanza, of Oakton, Virginia, a rising sophomore majoring in physics, and Madison Arnsbarger, of Purcellville, Virginia, a rising senior majoring in economics.
Virginia Tech graduate student Daniel Chen, of New York City, a Ph.D. candidate in genetics, bioinformatics, and computational biology at the Biocomplexity Institute of Virginia Tech, is also a fellow.
The remaining fellows are master’s student Millicent Grant and Ph.D. student Samantha Tyner, both from Iowa State University, and undergraduates Jessica Flynn, Cornell University; William Sandholtz, University of California, Berkeley; and Emily Stark, Austin Peay State University, Clarksville, Tennessee.
As a group project, the fellows have been evaluating the Arlington County Open Data Portal. They established criteria for user-friendly open data portals by reviewing the literature and industry standards. They then reviewed Arlington’s open data portal and those of San Francisco, Chicago, and Washington, D.C., to find examples that do and do not meet the criteria. They will soon present their findings to Jack Belcher, Arlington County’s chief information officer, and his deputy, Kristanne Littlefield. They will also write an article for the August issue of the American Statistical Association’s monthly magazine, AMSTAT News.
Of particular interest to Stark is a project that uses statistics from the Kentucky Department of Education to determine what role geographic location plays in students’ high school success, their transition to higher education, and, potentially, their transition to the workforce.
“It’s not easy to find a program like Data Science for the Public Good that combines my two majors — statistics and psychology. This is a great opportunity for me to work on an issue I am passionate about,” said Stark, who is hoping she can apply some of what she learns to education solutions in Tennessee, where she attends college.
For the U.S. Department of Housing and Urban Development and the U.S. Census Bureau, the students are using local data to explore disclosure avoidance in the American Housing Survey. This work involves several steps: data discovery, that is, finding, cleaning, and linking external data sources to the American Housing Survey; measuring population uniqueness using Arlington and Fairfax county tax records; applying disclosure avoidance methods such as top coding and data swapping; and measuring disclosure risk with metrics from the literature.
“I never realized how much time it takes to clean the data into the right form. Now I know and can really appreciate all the effort,” said Sandholtz. He has been working on the Arlington County 911 project, culling data that can be repurposed to build models that answer questions such as whether the Arlington County Fire/EMS Department’s deployment of smoke alarms reduces fires, saves lives, or reduces property loss.
On July 29, from 10 a.m. to 12:30 p.m., the fellows will present posters at the Virginia Tech Research Center – Arlington to showcase their work. Arlington County leaders are among the invitees, and the event is open to the public. Chris Barrett, professor and executive director of the Biocomplexity Institute of Virginia Tech, will be the keynote speaker.
The Data Science for the Public Good fellowship program is made possible through the support of several organizations and programs dedicated to serving the public good: the Biocomplexity Institute of Virginia Tech, Virginia Tech’s Global Forum for Urban and Regional Resilience, and the American Statistical Association’s NSF Research Experience for Undergraduates program, as well as sponsored research.