A statistician takes stock of the Syrian civil war.
Beka Steorts PhD’12 is a statistician and machine learner in Duke University’s Department of Statistical Science. Her collaboration with the Human Rights Data Analysis Group (HRDAG) has led to an award-winning breakthrough in information analysis. Working with HRDAG directors Megan Price and Patrick Ball in their pursuit of an honest accounting of the casualties in the Syrian civil war, Steorts and her collaborators have developed a method that assembles information from myriad sources and removes duplicated information in near real time, which may provide insights into the Syrian conflict. “At the heart of this work,” says Steorts, “I want it to be useful to those doing public good.”
Data sets, particularly those retrieved from multiple sources, can be difficult to work with. Recorded data can be filled with misspellings, duplicate or incomplete entries, or transposed information, creating “noise.” The methods Steorts and her collaborators developed allow an analyst to sort through that noise, reduce data into usable information, link files, and eliminate duplications. Earlier methods of data reduction took days or weeks to run and achieved only 50 to 70 percent accuracy. Steorts’ collaborative algorithm for culling data processes 300,000 records in ten minutes, with 99 percent accuracy. It provides reliable, replicable information that allows greater insight into problems on the ground. Regarding the Syrian conflict, Steorts says, “Resources may be more quickly directed to where they are needed. It could help with reparative questions, it might impact policy written after the war has ended, or might support the prosecution of responsible parties in wartime tribunals.”
This collaborative humanitarian work earned Steorts a 2015 MIT 35 Innovators Under 35 award. Steorts’ work took shape after a serendipitous meeting with Rice University computer science professor Anshumali Shrivastava, in which they combined their respective strengths in hashing and record linkage. Hashing, says Steorts, is “a fast way to sort similar information into one bin, making this data easier to link.” Since the MIT award, Steorts has been collaborating with Shrivastava’s lab to refine hashing and record linkage, further clarifying data and creating greater transparency in analysis. In May 2017, Steorts’ work in data linkage earned her the National Science Foundation CAREER Award, which provides five years of funding for research and outreach to junior faculty who exhibit excellence in education and research.
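For readers curious about what sorting "similar information into one bin" can look like in practice, here is a minimal sketch in Python. It is not the method Steorts and Shrivastava developed, which the article does not describe in detail. It uses MinHash with banding, a standard textbook hashing technique chosen here purely for illustration. The function names, the parameter values, and the three toy records are all invented for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def stable_hash(token, seed):
    """Deterministic 64-bit hash of a token; the seed selects one hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=32):
    """MinHash signature: for each seeded hash, keep the smallest value over all shingles."""
    grams = shingles(text)
    return tuple(min(stable_hash(g, seed) for g in grams) for seed in range(num_hashes))


def candidate_pairs(records, num_hashes=32, bands=16):
    """Bin records by bands of their signature; records sharing any bin become candidate pairs."""
    rows = num_hashes // bands
    bins = defaultdict(set)
    for rec_id, text in records.items():
        sig = minhash_signature(text, num_hashes)
        for b in range(bands):
            bins[(b, sig[b * rows:(b + 1) * rows])].add(rec_id)
    pairs = set()
    for ids in bins.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Exact shingle overlap, used to check candidates that landed in the same bin."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Invented toy records; records 1 and 2 differ only in case and spacing.
    toy_records = {
        1: "Jon Smith 1990-01-01 Springfield",
        2: "jon  smith 1990-01-01 springfield",
        3: "Maria Lopez 1985-07-23 Riverton",
    }
    for a, b in sorted(candidate_pairs(toy_records)):
        print(a, b, round(jaccard(toy_records[a], toy_records[b]), 2))
```

The point of the binning step is that only records sharing a bin are compared directly, so the expensive pairwise check runs on a small set of candidates rather than on every possible pair.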
In her spare time, Steorts rewrites song lyrics to give them a statistical bent. The songs are performed by the statistician-only band The Imposteriors at the biennial International Society for Bayesian Analysis World Meeting. Her next conference is in 2018; her song choice is as yet undecided.
Along with her PhD in statistics, Steorts holds an MS in mathematical sciences and a BS in mathematics, and she credits her interdisciplinary training and collaborative work with her ability to solve problems. “I have a big toolbox full of different perspectives to pull from and an amazing group of mentors and student researchers with whom I’ve had the good fortune to work.” They help Steorts maintain her objectivity when facing current issues. “I’ve seen academics who can’t see the forest for the trees. We fail when we view something as just one problem with just one solution. But, we can do really great things when we collaborate together. And, it’s fun.”