Massive amounts of data are collected daily, from individual consumers using cell phone technology to big corporations storing the buying patterns of thousands of individuals.

According to one recent study reported in Computerworld, the amount of event data generated in the U.S. alone is estimated to be 7 million pieces per second and climbing. Managing "big data" processing systems is a huge undertaking.

"There is currently a disconnect between programmers who develop the big data applications and system engineers who oversee large data centers that support these applications. It leads to reduced performance and missed optimization opportunities," said Ali R. Butt, associate professor of computer science at Virginia Tech.

Butt and his Virginia Tech colleague, Chao Wang, assistant professor of electrical and computer engineering, are now working to develop innovative resource management tools for these big data processing systems. They are using a $750,000 grant from the National Science Foundation to improve the technology.

Big data processing systems provide the computing substrate for a wide range of applications in such fields as high-speed physics, economics, genomics, astronomy, and meteorology.

"It is crucial, for computing-based scientific discovery, to sustain these data-driven systems at scale in the presence of emerging technologies such as specialized microprocessors, GPUs, and hybrid storage systems," Butt said.

Butt is a past recipient of an NSF CAREER Award on computing performance. This earlier award, valued at $400,000, allowed him to spend some five years before receiving this new NSF honor on better understanding the growing performance gap between computing power and storage technology, especially in high-performance computing (HPC) environments.

Having a better understanding of application behavior and its interactions with the mixed/hybrid infrastructure will "play a central role in creating highly efficient systems for current and future generations of big data applications," Wang said.

The Virginia Tech team plans to address the disconnect between programmers and system engineers by leveraging their expertise in both distributed computing and software engineering.

Wang joined Virginia Tech in 2011 after spending seven years at the NEC Laboratories in Princeton, New Jersey, where he worked on complex computer systems, including hardware, software, and embedded systems.

"Current systems treat the user-provided software codes as a black box, which makes it very difficult to fine-tune the system for running these codes," Wang said. Wang and Butt therefore hope to employ static and dynamic program analysis techniques to build informative application models. These models will then be used to predict application behavior and to manage the computing resources.

The overall goal is to develop what the two researchers are calling Pythia, an online application-aware oracle framework for fine-tuning big data systems on emerging heterogeneous resources. In Greek literature, Pythia was the name given to any priestess who served at the Temple of Apollo throughout its history and was credited for her prophecies.

"Imagine if you can see into the future and know how an application would behave, you can have a strategy for scheduling resources to maximize the system performance. This is the vision behind Pythia," Butt said.  

Butt leads the Distributed Systems and Storage Laboratory at Virginia Tech, which focuses on innovations in computer systems ranging from cloud computing to specialized operating-system-level optimizations for emerging hardware technologies. 

Before coming to Virginia Tech in 2006, he completed his doctorate in electrical and computer engineering at Purdue University. He received a bachelor's degree in electrical engineering from the University of Engineering and Technology in Lahore, Pakistan.

Wang is also a recipient of the NSF CAREER award, the Office of Naval Research Young Investigator award, the Virginia Tech College of Engineering Outstanding New Assistant Professor Award, and many best paper awards. He leads the Reliable and Secure Software Laboratory at Virginia Tech, which focuses on innovations in software engineering, formal methods, and parallel programming.

He received a Technology Commercialization Award in 2006. Wang received his Ph.D. from the University of Colorado at Boulder in 2004, and won the 2003-2004 ACM Outstanding Ph.D. Dissertation Award in Electronic Design Automation. He received a bachelor's degree and a master's degree from Peking University, China.
