Kirk Cameron and Dimitrios Nikolopoulos, associate professors of computer science in the College of Engineering at Virginia Tech, have earned a $350,000 National Science Foundation (NSF) Computer Systems Research (CSR) award to help improve the reliability of computer systems’ processors.
Computer systems’ processors can suffer what is called a “thermal emergency”: a sharp rise in the machine’s temperature to a level significantly above a safe thermal threshold. The consequence is dramatically compromised reliability.
To address this problem, NSF funded Cameron and Nikolopoulos’ project titled “Thermal Conductors: Runtime software support for proactive heat management in advanced execution systems.”
Cameron, director of the Scalable Performance (SCAPE) laboratory, has been intrigued by the need for elaborate cooling solutions to remove the heat generated by modern advanced processors, which can consume up to 100 watts and exceed the temperature of a hot plate.
“What we want is to reduce the heat produced by large systems with lots of components in close proximity such as those in a data center. By first studying the way applications produce heat, our hope is to identify places where we can reduce heat while maintaining the high-performance required by users,” Cameron said.
A main focus of his research is therefore determining the thermal properties of software, mainly by observing how various power reduction strategies affect processor and system thermal behavior. One product of this research is Tempest, or Temperature Estimator, a portable freeware tool that lets the user directly measure temperature and graphically correlate the results to source code.
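To make the idea of correlating temperature with source code concrete, the following is a minimal sketch, not Tempest's actual interface or method. It assumes hypothetical inputs: a list of timestamped temperature readings and a list of timed code regions, and it simply averages the readings that fall inside each region. The names `mean_temp_by_region`, `samples`, and `regions` are illustrative assumptions.

```python
from collections import defaultdict


def mean_temp_by_region(samples, regions):
    """Average the temperature readings that fall inside each code region.

    samples: list of (timestamp_s, temp_c) readings (hypothetical data).
    regions: list of (name, start_s, end_s) intervals, end exclusive.
    Returns a dict mapping region name to mean temperature in Celsius.
    """
    totals = defaultdict(lambda: [0.0, 0])  # name -> [sum of temps, count]
    for timestamp, temp in samples:
        for name, start, end in regions:
            if start <= timestamp < end:
                totals[name][0] += temp
                totals[name][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


# Made-up readings and regions, for illustration only.
samples = [(0.0, 50.0), (1.0, 60.0), (2.0, 70.0), (3.0, 80.0)]
regions = [("load_data", 0.0, 2.0), ("solve", 2.0, 4.0)]
print(mean_temp_by_region(samples, regions))
```

A per-region average like this is only one possible summary; a real profiler would also have to account for sensor lag and for heat carried over from earlier regions.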
Nikolopoulos, director of the Parallel Emerging Architectures Research Laboratory (PEARL), is researching new thermal reduction techniques applicable to parallel scientific applications and systems, based on program phase analysis.
The idea is that programs and/or system software can be modified to either avoid a “thermal emergency,” or react to the overheating in an attempt to control it, but without compromising system performance.
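The difference between reacting to overheating and avoiding it can be illustrated with a small sketch. This is not the researchers' software: the threshold, the "throttle" and "run" actions, and the simple linear extrapolation used for prediction are all assumptions chosen for illustration.

```python
def reactive_action(temp_c, threshold_c):
    """React only once the measured temperature reaches the threshold."""
    return "throttle" if temp_c >= threshold_c else "run"


def proactive_action(history, threshold_c, horizon=1):
    """Act early if the temperature trend is predicted to reach the threshold.

    history: recent temperature readings in Celsius, oldest first.
    horizon: how many sampling steps ahead to extrapolate (assumed value).
    """
    if not history:
        return "run"
    if len(history) < 2:
        # Not enough data for a trend; fall back to the reactive rule.
        return reactive_action(history[-1], threshold_c)
    slope = history[-1] - history[-2]            # change per sampling step
    predicted = history[-1] + slope * horizon    # linear extrapolation
    return "throttle" if predicted >= threshold_c else "run"


# At 70 C with an 80 C threshold, the reactive rule keeps running,
# while the proactive rule sees a rising trend and throttles early.
print(reactive_action(70.0, 80.0))
print(proactive_action([60.0, 70.0], 80.0))
```

In this sketch throttling stands in for any heat-reducing step; the stated goal of the research is to take such steps without compromising system performance, which a blunt throttle would not guarantee.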
“I expect that this research will improve high-end computer system reliability, because systems frequently subjected to thermal emergencies tend to have a shorter lifespan. Thermal emergencies typically require a reboot, which is a frustrating process and can lead to loss of data. Furthermore, maintenance costs - for cooling systems, for example - will hopefully be reduced by the implementation of our suggested techniques,” Nikolopoulos said.
Cameron and Nikolopoulos combine profiling and control infrastructures to create thermal conductors: novel software that enables automated, transparent optimization of system thermals while striving to maintain the high performance expected in advanced execution systems and applications. All of their software tools and techniques will be open source and made available to the public via the internet.
According to the NSF website, the CSR program supports innovative research and education projects that have the potential to lead to significant improvements in existing computer systems by increasing our fundamental understanding of such systems; to produce systems software that is qualitatively and quantitatively more reliable and more efficient; and/or to produce innovative curricula or educational materials that better prepare the next generation of computing professionals.
The CSR program is also interested in projects that expand the capabilities of existing systems by exploiting the potential of new technologies or by developing innovative new ways to use existing technologies. Projects supported will strive to make significant progress on challenging, high-impact problems (as opposed to incremental progress on familiar problems) and will have a credible plan for demonstrating the utility and potential impact of the proposed work, according to NSF.