An outbreak of Escherichia coli that causes a severe illness called hemolytic-uremic syndrome began in Germany on May 2 and has killed more than 20 people and sickened more than 2,000. In the rush to save lives, many laboratories are analyzing the genomes of outbreak isolates and providing data to the research community.
The organism causing the outbreak has been identified as a strain of E. coli O104:H4 that produces a Shiga toxin and causes an illness similar to infection with E. coli O157:H7. Two isolates from this outbreak have been sequenced. Both strains, TY-2482 and LB226692, have been annotated and are now available from Virginia Bioinformatics Institute’s Pathosystems Resource Integration Center, funded by the National Institute of Allergy and Infectious Diseases, part of the National Institutes of Health.
“Our team is working around the clock to help the scientific community address this emergency,” said Bruno Sobral, director of the Virginia Bioinformatics Institute’s Cyberinfrastructure Division and principal investigator. “Analyses such as these provide insights into the origin of highly pathogenic strains and potential response strategies.”
The two genomes have been annotated with Rapid Annotation using Subsystem Technology (RAST), making them consistent with the 184 E. coli genomes and the 2,865 total bacterial genomes available at the Pathosystems Resource Integration Center. The proteins conserved across all E. coli have been used to generate a preliminary phylogenetic tree that is based on 166,640 characters across 527 genes in 354 taxa. This tree shows that the two new strains are most closely related to the pathogenic, enteroaggregative strain 559899, which may give additional insight into their origin.
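For readers unfamiliar with how a single tree can rest on characters drawn from hundreds of genes, the Python sketch below illustrates the general idea of concatenating per-gene alignments into one character matrix. It is not the center's pipeline. The function name, gene names, taxon names, and sequences are invented for illustration, and padding a missing gene with gap characters is an assumption.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments into one long sequence per taxon.

    gene_alignments maps a gene name to a dict of taxon -> aligned sequence.
    A taxon that lacks a gene is padded with gap characters ('-').
    """
    # Collect every taxon seen in any gene alignment.
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    parts = {t: [] for t in taxa}

    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for t in taxa:
            # Use the aligned sequence if present, otherwise a block of gaps.
            parts[t].append(aln.get(t, "-" * width))

    return {t: "".join(chunks) for t, chunks in parts.items()}


# Toy input: two hypothetical genes across three hypothetical taxa.
toy = {
    "geneA": {"strain1": "MKT-", "strain2": "MKTL", "strain3": "MRTL"},
    "geneB": {"strain1": "AGV", "strain3": "AGI"},
}
matrix = concatenate_alignments(toy)
n_characters = len(next(iter(matrix.values())))
```

In this toy case the matrix has 7 characters across 2 genes in 3 taxa. A tree-building program would then take such a matrix as input.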
The tree is available in interactive form on the center’s website, as is a comparison of the RAST annotations with the other publicized annotations.
As can be seen in the Pathosystems Resource Integration Center’s Protein Family Sorter, the proteins from these two new pathogenic strains form several unique islands compared with other E. coli genomes. Further investigation of these islands and unique proteins may yield clues as to virulence or intervention strategies for the new strains. The “heatmap” tab of the Protein Family Sorter presents a graphical view of the presence and absence of the proteins across the E. coli genomes.
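The presence-and-absence view described above can be pictured as a simple 0/1 matrix of genomes against protein families. The Python sketch below is a minimal illustration under stated assumptions: it does not use the Protein Family Sorter or any center API, and the function names, genome labels, and family identifiers are all hypothetical toy data.

```python
def presence_absence(genome_families):
    """Return (families, matrix), where matrix[genome] is a list of 0/1 flags
    in the same order as the sorted family list."""
    families = sorted(set().union(*genome_families.values()))
    matrix = {
        genome: [1 if fam in present else 0 for fam in families]
        for genome, present in genome_families.items()
    }
    return families, matrix


def families_unique_to(genome_families, focus):
    """Families present in every focus genome and absent from all others.

    Expects at least one genome in focus.
    """
    focus = set(focus)
    shared = set.intersection(*(set(genome_families[g]) for g in focus))
    elsewhere = set().union(
        *(set(fams) for g, fams in genome_families.items() if g not in focus)
    )
    return sorted(shared - elsewhere)


# Toy input: invented family IDs for three hypothetical genomes.
toy = {
    "new_isolate_1": {"fam1", "fam2", "fam9"},
    "new_isolate_2": {"fam1", "fam2", "fam9"},
    "reference_strain": {"fam1", "fam2", "fam3"},
}
families, matrix = presence_absence(toy)
unique = families_unique_to(toy, ["new_isolate_1", "new_isolate_2"])
```

Here `unique` holds the one family found only in the two hypothetical new isolates. Candidates of this kind are what a reader would look for in the heatmap.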
The Pathosystems Resource Integration Center is performing additional analyses, including compiling a list of the important genes identified, and will provide gene trees and multiple sequence alignments of those genes with their closest homologs. These results will be released as additional news items.
The Pathosystems Resource Integration Center project is one of five bioinformatics resource centers funded in whole or in part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services. Each center specializes in a different group of pathogens, including, but not limited to, those on the NIAID Category A-C Priority Pathogen lists for biodefense research and pathogens causing emerging or reemerging infectious diseases.
Written by Tiffany Trent.