Output list
Journal article
Methods for safely sharing dual-use genetic data
First online publication 02/11/2026
Frontiers in Microbiology, 17
Journal article
Aerosol biome of a cafeteria and medical facility in Los Alamos, New Mexico, USA
First online publication 09/10/2025
Microbiology Resource Announcements, 14, 10, e00766-25
Journal article
DNA viruses from different stages of a wastewater treatment plant in southwest Ohio
First online publication 09/05/2025
Total Environment Microbiology, 1, 4, 100031
Journal article
Standardized and accessible multi-omics bioinformatics workflows through the NMDC EDGE resource
First online publication 09/27/2024
Computational and Structural Biotechnology Journal, 23, 3575-3583
Journal article
Signatures of Mollicutes-related endobacteria in publicly available Mucoromycota genomes
Published 09/25/2024
mSphere, 9, 9, e0030924
Mucoromycota fungi and their Mollicutes-related endobacteria (MRE) are an ideal system for studying bacterial–fungal interactions and evolution due to the long-term and intimate nature of their interactions. However, methods for detecting MRE face specific challenges due to the poor representation of MRE in sequencing databases coupled with the high sequence divergence of their genomes, making traditional similarity searches unreliable. This has precluded estimations on the diversity of MRE associated with Mucoromycota. To determine the prevalence of previously undetected MRE in fungal genome sequences, we scanned 389 Mucoromycota genome assemblies available from the National Center for Biotechnology Information for the presence of MRE sequences using publicly available tools to map contigs from fungal assemblies to publicly available MRE genomes. We demonstrate a higher diversity of MRE genomes than previously described in Mucoromycota and a lack of cophylogeny between MRE and the majority of their fungal hosts. This supports the late invasion hypothesis regarding MRE acquisition across most of the examined fungal families. In contrast with other Mucoromycota lineages, MRE from the Gigasporaceae displayed some degree of cophylogeny with their hosts, which may indicate that horizontal transmission is restricted between members of this family or that transmission is strictly vertical. These results underscore the need for a refined process to capture sequencing data from potential fungal endosymbionts to discern their evolution and transmission. Screens of fungal genomes for MRE can help improve the quality of fungal genome assemblies while identifying new MRE lineages to further test hypotheses on their origin and evolution.
IMPORTANCE: Mollicutes-related endobacteria (MRE) are obligate intracellular bacteria found within Mucoromycota fungi. Despite their frequent detection, MRE roles in host functioning are still unknown. Comparative genomic investigations can improve our understanding of the impact of MRE on their fungal hosts by identifying similarities and differences in MRE genome evolution. However, MRE genomes have only been assembled from a small fraction of Mucoromycota hosts. Here, we demonstrate that MRE can be present yet undetected in publicly available Mucoromycota genome assemblies. We use these newfound sequences to assess the broader diversity of MRE and their phylogenetic relationships with respect to their hosts. We demonstrate that publicly available tools can be used to extract novel MRE sequences from assembled fungal genomes leading to insights on MRE evolution. This work contributes to a greater understanding of the fungal microbiome, which is crucial to improving knowledge on the dynamics and impacts of fungi in microbial ecosystems.
Journal article
Fabricated devices for performing bacterial-fungal interaction experiments across scales
Published 08/07/2024
Frontiers in Microbiology, 15, 1380199
Diverse and complex microbiomes are found in virtually every environment on Earth. Bacteria and fungi often co-dominate environmental microbiomes, and there is growing recognition that bacterial-fungal interactions (BFI) have significant impacts on the functioning of their associated microbiomes, environments, and hosts. Investigating BFI in vitro remains a challenge, particularly when attempting to examine interactions at multiple scales of system complexity. Fabricated devices can provide control over both biotic composition and abiotic factors within an experiment to enable the characterization of diverse BFI phenotypes such as modulation of growth rate, production of biomolecules, and alterations to physical movements. Engineered devices ranging from microfluidic chips to simulated rhizosphere systems have been and will continue to be invaluable to BFI research, and it is anticipated that such devices will continue to be developed for diverse applications in the field. This will allow researchers to address specific questions regarding the nature of BFI and how they impact larger microbiome and environmental processes such as biogeochemical cycles, plant productivity, and overall ecosystem resilience. Devices that are currently used for experimental investigations of bacteria, fungi, and BFI are discussed herein along with some of the associated challenges and several recommendations for future device design and applications.
Journal article
Identification of mobile genetic elements with geNomad
Published 08/01/2024
Nature Biotechnology, 42, 8
Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad. geNomad identifies mobile genetic elements in sequencing data.
Journal article
Published 04/11/2024
Microbiology Resource Announcements, 13, 4, e0067723
We present the complete genome sequence of the probiotic strain Lactobacillus acidophilus ATCC 9224. The genome sequence provides a valuable resource for investigating the phylogenetic evolution of this lineage and conducting comparative genomics with other Lactobacillus strains and species.
Journal article
Published 03/11/2024
Viruses, 16, 3, 430
Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact identification of variants. Here, we show that for both Illumina and Oxford Nanopore sequencing platforms, downstream bioinformatic protocols used by industry, government, and academic groups resulted in different virus sequences from the same sample. These bioinformatic workflows produced consensus genomes with differences in single nucleotide polymorphisms, inclusion and exclusion of insertions, and/or deletions, despite using the same raw sequence as input datasets. Here, we compared and characterized such discrepancies and propose a specific suite of parameters and protocols that should be adopted across the field. Consistent results from bioinformatic workflows are fundamental to SARS-CoV-2 and future pathogen surveillance efforts, including pandemic preparation, to allow for a data-driven and timely public health response.
Journal article
Combining compositional data sets introduces error in covariance network reconstruction
Published 01/01/2024
ISME Communications, 4, 1, ycae057
Microbial communities are diverse biological systems that include taxa from across multiple kingdoms of life. Notably, interactions between bacteria and fungi play a significant role in determining community structure. However, these statistical associations across kingdoms are more difficult to infer than intra-kingdom associations due to the nature of the data involved using standard network inference techniques. We quantify the challenges of cross-kingdom network inference from both theoretical and practical points of view using synthetic and real-world microbiome data. We detail the theoretical issue presented by combining compositional data sets drawn from the same environment, e.g. 16S and ITS sequencing of a single set of samples, and we survey common network inference techniques for their ability to handle this error. We then test these techniques for the accuracy and usefulness of their intra- and inter-kingdom associations by inferring networks from a set of simulated samples for which a ground-truth set of associations is known. We show that while the two methods mitigate the error of cross-kingdom inference, there is little difference between techniques for key practical applications including identification of strong correlations and identification of possible keystone taxa (i.e. hub nodes in the network). Furthermore, we identify a signature of the error caused by transkingdom network inference and demonstrate that it appears in networks constructed using real-world environmental microbiome data.
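The theoretical issue named in this abstract can be illustrated with a small simulation. The sketch below is not the authors' method. It assumes two hypothetical groups of taxa ("bacteria" and "fungi") whose absolute abundances are drawn independently, then normalizes each group separately to relative abundances, as separate 16S and ITS sequencing would. All function names and parameters are invented for illustration. Because each normalized group sums to 1 in every sample, the covariances of any one taxon with all members of a group must sum to zero, both within its own group and across groups, even though the underlying abundances are independent.

```python
import random


def close(row):
    """Convert absolute abundances to relative abundances that sum to 1."""
    total = sum(row)
    return [v / total for v in row]


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n_samples=200, n_bacteria=4, n_fungi=3, seed=0):
    """Return one column of relative abundances per taxon.

    Absolute abundances are independent log-normal draws (an assumption
    made only for this illustration); each group is closed separately
    and the two closed groups are then concatenated.
    """
    rng = random.Random(seed)
    columns = [[] for _ in range(n_bacteria + n_fungi)]
    for _ in range(n_samples):
        bacteria = close([rng.lognormvariate(0, 1) for _ in range(n_bacteria)])
        fungi = close([rng.lognormvariate(0, 1) for _ in range(n_fungi)])
        for col, value in zip(columns, bacteria + fungi):
            col.append(value)
    return columns


def block_row_sums(columns, n_bacteria):
    """Sum each bacterial taxon's covariances within and across groups."""
    bacteria, fungi = columns[:n_bacteria], columns[n_bacteria:]
    within = [sum(covariance(b, other) for other in bacteria) for b in bacteria]
    across = [sum(covariance(b, f) for f in fungi) for b in bacteria]
    return within, across


if __name__ == "__main__":
    cols = simulate()
    within, across = block_row_sums(cols, n_bacteria=4)
    print("within-group row sums:", within)
    print("cross-group row sums:", across)
```

Both sets of row sums come out at zero up to floating-point error. Since each taxon's variance is positive, its within-group covariances with the other taxa must be negative on balance, which is the kind of artifact that normalization alone can introduce into a combined covariance matrix.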