Graduate students and postdocs in the lab have numerous opportunities not only to build a toolkit of basic bioinformatics skills, but also to develop critical teaching skills that enable them to communicate best practices in using command-line tools and bioinformatics pipelines. We believe that this prepares our trainees to handle the increasingly large and complex datasets that are now routine.
A full course in transcriptomics
As access to high-throughput sequencing technology increases, the bottleneck in biomedical research has shifted from generating data to analyzing and integrating diverse data types. To address these needs and help equip students and postdocs with a toolkit for data mining and interrogation, we developed the first semester-long course at UPenn that focuses specifically on studying global gene expression (transcriptomics) through the use of the R programming environment and the Bioconductor suite of software packages, a versatile and robust collection of tools for bioinformatics, statistics, and plotting. Students participate in a mix of lectures and guided code review, all while working with real datasets directly on their own laptops. Students learn to analyze RNAseq data using a lightweight and reusable set of modular scripts that leverage open-source software. In addition, students learn best practices in data science for working in R/Bioconductor, including creating interactive data visualizations, making their analyses transparent and reproducible, and identifying experimental bias in large datasets.
A public database for microbiome research
We engage with the public through our free and open-source database project, microbiomeDB. We believe this database provides a powerful way for teachers to explore microbiomes with their students, without requiring prior knowledge of microbial bioinformatics. As we continue to develop this resource, we hope to build more tools for the classroom. If you are a teacher interested in bringing microbiome research into your classroom, we’d love to hear from you!
Workshops in cloud computing for metagenomics
We work with our colleagues in the PennCHOP Microbiome Center to host an annual workshop that teaches students to use cloud computing resources for the analysis of shotgun metagenomic data. You can see a walk-through of our half-day workshop here.
Promoting transparency and reproducibility in bioinformatics
We believe that it is imperative for trainees to learn about tools for reproducible research early in their scientific careers, and we try to lead by example. Our recent research publications include detailed Supplementary Code Files generated using Rmarkdown (see example here). We have recently taken this a step further by partnering with CodeOcean to deploy fully reproducible code ‘capsules’ built on Docker. These capsules accompany our recent papers, often allowing all figures to be reproduced with the click of a single button, without the need to download a single script or install a piece of software. For example, our recent paper had a code capsule embedded directly in the journal website, thereby fully coupling results with analyses and marking the first time this had been done for any AAAS journal. You can also view and interact with one of our capsules below, originally from this paper.