August 2020: Hands on Machine Learning Workshop

Our August Virtual Workshop

Five months into the U.S. Coronavirus pandemic R-Ladies Philly continues to host monthly events virtually. In August we hosted “Hands-On Machine Learning Workshop” by Jaclyn Taroni. You can view a recording of the event on our YouTube channel . The content for the main workshop is available online here.


Our Presenter Jaclyn Taroni, is a Principal Data Scientist at the Childhood Cancer Data Lab. She was a Postdoctoral Researcher in the Greene Lab at the University of Pennsylvania Perelman School of Medicine, where she worked on cross-platform normalization of gene expression data and unsupervised transfer learning of transcriptomic data for rare diseases. She conducted her thesis work in the Whitfield Lab at the Geisel School of Medicine at Dartmouth studying the rare autoimmune disease systemic sclerosis, where she developed novel frameworks for analyzing high-throughput molecular data from multiple tissues, clinical manifestations, and drug trials.

Consensus Clustering

Jaclyn used medulloblastoma (common brain cancer) expression data to demonstrate consensus clustering in the machine learning workshop. Typically, data rows are assumed to observations and columns assumed to be variables but gene expression data is transposed, with rows representing genes (features) and columns representing samples. Jacki shows how to use ConsensusClusterPlus package to estimate the number of unsupervised clusters with quantitative and visual ‘stability’ evidence.

Dimensionality Reduction

Because gene expression data has hundreds of thousands of features (genes), reducing the number of dimensions is an important step. Jaclyn introduced several widely-used dimension reduction methods, including PCA, UMAP and t-SNE, and demonstrated their application in R. For the grand finale, Jaclyn explained a domain-specific dimension reduction technique called Pathway-Level Information ExtractoR (PLIER) that learns correlated patterns of gene expression data and extracts latent variables (LVs) to get a low-dimensional representation of the data. It is excellent strategy for biological discovery. For more information about PLIER, please read Mao et al. (2019)

Thank you :)

Big, big thanks to Jaclyn Taroni for preparing materials and sharing her time and knowledge with the R-Ladies Philly community. And thanks to R-Ladies Global for sponsoring the Zoom meeting.

Get Involved!

R-Ladies Philly has a Slack workspace! Join the conversation, learn more about upcoming events, ask a questions in the #r-help channel or find out about new opportunities in the #jobs channel. Join slack here. You can also subscribe to the R-Ladies Philly YouTube channel for recordings of past and future events. And of course, follow us on twitter @RLadiesPhilly.