Sarah C. Shi


sarahshi@berkeley.edu / C.V. / GitHub / Google Scholar

I am a Ph.D. student in the Department of Earth and Planetary Science at UC Berkeley, advised by Professor Penny Wieser. My research is at the intersection of volcanology, geochemistry, and machine learning. I am interested in how minerals and melt inclusions record information about Earth’s magmatic systems, information that can help us better understand eruption triggers, crustal processes, and the planet’s past.

My research combines microanalysis and computational modeling to probe how magmas evolve and move through the crust prior to eruption. I have developed open-source tools such as PyIRoGlass, which quantifies volatiles in volcanic glasses with Bayesian inference, and mineralML, which classifies minerals probabilistically using machine learning. I have applied these tools to samples ranging from arc basalts and basaltic andesites to Icelandic xenoliths. I am especially interested in making geochemical datasets more accessible, reproducible, and insightful through open science.

Prior to starting my Ph.D. at Berkeley, I completed an M.Phil. at the University of Cambridge and my B.A. at Columbia University. Following my time in Cambridge, I worked as a Data Science Fellow in the Geoinformatics Research Group at the Lamont-Doherty Earth Observatory, Columbia University, where I developed open-source tools to streamline and automate geochemical data analysis.

When I am not thinking about rocks, I enjoy hiking, rowing, fermenting things, and playing with sound.

Research

I am captivated by magma: how it forms, migrates, and ultimately erupts at the Earth’s surface. Beneath each volcano lies a dynamic, evolving magmatic system shaped by mixing, storage, and degassing through time. Minerals and melt inclusions act as time capsules, preserving records of these deep processes in their chemistry and textures. Through them, we can begin to reconstruct the hidden histories of magmatic systems and better understand what leads to explosive volcanic activity.

My research brings together microanalytical data and computational modeling to investigate how magmas evolve and interact in the crust. I develop open-source tools that use Bayesian inference and machine learning to extract signals from complex geochemical data. These tools help reduce uncertainty, scale analyses across large datasets, and surface patterns that may not be obvious by eye. My current work focuses on two main questions: (1) How can we use diffusion modeling and volatile measurements to constrain timescales of magma mixing and depths of magma storage before eruption? (2) How can probabilistic machine learning approaches help classify minerals and melts in mapped scanning electron microscopy (SEM) data and large geochemical databases, revealing new relationships and improving data reliability?
Autoencoder
Leveraging Bayesian neural networks with variational inference and autoencoders to classify common minerals in global geochemical data repositories and in EDS maps.
QEMSCAN zoning
Using machine learning to automate petrographic and mineralogical observations from energy dispersive X-ray spectroscopy (EDS) and electron microprobe (EPMA) data.
FTIR Baselines
Developing PyIRoGlass, an open-source package that provides a reproducible and documented method for reducing FTIR spectra, with routines for determining concentrations of H2O and CO2.
Olivines
Probing syn-eruptive dynamics of the Fuego 2018 eruption with diffusion chronometry in olivine and volatiles in melt inclusions.
Thermometry
Developing new olivine-saturated melt geothermometers with inversions for reducing temperature uncertainties.

Code

Open, accessible, and reproducible science is the future. I am deeply interested in using statistics and machine learning to develop open-source tools for petrologic questions.
PyIRoGlass, a Bayesian MCMC algorithm for fitting baselines to the FTIR spectra of basaltic-andesitic glasses
Manuscript / Code DOI / GitHub Repository
mineralML, a Python package leveraging machine learning for probabilistic mineral classification in repository and analytical data
Manuscript Forthcoming / Code DOI Forthcoming / GitHub Repository

Publications

[6, in preparation] Shi, S.C., Wieser, P.E., Toth, N., Antoshechkina, P., Lehnert, K. (2025). mineralML: Leveraging Machine Learning for Probabilistic Mineral Classification in Geochemical Databases. JGR: Machine Learning and Computation.
[5, in review] Toth, N., Shi, S.C., Maclennan, J., Tung, P.Y. (2025). EDS Analysis for Petrology: a Probabilistic Framework with GPyEDS. JGR: Machine Learning and Computation.
[4, in review] Wieser, P.E., Shi, S.C., Gleeson, M., Rangel, B., DeVitre, C., Bearden, A., Lynn, K., Trusdell, F., Camille-Caumon, M. (2025). Fluid inclusion constraints on the geometry of the magmatic plumbing system beneath Mauna Loa: Part 1: Extrusive products. Bulletin of Volcanology.
[3] Gleeson, M., Wieser, P., DeVitre, C., Shi, S.C., Millet, M.-A., Muir, D., Stock, M., Lissenberg, J. (2025). Persistent high-pressure storage beneath a near-ridge ocean island volcano (Isla Floreana, Galapagos). Journal of Petrology.
[2] Moussallam, Y., Towbin, W.H., Plank, T.A., Bureau, H., Khodja, H., Guan, Y., Ma, C., Baker, M.B., Stolper, E.M., Naab, F.U., Monteleone, B.D., Gaetani, G.A., Shimizu, K., Ushikubo, T., Lee, H., Ding, S., Shi, S.C., Rose-Koga, E.F. (2024). ND70 series basaltic glass reference materials for volatile elements (H2O, CO2, S, Cl, F) analysis and the C ionisation efficiency suppression effect of water in silicate glasses in SIMS analysis. Geostandards and Geoanalytical Research.
[1] Shi, S.C., Towbin, W.H., Plank, T.A., Barth, A.C., Rasmussen, D., Moussallam, Y., Lee, H., Menke, W. (2024). PyIRoGlass: An Open-Source, Bayesian MCMC Algorithm for Fitting Baselines to FTIR Spectra of Basaltic-Andesitic Glasses. Volcanica.

Conferences

[17] Moussallam, Y., Towbin, H., Plank, T., Bureau, H., Khodja, H., Guan, Y., Baker, M.B., Stolper, E., Naab, F., Monteleone, B.D., Gaetani, G., Shimizu, K., Lee, H., Ushikubo, T., Ding, S., Shi, S.C., Rose-Koga, E.F., Development of Basaltic Glass Reference Materials for Volatile Element Analysis (H2O, CO2, S, Cl, F) and Investigation of Water-Induced C Ionization Suppression in Silicate Glasses Using SIMS, Goldschmidt 2025 (Poster).
[16] Gleeson, M., Wieser, P., DeVitre, C., Shi, S.C., Millet, M.-A., Muir, D., Stock, M., Lissenberg, J., Persistent Magma Storage in the Mantle Across 2.5 Myrs of Ocean-Island Volcanism, GSA 2024 (Talk).
[14] Shi, S.C., Wieser, P., Toth, N., Antoshechkina, P., Lehnert, K., mineralML: Leveraging Machine Learning for Probabilistic Mineral Classification, Gordon Research Seminar, Geochemistry of Mineral Deposits (Invited Talk).
[13] Tweedy, R., Shi, S.C., Uno, K.T., Machine Learning Analysis of n-Alkanes from Woody and Grassy African Plants, NE Geobiology Conference 2024 (Talk).
[12] Shi, S.C., Wieser, P., Toth, N., Antoshechkina, P., Lehnert, K., MIN-ML: Leveraging Machine Learning for Probabilistic Mineral Classification in Geochemical Databases, AGU 2023 (Talk).
[11] Tweedy, R., Shi, S.C., Uno, K.T., African Plant Functional Type Identification from n-Alkanes Chain Lengths via Non-Linear Methods, AGU 2023 (Talk).
[10] Bidgood, A., Shi, S.C., Prabhu, A., Que, X., Twigg, H., Using Supervised and Unsupervised Machine Learning Methods to Predict Missing Geochemical Data and Determine Geochemical Trends in Multielement Systems: Application to Sediment-Hosted Ore Deposits, AGU 2023 (Poster).
[9] Prabhu, A., Wong, M.L., Morrison, S.M.M., Ostroverkhova, A., Clark, M., Zhong, H., Prestgard, T.J., Li, W., Williams, J.R., Shi, S.C., Mays, J., Hazen, R., From detecting agnostic biosignatures to characterizing chondrites: How network science is perfect for making scientific discoveries with geochemical data, AGU 2023 (Invited Talk).
[8] Shi, S.C., Wieser, P., Toth, N., Antoshechkina, P., Lehnert, K., MIN-ML: A Machine Learning Framework for Exploring Mineral Relations and Classifying Common Igneous Minerals, Goldschmidt 2023 (Invited Workshop Talk).
[7] Shi, S.C., Wieser, P., Lehnert, K., Profeta, L., MIN-ML: A Machine Learning Framework for Exploring Mineral Relations and Classifying Common Igneous Minerals, EGU 2023 (Talk).
[6] Tweedy, R., Shi, S.C., Uno, K.T., Grass in the Past: Eastern African Chemotaxonomy from Plant Wax n-alkanes, AGU 2022 (Poster).
[5] Shi, S.C., Barth, A.C., Plank, T.A., Towbin, W.H., Flores, O., Arias, C.P., Magma stalling weakens eruption: Uncertainty quantification in thermometry and volatile measurements, VMSG 2022 (Talk).
[4] Toth, N., Shi, S.C., Maclennan, J., Automated petrography using machine learning, VMSG 2022 (Poster).
[3] Shi, S.C., Barth, A.C., Plank, T.A., Towbin, W.H., Magma stalling weakens eruption, AGU 2021 (Talk and ePoster).
[2] Shi, S.C., Cerling, T.E., Uno, K.T., What plant is that? Chemotaxonomy from n-alkane molecular distributions of East African plants with implications for paleoecology, AGU 2018 (Poster).
[1] Shi, S.C., Cerling, T.E., Uno, K.T., Resolving taxonomy with n-alkane molecular distributions of East African plants, Columbia University Chandler Society Research Symposium (Invited Talk).


Teaching

I developed a suite of Jupyter notebooks designed to teach computational basics, statistical thinking, machine learning, and petrology. These materials support the educational mission of the IEDA2 data infrastructure, funded by the U.S. National Science Foundation through a cooperative agreement. All notebooks are available on my earthchem-teaching GitHub repository.

python_fundamentals

The python_fundamentals notebook introduces students to the Python programming language. It covers core topics including basic syntax, numerical operations with NumPy, data manipulation with pandas, and visualization using matplotlib.

SERC_MORB_colab

The SERC_MORB_colab notebook was developed for and taught in the Earth’s Environmental Systems: Solid Earth course at Columbia University. It guides students through the analysis of mid-ocean ridge basalt (MORB) data, highlighting trends in mantle melting, crustal thickness, and seismic velocities. The notebook emphasizes hands-on learning with global datasets to foster insight into large-scale geodynamic processes.

mineralML_colab

The mineralML_colab notebook was developed for the Goldschmidt 2023 Conference Workshop – Open Data in Geochemistry and the NFDI4Earth Lecture Series. It provides tools for visualizing petrologic mineral data and applying machine learning to classify minerals. The notebook leverages large datasets from PetDB and GEOROC to demonstrate scalable approaches to mineralogical analysis.



Field

Poás Volcano, Costa Rica
Highlands, Iceland
Aberdare National Park, Kenya
Cornwall (ESB Field Geology), UK