CrystalDiff: Conditional Generative AI for Material Discovery
A geometric deep learning framework for the inverse design of 3D crystal structures using E(n)-Equivariant Denoising Diffusion Probabilistic Models (DDPM).
Technical Achievements & Work Done
This project was built to demonstrate full-stack AI research engineering, from mathematical implementation to scientific validation and deployment.
Core Implementations:
- Geometric Deep Learning from Scratch: Engineered custom E(n)-Equivariant Message Passing layers in pure PyTorch. I bypassed high-level graph wrapper libraries to explicitly program rotation and translation equivariance of the coordinates and distance-based (invariant) message passing.
- Custom DDPM Architecture: Implemented a continuous-time Denoising Diffusion Probabilistic Model to generate 3D point clouds from Gaussian noise.
- Property-Conditioned "Inverse Design": Modified the diffusion backbone to accept macroscopic target properties (e.g., Band Gap). The model embeds this scalar into the reverse diffusion process to bias the generation toward desired material constraints.
- Automated Scientific Validation: Wrote analytical scripts to compute the Radial Distribution Function (RDF) of the generated coordinates, providing empirical evidence that the model learned short-range atomic exclusion (Pauli Exclusion) and covalent bond-length limits without hard-coded physics rules.
- End-to-End Application: Deployed the inference pipeline into an interactive Streamlit web application, featuring dynamic 3D molecule rendering (`py3Dmol`) and real-time bond length calculations.
- Scientific Data Engineering: Built a robust data pipeline interfacing with the Materials Project API (`mp-api`) to harvest, filter, center-of-mass correct, and tensorize complex stable oxide datasets.
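The center-of-mass correction mentioned in the data pipeline can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's actual code: the function name `center_of_mass_correct` and the optional `masses` argument are hypothetical, and equal masses are assumed when none are given.

```python
from typing import List, Optional, Sequence


def center_of_mass_correct(
    coords: Sequence[Sequence[float]],
    masses: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Translate 3D coordinates so the (mass-weighted) centroid is at the origin."""
    n = len(coords)
    if n == 0:
        return []
    if masses is None:
        masses = [1.0] * n  # assumption: unweighted centroid by default
    total = sum(masses)
    # Mass-weighted mean position along each Cartesian axis.
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    return [[p[k] - com[k] for k in range(3)] for p in coords]
```

Removing the global translation this way means the model only has to learn relative atomic positions, which matches the translation symmetry the equivariant layers are built around.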
Project Overview
The discovery of novel materials (e.g., for solid-state batteries or photovoltaics) is traditionally bottlenecked by the computational cost of Density Functional Theory (DFT). CrystalDiff sidesteps this cost at the proposal stage by using Generative AI to "dream" chemically plausible candidate structures in milliseconds.
Moving beyond simple property prediction, this project serves as a Conditional Generative Foundation Model trained on the Materials Project database to understand the chemical rules of Perovskite Oxides ($ABO_3$).
Architecture & Mathematics
The core model relies on a Time-and-Property-Conditioned Equivariant Graph Neural Network (EGNN).
1. The Forward Process (Data $\to$ Noise)
We progressively corrupt real crystal coordinate structures $x_0$ by adding Gaussian noise $\epsilon$ over $T$ timesteps:
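Assuming the standard variance-preserving DDPM parameterization (the noise schedule $\beta_t$ is not specified here, so the linear schedule used below is an illustrative assumption), with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, the corruption step has the closed form:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

A minimal, dependency-free Python sketch of this step is shown below. It is not the project's PyTorch code; the names `linear_beta_schedule`, `alpha_bar`, and `q_sample` are hypothetical, and timesteps are 0-indexed.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_0 .. beta_{T-1} (assumed schedule)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bar(betas):
    """Cumulative products of (1 - beta_s) up to and including each timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; returns (x_t, eps)."""
    a = alpha_bars[t]
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    # One independent standard-normal draw per coordinate.
    eps = [[rng.gauss(0.0, 1.0) for _ in atom] for atom in x0]
    xt = [
        [signal * c + noise * e for c, e in zip(atom, e_atom)]
        for atom, e_atom in zip(x0, eps)
    ]
    return xt, eps
```

With these defaults and `T = 1000`, the final $\bar{\alpha}_T$ is close to zero, so the last noised sample is almost pure Gaussian noise, which is what lets generation start from noise.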
2. The Conditional Reverse Process (Noise $\to$ Data)
The neural network learns to predict the noise $\epsilon_\theta$. Crucially, it is conditioned on both the time step embedding $t$ and a target physical property embedding $y$ (e.g., Band Gap):
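Under the same standard DDPM assumptions, with $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$, the simplified noise-prediction objective would take the form:

$$\mathcal{L} = \mathbb{E}_{x_0,\, y,\, t,\, \epsilon \sim \mathcal{N}(0, I)} \left[ \left\| \epsilon - \epsilon_\theta(x_t, t, y) \right\|^2 \right]$$

This is a sketch of the usual formulation, not a statement of the exact loss used in this project. In particular, how the embedding of $y$ is injected into the network (for example, by concatenation with the time embedding) is an implementation choice that this section does not specify.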
3. E(n)-Equivariance (The Physics Layer)
To respect physical symmetry (rotating or translating a crystal does not change its internal energy or chemistry), the node positions are updated via equivariant vector steps along the relative position vectors, scaled by invariant edge messages $m_{ij}$:
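In the EGNN formulation of Satorras, Hoogeboom, and Welling (2021), the coordinate update reads:

$$m_{ij} = \phi_e\!\left(h_i^l, h_j^l, \|x_i^l - x_j^l\|^2\right), \qquad x_i^{l+1} = x_i^l + C \sum_{j \neq i} \left(x_i^l - x_j^l\right) \phi_x(m_{ij})$$

The toy sketch below checks the equivariance property numerically. It is dependency-free Python, not the project's PyTorch layer: node features $h$ are omitted, `phi_x` is a hypothetical fixed function standing in for the learned MLP, and $C = 1/(N-1)$ is assumed.

```python
import math


def phi_x(sq_dist):
    """Hypothetical stand-in for the learned MLP: a rotation-invariant scalar weight."""
    return 1.0 / (1.0 + sq_dist)


def egnn_coord_update(coords):
    """One EGNN-style step: x_i += C * sum_j (x_i - x_j) * phi_x(|x_i - x_j|^2)."""
    n = len(coords)
    scale = 1.0 / max(n - 1, 1)  # C = 1 / (N - 1)
    new_coords = []
    for i, xi in enumerate(coords):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            w = phi_x(sum(d * d for d in diff))  # depends on distance only
            for k in range(3):
                shift[k] += w * diff[k]
        new_coords.append([xi[k] + scale * shift[k] for k in range(3)])
    return new_coords


def rotate_z(coords, angle):
    """Rotate all points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in coords]


def max_abs_diff(a, b):
    """Largest coordinate-wise absolute difference between two point sets."""
    return max(abs(p - q) for u, v in zip(a, b) for p, q in zip(u, v))


coords = [[0.0, 0.0, 0.0], [1.9, 0.0, 0.0], [0.0, 1.9, 0.5]]
rotate_then_update = egnn_coord_update(rotate_z(coords, 0.7))
update_then_rotate = rotate_z(egnn_coord_update(coords), 0.7)
equivariance_error = max_abs_diff(rotate_then_update, update_then_rotate)
```

Because the weights depend only on squared distances and the step direction is a difference of positions, `equivariance_error` should sit at floating-point round-off level for any rotation angle.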
Scientific Validation (The Physics Test)
To test whether the model learned actual chemistry and not just random point cloud distributions, we analyzed the Radial Distribution Function (RDF) of the generated crystals.
Key Findings:
- Pauli Exclusion Principle (0.0 to 1.5 Å): The probability density is zero across this range, indicating the model learned that atoms cannot physically overlap (unlike the random noise baseline).
- Covalent Bonding (~1.9 Å): The first sharp peak aligns closely with standard Titanium-Oxygen bond lengths, suggesting the model learned local chemical environments from scratch.
- Lattice Formation: Secondary peaks (> 2.5 Å) are consistent with the generation of longer-range repeating crystalline order.
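The kind of RDF computation behind these findings can be sketched as below. This is a simplified, dependency-free illustration under stated assumptions: it treats the structure as a finite cluster with no periodic boundary conditions (a full crystal analysis would need the minimum-image convention), the normalization is only defined up to a constant, and the function name `radial_distribution` and its defaults are hypothetical.

```python
import math


def radial_distribution(coords, r_max=6.0, n_bins=60):
    """Pair-distance histogram divided by shell volume (finite cluster, no PBC).

    Returns (bin_centers, g), where g is proportional to the RDF.
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d < r_max:
                # min() guards against round-off pushing d / dr up to n_bins.
                counts[min(int(d / dr), n_bins - 1)] += 1
    n_pairs = n * (n - 1) / 2
    centers, g = [], []
    for b, c in enumerate(counts):
        r_lo, r_hi = b * dr, (b + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        centers.append(0.5 * (r_lo + r_hi))
        g.append(c / (n_pairs * shell) if n_pairs else 0.0)
    return centers, g


# Toy cluster: one atom at the origin with three neighbours at 1.95 Å.
toy = [[0.0, 0.0, 0.0], [1.95, 0.0, 0.0], [0.0, 1.95, 0.0], [0.0, 0.0, 1.95]]
centers, g = radial_distribution(toy)
first_peak_bin = next(b for b, v in enumerate(g) if v > 0)
```

For the toy cluster, every bin below the 1.9 to 2.0 Å bin is empty, mirroring the exclusion zone described above, and a second peak appears near 2.76 Å from the neighbour-to-neighbour distances.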
References & Inspiration
- E(n) Equivariant Graph Neural Networks (Satorras, Hoogeboom, & Welling, ICML 2021)
- Crystal Diffusion Variational Autoencoder (CDVAE) (Xie et al., ICLR 2022)
- Scaling deep learning for materials discovery (GNoME) (Merchant et al. / DeepMind, Nature 2023)
