diff --git a/source/_posts/DPA4_05_22_2026.md b/source/_posts/DPA4_05_22_2026.md
index f84008a..f66ea75 100644
--- a/source/_posts/DPA4_05_22_2026.md
+++ b/source/_posts/DPA4_05_22_2026.md
@@ -5,76 +5,74 @@ categories:
- DPA
---
-Recently, the OpenLAM Team of the Beijing Academy of Artificial Intelligence for Science, Peking University, DeepModeling Technology, and the Institute of Applied Physics and Computational Mathematics have jointly launched DPA4, a new-generation model architecture tailored for the era of Large Atomic Models (LAMs). DPA4 claimed the top spot worldwide with its comprehensive performance score (CPS) on Matbench Discovery, an authoritative global benchmark for materials discovery, emerging as the latest State-of-the-Art (SOTA) model.
+Recently, the OpenLAM Team of the Beijing Academy of Artificial Intelligence for Science (AISI), Peking University, DeepModeling Technology, and the Institute of Applied Physics and Computational Mathematics jointly launched DPA4, a new-generation model architecture tailored for the era of Large Atomic Models (LAMs). DPA4 took the top spot worldwide by comprehensive performance score (CPS) on Matbench Discovery, an authoritative global benchmark for materials discovery, making it the latest state-of-the-art (SOTA) model.
-What makes DPA4 particularly remarkable is that it achieved this milestone not by expanding model parameters or relying on massive computing power. The previous leading model eSEN required over 300 GPU days for training, while DPA4 can theoretically reach the same accuracy with just a single consumer-grade RTX 5090 running for approximately one day. Meanwhile, its parameter count is less than one-tenth of that of eSEN.
+DPA4’s highlight is its very low training cost: the previous leading model, eSEN, needed over 300 GPU days of training, whereas DPA4 can theoretically reach the same accuracy on a single consumer-grade RTX 5090 in roughly one day, with less than one-tenth of eSEN’s parameter count.
-In short, the SOTA-level accuracy that once demanded exorbitant supercomputing budgets can now be achieved using a single consumer graphics card. DPA4 is redefining the Pareto frontier between accuracy and efficiency for large atomic models.
+In short, the SOTA-level accuracy once reliant on costly supercomputing is now accessible via a single consumer graphics card. DPA4 reshapes the accuracy-efficiency Pareto frontier of large atomic models.
-*Official Screenshot of Matbench Discovery (Data as of May 22, 2026)*
+###### *Official Screenshot of Matbench Discovery (Data as of May 22, 2026)*
-Adopting SO(2) equivariant linear operators combined with attention mechanisms under local coordinate systems, DPA4 strictly complies with translation, rotation, permutation symmetries and energy conservation while drastically cutting the overhead of equivariant computations. It has also pioneered compile-enabled training for machine learning potential functions globally, boosting training speed by 2 to 3 times. DPA4 secured new SOTA results and ranked first worldwide on both Matbench Discovery, the leading benchmark for materials discovery, and SPICE-MACE-OFF, a benchmark for small molecules. It sets a brand-new Pareto frontier in terms of prediction accuracy and training costs: a single RTX 5090 can finish training in around one day to match the accuracy achieved by eSEN after more than 300 GPU days. Its parameter volume is less than one-tenth of eSEN’s, and under equivalent accuracy, its training efficiency is roughly 10 times higher than that of its predecessor DPA3.
+DPA4 combines SO(2) equivariant linear operators with attention in local coordinate systems. It satisfies translation, rotation, and permutation symmetries as well as energy conservation, while drastically cutting the cost of equivariant computation. It is also the first machine learning potential worldwide to support compile-enabled training, which speeds up training by 2–3 times. It ranks first on two key benchmarks, Matbench Discovery and SPICE-MACE-OFF, and sets a new trade-off between accuracy and training cost: a single RTX 5090 can train it in about one day to match the accuracy eSEN reached after more than 300 GPU days; its parameter count is under one-tenth of eSEN’s; and at equal accuracy its training efficiency is about 10 times that of its predecessor, DPA3.
-
-
-*DPA4 Redefining the Accuracy-Efficiency Pareto Frontier for Large Atomic Models (Including Other Direct Force Pre-trained Models)*
-
-Currently, DPA4 is available for early access within the DeepModeling community. Its research paper and official full version will be open-sourced in due course. Researchers are welcome to stay tuned and join the WeChat group at the end of this article for exchanges. Below is a detailed introduction to DPA4.
+DPA4 is available for early access within the DeepModeling community. Its research paper and full official code will be open-sourced later; readers are welcome to join the WeChat group listed at the end of this article for discussion. Below is a condensed technical introduction.
-## 1. DPA4 Model Architecture: SO(2) Equivalent Design Under Local Coordinate Systems
+## 1. DPA4 Model Architecture: SO(2) Equivariant Design in Local Coordinate Systems
-For a long time, to maintain rotational symmetry in global coordinate systems, equivariant models had to adopt Clebsch–Gordan tensor products to couple geometric features of different orders. The computational complexity surges sharply with the increase of angular momentum order, which is the fundamental reason why high-precision equivariant models incur enormous computational costs.
+Traditionally, SO(3) equivariant models rely on Clebsch–Gordan tensor products to preserve rotational symmetry. Their computational cost grows steeply, roughly as $$O(l_{max}^6)$$, with the maximum angular momentum order $$l_{max}$$, which is the core reason high-precision equivariant models demand massive computing resources.
-The core design philosophy of DPA4 is to reduce symmetry processing to simpler subgroups instead of performing costly tensor product operations in global coordinate systems. Specifically, DPA4 constructs a smooth local coordinate system for each interatomic bond and aligns the direction of the bond to a unified reference axis. In such local coordinate systems, the rotational equivariance that originally needed to be processed across the entire SO(3) group is simplified to processing on the SO(2) subgroup for axial rotation. As an abelian group, SO(2) enables equivariant linear mappings with an extremely concise block structure. Consequently, the computationally expensive SO(3) tensor products are replaced with highly efficient SO(2) equivariant linear operators. While fully preserving complete rotational equivariance, this design drastically reduces the computational overhead for angular calculations.
+DPA4’s core idea is to avoid expensive SO(3) tensor products in the global coordinate system by reducing symmetry handling to the SO(2) subgroup. It builds a smooth local coordinate frame for each interatomic bond and aligns the bond direction with a common reference axis. In that frame, only rotations about the bond axis remain, so equivariance needs to be handled only on SO(2). Because SO(2) is abelian, its equivariant linear maps have a simple block structure, and these replace the heavy SO(3) tensor products. Full rotational equivariance is preserved while the cost of angular computation drops sharply.
On this basis, DPA4 further incorporates attention mechanisms to aggregate information from neighboring atoms. The model can adaptively focus on the most critical atomic interactions for the central atom according to local geometric and chemical environments, thus delivering strong expressive power with a compact parameter scale. The entire model strictly adheres to translation, rotation, permutation symmetries and energy conservation, ensuring full physical consistency.
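
To make the SO(2) block structure concrete, here is a minimal sketch in plain Python. It is not DPA4's implementation. It assumes that, in the bond-aligned frame, features of order m > 0 are stored as a pair (x_{+m}, x_{-m}), that a rotation by angle phi about the bond axis rotates that pair by m*phi, and that a linear map acts on the pair through a 2×2 block of the form [[a, -b], [b, a]]. All function names and weights are hypothetical.

```python
import math

def rotate_pair(pair, m, phi):
    """Rotate the (x_{+m}, x_{-m}) pair by angle m*phi about the bond axis."""
    c, s = math.cos(m * phi), math.sin(m * phi)
    x, y = pair
    return (c * x - s * y, s * x + c * y)

def so2_linear(pair, a, b):
    """Apply the 2x2 block [[a, -b], [b, a]], which commutes with 2D rotations."""
    x, y = pair
    return (a * x - b * y, b * x + a * y)

def equivariance_error(pair, m, phi, a, b):
    """Largest gap between 'rotate then map' and 'map then rotate'."""
    lhs = so2_linear(rotate_pair(pair, m, phi), a, b)
    rhs = rotate_pair(so2_linear(pair, a, b), m, phi)
    return max(abs(l - r) for l, r in zip(lhs, rhs))

# Hypothetical feature pair and weights, for illustration only
err = equivariance_error(pair=(0.3, -1.2), m=2, phi=0.7, a=0.5, b=-0.8)
print(f"equivariance error: {err:.2e}")
```

The error stays at floating-point rounding level for any angle, which is the property that lets a cheap per-order linear map stand in for a coupled tensor product in this simplified picture.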
-Beyond algorithmic innovations, DPA4 is also optimized for efficiency in engineering implementation:
-1. Native torch.compile Support: The model is designed to be compiler-friendly from the ground up. It can leverage torch.compile for end-to-end acceleration without additional code modifications.
-2. Native ZBL Short-range Potential: DPA4 natively integrates the ZBL repulsive potential, which smoothly describes physical behaviors at short interatomic distances and enhances the model’s robustness under extreme configurations such as high pressure, irradiation and material defects.
+Two major engineering optimizations further lift efficiency:
+1. Native torch.compile compatibility: end-to-end acceleration without extra code edits
+2. Embedded ZBL short-range repulsive potential: stabilizes simulations under high pressure, irradiation, defects and other extreme atomic configurations
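
For readers unfamiliar with ZBL, the sketch below evaluates the standard Ziegler-Biersack-Littmark screened Coulomb repulsion between two nuclei. It uses the commonly published universal screening coefficients and leaves out any smoothing or switching to the learned potential, since the source does not describe how DPA4 blends the two. The function name and units (Å, eV) are choices made for this example.

```python
import math

COULOMB_EV_ANGSTROM = 14.399645  # e^2 / (4*pi*eps0) in eV*Angstrom

# (coefficient, exponent) pairs of the universal ZBL screening function
ZBL_TERMS = (
    (0.18175, 3.19980),
    (0.50986, 0.94229),
    (0.28022, 0.40290),
    (0.02817, 0.20162),
)

def zbl_energy(r, z1, z2):
    """Screened nuclear repulsion in eV for atomic numbers z1, z2 at distance r (Angstrom)."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    # Screening length in Angstrom
    a = 0.46850 / (z1 ** 0.23 + z2 ** 0.23)
    x = r / a
    screening = sum(c * math.exp(-d * x) for c, d in ZBL_TERMS)
    return COULOMB_EV_ANGSTROM * z1 * z2 / r * screening

# Repulsion between two silicon atoms grows steeply as they approach
for distance in (2.0, 1.0, 0.5):
    print(distance, zbl_energy(distance, 14, 14))
```

The energy is always positive and rises rapidly at short range, which is the behavior that keeps closely approaching atoms from collapsing in high-pressure or irradiation simulations.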
-*DPA4 Model Structure Diagram*
+###### *DPA4 Model Structure Diagram*
-## 2. Benchmark Performance: Double Championship on Matbench Discovery and SPICE-MACE-OFF
+## 2. Benchmark Performance: Dual No.1 Titles
-Materials Discovery: World’s No.1 on Matbench Discovery. Initiated by top-tier institutions including the University of California, Berkeley and the University of Cambridge, Matbench Discovery is the most influential dynamic benchmark for AI-driven inorganic materials discovery worldwide and is widely recognized as the global gold standard for evaluating intelligent models in materials science. Different from simple static data fitting, it adopts a forward-looking testing mechanism that requires models to predict the thermodynamic stability of hundreds of thousands of unknown crystals, truly replicating the whole process of scientific research and exploration. Its evaluation system takes multiple metrics into account, including prediction accuracy of energy and force, F1 score and discovery acceleration factor, which are finally integrated into the comprehensive performance score CPS. Competing against state-of-the-art models from Meta, Microsoft and world-leading universities, DPA4 claimed the global top rank by its outstanding CPS score and became the new SOTA model.
+Matbench Discovery: Global No. 1 for Materials Discovery. Launched by institutions including the University of California, Berkeley and the University of Cambridge, Matbench Discovery is a leading dynamic benchmark for AI-driven inorganic materials discovery and is widely regarded as a gold standard in the field. Rather than static data fitting, it tests a model’s ability to predict the thermodynamic stability of hundreds of thousands of unknown crystals, mimicking real exploratory research. The final comprehensive performance score (CPS) integrates energy and force accuracy, F1 score, and the discovery acceleration factor. Competing against SOTA models from Meta, Microsoft, and leading universities, DPA4 took the top CPS rank.
-Small Molecules: Leading Performance on SPICE-MACE-OFF. DPA4’s advantages extend far beyond inorganic crystals. It achieved a new SOTA record on SPICE-MACE-OFF, a leading benchmark for molecular research. With a smaller parameter size, it outperformed the former leading model eSEN to take the first place. DPA4 demonstrates consistent superior performance across inorganic crystals and organic small molecules, as well as in energy and force prediction, further proving its potential as a general-purpose potential energy surface model.
+SPICE-MACE-OFF: Top Small-Molecule Performance. DPA4’s strengths extend beyond inorganic crystals: it set a new SOTA record on the SPICE-MACE-OFF small-molecule benchmark. With a smaller parameter count, it outperformed the former leading model, eSEN, to take first place, supporting its potential as a general-purpose potential energy surface model for both crystals and organic small molecules.
-*Performance on SPICE-MACE-OFF*
+###### *Performance on SPICE-MACE-OFF*
-## 3. Efficiency Comparison: Redefining the Accuracy-Cost Pareto Frontier
+## 3. Efficiency Breakthrough: A New Precision-Cost Pareto Frontier
-While the double championship across benchmarks validates DPA4’s exceptional accuracy, its true distinction lies in the remarkably low computational cost required to achieve such performance.
+Top benchmark results confirm DPA4’s accuracy, but its distinguishing value is the low training cost at which it gets there. Topping leaderboards has traditionally meant more parameters and more computation; DPA4 breaks this pattern:
+- Training cost: theoretically, one consumer-grade RTX 5090 running for about one day reaches the accuracy eSEN attained after 300+ GPU days
+- Parameter scale: less than 1/10 of eSEN’s at equivalent CPS
+- Generational improvement: about 10× the training efficiency of DPA3 at equal prediction accuracy
+
+
-Traditionally, topping benchmark leaderboards has always been associated with larger parameter sizes and higher training costs. DPA4, however, sets a new Pareto frontier in both accuracy and training cost:
-- Training Cost: Theoretically, a single consumer-grade RTX 5090 graphics card running for about one day can help DPA4 reach the accuracy level that eSEN, the previous SOTA model, attained after more than 300 GPU days of training.
-- Parameter Scale: Under the same CPS performance, DPA4’s parameter count is less than one-tenth of eSEN’s.
-- Generational Improvement: With equivalent prediction accuracy, DPA4’s training efficiency is approximately 10 times higher than its predecessor DPA3.
+###### *DPA4 Redefining the Accuracy-Efficiency Pareto Frontier for Large Atomic Models*
-Such substantial efficiency gains rely on multi-level engineering optimizations. While torch.compile can bring free performance boosts to conventional AI model training, machine learning potential training involves double backward computation since force is the derivative of energy. Given that compile does not support double backward, researchers had long been forced to expand batch sizes to maximize GPU utilization in potential function training. DPA4 is the first model worldwide to support compile-enabled training for machine learning potentials. Combined with automatic mixed precision (autocast) to bf16, it drastically reduces video memory consumption, laying a solid foundation for training larger models on a single graphics card.
+These efficiency gains come from engineering optimizations at several levels. In ordinary model training, torch.compile gives a largely free speedup, but training a machine learning potential requires a double backward pass because force is the derivative of energy. That requirement long blocked the use of torch.compile and forced researchers to enlarge batch sizes to keep GPUs busy. DPA4 is the first machine learning potential worldwide to support compile-enabled training. Combined with bf16 automatic mixed precision (AMP), it sharply reduces VRAM usage, making training on a single graphics card practical.
*Comparison of Training Time and Peak Video Memory Usage with DPA4’s Compile and AMP Enabled*
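
The reason force training needs a second derivative can be seen on a toy model. The sketch below uses a hypothetical one-parameter harmonic energy, unrelated to DPA4's architecture. It shows that the gradient of a force-matching loss with respect to the parameter k contains the mixed derivative of the energy in k and r. In an autograd framework, that term is what the double backward pass computes. The analytic results are checked against central finite differences.

```python
def energy(r, k, r0=1.0):
    """Toy harmonic energy E(r; k) = 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2

def force(r, k, r0=1.0):
    """Force is the negative derivative of the energy with respect to position."""
    return -k * (r - r0)

def force_loss(r, k, f_ref, r0=1.0):
    """Squared error between the predicted force and a reference force."""
    return (force(r, k, r0) - f_ref) ** 2

def force_loss_grad_k(r, k, f_ref, r0=1.0):
    """Analytic dL/dk; the factor dF/dk equals -d2E/(dk dr), a second derivative of E."""
    d_force_dk = -(r - r0)
    return 2.0 * (force(r, k, r0) - f_ref) * d_force_dk

def central_difference(fn, x, h=1e-5):
    """Numerical derivative of fn at x."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

# Hypothetical values, for illustration only
r, k, f_ref = 1.3, 2.0, -0.5
numeric_force = -central_difference(lambda pos: energy(pos, k), r)
numeric_grad = central_difference(lambda kk: force_loss(r, kk, f_ref), k)
print("force:", force(r, k), "vs numeric:", numeric_force)
print("dL/dk:", force_loss_grad_k(r, k, f_ref), "vs numeric:", numeric_grad)
```

An energy-only loss would need just the first derivative in k, so the extra differentiation appears only once forces enter the loss.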
-This breakthrough means researchers can complete model training and iteration faster, and simulate microscopic processes at larger scales and over longer time spans with the same computing budget. DPA4 has transformed large-scale, high-throughput atomic simulation from a "computing luxury" into a universally accessible tool, which holds significant application value for research and development in battery materials, catalyst design, semiconductor exploration and other fields.
+This breakthrough lets researchers iterate on models faster and, with the same computing budget, run microscopic simulations at larger scales and over longer time spans. DPA4 turns high-throughput atomic simulation from a costly luxury into an accessible tool, with application value for battery materials R&D, catalyst design, semiconductor exploration, and other fields.
## Summary
-DPA4 is a new-generation general-purpose potential function architecture built for the era of Large Atomic Models (LAMs). By combining SO(2) equivariant linear operators under local coordinate systems with attention mechanisms, it maintains full physical symmetry and energy conservation while greatly lowering the overhead of equivariant computations. Further optimized via native torch.compile acceleration and built-in ZBL potential, it delivers enhanced performance at the engineering level.
+DPA4 is a next-generation, general-purpose potential architecture for the LAM era. Its SO(2) equivariant linear operators in local coordinate systems, combined with attention, maintain physical symmetry and energy conservation while cutting computation costs; native torch.compile acceleration and a built-in ZBL potential further improve engineering performance.
-DPA4 claimed the top position on both Matbench Discovery (materials discovery) and SPICE-MACE-OFF (small molecules). It establishes a brand-new Pareto frontier balancing accuracy and training cost: it matches or surpasses previous high-cost large models with less than one-tenth of the parameters and just one day of training on a single consumer graphics card. DPA4 fully proves that high accuracy and high efficiency are no longer mutually exclusive.
+It ranked first on both Matbench Discovery and SPICE-MACE-OFF, matching or surpassing prior high-cost large models with less than one-tenth of the parameters and about one day of training on a single consumer GPU. DPA4 shows that high accuracy and high efficiency can be achieved together.
-At present, DPA4 is open for early access within the DeepModeling community. Its research paper and official full release will be open-sourced sequentially in the future. On the journey toward the era of Large Atomic Models, we will always adhere to the principle of open-source and open collaboration. We welcome researchers to keep track of our progress, join our community and explore together.
+Early access is open to DeepModeling community members; the research paper and the full open-source release will follow. The team remains committed to open collaboration in LAM research and invites researchers to follow its progress and join the community.
## Core Developers & Affiliated Institutions
Li Tiancheng (Peking University, AISI)