A European Research Team Has Proposed SeaCast, a High-Resolution Regional Ocean Forecasting Model That Can Provide 15-Day Forecasts in Just 20 Seconds

Marine forecasting systems play an indispensable supporting role in shipping safety, aquaculture management, coastal risk prevention, and marine ecological monitoring. In the past, these systems relied primarily on numerical models based on physical equations. Take the Mediterranean Forecasting System (MedFS) within the Copernicus Marine Environment Monitoring Service (CMEMS) as an example: it employs a two-way coupled wave-current numerical model to provide operational ocean forecasts up to 10 days ahead at a horizontal resolution of approximately 4 kilometers (1/24°). It has become the recognized forecasting reference for the Mediterranean region.

However, high accuracy often comes with enormous computational costs. MedFS requires 89 CPU cores and takes about 70 minutes to complete a 10-day forecast, outputting a complete ocean state field covering 141 depth layers. This computational load limits its use in time-critical applications such as rapid scenario simulation or ensemble forecasting, making it difficult to respond fully to emergency needs in actual operations.

Machine learning-based weather forecasting has made significant progress in recent years. Leveraging architectures such as Transformers, neural operators, and graph neural networks, machine learning methods have achieved performance comparable or even superior to traditional numerical forecasting on a global scale. However, transferring this success to high-resolution regional ocean forecasting faces several challenges: irregular land-sea distribution, complex lateral boundary conditions, and the need to represent vertically stratified variables in detail make it difficult to adapt existing global-scale ocean AI models directly to regional tasks.

To address this gap, a joint research team from the University of Helsinki in Finland, the Euro-Mediterranean Center on Climate Change (CMCC), and the University of Salento in Italy developed SeaCast, a graph neural network model designed specifically for regional ocean forecasting. The model makes three key technical contributions. By adapting graph construction, training, and evaluation, it handles the irregular geometry of the ocean grid. By introducing near-surface atmospheric forcing fields, it strengthens the physical grounding of the forecast. By coupling lateral boundary forcing, it represents the inflow and outflow of seawater and stays consistent with the global ocean circulation, thereby achieving high-accuracy prediction of ocean conditions.

Research highlights:

* This study proposes SeaCast, a high-resolution regional ocean forecasting machine learning model based on graph neural networks.

* The model learns directly from historical reanalysis and analysis data to forecast key variables of the Mediterranean Sea. It outperforms the operational MedFS model across the forecast variables and at nearly all vertical levels.

* Once trained, the model completes a 15-day forecast across 18 vertical levels on a 1/24° grid in just 20 seconds on a single GPU, far faster than the physics-based model running on a CPU cluster.

Paper address:

https://www.nature.com/articles/s41598-025-31177-w

Datasets: Ocean state, atmospheric forcing, lateral boundary, and satellite validation data

The dataset constructed in this study covers four major categories: ocean state, atmospheric forcing, lateral boundary forcing, and satellite observations. Together they support the training, validation, and testing of the SeaCast model.

Ocean state data primarily originates from the Mediterranean Ocean Physical Analysis and Forecasting System. The system is built upon a two-way coupling of the NEMO v4.2 ocean model and the WAVEWATCH III v6.07 wave model. To improve simulation accuracy, it employs the three-dimensional variational assimilation scheme OceanVar, which fuses in situ observations with satellite remote sensing data. The research team selected 18 depth levels within the upper 200 meters for modeling, and the bathymetry was obtained from the GEBCO global dataset through bilinear interpolation.

The model was first trained on 35 years (1987-2021) of daily mean data from the Mediterranean reanalysis, then fine-tuned on operational analysis data from 2022-2023. Fine-tuning was intended to let the model learn recent ocean conditions, adapt to operational use where the analysis field serves as the initial condition, and accommodate updates to the MedFS operational system. Model validation used analysis data from January to June 2024 (177 samples), while the test data consisted of forecasts initialized daily from early July to the end of December 2024. Each initialization generated a 15-day forecast, so forecast skill evaluation continued until January 14, 2025, to fully cover SeaCast's forecast lead time.

For atmospheric forcing, the research team used 2-meter air temperature, mean sea level pressure, and wind stress components computed from the 10-meter wind components. During training, the atmospheric data came from 6-hourly ERA5 reanalysis fields aggregated into daily averages. During testing, 6-hourly forecasts from the ECMWF Ensemble Control Forecast (ENS) and the Artificial Intelligence Forecasting System (AIFS) were aggregated to daily averages in the same way, so that the impact of different atmospheric forcings could be compared. The model takes a sliding window of three consecutive time steps as its atmospheric forcing input to capture short-term trends.
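The aggregation and windowing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the array layout (time first), and the toy data are assumptions, and the article does not say which three time steps make up each window.

```python
import numpy as np


def daily_mean(six_hourly: np.ndarray) -> np.ndarray:
    """Average 6-hourly fields with shape (time, ...) into daily means (days, ...)."""
    steps_per_day = 4  # 24 h / 6 h
    n_days = six_hourly.shape[0] // steps_per_day
    trimmed = six_hourly[: n_days * steps_per_day]  # drop any incomplete final day
    return trimmed.reshape(n_days, steps_per_day, *six_hourly.shape[1:]).mean(axis=1)


def forcing_windows(daily: np.ndarray, width: int = 3) -> np.ndarray:
    """Stack `width` consecutive daily fields: result has shape (n_windows, width, ...)."""
    n_windows = daily.shape[0] - width + 1
    return np.stack([daily[i : i + width] for i in range(n_windows)])


# Toy data (hypothetical): 5 days of 6-hourly values, 2 forcing variables at one point.
six_hourly = np.arange(5 * 4 * 2, dtype=float).reshape(20, 2)
daily = daily_mean(six_hourly)     # shape (5, 2)
windows = forcing_windows(daily)   # shape (3, 3, 2)
```

Each entry of `windows` would be the forcing input for one model step under these assumptions.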

Furthermore, the research team defined the Strait of Gibraltar region (west of 5.2°W) and the Dardanelles Strait region (39.9°N to 40.4°N, 25.9°E to 26.4°E) as the open lateral boundaries of the model, using MedFS or global ocean forecast data to provide dynamic boundary forcing. Copernicus ocean forecast products typically have a 10-day lead time, while this study targets 15-day forecasts. To bridge the gap, the research team extrapolated the last predicted state of the boundary region five more times, extending the lead time of the lateral boundary forcing so that boundary conditions remain available throughout the entire forecast.
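One simple reading of this extrapolation is that the final available boundary state is held constant for the remaining five days. The sketch below assumes that interpretation; the function name and array shapes are hypothetical.

```python
import numpy as np


def extend_boundary_forcing(boundary: np.ndarray, target_days: int = 15) -> np.ndarray:
    """Pad boundary forcing along the lead-time axis by repeating its last state.

    boundary: array of shape (available_days, ...), e.g. a 10-day forecast.
    Assumption: "extrapolating the last state" means holding it constant.
    """
    available = boundary.shape[0]
    if available >= target_days:
        return boundary[:target_days]
    pad = np.repeat(boundary[-1:], target_days - available, axis=0)
    return np.concatenate([boundary, pad], axis=0)


# Toy example: a 10-day boundary forecast with 3 boundary values per day.
boundary_10d = np.arange(10 * 3, dtype=float).reshape(10, 3)
boundary_15d = extend_boundary_forcing(boundary_10d)
```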

Satellite data is used mainly for forecast validation and error assessment, and covers two types of data: sea surface temperature and sea level anomalies. Sea surface temperature was taken from the Copernicus L3S multi-sensor fusion product (daily, 1/16° resolution), using only nighttime observations to eliminate the effects of diurnal heating. During validation, model forecasts were resampled to the L3S grid for comparison. Sea level anomalies were taken from the Copernicus Level 3 near-real-time product, which combines 5 Hz observations from multiple altimeter satellites and is filtered to reduce noise. The sea surface height output by the model was converted to anomalies and then mapped to the satellite track coordinates via bilinear interpolation for validation.
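To make the mapping step concrete, here is a minimal sketch of bilinear interpolation from a regular latitude-longitude grid to observation points. It assumes ascending coordinate axes and ignores land masking, which a real validation pipeline would need to handle; all names and values are illustrative.

```python
import numpy as np


def bilinear_sample(field, lats, lons, lat, lon):
    """Sample field[lat_index, lon_index] at point(s) (lat, lon) by bilinear interpolation.

    Assumes lats and lons are ascending and the points lie inside the grid.
    """
    i = np.clip(np.searchsorted(lats, lat) - 1, 0, len(lats) - 2)
    j = np.clip(np.searchsorted(lons, lon) - 1, 0, len(lons) - 2)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])  # fractional position in latitude
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])  # fractional position in longitude
    south = (1 - tx) * field[i, j] + tx * field[i, j + 1]
    north = (1 - tx) * field[i + 1, j] + tx * field[i + 1, j + 1]
    return (1 - ty) * south + ty * north


# Toy grid whose values vary linearly, so bilinear interpolation is exact.
lats = np.array([30.0, 31.0, 32.0])
lons = np.array([0.0, 1.0, 2.0, 3.0])
field = 2.0 * lats[:, None] + 3.0 * lons[None, :]
value = bilinear_sample(field, lats, lons, 30.5, 1.25)
```

The same function accepts arrays of track latitudes and longitudes, which is how a whole satellite pass could be sampled in one call.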

SeaCast: A High-Resolution Regional Ocean Forecasting Model Based on Graph Neural Networks

SeaCast is a data-driven ocean forecasting model designed specifically for the Mediterranean region, capable of providing ocean forecasts up to 15 days ahead across 18 vertical levels on a 1/24° (approximately 4 km) horizontal grid. Its spatial resolution matches that of the operational MedFS system. The forecast variables are zonal currents, meridional currents, salinity, and temperature at each vertical level, plus sea surface height, for a total of 73 forecast fields (4 variables × 18 levels + 1).

The most prominent advantage of this model is its computational efficiency. On a single GPU, SeaCast completes a full 15-day forecast in just 20 seconds. In contrast, MedFS requires 89 CPU cores to output 141 vertical layers with a 120-second time step, and takes about 70 minutes to generate a 10-day forecast. Although the two systems operate in fundamentally different ways, the efficiency advantage of data-driven methods in high-resolution upper-ocean forecasting is clear.

SeaCast employs an encode-process-decode architecture operating on a hierarchical mesh graph adapted to the geometry of the Mediterranean. As shown in the figure below, the input ocean state and atmospheric forcing fields are first encoded into a coarser-resolution multi-scale mesh representation. Graph neural network layers then process these latent features hierarchically, enabling the model to capture both short-range and long-range interactions in the ocean. The processed output is finally decoded back onto the original high-resolution grid.

SeaCast uses graph neural networks for autoregressive ocean prediction.

Rather than predicting the next state directly, the model learns the change in ocean state over one day. It adds the predicted change to the current state and then applies the dynamic boundary conditions to produce a complete forecast for the next time step. This state is then fed back as the new input, so that forecasts at successive lead times are produced through an autoregressive loop. Compared to multi-scale models such as GraphCast, which connect nodes within a single mesh layer, the hierarchical approach used in this study divides the forecast area into multiple separate mesh layers, resulting in more uniform grid-to-mesh connectivity and reducing the bias caused by differences in the size of node neighborhoods.
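The loop described above can be sketched in a few lines. The sketch below is schematic: `predict_delta` stands in for the trained graph neural network, and the shapes, names, and toy model are assumptions made for illustration only.

```python
import numpy as np


def rollout(state, forcing, boundary, boundary_mask, predict_delta, n_steps):
    """Autoregressive forecast: next = current + predicted change, then boundary overwrite.

    state:         initial ocean state, shape (n_nodes,)
    forcing:       atmospheric forcing per step, shape (n_steps, ...)
    boundary:      prescribed boundary values per step, shape (n_steps, n_nodes)
    boundary_mask: True where a node lies in an open-boundary region
    predict_delta: callable(state, forcing_t) -> predicted one-step change
    """
    states = []
    for t in range(n_steps):
        nxt = state + predict_delta(state, forcing[t])    # residual (tendency) update
        nxt = np.where(boundary_mask, boundary[t], nxt)   # impose boundary forcing
        states.append(nxt)
        state = nxt                                       # feed the forecast back in
    return np.stack(states)


def toy_delta(state, forcing_t):
    """Hypothetical stand-in for the network: the change equals the forcing."""
    return forcing_t


forecast = rollout(
    state=np.zeros(4),
    forcing=np.ones((3, 4)),
    boundary=np.full((3, 4), 9.0),
    boundary_mask=np.array([True, False, False, False]),
    predict_delta=toy_delta,
    n_steps=3,
)
```

Because each step consumes the previous output, errors can accumulate with lead time, which is one reason forecast skill is reported separately for each lead day.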

The atmospheric forcing accounts for the ocean's response to atmospheric conditions. It includes the wind stress components derived from 10-meter winds, 2-meter air temperature, mean sea level pressure, and the sine and cosine of the day of the year as seasonal indicators. During training, the predicted state in the Gibraltar and Dardanelles Strait boundary regions is overwritten with ground-truth values. During evaluation, MedFS forecast data is used instead, which handles the open boundary conditions and gives a more realistic representation of seawater inflow and outflow.
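The seasonal indicators can be illustrated with a short sketch. Encoding the day of the year as a sine-cosine pair keeps late December and early January close together in feature space. The exact normalization used by the authors is not given in the article, so the year length of 365.25 days below is an assumption.

```python
import math
from datetime import date


def seasonal_features(d: date, days_per_year: float = 365.25) -> tuple[float, float]:
    """Return (sin, cos) of the day-of-year angle for a calendar date."""
    day_of_year = d.timetuple().tm_yday  # 1 on January 1
    angle = 2.0 * math.pi * day_of_year / days_per_year
    return math.sin(angle), math.cos(angle)


summer = seasonal_features(date(2024, 7, 1))
year_end = seasonal_features(date(2024, 12, 31))
year_start = seasonal_features(date(2025, 1, 1))
```

A raw day-of-year number would jump from 366 to 1 across the new year, whereas the pair above changes smoothly.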

The SeaCast model was first pre-trained for 200 epochs on 35 years of daily reanalysis data, and then fine-tuned for 30 epochs on 2 years of analysis data. Pre-training ran with data parallelism for 20.5 hours on 64 AMD MI250X GPUs (1,312 GPU hours), and fine-tuning ran for 3.5 hours on 8 GPUs (28 GPU hours).

SeaCast Outperforms MedFS in Forecast Skill

To assess SeaCast's forecasting performance, the researchers conducted experiments along several dimensions. Using MedFS as the benchmark, they designed controlled experiments covering overall forecast skill, the identification of extreme high-temperature events, the impact of atmospheric forcing, and the length of the training period.

In the comparison between SeaCast and MedFS, MedFS had a forecast lead time of 10 days. SeaCast achieved 15-day ocean forecasts by using ECMWF atmospheric products that extend to 15 days and by extrapolating the lateral boundaries. The experiment evaluated six variables: zonal currents, meridional currents, salinity, temperature, sea surface temperature, and sea level anomaly. Validation was carried out per depth level, with a persistence baseline as the lower bound on skill. The results are shown in the figure below. SeaCast outperforms MedFS overall, and the gap widens as forecast lead time increases. By depth, the advantage in temperature and currents is most pronounced near the sea surface, while the advantage in salinity is largest in deeper waters. Only at a depth of 192 meters did SeaCast fail to significantly outperform MedFS, which may be because processes below that depth were not modeled.

Variation of forecast errors with lead time for SeaCast, MedFS, and the persistence baseline
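For readers unfamiliar with the persistence baseline, the sketch below shows how an error curve like the one in this comparison could be computed. Persistence simply holds the initial state fixed for every lead time. The masking of land cells, the array shapes, and the toy numbers are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np


def rmse_by_lead(forecast, truth, sea_mask):
    """RMSE over sea cells for each lead time.

    forecast, truth: shape (n_leads, ny, nx); sea_mask: boolean, shape (ny, nx).
    """
    squared_error = (forecast - truth) ** 2
    return np.sqrt(squared_error[:, sea_mask].mean(axis=1))


def persistence_forecast(initial_state, n_leads):
    """Baseline that repeats the initial state at every lead time."""
    return np.repeat(initial_state[None], n_leads, axis=0)


# Toy example: the truth drifts by 0.5 per day away from the initial state.
initial = np.zeros((2, 2))
truth = np.stack([initial + 0.5 * (k + 1) for k in range(3)])
sea_mask = np.array([[True, True], [True, False]])  # one land cell excluded

persistence_rmse = rmse_by_lead(persistence_forecast(initial, 3), truth, sea_mask)
model_rmse = rmse_by_lead(truth + 0.1, truth, sea_mask)  # hypothetical model with 0.1 error
```

A forecast system is useful only if its error curve stays below the persistence curve, which is why the baseline serves as the lower bound on skill.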

To evaluate the identification of extreme events, the researchers drew on the definition of marine heatwaves and calculated the 90th percentile of sea surface temperature from satellite data as the threshold for extreme temperature events. The results are shown in the figure below. Both SeaCast and MedFS significantly outperformed the persistence baseline in detection skill, with SeaCast slightly ahead. SeaCast's 15-day forecast lead time also provides more time for early warning.

Heidke Skill Score (HSS) for detecting sea surface temperature anomalies above the 90th percentile
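The Heidke Skill Score compares forecast and observed events through a 2×2 contingency table, where 1 indicates a perfect forecast and 0 indicates no skill beyond chance. The sketch below uses the standard binary-event formula with a single 90th-percentile threshold. How the paper builds its threshold (for example per grid cell or per calendar day) is not stated in the article, so the toy climatology here is purely illustrative.

```python
import numpy as np


def heidke_skill_score(forecast_event, observed_event) -> float:
    """Heidke Skill Score for binary events (1 = perfect, 0 = no skill over chance)."""
    f = np.asarray(forecast_event, dtype=bool)
    o = np.asarray(observed_event, dtype=bool)
    hits = int(np.sum(f & o))
    false_alarms = int(np.sum(f & ~o))
    misses = int(np.sum(~f & o))
    correct_negatives = int(np.sum(~f & ~o))
    denominator = ((hits + misses) * (misses + correct_negatives)
                   + (hits + false_alarms) * (false_alarms + correct_negatives))
    if denominator == 0:
        return 0.0
    return 2.0 * (hits * correct_negatives - false_alarms * misses) / denominator


# Toy data (hypothetical): threshold from a made-up climatology of 100 values.
climatology = np.arange(100.0)
threshold = np.percentile(climatology, 90)

observed = np.array([95.0, 50.0, 92.0, 10.0, 91.0, 30.0])
forecast = np.array([93.0, 60.0, 80.0, 20.0, 96.0, 90.0])
hss = heidke_skill_score(forecast > threshold, observed > threshold)
```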

The researchers designed several experimental variants to assess the impact of the training period and of fine-tuning. The results are shown in the figure below. For zonal currents, meridional currents, temperature, and sea surface temperature, the model trained on only 10 years of reanalysis data performed comparably to MedFS. Salinity and sea level anomalies, however, required 35 years of data plus fine-tuning to outperform MedFS. Fine-tuning brought limited improvement for sea level anomalies, possibly because of the sparsity of the validation data, but for the other variables the fine-tuned versions outperformed the versions without fine-tuning. This finding matters for regions with limited historical data: with only 10 years of reanalysis and at lower cost, a machine learning forecasting model with performance comparable to numerical models can be trained.

Differences in normalized RMSE between the SeaCast variants and MedFS, relative to the baseline model, at different forecast lead times

AI-Driven Ocean Forecasting: Exploration and Practice by Global Academia and Industry

Globally, academia and industry are working together with unprecedented depth and breadth to promote the integration and innovation of artificial intelligence and marine forecasting technology. A number of representative research results and operational systems are reshaping the technological landscape of this field.

Among them, the European Centre for Medium-Range Weather Forecasts (ECMWF), the authoritative body for global medium-range weather forecasting, has continued to optimize its traditional numerical forecasting system, IFS, while in recent years launching the AIFS artificial intelligence forecasting system, which has entered the operational phase. Notably, ECMWF is extending this data-driven framework to Earth system models, focusing on machine learning modeling of components such as the ocean, sea ice, and ocean waves.

Meanwhile, NVIDIA's Earth-2 initiative showcases the tech giant's strategic moves in climate and ocean simulation. Earth-2 is not a single model but a full-stack technology platform that encompasses global weather forecasting, climate simulation, generative AI downscaling, and data assimilation. One of its core components, FourCastNet, an early Transformer-based global forecasting model, has achieved forecast skill comparable to traditional numerical models.

Furthermore, Google Research's NeuralGCM represents a promising exploration of hybrid modeling. The model combines a differentiable atmospheric dynamical core with machine-learned replacements for subgrid physical parameterization schemes, enabling stable climate simulations over decades.

These developments show that artificial intelligence is gradually being embedded into the core of ocean forecasting, moving beyond its role as an auxiliary tool. Whether as a supplement to physical models, an end-to-end alternative, or part of a hybrid system, data-driven approaches have moved beyond theoretical verification and into operational exploration and industry application. In the future, with the continued accumulation of multimodal observation data and the further integration of generative AI with physical mechanisms, ocean forecasting technology is expected to reach a new balance between accuracy, timeliness, and interpretability, providing a more solid technological foundation for scientific research and industrial applications.
