DA-INR: Dynamic-Aware Spatio-temporal Representation Learning for Dynamic MRI Reconstruction

Ulsan National Institute of Science and Technology (UNIST)
Corresponding authors
MICCAI 2025

Abstract

Dynamic MRI reconstruction, an inverse problem, has advanced rapidly with the adoption of deep learning techniques. In particular, the practical difficulty of obtaining ground-truth data has led to the emergence of unsupervised learning approaches. A promising recent method among them is implicit neural representation (INR), which defines the data as a continuous function that maps coordinate values to the corresponding signal values. This makes it possible to fill in missing information from incomplete measurements alone and to solve the inverse problem effectively. Nevertheless, previous works incorporating this method have suffered from drawbacks such as long optimization times and the need for extensive hyperparameter tuning. To address these issues, we propose Dynamic-Aware INR (DA-INR), an INR-based model for dynamic MRI reconstruction that captures the spatial and temporal continuity of dynamic MRI data in the image domain and explicitly incorporates the temporal redundancy of the data into the model structure. As a result, DA-INR outperforms other models in reconstruction quality even at extreme undersampling ratios while significantly reducing optimization time and requiring minimal hyperparameter tuning.

Overall Pipeline


Overall pipeline of DA-INR. A deformation network \( \Psi_t \) takes a spatio-temporal coordinate \( (x, y, t) \) as input and outputs a deformation field \( \Delta \mathbf{x} = (\Delta x, \Delta y) \) relative to a canonical space. A pretrained feature extractor extracts features from the undersampled data in the image domain. A canonical network \( \Psi_x \) takes the deformed coordinate \( \mathbf{x}' \) and the features \( \mathbf{f}' \) to predict the \( t^{\text{th}} \) frame in the image domain, \( d_\theta \). The two networks are optimized with an L1 loss computed in the frequency domain using the Non-uniform Fast Fourier Transform (NuFFT). "Sampling" denotes upscaling the coordinates or the features by nearest-neighbor or bilinear interpolation. F.E. and H.E. denote Frequency Encoding and Hash Encoding, respectively.
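The data flow in this caption can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions, not the authors' implementation: the weights are random and untrained, the feature-extractor branch \( \mathbf{f}' \) and hash encoding are omitted, a masked Cartesian FFT stands in for NuFFT, and every name (`frequency_encode`, `make_mlp`, `run_mlp`, `deform_net`, `canon_net`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def frequency_encode(coords, num_bands=4):
    """Map coordinates (N, D) to sin/cos features (N, 2 * D * num_bands)."""
    freqs = 2.0 ** np.arange(num_bands) * np.pi           # (num_bands,)
    angles = coords[:, :, None] * freqs                   # (N, D, num_bands)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)

def make_mlp(in_dim, hidden, out_dim):
    """Random, untrained weights; a real model would learn these."""
    return [
        (rng.normal(0, 0.1, (in_dim, hidden)), np.zeros(hidden)),
        (rng.normal(0, 0.1, (hidden, out_dim)), np.zeros(out_dim)),
    ]

def run_mlp(params, x):
    (w1, b1), (w2, b2) = params
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2         # one ReLU hidden layer

H = W = 16
ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
xy = np.stack([xs.ravel(), ys.ravel()], axis=1)           # (H*W, 2) spatial coords
t = 0.25                                                  # normalized frame time
xyt = np.concatenate([xy, np.full((H * W, 1), t)], axis=1)

num_bands = 4
deform_net = make_mlp(3 * 2 * num_bands, 32, 2)           # (x, y, t) -> (dx, dy)
canon_net = make_mlp(2 * 2 * num_bands, 32, 2)            # x' -> (real, imag)

delta = run_mlp(deform_net, frequency_encode(xyt, num_bands))  # deformation field
warped = xy + delta                                            # x' = x + delta x
out = run_mlp(canon_net, frequency_encode(warped, num_bands))
frame = (out[:, 0] + 1j * out[:, 1]).reshape(H, W)             # predicted frame

# Assumption: a masked Cartesian FFT replaces the NuFFT measurement operator.
mask = rng.random((H, W)) < 0.3
measured = np.zeros((H, W), dtype=complex)                # placeholder k-space data
pred_k = np.fft.fft2(frame, norm="ortho")
loss = np.abs(pred_k[mask] - measured[mask]).mean()       # L1 loss on sampled k-space
```

In an actual optimization loop, `loss` would be backpropagated through both networks with an autodiff framework; NumPy is used here only to make the composition of the two networks explicit.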

Encoding Temporal Redundancy


Encoding Temporal Redundancy in DA-INR. In DA-INR, the cells of the image in the canonical space play a regularizing role for those of all other frames. The purplish lines between frames indicate that DA-INR is continuous in time but does not merely represent dynamic MRI data as a single 3D mass, as existing methods do.
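The idea that every frame draws its content from one shared canonical image can be illustrated with a toy warp. The sketch below is an assumption-laden stand-in: the canonical image is a fixed array rather than a network, the deformation is a hand-written sinusoidal horizontal shift rather than a learned field, and `bilinear_sample` and `render_frame` are hypothetical helper names.

```python
import numpy as np

def bilinear_sample(image, x, y):
    """Sample image (H, W) at float pixel positions x, y, clamping at the border."""
    h, w = image.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom

def render_frame(canonical, t, amplitude=1.5):
    """Render frame t by warping the shared canonical image.

    The sinusoidal shift is a toy deformation; DA-INR learns the field instead.
    """
    h, w = canonical.shape
    ys, xs = np.meshgrid(np.arange(h, dtype=float),
                         np.arange(w, dtype=float), indexing="ij")
    dx = amplitude * np.sin(2 * np.pi * t)
    return bilinear_sample(canonical, xs + dx, ys)

canonical = np.arange(36, dtype=float).reshape(6, 6)
frames = [render_frame(canonical, t) for t in np.linspace(0, 1, 5, endpoint=False)]
```

Because all frames read from the same `canonical` array, any change to a canonical cell affects every frame that maps onto it, which is the regularizing effect the caption describes.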

Qualitative Results

Qualitative results in the \( (y-x) \) domain for cardiac cine data reconstruction at \(AF=9.8\), shown at diastole and systole.

Qualitative results in the \( (x-t) \) domain for cardiac cine data reconstruction at various \(AF\)s.

Qualitative results of DCE liver data reconstruction with an undersampling ratio of 34 spokes per frame \( (AF=11.3) \). We visualize reconstructions at different contrast phases (left) and compare the signal-intensity flow for the aorta (AO) and portal vein (PV) ROIs (right).
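The relation between spokes per frame and acceleration factor can be checked with a short calculation. The text does not state how \(AF\) is defined, so the sketch below assumes \(AF\) is the ratio of a fully sampled reference spoke count to the spokes used per frame. The reference count of 384 is a hypothetical value chosen only because it reproduces the quoted figure, and `acceleration_factor` is an illustrative name.

```python
def acceleration_factor(reference_spokes, spokes_per_frame):
    """AF as the ratio of reference to acquired spokes (assumed definition)."""
    return reference_spokes / spokes_per_frame

# Assumption: 384 reference spokes; this number is not given in the text.
af_liver = acceleration_factor(384, 34)
print(round(af_liver, 1))  # 11.3
```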

Spatio-temporal Interpolation Ability of DA-INR

Spatial Interpolation

Qualitative results on the spatial interpolation task \( (\times 1.2, \times 1.5, \times 2) \) for cardiac cine data at \(AF=9.8\).
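Because an INR is a continuous function of coordinates, spatial interpolation amounts to querying it on a denser grid. The following sketch only builds such query grids. The 128 × 128 base resolution, the \([-1, 1]\) coordinate range, and the name `query_grid` are assumptions for illustration.

```python
import numpy as np

def query_grid(height, width, scale):
    """Return (out_h, out_w, 2) normalized (x, y) coordinates for a scaled grid."""
    out_h = int(round(height * scale))
    out_w = int(round(width * scale))
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    return np.stack([xs, ys], axis=-1)

# Hypothetical 128 x 128 base resolution, upscaled by the factors in the caption.
grids = {s: query_grid(128, 128, s) for s in (1.2, 1.5, 2.0)}
```

Each grid would then be passed through the trained model in place of the original pixel coordinates; no retraining is implied by this sketch.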

Temporal Interpolation

Comprehensive results on the temporal interpolation task \( (\times 2, \times 3) \) for cardiac cine data. (a), (b) Temporal interpolation results at \(AF=6.1\) for \( (\times2) \) and \( (\times3) \), respectively. (c) Visual comparison of each method in the \( (x-t) \) domain.
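Temporal interpolation follows the same principle along the time axis: the model is queried at time coordinates between the acquired frames. The sketch below assumes time is normalized to \([0, 1]\) and that the original timestamps are kept among the queries. The frame count of 5 and the name `temporal_queries` are hypothetical.

```python
import numpy as np

def temporal_queries(num_frames, factor):
    """Time coordinates in [0, 1] with (factor - 1) new points between frames."""
    num_out = factor * (num_frames - 1) + 1
    return np.linspace(0.0, 1.0, num_out)

t_orig = temporal_queries(5, 1)   # acquired frame times
t_x2 = temporal_queries(5, 2)     # one new time point between each pair
t_x3 = temporal_queries(5, 3)     # two new time points between each pair
```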

Ablation Study of Pretrained Feature Extractor Types

| Feature Extractor | PSNR (dB) | SSIM | GPU Memory Usage (GB) | Runtime (sec) |
|---|---|---|---|---|
| w/o Encoder | 29.59 | 0.8807 | 1.9 | 1332.80 |
| EDSR [1] | 29.53 | 0.8790 | 3.5 | 2826.45 |
| RDN [2] | 29.28 | 0.8750 | 12.0 | 6024.91 |
| SwinIR [3] | 29.34 | 0.8816 | 18.3 | 5889.91 |
| MDSR [4] (Ours) | 30.13 | 0.8835 | 3.5 | 1445.50 |

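The trade-off in the table is easier to read relative to the encoder-free baseline. The short calculation below uses only the PSNR and runtime values reported above. The dictionary layout and variable names are illustrative.

```python
# (PSNR in dB, runtime in seconds), copied from the ablation table.
rows = {
    "w/o Encoder": (29.59, 1332.80),
    "EDSR": (29.53, 2826.45),
    "RDN": (29.28, 6024.91),
    "SwinIR": (29.34, 5889.91),
    "MDSR": (30.13, 1445.50),
}

base_psnr, base_time = rows["w/o Encoder"]

# PSNR gain over the baseline (dB) and runtime as a multiple of the baseline.
summary = {
    name: (psnr - base_psnr, runtime / base_time)
    for name, (psnr, runtime) in rows.items()
}

for name, (gain, ratio) in summary.items():
    print(f"{name:12s} PSNR gain {gain:+.2f} dB, runtime x{ratio:.2f}")
```

On these numbers, MDSR is the only extractor that improves PSNR over the baseline (by about 0.54 dB), at roughly 1.08 times the baseline runtime.
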
References
  1. Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. CoRR, abs/1707.02921, 2017.
  2. Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. CoRR, abs/1802.08797, 2018.
  3. Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. SwinIR: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1833–1844, 2021.
  4. Shangqi Gao and Xiahai Zhuang. Multi-scale deep neural networks for real image super-resolution. CoRR, abs/1904.10698, 2019.

BibTeX

@misc{baik2025dynamicawarespatiotemporalrepresentationlearning,
      title={Dynamic-Aware Spatio-temporal Representation Learning for Dynamic MRI Reconstruction}, 
      author={Dayoung Baik and Jaejun Yoo},
      year={2025},
      eprint={2501.09049},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2501.09049}, 
}