The State of the Art in Image Upscaling and Video Super-Resolution (Nov 13, 2025)

A comprehensive analysis of 30+ algorithms, performance benchmarks, and emerging trends in real-time image and video super-resolution.


Executive Summary: Key Findings

🎯 Main Discoveries

1. Architecture Paradigm Shift

2. Quality Has Plateaued

3. Real-Time Revolution is Here

4. Efficiency Explosion

5. Video SR Maturation

📊 By The Numbers

| Metric | 2022-2023 | 2024-2025 | Change |
|---|---|---|---|
| Typical SOTA parameters | 16-20M | 5-8M | 75% reduction |
| PSNR improvement/year | +0.3 dB | +0.1 dB | Diminishing returns |
| Real-time at 4K | Not practical | 60+ FPS | Now standard |
| Methods handling real-world degradation | Limited | Multiple | Solved |
| Perceptual metric adoption | Emerging | Mainstream | Standard now |

Part 1: The Landscape

Historical Context: Over a Decade of Evolution (2014-2025)

The super-resolution field has undergone dramatic architectural transformations:

2014: SRCNN - The pioneer (3 layers, 1M parameters) - established CNN-based SR

2016-2018: GAN era - SRGAN and ESRGAN introduced adversarial training for perceptual quality

2018: Attention mechanisms - RCAN brought channel attention to SR (16M parameters, deep networks)

2021: Transformer arrival - SwinIR demonstrated vision transformers could reduce parameters by 67% while improving quality

2023: RT4KSR challenge - Proved real-time 4K feasible (60+ FPS on commercial GPUs)

2024-2025: Mamba era - State space models emerged as efficient alternatives; hybrid architectures solidified dominance

Why This Matters

For practitioners, this means:


Part 2: The Algorithms

Category 1: Transformer-Based Methods (SOTA Quality Leaders)

HAT (Hybrid Attention Transformer) - 2023

Performance Specs

Architecture Innovation HAT's breakthrough was combining two complementary attention mechanisms:

  1. Channel Attention - learns which feature channels are most important
  2. Window-based Self-Attention - captures spatial relationships locally

This "hybrid" approach activates more pixels in feature space than methods using only one attention type, resulting in clearer, more coherent details.
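
To make the combination concrete, here is a minimal PyTorch-style sketch of a block that applies window-based self-attention followed by channel attention. The module sizes, window partitioning, and block structure are illustrative assumptions, not HAT's actual implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (illustrative)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):                          # x: (B, C, H, W)
        return x * self.fc(self.pool(x))           # reweight feature channels

class HybridAttentionBlock(nn.Module):
    """Window self-attention (local spatial context) followed by channel
    attention, in the spirit of HAT's hybrid design (simplified sketch)."""
    def __init__(self, channels=64, window=8, heads=4):
        super().__init__()
        self.window = window
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.ca = ChannelAttention(channels)

    def forward(self, x):                          # x: (B, C, H, W); H, W divisible by window
        b, c, h, w = x.shape
        ws = self.window
        # partition into non-overlapping windows -> (B * num_windows, ws*ws, C)
        t = x.view(b, c, h // ws, ws, w // ws, ws).permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        t = self.norm(t)
        t, _ = self.attn(t, t, t)                  # self-attention within each window
        # reverse the window partition back to (B, C, H, W)
        t = t.reshape(b, h // ws, w // ws, ws, ws, c).permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        return x + self.ca(t)                      # residual connection + channel reweighting

feat = torch.randn(1, 64, 64, 64)                  # a 64-channel feature map
print(HybridAttentionBlock()(feat).shape)          # torch.Size([1, 64, 64, 64])
```

In the real network the two mechanisms are interleaved throughout a deep residual backbone; the sketch only shows how both attention types act on the same feature map.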

Best for: Professional image enhancement, publication-quality results, desktop applications

Availability: Open source at https://github.com/XPixelGroup/HAT


HAAT (Hybrid Attention Aggregation Transformer) - 2024

What's New Building on HAT's success, HAAT introduces:

Performance Specs

Best for: Research, highest-quality offline processing


SwinIR (Swin Image Restoration) - 2021 ⭐ STABLE BASELINE

Why It's Still Relevant SwinIR kicked off the transformer revolution in SR by proving transformers could be more efficient than CNNs:

Performance Specs

Best for: Established production use, research baseline comparisons, when stability is prioritized

Availability: https://github.com/JingyunLiang/SwinIR


Emerging Transformer Variants (2024-2025)

SVTSR - Scattering Vision Transformer with spectral analysis for intricate detail capture

XTNSR - Hybrid CNN-Transformer using Xception blocks + local feature window transformers

LFESR - Local Feature Enhancement Transformer balancing global context with local detail

PARGT - Parallel Attention Recursive Generalization for fine-grained feature interaction

Status: Academic research stage; not yet mainstream production deployment


Category 2: State Space Models / Mamba (The New Frontier)

MambaIR (Mamba Image Restoration) - 2024

The Game Changer Mamba represents a fundamentally different approach to modeling dependencies:

Performance Specs

Architecture Combines vanilla Mamba foundation with:

Best for: Large-scale processing, edge devices, situations where memory is constrained
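
The efficiency argument rests on linear-time sequence modeling: instead of comparing every token with every other token (quadratic cost, as in self-attention), a state space model carries a fixed-size hidden state through a single pass over the sequence. Below is a minimal, non-selective scan for illustration only; MambaIR's actual selective scan makes the parameters input-dependent and runs as a fused hardware-aware kernel.

```python
import torch

def ssm_scan(x, A, B, C):
    """Minimal linear state-space recurrence (illustrative only).
    x: (L, D) token sequence; A, B, C: (D, N) per-channel state parameters."""
    L, D = x.shape
    h = torch.zeros(D, A.shape[1])                 # fixed-size hidden state
    ys = []
    for t in range(L):                             # one pass -> O(L), not O(L^2)
        h = A * h + B * x[t].unsqueeze(-1)         # h_t = A * h_{t-1} + B * x_t
        ys.append((C * h).sum(-1))                 # y_t = C * h_t, projected back to D dims
    return torch.stack(ys)                         # (L, D)

# Example: 4096 tokens (e.g. a flattened 64x64 feature map) of dimension 64
x = torch.randn(4096, 64)
A = torch.rand(64, 16) * 0.9                       # |A| < 1 keeps the recurrence stable
y = ssm_scan(x, A, torch.randn(64, 16), torch.randn(64, 16))
print(y.shape)                                     # torch.Size([4096, 64])
```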


Hi-Mamba (Hierarchical Mamba) - October 2024

Key Innovation Two-path design capturing both local and regional context:

Performance Specs

Best for: Production efficiency-focused deployments


S³Mamba (Scaleable State Space Model) - 2024

Unique Capability First Mamba model supporting arbitrary-scale super-resolution (not limited to 2x, 3x, 4x)

Specs


Category 3: CNN-Based Methods (GAN & Attention)

Real-ESRGAN - 2021 ⭐ INDUSTRY STANDARD FOR BLIND SR

Why It Dominates Real-World Applications Real-ESRGAN solved the "blind super-resolution" problem - upscaling images with unknown degradation:

Performance Specs

What Makes It Special

The Results Real-ESRGAN produces noticeably better results on:

Best for: Any production deployment on real-world images, professional photo restoration

Availability: https://github.com/xinntao/Real-ESRGAN (Apache 2.0, pre-trained models on TensorFlow Hub)


ESRGAN - 2018

Historical Importance Still competitive nearly a decade later. Introduced:

Performance Specs


RCAN (Residual Channel Attention Network) - 2018

The Attention Baseline RCAN pioneered channel attention mechanisms for SR:

Performance Specs

Significance: Established attention mechanisms as fundamental to SR architecture design


BSRGAN - 2021

Innovation: Practical degradation model for blind SR

Performance Specs

Key Feature: Random shuffling of degradation order for realistic simulation
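
A minimal sketch of that idea: draw random blur, noise, and JPEG degradations, apply them in a shuffled order, then downsample to create a low-resolution training input. The kernel sizes, noise levels, and quality factors below are placeholder values, not BSRGAN's actual configuration.

```python
import random
import numpy as np
import cv2

def random_blur(img):
    k = random.choice([3, 5, 7])
    return cv2.GaussianBlur(img, (k, k), sigmaX=random.uniform(0.5, 3.0))

def random_noise(img):
    noise = np.random.normal(0, random.uniform(1, 15), img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def random_jpeg(img):
    q = random.randint(30, 90)
    ok, buf = cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, q])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

def random_downsample(img, scale=4):
    h, w = img.shape[:2]
    interp = random.choice([cv2.INTER_AREA, cv2.INTER_LINEAR, cv2.INTER_CUBIC])
    return cv2.resize(img, (w // scale, h // scale), interpolation=interp)

def synthesize_lr(hr_img, scale=4):
    """Apply blur / noise / JPEG in a random order, then downsample --
    the shuffled-degradation idea used to build blind-SR training pairs."""
    ops = [random_blur, random_noise, random_jpeg]
    random.shuffle(ops)                            # random degradation order per sample
    img = hr_img.copy()
    for op in ops:
        img = op(img)
    return random_downsample(img, scale)

hr = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)   # synthetic "HR" image
print(synthesize_lr(hr).shape)                               # (64, 64, 3)
```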


Category 4: Diffusion-Based Methods (High-Quality Experimental)

Latent Diffusion Models for Super-Resolution - 2022-2024

Concept Operating diffusion process in lower-dimensional latent space rather than pixel space:

Architecture

  1. Feature encoder → latent space
  2. Diffusion process in latent space
  3. Frequency compensation module
  4. Pixel decoder
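
A schematic sketch of those four stages follows. Every module is a stand-in (plain convolutions instead of a real VAE encoder, denoising U-Net, or learned decoder), and the sampling loop is reduced to a placeholder update; it only illustrates where each stage sits in the pipeline.

```python
import torch
import torch.nn as nn

class LatentDiffusionSR(nn.Module):
    """Schematic of the four-stage latent-diffusion SR pipeline (stand-in modules)."""
    def __init__(self, latent=16, steps=20):
        super().__init__()
        self.encoder = nn.Conv2d(3, latent, 3, stride=4, padding=1)    # 1. image -> low-dim latent
        self.denoiser = nn.Conv2d(latent, latent, 3, padding=1)        # 2. placeholder for the denoiser
        self.freq_comp = nn.Conv2d(latent, latent, 3, padding=1)       # 3. frequency compensation module
        self.decoder = nn.ConvTranspose2d(latent, 3, 8, stride=8)      # 4. latent -> upscaled pixels
        self.steps = steps

    def forward(self, lr):
        z = self.encoder(lr)                                           # encode to latent space
        z = z + torch.randn_like(z)                                    # start from a noised latent
        for _ in range(self.steps):                                    # iterative refinement in latent space
            z = z - 0.05 * self.denoiser(z)                            # placeholder update, not a real sampler
        return self.decoder(self.freq_comp(z))                         # compensate frequencies, then decode

lr = torch.randn(1, 3, 64, 64)
print(LatentDiffusionSR()(lr).shape)                                   # torch.Size([1, 3, 128, 128]), 2x here
```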

Advantages

Disadvantages


DPM-Solver (Diffusion Probabilistic Model Solver)

The Acceleration Breakthrough High-order ODE solver reducing diffusion inference steps:

Mathematical Foundation
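
As background (the details are not spelled out above), DPM-Solver-type samplers integrate the probability-flow ODE of the diffusion process, whose semi-linear structure is what enables large solver steps:

$$\frac{\mathrm{d}x_t}{\mathrm{d}t} \;=\; f(t)\,x_t \;+\; \frac{g^2(t)}{2\sigma_t}\,\epsilon_\theta(x_t, t)$$

Here f(t) and g(t) are the drift and diffusion coefficients of the forward noising process, σ_t is the noise level, and ε_θ is the learned noise predictor. The linear f(t)x_t term can be integrated exactly, so only the ε_θ term needs numerical approximation, which is why on the order of 10-20 solver steps can replace hundreds of denoising iterations.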


ControlSR (Taming Diffusion for SR) - 2024

Latest Diffusion-Based Approach Controlled diffusion process with strong constraints:

Performance


Category 5: Real-Time Specialized Methods

VPEG (Efficient Perceptual SR) - 2025 ⭐ BEST EFFICIENCY

The Breakthrough Achieves Real-ESRGAN's perceptual quality at a fraction of the computational budget:

Performance Specs

Quality Comparison vs. Real-ESRGAN

Key Achievement: Proves high efficiency and quality are no longer mutually exclusive

Best for: Real-time applications, edge deployment, resource-constrained environments


RT4KSR (Real-Time 4K Super-Resolution) - 2023 ⭐ BENCHMARK ACHIEVEMENT

The Challenge NTIRE 2023 set an audacious goal: achieve >60 FPS at 4K resolution

The Results

Key Techniques

Significance: Proved real-time 4K is achievable on commercial hardware

Constraints


REAPPEAR - 2025

Platform-Specific Optimization AMD Ryzen AI-optimized real-time super-resolution engine

Features


Category 6: Video Super-Resolution

BasicVSR++ - 2021 ⭐ VIDEO REFERENCE STANDARD

Why It's the Baseline Most video SR research compares against BasicVSR++:

Performance Specs

Architecture

Best for: Video enhancement research, quality-focused applications

Availability: https://github.com/OpenVisualCloud/Video-Super-Resolution-Library


FRVSR (Frame-Recurrent Video SR) - 2018

Innovation: Explicit optical flow for motion handling

Architecture

Performance Specs

Key Achievement: Reduced temporal flickering through explicit motion modeling
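
A minimal sketch of one frame-recurrent step, assuming a stand-in flow estimator and SR network: the previous SR output is warped toward the current frame with optical flow, packed back to the low-resolution grid, and fed to the SR network together with the current LR frame.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(img, flow):
    """Warp img (B, C, H, W) with a dense flow field (B, 2, H, W) via grid_sample."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow     # absolute sampling positions
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0                                # normalize to [-1, 1]
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

class RecurrentSRStep(nn.Module):
    """One frame-recurrent step: the previous SR estimate is warped and reused."""
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.flow_net = nn.Conv2d(6, 2, 3, padding=1)                    # stand-in optical flow estimator
        self.sr_net = nn.Sequential(
            nn.Conv2d(3 + 3 * scale * scale, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1), nn.PixelShuffle(scale))

    def forward(self, lr_t, lr_prev, sr_prev):
        flow_lr = self.flow_net(torch.cat([lr_t, lr_prev], dim=1))       # LR-resolution flow
        flow_hr = F.interpolate(flow_lr, scale_factor=self.scale, mode='bilinear') * self.scale
        warped = flow_warp(sr_prev, flow_hr)                              # align the previous SR output
        warped_lr = F.pixel_unshuffle(warped, self.scale)                 # pack HR pixels onto the LR grid
        return self.sr_net(torch.cat([lr_t, warped_lr], dim=1))           # fuse and super-resolve

step = RecurrentSRStep()
lr = torch.randn(1, 3, 64, 64)
sr_prev = torch.zeros(1, 3, 256, 256)                                     # initial SR estimate for frame 0
print(step(lr, lr, sr_prev).shape)                                        # torch.Size([1, 3, 256, 256])
```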


HAMSA (Hybrid Attention + Motion Alignment) - 2024

Latest Video Approach Combines HAT’s hybrid attention with motion-aware mechanisms:

Components

Performance


Other Recurrent Methods

RLSP (Recurrent Latent Space Propagation) - 2019

RRN and variants

Common advantages of recurrent methods:


Part 3: Performance Benchmarks & Comparison

PSNR Rankings (Peak Signal-to-Noise Ratio)

| Rank | Method | Year | PSNR (Set5) | Scale | Architecture |
|---|---|---|---|---|---|
| 1 | HAT/HAAT | 2023-2024 | 32.8+ dB | 4x | Transformer |
| 2 | SwinIR | 2021 | 32.5+ dB | 4x | Transformer |
| 3 | RCAN | 2018 | 32+ dB | 4x | CNN + Attention |
| 4 | ESRGAN | 2018 | 32.01 dB | 4x | GAN |
| 5 | Real-ESRGAN | 2021 | 24.97 dB* | 4x | GAN |
| 6 | SRCNN | 2014 | ~30.5 dB | 4x | Simple CNN |

*Real-ESRGAN scores on real-world degraded images (different distribution); not directly comparable

Speed Comparison

| Method | Input | Hardware | Time | FPS | Real-time |
|---|---|---|---|---|---|
| SRCNN | 256×256 | CPU | <100 ms | 10+ | ✓ Yes |
| ESRGAN | 480×480 | V100 | 2-5 s | 0.2 | ✗ No |
| Real-ESRGAN | 2500×2500 | Mid-range | 7-30 min | <0.1 | ✗ No |
| SwinIR | 256×256 | GPU | 0.5-2 s | 0.5-2 | ✗ No |
| VPEG | 960×540 | GPU | <33 ms | >30 | ✓ Yes |
| RT4KSR | 4K | GPU | 8-16 ms | 60-120 | ✓ Yes |
| BasicVSR | 480p frame | GPU | 50-100 ms | 10-20 | Limited |
| FRVSR | 480p frame | GPU | 100-200 ms | 5-10 | ✗ No |

Parameter Efficiency

| Method | Parameters | Memory | FLOPs (960×540) | Category |
|---|---|---|---|---|
| SRCNN | 1M | <10 MB | 200-500M | Ultra-light |
| VPEG | 5M | ~15 MB | <2000M | Lightweight |
| MambaIR | 3-8M | 10-20 MB | <1500M | Lightweight |
| RCAN | 16M | 40 MB | 1000+M | Heavy |
| ESRGAN | 16.6M | 33 MB | 1000+M | Heavy |
| SwinIR | 16M | 40 MB | 1000+M | Heavy |
| HAT | 16-20M | 40-50 MB | 1000+M | Heavy |
| BasicVSR | 5.2M | 15 MB | 400M/frame | Medium |
| Real-ESRGAN | 16.7M | 33-50 MB | 1000+M | Heavy |

Real-Time Capability Matrix

| Method (2× scale) | 480p → 960p | 720p → 1440p | 1080p → 2160p | 2K → 4K | 4K → 8K |
|---|---|---|---|---|---|
| SRCNN | ✓ Yes | ✓ Yes | ~ Okay | ✗ No | ✗ No |
| VPEG | ✓ Yes | ✓ Yes | ✓ Yes | ~ Okay | ✗ No |
| RT4KSR | N/A | ✓ Yes* | ✓ Yes* | ✓ Yes* | ~ Okay |
| MambaIR | ✓ Yes | ✓ Yes | ~ Okay | ✗ No | ✗ No |
| SwinIR | ~ Okay | ~ Okay | ✗ No | ✗ No | ✗ No |
| HAT | ~ Okay | ~ Okay | ✗ No | ✗ No | ✗ No |
| Real-ESRGAN | ~ Okay | ✗ No | ✗ No | ✗ No | ✗ No |

* Specifically optimized for 4K output
~ = Possible but challenging on standard hardware

Metric Definitions

PSNR (Peak Signal-to-Noise Ratio)

SSIM (Structural Similarity Index Measure)

LPIPS (Learned Perceptual Image Patch Similarity)

VMAF (Video Multi-Method Assessment Fusion)

PI (Perceptual Index)


Part 4: Challenges & Competitions

NTIRE 2024 Challenge (×4 Super-Resolution)

Winner: XiaomiMM Team

Results

Key Insights

  1. Transformers superior for sequence relationship modeling
  2. Mamba shows promise for scalability and efficiency
  3. Hybrid approaches (CNN + Transformer) emerging as optimal

AIM 2024 Challenge (Efficient Video Super-Resolution)

Context: Optimizing AV1-compressed content

Constraints

Results

Significance: Real-time video SR moved from theoretical to practical


NTIRE 2023 Real-Time 4K Challenge

Challenge Details

Results

Impact: Proved real-time 4K is commercially viable


Technology Adoption Patterns (2023-2025)

2023-2024 Shift

  1. From pure transformers → Hybrid architectures
  2. From PSNR focus → Perceptual metrics (LPIPS, CLIP-IQA)
  3. From slow offline → Real-time feasible
  4. From large models → Compact efficient versions

2024-2025 Frontier

  1. Mamba/SSM as transformer alternative
  2. State space models moving from research to production
  3. CLIP-based semantic filtering adoption
  4. Frequency-domain losses for texture restoration
  5. Multi-stage adaptive training strategies

Part 5: Architecture Evolution Over Time

2014-2017: Simple CNN Era
├─ SRCNN: Proof of concept
├─ Basic CNN stacking
└─ Focus: Any improvement over interpolation

2018-2020: GAN & Attention Era
├─ SRGAN: Adversarial training
├─ ESRGAN: Enhanced GAN
├─ RCAN: Channel attention
└─ Focus: Perceptual quality via GANs and attention

2021-2023: Transformer Dominance
├─ SwinIR: Vision transformers in SR
├─ HAT: Hybrid attention
├─ Real-ESRGAN: Blind SR maturity
└─ Focus: Transformer efficiency and performance

2024-2025: Mamba & Hybrid Architectures
├─ MambaIR: Linear complexity SSM
├─ Hi-Mamba: Hierarchical state space
├─ HAAT: Advanced hybrid attention
├─ Diff-Mamba: Diffusion + SSM
└─ Focus: Efficiency, hybrid approaches, and emerging frontiers

Part 6: Use-Case Recommendations

Scenario 1: Real-Time Video Streaming Service

Primary:       VPEG or AIM 2024 Challenge Winners
Alternative:   RT4KSR (if static content)
Parameters:    3-5M
Target FPS:    24-30
Quality:       Balanced (VMAF optimized)
Infrastructure: GPU required
Timeline:      Weeks (proven methods)

Why: These methods proven in competition; real-time capability validated


Scenario 2: Desktop Photo Enhancement

Primary:       HAT or HAAT
Alternative:   SwinIR (stable baseline)
Parameters:    16-20M
Processing:    1-5 seconds acceptable
Quality:       Maximum
Infrastructure: GPU recommended
Timeline:      Weeks (implementations available)

Why: Highest quality acceptable when user waits seconds


Scenario 3: Mobile/Edge Device Deployment

Primary:       Quantized VPEG
Alternative:   TensorFlow Lite SRCNN
Parameters:    <5M (ideally <3M)
Target FPS:    10-15
Quality:       Acceptable (perceptual)
Infrastructure: No GPU required
Timeline:      Months (optimization work)

Why: Parameter constraints dominate; quantization essential


Scenario 4: 4K Real-Time Broadcast

Primary:       RT4KSR or variants
Alternative:   Custom optimized method
Target FPS:    60+
Quality:       Clear gain over bicubic
Infrastructure: High-end GPU or FPGA
Timeline:      Months (custom optimization)

Why: RT4KSR specifically designed for this; proven track record


Scenario 5: Real-World Degraded Images (Photo Restoration)

Primary:       Real-ESRGAN
Alternative:   BSRGAN
Blind SR:      Essential (unknown degradation)
Processing:    <30 seconds acceptable
Quality:       Industry standard
Infrastructure: GPU recommended
Timeline:      Weeks (pre-trained models available)

Why: Only methods specifically trained for unknown degradation types


Scenario 6: Video Quality (High-Quality Offline)

Primary:       BasicVSR++ or HAMSA
Alternative:   FRVSR
Real-time:     Not required
Quality:       Maximum PSNR
Infrastructure: GPU cluster
Timeline:      Hours per video

Why: Reference quality standards; recurrent architecture for temporal consistency


Scenario 7: Research Publication

Primary:       HAAT or latest NTIRE winner
Alternative:   HAT (stable baseline)
Focus:         PSNR + LPIPS + perceptual metrics
Quality:       State-of-the-art
Infrastructure: GPU cluster (training)
Timeline:      3-6 months (training required)

Why: Need latest methods for competitive results; multiple metrics for publication


Scenario 8: Existing Production System Upgrade

Primary:       SwinIR (migration from GAN-based)
Alternative:   HAT (if quality critical)
Compatibility: Framework-agnostic (ONNX export)
Risk:          Low (well-documented methods)
Timeline:      2-4 weeks

Why: Proven stability, extensive documentation, clear performance improvements


Part 7: Key Metrics Explained

Understanding PSNR

Peak Signal-to-Noise Ratio measures pixel-level differences:
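
For reference, the standard definition for images with peak value $\mathrm{MAX}_I$ (255 for 8-bit):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2, \qquad \mathrm{PSNR} = 10\,\log_{10}\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\ \text{dB}$$

Because the scale is logarithmic, a +1 dB gain corresponds to roughly a 20% reduction in MSE, which is why the +0.1 dB/year figure cited earlier signals diminishing returns.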

Limitation: Two images with same PSNR can look dramatically different to human eyes


Understanding SSIM

Structural Similarity models human visual perception:
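
The standard per-window formula, averaged over local windows of the two images:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where μ are local means, σ² local variances, σ_xy the local covariance, and C1, C2 small stabilizing constants; the score approaches 1 for structurally identical images.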

Use case: Better indicator of perceived quality than PSNR alone


Understanding LPIPS (Key Metric for 2024-2025)

Learned Perceptual Image Patch Similarity:
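
In practice it is computed by comparing deep features from a pretrained network. A minimal usage sketch with the reference lpips package; the random tensors below are stand-ins for an SR output and its ground truth:

```python
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='alex')             # AlexNet backbone is the common benchmarking choice

# Inputs are (N, 3, H, W) tensors scaled to [-1, 1]
sr_img = torch.rand(1, 3, 256, 256) * 2 - 1   # stand-in for a super-resolved output
hr_img = torch.rand(1, 3, 256, 256) * 2 - 1   # stand-in for the ground-truth image

distance = loss_fn(sr_img, hr_img)            # lower = perceptually more similar
print(distance.item())
```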

Why it matters: LPIPS reveals why PSNR-optimized methods sometimes look worse than lower-PSNR methods

Example: Two methods both at 32 dB PSNR:


Understanding VMAF

Video Multi-Method Assessment Fusion:

Adoption in VSR: AIM 2024 challenge shifted from PSNR to VMAF for video SR


Part 8: Deployment Strategies

For GPU-Accelerated Environments

Tier 1 - Maximum Quality

Tier 2 - Balanced

Tier 3 - Real-Time


For CPU-Only Environments

Not recommended for production due to speed constraints, except:

Option 1: SRCNN variant

Option 2: Quantized lightweight model


For Mobile/Edge Deployment

Framework: TensorFlow Lite, ONNX Runtime, or PyTorch Mobile

Model Selection

  1. Keep parameters <5M (ideally <3M)
  2. Use quantization (int8 recommended, int4 for extreme constraints)
  3. Target: 1-3 FPS on mid-range devices

Process

  1. Start with VPEG or lightweight Mamba
  2. Convert to TensorFlow Lite / ONNX
  3. Apply int8 quantization (typically <1 dB PSNR loss); see the sketch after this list
  4. Test on target hardware
  5. Iterate if needed
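
As one concrete route for steps 2-3, a model already exported to ONNX can be weight-quantized to int8 with ONNX Runtime. A minimal sketch under those assumptions; the file names are placeholders, and static quantization with a calibration set generally preserves more quality for conv-heavy SR models than the dynamic variant shown here:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the exported FP32 model's weights to int8 (dynamic quantization)
quantize_dynamic(
    model_input="sr_model_fp32.onnx",    # placeholder: model exported in step 2
    model_output="sr_model_int8.onnx",
    weight_type=QuantType.QInt8,
)
# Then benchmark PSNR/LPIPS of the int8 model against the FP32 baseline on the
# target hardware (step 4) and adjust if the drop exceeds ~1 dB (step 5).
```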

Expected Results


For Browser-Based Deployment

Limited options due to computational constraints:

Framework: ONNX.js or TensorFlow.js

Recommendations


Part 9: The Efficiency Frontier

From Research to Production: The Efficiency Timeline

2022  │ HAT/SwinIR: 16M params, ~1 second
      │ Real-ESRGAN: 16.7M params, well-established
      │
2023  │ RT4KSR: 60+ FPS real-time proven
      │ Efficiency track emerges
      │
2024  │ VPEG: 5M params, >30 FPS, matches Real-ESRGAN quality
      │ Mamba methods: 3-8M params, linear complexity
      │ AIM Efficient VSR: <250 GMacs, 24-30 FPS video
      │
2025  │ VPEG refined: 5M params optimal sweet spot
      │ Hi-Mamba: Hierarchical efficiency
      │ Multi-method ensembles emerging

The Parameter Reduction Story

Why parameters matter:

The trend:

Practical implications:


Part 10: Emerging Frontiers

1. Diffusion Models for Super-Resolution

Status: Experimental, gaining traction

Advantages

Disadvantages

Latest: ControlSR (2024) combining DPM-Solver acceleration with real-world degradation handling

Trajectory: Moving toward practical deployment; still 2-3 years from mainstream production


2. State Space Models (Mamba)

Status: Rapidly advancing from research to production

Why Exciting

Reality Check

Near term: Mamba adoption in specialized use cases (large-scale processing, mobile)

Medium term: Competitive parity with transformers on most tasks


3. CLIP-Based Semantic Filtering

Status: Entering mainstream adoption

Innovation

Impact


4. Frequency-Domain Losses

Status: Emerging standard in 2024-2025

Concept
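
The general idea is to compare prediction and ground truth in the Fourier domain, so missing high-frequency texture is penalized directly rather than being averaged away by a pixel-space L1/L2 loss. A minimal sketch using an L1 distance between complex spectra; published variants weight frequency bands differently:

```python
import torch

def frequency_loss(sr, hr):
    """L1 distance between the 2-D FFT spectra of output and ground truth.
    sr, hr: (B, C, H, W) tensors in [0, 1]."""
    sr_f = torch.fft.rfft2(sr, norm='ortho')      # complex spectrum of the prediction
    hr_f = torch.fft.rfft2(hr, norm='ortho')      # complex spectrum of the target
    return (sr_f - hr_f).abs().mean()             # penalizes amplitude and phase errors

# Typically combined with a pixel loss, e.g. total = l1(sr, hr) + 0.1 * frequency_loss(sr, hr)
sr = torch.rand(2, 3, 128, 128)
hr = torch.rand(2, 3, 128, 128)
print(frequency_loss(sr, hr).item())
```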

Results


5. Multi-Stage Adaptive Pipelines

Status: Research frontier

Approach

  1. First stage: Quick initial upscaling
  2. Analysis: Detect problem areas
  3. Second stage: Refined processing on difficult regions
  4. Fusion: Blend results

Advantage: Allocate computational resources where needed
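
A minimal sketch of those four steps, using local variance as the "difficulty" signal; the fast and heavy upscalers here are interpolation placeholders standing in for, say, a lightweight real-time model and a transformer model.

```python
import numpy as np
import cv2

def adaptive_upscale(img, scale=2, patch=64, var_thresh=200.0):
    """Stage 1: cheap global upscale.  Stage 2: find high-variance (detailed)
    patches.  Stage 3: re-process only those patches with a heavier method.
    Stage 4: blend the refined patches back in.  Upscalers are placeholders."""
    fast = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    heavy = lambda p: cv2.resize(p, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)

    out = fast.copy()
    h, w = img.shape[:2]
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            block = img[y:y + patch, x:x + patch]
            if block.var() > var_thresh:                      # "difficult" region detected
                sy, sx = y * scale, x * scale
                out[sy:sy + patch * scale, sx:sx + patch * scale] = heavy(block)
    return out

img = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)
print(adaptive_upscale(img).shape)                            # (512, 512, 3)
```

A production version would feather the patch borders during fusion and use a learned difficulty predictor rather than raw variance.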


Part 11: Hardware Acceleration Support

GPU Support Matrix

| Method | NVIDIA | AMD | Intel GPU | NPU/AI | CPU |
|---|---|---|---|---|---|
| SRCNN | ✓ | ✓ | ✓ | Limited | ✓ (slow) |
| ESRGAN/Real-ESRGAN | ✓ | ✓ | ✓ | Limited | ✗ |
| SwinIR | ✓ | ✓ | ✓ | Limited | ✗ |
| HAT | ✓ | ✓ | ✓ | Limited | ✗ |
| VPEG | ✓ | ✓ | ✓ | ✓ Yes | Limited |
| MambaIR | ✓ | ✓ | ✓ | Limited | ✗ |
| RT4KSR | ✓ | ✓ | ✓ | Limited | ✗ |
| Upscayl | ✓ (Vulkan) | ✓ (Vulkan) | ✓ (Vulkan) | Limited | Limited |

Framework Support

PyTorch

TensorFlow

ONNX

TensorFlow Lite

NCNN


Part 12: Installation & Deployment Guides

Quick Start: Using Real-ESRGAN

Installation (Python)

pip install realesrgan
# or using uv:
uv add realesrgan

Basic Usage

import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# RealESRGAN_x4plus uses an RRDBNet backbone; weights are downloadable from the repo's releases
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(scale=4, model_path='weights/RealESRGAN_x4plus.pth', model=model)
output, _ = upsampler.enhance(cv2.imread('input.jpg'), outscale=4)  # enhance() returns (image, mode)

For Desktop: Use Upscayl (https://github.com/upscayl/upscayl)


For Maximum Quality: HAT Deployment

GitHub: https://github.com/XPixelGroup/HAT

Installation

git clone https://github.com/XPixelGroup/HAT.git
cd HAT
pip install -r requirements.txt  # or: uv pip install -r requirements.txt

Pre-trained Models


For Real-Time 4K: RT4KSR

GitHub: https://github.com/eduardzamfir/RT4KSR

Key Setting: Optimized specifically for 4K throughput


For Production Video: BasicVSR++

Framework: BasicSR (https://github.com/XPixelGroup/BasicSR)

Includes: Full training framework, pre-trained models, evaluation scripts


Part 13: The Future Outlook

Near Term (Next 12 Months - 2025)

Likely Developments

  1. Mamba maturation: Production-ready SSM models with transformer parity
  2. Efficiency focus: 3M parameter models becoming standard
  3. Real-time video: 30 FPS video SR on consumer GPU becoming normal
  4. Mobile deployment: Practical real-time super-resolution on mid-range phones
  5. Semantic awareness: CLIP integration becoming standard

Challenges


Medium Term (12-24 Months - 2026)

Expected Breakthroughs

  1. Arbitrary-scale SR: Seamless upscaling at any factor
  2. Unified architectures: Single model handling image+video+blind SR
  3. Adaptive methods: Real-time adjustment to image content
  4. Quantum considerations: Exploring quantum-friendly approaches

Long Term (24+ Months - 2027+)

Speculative Frontiers

  1. Neural rendering: Direct feature space manipulation
  2. Neuromorphic hardware: Spiking networks for ultra-efficient SR
  3. Foundation models: Large pretrained models for adaptation
  4. Task-agnostic: Single model for all image restoration tasks

Part 14: Critical Insights for Decision Making

The Quality Ceiling

Reality: PSNR improvements have plateaued at ~32.8-32.9 dB

Implication: Further algorithm innovation unlikely to yield significant PSNR gains

Solution: The field is shifting toward:

The Efficiency Revolution

Key Finding: 5M parameter models now match 16M+ parameter models in perceptual quality

VPEG Case Study:

Implication: The efficiency frontier has moved dramatically; old assumptions about quality vs. speed tradeoffs are outdated

Real-Time Achievement

Proven: 4K real-time (60+ FPS) is achievable and production-ready

Proven: Video real-time (30 FPS) is commercial reality

Implication: Resource constraints no longer excuse non-real-time deployments for most use cases

Blind Super-Resolution Solved

Reality: Real-ESRGAN and variants effectively handle real-world degraded images

Implication: Can now deploy production systems without knowing exact degradation type

The Hybrid Advantage

Finding: CNN-Transformer hybrids outperform pure architectures

Examples: HAT (hybrid attention), HAMSA (hybrid + motion)

Implication: Future architectures will likely embrace hybrid approaches


Part 15: Comparative Quick Reference

One-Liner Descriptions

Best Overall Quality: HAT/HAAT (32.8+ dB PSNR)
Best Efficiency: VPEG (5M params, >30 FPS)
Best Real-Time 4K: RT4KSR (60-120 FPS)
Best Real-World Photos: Real-ESRGAN (blind SR)
Best Video Quality: BasicVSR++ (reference standard)
Best Research Stability: SwinIR (proven baseline)
Best Emerging Tech: Hi-Mamba (hierarchical SSM)
Best Video Real-Time: AIM 2024 Challenge Winners (<33ms)


Decision Tree

START: What's your primary constraint?

├─ Quality (no time limit)
│  └─ Use: HAT or HAAT (2023-2024)
│
├─ Speed (must be real-time)
│  ├─ 4K video: RT4KSR (60+ FPS)
│  ├─ Single image: VPEG (>30 FPS)
│  └─ Video: AIM 2024 winners (<33ms/frame)
│
├─ Real-world degradation (unknown type)
│  └─ Use: Real-ESRGAN (blind SR specialist)
│
├─ Edge device (limited memory/CPU)
│  └─ Use: Quantized VPEG or SRCNN
│
├─ Video processing (temporal consistency)
│  ├─ High quality: BasicVSR++
│  ├─ Real-time: AIM 2024 challenge winners
│  └─ Motion-aware: HAMSA
│
└─ Research/Publication
   └─ Use: HAAT or latest NTIRE winner

Conclusion

The State of Super-Resolution in 2025

Where We Are

  1. Quality plateau achieved: Further PSNR improvements unlikely
  2. Real-time is now standard: Not an aspirational goal anymore
  3. Efficiency revolutionized: 75% parameter reduction while improving quality
  4. Practical deployment ready: Production systems can be deployed today
  5. Hybrid approaches winning: CNN-Transformer combinations dominating

What Works Best

| Need | Solution | Why |
|---|---|---|
| Maximum quality | HAT | 32.8+ dB PSNR proven |
| Real-time anything | VPEG or RT4KSR | Tested in competition |
| Photo restoration | Real-ESRGAN | Only blind SR specialist |
| Video SR | BasicVSR++ | Reference standard |
| Mobile/Edge | Quantized VPEG | Proven efficiency |
| Research | HAAT | Latest SOTA |

The Next Frontier

The field is transitioning from "how to improve PSNR" to "how to handle real-world complexity better":

For Practitioners

Start here:

  1. For real-time: Use VPEG or RT4KSR (proven in competition)
  2. For quality: Use HAT or SwinIR (established baselines)
  3. For real photos: Use Real-ESRGAN (industry standard)
  4. For video: Use BasicVSR++ (reference implementation)

Then adapt:

The Bottom Line

Super-resolution has matured from research curiosity to production technology. The question is no longer "can we do this" but "which method best fits my constraints." Start with proven methods from recent competitions (RT4KSR, VPEG, AIM 2024 winners), validate on your data, and iterate.

The 2025 frontier is no longer about pushing PSNR; it’s about real-world quality, efficiency, and practical deployment.


References

Key Papers & Sources

Foundational

GANs & Attention Era

Transformer Revolution

State Space Models

Diffusion Models

Efficient Methods

Real-Time & Video

Challenges & Competitions

Open Source Implementations

Data & Benchmarks


Report Compiled: November 2025 · Coverage Period: 2014-2025 (with emphasis on 2022-2025) · Total Methods Analyzed: 30+ · Data Points: 200+


This report synthesizes research from academic papers, challenge proceedings, and industry implementations. For specific method citations and detailed comparisons, refer to the reference section and original paper repositories.