Image Generation Module via Diffusion Models

Research Project

Project Overview

A comprehensive Python package implementing diffusion models for image generation, featuring multiple diffusion processes, sampling strategies, and controllable generation capabilities. The package provides both programmatic APIs and an interactive Streamlit dashboard for experimentation without coding.

Core Capabilities

Diffusion Process Variants

The package implements three fundamental approaches to the diffusion process:

  • Variance Exploding (VE): Adds noise with increasing variance while preserving signal energy, making it well suited to retaining high-frequency detail
  • Variance Preserving (VP): Maintains constant total variance throughout the diffusion process, offering balanced training stability
  • Sub-Variance Preserving (Sub-VP): A hybrid approach that provides finer control over the noise schedule, though it requires longer training to converge

Each variant follows a well-defined mathematical framework based on Stochastic Differential Equations (SDEs), providing theoretical guarantees about convergence and generation quality.
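
For reference, a minimal sketch of the forward SDE coefficients for the three variants in the standard drift/diffusion formulation; the sigma and beta ranges below are illustrative assumptions, not package defaults:

import numpy as np

# Illustrative hyperparameters (assumed values, not package defaults)
sigma_min, sigma_max = 0.01, 50.0   # VE noise scale range
beta_min, beta_max = 0.1, 20.0      # VP / Sub-VP noise rate range

def beta(t):
    # t is a continuous time in [0, 1]
    return beta_min + t * (beta_max - beta_min)

def ve_sde(x, t):
    # VE: zero drift, noise variance grows geometrically with t
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t
    return np.zeros_like(x), sigma_t * np.sqrt(2 * np.log(sigma_max / sigma_min))

def vp_sde(x, t):
    # VP: the signal is scaled down as noise is added, keeping total variance bounded
    return -0.5 * beta(t) * x, np.sqrt(beta(t))

def sub_vp_sde(x, t):
    # Sub-VP: same drift as VP with a strictly smaller diffusion coefficient
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t**2
    return -0.5 * beta(t) * x, np.sqrt(beta(t) * (1 - np.exp(-2 * integral)))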

Sampling Methods

Four distinct numerical solvers for the reverse diffusion process:

  • Euler-Maruyama (stochastic): Simple, fast, stochastic trajectories; best for quick generation with acceptable quality
  • Predictor-Corrector (stochastic): Two-stage refinement per step; best for high-quality generation when time permits
  • Probability Flow ODE (deterministic): Converges to the distribution mean; best for consistent outputs and interpolation
  • Exponential Integrator (stochastic): Advanced numerical stability; best for complex dynamics with fewer steps

Each sampler provides different trade-offs between generation speed (as few as 50 steps) and output quality (up to 1000+ steps for maximum fidelity).
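
As an illustration, a minimal sketch of a single Euler-Maruyama step of the reverse-time SDE; sde and score_model are stand-ins for a diffuser's coefficients and the trained score network, not the package's actual sampler API:

import torch

def euler_maruyama_step(x, t, dt, sde, score_model):
    # One reverse-time step of size dt (time runs from T down to 0):
    # dx = [f(x, t) - g(t)^2 * score(x, t)] dt + g(t) dw
    drift, diffusion = sde(x, t)
    score = score_model(x, t)
    x_mean = x - (drift - diffusion**2 * score) * dt
    noise = torch.randn_like(x)
    return x_mean + diffusion * (dt ** 0.5) * noise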

Noise Scheduling

Two scheduling strategies control how noise is distributed across diffusion timesteps:

  • Linear Schedule: Uniform noise addition, simple and predictable
  • Cosine Schedule: More noise in middle timesteps, preserving early structure and late details

The scheduler directly impacts training stability and generation quality, with cosine scheduling generally producing superior results for natural images.
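
A minimal sketch of the two schedules with illustrative values; the cosine form follows the widely used Nichol & Dhariwal (2021) formulation and is an assumption about, not a copy of, the package's internal implementation:

import math

def linear_beta(t, beta_start=1e-4, beta_end=0.02):
    # Noise rate grows uniformly from beta_start to beta_end over t in [0, 1]
    return beta_start + t * (beta_end - beta_start)

def cosine_alpha_bar(t, s=0.008):
    # Fraction of the original signal remaining at time t in [0, 1]:
    # noise is added slowly at the start and end, fastest in the middle
    f = math.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    f0 = math.cos(s / (1 + s) * math.pi / 2) ** 2
    return f / f0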

Controllable Generation

Grayscale Colorization

Transforms grayscale images into plausible color versions without model retraining:

# Assumed import path for the model class (matches the image_gen package used elsewhere)
from image_gen import GenerativeModel

# Load a pretrained checkpoint and colorize a grayscale input
model = GenerativeModel.load("pretrained_model.pth")
gray_image = load_grayscale_image()  # placeholder for your own image loading
colorized = model.colorize(gray_image, n_steps=500)

The colorization process works by:

  1. Converting RGB → grayscale by averaging channels
  2. Guiding the reverse diffusion to preserve luminance structure
  3. Generating chromatic information conditioned on grayscale content

This leverages the model’s learned distribution of natural image colors without requiring paired training data.
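Conceptually, the guidance amounts to re-imposing the known luminance after each denoising step. A minimal sketch of that projection, assuming the grayscale target has been diffused to the current noise level (helper names are illustrative, not the package's internals):

import torch

def project_luminance(x, gray_target_noisy):
    # Replace the per-pixel channel mean (luminance) of the current sample
    # with the known grayscale image, keeping the generated chromatic residual
    luminance = x.mean(dim=1, keepdim=True)
    return x - luminance + gray_target_noisy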

Region Imputation (Inpainting)

Fills missing or masked regions coherently with surrounding content:

# Create binary mask (1 = fill, 0 = preserve)
mask = create_mask(image, region=(x, y, width, height))
completed = model.imputation(image, mask, n_steps=500)

The imputation algorithm (sketched after the list):

  • Preserves unmasked regions throughout the reverse process
  • Generates masked regions conditioned on visible boundaries
  • Maintains global coherence through diffusion dynamics
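
A minimal sketch of one guided step, assuming sampler_step performs an ordinary reverse-diffusion update and forward_noise re-diffuses the known pixels to the current noise level (both are illustrative placeholders, not the package API):

import torch

def imputation_step(x, known_image, mask, t, sampler_step, forward_noise):
    # Denoise the whole image, then overwrite the unmasked region with the
    # known content diffused to the same noise level, so that only mask == 1
    # pixels are actually generated
    x = sampler_step(x, t)
    known_noisy = forward_noise(known_image, t)
    return mask * x + (1 - mask) * known_noisy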

Particularly effective for:

  • Removing unwanted objects
  • Completing partially occluded scenes
  • Restoring damaged or corrupted image regions

Class-Conditioned Generation

Generates images belonging to specific categories:

# Train with class labels
model.train(dataset, epochs=100, use_labels=True)

# Generate images of class 3 (e.g., "ship" in CIFAR-10)
ships = model.generate(num_samples=4, class_label=3)

Conditioning is implemented through the following mechanisms (classifier-free guidance is sketched after the list):

  • Class embeddings injected at multiple network layers
  • Classifier-free guidance for stronger conditioning
  • Optional guidance strength parameter for quality/diversity trade-off
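
A minimal sketch of the classifier-free guidance combination, assuming the score network accepts an optional class label; the function name and signature are illustrative, not the package API:

def guided_score(score_model, x, t, class_label, guidance_scale=2.0):
    # Mix conditional and unconditional predictions; guidance_scale = 0 is
    # purely unconditional, larger values trade diversity for class fidelity
    score_cond = score_model(x, t, class_label)
    score_uncond = score_model(x, t, None)
    return score_uncond + guidance_scale * (score_cond - score_uncond)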

Architecture & Design

Modular Component System

The package follows a plugin architecture where each component type (diffusers, samplers, schedulers) inherits from abstract base classes:

from abc import ABC, abstractmethod

# All diffusers implement this interface
class BaseDiffusion(ABC):
    @abstractmethod
    def drift(self, x, t): ...

    @abstractmethod
    def diffusion(self, t): ...

    @abstractmethod
    def sde(self, x, t): ...

This allows users to:

  • Mix and match any combination of components
  • Create custom implementations by inheriting from base classes
  • Swap components without modifying existing code

Score-Based Neural Network

The core denoising model is a U-Net architecture with:

  • Time embedding: Sinusoidal positional encoding for timestep conditioning
  • Skip connections: Preserving multi-scale features for detail reconstruction
  • Group normalization: Stable training across batch sizes
  • Attention layers: Capturing long-range dependencies (optional)

The network learns the score function (the gradient of the log probability density) rather than directly predicting noise; the score is exactly the quantity the reverse-time SDE needs, which gives the approach firmer theoretical grounding.
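
As an illustration of the time-embedding component, a minimal sketch of a sinusoidal timestep embedding followed by a small projection MLP; the class name and dimensions are assumptions, not the package's actual module:

import math
import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    """Sinusoidal embedding of the diffusion timestep, projected to `dim` channels."""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, t):
        # t: (batch,) tensor of timesteps; returns a (batch, dim) embedding
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
        angles = t[:, None].float() * freqs[None, :]
        emb = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return self.mlp(emb)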

Secure Serialization System

A unique feature is the CustomClassWrapper for loading models with user-defined components:

# Save model with custom diffuser
model.save("model.pth", include_classes=True)

# Load on different machine without copying source code
loaded_model = GenerativeModel.load("model.pth",
                                     allow_custom=True,
                                     confirm_execution=True)

The system:

  • Serializes both model weights and class definitions
  • Requires explicit user confirmation before executing custom code
  • Runs user code in a restricted environment with limited imports
  • Prevents common security vulnerabilities while enabling collaboration

Evaluation Metrics

Three standard metrics for quantitative assessment:

Bits Per Dimension (BPD)

Measures how well the model compresses data (unit conversion sketched after the list):

  • Lower values indicate better density estimation
  • Comparable across different image sizes
  • Computed via Monte Carlo estimation of the evidence lower bound
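
A minimal sketch of the unit conversion, assuming the negative log-likelihood is available in nats for a whole image:

import math

def bits_per_dimension(nll_nats, image_shape=(3, 32, 32)):
    # Convert a per-image negative log-likelihood (in nats) into bits per dimension
    num_dims = 1
    for d in image_shape:
        num_dims *= d
    return nll_nats / (num_dims * math.log(2))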

Fréchet Inception Distance (FID)

Compares feature distributions between real and generated images (distance formula sketched after the list):

  • Uses Inception v3 features for perceptual similarity
  • Lower values indicate more realistic generations
  • Industry-standard metric for generative model comparison
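
A minimal sketch of the underlying distance, computed on the means and covariances of Inception v3 features for real and generated images (a sketch of the formula, not the package's FID implementation):

import numpy as np
from scipy import linalg

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    # FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 * (Sigma_r Sigma_g)^(1/2))
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard small imaginary parts from numerical error
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2 * covmean))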

Inception Score (IS)

Evaluates both quality and diversity (score formula sketched after the list):

  • Measures KL divergence between conditional and marginal class distributions
  • Higher values indicate clearer, more diverse samples
  • Sensitive to class distribution imbalance
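
A minimal sketch of the score itself, assuming probs holds Inception v3 softmax outputs for the generated images (illustrative only, not the package's implementation):

import numpy as np

def inception_score(probs, eps=1e-12):
    # IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ), where p(y) is the marginal over samples
    marginal = probs.mean(axis=0, keepdims=True)
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))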

Usage example:

from image_gen.metrics import FID, InceptionScore

fid_metric = FID()
fid_score = fid_metric(real_images, generated_images)

is_metric = InceptionScore()
is_score, is_std = is_metric(generated_images)

Interactive Dashboard

A Streamlit web application provides GUI access to all features:

  • Model Configuration: Dropdowns for diffuser/sampler/scheduler selection
  • Parameter Tuning: Sliders for steps, guidance strength, noise levels
  • Live Generation: Real-time image generation with progress bars
  • Comparison Mode: Side-by-side visualization of different configurations
  • Export Options: Download generated images or save model checkpoints

Available at:

The dashboard includes:

  • Internationalization (English/Spanish)
  • Dark/light theme support
  • Responsive layout for mobile devices
  • Preset configurations for common use cases

Development Philosophy

Code Quality Standards

  • Style: Google Python Style Guide compliance
  • Type Hints: Full type annotation for IDE support and type checking
  • Documentation: Google-style docstrings with auto-generated API docs

Documentation Approach

Multi-layered documentation strategy:

  1. API Reference: Auto-generated from docstrings via MkDocs
  2. Tutorials: Jupyter notebooks with executable examples
  3. Theory: Mathematical foundations and algorithm explanations
  4. Examples: Real-world use cases with full code

Published at:

Extensibility

Users can extend the package by subclassing the abstract base classes, for example:

from image_gen import GenerativeModel  # assumed import path for the model class
from image_gen.diffusion.base import BaseDiffusion

class CustomDiffusion(BaseDiffusion):
    def sigma(self, t):
        # Example noise scale; replace with your own schedule
        return 0.1 + t

    def drift(self, x, t):
        # Custom drift function
        return -0.5 * x * self.sigma(t)**2

    def diffusion(self, t):
        # Custom diffusion coefficient
        return self.sigma(t)

    def sde(self, x, t):
        # Required by the base class: drift and diffusion together define the SDE
        return self.drift(x, t), self.diffusion(t)

# Use custom diffuser with existing samplers
model = GenerativeModel(diffusion=CustomDiffusion())

Practical Applications

Research

  • Experimenting with novel diffusion formulations
  • Comparing sampling strategies systematically
  • Developing custom conditioning mechanisms

Education

  • Understanding SDE-based generative models
  • Visualizing diffusion dynamics through the dashboard
  • Hands-on experimentation without infrastructure setup

Creative Tools

  • Generating synthetic training data
  • Creating variations of existing images
  • Prototyping image editing workflows

Installation & Requirements

pip install image-gen-diffusion

Hardware Recommendations:

  • Recommended: CUDA GPU with 6+ GB VRAM
  • Optimal: Modern GPU (RTX 3060+), 16+ GB system RAM

License & Attribution

Released under the MIT License. Free for academic, commercial, and personal use with attribution. No warranty is provided. Users are responsible for ensuring that generated content complies with applicable laws and does not infringe third-party rights.

Technologies Used

  • Python
  • PyTorch
  • Streamlit