Image Generation Module via Diffusion Models

Research Project

Project Overview

A comprehensive Python package implementing diffusion models for image generation, featuring multiple diffusion processes, sampling strategies, and controllable generation capabilities. The package provides both programmatic APIs and an interactive Streamlit dashboard for experimentation without coding.

Core Capabilities

Diffusion Process Variants

The package implements three fundamental approaches to the diffusion process:

  • Variance Exploding (VE): Adds noise with increasing variance while preserving signal energy, making it well suited to retaining high-frequency detail
  • Variance Preserving (VP): Maintains constant total variance throughout the diffusion process, offering balanced training stability
  • Sub-Variance Preserving (Sub-VP): A hybrid approach that provides finer control over the noise schedule, though it requires longer training to converge

Each variant follows a well-defined mathematical framework based on Stochastic Differential Equations (SDEs), providing theoretical guarantees about convergence and generation quality.
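
For reference, a minimal sketch of the forward SDE coefficients for the three variants in the standard drift/diffusion formulation; the sigma and beta ranges below are illustrative assumptions, not package defaults:

import numpy as np

# Illustrative hyperparameters (assumed values, not package defaults)
sigma_min, sigma_max = 0.01, 50.0   # VE noise scale range
beta_min, beta_max = 0.1, 20.0      # VP / Sub-VP noise rate range

def beta(t):
    # t is a continuous time in [0, 1]
    return beta_min + t * (beta_max - beta_min)

def ve_sde(x, t):
    # VE: zero drift, noise variance grows geometrically with t
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t
    return np.zeros_like(x), sigma_t * np.sqrt(2 * np.log(sigma_max / sigma_min))

def vp_sde(x, t):
    # VP: the signal is scaled down as noise is added, keeping total variance bounded
    return -0.5 * beta(t) * x, np.sqrt(beta(t))

def sub_vp_sde(x, t):
    # Sub-VP: same drift as VP with a strictly smaller diffusion coefficient
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t**2
    return -0.5 * beta(t) * x, np.sqrt(beta(t) * (1 - np.exp(-2 * integral)))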

Sampling Methods

Four distinct numerical solvers for the reverse diffusion process:

  • Euler-Maruyama (stochastic): Simple, fast, stochastic trajectories; best for quick generation with acceptable quality
  • Predictor-Corrector (stochastic): Two-stage refinement per step; best for high-quality generation when time permits
  • Probability Flow ODE (deterministic): Converges to the distribution mean; best for consistent outputs and interpolation
  • Exponential Integrator (stochastic): Advanced numerical stability; best for complex dynamics with fewer steps

Each sampler provides different trade-offs between generation speed (as few as 50 steps) and output quality (up to 1000+ steps for maximum fidelity).
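
As an illustration, a minimal sketch of a single Euler-Maruyama step of the reverse-time SDE; sde and score_model are stand-ins for a diffuser's coefficients and the trained score network, not the package's actual sampler API:

import torch

def euler_maruyama_step(x, t, dt, sde, score_model):
    # One reverse-time step of size dt (time runs from T down to 0):
    # dx = [f(x, t) - g(t)^2 * score(x, t)] dt + g(t) dw
    drift, diffusion = sde(x, t)
    score = score_model(x, t)
    x_mean = x - (drift - diffusion**2 * score) * dt
    noise = torch.randn_like(x)
    return x_mean + diffusion * (dt ** 0.5) * noise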

Noise Scheduling

Two scheduling strategies control how noise is distributed across diffusion timesteps:

  • Linear Schedule: Uniform noise addition, simple and predictable
  • Cosine Schedule: More noise in middle timesteps, preserving early structure and late details

The scheduler directly impacts training stability and generation quality, with cosine scheduling generally producing superior results for natural images.
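
A minimal sketch of the two schedules with illustrative values; the cosine form follows the widely used Nichol & Dhariwal (2021) formulation and is an assumption about, not a copy of, the package's internal implementation:

import math

def linear_beta(t, beta_start=1e-4, beta_end=0.02):
    # Noise rate grows uniformly from beta_start to beta_end over t in [0, 1]
    return beta_start + t * (beta_end - beta_start)

def cosine_alpha_bar(t, s=0.008):
    # Fraction of the original signal remaining at time t in [0, 1]:
    # noise is added slowly at the start and end, fastest in the middle
    f = math.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    f0 = math.cos(s / (1 + s) * math.pi / 2) ** 2
    return f / f0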

Controllable Generation

Grayscale Colorization

Transforms grayscale images into plausible color versions without model retraining:

# Assumed import path for the model class (matches the image_gen package used elsewhere)
from image_gen import GenerativeModel

# Load a pretrained checkpoint and colorize a grayscale input
model = GenerativeModel.load("pretrained_model.pth")
gray_image = load_grayscale_image()  # placeholder for your own image loading
colorized = model.colorize(gray_image, n_steps=500)

The colorization process works by:

  1. Converting RGB → grayscale by averaging channels
  2. Guiding the reverse diffusion to preserve luminance structure
  3. Generating chromatic information conditioned on grayscale content

This leverages the model’s learned distribution of natural image colors without requiring paired training data.
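Conceptually, the guidance amounts to re-imposing the known luminance after each denoising step. A minimal sketch of that projection, assuming the grayscale target has been diffused to the current noise level (helper names are illustrative, not the package's internals):

import torch

def project_luminance(x, gray_target_noisy):
    # Replace the per-pixel channel mean (luminance) of the current sample
    # with the known grayscale image, keeping the generated chromatic residual
    luminance = x.mean(dim=1, keepdim=True)
    return x - luminance + gray_target_noisy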

Region Imputation (Inpainting)

Fills missing or masked regions coherently with surrounding content:

# Create binary mask (1 = fill, 0 = preserve)
mask = create_mask(image, region=(x, y, width, height))
completed = model.imputation(image, mask, n_steps=500)

The imputation algorithm (sketched after the list):

  • Preserves unmasked regions throughout the reverse process
  • Generates masked regions conditioned on visible boundaries
  • Maintains global coherence through diffusion dynamics
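
A minimal sketch of one guided step, assuming sampler_step performs an ordinary reverse-diffusion update and forward_noise re-diffuses the known pixels to the current noise level (both are illustrative placeholders, not the package API):

import torch

def imputation_step(x, known_image, mask, t, sampler_step, forward_noise):
    # Denoise the whole image, then overwrite the unmasked region with the
    # known content diffused to the same noise level, so that only mask == 1
    # pixels are actually generated
    x = sampler_step(x, t)
    known_noisy = forward_noise(known_image, t)
    return mask * x + (1 - mask) * known_noisy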

Particularly effective for:

  • Removing unwanted objects
  • Completing partially occluded scenes
  • Restoring damaged or corrupted image regions

Class-Conditioned Generation

Generates images belonging to specific categories:

# Train with class labels
model.train(dataset, epochs=100, use_labels=True)

# Generate images of class 3 (e.g., "ship" in CIFAR-10)
ships = model.generate(num_samples=4, class_label=3)

Conditioning is implemented through the following mechanisms (classifier-free guidance is sketched after the list):

  • Class embeddings injected at multiple network layers
  • Classifier-free guidance for stronger conditioning
  • Optional guidance strength parameter for quality/diversity trade-off
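
A minimal sketch of the classifier-free guidance combination, assuming the score network accepts an optional class label; the function name and signature are illustrative, not the package API:

def guided_score(score_model, x, t, class_label, guidance_scale=2.0):
    # Mix conditional and unconditional predictions; guidance_scale = 0 is
    # purely unconditional, larger values trade diversity for class fidelity
    score_cond = score_model(x, t, class_label)
    score_uncond = score_model(x, t, None)
    return score_uncond + guidance_scale * (score_cond - score_uncond)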

Architecture & Design

Modular Component System

The package follows a plugin architecture where each component type (diffusers, samplers, schedulers) inherits from abstract base classes:

from abc import ABC, abstractmethod

# All diffusers implement this interface
class BaseDiffusion(ABC):
    @abstractmethod
    def drift(self, x, t): ...

    @abstractmethod
    def diffusion(self, t): ...

    @abstractmethod
    def sde(self, x, t): ...

This allows users to:

  • Mix and match any combination of components
  • Create custom implementations by inheriting from base classes
  • Swap components without modifying existing code

Score-Based Neural Network

The core denoising model is a U-Net architecture with:

  • Time embedding: Sinusoidal positional encoding for timestep conditioning
  • Skip connections: Preserving multi-scale features for detail reconstruction
  • Group normalization: Stable training across batch sizes
  • Attention layers: Capturing long-range dependencies (optional)

The network learns the score function (the gradient of the log probability density) rather than directly predicting noise; the score is exactly the quantity the reverse-time SDE needs, which gives the approach firmer theoretical grounding.
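
As an illustration of the time-embedding component, a minimal sketch of a sinusoidal timestep embedding followed by a small projection MLP; the class name and dimensions are assumptions, not the package's actual module:

import math
import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    """Sinusoidal embedding of the diffusion timestep, projected to `dim` channels."""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, t):
        # t: (batch,) tensor of timesteps; returns a (batch, dim) embedding
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
        angles = t[:, None].float() * freqs[None, :]
        emb = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return self.mlp(emb)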

Secure Serialization System

A unique feature is the CustomClassWrapper for loading models with user-defined components:

# Save model with custom diffuser
model.save("model.pth", include_classes=True)

# Load on different machine without copying source code
loaded_model = GenerativeModel.load("model.pth",
                                     allow_custom=True,
                                     confirm_execution=True)

The system:

  • Serializes both model weights and class definitions
  • Requires explicit user confirmation before executing custom code
  • Runs user code in a restricted environment with limited imports
  • Prevents common security vulnerabilities while enabling collaboration

Evaluation Metrics

Three standard metrics for quantitative assessment:

Bits Per Dimension (BPD)

Measures how well the model compresses data (unit conversion sketched after the list):

  • Lower values indicate better density estimation
  • Comparable across different image sizes
  • Computed via Monte Carlo estimation of the evidence lower bound
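
A minimal sketch of the unit conversion, assuming the negative log-likelihood is available in nats for a whole image:

import math

def bits_per_dimension(nll_nats, image_shape=(3, 32, 32)):
    # Convert a per-image negative log-likelihood (in nats) into bits per dimension
    num_dims = 1
    for d in image_shape:
        num_dims *= d
    return nll_nats / (num_dims * math.log(2))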

Fréchet Inception Distance (FID)

Compares feature distributions between real and generated images (distance formula sketched after the list):

  • Uses Inception v3 features for perceptual similarity
  • Lower values indicate more realistic generations
  • Industry-standard metric for generative model comparison
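
A minimal sketch of the underlying distance, computed on the means and covariances of Inception v3 features for real and generated images (a sketch of the formula, not the package's FID implementation):

import numpy as np
from scipy import linalg

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    # FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 * (Sigma_r Sigma_g)^(1/2))
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard small imaginary parts from numerical error
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2 * covmean))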

Inception Score (IS)

Evaluates both quality and diversity (score formula sketched after the list):

  • Measures KL divergence between conditional and marginal class distributions
  • Higher values indicate clearer, more diverse samples
  • Sensitive to class distribution imbalance
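
A minimal sketch of the score itself, assuming probs holds Inception v3 softmax outputs for the generated images (illustrative only, not the package's implementation):

import numpy as np

def inception_score(probs, eps=1e-12):
    # IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ), where p(y) is the marginal over samples
    marginal = probs.mean(axis=0, keepdims=True)
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))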

Usage example:

from image_gen.metrics import FID, InceptionScore

fid_metric = FID()
fid_score = fid_metric(real_images, generated_images)

is_metric = InceptionScore()
is_score, is_std = is_metric(generated_images)

Interactive Dashboard

A Streamlit web application provides GUI access to all features:

  • Model Configuration: Dropdowns for diffuser/sampler/scheduler selection
  • Parameter Tuning: Sliders for steps, guidance strength, noise levels
  • Live Generation: Real-time image generation with progress bars
  • Comparison Mode: Side-by-side visualization of different configurations
  • Export Options: Download generated images or save model checkpoints

Available at:

The dashboard includes:

  • Internationalization (English/Spanish)
  • Dark/light theme support
  • Responsive layout for mobile devices
  • Preset configurations for common use cases

Development Philosophy

Code Quality Standards

  • Style: Google Python Style Guide compliance
  • Type Hints: Full type annotation for IDE support and type checking
  • Documentation: Google-style docstrings with auto-generated API docs

Documentation Approach

Multi-layered documentation strategy:

  1. API Reference: Auto-generated from docstrings via MkDocs
  2. Tutorials: Jupyter notebooks with executable examples
  3. Theory: Mathematical foundations and algorithm explanations
  4. Examples: Real-world use cases with full code

Published at:

Extensibility

Users can extend the package by subclassing the abstract base classes, for example:

from image_gen import GenerativeModel  # assumed import path for the model class
from image_gen.diffusion.base import BaseDiffusion

class CustomDiffusion(BaseDiffusion):
    def sigma(self, t):
        # Example noise scale; replace with your own schedule
        return 0.1 + t

    def drift(self, x, t):
        # Custom drift function
        return -0.5 * x * self.sigma(t)**2

    def diffusion(self, t):
        # Custom diffusion coefficient
        return self.sigma(t)

    def sde(self, x, t):
        # Required by the base class: drift and diffusion together define the SDE
        return self.drift(x, t), self.diffusion(t)

# Use custom diffuser with existing samplers
model = GenerativeModel(diffusion=CustomDiffusion())

Practical Applications

Research

  • Experimenting with novel diffusion formulations
  • Comparing sampling strategies systematically
  • Developing custom conditioning mechanisms

Education

  • Understanding SDE-based generative models
  • Visualizing diffusion dynamics through the dashboard
  • Hands-on experimentation without infrastructure setup

Creative Tools

  • Generating synthetic training data
  • Creating variations of existing images
  • Prototyping image editing workflows

Installation & Requirements

pip install image-gen-diffusion

Hardware Recommendations:

  • Recommended: CUDA GPU with 6+ GB VRAM
  • Optimal: Modern GPU (RTX 3060+), 16+ GB system RAM

License & Attribution

Released under the MIT License. Free for academic, commercial, and personal use with attribution. No warranty is provided. Users are responsible for ensuring that generated content complies with applicable laws and does not infringe third-party rights.

Technologies Used

  • Python
  • PyTorch
  • Streamlit