Image Generation Module via Diffusion Models
Project Overview
A comprehensive Python package implementing diffusion models for image generation, featuring multiple diffusion processes, sampling strategies, and controllable generation capabilities. The package provides both programmatic APIs and an interactive Streamlit dashboard for experimentation without coding.
Core Capabilities
Diffusion Process Variants
The package implements three fundamental approaches to the diffusion process:
- Variance Exploding (VE): Adds noise of steadily increasing variance without rescaling the signal, well suited to preserving high-frequency detail
- Variance Preserving (VP): Maintains constant total variance throughout the diffusion process, offering balanced training stability
- Sub-Variance Preserving (Sub-VP): A variant whose variance stays strictly below that of the VP process, providing finer control over the noise schedule, though it requires longer training to converge
Each variant follows a well-defined mathematical framework based on Stochastic Differential Equations (SDEs), providing theoretical guarantees about convergence and generation quality.
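For reference, the three variants follow the standard SDE formulations introduced by Song et al. (2021), where $w$ denotes a Wiener process:

$$
\begin{aligned}
\text{VE:} \quad & \mathrm{d}x = \sqrt{\tfrac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}w \\
\text{VP:} \quad & \mathrm{d}x = -\tfrac{1}{2}\beta(t)\,x\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}w \\
\text{Sub-VP:} \quad & \mathrm{d}x = -\tfrac{1}{2}\beta(t)\,x\,\mathrm{d}t + \sqrt{\beta(t)\left(1 - e^{-2\int_0^t \beta(s)\,\mathrm{d}s}\right)}\,\mathrm{d}w
\end{aligned}
$$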
Sampling Methods
Four distinct numerical solvers for the reverse diffusion process:
| Sampler | Type | Characteristics | Best Use Case |
|---|---|---|---|
| Euler-Maruyama | Stochastic | Simple, fast, stochastic trajectories | Quick generation with acceptable quality |
| Predictor-Corrector | Stochastic | Two-stage refinement per step | High-quality generation when time permits |
| Probability Flow ODE | Deterministic | Noise-free trajectories with the same marginals as the SDE | Consistent outputs, good for interpolation |
| Exponential Integrator | Stochastic | Treats the linear drift exactly for improved numerical stability | Complex dynamics with fewer steps |
Each sampler trades generation speed against output quality: as few as 50 steps yield quick, acceptable results, while 1000+ steps maximize fidelity.
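To make the trade-off concrete, the simplest of these, a reverse-diffusion Euler-Maruyama step, can be sketched as follows. This is a generic sketch, not the package's exact API: `drift`, `diffusion`, and `score` are stand-ins for the diffuser interface and the trained network.

```python
import torch

def euler_maruyama_step(x, t, dt, drift, diffusion, score):
    """One reverse-SDE step from time t to t - dt (dt > 0).

    drift(x, t) and diffusion(t) define the forward SDE, and
    score(x, t) approximates the gradient of the log density.
    """
    g = diffusion(t)
    # Reverse-time drift: f(x, t) - g(t)^2 * score(x, t)
    rev_drift = drift(x, t) - g ** 2 * score(x, t)
    noise = torch.randn_like(x)
    # Step backwards in time, injecting noise scaled by sqrt(dt)
    return x - rev_drift * dt + g * (dt ** 0.5) * noise
```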
Noise Scheduling
Two scheduling strategies control how noise is distributed across diffusion timesteps:
- Linear Schedule: Uniform noise addition, simple and predictable
- Cosine Schedule: Adds noise gently at the start and end of the process and more aggressively in the middle timesteps, preserving coarse structure early and fine detail late
The scheduler directly impacts training stability and generation quality, with cosine scheduling generally producing superior results for natural images.
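A rough sketch of the two strategies (parameterizations are illustrative; the cosine form follows Nichol & Dhariwal, 2021, and may differ from the package's exact implementation):

```python
import numpy as np

def linear_betas(n_steps, beta_min=1e-4, beta_max=0.02):
    # Noise rate rises uniformly across timesteps
    return np.linspace(beta_min, beta_max, n_steps)

def cosine_alpha_bar(t, s=0.008):
    # Cumulative signal retention for t in [0, 1]; noise is added
    # gently near the endpoints and more aggressively in between
    return np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
```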
Controllable Generation
Grayscale Colorization
Transforms grayscale images into plausible color versions without model retraining:
```python
from image_gen import GenerativeModel  # import path assumed

model = GenerativeModel.load("pretrained_model.pth")
gray_image = load_grayscale_image()  # placeholder for any grayscale input
colorized = model.colorize(gray_image, n_steps=500)
```
The colorization process works by:
- Converting RGB → grayscale by averaging channels
- Guiding the reverse diffusion to preserve luminance structure
- Generating chromatic information conditioned on grayscale content
This leverages the model’s learned distribution of natural image colors without requiring paired training data.
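One common way to realize this guidance, shown here as a sketch of the general idea rather than the package's exact algorithm, is to re-impose the known luminance on the running sample at each reverse step:

```python
import torch

def enforce_luminance(x, gray_target):
    """Hypothetical helper: shift an RGB sample of shape (3, H, W) so its
    channel mean matches the known grayscale image of shape (1, H, W)."""
    luminance = x.mean(dim=0, keepdim=True)  # current channel average
    return x - luminance + gray_target       # keep chroma, swap luminance
```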
Region Imputation (Inpainting)
Fills missing or masked regions coherently with surrounding content:
```python
# Create binary mask (1 = fill, 0 = preserve)
mask = create_mask(image, region=(x, y, width, height))
completed = model.imputation(image, mask, n_steps=500)
```
The imputation algorithm:
- Preserves unmasked regions throughout the reverse process
- Generates masked regions conditioned on visible boundaries
- Maintains global coherence through diffusion dynamics
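In score-based models this is commonly realized by re-noising the known pixels to the current noise level and splicing them back in at every step. A generic sketch, where `sampler_step` and `forward_noise` are stand-ins for the chosen sampler's update and the forward noising process:

```python
def imputation_step(x, known, mask, t, sampler_step, forward_noise):
    """One reverse step with known-region replacement (sketch only)."""
    x = sampler_step(x, t)                  # denoise everything one step
    known_t = forward_noise(known, t)       # known pixels at noise level t
    # Keep generated content only where mask = 1, known content elsewhere
    return mask * x + (1 - mask) * known_t
```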
Particularly effective for:
- Removing unwanted objects
- Completing partially occluded scenes
- Restoring damaged or corrupted image regions
Class-Conditioned Generation
Generates images belonging to specific categories:
```python
# Train with class labels
model.train(dataset, epochs=100, use_labels=True)

# Generate images of class 3 (e.g., "ship" in CIFAR-10)
ships = model.generate(num_samples=4, class_label=3)
```
Conditioning is implemented through:
- Class embeddings injected at multiple network layers
- Classifier-free guidance for stronger conditioning
- Optional guidance strength parameter for quality/diversity trade-off
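Schematically, classifier-free guidance blends conditional and unconditional score estimates at sampling time (function names here are illustrative; `w` is the guidance strength):

```python
def guided_score(x, t, label, score_model, w=2.0):
    """Illustrative classifier-free guidance combination."""
    s_uncond = score_model(x, t, label=None)   # unconditional estimate
    s_cond = score_model(x, t, label=label)    # class-conditional estimate
    # w = 0 recovers the unconditional model, w = 1 the conditional one;
    # larger w strengthens conditioning at the cost of diversity
    return s_uncond + w * (s_cond - s_uncond)
```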
Architecture & Design
Modular Component System
The package follows a plugin architecture where each component type (diffusers, samplers, schedulers) inherits from abstract base classes:
```python
from abc import ABC, abstractmethod

# All diffusers implement this interface
class BaseDiffusion(ABC):
    @abstractmethod
    def drift(self, x, t): ...

    @abstractmethod
    def diffusion(self, t): ...

    @abstractmethod
    def sde(self, x, t): ...
```
This allows users to:
- Mix and match any combination of components
- Create custom implementations by inheriting from base classes
- Swap components without modifying existing code
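For example (the component class names and module paths below are illustrative assumptions; see the API reference for the exact ones):

```python
from image_gen import GenerativeModel
# Hypothetical imports shown only to illustrate the plugin pattern
from image_gen.diffusion import VariancePreserving
from image_gen.samplers import PredictorCorrector
from image_gen.schedulers import CosineSchedule

model = GenerativeModel(
    diffusion=VariancePreserving(),
    sampler=PredictorCorrector(),
    scheduler=CosineSchedule(),
)
```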
Score-Based Neural Network
The core denoising model is a U-Net architecture with:
- Time embedding: Sinusoidal positional encoding for timestep conditioning
- Skip connections: Preserving multi-scale features for detail reconstruction
- Group normalization: Stable training across batch sizes
- Attention layers: Capturing long-range dependencies (optional)
The network learns the score function (gradient of log probability) rather than directly predicting noise, providing better theoretical properties.
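The sinusoidal time embedding can be sketched with the standard Transformer-style formulation (the dimension and frequency base are illustrative, not the package's exact values):

```python
import torch

def timestep_embedding(t, dim=128):
    """Map scalar timesteps t (shape [B]) to [B, dim] sinusoidal features."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to 1/10000
    freqs = torch.exp(
        -torch.log(torch.tensor(10000.0)) * torch.arange(half) / (half - 1)
    )
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)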
Secure Serialization System
A unique feature is the CustomClassWrapper for loading models with user-defined components:
```python
# Save model with custom diffuser
model.save("model.pth", include_classes=True)

# Load on a different machine without copying source code
loaded_model = GenerativeModel.load(
    "model.pth",
    allow_custom=True,
    confirm_execution=True,
)
```
The system:
- Serializes both model weights and class definitions
- Requires explicit user confirmation before executing custom code
- Runs user code in a restricted environment with limited imports
- Prevents common security vulnerabilities while enabling collaboration
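The confirmation-and-sandbox pattern can be pictured roughly as follows; this is a simplified sketch of the idea, not the actual implementation:

```python
def load_custom_class(source: str, class_name: str):
    """Execute serialized class source only after explicit consent,
    inside a namespace with a minimal set of builtins (sketch only)."""
    answer = input(f"Execute bundled code for '{class_name}'? [y/N] ")
    if answer.strip().lower() != "y":
        raise PermissionError("User declined to execute custom code")
    # Expose only a small whitelist of builtins to the executed code
    safe_builtins = {"len": len, "range": range, "abs": abs}
    namespace = {"__builtins__": safe_builtins}
    exec(source, namespace)
    return namespace[class_name]
```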
Evaluation Metrics
Three standard metrics for quantitative assessment:
Bits Per Dimension (BPD)
Measures how well the model compresses data:
- Lower values indicate better density estimation
- Comparable across different image sizes
- Computed via Monte Carlo estimation of the evidence lower bound
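The conversion itself is a change of units; assuming `nll_nats` is an image's total negative log-likelihood in nats and `num_dims = H × W × C`:

```python
import math

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    # nats -> bits (divide by ln 2), then normalize per dimension
    return nll_nats / (math.log(2) * num_dims)
```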
Fréchet Inception Distance (FID)
Compares feature distributions between real and generated images:
- Uses Inception v3 features for perceptual similarity
- Lower values indicate more realistic generations
- Industry-standard metric for generative model comparison
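Concretely, with Gaussian fits $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ to the real and generated feature sets:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$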
Inception Score (IS)
Evaluates both quality and diversity:
- Measures KL divergence between conditional and marginal class distributions
- Higher values indicate clearer, more diverse samples
- Sensitive to class distribution imbalance
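Formally, $\mathrm{IS} = \exp\big(\mathbb{E}_x\, D_{\mathrm{KL}}(p(y \mid x) \,\|\, p(y))\big)$, with $p(y \mid x)$ taken from an Inception classifier.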
Usage example:
```python
from image_gen.metrics import FID, InceptionScore

fid_metric = FID()
fid_score = fid_metric(real_images, generated_images)

is_metric = InceptionScore()
is_score, is_std = is_metric(generated_images)
```
Interactive Dashboard
A Streamlit web application provides GUI access to all features:
- Model Configuration: Dropdowns for diffuser/sampler/scheduler selection
- Parameter Tuning: Sliders for steps, guidance strength, noise levels
- Live Generation: Real-time image generation with progress bars
- Comparison Mode: Side-by-side visualization of different configurations
- Export Options: Download generated images or save model checkpoints
Available at:
- Local: run `streamlit run dashboard.py`
- Online (CPU-only): https://image-gen-htd.streamlit.app/
The dashboard includes:
- Internationalization (English/Spanish)
- Dark/light theme support
- Responsive layout for mobile devices
- Preset configurations for common use cases
Development Philosophy
Code Quality Standards
- Style: Google Python Style Guide compliance
- Type Hints: Full type annotation for IDE support and type checking
- Documentation: Google-style docstrings with auto-generated API docs
Documentation Approach
Multi-layered documentation strategy:
- API Reference: Auto-generated from docstrings via MkDocs
- Tutorials: Jupyter notebooks with executable examples
- Theory: Mathematical foundations and algorithm explanations
- Examples: Real-world use cases with full code
Published at:
- MkDocs: https://hectortablero.github.io/image-gen/
- DeepWiki: https://deepwiki.com/HectorTablero/image-gen
Extensibility
Users can extend the package by subclassing the base components:

```python
from image_gen.diffusion.base import BaseDiffusion

class CustomDiffusion(BaseDiffusion):
    def drift(self, x, t):
        # Custom drift function
        return -0.5 * x * self.sigma(t) ** 2

    def diffusion(self, t):
        # Custom diffusion coefficient
        return self.sigma(t)

    def sde(self, x, t):
        # Also required by BaseDiffusion; the return contract is assumed
        # here -- see the base class documentation for the exact form
        return self.drift(x, t), self.diffusion(t)

# Use the custom diffuser with existing samplers
model = GenerativeModel(diffusion=CustomDiffusion())
```
Practical Applications
Research
- Experimenting with novel diffusion formulations
- Comparing sampling strategies systematically
- Developing custom conditioning mechanisms
Education
- Understanding SDE-based generative models
- Visualizing diffusion dynamics through the dashboard
- Hands-on experimentation without infrastructure setup
Creative Tools
- Generating synthetic training data
- Creating variations of existing images
- Prototyping image editing workflows
Installation & Requirements
```bash
pip install image-gen-diffusion
```
Hardware Recommendations:
- Recommended: CUDA GPU with 6+ GB VRAM
- Optimal: Modern GPU (RTX 3060+), 16+ GB system RAM
License & Attribution
Released under the MIT License. Free for academic, commercial, and personal use with attribution. No warranty is provided. Users are responsible for ensuring that generated content complies with applicable laws and does not infringe third-party rights.