Local Engine - Blender 5.2+ - CUDA Native

PALLAIDIUM

Sculpt narratives in time.

Pallaidium lets you develop narratives directly with the visual and sonic elements that trigger emotion in you right now.

BLENDER VSE - INTEGRATED WORKFLOW

Creativity is about combining elements. Pallaidium empowers you to create and combine elements across all media, letting you explore the immediate emotional impact of these combinations as they play together on a single timeline.

The Sculptor's Method

Direct Exploration of the Audio-Visual Material

Traditional screenwriting requires you to work in words, projecting how a scene might eventually feel in the cinema. Pallaidium changes this paradigm entirely.

By placing generative models inside Blender's Video Sequence Editor (VSE), you work directly in the medium itself. Draft visual plates, test dialogue pacing, and adjust musical arrangements side-by-side. You are like a sculptor working directly with clay, discovering the story through the weight, texture, and unexpected details of the physical assets.

SENSORY FRAMEWORK
● SENSORY IMMEDIACY

Don't wait for post-production to hear a character's voice. Generate dialogue with cloned voices, evaluate the cadence of each performance, and let the voice direct the visual cuts.

● DIRECT FEEDBACK

Play a musical stem, generate a sequence frame, and observe how they interact on the timeline. Follow the sensory chemistry wherever it leads your narrative.

Direct Sensory Input - Output Matrix

Active Translation Matrix

Select a starting asset below. Anything can become anything via text prompts and descriptive layers inside Pallaidium.

01 // STARTING ELEMENT
Active Process Routing

TEXT_TO_SENSORY_CHANNELS

LATENT SYNC
INPUT: Screenplay Markup (Fountain)
Pallaidium Core Engine

Translating screenplay descriptions to multi-track layouts

OUTPUT: Track Media Generations
Guidance: Shared Temporal Clocks
State: Asynchronous

Timeline Strips Generated

  • Video Plate [Channel 3]: Wan-2B / LTX-2, synthesized from screen text
  • Audio Score [Channel 1]: Foundation Music, stereo stem in G minor
  • Voice Track [Channel 2]: Chatterbox vocal synthesizer, dialogue node
  • Foley Design [Channel 1]: ACE Step, ambient foley engine

Workflow Patterns

Flexible Creative Loops

These diagrams show just two of the many routes you can explore on the VSE timeline.

● EXAMPLE LOOP A // THE TRADITIONAL PROGRESSION

Drafting the Screenplay First

You begin with a text concept or screenplay strip, generating visuals, speech tracks, and soundtracks step-by-step to match the timing of your written screenplay.

1. Write screenplay strips using Blender Screenwriter.
2. Generate images with consistent style and characters using FLUX.2 Klein 9B and LoRAs.
3. Cover each scene from several angles using Qwen.
4. Generate speech tracks using Chatterbox voice clones.
5. Animate images, prompts, and speech into video using LTX 2.3.

○ EXAMPLE LOOP B // THE SENSORY PROGRESSION

Sculpting from a Music Track First

Import or generate a musical score that evokes the correct emotion. Explore images that match its mood, imagine the characters, generate descriptions, and output a screenplay.

1. Import sound files or generate themes with Foundation Music or ACE Step.
2. Generate images to discover which characters match the sound.
3. Extract descriptions of movement and transcribe speech to text.
4. Convert the generated descriptions into a screenplay via Subtitle Editor and Screenwriter.
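
The last step of this loop, turning timed descriptions and transcribed speech into screenplay text, can be illustrated outside Blender. The sketch below is not the Subtitle Editor's or Screenwriter's actual code; the `Caption` structure and the `captions_to_fountain` function are hypothetical names invented for this example. It only relies on basic Fountain conventions: a scene heading, action paragraphs, and an upper-case character cue followed by the spoken line, with blank lines between elements.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """One timed subtitle line (hypothetical structure for this sketch)."""
    start_seconds: float
    text: str
    speaker: str = ""  # empty means the line is a description, not dialogue


def captions_to_fountain(scene_heading: str, captions: list[Caption]) -> str:
    """Render timed captions as a Fountain scene, ordered by start time."""
    blocks = [scene_heading.upper()]
    for cap in sorted(captions, key=lambda c: c.start_seconds):
        if cap.speaker:
            # Fountain dialogue: an upper-case character cue, then the line.
            blocks.append(f"{cap.speaker.upper()}\n{cap.text}")
        else:
            # Anything without a speaker becomes an action paragraph.
            blocks.append(cap.text)
    # Fountain elements are separated by blank lines.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = [
        Caption(4.0, "I thought you had left.", speaker="Mara"),
        Caption(0.0, "Rain streaks the window of a dim kitchen."),
    ]
    print(captions_to_fountain("INT. KITCHEN - NIGHT", demo))
```

Sorting by start time means captions can be collected from several tracks in any order and still read chronologically in the resulting scene.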
Validated Local Model Layers

System Model Stack

LTX-2 & Multi-Input

Video

Fast latent video generation using a 3-stage temporal process. The multi-input variant supports custom VSE LoRAs and detail passes.

Weights: Hugging Face HuggingFace →

Wan T2V / I2V

Video

Generates fluid sequences with strong physics adherence and temporal consistency, and handles a wide range of motion accurately.

Weights: Wan-AI HuggingFace →

SkyReels V1 (Hunyuan)

Video

Text-to-video and image-to-video models built on Hunyuan DiT. Offers an INT4-quantized option to reduce memory use.

Weights: Skywork HuggingFace →

MiniMax Cloud Engine

Video

Cloud-based alternative driven through an API for rapid drafts. Requires your own active API key.

Provider: MiniMax API minimaxi.com →

FLUX.2 Dev (4-Bit Quantized)

Image

Optimized text-to-image pipeline for detailed layouts, graphics, and high-fidelity prompt compliance. Fits within 6 GB of VRAM.

Weights: Hugging Face (BnB) HuggingFace →

FLUX.2 Klein (4B & 9B)

Image

Lightweight variants at two parameter counts. Intended for fast layout previews and efficient tile upscaling.

Weights: Black Forest Labs HuggingFace →

FLUX Kontext & Relighting

Image

Performs instruction-based changes on loaded image strips. Allows you to re-light or edit specific areas without manual masking.

Weights: Kontext Community HuggingFace →

OmniGen V1

Image

A unified multi-image model for image editing, generation, and multi-reference image composition.

Weights: Shitao HuggingFace →

Lumina Image 2.0

Image

Advanced diffusion transformer model designed to process high-density visual details and complex layout conditions.

Weights: Alpha-VLLM HuggingFace →

BiRefNet-HR

Image

High-resolution automated background extraction engine. Isolates subjects directly on VSE layers for fast compositing.

Weights: ZhengPeng7 HuggingFace →

Chatterbox & Turbo

Audio

Clones voice performance details directly from short local reference files. Handles dialogue generation and long formats without processing lag.

Source: Resemble AI GitHub →

MMAudio Sync Sound

Audio

Analyzes video tracks on the timeline to generate sound effects and foley tracks synchronized to the picture.

Weights: HK Cheng Rex HuggingFace →

Foundation Music 1

Audio

Generates stereo musical stems and arrangements directly from text directions, BPM cues, and scale settings.

Weights: tin2tin Diffusers HuggingFace →

ACE Step Audio

Audio

Flexible text-to-audio engine designed for sound design, ambient environments, and structural track sound effects.

Source: ACE-Step Team GitHub →

Florence-2 Captioning

Text

Analyzes frame layers on the timeline to generate accurate captions, object coordinates, and tracking labels.

Weights: Microsoft HuggingFace →

MoviiGen Prompt Engine

Text

Rewrites simple text inputs into rich, descriptive prompts structured for the temporal constraints of video models.

Weights: ZuluVision HuggingFace →

Marlin Video Captions

Text

Generates narrative descriptions of motion and visual sequences from video strips, aiding the screenwriter layout.

Weights: Lunar Labs HuggingFace →
Master-Satellite Topology - 7 Synced Nodes

The Sculpting Armature

Observe the active data path. Pallaidium sits at the core of your VSE timeline, converting structured inputs from satellite modules into sensory motion, speech, and sound.

CORE CENTRAL NODE [PALLAIDIUM ENGINE]

Pallaidium Core Generative Engine

The master orchestration framework that coordinates localized diffusion passes directly in Blender's Video Sequence Editor (VSE). It ingests structural, directional, and temporal parameters from the satellite nodes and compiles them into sensory sequences using Wan, LTX-2, FLUX, and Chatterbox. All media tracks are coordinated under a shared emotional state.

Status: Active Host
Orchestrates: Wan, FLUX, LTX, Chatterbox, ACE-Step, Foundation Music
Main Repository →
SATELLITE 01 / ARMATURE [WRITER]

Blender Screenwriter

Draft screenplays directly in Blender using Fountain markup. Automatically compiles dialogue and heading sections into timed sequence tracks, setting up a template for visual and audio development.

Data Pass: Fountain - Subtitle Tracks Repo →
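
To make the Fountain step concrete, here is a minimal sketch of how a screenplay could be split into scene headings, action, and dialogue before anything is placed on a timeline. It is an illustration under stated assumptions, not Blender Screenwriter's parser: `Element` and `parse_fountain` are hypothetical names, and only a small subset of Fountain is handled (parentheticals, transitions, and notes are ignored).

```python
import re
from dataclasses import dataclass

# Scene headings start with INT, EXT, EST, INT./EXT or I/E (case-insensitive).
SCENE_PREFIX = re.compile(r"^(INT|EXT|EST|INT\./EXT|I/E)[. ]", re.IGNORECASE)


@dataclass
class Element:
    kind: str  # "scene", "dialogue", or "action"
    text: str
    speaker: str = ""


def parse_fountain(script: str) -> list[Element]:
    """Parse a small subset of Fountain: scene headings, dialogue, action."""
    elements = []
    # Fountain elements are separated by blank lines.
    for block in re.split(r"\n\s*\n", script.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        first = lines[0]
        if SCENE_PREFIX.match(first):
            elements.append(Element("scene", first))
        elif len(lines) > 1 and first.isupper():
            # An upper-case cue followed by text is a character speaking.
            elements.append(Element("dialogue", " ".join(lines[1:]), speaker=first))
        else:
            elements.append(Element("action", " ".join(lines)))
    return elements


if __name__ == "__main__":
    sample = (
        "INT. KITCHEN - NIGHT\n\n"
        "Rain streaks the window.\n\n"
        "MARA\nI thought you had left.\n"
    )
    for element in parse_fountain(sample):
        print(element)
```

Keeping the speaker on each dialogue element is what would later allow a voice track to be generated per character.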
SATELLITE 02 / REASONER [GPT4ALL]

GPT4BLENDER

Brings local, offline LLMs via GPT4All directly into Blender. Generates and refines scene descriptors, narrative setups, and prompts locally to explore different physical textures in your sequence.

Data Pass: LLM Core - Editor Panels Repo →
SATELLITE 03 / COMPILER [COMPILER]

Text to Strip

Converts active script documents or structured prompt texts from Blender's text editor directly into sequenced subtitle strips, preparing inputs for batch layout down the timeline.

Data Pass: Text Docs - VSE Strip Nodes Repo →
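
The idea of laying text out as sequenced strips can be sketched without Blender at all. The example below only plans the layout: each non-empty line gets a channel and a frame range, placed back to back. The duration rule (a reading rate in words per second with a minimum length), the `StripPlan` structure, and the `plan_text_strips` function are all assumptions made for illustration; the add-on's own timing logic may differ, and creating real strips would go through Blender's Python API.

```python
from dataclasses import dataclass


@dataclass
class StripPlan:
    text: str
    channel: int
    frame_start: int
    frame_end: int  # exclusive: the next strip starts on this frame


def plan_text_strips(text, fps=24, channel=1, first_frame=1,
                     words_per_second=2.5, min_seconds=1.0):
    """Turn each non-empty line into a back-to-back text strip plan."""
    plans = []
    frame = first_frame
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate entries but create no strip
        # Assumed rule: duration grows with word count, with a floor.
        seconds = max(min_seconds, len(line.split()) / words_per_second)
        length = max(1, round(seconds * fps))
        plans.append(StripPlan(line, channel, frame, frame + length))
        frame += length
    return plans


if __name__ == "__main__":
    prompts = "A wide shot of a harbor at dawn.\n\nCut.\n"
    for plan in plan_text_strips(prompts):
        print(plan)
```

Planning the layout as plain data first makes it easy to inspect or adjust timings before any strips are created.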
SATELLITE 04 / SEQUENCE [SUBTITLE]

Subtitle Editor

Provides visual track navigation, edit synchronization, translation tracks, and formatting tools. Combines with Whisper models to transcribe voice plates and auto-generate text prompts.

Data Pass: Timeline Speech - Transcription Repo →
SATELLITE 05 / SEQUENCE [MASK]

VSE Masking Tools

Draw masking boundaries on timeline elements using Blender's Clip Editor. Converts selections into timeline strips to target localized inpainting and img2img passes.

Data Pass: Visual Selection - Alpha Masks Repo →
SATELLITE 06 / SEQUENCE [RENDER]

Add Rendered Strips

Renders 3D layouts, grease pencil drafts, or viewport angles directly to movie strips on the timeline. Ensures these tracks are immediately compatible with Pallaidium's image-to-video workflow.

Data Pass: 3D Viewport - MP4 Strip Input Repo →

Get Started

One-time setup - Models download on first use

Video Walkthrough

System Requirements

  • Windows 10/11 (preferred platform)
  • Blender 5.2 or later
  • NVIDIA GPU with 6 GB+ VRAM
  • CUDA 12.4
  • 20 GB+ free disk space
  • Linux: limited support
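
Two of the requirements above, VRAM and free disk space, can be checked with a short script before installing. This is a rough sketch, not part of Pallaidium: the function names are invented for the example, it assumes the `nvidia-smi` tool that ships with NVIDIA drivers is on the PATH, and it does not verify the CUDA or Blender versions. Reported memory can sit slightly below the nominal card size, so treat a near miss as a prompt to check manually.

```python
import shutil
import subprocess

MIN_VRAM_MIB = 6 * 1024   # "6 GB+ VRAM" from the requirements list
MIN_FREE_DISK_GB = 20     # "20 GB+ free disk space"


def free_disk_gb(path="."):
    """Free space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def parse_vram_mib(nvidia_smi_output):
    """Parse one total-memory value per GPU line, in MiB; skip other lines."""
    values = []
    for line in nvidia_smi_output.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def query_vram_mib():
    """Ask nvidia-smi for total memory per GPU; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


def preflight(path="."):
    """Return a pass/fail flag for the GPU and disk requirements."""
    return {
        "gpu_ok": any(v >= MIN_VRAM_MIB for v in query_vram_mib()),
        "disk_ok": free_disk_gb(path) >= MIN_FREE_DISK_GB,
    }


if __name__ == "__main__":
    print(preflight())
```

Pass the folder where models will be stored as `path` so the disk check looks at the right drive.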

Before You Begin

1. Install Git (must be on your system PATH).

2. Download Blender 5.2+ and unzip it into your Documents folder.

3. Download Pallaidium .ZIP.

Tip: shorten the Blender folder name - long paths can cause unzip failures on Windows.

  1. Run as Administrator: right-click blender.exe > "Run as Administrator". Required for write permissions on Windows.
  2. Install the add-on: Preferences > Add-ons > Install > select the downloaded Pallaidium ZIP, then enable it.
  3. Install Dependencies: in Add-on Preferences, click Install Dependencies and wait for it to finish.
  4. Open the studio: restart your computer, launch Blender as Administrator, then open the Video Sequence Editor > Sidebar (N) > Generative AI.

First run: the chosen model downloads automatically (5-10 GB). The screen may appear frozen while this happens. This is normal; do not close Blender.

View Repository on GitHub

If any modules are missing after install, use blender_pip to install them manually.
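
Before installing anything by hand, it helps to know which modules are actually missing. The sketch below is a generic Python check, not the blender_pip add-on or the Module Checker: it asks the running interpreter whether each module can be found and prints a pip command aimed at that same interpreter. The module names are illustrative only and may not match the add-on's real dependency list. Run from Blender's Python console, `sys.executable` should point at Blender's bundled Python in recent versions, which is an assumption worth confirming on your install.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names that the running interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_install_command(names):
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *names]


if __name__ == "__main__":
    # Illustrative module names only; the real dependency list may differ.
    wanted = ["torch", "diffusers", "transformers"]
    missing = missing_modules(wanted)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Suggested command:", " ".join(pip_install_command(missing)))
    else:
        print("All listed modules were found.")
```

Building the command from `sys.executable` avoids installing packages into a different Python than the one Blender uses.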

Diagnostics Tool Available

Verify your library bindings and local model allocations automatically.

Run Module Checker →
Decentralized Studio Collective

Build alongside other filmmakers and AI artists.

Share results, suggest physical layout models, propose plugin structures, and troubleshoot issues directly with developers on the open Discord server.

"When you have Pallaidium installed, reach out on Discord or leave a note on how it is working for you. It means the world to me to know someone is using it."

Tin2Tin - Creator of Pallaidium