PALLAIDIUM

The Sculptor's Method

Direct Exploration of the Audio-Visual Material

Traditional screenwriting requires you to work in words, projecting how a scene might eventually feel in the cinema. Pallaidium changes this paradigm entirely.

By placing generative models inside Blender's Video Sequence Editor (VSE), you work directly in the medium itself. Draft visual plates, test dialogue pacing, and adjust musical arrangements side-by-side. You are like a sculptor working directly with clay, discovering the story through the weight, texture, and unexpected details of the physical assets.

SENSORY FRAMEWORK

● SENSORY IMMEDIACY

Don't wait for post-production to hear a character's voice. Generate cloned-voice dialogue tracks, evaluate the cadence of each performance, and let the voice direct the visual cuts.

● DIRECT FEEDBACK

Play a musical stem, generate a sequence frame, and observe how they interact on the timeline. Follow the sensory chemistry wherever it leads your narrative.

Direct Sensory Input - Output Matrix

Active Translation Matrix

Select a starting asset below. Anything can become anything via text prompts and descriptive layers inside Pallaidium.

01 // STARTING ELEMENT

Active Process Routing

TEXT_TO_SENSORY_CHANNELS

LATENT SYNC

INPUT: Screenplay Markup (Fountain)

Pallaidium Core Engine

Translating screenplay descriptions to multi-track layouts

OUTPUT: Track Media Generations

Guidance: Shared Temporal Clocks

State: Asynchronous

Timeline Strips Generated

Video Plate [Channel 3]

Wan-2B / LTX-2 Synthesized from screen text

Audio Score [Channel 1]

Foundation Music Stereo Stem in G-Minor

Voice Track [Channel 2]

Chatterbox Vocal Synthesizer Dialogue Node

Foley Design [Channel 1]

ACE Step Ambient Foley Engine

Workflow Patterns

Flexible Creative Loops

These two examples show only a few of the many routes you can explore on the VSE timeline.

● EXAMPLE LOOP A // THE TRADITIONAL PROGRESSION

Drafting the Screenplay First

You begin with a text concept or screenplay strip, generating visuals, speech tracks, and soundtracks step-by-step to match the timing of your written screenplay.

1. Write screenplay strips using Blender Screenwriter.

2. Generate images with consistent style and characters using FLUX Klein 9b and LoRAs.

3. Cover each scene from several angles using Qwen.

4. Generate speech tracks using Chatterbox vocal clones.

5. Animate images, prompts, and speech into video using LTX 2.3.
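The order of Loop A can be read as a small dependency chain: each step consumes media that an earlier step produced. The sketch below is only an illustration of that ordering. The `Stage` class, the media labels ("text", "image", "coverage", "speech", "video"), and the `first_blocked_stage` helper are hypothetical names introduced here, not part of Pallaidium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    tool: str              # tool named in the step list above
    needs: frozenset       # media kinds that must already exist (assumed labels)
    makes: str             # media kind this stage adds (assumed label)


LOOP_A = [
    Stage("Write screenplay strips", "Blender Screenwriter", frozenset(), "text"),
    Stage("Generate styled images", "FLUX Klein 9b + LoRAs", frozenset({"text"}), "image"),
    Stage("Cover scenes from several angles", "Qwen", frozenset({"image"}), "coverage"),
    Stage("Generate speech tracks", "Chatterbox", frozenset({"text"}), "speech"),
    Stage("Animate into video", "LTX 2.3", frozenset({"image", "text", "speech"}), "video"),
]


def first_blocked_stage(stages):
    """Return the name of the first stage whose inputs are missing, or None."""
    available = set()
    for stage in stages:
        if not stage.needs <= available:
            return stage.name
        available.add(stage.makes)
    return None


blocked = first_blocked_stage(LOOP_A)
print("Loop A is runnable in order" if blocked is None else f"Blocked at: {blocked}")
```

Dropping the first stage makes the image step report as blocked, which is one way to see why Loop A starts from text while Loop B can start from sound.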

○ EXAMPLE LOOP B // THE SENSORY PROGRESSION

Sculpting from a Music Track First

Import or generate a musical score that evokes the correct emotion. Explore images that match its mood, imagine the characters, generate descriptions, and output a screenplay.

1. Import sound files or generate themes with Foundation or AceStep Music.

2. Generate images to discover which characters match the sound.

3. Extract motion descriptions from the visuals and transcribe speech to text.

4. Convert the generated descriptions into a screenplay via Subtitle Editor and Screenwriter.

Validated Local Model Layers

System Model Stack

LTX-2 & Multi-Input

Video

Fast latent video generation using a 3-stage temporal process. The multi-input variant supports custom VSE LoRAs and detail passes.

Weights: Hugging Face →

Wan T2V / I2V

Video

Generates fluid sequences with physically plausible motion and stable temporal rendering across a wide range of movement.

Weights: Wan-AI HuggingFace →

SkyReels V1 (Hunyuan)

Video

Text-to-video and image-to-video models built on Hunyuan DiT. Offers an INT4 option for lower memory use.

Weights: Skywork HuggingFace →

MiniMax Cloud Engine

Video

Alternative API-steered generation system for rapid draft generation. Requires setting an active personal key.

Provider: MiniMax API minimaxi.com →

FLUX.2 Dev (4-Bit Quantized)

Image

Optimized text-to-image pipeline for detailed layouts, graphics, and high-fidelity prompt compliance. Fits within 6 GB limits.

Weights: Hugging Face (BnB) HuggingFace →

FLUX.2 Klein (4B & 9B)

Image

Lightweight 4B and 9B parameter variants. Intended for fast layout previews and efficient tile upscaling.

Weights: Black Forest Labs HuggingFace →

FLUX Kontext & Relighting

Image

Performs instruction-based changes on loaded image strips. Allows you to re-light or edit specific areas without manual masking.

Weights: Kontext Community HuggingFace →

OmniGen V1

Image

A unified multi-image model for image editing, generation, and multi-reference image composition.

Weights: Shitao HuggingFace →

Lumina Image 2.0

Image

Advanced diffusion transformer model designed to process high-density visual details and complex layout conditions.

Weights: Alpha-VLLM HuggingFace →

BiRefNet-HR

Image

High-resolution automated background extraction engine. Isolates subjects directly on VSE layers for fast compositing.

Weights: ZhengPeng7 HuggingFace →

Chatterbox & Turbo

Audio

Clones voice performance details directly from short local reference files. Handles dialogue generation and long formats without processing lag.

Source: Resemble AI GitHub →

MMAudio Sync Sound

Audio

Analyzes video strips on the timeline to generate sound effects and foley tracks synchronized to the picture.

Weights: HK Cheng Rex HuggingFace →

Foundation Music 1

Audio

Generates stereo musical stems and arrangements directly from text directions, BPM cues, and scale settings.

Weights: tin2tin Diffusers HuggingFace →
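A BPM cue also tells you where beats and bars fall on the VSE timeline, which helps when cutting picture to a generated stem. The following is plain arithmetic, not a description of how Foundation Music 1 works internally. The function names, the 4/4 default, and the assumption that the stem starts on frame 1 are all illustrative.

```python
def frames_per_beat(bpm: float, fps: float) -> float:
    """Number of timeline frames covered by one beat."""
    return fps * 60.0 / bpm


def bar_start_frames(bpm, fps, bars, beats_per_bar=4, first_frame=1):
    """Frame numbers where each bar begins, assuming the stem starts on first_frame."""
    per_bar = frames_per_beat(bpm, fps) * beats_per_bar
    return [first_frame + round(i * per_bar) for i in range(bars)]


# 120 BPM at 24 fps: one beat is 12 frames, one 4/4 bar is 48 frames.
print(frames_per_beat(120, 24))
print(bar_start_frames(120, 24, bars=3))
```

Placing cuts on the returned frame numbers keeps edits on the downbeat of each bar.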

ACE Step Audio

Audio

Flexible text-to-audio engine designed for sound design, ambient environments, and structural track sound effects.

Source: ACE-Step Team GitHub →

Florence-2 Captioning

Text

Analyzes frame layers on the timeline to generate accurate captions, object coordinates, and tracking labels.

Weights: Microsoft HuggingFace →

MoviiGen Prompt Engine

Text

Rewrites simple text inputs into rich, descriptive prompts structured for the temporal constraints of video models.

Weights: ZuluVision HuggingFace →

Marlin Video Captions

Text

Generates narrative descriptions of motion and visual sequences from video strips, aiding the screenwriter layout.

Weights: Lunar Labs HuggingFace →

Master-Satellite Topology

7 Synced Nodes

The Sculpting Armature

Observe the active data path. Pallaidium sits at the core of your VSE timeline, converting structured inputs from satellite modules into sensory motion, speech, and sound.

CORE CENTRAL NODE [PALLAIDIUM ENGINE]

Pallaidium Core Generative Engine

The master orchestration framework that coordinates local diffusion passes directly in Blender's Video Sequence Editor (VSE). It ingests structural, directional, and temporal parameters from the satellite nodes and compiles them into sensory sequences using Wan, LTX-2, FLUX, and Chatterbox. All media tracks are coordinated under a shared emotional state.

Status: Active Host

Orchestrates: Wan, FLUX, LTX, Chatterbox, ACE-Step, Foundation Music

Main Repository →

SATELLITE 01 / ARMATURE [WRITER]

Blender Screenwriter

Draft screenplays directly in Blender using Fountain markup. Automatically compiles dialogue and heading sections into timed sequence tracks, setting up a template for visual and audio development.

Data Pass: Fountain to Subtitle Tracks

Repo →
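To make the idea of compiling Fountain into timed tracks concrete, here is a minimal sketch. It is not Blender Screenwriter's implementation. It recognizes only single-line scene headings and character-plus-dialogue blocks, and it estimates durations from word count. The `Cue` class, the `parse_fountain` function, the 2.5 words-per-second speaking rate, and the 2-second heading length are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Scene headings start with INT, EXT, EST or I/E followed by ".", "/" or a space.
SCENE_HEADING = re.compile(r"^(INT|EXT|EST|I/E)[./ ]", re.IGNORECASE)


@dataclass
class Cue:
    kind: str      # "heading" or "dialogue"
    speaker: str   # empty for headings
    text: str
    start: int     # first frame of the cue
    end: int       # first frame after the cue


def parse_fountain(source, fps=24, words_per_second=2.5, heading_seconds=2.0):
    cues, frame = [], 1
    # Fountain separates elements with blank lines, so work block by block.
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        first = lines[0]
        if len(lines) == 1 and SCENE_HEADING.match(first):
            length = round(heading_seconds * fps)
            cues.append(Cue("heading", "", first, frame, frame + length))
        elif len(lines) > 1 and first == first.upper() and any(c.isalpha() for c in first):
            # Upper-case line followed by text: a character cue and its dialogue.
            spoken = " ".join(l for l in lines[1:] if not l.startswith("("))
            length = max(1, round(len(spoken.split()) / words_per_second * fps))
            cues.append(Cue("dialogue", first, spoken, frame, frame + length))
        else:
            continue  # action lines and other elements are ignored in this sketch
        frame += length
    return cues


SAMPLE = """
INT. WORKSHOP - NIGHT

Clay dust hangs in the air.

ANNA
(quietly)
I can hear it before I can see it.
"""

cues = parse_fountain(SAMPLE)
for cue in cues:
    print(cue.kind, cue.speaker, cue.start, cue.end, cue.text)
```

Each `Cue` carries a start and end frame, which is the information a text strip on the timeline would need. Full Fountain support (forced elements, dual dialogue, notes, transitions) is out of scope here.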

SATELLITE 02 / REASONER [GPT4ALL]

GPT4BLENDER

Brings local, offline LLMs via GPT4ALL directly into Blender. Generates and refines scene descriptors, narrative setups, and prompts locally to explore different physical textures in your sequence.

Data Pass: LLM Core to Editor Panels

Repo →

SATELLITE 03 / COMPILER [COMPILER]

Text to Strip

Converts active script documents or structured prompt texts from Blender's text editor directly into sequenced subtitle strips, preparing inputs for batch layout down the timeline.

Data Pass: Text Docs to VSE Strip Nodes

Repo →

SATELLITE 04 / SEQUENCE [SUBTITLE]

Subtitle Editor

Provides visual track navigation, edit synchronization, translation tracks, and formatting tools. Combines with Whisper models to transcribe voice plates and auto-generate text prompts.

Data Pass: Timeline Speech to Transcription

Repo →
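Transcription output is usually timed in seconds, while the VSE works in frames. The sketch below converts timed segments into frame ranges for text strips. The segment shape (dictionaries with "start", "end", and "text" keys) is an assumption modeled on common Whisper output, and `TextStrip` and `segments_to_strips` are hypothetical names. How Subtitle Editor actually consumes Whisper results is not described here.

```python
from dataclasses import dataclass


@dataclass
class TextStrip:
    text: str
    frame_start: int
    frame_end: int


def segments_to_strips(segments, fps=24.0, first_frame=1):
    """Convert timed transcript segments (seconds) into frame-based strips."""
    strips = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip silent or empty segments
        start = first_frame + round(seg["start"] * fps)
        end = first_frame + round(seg["end"] * fps)
        # Guarantee at least one frame so very short segments stay visible.
        strips.append(TextStrip(text, start, max(end, start + 1)))
    return strips


segments = [
    {"start": 0.0, "end": 1.5, "text": " I can hear it."},
    {"start": 1.5, "end": 4.0, "text": " Before I can see it."},
    {"start": 4.0, "end": 4.2, "text": "  "},
]
strips = segments_to_strips(segments)
for strip in strips:
    print(strip.frame_start, strip.frame_end, strip.text)
```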

SATELLITE 05 / SEQUENCE [MASK]

VSE Masking Tools

Draw masking boundaries on timeline elements using Blender's Clip Editor. Converts selections into timeline strips to target localized inpainting and img2img passes.

Data Pass: Visual Selection to Alpha Masks

Repo →

SATELLITE 06 / SEQUENCE [RENDER]

Add Rendered Strips

Renders 3D layouts, grease pencil drafts, or viewport angles directly to movie strips on the timeline. Ensures these tracks are immediately compatible with Pallaidium's image-to-video workflow.

Data Pass: 3D Viewport to MP4 Strip Input

Repo →

Get Started

One-time setup - Models download on first use

Video Walkthrough

System Requirements

✓ Windows 10/11 (preferred platform)
✓ Blender 5.2 or later
✓ NVIDIA GPU with 6 GB+ VRAM
✓ CUDA 12.4
✓ 20 GB+ free disk space
~ Limited support for Linux

Before You Begin

1. Install Git (must be on your system PATH).

2. Download Blender 5.2+ and unzip it into your Documents folder.

3. Download Pallaidium .ZIP.

Tip: shorten the Blender folder name - long paths can cause unzip failures on Windows.
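Two of the prerequisites above can be checked before installing: Git on the PATH and 20 GB of free disk space. The script below is a minimal sketch using only the Python standard library. The `preflight` function name is hypothetical, and checking the home directory's drive is an assumption, since the location where models are stored is not stated here.

```python
import shutil
from pathlib import Path

REQUIRED_FREE_GB = 20  # from the System Requirements list


def preflight(install_dir: Path) -> dict:
    """Check Git availability and free space on the drive holding install_dir."""
    free_gb = shutil.disk_usage(install_dir).free / 1024**3
    return {
        "git_on_path": shutil.which("git") is not None,
        "enough_disk": free_gb >= REQUIRED_FREE_GB,
    }


if __name__ == "__main__":
    # Assumption: Blender and the models live on the same drive as the home folder.
    for check, ok in preflight(Path.home()).items():
        print(f"{check}: {'ok' if ok else 'NOT MET'}")
```

GPU memory and CUDA version are not covered by this sketch.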

1. Run as Administrator: right-click blender.exe > "Run as Administrator". Required for write permissions on Windows.
2. Install the add-on: Preferences > Add-ons > Install, select the downloaded Pallaidium ZIP, then enable it.
3. Install Dependencies: in Add-on Preferences, click Install Dependencies and wait for it to finish.
4. Open the studio: restart your computer, launch Blender as Admin, then open the Video Sequence Editor > Sidebar (N) > Generative AI.

First run: the chosen model downloads automatically (5-10 GB). Blender may appear frozen during the download. This is normal; do not close Blender.

View Repository on GitHub

If any modules are missing after install, use blender_pip to install them manually.
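For context, the general technique behind installing a missing module is to run pip with Blender's bundled Python interpreter rather than the system one. The sketch below shows that technique only. It is not the blender_pip add-on and does not use its interface. The function names are hypothetical, and it assumes the code runs in Blender's Python console, where `sys.executable` is expected to point at the bundled interpreter.

```python
import subprocess
import sys


def pip_install_command(module: str, python: str = sys.executable) -> list:
    """Build the command that installs a module into the given interpreter."""
    return [python, "-m", "pip", "install", module]


def install(module: str) -> int:
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.call(pip_install_command(module))


# Inspect the command before running it; call install("<module name>") to execute.
print(pip_install_command("example-module"))
```

On Windows, Blender needs to have been launched as Administrator for the install to have write access to its Python folder.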

Sculpt narratives in time.

Build alongside other filmmakers and AI artists.