OmniFaceRig: Fully Automatic Inner-Mouth-Aware Face Rigging

Abstract

Facial rigging—creating FACS-based blendshapes together with inner-mouth geometry (teeth, gums, and tongue)—remains a major bottleneck in 3D character production. Existing pipelines still require substantial designer effort, especially for manual landmark annotation, per-character template adjustment, and inner-mouth placement.

We present OmniFaceRig, a fully automatic end-to-end pipeline that converts a static surface-only 3D character mesh, with no pre-modeled oral cavity, into an inner-mouth-aware FACS rig with up to 155 blendshapes, procedurally fitted teeth, gums, and tongue, and re-packed UV/texture. OmniFaceRig supports diverse topologies—humans, humanoids, long-muzzled animals (e.g., dogs, wolves, foxes), and short-muzzled animals (e.g., cats, bears, rabbits, tigers)—with no manual landmarks, no user-provided templates, and no per-asset setup.

The pipeline combines hybrid VLM+CV riggability checking, multi-model face parsing, dense keypoint-driven template registration, procedural inner-mouth construction, and collision-aware blendshape transfer. For non-human characters, OmniFaceRig selects topology-specific face and inner-mouth templates and uses collision-aware inner-mouth fitting to reduce teeth-face intersections without exposing users to category-specific tuning.

We also publicly release Omni-Bench, a freely available benchmark dataset of 1,000 biped 3D characters with FACS facial blendshapes and inner-mouth geometry, spanning humans, humanoids, cats, dogs, and other animals. Experiments show high final rigging success on screened Omni-Bench inputs, nearly complete face detection recall from the segmentation ensemble, reliable inner-mouth placement with low penetration, and 20–30 s end-to-end processing time per asset on a single A100 GPU, including data I/O. Together, OmniFaceRig provides an automatic path from static generated characters to animation-ready facial rigs across both human and non-human topologies.

155
FACS blendshapes per rig, including procedurally generated teeth, gums & tongue
1,000
rigged biped characters released in the open Omni-Bench benchmark
20–30s
end-to-end per asset on a single A100 GPU, including data I/O
4
topology categories: human, humanoid, long- & short-muzzled animals

Showcase

Explore rigged characters in 3D

Each card below shows three columns: the input body image, the original surface-only GLB, and the rigged GLB with FACS animation. Click the ▶ Load button on a card to activate both 3D viewers at once. Inside each viewer, drag the splitter to toggle texture (left half) vs. solid grey (right half); camera and split position are shared across both viewers (left-drag rotate, scroll zoom, shift+drag pan). After 2.5 s of idle time the model slowly auto-rotates.

Human

12 assets

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Human

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

12 assets

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Humanoid

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

13 assets

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Long-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

13 assets

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Short-muzzled

Foreboard Image

Original Asset

Rigged + FACS

Results

Qualitative results across diverse topologies

Main qualitative result: 7 characters across 6 FACS expressions

Main qualitative result — each row is one input asset; the leftmost column shows the input static surface-only mesh, the remaining columns are generated FACS expressions.

Additional qualitative result #1.

Additional qualitative result #2.

Additional qualitative result #3.

Additional qualitative result #4.

Additional qualitative result #5.

Additional qualitative result #6.

Failure cases: bird, fish-like, insect, seahorse-like

Failure cases — characters whose facial topology falls outside our current template families (bird with a sharp beak, fish-like flat snout, insect with oversized compound eyes, seahorse-like elongated snout).

1 / 8

99.0%

Final rigging success (human + humanoid)

0.85 mm

Mean vertex error (MAE)

0.05%

Inner-mouth penetration (3.4× better)

~100%

Face detection recall (ensemble)

87.3%

Segmentation mIoU (vs 62.1%)

Omni-Bench Dataset

A new benchmark for face auto-rigging

Omni-Bench overview — 1,000 biped 3D characters with FACS facial blendshapes and inner-mouth geometry, spanning realistic humans, humanoid characters, cats, dogs, and a long tail of other common animals.

To facilitate future research in automatic facial rigging, we release Omni-Bench, to our knowledge the first open-source benchmark of biped 3D characters with FACS facial blendshapes and inner-mouth geometry (teeth, gums, and tongue). Omni-Bench contains 1,000 rigged biped 3D characters, split into 500 human and humanoid characters and 500 animals (150 cats across 10 breeds, 150 dogs across 10 breeds, and 200 other common animals including bears, tigers, lions, foxes, wolves, rabbits, and deer). All assets are released in T-pose with up to 12 appearance variations and full generation-pipeline metadata (text prompt, intermediate 2D reference image, final 3D mesh).

Omni-Bench stands out in three critical ways. First, it uniquely supports both human and animal characters within a unified rigging framework. Second, every model is equipped with a complete set of auto-generated FACS blendshapes (up to 155 shapes) that explicitly include teeth, gums, and tongue — a feature absent in nearly all existing large-scale datasets. Third, each asset includes the complete generation pipeline data (text + reference image + 3D mesh), making it a valuable resource for text-to-3D generation and multimodal research.

Comparison with existing 3D facial / character datasets — Omni-Bench is the only dataset providing FACS blendshapes with inner-mouth geometry for both humans and animals.

Dataset	Year	Total Models	Species	FACS Blendshapes	Inner Mouth	Full Character	Text + 2D Image
BFM	2009	200	Human	✗	✗	✗	✗
CoMA	2018	144	Human	✗	✗	✗	✗
VOCASET	2019	12 (4D)	Human	✗	✗	✗	✗
FaceScape	2020	16,940	Human	✗	✗	✗	✗
ICT FaceKit	2020	Parametric	Human	✓	✓	✗	✗
Multiface	2022	13	Human	✗	✗	✗	✗
RaBit	2023	1,500	Human & Cartoon	✗	✗	✓	✗
Anymate	2025	230,000	Human & Animal	✗	✗	✓	✗
Omni-Bench (Ours)	2026	1,000	Human & Animal	✓	✓	✓	✓

OmniFaceRig Fully Automatic Inner-Mouth-Aware Face Rigging
Across Diverse 3D Character Topologies

Explore rigged characters in 3D

Human

Humanoid

Long-muzzled

Short-muzzled

Qualitative results across diverse topologies

Two-stage architecture

Stage 1 · Face Template Fitting

Stage 2 · Blendshape Rig Construction

A new benchmark for face auto-rigging