Init Release
.gitmodules (new file)
@@ -0,0 +1,3 @@
[submodule "o-voxel/third_party/eigen"]
	path = o-voxel/third_party/eigen
	url = https://gitlab.com/libeigen/eigen.git
LICENSE (new file)
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md (new file)
@@ -0,0 +1,222 @@

# Native and Compact Structured Latents for 3D Generation

<a href="https://microsoft.github.io/trellis.2"><img src="https://img.shields.io/badge/Paper-Arxiv-b31b1b.svg" alt="Paper"></a>
<a href="https://huggingface.co/microsoft/TRELLIS.2-4B"><img src="https://img.shields.io/badge/Hugging%20Face-Model-yellow" alt="Hugging Face"></a>
<a href="https://microsoft.github.io/trellis.2"><img src="https://img.shields.io/badge/Project-Website-blue" alt="Project Page"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-green" alt="License"></a>

https://github.com/user-attachments/assets/5ee056e4-73a9-4fd8-bf60-59cae90d3dfc

*(Compressed version due to GitHub size limits. See the full-quality video on our project page!)*

**TRELLIS.2** is a state-of-the-art large 3D generative model (4B parameters) designed for high-fidelity **image-to-3D** generation. It leverages a novel "field-free" sparse voxel structure termed **O-Voxel** to reconstruct and generate arbitrary 3D assets with complex topologies, sharp features, and full PBR materials.

## ✨ Features

### 1. High Quality, Resolution & Efficiency

Our 4B-parameter model generates high-resolution, fully textured assets with exceptional fidelity and efficiency using vanilla DiTs. It utilizes a Sparse 3D VAE with 16× spatial downsampling to encode assets into a compact latent space.

| Resolution | Total Time* | Breakdown (Shape + Mat) |
| :--- | :--- | :--- |
| **512³** | **~3s** | 2s + 1s |
| **1024³** | **~17s** | 10s + 7s |
| **1536³** | **~60s** | 35s + 25s |

<small>*Tested on an NVIDIA H100 GPU.</small>
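As a back-of-the-envelope illustration of the 16× spatial downsampling mentioned above (the latent grid sizes below are simple arithmetic, not figures from the paper):

```python
# Rough sketch: per-axis latent extent implied by 16x spatial downsampling.
# Only the arithmetic is shown; the actual latent layout is the VAE's concern.
DOWNSAMPLE = 16

for res in (512, 1024, 1536):
    latent = res // DOWNSAMPLE   # per-axis latent grid size
    reduction = DOWNSAMPLE ** 3  # volumetric reduction factor
    print(f"{res}^3 voxels -> {latent}^3 latent cells ({reduction}x fewer)")
```

This volumetric shrinkage is what keeps even a 1536³ asset tractable for the DiT.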

### 2. Arbitrary Topology Handling

The **O-Voxel** representation breaks the limits of iso-surface fields. It robustly handles complex structures without lossy conversion:

* ✅ **Open Surfaces** (e.g., clothing, leaves)
* ✅ **Non-manifold Geometry**
* ✅ **Internal Enclosed Structures**

### 3. Rich Texture Modeling

Beyond basic colors, TRELLIS.2 models arbitrary surface attributes, including **Base Color, Roughness, Metallic, and Opacity**, enabling photorealistic rendering and transparency support.
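As a hypothetical sketch of how such per-voxel surface attributes might be laid out (the attribute names match the list above, but the channel widths and the dict format are illustrative assumptions, not the model's actual `pbr_attr_layout`):

```python
# Hypothetical attribute layout: attribute name -> number of channels.
# Widths are assumed (RGB base color plus three scalar maps).
pbr_attr_layout = {
    "base_color": 3,  # RGB
    "roughness": 1,
    "metallic": 1,
    "opacity": 1,
}

total_channels = sum(pbr_attr_layout.values())
print(f"{total_channels} attribute channels per surface voxel")
```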

### 4. Minimalist Processing

Data processing is streamlined for instant conversions that are fully **rendering-free** and **optimization-free**.

* **< 10s** (Single CPU): Textured Mesh → O-Voxel
* **< 100ms** (CUDA): O-Voxel → Textured Mesh

## 🗺️ Roadmap

- [x] Paper release
- [x] Release image-to-3D inference code
- [x] Release pretrained checkpoints (4B)
- [x] Hugging Face Spaces demo
- [ ] Release shape-conditioned texture generation inference code (current schedule: before 12/24/2025)
- [ ] Release training code (current schedule: before 12/31/2025)

## 🛠️ Installation

### Prerequisites

- **System**: The code is currently tested only on **Linux**.
- **Hardware**: An NVIDIA GPU with at least 24GB of memory is required. The code has been verified on NVIDIA A100 and H100 GPUs.
- **Software**:
  - The [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) is needed to compile certain packages. The recommended version is 12.4.
  - [Conda](https://docs.anaconda.com/miniconda/install/#quick-command-line-install) is recommended for managing dependencies.
  - Python 3.8 or higher is required.

### Installation Steps

1. Clone the repo:

```sh
git clone -b main https://github.com/microsoft/TRELLIS.2.git --recursive
cd TRELLIS.2
```

2. Install the dependencies:

**Before running the following command, there are some things to note:**
- By adding `--new-env`, a new conda environment named `trellis2` will be created. If you want to use an existing conda environment, please remove this flag.
- By default, the `trellis2` environment uses PyTorch 2.6.0 with CUDA 12.4. If you want to use a different version of CUDA, remove the `--new-env` flag and manually install the required dependencies. Refer to [PyTorch](https://pytorch.org/get-started/previous-versions/) for the installation command.
- If you have multiple CUDA Toolkit versions installed, `CUDA_HOME` should be set to the correct version before running the command. For example, if you have CUDA Toolkit 12.4 and 13.0 installed, run `export CUDA_HOME=/usr/local/cuda-12.4` first.
- By default, the code uses the `flash-attn` backend for attention. For GPUs that do not support `flash-attn` (e.g., NVIDIA V100), you can install `xformers` manually and set the `ATTN_BACKEND` environment variable to `xformers` before running the code. See the [Minimal Example](#minimal-example) for more details.
- The installation may take a while due to the large number of dependencies. Please be patient. If you encounter any issues, you can try installing the dependencies one by one, specifying one flag at a time.
- If you encounter any issues during the installation, feel free to open an issue or contact us.
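The environment-variable overrides from the notes above can be combined as follows (the CUDA path and the `xformers` choice are examples for the specific situations described, not required defaults):

```sh
# Example only: pin the CUDA toolkit and switch the attention backend
# before running setup.sh or the inference scripts.
export CUDA_HOME=/usr/local/cuda-12.4   # if multiple CUDA toolkits are installed
export ATTN_BACKEND=xformers            # only for GPUs without flash-attn support
echo "CUDA_HOME=$CUDA_HOME ATTN_BACKEND=$ATTN_BACKEND"
```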

Create a new conda environment named `trellis2` and install the dependencies:

```sh
. ./setup.sh --new-env --basic --flash-attn --nvdiffrast --nvdiffrec --cumesh --o-voxel --flexgemm
```

The detailed usage of `setup.sh` can be found by running `. ./setup.sh --help`.

```sh
Usage: setup.sh [OPTIONS]
Options:
    -h, --help      Display this help message
    --new-env       Create a new conda environment
    --basic         Install basic dependencies
    --flash-attn    Install flash-attention
    --cumesh        Install cumesh
    --o-voxel       Install o-voxel
    --flexgemm      Install flexgemm
    --nvdiffrast    Install nvdiffrast
    --nvdiffrec     Install nvdiffrec
```

## 📦 Pretrained Weights

The pretrained model **TRELLIS.2-4B** is available on Hugging Face. Please refer to the model card there for more details.

| Model | Parameters | Resolution | Link |
| :--- | :--- | :--- | :--- |
| **TRELLIS.2-4B** | 4 Billion | 512³ - 1536³ | [Hugging Face](https://huggingface.co/microsoft/TRELLIS.2-4B) |

## 🚀 Usage

### 1. Image to 3D Generation

#### Minimal Example

Here is an [example](example.py) of how to use the pretrained models for 3D asset generation.

```python
import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # Can save GPU memory
import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# 1. Set up the environment map
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# 2. Load the pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# 3. Load the image and run
image = Image.open("assets/example_image/T.png")
mesh = pipeline.run(image)[0]
mesh.simplify(16777216)  # nvdiffrast limit

# 4. Render a video
video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
imageio.mimsave("sample.mp4", video, fps=15)

# 5. Export to GLB
glb = o_voxel.postprocess.to_glb(
    vertices=mesh.vertices,
    faces=mesh.faces,
    attr_volume=mesh.attrs,
    coords=mesh.coords,
    attr_layout=mesh.layout,
    voxel_size=mesh.voxel_size,
    aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target=1000000,
    texture_size=4096,
    remesh=True,
    remesh_band=1,
    remesh_project=0,
    verbose=True,
)
glb.export("sample.glb", extension_webp=True)
```

Upon execution, the script generates the following files:
- `sample.mp4`: A video visualizing the generated 3D asset with PBR materials and environmental lighting.
- `sample.glb`: The extracted PBR-ready 3D asset in GLB format.

**Note:** The `.glb` file is exported in `OPAQUE` mode by default. Although the alpha channel is preserved within the texture map, it is not active initially. To enable transparency, import the asset into your 3D software and manually connect the texture's alpha channel to the material's opacity or alpha input.

#### Web Demo

[app.py](app.py) provides a simple web demo for image-to-3D asset generation. You can run the demo with the following command:

```sh
python app.py
```

Then, you can access the demo at the address shown in the terminal.

### 2. PBR Texture Generation

Will be released soon. Please stay tuned!

## 🧩 Related Packages

TRELLIS.2 is built upon several specialized high-performance packages developed by our team:

* **[O-Voxel](o-voxel):**
  Core library handling the logic for converting between textured meshes and the O-Voxel representation, ensuring instant bidirectional transformation.
* **[FlexGEMM](https://github.com/JeffreyXiang/FlexGEMM):**
  Efficient sparse convolution implementation based on Triton, enabling rapid processing of sparse voxel structures.
* **[CuMesh](https://github.com/JeffreyXiang/CuMesh):**
  CUDA-accelerated mesh utilities used for high-speed post-processing, remeshing, decimation, and UV-unwrapping.

## ⚖️ License

This model and code are released under the **[MIT License](LICENSE)**.

Please note that certain dependencies operate under separate license terms:

- [**nvdiffrast**](https://github.com/NVlabs/nvdiffrast): Utilized for rendering generated 3D assets. This package is governed by its own [License](https://github.com/NVlabs/nvdiffrast/blob/main/LICENSE.txt).
- [**nvdiffrec**](https://github.com/NVlabs/nvdiffrec): Implements the split-sum renderer for PBR materials. This package is governed by its own [License](https://github.com/NVlabs/nvdiffrec/blob/main/LICENSE.txt).

## 📚 Citation

If you find this model useful for your research, please cite our work:

```bibtex
@article{xiang2025trellis2,
    title={Native and Compact Structured Latents for 3D Generation},
    author={Xiang, Jianfeng and Chen, Xiaoxue and Xu, Sicheng and Wang, Ruicheng and Lv, Zelong and Deng, Yu and Zhu, Hongyuan and Dong, Yue and Zhao, Hao and Yuan, Nicholas Jing and Yang, Jiaolong},
    journal={Tech report},
    year={2025}
}
```
SECURITY.md (new file)
@@ -0,0 +1,14 @@
<!-- BEGIN MICROSOFT SECURITY.MD V1.0.0 BLOCK -->

## Security

Microsoft takes the security of our software products and services seriously, which
includes all source code repositories in our GitHub organizations.

**Please do not report security vulnerabilities through public GitHub issues.**

For security reporting information, locations, contact information, and policies,
please review the latest guidance for Microsoft repositories at
[https://aka.ms/SECURITY.md](https://aka.ms/SECURITY.md).

<!-- END MICROSOFT SECURITY.MD BLOCK -->
app.py (new file)
@@ -0,0 +1,645 @@
import gradio as gr

import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
from datetime import datetime
import shutil
import cv2
from typing import *
import torch
import numpy as np
from PIL import Image
import base64
import io
from trellis2.modules.sparse import SparseTensor
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.renderers import EnvMap
from trellis2.utils import render_utils
import o_voxel


MAX_SEED = np.iinfo(np.int32).max
TMP_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tmp')
MODES = [
    {"name": "Normal", "icon": "assets/app/normal.png", "render_key": "normal"},
    {"name": "Clay render", "icon": "assets/app/clay.png", "render_key": "clay"},
    {"name": "Base color", "icon": "assets/app/basecolor.png", "render_key": "base_color"},
    {"name": "HDRI forest", "icon": "assets/app/hdri_forest.png", "render_key": "shaded_forest"},
    {"name": "HDRI sunset", "icon": "assets/app/hdri_sunset.png", "render_key": "shaded_sunset"},
    {"name": "HDRI courtyard", "icon": "assets/app/hdri_courtyard.png", "render_key": "shaded_courtyard"},
]
STEPS = 8
DEFAULT_MODE = 3
DEFAULT_STEP = 3

css = """
/* Overwrite Gradio Default Style */
.stepper-wrapper {
    padding: 0;
}

.stepper-container {
    padding: 0;
    align-items: center;
}

.step-button {
    flex-direction: row;
}

.step-connector {
    transform: none;
}

.step-number {
    width: 16px;
    height: 16px;
}

.step-label {
    position: relative;
    bottom: 0;
}

.wrap.center.full {
    inset: 0;
    height: 100%;
}

.wrap.center.full.translucent {
    background: var(--block-background-fill);
}

.meta-text-center {
    display: block !important;
    position: absolute !important;
    top: unset !important;
    bottom: 0 !important;
    right: 0 !important;
    transform: unset !important;
}

/* Previewer */
.previewer-container {
    position: relative;
    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
    width: 100%;
    height: 722px;
    margin: 0 auto;
    padding: 20px;
    display: flex;
    flex-direction: column;
    align-items: center;
    justify-content: center;
}

.previewer-container .tips-icon {
    position: absolute;
    right: 10px;
    top: 10px;
    z-index: 10;
    border-radius: 10px;
    color: #fff;
    background-color: var(--color-accent);
    padding: 3px 6px;
    user-select: none;
}

.previewer-container .tips-text {
    position: absolute;
    right: 10px;
    top: 50px;
    color: #fff;
    background-color: var(--color-accent);
    border-radius: 10px;
    padding: 6px;
    text-align: left;
    max-width: 300px;
    z-index: 10;
    transition: all 0.3s;
    opacity: 0%;
    user-select: none;
}

.previewer-container .tips-text p {
    font-size: 14px;
    line-height: 1.2;
}

.tips-icon:hover + .tips-text {
    display: block;
    opacity: 100%;
}

/* Row 1: Display Modes */
.previewer-container .mode-row {
    width: 100%;
    display: flex;
    gap: 8px;
    justify-content: center;
    margin-bottom: 20px;
    flex-wrap: wrap;
}
.previewer-container .mode-btn {
    width: 24px;
    height: 24px;
    border-radius: 50%;
    cursor: pointer;
    opacity: 0.5;
    transition: all 0.2s;
    border: 2px solid #ddd;
    object-fit: cover;
}
.previewer-container .mode-btn:hover { opacity: 0.9; transform: scale(1.1); }
.previewer-container .mode-btn.active {
    opacity: 1;
    border-color: var(--color-accent);
    transform: scale(1.1);
}

/* Row 2: Display Image */
.previewer-container .display-row {
    margin-bottom: 20px;
    min-height: 400px;
    width: 100%;
    flex-grow: 1;
    display: flex;
    justify-content: center;
    align-items: center;
}
.previewer-container .previewer-main-image {
    max-width: 100%;
    max-height: 100%;
    flex-grow: 1;
    object-fit: contain;
    display: none;
}
.previewer-container .previewer-main-image.visible {
    display: block;
}

/* Row 3: Custom HTML Slider */
.previewer-container .slider-row {
    width: 100%;
    display: flex;
    flex-direction: column;
    align-items: center;
    gap: 10px;
    padding: 0 10px;
}

.previewer-container input[type=range] {
    -webkit-appearance: none;
    width: 100%;
    max-width: 400px;
    background: transparent;
}
.previewer-container input[type=range]::-webkit-slider-runnable-track {
    width: 100%;
    height: 8px;
    cursor: pointer;
    background: #ddd;
    border-radius: 5px;
}
.previewer-container input[type=range]::-webkit-slider-thumb {
    height: 20px;
    width: 20px;
    border-radius: 50%;
    background: var(--color-accent);
    cursor: pointer;
    -webkit-appearance: none;
    margin-top: -6px;
    box-shadow: 0 2px 5px rgba(0,0,0,0.2);
    transition: transform 0.1s;
}
.previewer-container input[type=range]::-webkit-slider-thumb:hover {
    transform: scale(1.2);
}

/* Overwrite Previewer Block Style */
.gradio-container .padded:has(.previewer-container) {
    padding: 0 !important;
}

.gradio-container:has(.previewer-container) [data-testid="block-label"] {
    position: absolute;
    top: 0;
    left: 0;
}
"""


head = """
<script>
function refreshView(mode, step) {
    // 1. Find current mode and step
    const allImgs = document.querySelectorAll('.previewer-main-image');
    for (let i = 0; i < allImgs.length; i++) {
        const img = allImgs[i];
        if (img.classList.contains('visible')) {
            const id = img.id;
            const [_, m, s] = id.split('-');
            if (mode === -1) mode = parseInt(m.slice(1));
            if (step === -1) step = parseInt(s.slice(1));
            break;
        }
    }

    // 2. Hide ALL images
    // We select all elements with class 'previewer-main-image'
    allImgs.forEach(img => img.classList.remove('visible'));

    // 3. Construct the specific ID for the current state
    // Format: view-m{mode}-s{step}
    const targetId = 'view-m' + mode + '-s' + step;
    const targetImg = document.getElementById(targetId);

    // 4. Show ONLY the target
    if (targetImg) {
        targetImg.classList.add('visible');
    }

    // 5. Update Button Highlights
    const allBtns = document.querySelectorAll('.mode-btn');
    allBtns.forEach((btn, idx) => {
        if (idx === mode) btn.classList.add('active');
        else btn.classList.remove('active');
    });
}

// --- Action: Switch Mode ---
function selectMode(mode) {
    refreshView(mode, -1);
}

// --- Action: Slider Change ---
function onSliderChange(val) {
    refreshView(-1, parseInt(val));
}
</script>
"""


empty_html = f"""
<div class="previewer-container">
    <svg style="opacity: .5; height: var(--size-5); color: var(--body-text-color);"
        xmlns="http://www.w3.org/2000/svg" width="100%" height="100%" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather feather-image"><rect x="3" y="3" width="18" height="18" rx="2" ry="2"></rect><circle cx="8.5" cy="8.5" r="1.5"></circle><polyline points="21 15 16 10 5 21"></polyline></svg>
</div>
"""


def image_to_base64(image):
    buffered = io.BytesIO()
    image = image.convert("RGB")
    image.save(buffered, format="jpeg", quality=85)
    img_str = base64.b64encode(buffered.getvalue()).decode()
    return f"data:image/jpeg;base64,{img_str}"


def start_session(req: gr.Request):
    user_dir = os.path.join(TMP_DIR, str(req.session_hash))
    os.makedirs(user_dir, exist_ok=True)


def end_session(req: gr.Request):
    user_dir = os.path.join(TMP_DIR, str(req.session_hash))
    shutil.rmtree(user_dir)


def preprocess_image(image: Image.Image) -> Image.Image:
    """
    Preprocess the input image.

    Args:
        image (Image.Image): The input image.

    Returns:
        Image.Image: The preprocessed image.
    """
    processed_image = pipeline.preprocess_image(image)
    return processed_image


def pack_state(latents: Tuple[SparseTensor, SparseTensor, int]) -> dict:
    shape_slat, tex_slat, res = latents
    return {
        'shape_slat_feats': shape_slat.feats.cpu().numpy(),
        'tex_slat_feats': tex_slat.feats.cpu().numpy(),
        'coords': shape_slat.coords.cpu().numpy(),
        'res': res,
    }


def unpack_state(state: dict) -> Tuple[SparseTensor, SparseTensor, int]:
    shape_slat = SparseTensor(
        feats=torch.from_numpy(state['shape_slat_feats']).cuda(),
        coords=torch.from_numpy(state['coords']).cuda(),
    )
    tex_slat = shape_slat.replace(torch.from_numpy(state['tex_slat_feats']).cuda())
    return shape_slat, tex_slat, state['res']


def get_seed(randomize_seed: bool, seed: int) -> int:
    """
    Get the random seed.
    """
    return np.random.randint(0, MAX_SEED) if randomize_seed else seed


def image_to_3d(
    image: Image.Image,
    seed: int,
    resolution: str,
    ss_guidance_strength: float,
    ss_guidance_rescale: float,
    ss_sampling_steps: int,
    ss_rescale_t: float,
    shape_slat_guidance_strength: float,
    shape_slat_guidance_rescale: float,
    shape_slat_sampling_steps: int,
    shape_slat_rescale_t: float,
    tex_slat_guidance_strength: float,
    tex_slat_guidance_rescale: float,
    tex_slat_sampling_steps: int,
    tex_slat_rescale_t: float,
    req: gr.Request,
    progress=gr.Progress(track_tqdm=True),
) -> Tuple[dict, str]:
    # --- Sampling ---
    outputs, latents = pipeline.run(
        image,
        seed=seed,
        preprocess_image=False,
        sparse_structure_sampler_params={
            "steps": ss_sampling_steps,
            "guidance_strength": ss_guidance_strength,
            "guidance_rescale": ss_guidance_rescale,
            "rescale_t": ss_rescale_t,
        },
        shape_slat_sampler_params={
            "steps": shape_slat_sampling_steps,
            "guidance_strength": shape_slat_guidance_strength,
            "guidance_rescale": shape_slat_guidance_rescale,
            "rescale_t": shape_slat_rescale_t,
        },
        tex_slat_sampler_params={
            "steps": tex_slat_sampling_steps,
            "guidance_strength": tex_slat_guidance_strength,
            "guidance_rescale": tex_slat_guidance_rescale,
            "rescale_t": tex_slat_rescale_t,
        },
        pipeline_type={
            "512": "512",
            "1024": "1024_cascade",
            "1536": "1536_cascade",
        }[resolution],
        return_latent=True,
    )
    mesh = outputs[0]
    mesh.simplify(16777216)  # nvdiffrast limit
    images = render_utils.render_snapshot(mesh, resolution=1024, r=2, fov=36, nviews=STEPS, envmap=envmap)
    state = pack_state(latents)
    torch.cuda.empty_cache()

    # --- HTML Construction ---
    # The stack of 48 images (6 modes x 8 steps)
    images_html = ""
    for m_idx, mode in enumerate(MODES):
        for s_idx in range(STEPS):
            # ID naming convention: view-m{mode}-s{step}
            unique_id = f"view-m{m_idx}-s{s_idx}"

            # Logic: only the default mode/step is visible initially
            is_visible = (m_idx == DEFAULT_MODE and s_idx == DEFAULT_STEP)
            vis_class = "visible" if is_visible else ""

            # Image source
            img_base64 = image_to_base64(Image.fromarray(images[mode['render_key']][s_idx]))

            # Render the tag
            images_html += f"""
                <img id="{unique_id}"
                    class="previewer-main-image {vis_class}"
                    src="{img_base64}"
                    loading="eager">
            """

    # Button row HTML
    btns_html = ""
    for idx, mode in enumerate(MODES):
        active_class = "active" if idx == DEFAULT_MODE else ""
        # Note: onclick calls the JS function defined in head
        btns_html += f"""
            <img src="{mode['icon_base64']}"
                class="mode-btn {active_class}"
                onclick="selectMode({idx})"
                title="{mode['name']}">
        """

    # Assemble the full component
    full_html = f"""
    <div class="previewer-container">
        <div class="tips-wrapper">
            <div class="tips-icon">💡Tips</div>
            <div class="tips-text">
                <p>● <b>Render Mode</b> - Click on the circular buttons to switch between different render modes.</p>
                <p>● <b>View Angle</b> - Drag the slider to change the view angle.</p>
            </div>
        </div>

        <!-- Row 1: Viewport containing 48 static <img> tags -->
        <div class="display-row">
            {images_html}
        </div>

        <!-- Row 2 -->
        <div class="mode-row" id="btn-group">
            {btns_html}
        </div>

        <!-- Row 3: Slider -->
        <div class="slider-row">
            <input type="range" id="custom-slider" min="0" max="{STEPS - 1}" value="{DEFAULT_STEP}" step="1" oninput="onSliderChange(this.value)">
        </div>
    </div>
    """

    return state, full_html


def extract_glb(
    state: dict,
    decimation_target: int,
    texture_size: int,
    req: gr.Request,
    progress=gr.Progress(track_tqdm=True),
) -> Tuple[str, str]:
    """
    Extract a GLB file from the 3D model.

    Args:
        state (dict): The state of the generated 3D model.
        decimation_target (int): The target face count for decimation.
        texture_size (int): The texture resolution.

    Returns:
        str: The path to the extracted GLB file.
    """
    user_dir = os.path.join(TMP_DIR, str(req.session_hash))
    shape_slat, tex_slat, res = unpack_state(state)
    mesh = pipeline.decode_latent(shape_slat, tex_slat, res)[0]
    glb = o_voxel.postprocess.to_glb(
        vertices=mesh.vertices,
        faces=mesh.faces,
        attr_volume=mesh.attrs,
        coords=mesh.coords,
        attr_layout=pipeline.pbr_attr_layout,
        grid_size=res,
        aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
        decimation_target=decimation_target,
        texture_size=texture_size,
        remesh=True,
        remesh_band=1,
        remesh_project=0,
        use_tqdm=True,
    )
    now = datetime.now()
    timestamp = now.strftime("%Y-%m-%dT%H%M%S") + f".{now.microsecond // 1000:03d}"
    os.makedirs(user_dir, exist_ok=True)
    glb_path = os.path.join(user_dir, f'sample_{timestamp}.glb')
    glb.export(glb_path, extension_webp=True)
    torch.cuda.empty_cache()
    return glb_path, glb_path

with gr.Blocks(delete_cache=(600, 600)) as demo:
|
||||
gr.Markdown("""
|
||||
## Image to 3D Asset with [TRELLIS.2](https://microsoft.github.io/trellis.2)
|
||||
* Upload an image (preferably with an alpha-masked foreground object) and click Generate to create a 3D asset.
|
||||
* Click Extract GLB to export and download the generated GLB file if you're satisfied with the result. Otherwise, try another time.
|
||||
""")
|
||||
|
||||
with gr.Row():
|
||||
with gr.Column(scale=1, min_width=360):
|
||||
image_prompt = gr.Image(label="Image Prompt", format="png", image_mode="RGBA", type="pil", height=400)
resolution = gr.Radio(["512", "1024", "1536"], label="Resolution", value="1024")
seed = gr.Slider(0, MAX_SEED, label="Seed", value=0, step=1)
randomize_seed = gr.Checkbox(label="Randomize Seed", value=True)
decimation_target = gr.Slider(100000, 1000000, label="Decimation Target", value=500000, step=10000)
texture_size = gr.Slider(1024, 4096, label="Texture Size", value=2048, step=1024)

generate_btn = gr.Button("Generate")

with gr.Accordion(label="Advanced Settings", open=False):
    gr.Markdown("Stage 1: Sparse Structure Generation")
    with gr.Row():
        ss_guidance_strength = gr.Slider(1.0, 10.0, label="Guidance Strength", value=7.5, step=0.1)
        ss_guidance_rescale = gr.Slider(0.0, 1.0, label="Guidance Rescale", value=0.7, step=0.01)
        ss_sampling_steps = gr.Slider(1, 50, label="Sampling Steps", value=12, step=1)
        ss_rescale_t = gr.Slider(1.0, 6.0, label="Rescale T", value=5.0, step=0.1)
    gr.Markdown("Stage 2: Shape Generation")
    with gr.Row():
        shape_slat_guidance_strength = gr.Slider(1.0, 10.0, label="Guidance Strength", value=7.5, step=0.1)
        shape_slat_guidance_rescale = gr.Slider(0.0, 1.0, label="Guidance Rescale", value=0.5, step=0.01)
        shape_slat_sampling_steps = gr.Slider(1, 50, label="Sampling Steps", value=12, step=1)
        shape_slat_rescale_t = gr.Slider(1.0, 6.0, label="Rescale T", value=3.0, step=0.1)
    gr.Markdown("Stage 3: Material Generation")
    with gr.Row():
        tex_slat_guidance_strength = gr.Slider(1.0, 10.0, label="Guidance Strength", value=1.0, step=0.1)
        tex_slat_guidance_rescale = gr.Slider(0.0, 1.0, label="Guidance Rescale", value=0.0, step=0.01)
        tex_slat_sampling_steps = gr.Slider(1, 50, label="Sampling Steps", value=12, step=1)
        tex_slat_rescale_t = gr.Slider(1.0, 6.0, label="Rescale T", value=3.0, step=0.1)

with gr.Column(scale=10):
    with gr.Walkthrough(selected=0) as walkthrough:
        with gr.Step("Preview", id=0):
            preview_output = gr.HTML(empty_html, label="3D Asset Preview", show_label=True, container=True)
            extract_btn = gr.Button("Extract GLB")
        with gr.Step("Extract", id=1):
            glb_output = gr.Model3D(label="Extracted GLB", height=724, show_label=True, display_mode="solid", clear_color=(0.25, 0.25, 0.25, 1.0))
            download_btn = gr.DownloadButton(label="Download GLB")

with gr.Column(scale=1, min_width=172):
    examples = gr.Examples(
        examples=[
            f'assets/example_image/{image}'
            for image in os.listdir("assets/example_image")
        ],
        inputs=[image_prompt],
        fn=preprocess_image,
        outputs=[image_prompt],
        run_on_click=True,
        examples_per_page=18,
    )

output_buf = gr.State()

# Handlers
demo.load(start_session)
demo.unload(end_session)

image_prompt.upload(
    preprocess_image,
    inputs=[image_prompt],
    outputs=[image_prompt],
)

generate_btn.click(
    get_seed,
    inputs=[randomize_seed, seed],
    outputs=[seed],
).then(
    lambda: gr.Walkthrough(selected=0), outputs=walkthrough
).then(
    image_to_3d,
    inputs=[
        image_prompt, seed, resolution,
        ss_guidance_strength, ss_guidance_rescale, ss_sampling_steps, ss_rescale_t,
        shape_slat_guidance_strength, shape_slat_guidance_rescale, shape_slat_sampling_steps, shape_slat_rescale_t,
        tex_slat_guidance_strength, tex_slat_guidance_rescale, tex_slat_sampling_steps, tex_slat_rescale_t,
    ],
    outputs=[output_buf, preview_output],
)

extract_btn.click(
    lambda: gr.Walkthrough(selected=1), outputs=walkthrough
).then(
    extract_glb,
    inputs=[output_buf, decimation_target, texture_size],
    outputs=[glb_output, download_btn],
)

# Launch the Gradio app
if __name__ == "__main__":
    os.makedirs(TMP_DIR, exist_ok=True)

    # Construct ui components
    btn_img_base64_strs = {}
    for i in range(len(MODES)):
        icon = Image.open(MODES[i]['icon'])
        MODES[i]['icon_base64'] = image_to_base64(icon)

    pipeline = Trellis2ImageTo3DPipeline.from_pretrained('microsoft/TRELLIS.2-4B')
    pipeline.cuda()

    envmap = {
        'forest': EnvMap(torch.tensor(
            cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
            dtype=torch.float32, device='cuda'
        )),
        'sunset': EnvMap(torch.tensor(
            cv2.cvtColor(cv2.imread('assets/hdri/sunset.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
            dtype=torch.float32, device='cuda'
        )),
        'courtyard': EnvMap(torch.tensor(
            cv2.cvtColor(cv2.imread('assets/hdri/courtyard.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
            dtype=torch.float32, device='cuda'
        )),
    }

    demo.launch(css=css, head=head)
BIN  assets/app/basecolor.png  Normal file  (4.8 KiB)
BIN  assets/app/clay.png  Normal file  (4.4 KiB)
BIN  assets/app/hdri_city.png  Normal file  (8.6 KiB)
BIN  assets/app/hdri_courtyard.png  Normal file  (9.2 KiB)
BIN  assets/app/hdri_forest.png  Normal file  (10 KiB)
BIN  assets/app/hdri_interior.png  Normal file  (9.1 KiB)
BIN  assets/app/hdri_night.png  Normal file  (6.8 KiB)
BIN  assets/app/hdri_studio.png  Normal file  (6.2 KiB)
BIN  assets/app/hdri_sunrise.png  Normal file  (8.2 KiB)
BIN  assets/app/hdri_sunset.png  Normal file  (7.9 KiB)
BIN  assets/app/normal.png  Normal file  (3.8 KiB)
BIN  assets/example_image/T.png  Executable file  (1.6 MiB)
BIN  assets/hdri/city.exr  Normal file
BIN  assets/hdri/courtyard.exr  Normal file
BIN  assets/hdri/forest.exr  Normal file
BIN  assets/hdri/interior.exr  Normal file
15
assets/hdri/license.txt
Normal file
@@ -0,0 +1,15 @@
All HDRIs are licensed as CC0.

These were created by Greg Zaal (Poly Haven, https://polyhaven.com).
Originals used for each HDRI:
- City: https://polyhaven.com/a/portland_landing_pad
- Courtyard: https://polyhaven.com/a/courtyard
- Forest: https://polyhaven.com/a/ninomaru_teien
- Interior: https://polyhaven.com/a/hotel_room
- Night: Probably https://polyhaven.com/a/moonless_golf
- Studio: Probably https://polyhaven.com/a/studio_small_01
- Sunrise: https://polyhaven.com/a/spruit_sunrise
- Sunset: https://polyhaven.com/a/venice_sunset

The 1K resolution of each was taken and compressed with oiiotool:
oiiotool input.exr --ch R,G,B -d float --compression dwab:300 --clamp:min=0.0:max=32000.0 -o output.exr
BIN  assets/hdri/night.exr  Normal file
BIN  assets/hdri/studio.exr  Normal file
BIN  assets/hdri/sunrise.exr  Normal file
BIN  assets/hdri/sunset.exr  Normal file
BIN  assets/teaser.webp  Normal file  (283 KiB)
48
example.py
Normal file
@@ -0,0 +1,48 @@
import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # Can save GPU memory
import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# 1. Setup Environment Map
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# 2. Load Pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# 3. Load Image & Run
image = Image.open("assets/example_image/T.png")
mesh = pipeline.run(image)[0]
mesh.simplify(16777216)  # nvdiffrast limit

# 4. Render Video
video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
imageio.mimsave("sample.mp4", video, fps=15)

# 5. Export to GLB
glb = o_voxel.postprocess.to_glb(
    vertices=mesh.vertices,
    faces=mesh.faces,
    attr_volume=mesh.attrs,
    coords=mesh.coords,
    attr_layout=mesh.layout,
    voxel_size=mesh.voxel_size,
    aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target=1000000,
    texture_size=4096,
    remesh=True,
    remesh_band=1,
    remesh_project=0,
    verbose=True
)
glb.export("sample.glb", extension_webp=True)
174
o-voxel/README.md
Normal file
@@ -0,0 +1,174 @@
# O-Voxel: A Native 3D Representation

**O-Voxel** is a sparse, voxel-based native 3D representation designed for high-quality 3D generation and reconstruction. Unlike traditional methods that rely on fields (e.g., occupancy fields, SDFs), O-Voxel uses a **Flexible Dual Grid** formulation to robustly represent surfaces of arbitrary topology (including non-manifold and open surfaces) together with **volumetric surface properties** such as Physically-Based Rendering (PBR) material attributes.

This library provides an efficient implementation of instant bidirectional conversion between meshes and O-Voxels, along with tools for sparse voxel compression, serialization, and rendering.



## Key Features

- **🧱 Flexible Dual Grid**: A geometry representation that solves an enhanced QEF (Quadratic Error Function) to accurately capture sharp features and open boundaries without requiring watertight meshes.
- **🎨 Volumetric PBR Attributes**: Native support for physically-based rendering properties (Base Color, Metallic, Roughness, Opacity) aligned with the sparse voxel grid.
- **⚡ Instant Bidirectional Conversion**: Rapid `Mesh <-> O-Voxel` conversion without expensive SDF evaluation, flood-filling, or iterative optimization.
- **💾 Efficient Compression**: A custom `.vxz` format for compact storage of sparse voxel structures using Z-order/Hilbert curve encoding.
- **🛠️ Production Ready**: Tools to export converted assets directly to `.glb` with UV unwrapping and texture baking.
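For intuition only: the Flexible Dual Grid places one dual vertex per surface voxel by minimizing a quadratic error over the plane constraints of the triangles crossing that cell. The library's enhanced QEF adds face, boundary, and regularization terms with the weights shown in the Quick Start below; the sketch here is the *classic* regularized QEF in plain NumPy, not the library's implementation:

```python
import numpy as np

def solve_qef(points, normals, centroid, reg_weight=1e-2):
    """Least-squares dual-vertex placement for one voxel cell.

    Minimizes sum_i (n_i . (x - p_i))^2 + reg_weight * ||x - centroid||^2,
    i.e. the classic dual-contouring QEF with Tikhonov regularization.
    """
    A = np.asarray(normals, dtype=np.float64)                           # (k, 3)
    P = np.asarray(points, dtype=np.float64)                            # (k, 3)
    b = np.einsum('ij,ij->i', A, P)                                     # n_i . p_i
    # Stack the regularizer as extra rows: sqrt(w) * I * x = sqrt(w) * centroid
    w = np.sqrt(reg_weight)
    A_full = np.vstack([A, w * np.eye(3)])
    b_full = np.concatenate([b, w * np.asarray(centroid, dtype=np.float64)])
    x, *_ = np.linalg.lstsq(A_full, b_full, rcond=None)
    return x

# Two axis-aligned planes meeting at x=0.25, y=0.25: the solve recovers the
# shared edge, while the regularizer pins z near the cell centroid.
v = solve_qef(
    points=[[0.25, 0.0, 0.0], [0.0, 0.25, 0.0]],
    normals=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    centroid=[0.5, 0.5, 0.5],
)
```

Underdetermined directions (here, z) are exactly where an unregularized QEF would place the vertex arbitrarily; the regularization row is what keeps it inside the cell.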

## Installation

```bash
git clone -b main https://github.com/microsoft/TRELLIS.2.git --recursive
pip install TRELLIS.2/o-voxel --no-build-isolation
```

## Quick Start

> See also the [examples](examples) directory for more detailed usage.

### 1. Convert Mesh to O-Voxel [[link]](examples/mesh2ovox.py)
Convert a standard 3D mesh (with textures) into the O-Voxel representation.

```python
asset = trimesh.load("path/to/mesh.glb")

# 1. Geometry Voxelization (Flexible Dual Grid)
# Returns: occupied voxel indices, dual vertices (QEF solution), and edge intersection flags
mesh = asset.to_mesh()
vertices = torch.from_numpy(mesh.vertices).float()
faces = torch.from_numpy(mesh.faces).long()
voxel_indices, dual_vertices, intersected = o_voxel.convert.mesh_to_flexible_dual_grid(
    vertices, faces,
    grid_size=RES,                                 # Resolution
    aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],    # Axis-aligned bounding box
    face_weight=1.0,                               # Face term weight in QEF
    boundary_weight=0.2,                           # Boundary term weight in QEF
    regularization_weight=1e-2,                    # Regularization term weight in QEF
    timing=True
)
## Sort to ensure alignment between geometry and material voxelization
vid = o_voxel.serialize.encode_seq(voxel_indices)
mapping = torch.argsort(vid)
voxel_indices = voxel_indices[mapping]
dual_vertices = dual_vertices[mapping]
intersected = intersected[mapping]

# 2. Material Voxelization (Volumetric Attributes)
# Returns: dict containing 'base_color', 'metallic', 'roughness', etc.
voxel_indices_mat, attributes = o_voxel.convert.textured_mesh_to_volumetric_attr(
    asset,
    grid_size=RES,
    aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    timing=True
)
## Sort to ensure alignment between geometry and material voxelization
vid_mat = o_voxel.serialize.encode_seq(voxel_indices_mat)
mapping_mat = torch.argsort(vid_mat)
attributes = {k: v[mapping_mat] for k, v in attributes.items()}

# Save to compressed .vxz format
## Packing
dual_vertices = dual_vertices * RES - voxel_indices
dual_vertices = (torch.clamp(dual_vertices, 0, 1) * 255).type(torch.uint8)
intersected = (intersected[:, 0:1] + 2 * intersected[:, 1:2] + 4 * intersected[:, 2:3]).type(torch.uint8)
attributes['dual_vertices'] = dual_vertices
attributes['intersected'] = intersected
o_voxel.io.write("ovoxel_helmet.vxz", voxel_indices, attributes)
```

### 2. Recover Mesh from O-Voxel [[link]](examples/ovox2mesh.py)
Reconstruct the surface mesh from the sparse voxel data.

```python
# Load data
coords, data = o_voxel.io.read("path/to/ovoxel.vxz")
dual_vertices = data['dual_vertices']
intersected = data['intersected']
base_color = data['base_color']
## ... other attributes omitted for brevity

# Unpack
dual_vertices = dual_vertices / 255
intersected = torch.cat([
    intersected % 2,
    intersected // 2 % 2,
    intersected // 4 % 2,
], dim=-1).bool()

# Extract Mesh
# O-Voxel connects dual vertices to form quads, optionally splitting them
# based on geometric features.
rec_verts, rec_faces = o_voxel.convert.flexible_dual_grid_to_mesh(
    coords.cuda(),
    dual_vertices.cuda(),
    intersected.cuda(),
    split_weight=None,  # Auto-split based on min angle if None
    grid_size=RES,
    aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
)
```
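The three per-axis edge-intersection flags travel through `.vxz` as a single `uint8` per voxel (bits 1, 2, 4), and the unpack above inverts that with modular arithmetic. The round trip can be sanity-checked in isolation; a NumPy stand-in for the torch arithmetic:

```python
import numpy as np

# Three boolean edge-intersection flags per voxel (x, y, z axes).
flags = np.array([[1, 0, 1],
                  [0, 1, 1],
                  [1, 1, 0]], dtype=np.uint8)

# Pack: bit 0 <- x, bit 1 <- y, bit 2 <- z (same weights 1, 2, 4 as the torch code).
packed = flags[:, 0] + 2 * flags[:, 1] + 4 * flags[:, 2]   # -> [5, 6, 3]

# Unpack with the inverse modular arithmetic.
unpacked = np.stack([packed % 2, packed // 2 % 2, packed // 4 % 2], axis=-1)

assert np.array_equal(unpacked.astype(bool), flags.astype(bool))
```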

### 3. Export to GLB [[link]](examples/ovox2glb.py)
For visualization in standard 3D viewers, you can clean, UV-unwrap, and bake the volumetric attributes into textures.

```python
# Assuming you have the reconstructed verts/faces and volume attributes
mesh = o_voxel.postprocess.to_glb(
    vertices=rec_verts,
    faces=rec_faces,
    attr_volume=attr_tensor,  # Concatenated attributes
    coords=coords,
    attr_layout={'base_color': slice(0, 3), 'metallic': slice(3, 4), ...},
    grid_size=RES,
    aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target=100000,
    texture_size=2048,
    verbose=True,
)
mesh.export("rec_helmet.glb")
```

### 4. Voxel Rendering [[link]](examples/render_ovox.py)
Render the voxel representation directly.

```python
# Load data
coords, data = o_voxel.io.read("ovoxel_helmet.vxz")
position = (coords / RES - 0.5).cuda()
base_color = (data['base_color'] / 255).cuda()

# Render
renderer = o_voxel.rasterize.VoxelRenderer(
    rendering_options={"resolution": 512, "ssaa": 2}
)
output = renderer.render(
    position=position,    # Voxel centers
    attrs=base_color,     # Color/Opacity etc.
    voxel_size=1.0 / RES,
    extrinsics=extr,
    intrinsics=intr
)
# output.attr contains the rendered image (C, H, W)
```

## API Overview

### `o_voxel.convert`
Core algorithms for conversion between meshes and O-Voxels.
* `mesh_to_flexible_dual_grid`: Finds the active sparse voxels and solves the QEF to place a dual vertex in each, based on mesh-grid intersections.
* `flexible_dual_grid_to_mesh`: Reconnects dual vertices to form a surface.
* `textured_mesh_to_volumetric_attr`: Samples texture maps into voxel space.

### `o_voxel.io`
Handles sparse voxel file I/O.
* **Formats**: `.npz` (NumPy), `.ply` (point cloud), `.vxz` (custom compressed, recommended).
* **Functions**: `read()`, `write()`.

### `o_voxel.serialize`
Utilities for spatial hashing and ordering.
* `encode_seq` / `decode_seq`: Convert 3D coordinates to/from Morton codes (Z-order) or Hilbert curves for efficient storage and processing.
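As background (a generic pure-Python sketch, not the library's `encode_seq` implementation): a Morton/Z-order code interleaves the bits of the x, y, z coordinates, so voxels that are close in space tend to receive close codes, which is what makes sorting by code useful for compression:

```python
def morton_encode(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)        # bit i of x -> bit 3i
        code |= ((y >> i) & 1) << (3 * i + 1)    # bit i of y -> bit 3i+1
        code |= ((z >> i) & 1) << (3 * i + 2)    # bit i of z -> bit 3i+2
    return code

def morton_decode(code: int, bits: int = 10) -> tuple[int, int, int]:
    """Inverse of morton_encode: de-interleave one code back to (x, y, z)."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z

# Round trip: sorting voxel indices by this code groups spatial neighbors.
assert morton_decode(morton_encode(5, 3, 7)) == (5, 3, 7)
```

Hilbert-curve ordering preserves locality even better at the cost of a more involved bit transform; the library exposes both through the same `encode_seq` / `decode_seq` pair.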

### `o_voxel.rasterize`
* `VoxelRenderer`: A lightweight renderer for sparse voxel visualization during training.

### `o_voxel.postprocess`
* `to_glb`: A comprehensive pipeline for mesh cleaning, remeshing, UV unwrapping, and texture baking.