
FllumaOne dataset: 100,000 executable CAD programs released

FllumaOne-100K pairs executable Flluma Python programs with STEP geometry, point clouds, feature trees and eight visible-edge renderings.

The Brieftide

TL;DR

  • FllumaOne-100K pairs executable Flluma Python programs with STEP geometry, point clouds, feature trees and eight visible-edge renderings.
  • FllumaOne-100K is a code-native multimodal CAD dataset of 100,000 accepted samples, released on arXiv (arXiv:2606.17696) on 16 June 2026 and authored by Jizong Zhan.
  • Models were produced by executing Python programs in Flluma, and programs were retained only after kernel geometry, solid validity, and export checks passed.

FllumaOne-100K is a code-native multimodal CAD dataset of 100,000 accepted samples, authored by Jizong Zhan and released on arXiv (arXiv:2606.17696) on 16 June 2026. Each model is generated by an executable Python program in Flluma, a Qt/C++ OpenCASCADE-based CAD system. Every sample aligns that program with a structured feature tree, STEP geometry, a surface point cloud, natural-language descriptions, metadata, and eight canonical visible-edge renderings.

What does FllumaOne contain?

FllumaOne-100K contains 100,000 accepted samples distributed across four template-level complexity regimes, and each sample exposes modeling operations, parameters, and feature dependencies together with validated geometry. The dataset provides a training-oriented intermediate representation (a structured feature tree), STEP exports, surface point clouds, natural-language descriptions, metadata, and eight canonical visible-edge renderings per sample, all aligned with the executable Python program that generated the model.
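As a rough illustration of what one aligned sample might look like in code, the sketch below models a record as a Python dataclass and checks it for modality completeness. The class name, field names, and file-path layout are all hypothetical; the actual on-disk schema of the release is not described here.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

# The article states eight canonical visible-edge renderings per sample.
EXPECTED_RENDERINGS = 8


@dataclass
class CadSample:
    """One accepted sample. Every field name here is hypothetical."""
    program: str              # executable Python source that builds the model
    feature_tree: Dict        # operations, parameters, feature dependencies
    step_path: str            # path to the exported STEP file
    point_cloud: List[Point]  # surface point cloud
    descriptions: List[str]   # natural-language descriptions
    metadata: Dict            # e.g. complexity regime, split name
    renderings: List[str] = field(default_factory=list)  # rendering paths


def missing_modalities(sample: CadSample) -> List[str]:
    """Return the names of modalities that are empty or incomplete."""
    # A modality counts as missing when its value is empty (falsy).
    missing = [f.name for f in fields(sample) if not getattr(sample, f.name)]
    # Renderings are also incomplete when present but not exactly eight.
    if sample.renderings and len(sample.renderings) != EXPECTED_RENDERINGS:
        missing.append("renderings")
    return missing
```

A record with all seven modalities populated and exactly eight renderings yields an empty list; anything else names the gaps.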

How was the dataset generated and validated?

Models were produced by executing Python programs in Flluma, and a program was retained only after its kernel geometry, solid validity, and export checks passed. Release reports record modality completeness and split-level duplicate tests, and the primary release keeps only accepted samples that satisfy those checks. The authors used Flluma, described as a Qt/C++ OpenCASCADE-based CAD system, to produce code-native, executable provenance for each model rather than only final geometry.
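The retain-only-if-all-checks-pass rule can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: `filter_accepted` and the check names are hypothetical, and the three stand-in predicates below are string tests that merely take the place of real kernel, solid-validity, and STEP-export checks.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A check is a (name, predicate) pair; the predicate takes program source.
Check = Tuple[str, Callable[[str], bool]]


def filter_accepted(
    programs: Iterable[str], checks: List[Check]
) -> Tuple[List[str], Dict[str, int]]:
    """Keep programs that pass every check, in order.

    Also returns how many programs were rejected at each check, counting
    only the first check a program fails.
    """
    accepted: List[str] = []
    rejections = {name: 0 for name, _ in checks}
    for program in programs:
        for name, check in checks:
            if not check(program):
                rejections[name] += 1
                break
        else:
            # No check failed, so the program is retained.
            accepted.append(program)
    return accepted, rejections


# Stand-in predicates (assumptions): real checks would run the CAD kernel.
demo_checks: List[Check] = [
    ("kernel_geometry", lambda p: "box" in p),
    ("solid_validity", lambda p: "open" not in p),
    ("step_export", lambda p: p.endswith(")")),
]
demo_programs = ["box(1, 2, 3)", "sphere(1)", "box(open=True)", "box(1, 2, 3"]
accepted, rejections = filter_accepted(demo_programs, demo_checks)
```

Recording the first failing check per program gives the kind of per-stage rejection tally a release report could summarize.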

How well do models trained on FllumaOne perform?

A Qwen2.5-Coder-1.5B LoRA baseline trained on 80,000 samples achieved 99.98% Python syntax validity, 99.97% Flluma build success, and 99.14% STEP-export validity on a held-out 10,000-sample test split. For the 9,909 predictions that were converted to surface point clouds, the mean normalized Chamfer Distance was 0.002124. These metrics measure, respectively, program-generation fidelity (syntax), build-level reproducibility (Flluma builds), file-export correctness (STEP), and geometric closeness of the reconstructed surfaces (Chamfer Distance).
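For readers unfamiliar with the geometric metric, here is a minimal pure-Python sketch of a symmetric Chamfer distance between two point clouds. The normalization is an assumption: both clouds are centered and scaled by the reference cloud's bounding-box diagonal, and distances are squared. The article does not say which convention the authors used, so values from this sketch are not directly comparable to the reported 0.002124.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_sq(p: Point, cloud: Sequence[Point]) -> float:
    """Squared distance from p to its nearest neighbour in cloud."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean squared nearest-neighbour distance, summed over both directions."""
    if not a or not b:
        raise ValueError("both point clouds must be non-empty")
    a_to_b = sum(_nearest_sq(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest_sq(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


def normalized_chamfer(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Chamfer distance after scaling by the reference bounding box.

    Assumed convention: translate both clouds by the reference bounding-box
    center and divide by the reference bounding-box diagonal.
    """
    if not pred or not ref:
        raise ValueError("both point clouds must be non-empty")
    lo = [min(p[i] for p in ref) for i in range(3)]
    hi = [max(p[i] for p in ref) for i in range(3)]
    diag = math.sqrt(sum((h - l) ** 2 for h, l in zip(hi, lo)))
    if diag == 0.0:
        raise ValueError("reference cloud has zero extent")
    center = [(h + l) / 2 for h, l in zip(hi, lo)]

    def scale(cloud: Sequence[Point]) -> List[Point]:
        return [tuple((c - m) / diag for c, m in zip(p, center)) for p in cloud]

    return chamfer_distance(scale(pred), scale(ref))
```

This brute-force version is quadratic in the number of points; for clouds of realistic size a spatial index such as a k-d tree would be the usual choice.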

Why it matters

FllumaOne assembles executable program provenance with validated CAD geometry, which changes the unit of dataset utility from static B-Rep files to editable, replayable construction histories. That pairing lets research move beyond single-shot reconstruction toward conditioned reconstruction, executable program synthesis, feature-tree prediction, B-Rep analysis, retrieval, design completion, and editable reverse engineering, all tasks the authors list as supported by the dataset. The high baseline success rates also set a concrete benchmark for future program-synthesis and CAD-reconstruction efforts.

What to watch

Look for independent reproductions of the Qwen2.5-Coder-1.5B LoRA baseline on the held-out 10,000-sample split, and for new baselines that measure program-editability and feature-tree recovery rather than only final-geometry metrics. The arXiv submission identifier is arXiv:2606.17696 and the manuscript was submitted on 16 June 2026; follow-on work will likely surface in citations and competing dataset releases.


Written by The Brieftide · Source: arXiv
