QueryCAD: Grounded Question Answering for CAD Models

ArtiMinds Robotics

QueryCAD answers questions about CAD models with precise measurements.

SegCAD: Open-Vocabulary CAD Segmentation

Abstract

CAD models are widely used in industry and are essential for robotic automation processes. However, these models are rarely considered in novel AI-based approaches, such as the automatic synthesis of robot programs, because no readily available methods exist for analyzing, interpreting, or extracting information from CAD models.

To address these limitations, we propose QueryCAD, the first system designed for CAD Question Answering, enabling the extraction of precise information from CAD models using natural language queries. QueryCAD incorporates SegCAD, an open-vocabulary instance segmentation model we developed to identify and select specific parts of the CAD model based on part descriptions. We further propose a CAD Question-Answering Benchmark to evaluate QueryCAD and establish a foundation for future research.

Lastly, we integrate QueryCAD within an automatic robot program synthesis framework, validating its ability to enhance deep-learning solutions for robotics by enabling them to process CAD models.

CAD-Enhanced Program Synthesis

Combining MetaWizard with QueryCAD lets MetaWizard ask natural-language questions about CAD models and receive direct answers.

Prompt

Prompt with 3 in-context samples:

System

You are an assistant to write code to query information about a cad object.
All measurements are provided in meters. Your code should also work in meters.
In your response, always include all steps and don't skip steps already given in previous messages
You can assume width is x, depth is y and height is z axis.
Ensure that diameters and radius are respected correctly.
Ensure to multiply the `half_extents` by 2 to get the height, width or depth of an element.
Save your result in a variable called `solution`.

You can use the following Python classes and functions in your code. The implementation is hidden.
Do not use any other custom functions.
```python
@dataclass
class Shape:
    """the cad shape"""
    
    """shape center in x, y, and z direction in meters"""
    center: Tuple[float, float, float]

    """shape size (half-width, half-depth, half-height) from the center to the edge in meters"""
    half_extents: Tuple[float, float, float]

@dataclass
class Face:
    """a face of a cad object"""

    """form of face (either plane or cylinder)"""
    type: str 

    """if the face is a cylinder, this property returns the radius of the cylinder"""
    radius: Optional[float]

@dataclass
class Part:
    """a part of the cad object"""
    
    """part center in x, y, and z direction in meters"""
    center: Tuple[float, float, float]

    """part size (half-width, half-depth, half-height) from the center to the edge in meters"""
    half_extents: Tuple[float, float, float]

    """sides the part is visible on (list containing 'top', 'bottom', 'left', 'right', 'front', 'back')"""
    sides: List[str]

    """the cad surfaces that makes up the part"""
    faces: List[Face]

def get_parts_by_instruction(shape: Shape, instruction: str, sides: Optional[List[str]] = None) -> List[Part]:
    """get parts of a shape that looks like `instruction` and optional side the part must be visible on 

    :param shape: the cad shape
    :param instruction: free text instruction specifying optical properties of the part, empty string to select everything
    :param sides: the sides of the object the part is visible from (list of top, bottom, left, right, front, back), None to search on all sides
    """
    pass
```
If you need information about the object itself, directly access the variable `shape`.
```python
shape: Shape
```

User

What is the part that looks like a shaft and has a radius of 6 millimeters?

Let's think step by step

Assistant

First, we have to select all parts that look like a `shaft`. 
```python
filtered_parts = get_parts_by_instruction(shape, "shaft")
```

Next, we need to iterate over those parts
```python
possible_parts = []
for filtered_part in filtered_parts:
    # We only select the parts that have a cylindrical face with a radius of 6 millimeters
    # We need to convert 6 millimeters into meters first
    radius_threshold = 6 / 1000
    if any(face.radius == radius_threshold for face in filtered_part.faces if face.type == "cylinder"):
        # all parts that match the users instruction get added to `possible_parts`
        possible_parts.append(filtered_part)
```

finally, we will save the answer into `solution`
```python
solution = possible_parts
```

User

What is the radius of each hole with a bevel visible from the right side?

Let's think step by step

Assistant

We need to first fetch all parts that look like a hole with a bevel and are visible from the `right`.
```python
hole_parts = get_parts_by_instruction(shape, "holes with a bevel", sides=["right"])
```

For each part visible from the right side, we determine the maximum radius one of its cylindrical faces has.
```python
radii = []
for part in hole_parts:
    hole_radii = [face.radius for face in part.faces if face.type == "cylinder"]
    if len(hole_radii) > 0:
        radii.append(max(hole_radii))
```

Store the list of radii in the variable solution.
```python
solution = radii
```

User

What is the position of the screw?

Let's think step by step

Assistant

First, we retrieve all screws visible on the shape.
```python
screw_parts = get_parts_by_instruction(shape, "screws")
```

For each screw, we compute its center position.
```python
screw_positions = [screw.center for screw in screw_parts]
```

Finally, we store the screw positions in the variable solution.
```python
solution = screw_positions
```

User

{instruction}

Let's think step by step
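
To try code generated from this prompt locally, the hidden implementation needs a stand-in. The sketch below is a mock under stated assumptions, not QueryCAD's implementation: `MOCK_PARTS` is a hypothetical lookup table that replaces the segmentation model, the `sides` filter is assumed to match a part visible on any of the requested sides, and the sample geometry is invented for illustration. It reruns the first in-context answer, using `math.isclose` in place of `==` so that a radius converted from millimeters still matches a stored value that differs by floating-point rounding.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Shape:
    center: Tuple[float, float, float]
    half_extents: Tuple[float, float, float]


@dataclass
class Face:
    type: str  # "plane" or "cylinder"
    radius: Optional[float] = None  # only set for cylindrical faces


@dataclass
class Part:
    center: Tuple[float, float, float]
    half_extents: Tuple[float, float, float]
    sides: List[str]
    faces: List[Face] = field(default_factory=list)


# Hypothetical lookup table standing in for the segmentation model.
# All values are made-up sample data in meters.
MOCK_PARTS: Dict[str, List[Part]] = {
    "shaft": [
        Part((0.0, 0.0, 0.05), (0.006, 0.006, 0.05), ["top", "front"],
             [Face("cylinder", 0.006), Face("plane")]),
        Part((0.1, 0.0, 0.02), (0.004, 0.004, 0.02), ["top"],
             [Face("cylinder", 0.004)]),
    ],
}


def get_parts_by_instruction(
    shape: Shape, instruction: str, sides: Optional[List[str]] = None
) -> List[Part]:
    """Mock: return the parts registered under `instruction`.

    An empty instruction selects everything. If `sides` is given, keep
    parts visible on at least one of those sides (an assumption).
    """
    if instruction == "":
        parts = [p for group in MOCK_PARTS.values() for p in group]
    else:
        parts = list(MOCK_PARTS.get(instruction, []))
    if sides is None:
        return parts
    return [p for p in parts if any(s in p.sides for s in sides)]


shape = Shape(center=(0.0, 0.0, 0.0), half_extents=(0.15, 0.05, 0.05))

# First in-context answer, with a tolerance-based radius comparison.
radius_threshold = 6 / 1000  # 6 millimeters in meters
solution = [
    part
    for part in get_parts_by_instruction(shape, "shaft")
    if any(
        face.type == "cylinder"
        and face.radius is not None
        and math.isclose(face.radius, radius_threshold, rel_tol=1e-6)
        for face in part.faces
    )
]
```

With the sample data, only the first mock shaft has a cylindrical face of radius 0.006 m, so `solution` holds that single part.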

CAD-Q&A Benchmark

Distribution of dataset questions. Every question is composed of a part it asks about (inner circle), a property to retrieve (middle circle) and optionally one or multiple sides the part must be visible from (outer circle). Some questions enforce additional filtering based on the part properties, like retrieving only parts with a radius of 5 mm.
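
The question structure described above can be sketched as a small record type. This is an illustrative assumption, not the benchmark's actual file format: the class `BenchmarkQuestion`, its field names, and the sentence template in `to_text` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkQuestion:
    """Hypothetical record mirroring the three rings of the chart."""

    part: str                           # inner circle, e.g. "hole"
    prop: str                           # middle circle, e.g. "position"
    sides: Optional[List[str]] = None   # outer circle, e.g. ["right"]
    part_filter: Optional[str] = None   # extra filter, e.g. "a radius of 5 mm"

    def to_text(self) -> str:
        """Render the record as a natural-language question."""
        text = f"What is the {self.prop} of each {self.part}"
        if self.part_filter:
            text += f" with {self.part_filter}"
        if self.sides:
            text += " visible from the " + " or ".join(self.sides) + " side"
        return text + "?"


question = BenchmarkQuestion(
    part="hole", prop="position", sides=["right"], part_filter="a radius of 5 mm"
)
question_text = question.to_text()
```

Keeping the part, property, sides, and filter as separate fields makes it straightforward to group results along the same axes the chart uses.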


BibTeX

@article{kienle2024querycad,
	title        = {QueryCAD: Grounded Question Answering for CAD Models},
	author       = {Claudius Kienle and Benjamin Alt and Darko Katic and Rainer Jäkel},
	year         = 2024,
	url          = {https://arxiv.org/abs/2409.08704},
	eprint       = {2409.08704},
	archiveprefix = {arXiv},
	primaryclass = {cs.RO}
}

Acknowledgement

This work was supported by the German Federal Ministry of Education and Research (grant 01MJ22003B) and the Learning Factory Global Production at the wbk Institute of Production Science at the Karlsruhe Institute of Technology.