How Boon AI Detects Mechanical Ductwork with F1 Scores of 0.85 and Higher

By Boon AI Engineering
The Problem Every Mechanical Estimator Knows
If you've ever stared at a dense mechanical floor plan — ducts running in every direction, elbows stacked on reducers, flex connections snaking to diffusers — you know the drill. Counting every fitting, measuring every duct run, classifying every component. It takes hours per sheet. Miss a reducer or miscount your elbows, and your bid is off before it starts.
At Boon, we asked: what would it take for AI to read these drawings the way an experienced estimator does?
The answer turned out to be harder, and more interesting, than we expected. Here's how we built a ductwork detection system that reaches an F1 score of 0.85 for duct detection and above 0.95 for every fitting family we've fully benchmarked, and what that means for mechanical contractors.
What We Actually Detect
Our system doesn't just find "ducts." It understands the full anatomy of an HVAC plan:

Duct Runs
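Every straight duct segment (horizontal, vertical, and diagonal) with accurate width measurements. Our duct detection models achieve an F1 score of 0.85. F1 is the harmonic mean of precision (the share of detections that really are ducts) and recall (the share of real duct runs that get found), so a single score of 0.85 reflects both: the models find the large majority of duct runs while producing very few false detections.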
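Because F1 is a harmonic mean, one F1 value is consistent with many precision/recall splits. The short Python sketch below shows three illustrative pairs (not our measured values) that all land near 0.85, plus the smallest precision or recall that a given F1 allows, which is reached when the other component is a perfect 1.0.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def min_component_for(f1: float) -> float:
    """Smallest precision (or recall) compatible with a given F1.
    Setting the other component to 1.0 gives F1 = 2x / (1 + x),
    so x = F1 / (2 - F1)."""
    return f1 / (2 - f1)


# Illustrative precision/recall pairs, NOT Boon's measured values.
for p, r in [(0.85, 0.85), (0.90, 0.81), (0.95, 0.77)]:
    print(f"precision={p:.2f}  recall={r:.2f}  F1={f1_score(p, r):.3f}")

print(f"F1 = 0.95 needs both precision and recall >= {min_component_for(0.95):.3f}")
```

The takeaway: a high-precision model can reach F1 = 0.85 with recall a little under 85%, while an F1 above 0.95 forces both precision and recall above roughly 0.90.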
43 Types of Fittings
We don't just detect "an elbow." We distinguish between:
- Elbows: 90° smooth, 45° smooth, 90° with square vanes, 45° with square vanes, miter bends
- Dampers: Fire, smoke, fire-smoke combination, volume, motorized
- Reducers: Standard transitions, offset transitions, rectangular-to-round, round-to-rectangular
- Tap-ins and branches for branch duct connections
- GRDs (Grilles, Registers, Diffusers): Supply, return, and exhaust — classified by type
- Flex duct connections
- Terminal units: VAV boxes, fan coil units, fan-powered terminal units
- Equipment: Air handling units, rooftop units, exhaust fans
For the families we've fully benchmarked (elbows, dampers, junctions, reducers, and GRDs), our detection models achieve F1 scores above 0.95. Because F1 is a harmonic mean, a score that high is only possible when both precision and recall exceed roughly 0.90: nearly every fitting the system reports is real, and very few are missed.
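How a fitting F1 is computed depends on how detections are matched to annotations. A common convention, assumed here rather than taken from our benchmark code, is greedy one-to-one matching at an IoU threshold of 0.5, where a detection only counts as a true positive if its subtype also matches the annotation. The sketch below implements that convention; the label strings and the 0.5 threshold are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str          # fitting subtype (hypothetical label names)
    score: float = 1.0  # detector confidence; ground truth keeps the default


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def match_counts(preds, truths, iou_thresh=0.5):
    """Greedy one-to-one matching, highest-confidence predictions first.
    A prediction is a true positive only if it overlaps an unmatched
    ground-truth box of the SAME subtype at IoU >= iou_thresh."""
    matched = set()
    tp = 0
    for p in sorted(preds, key=lambda b: b.score, reverse=True):
        best_j, best_iou = None, iou_thresh
        for j, t in enumerate(truths):
            if j in matched or t.label != p.label:
                continue
            overlap = iou(p, t)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    return tp, len(preds) - tp, len(truths) - tp  # (TP, FP, FN)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Equivalent to the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


truths = [Box(0, 0, 10, 10, "elbow_90"), Box(20, 20, 30, 30, "damper_fire")]
preds = [Box(1, 1, 10, 10, "elbow_90", 0.9), Box(20, 20, 30, 30, "damper_smoke", 0.8)]
print(match_counts(preds, truths), f1_from_counts(*match_counts(preds, truths)))
```

In the usage example the damper lands in the right place but with the wrong subtype, so it counts as both a false positive and a false negative. Under this convention, misclassification is penalized as heavily as a miss.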

Width and Dimensions
Beyond detection, the system measures duct widths automatically. This means we can distinguish between a 12" branch and a 24" trunk without manual measurement.
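One simple way to turn a predicted duct-body mask into a width is to count mask pixels perpendicular to the run, then convert pixels to real inches using the sheet's rasterization DPI and the drawing scale. The sketch below assumes an axis-aligned horizontal run and a known scale; the function names are hypothetical, and diagonal runs would need measuring along the local perpendicular instead of straight down a column.

```python
from statistics import median


def duct_width_pixels(mask):
    """Estimate the width of a HORIZONTAL duct run from a binary mask
    (list of rows, 1 = duct body). Counts duct pixels in each column and
    takes the median, which tolerates ragged edges and small gaps."""
    n_rows, n_cols = len(mask), len(mask[0])
    counts = [sum(mask[r][c] for r in range(n_rows)) for c in range(n_cols)]
    counts = [n for n in counts if n > 0]  # skip columns with no duct
    return median(counts) if counts else 0.0


def pixels_to_inches(width_px, dpi, scale_ratio):
    """Convert a pixel width to real-world inches.
    dpi: resolution the sheet was rasterized at.
    scale_ratio: real inches per paper inch (1/4" = 1'-0" gives 48)."""
    return width_px / dpi * scale_ratio


# A 75-pixel-tall duct band on a sheet rasterized at 150 DPI, 1/4" = 1'-0".
mask = [[1] * 10 for _ in range(75)] + [[0] * 10 for _ in range(5)]
print(pixels_to_inches(duct_width_pixels(mask), dpi=150, scale_ratio=48))
```

At that scale and resolution, each pixel represents 0.32 real inches, so the 75-pixel band comes out as a 24" duct.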
Why This Is Hard — And How We Solved It
Construction drawings aren't photographs. Off-the-shelf AI models built for general image recognition fail spectacularly on mechanical plans. Here's why, and what we did differently.
Construction Drawings Are a Unique Challenge
Mechanical floor plans are dense, technical documents with overlapping layers of information — structural grids, architectural elements, plumbing, electrical, and HVAC all competing for space on the same sheet. Duct lines are thin, often only a few pixels wide when digitized. Fittings can look nearly identical at small scales. Labels overlap with the components they describe.
General-purpose computer vision wasn't built for this. We had to build specialized AI from the ground up.
Our AI Understands Connectivity, Not Just Pixels
Standard AI models optimize for "how many pixels did I get right?" That's fine for detecting cars in photos, but terrible for ductwork. If the model gets 99% of pixels correct but breaks a duct run in the middle, the takeoff is wrong.
Our models are trained with topology-aware objectives that explicitly penalize broken connections. They don't just ask "did you find duct pixels?" — they ask "are the ducts you found actually connected the way they should be?" This is a key reason our accuracy numbers translate to reliable takeoffs, not just impressive benchmarks.
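The failure mode is easy to reproduce. In the plain-Python sketch below, built on a synthetic mask, a prediction that drops a single 1-pixel column still scores 99.85% pixel accuracy, yet it splits one duct run into two. Counting connected components is a simple topological check; this snippet only measures the break after the fact, whereas topology-aware training objectives penalize it while the model learns.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected foreground components in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                components += 1
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:  # flood-fill this component
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components


def pixel_accuracy(pred, truth):
    """Fraction of pixels where prediction and ground truth agree."""
    total = sum(len(row) for row in truth)
    correct = sum(p == t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    return correct / total


# Ground truth: one continuous 3-pixel-wide duct across a 20 x 100 tile.
truth = [[1 if 8 <= r < 11 else 0 for _ in range(100)] for r in range(20)]
# Prediction: identical except for a 1-pixel-wide gap at column 50.
pred = [[0 if c == 50 else v for c, v in enumerate(row)] for row in truth]

print(pixel_accuracy(pred, truth))                      # 0.9985
print(count_components(truth), count_components(pred))  # 1 2
```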

Specialized Models for Different Components
Early on, we tried training a single AI model to detect all fitting types at once. It worked — but not well enough. Different fittings have wildly different visual characteristics. An elbow looks nothing like a VAV box. A fire damper symbol is tiny compared to an air handling unit.
So we built specialized detection models for each fitting family. Each is optimized for its specific component type — the right data balance, the right training strategy, the right model capacity. The result? Multiple families already exceed 0.95 F1.
Multiple AI Models Working Together
The production system isn't a single model — it's an ensemble of specialized neural networks working in concert:
- Duct detection models find where ducts run and how wide they are
- Fitting detectors identify and classify every component by type and subtype
- Classification models distinguish between supply, return, and exhaust terminals
- Dimension readers extract duct sizes from drawing labels
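The last stage of a dimension reader is turning a size string into numbers. The sketch below assumes the label text has already been extracted from the sheet and handles two common notations, rectangular W x D (e.g. 24x12) and round diameter (e.g. 12"Ø or 10" RD). The regular expressions, the output dictionary, and the width-first ordering are hypothetical; real drawing sets vary by firm.

```python
import re
from typing import Optional

# Hypothetical label formats. Rectangular: first number taken as width,
# second as depth (an assumed convention). Round: diameter marked Ø, RD or DIA.
RECT = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*"?\s*[xX×]\s*(\d+(?:\.\d+)?)\s*"?\s*$')
ROUND = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*"?\s*(?:Ø|ø|RD|DIA)\s*$', re.IGNORECASE)


def parse_duct_size(label: str) -> Optional[dict]:
    """Parse a duct size label into inches, or return None if unrecognized."""
    m = RECT.match(label)
    if m:
        return {"shape": "rect", "width_in": float(m.group(1)), "depth_in": float(m.group(2))}
    m = ROUND.match(label)
    if m:
        return {"shape": "round", "diameter_in": float(m.group(1))}
    return None


for text in ['24x12', '24"x12"', '12"Ø', '10" RD', 'SA-1']:
    print(text, parse_duct_size(text))
```

Returning None for labels like SA-1 (a system tag, not a size) keeps unrecognized text out of the takeoff instead of guessing.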
These models inform each other. Fitting detections help determine where duct segments connect. Duct body predictions guide width measurements. The ensemble is greater than the sum of its parts.
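As an illustration of how fitting detections can show where duct segments connect, the sketch below pairs each duct-segment endpoint with any fitting bounding box it lands on, within a small pixel tolerance. The data classes, the 3-pixel tolerance, and the matching rule are assumptions chosen for illustration, not our production logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    id: str
    start: tuple  # (x, y) in sheet pixels
    end: tuple


@dataclass(frozen=True)
class Fitting:
    id: str
    x1: float
    y1: float
    x2: float
    y2: float


def touches(pt, f: Fitting, tol: float) -> bool:
    """True if pt lies inside the fitting's box grown by tol pixels."""
    x, y = pt
    return f.x1 - tol <= x <= f.x2 + tol and f.y1 - tol <= y <= f.y2 + tol


def connections(segments, fittings, tol=3.0):
    """Return sorted (segment_id, fitting_id) pairs where a segment
    endpoint lands on, or within tol pixels of, a fitting's bounding box."""
    pairs = set()
    for s in segments:
        for pt in (s.start, s.end):
            for f in fittings:
                if touches(pt, f, tol):
                    pairs.add((s.id, f.id))
    return sorted(pairs)


# A horizontal run (A) and a vertical run (B) meeting at one elbow (E1).
segs = [Segment("A", (0, 50), (98, 50)), Segment("B", (105, 58), (105, 200))]
fits = [Fitting("E1", 98, 45, 110, 58)]
print(connections(segs, fits))
```

Segments that share a fitting are joined through it, so A and B become one connected run via E1.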
What These Numbers Mean for Your Workflow
Let's translate the metrics into practical impact.
An F1 score of 0.85 for duct detection means:
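- On a typical mechanical floor plan with 100 duct segments, the system correctly identifies the large majority with high confidence: about 85 if precision and recall are equal, somewhat fewer if precision is the higher of the two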
- False positives are minimal — what the system marks as a duct almost always is one
- The remaining segments typically need minor cleanup: extending a duct run that was slightly short, or connecting two segments that the AI detected separately
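The second cleanup case, where two pieces of one duct run are detected separately, lends itself to a simple geometric post-process. The sketch below merges nearly collinear centerline segments whose endpoints fall within a small gap. The thresholds (15 px gap, 5° angle, 2 px lateral offset) are arbitrary illustrative values, and this is not a description of how our pipeline or reviewers handle it.

```python
import math
from itertools import combinations


def _angle(seg):
    """Undirected segment direction in degrees, in [0, 180)."""
    (x1, y1), (x2, y2) = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def _angle_diff(a, b):
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def _offset(pt, seg):
    """Perpendicular distance from pt to the infinite line through seg."""
    (x1, y1), (x2, y2) = seg
    length = math.dist((x1, y1), (x2, y2))
    if length == 0:
        return math.dist(pt, (x1, y1))
    return abs((x2 - x1) * (y1 - pt[1]) - (x1 - pt[0]) * (y2 - y1)) / length


def _try_merge(a, b, max_gap, max_angle_deg, max_offset):
    if _angle_diff(_angle(a), _angle(b)) > max_angle_deg:
        return None
    if max(_offset(p, a) for p in b) > max_offset:  # parallel but side by side
        return None
    if min(math.dist(p, q) for p in a for q in b) > max_gap:
        return None
    # The merged run spans the two endpoints that are farthest apart.
    return max(combinations(list(a) + list(b), 2), key=lambda pq: math.dist(*pq))


def bridge_gaps(segments, max_gap=15.0, max_angle_deg=5.0, max_offset=2.0):
    """Repeatedly merge segment pairs ((x1, y1), (x2, y2)) until none qualify."""
    segs = list(segments)
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(segs)), 2):
            m = _try_merge(segs[i], segs[j], max_gap, max_angle_deg, max_offset)
            if m is not None:
                segs = [s for k, s in enumerate(segs) if k not in (i, j)] + [m]
                merged = True
                break
    return segs


# Two pieces of one run with a 10 px gap, plus a separate parallel duct.
print(bridge_gaps([((0, 0), (40, 0)), ((50, 0), (100, 0)), ((0, 30), (100, 30))]))
```

The lateral-offset check is what stops two parallel ducts drawn close together from being fused into one.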
An F1 score of 0.95+ for fittings means:
- On a plan with 130 fittings, roughly 124 are correctly detected and classified, assuming precision and recall both sit near 0.95
- Elbows are classified by angle (45° vs. 90°) and type (smooth vs. square vanes vs. miter)
- Dampers are classified by type (fire, smoke, fire-smoke, volume, motorized)
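The figure of about 124 out of 130 fittings follows from the F1 definition only if precision $P$ and recall $R$ are assumed equal. The worked arithmetic below makes that assumption explicit, with $N$ the number of annotated fittings on the plan:

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P+R} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad N = TP + FN = 130,\quad P = R = 0.95 \\
TP &= R \cdot N = 0.95 \times 130 = 123.5 \approx 124 \\
FN &= N - TP = 6.5, \qquad FP = \frac{TP}{P} - TP = 130 - 123.5 = 6.5 \\
F_1 &= \frac{2(123.5)}{2(123.5) + 6.5 + 6.5} = \frac{247}{260} = 0.95
\end{aligned}
```

If recall is lower than precision at the same F1, then $R < F_1$ and fewer than 124 fittings would be found.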
In practice, this translates to:
- A mechanical estimator reviewing AI-generated takeoffs instead of building them from scratch
- Hours of work reduced to minutes of verification
- Consistent counting that doesn't vary based on who's doing the takeoff or how tired they are


The Data Behind the Model
Accuracy claims are only as good as the data they're built on. Here's what grounds ours:
- 356 real mechanical drawing PDFs from actual construction projects — not synthetic or simplified test drawings
- 72,828 individual human annotations created by trained annotators who understand HVAC symbols
- Rigorous data quality controls: we learned the hard way that bad annotations are worse than no annotations. After investing heavily in data quality infrastructure, we now run thorough annotation audits that verify alignment, completeness, and accuracy before any data enters training
- Benchmark testing on held-out drawings the model has never seen during training — our F1 scores reflect real generalization, not memorization
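Held-out testing only measures generalization if the split happens at the drawing level. If annotations from one PDF land in both train and test, the model is scored on sheets it has effectively already seen. Below is a minimal sketch of a deterministic drawing-level split; the 15% test fraction and the `drawing_id` field are assumptions, not our actual split.

```python
import hashlib


def split_for(drawing_id: str, test_fraction: float = 0.15) -> str:
    """Assign a whole drawing (never individual annotations) to train or test.
    Hashing the ID makes the assignment deterministic and independent of
    other drawings, so adding new PDFs never moves an existing one."""
    digest = hashlib.sha256(drawing_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_annotations(annotations, test_fraction=0.15):
    """annotations: iterable of dicts carrying a 'drawing_id' key."""
    splits = {"train": [], "test": []}
    for ann in annotations:
        splits[split_for(ann["drawing_id"], test_fraction)].append(ann)
    return splits


anns = [{"drawing_id": f"dwg-{i % 10}", "idx": i} for i in range(100)]
splits = split_annotations(anns)
print({name: len(items) for name, items in splits.items()})
```

Every annotation from a given drawing lands in the same split, which is the property that makes held-out F1 scores meaningful.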
What's Coming Next
We're not done. Active development includes:
- Connectivity prediction — automatically determining which ducts connect to which, building the complete HVAC network graph from a flat drawing
- Even more fitting subtypes — expanding from 43 to comprehensive coverage of every HVAC symbol in common use
- Faster processing — continuously optimizing our pipeline speed
- Cross-page system tracking — following duct systems across multiple sheets in a plan set
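Once connections are known, a flat drawing becomes a graph, and questions like "which terminals does AHU-1 serve?" become graph traversals. The sketch below is a hypothetical illustration of that representation. The component IDs are made up, and the graph is undirected, so it ignores airflow direction; it is not the connectivity model under development.

```python
from collections import defaultdict, deque


def build_graph(connections):
    """connections: (component_a, component_b) pairs, e.g. duct-to-fitting
    links. Undirected: airflow direction is deliberately ignored here."""
    graph = defaultdict(set)
    for a, b in connections:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def served_by(graph, source, is_terminal):
    """Breadth-first search from a source (e.g. an AHU), returning every
    terminal reachable through the duct network, sorted by ID."""
    seen, queue, terminals = {source}, deque([source]), []
    while queue:
        node = queue.popleft()
        if node != source and is_terminal(node):
            terminals.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(terminals)


# Hypothetical IDs: D = duct segment, E/T = fittings, VAV/GRD = terminals.
edges = [("AHU-1", "D1"), ("D1", "E1"), ("E1", "D2"), ("D2", "VAV-1"),
         ("VAV-1", "D3"), ("D3", "GRD-1"), ("D1", "T1"), ("T1", "D4"),
         ("D4", "VAV-2"), ("AHU-2", "D9"), ("D9", "VAV-9")]
graph = build_graph(edges)
print(served_by(graph, "AHU-1", lambda n: n.startswith(("VAV", "GRD"))))
```

Traversal continues past VAV boxes so that downstream diffusers are included; VAV-9 is not reported because it hangs off a different air handler.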
Built for Mechanical Contractors
Every technical decision we've made was driven by one question: does this help a mechanical estimator produce a more accurate takeoff, faster?
The answer, across 356 drawings and 72,828 annotations, is yes.
If you're a mechanical contractor tired of spending hours on ductwork takeoffs, we'd love to show you what 0.85+ F1 looks like in practice on your drawings.
Key Metrics at a Glance

| Metric | Value |
|---|---|
| Duct detection | F1: 0.85 |
| Duct width/area measurement | F1: 0.87 |
| Elbow detection (5 subtypes) | F1: >0.95 |
| Damper detection (5 subtypes) | F1: >0.95 |
| Junction detection | F1: >0.95 |
| Reducer detection | F1: >0.95 |
| GRD detection (supply/return/exhaust) | F1: >0.95 |
| Total fitting subtypes | 43 |
| Training data | 72,828 annotations on 356 drawings |
All metrics measured on held-out test sets — drawings the models never saw during training.

