CamoNAS: Neural Architecture Search for Camouflaged Detection
CamoNAS is a frequency-aware multi-resolution NAS for camouflaged object detection.
TL;DR
- CamoNAS is a frequency-aware multi-resolution NAS for camouflaged object detection.
- It searches both cell-level operations and network-level downsampling paths, and the paper appears as The Visual Computer 42, Article 194 (2026).
- CamoNAS departs from hand-designed, intuition-driven COD architectures by automating architecture choices.
CamoNAS, a Neural Architecture Search (NAS) framework targeting Camouflaged Object Detection (COD), was submitted to arXiv on 2 Jul 2026 as arXiv:2607.01870. Its authors are Dawei Ren, Yan Zhang, Hongying Tang, Qiaoling Zhou and Jianpo Liu. The paper also appears as The Visual Computer 42, Article 194 (2026).
What is CamoNAS?
CamoNAS is a frequency-aware multi-resolution Neural Architecture Search framework designed to detect and segment objects that blend into their surroundings, and it explicitly searches both cell-level operations and network-level downsampling paths within a hierarchical search space. The paper frames this as a response to challenges in Camouflaged Object Detection, namely weak edge cues and ill-defined boundaries, and reports state-of-the-art performance on four COD benchmarks: CAMO, COD10K, NC4K and CHAMELEON.
CamoNAS departs from hand-designed, intuition-driven COD architectures by automating architecture choices. The manuscript also states that code is available ("Our code is available at this https URL."), although the quoted text does not spell out the address.
How does CamoNAS work?
CamoNAS combines a hierarchical NAS search space with an RGB-frequency dual-stream architecture: a conventional RGB spatial stream is paired with a frequency-processing branch built on a learnable wavelet transform. Within this design, the framework searches both cell-level operations and network-level downsampling paths.
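To make the idea of a learnable wavelet transform more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `LearnableWavelet1D`, the 1-D setting, and the Haar initialization are all assumptions chosen for illustration. The filter taps are stored as ordinary parameters, which is what would let a training loop adjust them (the training step itself is not shown).

```python
import math


class LearnableWavelet1D:
    """One-level 1-D analysis filter bank whose taps are plain parameters.

    Hypothetical illustration: initialised to the Haar wavelet. In a trained
    model the taps would be updated by gradient descent (not shown here).
    """

    def __init__(self):
        s = 1.0 / math.sqrt(2.0)
        self.low = [s, s]     # low-pass taps: local average (approximation)
        self.high = [s, -s]   # high-pass taps: local difference (detail)

    def forward(self, signal):
        """Split an even-length signal into approximation and detail bands."""
        if len(signal) % 2:
            raise ValueError("signal length must be even")
        approx, detail = [], []
        # Stride-2 filtering over non-overlapping pairs of samples.
        for i in range(0, len(signal), 2):
            a, b = signal[i], signal[i + 1]
            approx.append(self.low[0] * a + self.low[1] * b)
            detail.append(self.high[0] * a + self.high[1] * b)
        return approx, detail


# One row of pixel intensities with a faint step between 5.0 and 9.0.
row = [5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
approx, detail = LearnableWavelet1D().forward(row)
```

With these Haar taps, only the middle pair, which straddles the step, produces a nonzero detail coefficient (about -2.83). That is the kind of edge evidence a frequency branch can pass along when spatial cues are weak.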
The paper frames two linked search targets. First, CamoNAS searches cell-level operations, meaning the micro-architectural building blocks inside network cells. Second, it searches network-level downsampling paths, selecting how resolution changes occur across the network. Together these form a hierarchical search space tailored to camouflaged object detection. That combination aims to address the COD problems of weak edge cues and ill-defined boundaries by exposing the NAS to both operation choices and multi-resolution layout decisions.
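The two search targets can be sketched as a small hierarchical search space. Everything specific here is an assumption made for illustration, since the source does not list the paper's candidate operations or resolution levels: the names in `CELL_OPS`, the stride set `STRIDES`, and the rule that resolution moves at most one level per layer are all hypothetical.

```python
import random

# Hypothetical candidate operations; the source does not list the real ones.
CELL_OPS = ["conv3x3", "sep_conv5x5", "dilated_conv3x3", "skip"]
# Assumed downsampling factors, one per resolution level.
STRIDES = [4, 8, 16, 32]


def enumerate_paths(num_layers, num_levels=len(STRIDES)):
    """List every resolution path that starts at level 0 and moves at most
    one level up or down between consecutive layers."""
    paths = [[0]]
    for _ in range(num_layers - 1):
        extended = []
        for path in paths:
            for step in (-1, 0, 1):
                level = path[-1] + step
                if 0 <= level < num_levels:
                    extended.append(path + [level])
        paths = extended
    return paths


def sample_architecture(num_layers, edges_per_cell, rng):
    """Draw one architecture: a network-level path plus one op per cell edge."""
    path = rng.choice(enumerate_paths(num_layers))
    cell = [rng.choice(CELL_OPS) for _ in range(edges_per_cell)]
    return {"strides": [STRIDES[lvl] for lvl in path], "cell_ops": cell}


arch = sample_architecture(num_layers=6, edges_per_cell=4, rng=random.Random(0))
```

Here `arch["strides"]` is the network-level decision (how resolution changes layer by layer) and `arch["cell_ops"]` is the cell-level decision. A real NAS method would score or relax these choices rather than sample them uniformly; random sampling is used only to keep the sketch short.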
Why does it matter?
Automating architecture design for camouflaged object detection matters because prior COD models relied on hand-designed architectures and multi-scale feature fusion often guided by intuition rather than systematic search. CamoNAS replaces that ad hoc process with a structured NAS that explicitly includes frequency processing and multi-resolution layout choices. If the method generalizes, it could change how researchers approach challenging segmentation tasks that hinge on subtle appearance cues, by treating frequency representation and downsampling topology as searchable dimensions rather than fixed design choices.
The paper supports its claim with concrete evaluation: it reports state-of-the-art results across four standard COD benchmarks, CAMO, COD10K, NC4K and CHAMELEON, positioning CamoNAS as a competitive automated alternative to human-designed COD networks.
What to watch
Follow any public code release and replication studies, since the authors indicate code availability in the manuscript. Also watch for independent evaluations on the four named benchmarks, CAMO, COD10K, NC4K and CHAMELEON, and for work that adapts the learnable wavelet transform or the dual-stream idea to other low-contrast segmentation tasks.
Technical and bibliographic details: the arXiv submission file size is listed as 3,715 KB, the record identifier is arXiv:2607.01870, and the journal reference is The Visual Computer 42, Article 194 (2026) with related DOI https://doi.org/10.1007/s00371-026-04411-3.
Written by The Brieftide · Source: arXiv