5 months ago

Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos

Nwoye Chinedu Innocent; Yu Tong; Gonzalez Cristians; Seeliger Barbara; Mascagni Pietro; Mutter Didier; Marescaux Jacques; Padoy Nicolas

Abstract

Out of all existing frameworks for surgical workflow analysis in endoscopic videos, action triplet recognition stands out as the only one aiming to provide truly fine-grained and comprehensive information on surgical activities. This information, presented as <instrument, verb, target> combinations, is highly challenging to identify accurately. Triplet components can be difficult to recognize individually, and the task requires not only recognizing all three triplet components simultaneously but also correctly establishing the data association between them. To achieve this, we introduce our new model, the Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging attention at two different levels. We first introduce a new form of spatial attention, called the Class Activation Guided Attention Mechanism (CAGAM), to capture individual action triplet components in a scene. This technique focuses on the recognition of verbs and targets using activations resulting from instruments. To solve the association problem, our RDV model adds a new form of semantic attention inspired by Transformer networks, called Multi-Head of Mixed Attention (MHMA). This technique uses several cross- and self-attention mechanisms to effectively capture relationships between instruments, verbs, and targets. We also introduce CholecT50, a dataset of 50 endoscopic videos in which every frame has been annotated with labels from 100 triplet classes. Our proposed RDV model significantly improves the triplet prediction mean AP by over 9% compared to the state-of-the-art methods on this dataset.
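The abstract describes MHMA only at a high level, as several cross- and self-attentions inspired by Transformer networks. To make the difference between the two attention types concrete, here is a minimal pure-Python sketch of generic multi-head scaled dot-product attention, the standard Transformer building block. It is an illustration under stated assumptions, not the RDV or MHMA implementation: there are no learned query/key/value projections, and the "instrument" and "verb" token features are hypothetical toy values.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature channels


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def split_heads(x: Matrix, num_heads: int) -> List[Matrix]:
    """Split the feature channels of every token into num_heads equal chunks."""
    d = len(x[0])
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    h = d // num_heads
    return [[row[i * h:(i + 1) * h] for row in x] for i in range(num_heads)]


def multi_head_attention(q: Matrix, k: Matrix, v: Matrix, num_heads: int) -> Matrix:
    """Run attention per head, then concatenate the head outputs token by token."""
    heads = [attention(qh, kh, vh)
             for qh, kh, vh in zip(split_heads(q, num_heads),
                                   split_heads(k, num_heads),
                                   split_heads(v, num_heads))]
    return [sum((head[t] for head in heads), []) for t in range(len(q))]


if __name__ == "__main__":
    # Hypothetical toy features: 2 instrument tokens and 3 verb tokens, 4 channels each.
    instrument_tokens = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.9]]
    verb_tokens = [[0.3, 0.7, 0.0, 1.0], [0.9, 0.1, 0.4, 0.0], [0.5, 0.5, 0.5, 0.5]]

    # Self-attention: verb tokens attend to each other.
    verb_self = multi_head_attention(verb_tokens, verb_tokens, verb_tokens, num_heads=2)
    # Cross-attention: verb tokens (queries) attend to instrument tokens (keys/values).
    verb_from_instr = multi_head_attention(verb_tokens, instrument_tokens,
                                           instrument_tokens, num_heads=2)
    print(len(verb_self), len(verb_self[0]))              # 3 4
    print(len(verb_from_instr), len(verb_from_instr[0]))  # 3 4
```

In cross-attention the queries come from one stream and the keys and values from another, which is one generic way to let verb features draw on instrument features; how RDV actually arranges and mixes its attention layers is not specified in this abstract.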

Code Repositories

CAMMA-public/cholect50 (PyTorch)
CAMMA-public/cholect45 (PyTorch)
camma-public/ivtmetrics
camma-public/rendezvous (PyTorch)
camma-public/tripnet (official, PyTorch)
camma-public/ssg-vqa (PyTorch)
camma-public/ssg-qa (PyTorch)
camma-public/attention-tripnet (official, PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| action-triplet-recognition-on-cholect50 | Rendezvous (TensorFlow v1) | Mean AP: 29.9 |
| action-triplet-recognition-on-cholect50 | Attention Tripnet (TensorFlow v1) | Mean AP: 23.4 |
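The benchmark figures are mean average precision (mean AP) over triplet classes, and the reported values appear to be this quantity multiplied by 100. As a hedged illustration of how such a number is typically computed, the sketch below uses one common non-interpolated definition of AP (the mean of the precision at the rank of each positive frame) and averages it over the classes that have at least one positive frame. This is an assumption: the ivtmetrics package may treat score ties, classes without positives, or per-video averaging differently, so this sketch should not be expected to reproduce the numbers in the table.

```python
import math
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean precision at the rank of each positive.

    Ties in score are broken by input order (an assumption of this sketch).
    Returns NaN when the class has no positive frames, since AP is undefined then.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    num_pos = sum(1 for y in labels if y)
    if num_pos == 0:
        return float("nan")
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / num_pos


def mean_average_precision(scores: List[List[float]], labels: List[List[int]]) -> float:
    """Mean AP over classes; scores[f][c] and labels[f][c] index frame f, class c.

    Classes with no positive frames are skipped (one possible convention).
    """
    num_classes = len(scores[0])
    aps = []
    for c in range(num_classes):
        ap = average_precision([row[c] for row in scores], [row[c] for row in labels])
        if not math.isnan(ap):
            aps.append(ap)
    if not aps:
        raise ValueError("no class has a positive frame")
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical toy data: 3 frames, 2 triplet classes.
    frame_scores = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.2]]
    frame_labels = [[1, 0], [0, 1], [1, 0]]
    print(round(100 * mean_average_precision(frame_scores, frame_labels), 1))  # 91.7
```

Skipping classes with no positives keeps the mean defined on short clips, at the cost of making the average depend on which classes happen to appear in the evaluated frames.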
