Multimodal Intelligent Perception and Cognition Laboratory
Zheng-Jun Zha (查正军)
Professor
Latest
A Closer Look at the Reflection Formulation in Single Image Reflection Removal
Adaptive Texture and Spectrum Clue Mining for Generalizable Face Forgery Detection
Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation
Cross-Modal Semantic Alignment Learning for Text-Based Person Search
DDOD: Dive Deeper into the Disentanglement of Object Detector
Learning Discriminative Noise Guidance for Image Forgery Detection and Localization
Neuromorphic Event Signal-Driven Network for Video De-raining
On Exploring Multiplicity of Primitives and Attributes for Texture Recognition in the Wild
Prototype-Augmented Self-Supervised Generative Network for Generalized Zero-Shot Learning
Unleashing Knowledge Potential of Source Hypothesis for Source-Free Domain Adaptation
Adaptive Frequency Filters As Efficient Global Token Mixers
Alleviating Spatial Misalignment and Motion Interference for UAV-based Video Recognition
Category-Stitch Learning for Union Domain Generalization
Continual Image Deraining With Hypergraph Convolutional Networks
Decoupling-and-Aggregating for Image Exposure Correction
Deep Texton-Coherence Network for Camouflaged Object Detection
Domain Generalization Via Encoding and Resampling in a Unified Latent Space
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars
ECENet: Explainable and Context-Enhanced Network for Multi-modal Fact Verification
Edge-aware Regional Message Passing Controller for Image Forgery Localization
Entity-Enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning
Exploring Tuning Characteristics of Ventral Stream's Neurons for Few-Shot Image Classification
Fusion-Based Low-Light Image Enhancement
Generalized UAV Object Detection via Frequency Domain Disentanglement
Grounding 3D Object Affordance from 2D Interactions in Images
Hierarchical Semantic Enhancement Network for Multimodal Fake News Detection
Image De-Raining Transformer
Learning Cross-Representation Affinity Consistency for Sparsely Supervised Biomedical Instance Segmentation
Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval
Learning to Dub Movies via Hierarchical Prosody Models
Learning Video-Text Aligned Representations for Video Captioning
Location-Free Camouflage Generation Network
MaTCR: Modality-Aligned Thought Chain Reasoning for Multimodal Task-Oriented Dialogue Generation
Neural Dependencies Emerging from Learning Massive Categories
NTIRE 2023 Challenge on Efficient Super-Resolution: Methods and Results
Random Shuffle Transformer for Image Restoration
Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models
Self-Organizing Pathway Expansion for Non-Exemplar Class-Incremental Learning
Self-supervised Cross-view Representation Reconstruction for Change Captioning
Semantic and Relation Modulation for Audio-Visual Event Localization
Spatial-Aware Token for Weakly Supervised Object Localization
Streaming Video Model
Synergy between Semantic Segmentation and Image Denoising via Alternate Boosting
Text-Driven Generative Domain Adaptation with Spectral Consistency Regularization
A Model-Driven Deep Unfolding Method for JPEG Artifacts Removal
AS-Net: Class-Aware Assistance and Suppression Network for Few-Shot Learning
Automatic Relation-aware Graph Network Proliferation
Bijective Mapping Network for Shadow Removal
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Cross-modal Semantic Alignment Pre-training for Vision-and-Language Navigation
Debiased Batch Normalization via Gaussian Process for Generalizable Person Re-identification
Degradation-agnostic Correspondence from Resolution-asymmetric Stereo
E-Commerce Storytelling Recommendation Using Attentional Domain-Transfer Network and Adversarial Pre-Training
Efficient Model-Driven Network for Shadow Removal
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Enhancement by Your Aesthetic: An Intelligible Unsupervised Personalized Enhancer for Low-Light Images
Event-driven Video Deblurring via Spatio-Temporal Relation-Aware Network
Exploring Figure-Ground Assignment Mechanism in Perceptual Organization
Exploring Fourier Prior for Single Image Rain Removal
Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
I²Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning
JPEG Artifacts Removal via Contrastive Representation Learning
JPEG Compression-aware Image Forgery Localization
Learning Degradation-Invariant Representation for Robust Real-World Person Re-Identification
Learning Dual Convolutional Dictionaries for Image De-raining
Lifelong Unsupervised Domain Adaptive Person Re-identification with Coordinated Anti-forgetting and Adaptation
Lightweight Wavelet-Based Network for JPEG Artifacts Removal
Long Short-Term Relation Transformer With Global Gating for Video Captioning
Long-Range Feature Dependencies Capturing for Low-Resolution Image Classification
Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared Person Re-identification
Online Residual Quantization Via Streaming Data Correlation Preserving
Principled Knowledge Extrapolation with GANs
Progressive Pan-Sharpening via Cross-Scale Collaboration Networks
ProgressiveMotionSeg: Mutually Reinforced Framework for Event-Based Motion Segmentation
Rank Diminishing in Deep Neural Networks
S2N: Suppression-Strengthen Network for Event-Based Recognition Under Variant Illuminations
Single Image Shadow Detection via Complementary Mechanism
Stochastic Window Transformer for Image Restoration
Unsupervised Coherent Video Cartoonization with Perceptual Motion Consistency
Weakly Supervised High-Fidelity Clothing Model Generation
A Decomposition-based Network for Non-uniform Illuminated Retinal Image Enhancement
A Mutually Attentive Co-Training Framework for Semi-Supervised Recognition
Attack-Guided Perceptual Data Generation for Real-world Re-Identification
Cluster and Scatter: A Multi-grained Active Semi-supervised Learning Framework for Scalable Person Re-identification
Cross-Patch Graph Convolutional Network for Image Denoising
Deep Coattention-Based Comparator for Relative Representation Learning in Person Re-Identification
Disentangle Your Dense Object Detector
Domain-Oriented Semantic Embedding for Zero-Shot Learning
End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation
Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification
Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers
Group-aware Label Transfer for Domain Adaptive Person Re-identification
Human activity recognition by manifold regularization based dynamic graph convolutional networks
Image De-Raining via Continual Learning
Improving De-raining Generalization via Neural Reorganization
Language-Conditioned Region Proposal and Retrieval Network for Referring Expression Comprehension
Laplacian Pyramid Neural Network for Dense Continuous-Value Regression for Complex Scenes
Learning and Fusing Multiple User Interest Representations for Micro-Video and Movie Recommendations
Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
Learning Dual Priors for JPEG Compression Artifacts Removal
Leveraging Deep Statistics for Underwater Image Enhancement
Light Field Super-Resolution With Zero-Shot Learning
Local-binarized very deep residual network for visual categorization
Low-Rank Subspaces in GANs
Multifocal Attention-Based Cross-Scale Network for Image De-raining
One-Shot Texture Retrieval Using Global Grouping Metric
Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification
Rain Streak Removal via Dual Graph Convolutional Network
Rethinking Graph Neural Architecture Search From Message-Passing
Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning
Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
SLiKER: Sparse loss induced kernel ensemble regression
Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos
Structure-Guided Deep Video Inpainting
Structured Multi-Level Interaction Network for Video Moment Localization via Language Query
Successive Graph Convolutional Network for Image De-raining
Training Spiking Neural Networks with Accumulated Spiking Flow
Uncertainty Principles of Encoding GANs
Understanding Noise Injection in GANs
Weakly Supervised Neuron Reconstruction From Optical Microscopy Images With Morphological Priors
A generalized least-squares approach regularized with graph embedding for dimensionality reduction
A Structured Graph Attention Network for Vehicle Re-Identification
Adversarial Attribute-Text Embedding for Person Search With Natural Language Query
ASTA-Net: Adaptive Spatio-Temporal Attention Network for Person Re-Identification in Videos
CircleNet for Hip Landmark Detection
Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos
ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection
Deep Degradation Prior for Low-Quality Image Classification
Deep Structure-Revealed Network for Texture Recognition
DeepFacePencil: Creating Face Images from Freehand Sketches
Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning
Dual Context-Aware Refinement Network for Person Search
Dual Path Interaction Network for Video Moment Localization
Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization
Fine-grained Feature Alignment with Part Perspective Transformation for Vehicle ReID
Frank-Wolfe Network: An Interpretable Deep Structure for Non-Sparse Coding
Hierarchical Granularity Transfer Learning
Hierarchical Gumbel Attention Network for Text-based Person Search
Iterative Context-Aware Graph Inference for Visual Dialog
Joint Sketch-Attribute Learning for Fine-Grained Face Synthesis
JPEG Artifacts Removal via Compression Quality Ranker-Guided Networks
Learning Rich Part Hierarchies With Progressive Attention Networks for Fine-Grained Image Recognition
Learning Semantic-aware Normalization for Generative Adversarial Networks
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
March on Data Imperfections: Domain Division and Domain Generalization for Semantic Segmentation
Memory-Augmented Relation Network for Few-Shot Learning
Multi-Objective Matrix Normalization for Fine-Grained Visual Recognition
Multi-Scale Group Transformer for Long Sequence Modeling in Speech Separation
Multi-Scale Spatial-Temporal Integration Convolutional Tube for Human Action Recognition
Neuronal Population Reconstruction From Ultra-Scale Optical Microscopy Images via Progressive Learning
Nighttime Dehazing with a Synthetic Benchmark
Object Relational Graph With Teacher-Recommended Learning for Video Captioning
Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification
Posterior-Guided Neural Architecture Search
Real-World Image Denoising with Deep Boosting
Real-World Person Re-Identification via Degradation Invariance Learning
Robust Deep Co-Saliency Detection With Group Semantic and Pyramid Attention
Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning
Semantic Image Analogy with a Conditional Single-Image GAN
Space-Time Video Super-Resolution Using Temporal Profiles
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View
State-Relabeling Adversarial Active Learning
Structural Semantic Adversarial Active Learning for Image Captioning
Towards Neuron Segmentation from Macaque Brain Images: A Weakly Supervised Approach
Towards Semantically Scalable Image Coding using Semantic Map
Transferrable Referring Expression Grounding with Concept Transfer and Context Inheritance
Visual Object Tracking via Guessing and Matching
A generalized multi-dictionary least squares framework regularized with multi-graph embeddings
A Two-Stream Mutual Attention Network for Semi-Supervised Biomedical Segmentation with Noisy Labels
Abstract Reasoning with Distracting Features
Accurate Segmentation of Synaptic Cleft with Contour Growing Concatenated with a Convnet
Adaptive Alignment Network for Person Re-identification
Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
Adaptive Transfer Network for Cross-Domain Person Re-Identification
BERT4SessRec: Content-Based Video Relevance Prediction with Bidirectional Encoder Representations from Transformer
Camera Lens Super-Resolution
Context-Reinforced Semantic Segmentation
Convolutional Attention Networks for Scene Text Recognition
Cross-Fiber Spatial-Temporal Co-enhanced Networks for Video Action Recognition
Cross-Modality Feature Learning via Convolutional Autoencoder
DADNet: Dilated-Attention-Deformable ConvNet for Crowd Counting
Deep Adversarial Graph Attention Convolution Network for Text-Based Person Search
Deep Multiple-Attribute-Perceived Network for Real-World Texture Recognition
Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos
Densely Supervised Hierarchical Policy-Value Network for Image Paragraph Generation
Domain-Specific Embedding Network for Zero-Shot Recognition
Dynamically building diversified classifier pruning ensembles via canonical correlation analysis
Exploring the Task Cooperation in Multi-goal Visual Navigation
Extract Bone Parts Without Human Prior: End-to-end Convolutional Neural Network for Pediatric Bone Age Assessment
Fast and Accurate Electron Microscopy Image Registration with 3D Convolution
Ground-Aware Point Cloud Semantic Segmentation for Autonomous Driving
Hierarchical Global-Local Temporal Modeling for Video Captioning
Hybrid Image Enhancement With Progressive Laplacian Enhancing Unit
Illumination-Invariant Person Re-Identification
Instance Segmentation from Volumetric Biomedical Images Without Voxel-Wise Labeling
JPEG Artifacts Reduction via Deep Convolutional Sparse Coding
Knowing User Better: Jointly Predicting Click-Through and Playtime for Micro-Video
Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding
Learning Compact Appearance Representation for Video-Based Person Re-Identification
Learning Deep Bilinear Transformation for Fine-grained Image Representation
Learning to Assemble Neural Module Tree Networks for Visual Grounding
LinesToFacePhoto: Face Photo Generation From Lines With Conditional Self-Attention Generative Adversarial Networks
Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition
Making History Matter: History-Advantage Sequence Training for Visual Dialog
Manifold Alignment via Global and Local Structures Preserving PCA Framework
MLTS: A Multi-Language Scene Text Spotter
Mutually Reinforced Spatio-Temporal Convolutional Tube for Human Action Recognition
Near-Duplicate Video Retrieval Through Toeplitz Kernel Partial Least Squares
Neural Network-Based Arithmetic Coding for Inter Prediction Information in HEVC
One-Shot Texture Retrieval with Global Context Metric
Progressive Learning for Neuronal Population Reconstruction from Optical Microscopy Images
Progressive Retinex: Mutually Reinforced Illumination-Noise Perception Network for Low-Light Image Enhancement
Question-Aware Tube-Switch Network for Video Question Answering
Robust Deep Co-Saliency Detection with Group Semantic
Semantic-Embedding and Shape-Aware U-Net for Ultrasound Eyeball Segmentation
Spatiotemporal-Textual Co-Attention Network for Video Question Answering
Structure-Aware Residual Pyramid Network for Monocular Depth Estimation
3D CNN-Based Soma Segmentation from Brain Images at Single-Neuron Resolution
A CNN-Based In-Loop Filter with CU Classification for HEVC
A Fast Uyghur Text Detector for Complex Background Images
A Feature-Adaptive Semi-Supervised Framework for Co-saliency Detection
Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling
CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification
CCLBR: Congestion Control-Based Load Balanced Routing in Unstructured P2P Systems
CCNet: Cluster-Coordinated Net for Learning Multi-agent Communication Protocols with Reinforcement Learning
Co-occurrent Structural Edge Detection for Color-Guided Depth Map Super-Resolution
Collaborative Detection and Caption Network
Connectionist Temporal Fusion for Sign Language Translation
Content-Based Video Relevance Prediction with Second-Order Relevance and Attention Modeling
Context-Aware Visual Policy Network for Sequence-Level Image Captioning
Deep Residual Attention Network for Spectral Image Super-Resolution
LA-Net: Layout-Aware Dense Network for Monocular Depth Estimation
MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition
Object Trajectory Proposal via Hierarchical Volume Grouping
Particle Swarm Programming-Based Interactive Content-Based Image Retrieval
PIRM2018 Challenge on Spectral Image Super-Resolution: Methods and Results
Session details: Vision-2 (Object & Scene Understanding)
Session details: Vision-3 (Applications in Multimedia)
Temporal Hierarchical Attention at Category- and Item-Level for Micro-Video Click-Through Prediction
Temporal-Contextual Attention Network for Video-Based Person Re-identification
Towards Human-Level License Plate Recognition
Adaptive Pooling in Multi-instance Learning for Web Video Annotation
Diversity-induced weighted classifier ensemble learning
Guest Editorial: Knowledge-Based Multimedia Computing
Improving triplet-wise training of convolutional neural network for vehicle re-identification
Progressive tone mapping of brain images at single-neuron resolution
A robust vision inspection system for detecting surface defects of film capacitors
A Unified Scheme for Super-Resolution and Depth Estimation From Asymmetric Stereoscopic Video
Action recognition with novel high-level pose features
Building Locally Discriminative Classifier Ensemble Through Classifier Fusion Among Nearest Neighbors
Collaborative Q-Learning Based Routing Control in Unstructured P2P Networks
Comparative Deep Learning of Hybrid Representations for Image Recommendations
Guest Editorial: Large-Scale Multimedia Content Analysis on Social Media
Linear Distance Preserving Pseudo-Supervised and Unsupervised Hashing
Multi-Scale Triplet CNN for Person Re-Identification
p-Laplacian Regularized Sparse Coding for Human Activity Recognition
Social media analytics and learning
An Attribute-Assisted Reranking Model for Web Image Search
Corrections to "Exploiting Web Images for Semantic Video Indexing Via Robust Sample-Specific Loss"
Depth map super-resolution using stereo-vision-assisted model
Guest editorial: selected papers from ICIMCS 2012
Learning Multi-view Deep Features for Small Object Retrieval in Surveillance Scenarios
Robust Multiview Feature Learning for RGB-D Image Understanding
Semantic-Based Location Recommendation With Multimodal Venue Semantics
Sparse canonical correlation analysis for recognition
Sparse principle motion component for one-shot gesture recognition
The AdaBoost algorithm for vehicle detection based on CNN features
A novel segmentation based video-denoising method with noise level estimation
A Stereo-Vision-Assisted model for depth map super-resolution
Achieving dynamic load balancing through mobile agents in small world P2P networks
Adaptive Learning for Celebrity Identification With Video Context
Attribute-Augmented Semantic Hierarchy: Towards a Unified Framework for Content-Based Image Retrieval
Exploiting Web Images for Semantic Video Indexing Via Robust Sample-Specific Loss
Gradient-domain-based enhancement of multi-view depth video
Improving Color Constancy with Internet Photo Collections
Introduction to the Special Issue Best Papers of ACM Multimedia 2013
Product Aspect Ranking and Its Applications
Robust (Semi) Nonnegative Graph Embedding
A Pattern Matching Based Model for Implicit Opinion Question Identification
Attribute-augmented semantic hierarchy: towards bridging semantic gap and intention gap in image retrieval
Beyond Text QA: Multimedia Answer Generation by Harvesting Web Information
Click-boosting random walk for image search reranking
Detecting Group Activities With Multi-Camera Context
GPSView: A scenic driving route planner
Hierarchical Organization of Collaboratively Constructed Content
Interactive social group recommendation for Flickr photos
Learning attribute-aware dictionary for image classification and search
Marginalized multi-layer multi-instance kernel for video concept detection
Multimedia encyclopedia construction by mining web knowledge
Partial-Duplicate Image Retrieval via Saliency-Guided Visual Matching
Robust Semantic Video Indexing by Harvesting Web Images
Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
A comprehensive representation scheme for video semantic ontology and its applications in semantic concept detection
Active learning for social image retrieval using Locally Regressive Optimal Design
Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
Attribute feedback
Attribute feedback
Attribute-assisted reranking for web image retrieval
Automatic labeling hierarchical topics
Combining SIFT and Global Features for Web Image Classification
Difficulty Guided Image Retrieval Using Linear Multiple Feature Embedding
Event Driven Web Video Summarization by Tag Localization and Key-Shot Identification
Interactive Video Indexing With Statistical Active Learning
k-Partite graph reinforcement and its application in multimedia information retrieval
Mining Travel Patterns from Geotagged Photos
Multimedia Question Answering
Oracle in Image Search: A Content-Based Approach to Performance Prediction
Parallel Lasso for Large-Scale Video Concept Detection
Robust Non-negative Graph Embedding: Towards noisy data, unreliable graphs, and noisy labels
Semantic-Gap-Oriented Active Learning for Multilabel Image Annotation
Text Mining in Multimedia
The 4th International Conference on Internet Multimedia Computing and Service, ICIMCS '12, Wuhan, China, September 9-11, 2012
Topology Adaptation Based on Mobile Agent in Unstructured P2P Networks
Video Browser Showdown by NUS
Visual query attributes suggestion
Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews
Difficulty guided image retrieval using linear multiview embedding
Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews
Hierarchical organization of unstructured consumer reviews
Integrating rich information for video recommendation with multi-task rank aggregation
Learning "verb-object" concepts for semantic image annotation
Learning concept bundles for video search with complex queries
Less is More: Efficient 3-D Object Retrieval With Query View Selection
Locally regressive G-optimal design for image retrieval
Matching Content-based Saliency Regions for partial-duplicate image retrieval
Mining Travel Patterns from GPS-Tagged Photos
Multimedia answering: enriching text QA with media information
Optimizing multimodal reranking for web image search
Product comparison using comparative relations
Query expansion by spatial co-occurrence for image retrieval
Research and applications on georeferenced multimedia: a survey
Semi-automatic Flickr Group Suggestion
ShotTagger: tag location for internet videos
Utilizing Related Samples to Enhance Interactive Concept-Based Video Search
Evaluation of histogram based interest point detector in web image classification and search
Joint Learning of Labels and Distance Metric
Mediapedia: Mining Web Knowledge to Construct Multimedia Encyclopedia
TRECVID 2010 Known-item Search by NUS
Utilizing related samples to learn complex queries in interactive concept-based video search
Visual query suggestion: Towards capturing user intent in internet image search
Which Tags Are Related to Visual Content?
An efficient sparse metric learning in high-dimensional space via ℓ1-penalized log-determinant regularization
Graph-based semi-supervised learning with multiple labels
Robust Distance Metric Learning with Auxiliary Knowledge
Visual query suggestion
A joint appearance-spatial distance for kernel-based image categorization
Feature Detection and Correspondence for Camera Calibration
Graph-based semi-supervised learning with multi-label
Joint multi-label multi-instance learning for image classification
MSRA at TRECVID 2008: High-Level Feature Extraction and Automatic Search
Optimized video scene segmentation
Unbiased active learning for image retrieval
Building a comprehensive ontology to refine video concept detection
MSRA-USTC-SJTU at TRECVID 2007: High-Level Feature Extraction and Search
Refining video annotation by exploiting pairwise concurrent relation