
Intent and Emotions in Image Search and Viewing
This article examines the relationship between user intent, emotions, and behavior during image search and viewing. It summarizes two studies conducted by Microsoft Research that aim to understand and predict user engagement with visual content.
Study 1: Behavioral and Physiological Responses to Image Viewing
The first study focused on identifying behavioral and physiological indicators of interest, curiosity, and novelty in image viewing. Researchers collected data from 50 participants who viewed 80 different images. The data collected included:
- Facial Expressions: Analyzing micro-expressions to gauge emotional responses.
- Eye Gaze: Tracking where participants looked on the images to understand attention patterns.
- Electrodermal Responses (EDR): Measuring skin conductance, which is often linked to emotional arousal.
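Signals like the skin-conductance trace above are typically reduced to simple per-trial features before analysis. The sketch below is illustrative only: the function name `edr_features` and the `thresh` parameter are hypothetical, and the study's actual signal processing is not described in the article.

```python
import numpy as np

def edr_features(signal, thresh=0.05):
    """Toy per-trial features from a skin-conductance trace.

    Counts local maxima that rise above the median baseline by
    `thresh` as a crude proxy for phasic arousal responses, plus
    overall level statistics. Names and threshold are hypothetical.
    """
    x = np.asarray(signal, dtype=float)
    baseline = np.median(x)
    # a sample is a "peak" if it exceeds both neighbors and the baseline
    is_peak = (x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]) & (x[1:-1] > baseline + thresh)
    return {
        "n_peaks": int(is_peak.sum()),
        "mean_level": float(x.mean()),
        "range": float(x.max() - x.min()),
    }
```

Analogous summary statistics (fixation counts, dwell times) could be computed per image for the eye-gaze stream.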
Participants also provided self-reported data on their level of interest, perceived novelty, complexity, and comprehensibility of each image. The findings revealed that:
- Subjectivity of Interest: Not all participants perceived interest in the same way.
- Pleasantness as a Key Indicator: The strongest predictor of interest was the perceived pleasantness of an image.
Study 2: Automatic Recognition of User Intent in Image Search
The second study aimed to develop a system that could automatically recognize a user's intent during the early stages of an image search session. This involved designing seven distinct search scenarios categorized under three primary intent conditions:
- Finding Items: Users looking for specific or new items.
- Re-finding Items: Users trying to locate previously seen items.
- Entertainment: Users browsing for enjoyment or inspiration.
Similar to the first study, researchers collected behavioral and physiological responses (facial expressions, eye gaze, EDR) from the participants. Additionally, they incorporated implicit user interactions, such as mouse movements and keystrokes, which provide subtle cues about user engagement and intent.
Participants engaged in seven different search tasks using a custom-built image retrieval platform. The collected data was then used to train machine learning models designed to predict users' search intentions. These models utilized a combination of:
- Visual Content: Features extracted from the images themselves.
- User Interactions: Data from mouse movements, clicks, and keystrokes.
- Spontaneous Responses: Behavioral and physiological data.
Machine Learning Model Performance
By fusing visual and user-interaction features, the machine learning system achieved an F1-score of 0.722 in classifying the three user intent categories (finding, re-finding, entertainment) under user-independent cross-validation, meaning models were always evaluated on participants they had not been trained on. This indicates that user intent can be predicted well above chance, even early in a search session.
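The key property of user-independent cross-validation is that all of one participant's data is held out per fold. The sketch below illustrates this evaluation scheme with a deliberately simple nearest-centroid classifier on concatenated (fused) features; the function names and classifier are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

def user_independent_f1(X, y, users, n_classes=3):
    """Leave-one-user-out evaluation with a nearest-centroid classifier.

    Each fold holds out every sample from one user, so the model is
    never tested on a person it was trained on. X would hold the
    fused (concatenated) visual and interaction features.
    """
    preds = np.empty_like(y)
    for u in np.unique(users):
        test = users == u
        centroids = np.stack([X[~test][y[~test] == c].mean(axis=0)
                              for c in range(n_classes)])
        # distance of each held-out sample to each class centroid
        d = np.linalg.norm(X[test][:, None] - centroids[None], axis=2)
        preds[test] = d.argmin(axis=1)
    return macro_f1(y, preds, n_classes)
```

Holding out whole users, rather than random samples, prevents a model from scoring well simply by memorizing individual participants' behavioral idiosyncrasies.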
Key Informative Features
The study identified the most crucial features for predicting search intent:
- Eye Gaze: Patterns in eye movement proved highly informative.
- Implicit User Interactions: Mouse movements and keystrokes provided valuable insights into user behavior and intent.
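One common way to rank features by how well they separate the intent classes is a Fisher-style score (between-class over within-class variance). The article does not say which ranking method the study used, so the sketch below is a generic stand-in; the feature names in the usage example are invented.

```python
import numpy as np

def fisher_score(x, y):
    """Between-class over within-class scatter for one feature.

    Higher means the feature's values differ more across intent
    classes than within them, i.e., the feature is more informative.
    """
    classes = np.unique(y)
    overall = x.mean()
    between = sum((y == c).sum() * (x[y == c].mean() - overall) ** 2
                  for c in classes)
    within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in classes)
    return between / within if within else np.inf

def rank_features(X, y, names):
    """Sort feature names by Fisher score, most informative first."""
    scores = [fisher_score(X[:, j], y) for j in range(X.shape[1])]
    return sorted(zip(names, scores), key=lambda t: -t[1])
```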
Conclusion
The research highlights the potential of combining behavioral, physiological, and interaction data with machine learning to understand and predict user intent in image search. This has significant implications for improving search engine design, personalized content delivery, and user experience in digital environments.
Related Research Areas:
- Artificial Intelligence
- Human-Computer Interaction
- Social Sciences
Original article available at: https://www.microsoft.com/en-us/research/video/intent-and-emotions-in-image-search-and-viewing/?locale=fr-ca