Jeremy Wolfe – Anne Treisman’s legacy and the future of visual search
For most researchers in the visual search trade, Anne Treisman’s work was foundational. Whether you agreed or disagreed with her, you could not ignore the body of data and theory that she created. In this talk, I will review some of my agreements and disagreements with Treisman’s Feature Integration Theory. My Guided Search theory, in its various incarnations, was the product of my fruitful interaction with Anne. For the most part, our arguments dealt with tasks where observers looked for one target amongst a set of items randomly distributed on an otherwise blank background. In the second part of the talk, I will consider whether the rules that govern those tasks are relevant when we search in real scenes, when we might be searching for more than one type of target, and when we don’t know how many instances of targets might be present in the search stimulus. The answer will be a qualified “yes”. In the third section, if I have not exhausted the allotted time and the patience of the audience, I will discuss some of the problems posed by socially important search tasks like cancer screening and consider whether basic behavioral research has solutions to offer.
Session 1: Search guidance and attentional capture
Steven Luck (Keynote) – Mechanisms for the suppression of irrelevant objects during visual search
We have long known that attention can be directed toward items containing task-relevant feature values. But can attention also be directed away from irrelevant features (i.e., features indicating that an item is a nontarget)? In this presentation, I will review recent studies indicating that items containing distinctive nontarget feature values can be suppressed so that they attract attention less than “neutral” items. This mechanism can be used to suppress salient singletons, as assessed with psychophysics, eye tracking, and ERPs (with significant correlations among these measures, suggesting that they all reflect the same underlying mechanism). This mechanism can also be used to suppress nonsalient distractor items. However, the suppression mechanism does not appear to be under direct voluntary control. First, if observers are cued to avoid a specific color, the first eye movement tends to be directed to the to-be-avoided color. Second, the suppression appears to build up over trials. Third, if automatic priming from the previous trial is put into competition with explicit cuing of the to-be-avoided color, priming wins and suppression loses. The emerging picture is that explicit goals can direct attention toward but not away from specific feature values, but goal-driven experience with target and distractor features can lead to automatic suppression of to-be-avoided features.
Charles Folk – Semantic templates and attentional capture
Over the last 25 years, research on attentional guidance and capture has focused on the relative influence of bottom-up salience, top-down set, and more recently, selection history. An implicit assumption in much of this work has been that attentional guidance is limited to preattentively processed feature information (e.g., color, orientation, brightness, etc.). For example, a color singleton might capture attention based on a low level, salient, feature contrast, and that capture might be modulated by a top-down set for a particular color value. However, a growing number of studies looking at visual search in naturalistic scenes suggest that semantic/categorical information can have a dramatic impact on overt attention allocation as measured by eye movements. In addition, there is strong evidence that emotional content (independent of featural content) can produce evidence of attentional capture. Here we address whether attentional capture by semantic/categorical content is limited to emotional stimuli, or whether establishing a top-down set or template for semantic information can result in the contingent capture of attention by stimuli matching the semantic set. A series of behavioral and electrophysiological studies using an RSVP methodology will be reviewed that explore the degree to which natural images depicting exemplars from superordinate categories can elicit the capture of covert attention, and whether such capture is contingent on a top-down set for the relevant category.
Chris Olivers – Proactive and reactive control over target selection in visual search
Searching for more than one type of target often, but not always, results in switch costs. Using a gaze-contingent eye-tracking paradigm in which we instruct participants to simultaneously look for two target objects presented among distractors, we find that the occurrence of switch costs depends on target availability. When both targets are available in a display, thus giving the observer free choice on what to look for, little to no switch costs occur. In contrast, clear switch costs emerge when only one of the two targets is available, so that the target object is being imposed. This pattern occurs within and across various stimulus dimensions, and can be explained by assuming limited active attentional guidance in combination with a role for different types of cognitive control in visual search. While full target availability allows for proactive control over target selection, single target availability requires reactive control in response to unanticipated targets. I will furthermore present combined eye-tracking + fMRI and eye-tracking + EEG studies tracing both source and dynamics of these different control processes in visual search.
Jan Theeuwes – Statistical learning drives visual selection
Lingering biases of attentional selection affect the deployment of attention above and beyond top-down and bottom-up control. In this talk I will present an overview of recent studies investigating how statistical learning about distractor regularities determines attentional control. In all these experiments we used the classic additional singleton task, in which participants searched for a salient shape singleton while ignoring a color distractor singleton. The distractor singleton was presented more often in one location than in all other locations. Even though observers were not aware of the statistical regularities, we show that the location of the distractor was suppressed relative to all other locations. Moreover, we show that this learning is highly flexible and adaptive. We argue that selection history modulates the topographical landscape of spatial ‘priority’ maps, such that attention is biased towards locations having a high activation and biased away from locations that are suppressed.
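The core idea — that a location hosting the distractor more often than others becomes suppressed on the priority map through experience alone — can be illustrated with a minimal simulation. This is a hypothetical sketch, not the authors' model: the 65% high-probability rate, the learning rate, and the multiplicative decay rule are all assumptions chosen for illustration.

```python
import random

# Hypothetical sketch: a spatial priority map in which the location that
# hosts the distractor most often is gradually suppressed by experience.
# All parameters below are illustrative assumptions.
N_LOCATIONS = 8
HIGH_PROB_LOC = 3          # distractor appears here on 65% of trials (assumed rate)
LEARNING_RATE = 0.05

priority = [1.0] * N_LOCATIONS  # uniform priority before learning

def sample_distractor_location():
    """65% of distractors at one location, the rest spread evenly."""
    if random.random() < 0.65:
        return HIGH_PROB_LOC
    others = [i for i in range(N_LOCATIONS) if i != HIGH_PROB_LOC]
    return random.choice(others)

def run_trials(n_trials):
    for _ in range(n_trials):
        loc = sample_distractor_location()
        # Experience with the distractor lowers that location's priority,
        # without any explicit knowledge of the regularity.
        priority[loc] -= LEARNING_RATE * priority[loc]

random.seed(1)
run_trials(500)
# The high-probability distractor location ends up with the lowest priority.
suppressed = min(range(N_LOCATIONS), key=lambda i: priority[i])
```

After a few hundred trials the high-probability location's priority falls well below the others, mirroring the suppression-relative-to-all-other-locations pattern described in the abstract.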
Dominique Lamy – Attentional capture without attentional engagement: a camera metaphor of attention
Most models of spatial attention assume that attention operates like a spotlight and that stimuli appearing in the focus of attention are mandatorily processed. Here, we show that when an irrelevant object captures attention, the shift of attention can be shallow and not followed by attentional engagement. In three sets of experiments, we measured spatial shifts of attention to an irrelevant distractor (or cue) as enhanced performance when the target appeared at the same vs. at a different location relative to the cue, and attentional engagement as enhanced performance when the response-relevant feature at the cued location was compatible vs. incompatible with the target’s response feature. We found that (1) attentional shifts to irrelevant onsets were followed by attentional engagement at the cued location only with relevant-color and not with irrelevant-color onsets (contingent attentional engagement); (2) attentional shifts to relevant-color cues were independent of conscious perception of the cue, whereas attentional engagement was contingent on it; (3) attentional shifts to relevant-color cues were unaffected by the attentional blink, whereas attentional engagement was reduced and the N2pc component of the ERP suppressed.
We discuss the implications of these findings for the distinction between stimulus-driven and goal-dependent attentional capture, the mechanisms indexed by the N2pc and more broadly, models of spatial attention. In particular, we suggest that attention operates like a camera, which requires both aligning the zoom lens and pushing the shutter button, rather than like a spotlight.
Session 2: Search guidance based on (acquired) ST/LT memory / selective attention in visual WM
Leonardo Chelazzi – Plasticity of priority maps of space
In the past we have pioneered research with human participants exploring the impact of reward on visual selective attention. For example, in a recent study using visual search, we have demonstrated that reward can alter the “landscape” of spatial priority maps, increasing priority for locations associated with greater reward during a learning phase and reducing it for locations associated with smaller reward. Importantly, we could also demonstrate that the effects persisted for several days after the end of the learning episode, during an extinction phase, and generalized to new tasks and stimuli. With an ongoing program of research, we are now assessing whether similar effects can be induced via statistical learning. In a series of experiments using variants of a visual search task, unbeknownst to the participants, we manipulate the probability of occurrence of the sought target and/or of a salient distractor across locations. The evidence indicates that, similar to the influence of reward, uneven probabilities of the critical items alter deployment of attention in a way that can optimize performance under certain conditions but can hinder it under other conditions. We argue that these effects reflect durable changes in priority maps of space. Importantly, in all cases above, changes in attentional performance were obtained even though participants had no clue as to the adopted manipulation. Future studies will try to understand whether reward-based learning and statistical learning operate via shared or independent mechanisms. In summary, reward and statistical learning appear to be strong (and implicit) determinants of attentional deployment.
Alejandro Lleras – Search efficiency for targets defined by two feature dimensions can be predicted based on search efficiency measures for targets defined along a single dimension
A new model for efficient visual search (Contrast Signal Theory – CST) is proposed whereby the goal of early parallel processing is to compute a contrast signal between the target template in memory and each item in the display. This architecture allows the visual system to compute fast and confident decisions about items in the display that are sufficiently different from the target, such that parallel, peripheral evaluation of these items is sufficient to discard them as non-targets. In this model, the slope of the logarithmic search function observed when a target is sufficiently different from lures is proposed to be inversely proportional to that [lure-target] contrast signal, such that evidence will accrue faster at locations where contrast is larger (i.e., the lure-target similarity is low) than where contrast is smaller (lure-target similarity is high). The Contrast Signal Theory has shown some early successes: it allows one to predict RTs for heterogeneous displays based on performance observed in homogeneous displays. Here, we ask: can search efficiency for targets that differ from distractors along two dimensions (color and shape) be predicted by the search efficiency observed for targets that differ from distractors along a single dimension (only differ in color or only differ in shape)? Predictions from various models are compared. Results from ten experiments show that there is a simple equation to derive the combined ([color x shape]) search efficiency based on the search efficiency observed along individual dimensions ([color] & [shape]).
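The quantitative claim in the abstract — RT grows logarithmically with set size, with a slope inversely proportional to the lure-target contrast signal — can be written down directly. The sketch below is illustrative only: the intercept, scaling constant, and the additive combination rule are assumptions, not the equation derived from the ten experiments.

```python
import math

# Minimal sketch of the CST prediction. The functional form follows the
# abstract (logarithmic RT, slope ~ 1/contrast); the constants are
# hypothetical placeholders.
BASE_RT = 400.0   # ms, assumed intercept
SCALE = 150.0     # ms, assumed scaling constant

def predicted_rt(set_size, contrast):
    """Predicted RT (ms): logarithmic in set size, slope inversely
    proportional to the lure-target contrast signal. `contrast` is large
    when lures are very different from the target, small when similar."""
    return BASE_RT + (SCALE / contrast) * math.log(set_size + 1)

def combined_contrast(c_color, c_shape):
    # Purely illustrative candidate rule: per-dimension contrast signals
    # combine additively. The abstract's actual combination equation is
    # the empirical result of the ten experiments, not this assumption.
    return c_color + c_shape

# Dissimilar lures (high contrast) yield flatter search functions than
# similar lures (low contrast): compare RT growth from 4 to 16 items.
flat = predicted_rt(16, contrast=4.0) - predicted_rt(4, contrast=4.0)
steep = predicted_rt(16, contrast=1.0) - predicted_rt(4, contrast=1.0)
```

Under this parameterization, quadrupling the contrast signal cuts the set-size cost by a factor of four, which is the sense in which two-dimensional targets (with a larger combined contrast) should produce more efficient search than either dimension alone.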
Roy Luria – An object based pointer system underlying visual working memory ability to access its online representations
The world around us constantly changes, posing a difficult challenge for our visual system, which must continually update the information it represents. This updating is performed by visual working memory (VWM), which can access a specific representation and modify it according to changes in the environment.
We argue that in order to access and modify the corresponding information, each representation within the VWM workspace must be stably mapped to the relevant stimuli. The idea of such a “pointer system” has been theoretically proposed in the past (e.g., FINST, Pylyshyn, 2000), but empirical support for it was largely limited to a tracking task, in which the only relevant information was spatial.
First, we provide evidence that VWM relies on such a pointer system in a shape change detection task, in which spatial information is task-irrelevant. By manipulating the pointer’s stability, we demonstrated that the loss of a pointer was accompanied by stable electrophysiological and behavioral markers, allowing us to use them as signatures of the pointer system. Next, we examined how the pointer system operates. Specifically, we asked whether pointers are allocated based on a spatial, featural, or object-based code. The results indicate that the pointer system relies on objecthood information to map and access each VWM representation.
Session 3: Brain mechanisms of visual search
Jeff Schall (Keynote) – Neural Control of Visual Search
This presentation will survey performance, neural and computational findings demonstrating that gaze is guided during visual search through the operation of distinct stages of visual selection and saccade preparation. These stages can be selectively manipulated through target-distractor similarity, stimulus-response mapping rules, and unexpected perturbation of the visual array. Such manipulations indicate that they are instantiated in different neural populations with distinct connectivity and functional properties. Race and accumulator models provide a comprehensive account of the saccade preparation stage and of the conversion of salience evidence into saccade commands.
Daniel Baldauf – Functional connectivity mechanisms of attention
The neural mechanisms of spatial attention, via feedback signals from spatially-mapped control areas in frontal / parietal cortex, have been described in much detail. For non-spatial attention to different sensory modalities, complex objects, and so on, the control mechanisms seem much more complex, and experimental work has just begun to identify possible sources of top-down control in the inferior part of frontal cortex. Obviously, however, spatial and non-spatial attention are often combined in everyday tasks. How these different control networks work together is a major question in cognitive neuroscience. To answer these questions, we combined MEG and fMRI data in human subjects to identify not only the sources of spatial and non-spatial feedback signals, but also the mechanisms by which these different networks interact with sensory areas in attention. We identified two separable networks in the superior- and inferior-frontal cortex, mediating spatial versus non-spatial attention, respectively. Using multi-voxel pattern analysis, we found that spatial and non-spatial information are represented in different subpopulations of frontal cortex. Most importantly, our analyses of temporally resolved MEG data also show that both control structures engage selectively in coherent interactions with sensory areas that represent the attended stimulus. Rather than a zero-phase lag connection, which would indicate common input, the interactions between frontal cortex and sensory areas are phase-shifted to allow for a 20ms transmission time. This seems to be just the right time for signals in one area to arrive at a time of maximum depolarization in the connected area, increasing their impact. Further, we were able to identify top-down directionality of these oscillatory interactions, establishing the superior- versus inferior-frontal cortex as key sources of spatial versus non-spatial attentional inputs, respectively.
Talia Konkle – Predicting visual search from the representational architecture of high-level visual cortex
While many prominent models of visual search focus on characterizing how attention is deployed, it is also clear that representational factors contribute to visual search speeds, such as target-distractor similarity (Duncan and Humphreys, 1989). In this line of work, we examined the extent to which performance on a visual search task can be predicted from the stable representational architecture of the visual system, independent of attentional dynamics. Overall, we found strong brain/behavior correlations across most of the higher-level visual system, including both the ventral and dorsal pathways when considering both macro-scale sectors as well as smaller meso-scale regions. These results suggest that visual search for real-world object categories is well predicted by the stable, task-independent architecture of the visual system.
Session 4: New data and models of visual search
Gregory Zelinsky (Keynote) – Predicting goal-directed attention control: A tale of two deep networks
The ability to control the allocation of attention underlies all goal-directed behavior. Here two recent efforts are summarized that apply deep learning methods to model this core perceptual-cognitive ability.
The first of these is Deep-BCN, the first deep neural network implementation of the widely-accepted biased-competition theory (BCT) of attention control. Deep-BCN is an 8-layer deep network pre-trained for object classification, one whose layers and their functional connectivity are mapped to early-visual (V1, V2/V3, V4), ventral (PIT, AIT), and frontal (PFC) brain areas as informed by BCT. Deep-BCN also has a superior colliculus and a frontal-eye field, and can therefore make eye movements. We compared Deep-BCN’s eye movements to those made by 15 people performing a categorical search for one of 25 target categories of common objects and found that it predicted both the number of fixations during search and the saccade-distance travelled before search termination. With Deep-BCN, a DNN implementation of BCT now exists that can be used to predict the neural and behavioral responses of an attention control mechanism as it mediates a goal-directed behavior—in our study the eye movements made in search of a target goal.
The second model of attention control is ATTNet, a deep network model of the ATTention Network. ATTNet is similar to Deep-BCN in that both have layers mapped to early-visual and ventral brain structures in the attention network and are aligned with BCT. However, they differ in two key respects. ATTNet includes layers mapped to dorsal structures, enabling it to learn how to prioritize the selection of visual inputs for the purpose of directing a high-resolution attention window. But a more fundamental difference is that ATTNet learns to shift its attention as it greedily seeks out reward. Using deep reinforcement learning, an attention shift to a target object elicits reward that makes all the network’s states leading up to that covert action more likely to occur in the future. ATTNet also learns to prioritize the visual input so as to efficiently control the direction of its focal routing window—the colloquial spotlight of attention. It does this, not only to find reward faster, but also to restrict its visual inputs to potentially rewarding patterns for the purpose of improving classification success. This selective routing behavior was quantified as a “priority map” and used to predict the gaze fixations made by 30 subjects searching 240 images from Microsoft COCO (the dataset used to train ATTNet) for a target from one of three object categories. Both subjects and ATTNet showed evidence for attention being preferentially directed to target goals, behaviorally measured as oculomotor guidance to the targets. Other well-established findings in the search literature were observed.
In summary, ATTNet is the first behaviorally-validated model of attention control that uses deep reinforcement learning to shift a focal routing window to select image patterns. This is theoretically important in that it shows how a reward-based mechanism might be used by the brain to learn how to shift attention. Deep-BCN is also theoretically important in being the first deep network designed to capture the core tenet of BCT: that a top-down goal state biases a competition among object representations for the selective routing of a visual input, with the purpose of this selective routing being greater classification success. Together, Deep-BCN and ATTNet begin to explore the space of ways that cognitive neuroscience and machine learning can blend to form a new computational neuroscience, one harnessing the power and promise of deep learning.
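The reward-learning idea at the heart of ATTNet — an attention shift that lands on the target earns reward, and that reward makes the states and actions leading up to the shift more likely in the future — can be illustrated with a toy policy-gradient (REINFORCE-style) update. This is a sketch of the learning principle only, not ATTNet itself; the softmax policy over four locations, the learning rate, and the binary reward are all assumptions.

```python
import math
import random

# Toy illustration of reward-driven attention shifts: a softmax policy
# over candidate shift locations, reinforced whenever the sampled shift
# lands on the target. All names and parameters are assumptions.
N_LOCATIONS = 4
TARGET = 2
LEARNING_RATE = 0.5

prefs = [0.0] * N_LOCATIONS  # action preferences over shift locations

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def shift_and_learn():
    probs = softmax(prefs)
    # Sample a covert attention shift from the current policy.
    r, acc, action = random.random(), 0.0, N_LOCATIONS - 1
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            action = i
            break
    reward = 1.0 if action == TARGET else 0.0
    # REINFORCE update: rewarded shifts become more likely in the future.
    for i in range(N_LOCATIONS):
        grad = (1.0 if i == action else 0.0) - probs[i]
        prefs[i] += LEARNING_RATE * reward * grad

random.seed(0)
for _ in range(2000):
    shift_and_learn()
final_probs = softmax(prefs)
# After training, shifts to the rewarded (target) location dominate the policy.
```

The same logic, scaled up to image inputs and a learned priority map, is what lets a network like ATTNet both find reward faster and restrict its inputs to potentially rewarding patterns.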
Ruth Rosenholtz – Capacity limits and how the visual system copes with them
Our visual system cannot process everything with full fidelity, nor, in a given moment, perform all possible visual tasks. Rather, it must lose some information, and prioritize some tasks over others. A number of strategies have developed for dealing with this limited capacity. A popular proposal posits limited access to higher-level processing; that a mechanism known as selective attention serially gates access to that resource; and that the gate operates early in visual processing. However, since this account was originally proposed, we as a field have learned a great deal about capacity limits in vision. I will discuss the implications for selective attention theory. Furthermore, I will examine what we have learned from studying an alternative mechanism for dealing with limited capacity: efficient coding, particularly in the visual periphery. In this scheme, visual processing has limited bandwidth rather than limited access to higher-level processing. Finally, evidence suggests that we should look for additional capacity limits late in processing, taking the form of general-purpose limits on the complexity of the tasks one can perform at a given moment. A general-purpose decision process may deal with such limits by “cutting corners” when the task becomes too complicated.
Arni Kristjánsson – New insights from visual foraging tasks into visual attention and visual working memory
The assessment of the functional properties of visual attention and visual working memory has in past decades been dominated by single-target visual searches. But our goals from one moment to the next are unlikely to involve only a single target, and more recently, paradigms involving visual foraging for multiple targets have been used to investigate visual attention and working memory. Set-size effects in single-target visual search tasks partly form the foundation of many theories of visual search. We therefore manipulated set size in a visual foraging task involving both “feature” and “conjunction” foraging. The target selection times during foraging revealed specific components of the foraging pattern, indicating that single-target search tasks provide only a snapshot of visual attention. Foraging tasks can also provide insights into the operational principles of visual working memory: our finding that participants are able to change their foraging patterns according to task demands suggests that the visual working memory representations used for attentional guidance are flexible, and not restricted to a single value as some current theories suggest. Our results show that single-target visual search tasks vastly undersample the operation of visual attention and visual working memory, providing only a snapshot of their function, and this limited information is bound to be reflected in theoretical accounts based on such tasks.
Monica Castelhano – The Surface Guidance Framework: How Scene Surface can inform Search Strategies
The effects of the spatial relationship between objects and scenes on visual search performance are well established. In previous studies, we have shown that this spatial relationship can be exploited to explain eye movement patterns, to explain how initial scene representations affect subsequent search performance, and to distinguish the contribution of spatial vs. semantic information.
Using the newly proposed Surface Guidance Framework, we operationalize target relevant and irrelevant scene regions. We divide scenes into three regions (upper, mid, lower) that correspond with possible relevant surfaces (wall, countertop, floor). Target relevant regions are defined as the region in which a target object (e.g., painting, toaster, rug) is expected to appear. Here, we explore how relevant and irrelevant regions of a scene are processed in two classic visual search paradigms (set size and sudden onset) to further explore mechanisms of attention during search in scenes.
In Study 1, we explored how spatial associations affect search by manipulating set size in either target relevant or target irrelevant regions. We found that only set size increases in target relevant regions adversely affected search performance. In Study 2, we manipulated whether a suddenly-onsetting distractor object appeared in a target relevant or target irrelevant region. We found that fixations to the distractor were significantly more likely to occur in the target relevant condition and negatively affected search performance.
The Surface Guidance Framework allows us to further explore how spatial associations can narrow processing to specific areas of the scene relevant to the task. Viewing effects of scene context through the lens of target relevancy allows us to develop a new understanding of how the spatial relationship between objects and scenes can affect performance and processing.
Michael Hout – Passive search strategies improve attentional guidance and object recognition during demanding visual search
Hybrid visual memory search (i.e., search for more items than can be maintained in working memory) requires observers to search both through a visual display and through the contents of memory in order to find designated “target” items (e.g., walking through the grocery store looking for items on your grocery list, airport baggage screeners looking for many prohibited items in travelers’ luggage). A substantial body of research on this task has shown that observers are able to search for a very large number of items with relative ease. However, the attentional mechanisms that drive hybrid search remain somewhat unclear. In our first two experiments, we investigated the role that cognitive strategies play in facilitating hybrid search for categorically-defined targets. We hypothesized that observers in a hybrid search task would naturally adopt a strategy in which they remain somewhat passive, allowing targets to “pop out,” rather than actively directing their attention around the visual display. Experiment 1 compared behavioral responses in passive, active, and uninstructed hybrid search. Contrary to our expectations, we found that uninstructed search tended to be active in nature, but we also found that adopting a passive strategy led to more efficient performance. In Experiment 2, we replicated these findings, and tracked the eye movements of observers. We found that oculomotor behavior in passive hybrid search was characterized by faster, larger saccades, a tendency to fixate fewer non-target items, and an improved ability to classify items as either targets or distractors. In Experiment 3, we explored whether the benefits of passive search were limited only to particularly demanding search tasks (i.e., those that require observers to search for many items at once), or if performance benefits also appear when people are asked to find a single, categorically-defined target. 
Once again, we tracked the eye movements of participants and found strikingly similar results to our hybrid search task. Namely, that passive searchers were faster and less accurate, but more efficient overall. Additionally, passive search led to improved attentional guidance, better object recognition, and fewer target recognition failures. Together, our results indicate two surprising findings. First, that hybrid visual search is more active in nature than expected, and second, that adopting a passive search strategy leads to performance and oculomotor improvements during hybrid and single-target search. These findings fill a gap in the literature regarding the nature of strategy use during visual search, and the potential benefits of strategy adoption during challenging search tasks.