Jeremy Wolfe – Anne Treisman’s legacy and the future of visual search
For most researchers in the visual search trade, Anne Treisman’s work was foundational. Whether you agreed or disagreed with her, you could not ignore the body of data and theory that she created. In this talk, I will review some of my agreements and disagreements with Treisman’s Feature Integration Theory. My Guided Search theory, in its various incarnations, was the product of my fruitful interaction with Anne. For the most part, our arguments dealt with tasks where observers looked for one target amongst a set of items randomly distributed on an otherwise blank background. In the second part of the talk, I will consider whether the rules that govern those tasks are relevant when we search in real scenes, when we might be searching for more than one type of target, and when we don’t know how many instances of targets might be present in the search stimulus. The answer will be a qualified “yes”. In the third section, if I have not exhausted the allotted time and the patience of the audience, I will discuss some of the problems posed by socially important search tasks like cancer screening and consider whether basic behavioral research has solutions to offer.
Session 1: Search guidance and attentional capture
Steven Luck (Keynote) – Mechanisms for the suppression of irrelevant objects during visual search
We have long known that attention can be directed toward items containing task-relevant feature values. But can attention also be directed away from irrelevant features (i.e., features indicating that an item is a nontarget)? In this presentation, I will review recent studies indicating that items containing distinctive nontarget feature values can be suppressed so that they attract attention less than “neutral” items. This mechanism can be used to suppress salient singletons, as assessed with psychophysics, eye tracking, and ERPs (with significant correlations among these measures, suggesting that they all reflect the same underlying mechanism). This mechanism can also be used to suppress nonsalient distractor items. However, the suppression mechanism does not appear to be under direct voluntary control. First, if observers are cued to avoid a specific color, the first eye movement tends to be directed to the to-be-avoided color. Second, the suppression appears to build up over trials. Third, if automatic priming from the previous trial is put into competition with explicit cuing of the to-be-avoided color, priming wins and suppression loses. The emerging picture is that explicit goals can direct attention toward but not away from specific feature values, but goal-driven experience with target and distractor features can lead to automatic suppression of to-be-avoided features.
Chris Olivers – Proactive and reactive control over target selection in visual search
Searching for more than one type of target often, but not always, results in switch costs. Using a gaze-contingent eye-tracking paradigm in which we instruct participants to simultaneously look for two target objects presented among distractors, we find that the occurrence of switch costs depends on target availability. When both targets are available in a display, thus giving the observer free choice on what to look for, little to no switch costs occur. In contrast, clear switch costs emerge when only one of the two targets is available, so that the target object is imposed on the observer. This pattern occurs within and across various stimulus dimensions, and can be explained by assuming limited active attentional guidance in combination with a role for different types of cognitive control in visual search. While full target availability allows for proactive control over target selection, single target availability requires reactive control in response to unanticipated targets. I will furthermore present combined eye-tracking + fMRI and eye-tracking + EEG studies tracing both the source and the dynamics of these different control processes in visual search.
Jan Theeuwes – Statistical learning drives visual selection
Lingering biases of attentional selection affect the deployment of attention above and beyond top-down and bottom-up control. In this talk I will present an overview of recent studies investigating how statistical learning regarding the distractor determines attentional control. In all these experiments we used the classic additional singleton task in which participants searched for a salient shape singleton while ignoring a color distractor singleton. The distractor singleton was presented more often in one location than in all other locations. Even though observers were not aware of the statistical regularities, we show that the location of the distractor was suppressed relative to all other locations. Moreover, we show that this learning is highly flexible and adaptive. We argue that selection history modulates the topographical landscape of spatial ‘priority’ maps, such that attention is biased towards locations having a high activation and biased away from locations that are suppressed.
Dominique Lamy – Attentional capture without attentional engagement: a camera metaphor of attention
Most models of spatial attention assume that attention operates like a spotlight and that stimuli appearing in the focus of attention are mandatorily processed. Here, we show that when an irrelevant object captures attention, the shift of attention can be shallow and not followed by attentional engagement. In three sets of experiments, we measured spatial shifts of attention to an irrelevant distractor (or cue) as enhanced performance when the target appeared at the same vs. at a different location relative to the cue and attentional engagement as enhanced performance when the response-relevant feature at the cued location was compatible vs. incompatible with the target’s response feature. We found that (1) attentional shifts to irrelevant onsets were followed by attentional engagement at the cued location only with relevant-color and not with irrelevant-color onsets (contingent attentional engagement); (2) attentional shifts to relevant-color cues were independent of conscious perception of the cue, whereas attentional engagement was contingent on it; (3) attentional shifts to relevant-color cues were unaffected by the attentional blink, whereas attentional engagement was reduced and the N2pc component of the ERP suppressed.
We discuss the implications of these findings for the distinction between stimulus-driven and goal-dependent attentional capture, the mechanisms indexed by the N2pc and more broadly, models of spatial attention. In particular, we suggest that attention operates like a camera, which requires both aligning the zoom lens and pushing the shutter button, rather than like a spotlight.
Session 2: Search guidance based on (acquired) ST/LT memory / selective attention in visual WM
Leonardo Chelazzi – Plasticity of priority maps of space
In the past we have pioneered research with human participants exploring the impact of reward on visual selective attention. For example, in a recent study using visual search, we have demonstrated that reward can alter the “landscape” of spatial priority maps, increasing priority for locations associated with greater reward during a learning phase and reducing it for locations associated with smaller reward. Importantly, we could also demonstrate that the effects persisted for several days after the end of the learning episode, during an extinction phase, and generalized to new tasks and stimuli. In an ongoing program of research, we are now assessing whether similar effects can be induced via statistical learning. In a series of experiments using variants of a visual search task, unbeknownst to the participants, we manipulate the probability of occurrence of the sought target and/or of a salient distractor across locations. The evidence indicates that, similar to the influence of reward, uneven probabilities of the critical items alter deployment of attention in a way that can optimize performance under certain conditions but can hinder it under other conditions. We argue that these effects reflect durable changes in priority maps of space. Importantly, in all cases above, changes in attentional performance were obtained even though participants remained unaware of the manipulation. Future studies will try to understand whether reward-based learning and statistical learning operate via shared or independent mechanisms. In summary, reward and statistical learning appear to be strong (and implicit) determinants of attentional deployment.
Alejandro Lleras – Search efficiency for targets defined by two feature dimensions can be predicted based on search efficiency measures for targets defined along a single dimension
A new model for efficient visual search (Contrast Signal Theory – CST) is proposed whereby the goal of early parallel processing is to compute a contrast signal between the target template in memory and each item in the display. This architecture allows the visual system to compute fast and confident decisions about items in the display that are sufficiently different from the target such that parallel, peripheral evaluation of these items is sufficient to discard them as non-targets. In this model, the slope of the logarithmic search function observed when a target is sufficiently different from lures is proposed to be inversely proportional to that [lure-target] contrast signal, such that evidence accumulation will accrue faster at locations where contrast is larger (i.e., the lure-target similarity is low) than where contrast is smaller (lure-target similarity is high). The Contrast Signal Theory has shown some early successes: it allows one to predict RTs for heterogeneous displays based on performance observed in homogeneous displays. Here, we ask: can search efficiency for targets that differ from distractors along two dimensions (color and shape) be predicted by the search efficiency observed for targets that differ from distractors along a single dimension (only differ in color or only differ in shape)? Predictions from various models are compared. Results from ten experiments show that there is a simple equation to derive the combined ([color x shape]) search efficiency based on the search efficiency observed along individual dimensions ([color] & [shape]).
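The core computational claim above — evidence accumulates faster at locations with larger lure-target contrast, producing logarithmic set-size functions whose slope shrinks as contrast grows — can be illustrated with a minimal simulation. This is a sketch under simplifying assumptions (exponentially distributed rejection times with rate proportional to contrast; parallel, exhaustive lure rejection), not the authors' implementation, and all parameter values are made up:

```python
import random

def simulate_search_rt(set_size, contrast, base_rt=0.3, n_trials=5000):
    """Mean time to reject all lures when each lure's rejection time is
    exponentially distributed with rate proportional to lure-target contrast.
    The expected maximum of N i.i.d. exponentials is H_N / rate (H_N = the
    N-th harmonic number), which grows logarithmically with N -- matching
    CST's logarithmic search functions, with slope inversely related to
    contrast."""
    total = 0.0
    for _ in range(n_trials):
        # Search ends when the slowest lure has been rejected.
        total += base_rt + max(random.expovariate(contrast)
                               for _ in range(set_size))
    return total / n_trials

# Larger contrast (target very unlike the lures) -> shallower log slope.
for contrast in (5.0, 10.0):
    rts = [simulate_search_rt(n, contrast) for n in (1, 4, 16)]
    print(f"contrast={contrast}: mean RTs =", ["%.3f" % r for r in rts])
```

Under these assumptions, quadrupling the set size adds a roughly constant increment to mean RT, and halving the contrast roughly doubles that increment — the qualitative signature the abstract describes.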
Roy Luria – An object-based pointer system underlying visual working memory’s ability to access its online representations
The world around us constantly changes, posing a difficult challenge for our visual system, which must continually update the information it represents. This updating is carried out by visual working memory (VWM), which can access a specific representation and modify it according to changes in the environment.
We argue that in order to access and modify the corresponding information, each representation within the VWM workspace must be stably mapped to the relevant stimuli. The idea of such a “pointer system” has been theoretically proposed in the past (e.g., FINST, Pylyshyn, 2000), but empirical support for it was largely limited to a tracking task, in which the only relevant information was spatial.
First, we provide evidence that VWM relies on such a pointer system in a shape change detection task, in which spatial information is task-irrelevant. By manipulating the pointer’s stability, we demonstrated that the loss of a pointer was accompanied by stable electrophysiological and behavioral markers, allowing us to use them as signatures of the pointer system. Next, we examined how the pointer system operates. Specifically, we asked whether pointers are allocated based on a spatial, featural, or object-based code. The results indicate that the pointer system relies on objecthood information to map and access each VWM representation.
Session 3: Brain mechanisms of visual search
Jeff Schall (Keynote) – Neural Control of Visual Search
This presentation will survey performance, neural and computational findings demonstrating that gaze is guided during visual search through the operation of distinct stages of visual selection and saccade preparation. These stages can be selectively manipulated through target-distractor similarity, stimulus-response mapping rules, and unexpected perturbation of the visual array. Such manipulations indicate that these stages are instantiated in different neural populations with distinct connectivity and functional properties. Race and accumulator models provide a comprehensive account of the saccade preparation stage and of the conversion of salience evidence into saccade commands.
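The race/accumulator account mentioned above can be sketched in a few lines: one accumulator per stimulus integrates noisy salience evidence, and the first to reach threshold determines both the saccade target and the saccade latency. This is a generic textbook-style illustration with made-up parameters, not Schall's fitted model:

```python
import random

def race_to_threshold(drift_rates, threshold=1.0, dt=0.001, noise=0.05):
    """Simulate one trial of a race model: each accumulator integrates its
    drift rate plus Gaussian noise at every time step; the first accumulator
    to reach threshold wins, determining the chosen item and the response
    time."""
    evidence = [0.0] * len(drift_rates)
    t = 0.0
    while True:
        t += dt
        for i, rate in enumerate(drift_rates):
            evidence[i] += rate * dt + random.gauss(0.0, noise) * dt ** 0.5
            if evidence[i] >= threshold:
                return i, t  # winning accumulator and its finishing time

# The target (index 0) has a higher drift rate than three distractors,
# so saccades land on it on the large majority of trials.
random.seed(1)
choices = [race_to_threshold([3.0, 1.0, 1.0, 1.0])[0] for _ in range(200)]
print("proportion of saccades to target:", choices.count(0) / 200)
```

Making the target and distractor drift rates more similar (the analogue of high target-distractor similarity) slows the race and increases the proportion of erroneous saccades, which is one way such models capture the manipulations described in the abstract.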
Talia Konkle – Predicting visual search from the representational architecture of high-level visual cortex
While many prominent models of visual search focus on characterizing how attention is deployed, it is also clear that representational factors contribute to visual search speeds, such as target-distractor similarity (Duncan and Humphreys, 1989). In this line of work, we examined the extent to which performance on a visual search task can be predicted from the stable representational architecture of the visual system, independent of attentional dynamics. Overall, we found strong brain/behavior correlations across most of the higher-level visual system, including both the ventral and dorsal pathways when considering both macro-scale sectors as well as smaller meso-scale regions. These results suggest that visual search for real-world object categories is well predicted by the stable, task-independent architecture of the visual system.
Session 4: New data and models of visual search
Gregory Zelinsky (Keynote) – Predicting goal-directed attention control: A tale of two deep networks
The ability to control the allocation of attention underlies all goal-directed behavior. Here two recent efforts are summarized that apply deep learning methods to model this core perceptual-cognitive ability.
The first of these is Deep-BCN, the first deep neural network implementation of the widely-accepted biased-competition theory (BCT) of attention control. Deep-BCN is an 8-layer deep network pre-trained for object classification, one whose layers and their functional connectivity are mapped to early-visual (V1, V2/V3, V4), ventral (PIT, AIT), and frontal (PFC) brain areas as informed by BCT. Deep-BCN also has a superior colliculus and a frontal-eye field, and can therefore make eye movements. We compared Deep-BCN’s eye movements to those made by 15 people performing a categorical search for one of 25 target categories of common objects and found that it predicted both the number of fixations during search and the saccade-distance travelled before search termination. With Deep-BCN, a DNN implementation of BCT now exists that can be used to predict the neural and behavioral responses of an attention control mechanism as it mediates a goal-directed behavior—in our study the eye movements made in search of a target goal.
The second model of attention control is ATTNet, a deep network model of the ATTention Network. ATTNet is similar to Deep-BCN in that both have layers mapped to early-visual and ventral brain structures in the attention network and are aligned with BCT. However, they differ in two key respects. ATTNet includes layers mapped to dorsal structures, enabling it to learn how to prioritize the selection of visual inputs for the purpose of directing a high-resolution attention window. But a more fundamental difference is that ATTNet learns to shift its attention as it greedily seeks out reward. Using deep reinforcement learning, an attention shift to a target object elicits reward that makes all the network’s states leading up to that covert action more likely to occur in the future. ATTNet also learns to prioritize the visual input so as to efficiently control the direction of its focal routing window—the colloquial spotlight of attention. It does this, not only to find reward faster, but also to restrict its visual inputs to potentially rewarding patterns for the purpose of improving classification success. This selective routing behavior was quantified as a “priority map” and used to predict the gaze fixations made by 30 subjects searching 240 images from Microsoft COCO (the dataset used to train ATTNet) for a target from one of three object categories. Both subjects and ATTNet showed evidence for attention being preferentially directed to target goals, behaviorally measured as oculomotor guidance to the targets. Other well-established findings in the search literature were observed.
In summary, ATTNet is the first behaviorally-validated model of attention control that uses deep reinforcement learning to learn to shift a focal routing window to select image patterns. This is theoretically important in that it shows how a reward-based mechanism might be used by the brain to learn how to shift attention. Deep-BCN is also theoretically important in being the first deep network designed to capture the core tenet of BCT: that a top-down goal state biases a competition among object representations for the selective routing of a visual input, with the purpose of this selective routing being greater classification success. Together, Deep-BCN and ATTNet begin to explore the space of ways that cognitive neuroscience and machine learning can blend to form a new computational neuroscience, one harnessing the power and promise of deep learning.
Ruth Rosenholtz – Capacity limits and how the visual system copes with them
Our visual system cannot process everything with full fidelity, nor, in a given moment, perform all possible visual tasks. Rather, it must lose some information, and prioritize some tasks over others. A number of strategies have developed for dealing with this limited capacity. A popular proposal posits limited access to higher-level processing; that a mechanism known as selective attention serially gates access to that resource; and that the gate operates early in visual processing. However, since this account was originally proposed, we as a field have learned a great deal about capacity limits in vision. I will discuss the implications for selective attention theory. Furthermore, I will examine what we have learned from studying an alternative mechanism for dealing with limited capacity: efficient coding, particularly in the visual periphery. In this scheme, visual processing has limited bandwidth rather than limited access to higher-level processing. Finally, evidence suggests that we should look for additional capacity limits late in processing, taking the form of general-purpose limits on the complexity of the tasks one can perform at a given moment. A general-purpose decision process may deal with such limits by “cutting corners” when the task becomes too complicated.