Introduction
Some of the neurons in vision models are features that we aren’t particularly surprised to find. Curve detectors, for example, are a pretty natural feature for a vision system to have. In fact, they had already been discovered in the animal visual cortex. It’s easy to imagine how curve detectors are built up from earlier edge detectors, and it’s easy to guess why curve detection might be useful to the rest of the neural network.
High-low frequency detectors, on the other hand, seem more surprising. They are not a feature that we would have expected a priori to find. Yet, when systematically characterizing the early layers of InceptionV1, we found a full fifteen neurons of mixed3a that appear to detect a high frequency pattern on one side, and a low frequency pattern on the other.
One worry we might have about the circuits approach to studying neural networks is that we might only be able to understand a limited set of highly-intuitive features. High-low frequency detectors demonstrate that it’s possible to understand at least somewhat unintuitive features.
How can we be sure that “high-low frequency detectors” are actually detecting directional transitions from low to high spatial frequency? We will rely on three methods:
Later on in the article, we dive into the mechanistic details of how they are both implemented and used. We will be able to understand the algorithm that implements them, confirming that they detect high to low frequency transitions.
A feature visualization is a synthetic input optimized to elicit maximal activation of a single, specific neuron. Feature visualizations are constructed starting from random noise, so each and every pixel in a feature visualization that’s changed from random noise is there because it caused the neuron to activate more strongly. This establishes a causal link! The behavior shown in the feature visualization is behavior that causes the neuron to fire:
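As a concrete illustration, here is a minimal sketch of this optimization in PyTorch. The model, layer, and channel index are placeholders, and real feature visualization pipelines (such as the one used for InceptionV1) add transformation robustness and specialized image parameterizations that are omitted here.

```python
import torch

def feature_visualization(model, layer, channel, steps=256, lr=0.05):
    """Optimize an image from random noise to maximally activate one channel.

    `model` is assumed to be a convnet in eval mode, `layer` a module whose
    output we hook (a stand-in for something like mixed3a), and `channel`
    the index of the neuron of interest.
    """
    img = torch.randn(1, 3, 224, 224, requires_grad=True)   # start from noise
    opt = torch.optim.Adam([img], lr=lr)

    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.update(out=o))

    for _ in range(steps):
        opt.zero_grad()
        model(img)
        # Maximize the mean activation of the chosen channel: every change
        # to the noise survives only if it drives this neuron harder.
        loss = -acts["out"][0, channel].mean()
        loss.backward()
        opt.step()

    handle.remove()
    return img.detach()
```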
From their feature visualizations, we observe that all of these high-low frequency detectors share these same characteristics:
- Detection of adjacent high and low frequencies. The detectors respond to high frequency on one side, and low frequency on the other side.
- Rotational equivariance. The detectors are rotationally equivariant: each unit detects a high-low frequency change along a particular angle, with different units spanning the full 360º of possible orientations. We will see this in more detail when we construct a tuning curve with synthetic examples, and also when we look at the weights implementing these detectors.
We can use a diversity term in our feature visualizations to jointly optimize for the activation of a neuron while encouraging different activation patterns in a batch of visualizations. We are thus reasonably confident that if high-low frequency detectors were also sensitive to other patterns, we would see signs of them in these feature visualizations. Instead, the frequency contrast remains an invariant aspect of all these visualizations. (Although other patterns form along the boundary, these are likely outside the neuron’s effective receptive field.)
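One simple way to realize such a diversity term (a sketch under assumptions, not the exact objective used for these figures) is to optimize a small batch of images jointly and penalize pairwise cosine similarity between their activation patterns:

```python
import torch
import torch.nn.functional as F

def diversity_penalty(acts):
    """Penalty encouraging a batch of visualizations to activate the neuron
    in different ways. `acts` has shape (batch, channels, h, w)."""
    flat = F.normalize(acts.flatten(1), dim=1)   # one vector per image
    sim = flat @ flat.t()                        # pairwise cosine similarity
    return sim.sum() - sim.diag().sum()          # ignore self-similarity

# Inside the optimization loop the loss might become, e.g.:
#   loss = -acts["out"][:, channel].mean() + 0.1 * diversity_penalty(acts["out"])
```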
We generate dataset examples by sampling from a natural data distribution (in this case, the training set) and selecting the images that cause the neurons to maximally activate. Checking against these examples helps ensure we’re not misreading the feature visualizations.
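A minimal sketch of this selection, assuming a hypothetical `activation_fn` that returns each channel's maximum activation over spatial positions for a single image:

```python
import heapq

def top_dataset_examples(images, activation_fn, channel, k=10):
    """Return (activation, index) pairs for the k images in `images` that
    most strongly activate `channel`. `activation_fn` is a hypothetical
    helper returning a vector of per-channel maximum activations."""
    scored = ((float(activation_fn(img)[channel]), i)
              for i, img in enumerate(images))
    return heapq.nlargest(k, scored)
```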
A wide range of real-world situations can cause high-low frequency detectors to fire. Oftentimes it’s a highly-textured, in-focus foreground object against a blurry background — for example, the foreground might be the microphone’s latticework, the hummingbird’s tiny head feathers, or the small rubber dots on the Lenovo ThinkPad pointing stick — but not always: we also observe that they fire for the MP3 player’s brushed metal finish against its shiny screen, or the text of a watermark.
In all cases, we see one area with high frequency and another area with low frequency. Although they often fire at an object boundary, they can also fire in cases where there is a frequency change without an object boundary. High-low frequency detectors are therefore not the same as boundary detectors.
Tuning curves show us how a neuron’s response changes with respect to a parameter. They are a standard method in neuroscience, and we’ve found them very helpful for studying artificial neural networks as well. For example, we used them to demonstrate how the response of curve detectors changes with respect to orientation. Similarly, we can use tuning curves to show how high-low frequency detectors respond.
To construct such a curve, we’ll need a set of synthetic stimuli which cause high-low frequency detectors to fire. We generate images with a high-frequency pattern on one side and a low-frequency pattern on the other. Since we’re interested in orientation, we’ll rotate this pattern to create a 1D family of stimuli:
But what frequency should we use for each side? How steep does the difference in frequency need to be? To explore this, we’ll add a second dimension varying the ratio between the two frequencies:
(Adding a second dimension will also help us see whether the results for the first dimension are robust.)
Now that we have these two dimensions, we sample the synthetic stimuli and plot each neuron’s responses to them:
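Here is a rough sketch (in NumPy, with assumed sizes and frequencies) of how such a two-dimensional family of stimuli could be generated, with a lower-frequency grating on one side of an oriented dividing line and a higher-frequency grating on the other:

```python
import numpy as np

def high_low_stimulus(size=64, angle=0.0, low_freq=2.0, ratio=4.0):
    """A low-frequency grating on one side of an oriented split and a
    grating of frequency low_freq * ratio on the other."""
    ys, xs = np.mgrid[0:size, 0:size] / size
    theta = np.deg2rad(angle)
    # Coordinates rotated so the dividing line (u = 0) has the given angle.
    u = (xs - 0.5) * np.cos(theta) + (ys - 0.5) * np.sin(theta)
    v = -(xs - 0.5) * np.sin(theta) + (ys - 0.5) * np.cos(theta)

    low = np.cos(2 * np.pi * low_freq * u) * np.cos(2 * np.pi * low_freq * v)
    hf = low_freq * ratio
    high = np.cos(2 * np.pi * hf * u) * np.cos(2 * np.pi * hf * v)
    return np.where(u < 0, low, high)

# Two dimensions: orientation of the split and the ratio between frequencies.
stimuli = [[high_low_stimulus(angle=a, ratio=r) for r in (2, 4, 8)]
           for a in range(0, 360, 15)]
```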
Each high-low frequency detector exhibits a clear preference for a limited range of orientations. As we previously found with curve detectors, high-low frequency detectors are rotationally equivariant: each one selects for a given orientation, and together they span the full 360º space.
How are high-low frequency detectors built up from lower-level neurons? One could imagine many different circuits implementing this behavior. To give just one example of the questions involved, there seem to be at least two different ways that the oriented nature of these units could form.
- Equivariant→Equivariant Hypothesis. The first possibility is that the previous layer already has precursor features which detect oriented transitions from high frequency to low frequency. The extreme version of this hypothesis would be that the high-low frequency detector is just an identity passthrough of some lower layer neuron. A more moderate version would be something like what we see with curve detectors, where early curve detectors become refined into the larger and more sophisticated late curve detectors. Another example would be how edge detection is built up from simple Gabor filters which were already oriented. We call this Equivariant→Equivariant because the equivariance over orientation was already there in the previous layer.
- Invariant→Equivariant Hypothesis. Alternatively, previous layers might not have anything like high-low frequency detectors. Instead, the orientation might come from spatial arrangements in the neuron’s weights that govern where it is excited by low-frequency and high-frequency features.
To resolve this question — and more generally, to understand how these detectors are implemented — we can look at the weights.
Let’s look at a single detector. Glancing at the weights from conv2d2 to mixed3a 110, most of them can be roughly divided into two categories: those that activate on the left and inhibit on the right, and those that do the opposite.
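One crude way to recover this split programmatically (a hypothetical sketch; the weight layout is an assumption) is to compare each input channel's total weight on the left half of the kernel with that on the right half:

```python
import numpy as np

def split_by_side(weights):
    """Split the input channels of a single detector into left-excitatory and
    right-excitatory groups. `weights` is assumed to have shape
    (in_channels, kh, kw): the kernels from conv2d2 into one mixed3a unit."""
    kw = weights.shape[-1]
    left = weights[..., : kw // 2].sum(axis=(1, 2))
    right = weights[..., -(kw // 2):].sum(axis=(1, 2))
    asym = left - right      # > 0: excites on the left, inhibits on the right
    return np.where(asym > 0)[0], np.where(asym <= 0)[0]
```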
The same also holds for each of the other high-low frequency detectors — but, of course, with different spatial patterns on the weights, implementing the different orientations. (As an aside, the 1-2-1 pattern in each column of weights is curiously reminiscent of the structure of the Sobel filter.)
Surprisingly, across all high-low frequency detectors, the two clusters of neurons that we get for each are actually the same two clusters! One cluster appears to detect textures with a generally high frequency, and one cluster appears to detect textures with a generally low frequency.
This is exactly what we would expect to see if the Invariant→Equivariant hypothesis is true: each high-low frequency detector composes the same two components in different spatial arrangements, which then in turn govern the detector’s orientation.
These two different clusters are really striking. In the next section, we’ll investigate them in more detail.
High and Low Frequency Factors
It would be nice if we could confirm that these two clusters of neurons are real. It would also be nice if we could create a simpler way to represent them for circuit analysis later.
Factorizing the connections between lower layers and the high-low frequency detectors is one way that we can check whether these two clusters are meaningful, and investigate their significance. (Between two adjacent layers, “connections” reduces to the weights between the two layers. Sometimes we are interested in connectivity between layers that are not directly adjacent. Because our model, a deep convnet, is non-linear, we then need to approximate the connections; a simple approach is to linearize the model by removing the non-linearities. While this is not a great approximation of the model’s behavior, it gives a reasonable intuition for counterfactual influence: had the neurons in the intermediate layer fired, how would they have affected neurons in the downstream layers? We treat positive and negative influences separately.) Performing a one-sided non-negative matrix factorization (NMF) separates the connections into two factors. (We require that the channel factor be positive, but allow the spatial factor to have both positive and negative values.)
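A rough sketch of this factorization (a simplification: scikit-learn's standard NMF constrains both factors to be non-negative, so here we apply it to the positive part of the weights rather than the one-sided variant described above):

```python
import numpy as np
from sklearn.decomposition import NMF

def factor_connections(weights, n_factors=2):
    """Factor detector weights into channel factors and spatial loadings.

    `weights` is assumed to have shape (n_detectors, kh, kw, in_channels),
    the connections from conv2d2 into each high-low frequency detector.
    """
    n, kh, kw, c = weights.shape
    flat = np.clip(weights.reshape(n * kh * kw, c), 0, None)  # positive part
    model = NMF(n_components=n_factors, init="nndsvda", max_iter=500)
    spatial = model.fit_transform(flat)        # (n*kh*kw, n_factors)
    channel = model.components_                # (n_factors, in_channels)
    return spatial.reshape(n, kh, kw, n_factors), channel
```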
Each factor corresponds to a vector over neurons. Feature visualization can also be used to visualize these linear combinations of neurons. Strikingly, one clearly displays a generic high-frequency image, whereas the other does the same with a low-frequency image. (In InceptionV1 in particular, it’s possible that we recover these two factors so crisply partly because of the 3x3 bottleneck between conv2d2 and mixed3a. Because of this bottleneck, we’re not looking at the direct weights between conv2d2 and mixed3a here, but rather at the “expanded weights,” which are the product of a 1x1 convolution (which reduces down to a small number of neurons) and a 3x3 convolution. This structure is very similar to the factorization we apply. However, as we see later in Universality, we recover similar factors for other models where this bottleneck doesn’t exist; NMF makes it easy to see this abstract circuit across many models whose architectures may not reify it so explicitly.) We’ll call these the HF-factor and the LF-factor:
The feature visualizations are suggestive, but how can we be sure that these factors really correspond to high and low frequency in general, rather than to specific high or low frequency patterns? One thing we can do is create synthetic stimuli again, but now plot the responses of the two NMF factors.
Since our factors don’t correspond to an edge, each synthetic stimulus will have only one frequency region. To add a second dimension and again demonstrate robustness, we also vary the rotation of that region. (The frequency texture is not exactly rotationally invariant because we construct the stimulus out of orthogonal cosine waves.)
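A sketch of these single-frequency stimuli (with assumed parameters), built from two orthogonal cosine waves as described and rotated to form the second dimension:

```python
import numpy as np

def frequency_stimulus(size=64, freq=4.0, angle=0.0):
    """A single-frequency texture made of two orthogonal cosine waves,
    rotated by `angle` degrees (hence not exactly rotationally invariant)."""
    ys, xs = np.mgrid[0:size, 0:size] / size
    theta = np.deg2rad(angle)
    u = (xs - 0.5) * np.cos(theta) + (ys - 0.5) * np.sin(theta)
    v = -(xs - 0.5) * np.sin(theta) + (ys - 0.5) * np.cos(theta)
    return np.cos(2 * np.pi * freq * u) * np.cos(2 * np.pi * freq * v)

# Two dimensions: frequency of the texture and its rotation.
stimuli = [[frequency_stimulus(freq=f, angle=a) for a in range(0, 180, 15)]
           for f in (1, 2, 4, 8, 16)]
```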
Unlike last time, these activations now mostly ignore the image’s orientation, but are sensitive to its frequency. We can average these results over all orientations in order to produce a simple tuning curve of how each factor responds to frequency. As predicted, the HF-factor responds to high frequency and the LF-factor responds to low frequency.
Now that we’ve confirmed what these factors are, let’s look at how they’re combined into high-low frequency detectors.
Construction of High-Low Frequency Detectors
NMF factors the weights into both a channel factor and a spatial factor. So far, we’ve looked at the two parts of the channel factor. The spatial factor shows the spatial weighting that combines the HF and LF factors into high-low frequency detectors.
Unsurprisingly, these weights largely reproduce the pattern we previously saw in Figure 5 with its two different clusters of neurons: where the HF-factor inhibits, the LF-factor activates — and vice versa. As an aside, the HF-factor here for InceptionV1 (as well as some of its NMF components, like conv2d2 123) also appears to be lightly activated by bright greens and magentas. This might be responsible for the feature visualizations of these high-low frequency detectors showing only greens and magentas on the high-frequency side.
High-low frequency detectors are therefore built up by circuits that arrange high frequency detection on one side and low frequency detection on the other.
There are some exceptions that aren’t fully captured by the NMF factorization perspective. For example, conv2d2 181 is a texture contrast detector that appears to already have spatial structure. This is the kind of feature that we would expect to be involved through an Equivariant→Equivariant circuit. If that were the case, however, we would expect its weights to the high-low frequency detector mixed3a 70 to be a solid positive stripe down the middle. What we instead observe is that it contributes as a component of high frequency detection, though perhaps with a slight positive overall bias. Although conv2d2 181 has a spatial structure, perhaps it responds more strongly to high frequency patterns.
Now that we understand how they are constructed, how are high-low frequency detectors used by higher-level features?
mixed3b is the next layer immediately after the high-low frequency detectors. Here, high-low frequency detectors contribute to a variety of features. Their most important role seems to be supporting boundary detectors, but they also contribute to bumps and divots, line-like and curve-like shapes, and at least one each of center-surrounds, patterns, and textures.
Oftentimes, downstream features appear to ignore the “polarity” of a high-low frequency detector, responding roughly the same way regardless of which side is high frequency. For example, the vertical boundary detector mixed3b 345 (see above) is strongly excited by high-low frequency detectors that detect frequency change across a vertical line in either direction.
Whereas activation from a high-low frequency detector can help detect boundaries between different objects, inhibition from a high-low frequency detector can also add structure to an object detector by detecting regions that must be contiguous along some direction — essentially, indicating the absence of a boundary.
As we’ve mentioned, by far the primary downstream contribution of high-low frequency detectors is to boundary detectors. Of the top 20 neurons in mixed3b with the highest L2-norm of weights across all high-low frequency detectors, eight of those 20 neurons participate in boundary detection of some sort: double boundary detectors, miscellaneous boundary detectors, and especially object boundary detectors.
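A sketch of how this ranking could be computed (tensor layout and layer names are assumptions):

```python
import numpy as np

def rank_by_hf_lf_weight(weights, hf_lf_channels, k=20):
    """Rank mixed3b units by the L2-norm of their weights onto the
    high-low frequency detectors.

    `weights` is assumed to have shape (out_channels, in_channels, kh, kw),
    the mixed3a -> mixed3b connections, and `hf_lf_channels` lists the
    indices of the fifteen high-low frequency detectors.
    """
    w = weights[:, hf_lf_channels]                  # keep only detector inputs
    norms = np.sqrt((w ** 2).sum(axis=(1, 2, 3)))   # L2-norm per downstream unit
    return np.argsort(-norms)[:k]
```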
Role in object boundary detection
Object boundary detectors are neurons which detect boundaries between objects, whether that means the boundary between one object and another or the transition from foreground to background. They are different from edge detectors or curve detectors: although they are sensitive to edges (indeed, some of their strongest weights are contributed by lower-level edge detectors!), object boundary detectors are also sensitive to other indicators such as color contrast and high-low frequency detection.
High-low frequency detectors contribute to these object boundary detectors by providing one piece of evidence that an object has ended and something else has begun. Some examples of object boundary detectors are shown below, along with their weights to a selection of high-low frequency detectors, grouped by orientation (ignoring polarity).
In particular, note how similar the weights are within each grouping! This shows us again that the later layers ignore the high-low frequency detectors’ polarity. Furthermore, the arrangement of excitatory and inhibitory weights contributes to each boundary detector’s overall shape, following the principles outlined above.
Beyond mixed3b, high-low frequency detectors ultimately play a role in detecting more sophisticated object shapes in mixed4a and beyond, by continuing to contribute to the detection of boundaries and contiguity.
So far, the scope of our investigation has been limited to InceptionV1. How common are high-low frequency detectors in convolutional neural networks generally?
Universality
High-Low Frequency Detectors in Other Networks
It’s always good to ask if what we see is the rule or an interesting exception — and high-low frequency detectors seem to be the rule. High-low frequency detectors similar to ones in InceptionV1 can be found in a variety of architectures.
Notice that these detectors are found at very similar depths within the different networks, between 29% and 33% of network depth! (Network depth is here defined as the index of the layer divided by the total number of layers.) While the particular orientations each network’s high-low frequency detectors respond to may vary slightly, each network has its own family of detectors that together cover the full 360º and comprise a rotationally equivariant family. Architecture aside, what about networks trained on substantially different datasets? In the extreme case, one could imagine a synthetic dataset where high-low frequency detectors don’t arise. For most practical datasets, however, we expect to find them. For example, we even find some candidate high-low frequency detectors in AlexNet (Places): down-up, left-right, and up-down.
Even though these families come from three completely different networks, we find that their high-low frequency detectors are likewise built up from high and low frequency components.
HF-factor and LF-factor in Other Networks
As we did with InceptionV1, we can again perform NMF on the weights of the high-low frequency detectors in each network in order to extract the strongest two factors.
The feature visualizations of the two factors reveal one clear HF-factor and one clear LF-factor, just like what we found in InceptionV1. Furthermore, the weights on the two factors are again very close to symmetric.
Our earlier conclusions therefore also hold across these different networks: high-low frequency detectors are built up from the specific spatial arrangement of a high frequency component and a low frequency component.
Conclusion
Although high-low frequency detectors represent a feature that we didn’t necessarily expect to find in a neural network, we find that we can still explore and understand them using the interpretability tools we’ve built up for exploring circuits: NMF, feature visualization, synthetic stimuli, and more.
We’ve also learned that high-low frequency detectors are built up from comprehensible lower-level parts, and we’ve shown how they contribute to later, higher-level features. Finally, we’ve seen that high-low frequency detectors are common across multiple network architectures.
Given the universality observations, we might wonder whether the existence of high-low frequency detectors isn’t so unnatural after all. We even find approximate high-low frequency detectors in AlexNet (Places), with its substantially different training data. Beyond neural networks, the aesthetic quality imparted by the blurriness of an out-of-focus region of an image is already known to photographers as bokeh. And in VR, visual blur can either provide an effective depth-of-field cue or, conversely, induce nausea in the user when implemented in a dissonant way. Frequency detection may well be commonplace in both natural and artificial vision systems as yet another type of informational cue.
Nevertheless, whether their existence is natural or not, we find that high-low frequency detectors are possible to characterize and understand.