Designing Equivariant Neural Networks for Perceptual Color Variation

Designing for Equivariance to Perceptual Variation in Images

This post explores why neural networks often fail to generalize to out-of-distribution data, such as images whose color properties differ from those seen during training. It introduces transformation equivariance as a remedy: a function is equivariant when transforming its input changes its output in a corresponding, predictable way. The post then describes a neural network that is equivariant to perceptual transformations, specifically changes in hue, saturation, and luminance.

The Challenge of Generalization

Neural networks often struggle to generalize to data that differs from their training distribution. For instance, a network trained on blue digits might fail on green digits. This performance degradation highlights the need for networks that can handle perceptual variations, which are common in real-world applications like robotics and medical imaging.

Transformation Equivariance

A function f is equivariant to a set of transformations if, for every transformation g in that set, f(g ⋅ x) = g ⋅ f(x). In words, transforming the input and then applying f gives the same result as applying f and then transforming the output. Equivariant neural networks build this property into their layers, so what they learn on one instance of the data carries over to its transformed instances. This can lead to better performance with fewer parameters and less training data than conventional networks. Previous work has focused on geometric transformations like translations and rotations, while this research focuses on perceptual transformations.
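The definition can be checked numerically. The sketch below is an illustration only, not the network from the post: it assumes a 1D signal, takes cyclic shifts as the transformations g, and uses a hypothetical helper circular_corr as f. A second function, position_weighted, is included as a contrast that fails the check.

```python
import numpy as np

def shift(x, k):
    """Group action g . x: cyclically shift a 1D signal by k positions."""
    return np.roll(x, k)

def circular_corr(x, w):
    """Circular cross-correlation of signal x with filter w."""
    n = len(x)
    return np.array([
        sum(w[j] * x[(i + j) % n] for j in range(len(w)))
        for i in range(n)
    ])

def position_weighted(x):
    """Scales each entry by its index, so the output depends on absolute position."""
    return x * np.arange(len(x))

rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = rng.normal(size=3)

# Equivariance test: f(g . x) == g . f(x)
corr_is_equivariant = np.allclose(
    circular_corr(shift(x, 2), w),
    shift(circular_corr(x, w), 2),
)
weighted_is_equivariant = np.allclose(
    position_weighted(shift(x, 2)),
    shift(position_weighted(x), 2),
)
```

The correlation passes because its weights are shared across positions; the position-weighted function fails because it treats each position differently.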

The Hue-Saturation-Luminance (HSL) Color Space

The post presents the HSL color space as an alternative to the more common RGB color space. While RGB is usually visualized as a cube, HSL is often represented as a cylinder. In HSL:

Hue: Defines the base color.
Saturation: Determines the vibrancy or 'washed-out' appearance.
Luminance: Indicates the brightness of the color.

This representation is crucial for understanding and manipulating color transformations.
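As a small illustration of manipulating color in this representation, the sketch below uses Python's standard colorsys module, which orders the components as (hue, lightness, saturation) and scales each to the range 0 to 1. The helper names shift_hue and set_saturation are assumptions made for this example, not functions from the post.

```python
import colorsys

def shift_hue(rgb, delta):
    """Rotate the hue of an RGB color by delta (a fraction of a full turn)."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    return colorsys.hls_to_rgb((h + delta) % 1.0, l, s)

def set_saturation(rgb, s_new):
    """Replace the saturation of an RGB color, keeping hue and lightness."""
    h, l, _ = colorsys.rgb_to_hls(*rgb)
    return colorsys.hls_to_rgb(h, l, s_new)

blue = (0.0, 0.0, 1.0)

# Rotating the hue of pure blue by a third of a turn gives pure green.
green = shift_hue(blue, -1.0 / 3.0)

# Removing all saturation leaves a gray with the same lightness.
gray = set_saturation(blue, 0.0)
```

A hue shift like this one turns the blue digits from the earlier example into green digits while leaving their shape untouched.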

Color Equivariant Network Design

The research proposes a neural network designed to be equivariant to color transformations by leveraging the geometry of the HSL color space. Hue is periodic, so the geometry of the hue channel is analogous to a circle, and hue shifts act like 2D rotations. Saturation and luminance take values in a bounded interval, so their geometry can only be approximated by a line, with shifts acting like 1D translations.
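The two geometries can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the post's implementation: hue and saturation are scaled to the range 0 to 1, the helper names are hypothetical, and clipping at the interval boundaries is one assumed way to handle saturation values that a shift pushes out of range.

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix for angle theta (radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def hue_to_point(h):
    """Place a hue value in [0, 1) on the unit circle."""
    angle = 2.0 * np.pi * h
    return np.array([np.cos(angle), np.sin(angle)])

def hue_shift(h, delta):
    """Hue shifts wrap around, so they behave like rotations."""
    return (h + delta) % 1.0

def sat_shift(s, delta):
    """Saturation shifts are translations clipped to [0, 1] (assumed handling)."""
    return np.clip(s + delta, 0.0, 1.0)

# Shifting the hue matches rotating the corresponding point on the circle.
rotated = rotation(2.0 * np.pi * 0.25) @ hue_to_point(0.9)
expected = hue_to_point(hue_shift(0.9, 0.25))

# Three shifts of a third of a turn return every hue to where it started.
h = np.array([0.1, 0.25, 0.9])
h_back = hue_shift(hue_shift(hue_shift(h, 1 / 3), 1 / 3), 1 / 3)

# A clipped translation cannot always be undone: 0.8 -> 1.0 -> 0.5.
round_trip = sat_shift(sat_shift(0.8, 0.5), -0.5)
```

The last line shows why the line is only an approximation for saturation and luminance: information is lost once a shift runs into the boundary, whereas hue shifts are always invertible.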

To achieve this, input images are