
Normalize data across all channels for each observation independently

The layer normalization operation normalizes the input data across all channels for each observation independently. To speed up training of recurrent and multilayer perceptron neural networks and reduce the sensitivity to network initialization, use layer normalization after the learnable operations, such as LSTM and fully connect operations.

After normalization, the operation shifts the input by a learnable offset *β* and scales it by a learnable scale factor *γ*.

The `layernorm` function applies the layer normalization operation to `dlarray` data. Using `dlarray` objects makes working with high dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the `"S"`, `"T"`, `"C"`, and `"B"` labels, respectively. For unspecified and other dimensions, use the `"U"` label. For `dlarray` object functions that operate over particular dimensions, you can specify the dimension labels by formatting the `dlarray` object directly, or by using the `DataFormat` option.
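As a minimal sketch of dimension labeling, the following code formats a numeric array as a `dlarray` object and queries its labels. The array sizes and variable names are illustrative assumptions, not values from this page.

```matlab
% Random data standing in for a batch of 16 images of size 28-by-28
% with 3 channels (illustrative sizes).
X = rand(28,28,3,16);

% Label the dimensions: spatial, spatial, channel, batch.
dlX = dlarray(X,"SSCB");

% Query the dimension labels and the size of the channel dimension.
fmt = dims(dlX);
numChannels = size(dlX,finddim(dlX,"C"));
```

Here, `dims` returns the labels of `dlX` and `finddim` returns the index of the dimension with the specified label.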

**Note**

To apply layer normalization within a `layerGraph` object or `Layer` array, use `layerNormalizationLayer`.

`dlY = layernorm(dlX,offset,scaleFactor)` applies the layer normalization operation to the input data `dlX` and transforms it using the specified offset and scale factor.

The function normalizes over the `'S'` (spatial), `'T'` (time), `'C'` (channel), and `'U'` (unspecified) dimensions of `dlX` for each observation in the `'B'` (batch) dimension, independently.
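The following sketch applies this syntax to formatted image data. The input sizes are illustrative, and passing the offset and scale factor as numeric vectors with one element per channel is an assumption about how the learnable parameters are typically supplied.

```matlab
% Formatted input: 16 observations of 28-by-28 data with 3 channels
% (illustrative sizes).
numChannels = 3;
dlX = dlarray(rand(28,28,numChannels,16),"SSCB");

% Offset and scale factor with one element per channel, initialized so
% that the learnable transformation is the identity.
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Normalize each observation independently.
dlY = layernorm(dlX,offset,scaleFactor);

% Mean of each normalized observation over its spatial and channel
% dimensions.
obsMean = mean(extractdata(dlY),[1 2 3]);
```

With a zero offset and unit scale factor, each element of `obsMean` is expected to be close to zero, and `dlY` has the same format as `dlX`.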

For unformatted input data, use the `'DataFormat'` option.

`dlY = layernorm(dlX,offset,scaleFactor,'DataFormat',FMT)` applies the layer normalization operation to the unformatted `dlarray` object `dlX` with the format specified by `FMT`. The output `dlY` is an unformatted `dlarray` object with dimensions in the same order as `dlX`. For example, `'DataFormat','SSCB'` specifies data for 2-D image input with the format `'SSCB'` (spatial, spatial, channel, batch).

To specify the format of the scale and offset, use the `'ScaleFormat'` and `'OffsetFormat'` options, respectively.
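For unformatted data, a call with the `'DataFormat'` option might look like the following sketch. The sizes are illustrative, and the per-channel numeric vectors for the offset and scale factor are an assumption.

```matlab
% Unformatted input: the dlarray object carries no dimension labels.
numChannels = 3;
dlX = dlarray(rand(28,28,numChannels,16));

% One offset and one scale factor element per channel.
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Describe the layout of dlX through the 'DataFormat' option.
dlY = layernorm(dlX,offset,scaleFactor,'DataFormat','SSCB');

% The output is unformatted, so dims returns an empty result.
outputIsUnformatted = isempty(dims(dlY));
```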

`[dlY] = layernorm(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, `'Epsilon',1e-4` sets the epsilon value to `1e-4`.
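The following sketch shows one case where the epsilon value matters: an observation whose elements are all equal, so that its variance is zero. The input values and sizes are illustrative assumptions.

```matlab
% A single observation whose elements are all equal, so its variance is 0.
numChannels = 2;
dlX = dlarray(ones(4,4,numChannels,1),"SSCB");
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Set the epsilon value explicitly using a name-value pair argument.
dlY = layernorm(dlX,offset,scaleFactor,'Epsilon',1e-4);

% Check that the result contains no NaN or Inf values.
allFinite = all(isfinite(extractdata(dlY)),"all");
```

Because epsilon is added to the variance before the square root, the denominator stays nonzero even for constant input.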

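The layer normalization operation normalizes the elements $x_{i}$ of the input by first calculating the mean $\mu_{L}$ and variance $\sigma_{L}^{2}$ over the normalized dimensions for each observation independently. Then, it calculates the normalized activations as

$$\widehat{x}_{i}=\frac{x_{i}-\mu_{L}}{\sqrt{\sigma_{L}^{2}+\epsilon}},$$

where *ϵ* is a constant that improves numerical stability when the variance is very small.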

To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow layer normalization, the layer normalization operation further shifts and scales the activations using the transformation

$${y}_{i}=\gamma {\widehat{x}}_{i}+\beta ,$$

where the offset *β* and scale factor
*γ* are learnable parameters that are updated during network
training.
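The two equations above can be sketched by hand for a single observation using basic array operations. This is an illustration only, not the implementation of `layernorm`: the input values, the parameter values, and the use of the population variance (normalization by the number of elements) are assumptions.

```matlab
% One observation with 2 channels (rows) and 3 spatial positions
% (columns). The values are illustrative.
x = [1 2 3; 4 5 6];
epsilon = 1e-5;

% Per-channel scale factor and offset (illustrative values).
gamma = [2; 1];
beta = [0.5; -0.5];

% Mean and variance over all elements of the observation.
mu = mean(x,"all");
sigma2 = var(x,1,"all");   % assumption: normalize by N, not N-1

% Normalize, then scale and shift each channel.
xhat = (x - mu)./sqrt(sigma2 + epsilon);
y = gamma.*xhat + beta;
```

The normalized activations `xhat` have a mean of approximately zero and a variance of approximately one, and `y` applies a different scale and shift to each channel.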

[1] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. “Layer Normalization.” Preprint, submitted July 21, 2016. https://arxiv.org/abs/1607.06450.

`relu` | `fullyconnect` | `dlconv` | `dlarray` | `dlgradient` | `dlfeval` | `groupnorm` | `batchnorm`