U-Net Encoder-Decoder for Segmentation

Symmetric encoder-decoder with skip connections for pixel-wise prediction.

Prompt

A U-Net architecture diagram drawn as the canonical "U" shape.

Left side — Encoder (downsampling):
- Four levels, each with two 3x3 convolutions (each followed by ReLU), then 2x2 max pooling.
- Channels double at each level: 64, 128, 256, 512.
- Spatial resolution halves at each level.

Bottom — Bottleneck:
- Two 3x3 convolutions with 1024 channels at the lowest spatial resolution.

Right side — Decoder (upsampling):
- Four levels mirroring the encoder.
- Each level: 2x2 transposed convolution (or bilinear upsample), concatenation with the corresponding encoder feature map (skip connection drawn as a horizontal arrow), then two 3x3 convolutions.

Output:
- A 1x1 convolution maps the final feature map to C output channels (one per class), giving a per-pixel class probability map after softmax.

Annotations:
- Every block labeled with channel count.
- Skip connections drawn as bold horizontal arrows crossing the U.

Style: clean publication-style vector, navy palette with one accent color for skip connections, white background. Suitable for medical imaging / remote-sensing journals.

Use in Generator

When to use

Use this prompt for papers in medical imaging, remote sensing, or general semantic segmentation that need a U-Net architecture figure.

Variations

Attention U-Net

Add an attention gate on each skip connection that filters encoder features using the decoder feature as a query. Show the gate as a small AND-style symbol on the skip arrow.
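
To make the gate's role concrete, here is a minimal sketch of one common formulation, the additive attention gate, evaluated at a single pixel. It assumes the encoder and decoder features already share the same spatial resolution and omits bias terms. The function name `attention_gate` and the parameters `w_x`, `w_g`, and `psi` are hypothetical names chosen for illustration, and the toy values are made up.

```python
import math


def attention_gate(x, g, w_x, w_g, psi):
    """Additive attention gate for one pixel (illustrative sketch).

    x: encoder (skip) feature vector; g: decoder (gating) feature vector.
    w_x, w_g: hypothetical projection matrices into a shared intermediate space.
    psi: hypothetical vector that collapses the intermediate features to a scalar.
    Returns (alpha, gated_x), where alpha in (0, 1) scales the skip features.
    """
    def project(matrix, vector):
        # Matrix-vector product, one output entry per matrix row.
        return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

    # Sum the two projections and apply ReLU.
    combined = [max(0.0, a + b) for a, b in zip(project(w_x, x), project(w_g, g))]
    score = sum(p * c for p, c in zip(psi, combined))
    alpha = 1.0 / (1.0 + math.exp(-score))  # sigmoid squashes the score into (0, 1)
    return alpha, [alpha * v for v in x]


# Toy example with 2-channel features (all values are made up).
alpha, gated = attention_gate(
    x=[1.0, 2.0],
    g=[0.5, -0.5],
    w_x=[[0.1, 0.2], [0.0, -0.1]],
    w_g=[[0.3, 0.0], [0.1, 0.1]],
    psi=[1.0, -1.0],
)
print(alpha, gated)
```

The gate outputs one coefficient per pixel and multiplies the encoder features by it, which is why the diagram only needs a small symbol on the skip arrow rather than a full block.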

Tips

- Channel counts must be labeled. Without them the figure looks generic and uninformative.
- Bold the skip connections: they are the defining feature of U-Net and should pop visually.
- Show the bottleneck explicitly. Many auto-generated U-Nets shrink it into the encoder by accident.
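
To check the labels in a generated figure, it can help to list the expected channel count for every block. The sketch below derives them from the numbers in the prompt. It assumes each up-convolution halves the channel count before concatenation and that each decoder level ends with the same count as its encoder counterpart. The function name `unet_channel_labels` and the default `num_classes=2` (a stand-in for C) are hypothetical.

```python
def unet_channel_labels(base_channels=64, num_levels=4, num_classes=2):
    """List (block name, channel count) pairs for a U-Net figure.

    Assumes the decoder mirrors the encoder, so each decoder level ends with
    the same channel count as its encoder counterpart.
    """
    encoder = [base_channels * 2 ** i for i in range(num_levels)]  # 64, 128, 256, 512
    labels = [(f"enc{i + 1}", c) for i, c in enumerate(encoder)]
    labels.append(("bottleneck", encoder[-1] * 2))  # 1024 with the defaults
    for i, skip in zip(range(num_levels, 0, -1), reversed(encoder)):
        # The up-convolution halves the channels, then the skip is concatenated,
        # so the concatenated tensor has twice the skip's channel count.
        labels.append((f"dec{i} concat", skip + skip))
        labels.append((f"dec{i}", skip))
    labels.append(("output 1x1", num_classes))
    return labels


for name, channels in unet_channel_labels():
    print(f"{name}: {channels}")
```

With bilinear upsampling instead of a transposed convolution, the upsampling step does not change the channel count on its own, so the concatenated widths would differ unless a convolution reduces the channels first.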

FAQ

How do I show 3D U-Net for volumetric data?

Add this to the prompt: "Use 3x3x3 convolutions and 2x2x2 pooling; keep channel counts unchanged. Replace 2D feature maps with 3D cuboid icons."
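
If the cuboid icons should carry size labels, the volume shape at each level follows from halving every axis. A small sketch, assuming a hypothetical 64x128x128 input volume and 2x2x2 pooling with no padding effects (the function name `volume_pyramid` is illustrative):

```python
def volume_pyramid(shape=(64, 128, 128), num_levels=4):
    """Return the (depth, height, width) entering each encoder level, followed
    by the bottleneck shape, assuming 2x2x2 pooling halves every axis."""
    shapes = [tuple(shape)]
    for _ in range(num_levels):
        shapes.append(tuple(axis // 2 for axis in shapes[-1]))
    return shapes


# The last entry printed is the bottleneck.
for level, (d, h, w) in enumerate(volume_pyramid(), start=1):
    print(f"level {level}: {d}x{h}x{w} ({d * h * w} voxels)")
```

Each pooling step cuts the voxel count by a factor of eight, so the cuboids shrink much faster than the squares in the 2D figure.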