Secure containers with principle of least privilege! Control exactly what your containers can do.
Understanding capabilities
Linux capabilities break down root privileges into distinct units:
services:
# Drop all capabilities, then add only what's needed
secure-app:
image: myapp
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE # Bind to ports < 1024
- CHOWN # Change file ownership
# Default Docker capabilities (for reference)
default-app:
image: myapp
# Implicitly has: CHOWN, DAC_OVERRIDE, FSETID, FOWNER,
# MKNOD, NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP,
# NET_BIND_SERVICE, SYS_CHROOT, KILL, AUDIT_WRITE
Common capability patterns
Web server (needs port 80/443):
services:
nginx:
image: nginx
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE # Bind to privileged ports
- CHOWN # Change file ownership
- SETUID # Switch users
- SETGID # Switch groups
ports:
- "80:80"
- "443:443"
Network tools:
services:
tcpdump:
image: tcpdump
cap_drop:
- ALL
cap_add:
- NET_RAW # Raw socket access
- NET_ADMIN # Network configuration
network_mode: host
Time synchronization:
services:
chrony:
image: chrony
cap_drop:
- ALL
cap_add:
- SYS_TIME # Set system time
Read-only root filesystem
Prevent modifications to the container filesystem:
services:
api:
image: api:latest
read_only: true
tmpfs:
- /tmp # Writable temp directory
- /var/run # Runtime data
volumes:
- type: tmpfs
target: /app/cache
tmpfs:
size: 100M
Security options
Additional security controls:
services:
app:
image: myapp
security_opt:
- no-new-privileges:true # Prevent privilege escalation
- apparmor:docker-default # AppArmor profile
- seccomp:unconfined # Seccomp profile
- label:type:container_t # SELinux label
# Custom seccomp profile
restricted:
image: restricted-app
security_opt:
- seccomp:./security/seccomp-profile.json
Privileged mode (use cautiously)
Sometimes needed for system-level tools:
services:
# Docker-in-Docker
dind:
image: docker:dind
privileged: true # Full host capabilities
volumes:
- /var/lib/docker
# System monitoring
monitoring:
image: sysdig/sysdig
privileged: true
volumes:
- /dev:/host/dev
- /proc:/host/proc:ro
- /sys:/host/sys:ro
User namespace remapping
Run as non-root with user namespaces:
services:
app:
image: myapp
user: "1000:1000" # Run as specific user
userns_mode: host # Use host user namespace
# Or with custom mapping
isolated:
image: isolated-app
user: "5000:5000"
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE
Complete security example
Defense in depth approach:
services:
secure-api:
image: api:production
# User settings
user: "1000:1000"
# Capabilities
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE
# Security options
security_opt:
- no-new-privileges:true
- apparmor:docker-default
# Filesystem
read_only: true
tmpfs:
- /tmp:size=10M,mode=1770,uid=1000,gid=1000
# Resource limits
deploy:
resources:
limits:
cpus: "1"
memory: 256M
# Network isolation
networks:
- internal
# Health monitoring
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
networks:
internal:
internal: true # No external access
Pro tip
Audit container capabilities:
#!/bin/bash
# audit-capabilities.sh
for service in $(docker compose ps --services); do
echo "=== Service: $service ==="
# Get capabilities
container=$(docker compose ps -q $service)
if [ -n "$container" ]; then
echo "Current capabilities:"
docker inspect $container | jq '.[0].HostConfig.CapAdd // []'
echo "Dropped capabilities:"
docker inspect $container | jq '.[0].HostConfig.CapDrop // []'
echo "Security options:"
docker inspect $container | jq '.[0].HostConfig.SecurityOpt // []'
# Check if running as root
user=$(docker inspect $container | jq -r '.[0].Config.User // "root"')
if [ "$user" = "root" ] || [ "$user" = "" ]; then
echo "⚠️ WARNING: Running as root user"
else
echo "✅ Running as user: $user"
fi
fi
echo
done
Minimal privileges, maximum security!