Hardening
Securing Privileged Containers
disallow-write-core-pattern
Prohibit modifying procfs' core_pattern.
Attackers may attempt a container escape by modifying the procfs core_pattern in a privileged container. In a container (w/ CAP_SYS_ADMIN), they may also unmount specific mount points first and then modify the procfs core_pattern to escape.
Disallow writing to the procfs' core_pattern file.
- AppArmor
- BPF
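A simple way to check from inside a container whether this rule is in effect is to try opening core_pattern for writing without writing anything. The sketch below is only a probe under stated assumptions: the helper name can_open_for_write is hypothetical, the path is the usual procfs location, and the enforcer is assumed to reject the open itself.

```python
import os

CORE_PATTERN = "/proc/sys/kernel/core_pattern"  # usual procfs location (assumed)


def can_open_for_write(path: str) -> bool:
    """Return True if `path` can be opened write-only by this process.

    Nothing is written and O_TRUNC is not used, so the file content is
    left untouched. Any OSError (EACCES, EPERM, EROFS, ENOENT, ...) is
    treated as "not writable".
    """
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    print("core_pattern writable:", can_open_for_write(CORE_PATTERN))
```

A result of False in a privileged container suggests that write access is being denied, although a read-only mount of /proc/sys gives the same result.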
disallow-mount-securityfs
Prohibit mounting securityfs.
Attackers may attempt container escape in containers (w/ CAP_SYS_ADMIN) by mounting securityfs with read-write permissions and subsequently modifying it.
Disallow mounting of new security file systems.
- AppArmor
- BPF
disallow-mount-procfs
Prohibit remounting procfs.
Attackers may attempt container escape in containers (w/ CAP_SYS_ADMIN) by remounting procfs with read-write permissions and subsequently modifying the core_pattern, among other things.
- Disallow mounting of new proc file systems.
- Prohibit using the bind, rbind, move, and remount options to remount /proc**.
- When using the BPF enforcer, it also prevents unmounting /proc**.
- AppArmor
- BPF
disallow-write-release-agent
Prohibit modifying cgroupfs' release_agent.
Attackers may attempt a container escape within a privileged container by directly modifying the cgroupfs release_agent.
Disallow writing to the cgroupfs' release_agent file.
- AppArmor
- BPF
disallow-mount-cgroupfs
Prohibit remounting cgroupfs.
Attackers may attempt to escape from containers (w/ CAP_SYS_ADMIN) by remounting cgroupfs with read-write permissions. Subsequently, they can modify release_agent and device access permissions, among other things.
- Disallow mounting new cgroup file systems.
- Prohibit using the bind, rbind, move, and remount options to remount /sys/fs/cgroup**.
- Prohibit using the rbind option to remount /sys**.
- When using the BPF enforcer, it also prevents unmounting /sys**.
- AppArmor
- BPF
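The remount attacks described for procfs and cgroupfs rely on ending up with a read-write mount of those file systems. As a rough illustration, the following sketch parses mountinfo text (field layout per proc(5)) and lists proc and cgroup mounts that are not read-only. The names Mount, parse_mountinfo, and writable_mounts are hypothetical helpers and are not part of any enforcer.

```python
from typing import List, NamedTuple, Sequence


class Mount(NamedTuple):
    mount_point: str
    fs_type: str
    read_only: bool


def parse_mountinfo(text: str) -> List[Mount]:
    """Parse the content of /proc/[pid]/mountinfo into Mount records."""
    mounts = []
    for line in text.splitlines():
        if " - " not in line:
            continue  # skip blank or malformed lines
        left, right = line.split(" - ", 1)
        pre, post = left.split(), right.split()
        if len(pre) < 6 or not post:
            continue
        options = pre[5].split(",")  # per-mount options, e.g. "ro,nosuid"
        mounts.append(Mount(pre[4], post[0], "ro" in options))
    return mounts


def writable_mounts(
    text: str, fs_types: Sequence[str] = ("proc", "cgroup", "cgroup2")
) -> List[Mount]:
    """Return mounts of the given file system types that are read-write."""
    return [
        m for m in parse_mountinfo(text)
        if m.fs_type in fs_types and not m.read_only
    ]


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as f:
        for mount in writable_mounts(f.read()):
            print(mount.fs_type, mount.mount_point)
```

Running it before and after a mount attempt shows whether a new read-write proc or cgroup mount appeared inside the container.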
disallow-debug-disk-device
Prohibit debugging of disk devices.
Attackers may attempt to read and write host machine files by debugging host machine disk devices within a privileged container.
It is recommended to use this rule in conjunction with disable-cap-mknod to prevent attackers from bypassing the rule with mknod.
Dynamically acquire the host's disk devices and prevent containers from accessing them with read-write permissions.
- AppArmor
- BPF
disallow-mount-disk-device
Prohibit mounting of host's disk devices.
Attackers may attempt to mount host machine disk devices within a privileged container, thereby gaining read-write access to host machine files.
It is recommended to use this rule in conjunction with disable-cap-mknod to prevent attackers from bypassing the rule with mknod.
Dynamically acquire host machine disk device files and prevent mounting within containers.
- AppArmor
- BPF
disallow-mount
Disable the mount system call.
MOUNT(2) is often used for privilege escalation, container escapes, and other attacks. Most microservices applications do not require mount operations. Therefore, it is recommended to use this rule to restrict container processes from using the mount() system call.
Note: The mount system call will be disabled by default if the spec.policy.privileged field is false.
Disable the mount system call.
- AppArmor
- BPF
disallow-umount
Disable the umount system call.
UMOUNT(2) can be used to detach the topmost mount points (such as maskedPaths), leading to privilege escalation and information disclosure. Most microservices applications do not require umount operations. Therefore, it is recommended to use this rule to restrict container processes from using the umount() system call.
Disable the umount system call.
- AppArmor
- BPF
disallow-insmod
Prohibit loading kernel modules.
Attackers may attempt to inject code into the kernel from within a container (w/ CAP_SYS_MODULE) by executing kernel module loading commands.
Disable CAP_SYS_MODULE.
- AppArmor
- BPF
disallow-load-ebpf
Prohibit loading eBPF programs.
Attackers may load eBPF programs within a container (w/ CAP_SYS_ADMIN & CAP_BPF) to steal data or create a rootkit.
Note: CAP_BPF was introduced in Linux 5.8.
Disable CAP_SYS_ADMIN & CAP_BPF.
- AppArmor
- BPF
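Several rules above work by disabling capabilities (CAP_SYS_MODULE, CAP_SYS_ADMIN, CAP_BPF). To see which of these a container process actually holds, the CapEff bitmask in /proc/self/status can be decoded. This is a minimal sketch: the function names are hypothetical, the capability numbers are taken from linux/capability.h, and the mask used in the usage note is only an example value.

```python
# Capability bit numbers as defined in <linux/capability.h>.
CAP_SYS_MODULE = 16
CAP_SYS_ADMIN = 21
CAP_MKNOD = 27
CAP_SYSLOG = 34
CAP_BPF = 39


def has_capability(cap_mask_hex: str, cap_number: int) -> bool:
    """Return True if bit `cap_number` is set in a hex mask such as CapEff."""
    return bool((int(cap_mask_hex, 16) >> cap_number) & 1)


def read_cap_eff(status_path: str = "/proc/self/status") -> str:
    """Return the CapEff field (hex string) from a /proc/[pid]/status file."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise ValueError("CapEff not found in " + status_path)


if __name__ == "__main__":
    mask = read_cap_eff()
    for name, number in [("CAP_SYS_MODULE", CAP_SYS_MODULE),
                         ("CAP_SYS_ADMIN", CAP_SYS_ADMIN),
                         ("CAP_BPF", CAP_BPF)]:
        print(name, has_capability(mask, number))
```

For example, has_capability("00000000a80425fb", CAP_SYS_ADMIN) is False while has_capability("00000000a80425fb", CAP_MKNOD) is True. Note that CapEff reflects the process's capability sets only; whether an enforcer additionally denies a capability at use time is not visible in this field.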
disallow-access-procfs-root
Prohibit accessing a process's root directory.
This policy prohibits processes within containers from accessing a process's root directory through procfs (i.e., /proc/[PID]/root), preventing attackers from exploiting shared PID namespaces to launch attacks.
Attackers may attempt to access the file systems of processes outside the container by reading and writing through /proc/*/root in environments where the PID namespace is shared with the host or other containers. This could lead to information disclosure, privilege escalation, lateral movement, and other attacks.
Disable PTRACE_MODE_READ permission.
- AppArmor
- BPF
disallow-access-kallsyms
Prohibit accessing the kernel's exported symbols.
Attackers may attempt to leak the base address of kernel modules from containers (w/ CAP_SYSLOG) by reading the kernel's exported symbol definitions file. This assists attackers in bypassing KASLR protection to exploit kernel vulnerabilities more easily.
Disallow reading the /proc/kallsyms file.
- AppArmor
- BPF
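To illustrate what this rule protects, the sketch below classifies access to /proc/kallsyms as denied, hidden, or exposed. It assumes the usual "address type name" line format and that the kernel prints zeroed addresses to readers it does not trust (as with kptr_restrict); the function name probe_kallsyms and the three labels are hypothetical.

```python
def probe_kallsyms(path: str = "/proc/kallsyms", sample_lines: int = 64) -> str:
    """Classify access to a kallsyms-style file.

    Returns:
      "denied"  - the file could not be opened or read
      "hidden"  - readable, but every sampled address is zero
      "exposed" - readable, and at least one real address is visible
    """
    try:
        with open(path) as f:
            # Read at most `sample_lines` lines; the real file is large.
            lines = [line for _, line in zip(range(sample_lines), f)]
    except OSError:
        return "denied"
    addresses = [line.split()[0] for line in lines if line.strip()]
    if any(int(address, 16) != 0 for address in addresses):
        return "exposed"
    return "hidden"


if __name__ == "__main__":
    print("kallsyms:", probe_kallsyms())
```

With the rule applied, a container process would be expected to get "denied" even if it holds CAP_SYSLOG.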
Disabling Capabilities
disable-cap-all
Disable all capabilities.
Disable all capabilities.
None
- AppArmor
- BPF
disable-cap-all-except-net-bind-service
Disable all capabilities except for NET_BIND_SERVICE.
Disable all capabilities except for NET_BIND_SERVICE.
This rule complies with the Restricted Policy of the Pod Security Standards.
None
- AppArmor
- BPF
disable-cap-privileged
Disable privileged capabilities.
Disable all privileged capabilities that can directly lead to escapes or affect host availability. Only allow the default capabilities.
This rule complies with the Baseline Policy of the Pod Security Standards, except for the NET_RAW capability.
None
- AppArmor
- BPF
disable-cap-[CAP]
Disable the specified capability.
Disable any specified capability. Replace [CAP] with a value from capabilities(7), for example disable-cap-net-raw.
None
- AppArmor
- BPF
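The naming pattern can be sketched as a small helper that turns a capability name from capabilities(7) into a rule name. The only mapping confirmed above is disable-cap-net-raw; lowercasing and replacing underscores with hyphens for other capabilities is an assumption by analogy, and rule_name_for is a hypothetical function.

```python
def rule_name_for(capability: str) -> str:
    """Build a disable-cap-[CAP] rule name from a capability name.

    Accepts names with or without the "CAP_" prefix, in any letter case.
    Assumption: [CAP] is the lowercase name with "_" replaced by "-",
    following the disable-cap-net-raw example.
    """
    name = capability.strip().lower()
    if name.startswith("cap_"):
        name = name[len("cap_"):]
    if not name or not all(c.isalnum() or c == "_" for c in name):
        raise ValueError("not a capability name: %r" % capability)
    return "disable-cap-" + name.replace("_", "-")


if __name__ == "__main__":
    print(rule_name_for("CAP_NET_RAW"))
```

Calling rule_name_for("CAP_NET_RAW") returns "disable-cap-net-raw".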
Blocking Exploit Vectors
disallow-abuse-user-ns
Prohibit abusing user namespaces.
User namespaces can be used to enhance container isolation. However, they also increase the kernel's attack surface, making certain kernel vulnerabilities easier to exploit. Attackers can create a user namespace from within a container to gain full privileges inside it, thereby expanding the kernel's attack surface.
Disallowing container processes from abusing CAP_SYS_ADMIN privileges via user namespaces can reduce the kernel's attack surface and block certain exploitation paths for kernel vulnerabilities.
This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set or not applicable.
Disable CAP_SYS_ADMIN.
- AppArmor
- BPF
disallow-create-user-ns
Prohibit creating user namespaces.
User namespaces can be used to enhance container isolation. However, they also increase the kernel's attack surface, making certain kernel vulnerabilities easier to exploit. Attackers can create a user namespace from within a container to gain full privileges inside it, thereby expanding the kernel's attack surface.
Disallowing container processes from creating new user namespaces can reduce the kernel's attack surface and block certain exploitation paths for kernel vulnerabilities.
This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set or not applicable.
Disallow creating user namespaces.
- Seccomp
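To decide whether the user namespace rules are needed on a given node, the two sysctls mentioned above can be read from procfs. The following is a minimal sketch: the function names are hypothetical, the files are assumed to live under /proc/sys at the paths matching the sysctl names, and kernel.unprivileged_userns_clone is expected to be absent on kernels that do not carry that setting.

```python
import os
from typing import Dict, Optional

# sysctl name -> path relative to /proc/sys
SYSCTLS = {
    "kernel.unprivileged_userns_clone": "kernel/unprivileged_userns_clone",
    "user.max_user_namespaces": "user/max_user_namespaces",
}


def read_userns_sysctls(proc_sys: str = "/proc/sys") -> Dict[str, Optional[int]]:
    """Read the user namespace sysctls; missing or unreadable ones map to None."""
    values: Dict[str, Optional[int]] = {}
    for name, relative_path in SYSCTLS.items():
        try:
            with open(os.path.join(proc_sys, relative_path)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def userns_restricted_by_sysctl(values: Dict[str, Optional[int]]) -> bool:
    """Return True if at least one of the sysctls is explicitly set to 0."""
    return any(value == 0 for value in values.values())


if __name__ == "__main__":
    current = read_userns_sysctls()
    print(current)
    print("restricted by sysctl:", userns_restricted_by_sysctl(current))
```

When userns_restricted_by_sysctl returns False, neither sysctl is set to 0, which is the situation the rule is intended for.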