Version: v0.7

Hardening

These rules reduce the system's attack surface, for example by blocking common escape vectors for privileged containers, disabling capabilities, and blocking certain kernel vulnerability exploitation vectors.

Securing Privileged Containers

disallow-write-core-pattern

Prohibit modifying procfs' core_pattern.

Description

Attackers may attempt a container escape by modifying the procfs core_pattern in a privileged container. In a container with CAP_SYS_ADMIN, they may instead unmount specific mount points first and then modify the procfs core_pattern to escape.

Principle & Impact

Disallow writing to the procfs' core_pattern file.
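
The file is sensitive because of how the kernel interprets it: when the value starts with `|`, the rest of the line is run as a program whenever a process dumps core. The following Python sketch only illustrates that check. The helper names are hypothetical and are not part of vArmor.

```python
from pathlib import Path

# Location of the setting in procfs on Linux.
CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def pipes_to_program(pattern: str) -> bool:
    """Return True if a core_pattern value makes the kernel run a program.

    The kernel treats the value as a command line when it starts with '|'.
    """
    return pattern.startswith("|")


def read_core_pattern(path: Path = CORE_PATTERN) -> str:
    """Read the current value, dropping the trailing newline."""
    return path.read_text().rstrip("\n")
```

On a Linux host, `pipes_to_program(read_core_pattern())` reports whether core dumps are currently piped to a helper program.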

Supported Enforcer
  • AppArmor
  • BPF

disallow-mount-securityfs

Prohibit mounting securityfs.

Description

Attackers may attempt a container escape in containers with CAP_SYS_ADMIN by mounting securityfs with read-write permissions and subsequently modifying it.

Principle & Impact

Disallow mounting new securityfs file systems.

Supported Enforcer
  • AppArmor
  • BPF

disallow-mount-procfs

Prohibit remounting procfs.

Description

Attackers may attempt a container escape in containers with CAP_SYS_ADMIN by remounting procfs with read-write permissions and subsequently modifying the core_pattern, among other things.
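
The mount operations this rule targets correspond to flags passed to mount(2). As an illustration only (this is not vArmor's implementation), the following Python sketch decodes those flags into option names. The function name is hypothetical, and the flag values are taken from the Linux `<linux/mount.h>` header.

```python
# Flag values from the Linux UAPI header <linux/mount.h>.
MS_REMOUNT = 32
MS_BIND = 4096
MS_MOVE = 8192
MS_REC = 16384


def mount_options(flags: int) -> set:
    """Translate mount(2) flags into the option names used by this rule."""
    options = set()
    if flags & MS_REMOUNT:
        options.add("remount")
    if flags & MS_MOVE:
        options.add("move")
    if flags & MS_BIND:
        # A recursive bind mount is what mount(8) calls "rbind".
        options.add("rbind" if flags & MS_REC else "bind")
    return options
```

A call with none of these flags set returns an empty set, which corresponds to mounting a new file system rather than remounting an existing one.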

Principle & Impact
  1. Disallow mounting new proc file systems.
  2. Prohibit using the bind, rbind, move, and remount options to remount /proc**.
  3. When the BPF enforcer is used, the rule also prevents unmounting /proc**.

Supported Enforcer
  • AppArmor
  • BPF

disallow-write-release-agent

Prohibit modifying cgroupfs' release_agent.

Description

Attackers may attempt a container escape within a privileged container by directly modifying the cgroupfs release_agent.

Principle & Impact

Disallow writing to the cgroupfs' release_agent file.

Supported Enforcer
  • AppArmor
  • BPF

disallow-mount-cgroupfs

Prohibit remounting cgroupfs.

Description

Attackers may attempt to escape from containers with CAP_SYS_ADMIN by remounting cgroupfs with read-write permissions. They can then modify release_agent and device access permissions, among other things.

Principle & Impact
  1. Disallow mounting new cgroup file systems.
  2. Prohibit using the bind, rbind, move, and remount options to remount /sys/fs/cgroup**.
  3. Prohibit using the rbind option to remount /sys**.
  4. When the BPF enforcer is used, the rule also prevents unmounting /sys**.

Supported Enforcer
  • AppArmor
  • BPF

disallow-debug-disk-device

Prohibit debugging of disk devices.

Description

Attackers may attempt to read and write host machine files by debugging host machine disk devices within a privileged container.

It is recommended to use this rule in conjunction with disable-cap-mknod to prevent attackers from bypassing the rule with mknod.

Principle & Impact

Dynamically discover the host's disk devices and prevent containers from accessing them with read-write permissions.
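
The source does not say how the devices are discovered. One way to enumerate block devices on Linux, shown here purely as an assumption-laden sketch, is to parse /proc/partitions. The function name is hypothetical, and the sketch assumes the device nodes live under /dev with the kernel's names.

```python
def block_devices(partitions_text: str) -> list:
    """Extract /dev paths from text in the /proc/partitions format."""
    devices = []
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows look like: major minor #blocks name
        # The header row and blank lines fail the isdigit() check.
        if len(fields) == 4 and fields[0].isdigit():
            devices.append("/dev/" + fields[3])
    return devices
```

On a Linux host, passing the contents of /proc/partitions to `block_devices` yields the candidate device paths a policy would need to cover.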

Supported Enforcer
  • AppArmor
  • BPF

disallow-mount-disk-device

Prohibit mounting the host's disk devices.

Description

Attackers may attempt to mount host machine disk devices within a privileged container, thereby gaining read-write access to host machine files.

It is recommended to use this rule in conjunction with disable-cap-mknod to prevent attackers from bypassing the rule with mknod.

Principle & Impact

Dynamically discover the host machine's disk device files and prevent them from being mounted within containers.

Supported Enforcer
  • AppArmor
  • BPF

disallow-mount

Disable the mount system call.

Description

MOUNT(2) is often used for privilege escalation, container escapes, and other attacks. Most microservice applications do not require mount operations, so it is recommended to use this rule to prevent container processes from calling mount().

Note: The mount system call will be disabled by default if the spec.policy.privileged field is false.

Principle & Impact

Disable the mount system call.

Supported Enforcer
  • AppArmor
  • BPF

disallow-umount

Disable the umount system call.

Description

UMOUNT(2) can be used to detach topmost mount points (such as maskedPaths), leading to privilege escalation and information disclosure. Most microservice applications do not require umount operations, so it is recommended to use this rule to prevent container processes from calling umount().

Principle & Impact

Disable the umount system call.

Supported Enforcer
  • AppArmor
  • BPF

disallow-insmod

Prohibit loading kernel modules.

Description

Attackers may attempt to inject code into the kernel from a container with CAP_SYS_MODULE by running kernel module loading commands.

Principle & Impact

Disable CAP_SYS_MODULE.

Supported Enforcer
  • AppArmor
  • BPF

disallow-load-bpf-prog, disallow-load-ebpf

Prohibit loading eBPF programs, except for those of the BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB types.

Description

Attackers may load eBPF programs within a container with CAP_SYS_ADMIN or CAP_BPF to steal data or install a rootkit.

Before Linux 5.8, loading eBPF programs of any type other than BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB requires CAP_SYS_ADMIN. Since Linux 5.8, loading such programs requires CAP_SYS_ADMIN or CAP_BPF. Some types of eBPF programs also require CAP_NET_ADMIN or CAP_PERFMON.
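
The capability requirements above can be summarized as a small lookup. The Python sketch below encodes only what this paragraph states. The function name is hypothetical, and the additional CAP_NET_ADMIN or CAP_PERFMON requirements of some program types are deliberately left out.

```python
# Program types the text says can be loaded without these capabilities.
UNPRIVILEGED_TYPES = {"BPF_PROG_TYPE_SOCKET_FILTER", "BPF_PROG_TYPE_CGROUP_SKB"}


def sufficient_caps(kernel: tuple, prog_type: str) -> set:
    """Return the capabilities of which any one permits loading prog_type.

    An empty set means no capability is needed. Extra requirements for some
    types (CAP_NET_ADMIN, CAP_PERFMON) are not modelled here.
    """
    if prog_type in UNPRIVILEGED_TYPES:
        return set()
    if kernel >= (5, 8):
        return {"CAP_SYS_ADMIN", "CAP_BPF"}
    return {"CAP_SYS_ADMIN"}
```

For example, `sufficient_caps((5, 4), "BPF_PROG_TYPE_KPROBE")` returns only CAP_SYS_ADMIN, while the same call with `(5, 8)` also includes CAP_BPF.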

The disallow-load-ebpf rule ID will be deprecated; use disallow-load-bpf-prog instead.

Principle & Impact

Disable CAP_SYS_ADMIN & CAP_BPF.

It is recommended to use the disallow-load-all-bpf-prog rule to prohibit loading any type of eBPF program and reduce the kernel's attack surface.

Supported Enforcer
  • AppArmor
  • BPF

disallow-access-procfs-root

Prohibit accessing a process's root directory.

Description

This policy prohibits processes within containers from accessing the root directory of a process through the process filesystem (i.e., /proc/[PID]/root), preventing attackers from exploiting shared PID namespaces to launch attacks.

Attackers may attempt to access the file systems of processes outside the container by reading and writing to /proc/*/root in environments where the PID namespace is shared with the host or other containers. This could lead to information disclosure, privilege escalation, lateral movement, and other attacks.
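
To make the affected paths concrete, the Python sketch below recognizes paths that pass through a process's root link in procfs. It only illustrates which paths the description refers to. The rule itself is not described in the source as path matching, and the function name is hypothetical.

```python
import re

# Matches /proc/<pid>/root, /proc/self/root, the per-thread variants under
# /proc/<pid>/task/<tid>/, and anything beneath those links.
_PROC_ROOT = re.compile(
    r"^/proc/(\d+|self|thread-self)(/task/\d+)?/root(/.*)?$"
)


def is_proc_root_path(path: str) -> bool:
    """Return True if path goes through a process's root link in procfs."""
    return _PROC_ROOT.match(path) is not None
```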

Principle & Impact

Disable PTRACE_MODE_READ permission.

Supported Enforcer
  • AppArmor
  • BPF

disallow-access-kallsyms

Prohibit accessing the kernel's exported symbols.

Description

Attackers may attempt to leak the base address of kernel modules from containers with CAP_SYSLOG by reading the kernel's exported symbol definitions file. This helps attackers bypass KASLR protection and exploit kernel vulnerabilities more easily.

Principle & Impact

Disallow reading the /proc/kallsyms file.
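
Whether a given reader can actually see kernel addresses in this file can be checked from its contents, since the kernel prints zeros when it hides them. The Python sketch below is a hypothetical helper for that check and is not part of vArmor.

```python
def exposes_addresses(kallsyms_text: str) -> bool:
    """Return True if any symbol line carries a non-zero address.

    Each line has the form "<hex address> <type> <name> [module]".
    When the kernel hides addresses from the reader, they appear as zeros.
    """
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and int(fields[0], 16) != 0:
            return True
    return False
```

Inside a container, passing the contents of /proc/kallsyms to this function indicates whether the current process could use the file to defeat KASLR.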

Supported Enforcer
  • AppArmor
  • BPF

Disabling Capabilities

disable-cap-all

Disable all capabilities.

Description

Disable all capabilities.

Principle & Impact

None

Supported Enforcer
  • AppArmor
  • BPF

disable-cap-all-except-net-bind-service

Disable all capabilities except for NET_BIND_SERVICE.

Description

Disable all capabilities except for NET_BIND_SERVICE.

This rule complies with the Restricted Policy of the Pod Security Standards.

Principle & Impact

None

Supported Enforcer
  • AppArmor
  • BPF

disable-cap-privileged

Disable privileged capabilities.

Description

Disable all privileged capabilities that can directly lead to escapes or affect host availability. Only allow the default capabilities.

This rule complies with the Baseline Policy of the Pod Security Standards, except for the NET_RAW capability.

Principle & Impact

None

Supported Enforcer
  • AppArmor
  • BPF

disable-cap-[CAP]

Disable the specified capability.

Description

Disable any specified capability. Replace [CAP] with a capability name from capabilities(7), for example disable-cap-net-raw.
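
The naming pattern can be expressed as a small helper. The Python sketch below follows the disable-cap-net-raw example. The function name is hypothetical, and it assumes every multi-word capability name maps to hyphens in the same way.

```python
def disable_cap_rule(capability: str) -> str:
    """Build a rule ID such as "disable-cap-net-raw" from "CAP_NET_RAW"."""
    name = capability.strip().lower()
    # Accept names with or without the "CAP_" prefix used in capabilities(7).
    if name.startswith("cap_"):
        name = name[len("cap_"):]
    return "disable-cap-" + name.replace("_", "-")
```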

Principle & Impact

None

Supported Enforcer
  • AppArmor
  • BPF

Blocking Exploit Vectors

disallow-abuse-user-ns

Prohibit abusing user namespaces.

Description

User namespaces can be used to enhance container isolation. However, they also increase the kernel's attack surface, making certain kernel vulnerabilities easier to exploit. Attackers can create a user namespace from within a container to gain full privileges inside it, thereby expanding the kernel's attack surface.

Disallowing container processes from abusing CAP_SYS_ADMIN privileges via user namespaces can reduce the kernel's attack surface and block certain exploitation paths for kernel vulnerabilities.

This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set or applicable.
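
To see whether a host already restricts user namespaces through these sysctls, their values can be read from /proc/sys. The Python sketch below is illustrative, with hypothetical helper names. Note that kernel.unprivileged_userns_clone is not present on every kernel, so a missing file is reported as None.

```python
from pathlib import Path
from typing import Optional


def read_sysctl(name: str, root: Path = Path("/proc/sys")) -> Optional[int]:
    """Read an integer sysctl; return None if it is missing or unreadable."""
    path = root / name.replace(".", "/")
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def userns_restricted_by_sysctl(userns_clone: Optional[int],
                                max_user_namespaces: Optional[int]) -> bool:
    """Return True if either sysctl is set to 0."""
    return userns_clone == 0 or max_user_namespaces == 0
```

A host where `userns_restricted_by_sysctl(read_sysctl("kernel.unprivileged_userns_clone"), read_sysctl("user.max_user_namespaces"))` is false is the kind of system this rule is aimed at.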

Principle & Impact

Disable CAP_SYS_ADMIN.

Supported Enforcer
  • AppArmor
  • BPF

disallow-create-user-ns

Prohibit creating user namespaces.

Description

User namespaces can be used to enhance container isolation. However, they also increase the kernel's attack surface, making certain kernel vulnerabilities easier to exploit. Attackers can create a user namespace from within a container to gain full privileges inside it, thereby expanding the kernel's attack surface.

Disallowing container processes from creating new user namespaces can reduce the kernel's attack surface and block certain exploitation paths for kernel vulnerabilities.

This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set or applicable.

Principle & Impact

Disallow creating user namespaces.
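
A request for a new user namespace is visible in the flags passed to clone(2) or unshare(2). The Python sketch below shows the bit test a filter could apply. It is a model for illustration only: the source does not describe how the Seccomp profile is written, and the function name is hypothetical.

```python
# Value of CLONE_NEWUSER from the Linux UAPI header <linux/sched.h>.
CLONE_NEWUSER = 0x10000000


def requests_user_ns(flags: int) -> bool:
    """Return True if clone(2)/unshare(2) flags ask for a new user namespace."""
    return bool(flags & CLONE_NEWUSER)
```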

Supported Enforcer
  • Seccomp

disallow-load-all-bpf-prog

Prohibit loading any type of eBPF program.

Description

Attackers can load extended BPF (eBPF) programs of the BPF_PROG_TYPE_SOCKET_FILTER or BPF_PROG_TYPE_CGROUP_SKB type without privileges. They may use such programs to sniff network packets, or exploit vulnerabilities in the BPF verifier or JIT engine to achieve a container escape.

This rule can be used to harden containers on systems where kernel.unprivileged_bpf_disabled=0.

Principle & Impact

Disallow loading any type of eBPF program via the bpf syscall with the BPF_PROG_LOAD command.
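
The decision described here depends only on the first argument of bpf(2). The Python sketch below models it. The command numbers come from `enum bpf_cmd` in the Linux `<linux/bpf.h>` header, while the function name is hypothetical and the code is not vArmor's Seccomp profile.

```python
# Command numbers from enum bpf_cmd in the Linux UAPI header <linux/bpf.h>.
BPF_MAP_CREATE = 0
BPF_PROG_LOAD = 5


def denies_bpf_call(cmd: int) -> bool:
    """Model the rule: deny bpf(2) only when the command is BPF_PROG_LOAD."""
    return cmd == BPF_PROG_LOAD
```

Other commands, such as BPF_MAP_CREATE, are left to the remaining kernel checks in this model.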

Supported Enforcer
  • Seccomp

disallow-load-bpf-via-setsockopt

Prohibit loading cBPF programs via the setsockopt system call.

Description

Attackers can load classic BPF (cBPF) programs via the setsockopt syscall without privileges, which they may use for BPF JIT spraying. This can be a powerful means of exploiting kernel vulnerabilities, because the vector does not rely on any capability and is not controlled by the kernel.unprivileged_bpf_disabled sysctl.

Principle & Impact

Disallow loading classic BPF programs via the setsockopt syscall with the SO_ATTACH_FILTER or SO_ATTACH_REUSEPORT_CBPF option.

It is recommended to use this rule in conjunction with the disallow-load-all-bpf-prog rule, which prohibits loading any type of extended BPF program.
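
The setsockopt check can be modelled as a test on the call's level and option name. In the Python sketch below, the constants are the values from asm-generic/socket.h (a few architectures use different numbers). Checking the level as well as the option name is an assumption, and the function name is hypothetical.

```python
# Values from asm-generic/socket.h; a few architectures use different numbers.
SOL_SOCKET = 1
SO_ATTACH_FILTER = 26
SO_ATTACH_REUSEPORT_CBPF = 51
SO_ATTACH_REUSEPORT_EBPF = 52

# Only the two options that attach a classic BPF program are covered.
_CBPF_OPTIONS = {SO_ATTACH_FILTER, SO_ATTACH_REUSEPORT_CBPF}


def denies_setsockopt(level: int, optname: int) -> bool:
    """Model the rule: deny socket-level options that attach a cBPF program."""
    return level == SOL_SOCKET and optname in _CBPF_OPTIONS
```

SO_ATTACH_REUSEPORT_EBPF is intentionally not in the set, since it attaches an already loaded eBPF program rather than a cBPF one.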

Supported Enforcer
  • Seccomp

disallow-userfaultfd-creation

Prohibit creating userfaultfd objects.

Description

In Linux kernel exploits, attackers often abuse userfaultfd to control the timing of memory accesses, which helps them exploit bugs such as race conditions and use-after-free (UAF) vulnerabilities. Its core function is to let user space control precisely when a page fault is handled, which gives attackers a predictable window in which to trigger a vulnerability.

Since Linux 5.11, the global variable sysctl_unprivileged_userfaultfd in the kernel's fs/userfaultfd.c is initialized to 0, and a userfaultfd object can be created only if the process has the CAP_SYS_PTRACE capability.

This rule can be used to harden containers on systems where vm.unprivileged_userfaultfd=1. The userfaultfd syscall is also disabled in the container runtime's default Seccomp profile.

Principle & Impact

Disallow calling the userfaultfd system call.

Supported Enforcer
  • Seccomp