Week 5

Containers and Resource Management

OPS3 - Virtualization and Cloud Infrastructure

Welcome to Week 5!

What You'll Learn This Week

1. The Container Paradigm

1.1. The Foundation: Namespaces and Control Groups

Linux namespaces provide the fundamental isolation mechanism enabling containers.
Each namespace type isolates a different aspect of the system.
The PID (Process ID) Namespace isolates the process tree.
On a normal Linux system, all processes share a single process ID space.
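With PID namespaces, each namespace has its own isolated process tree: the first process started in the namespace sees itself as PID 1, and processes outside the namespace are invisible from inside it.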

The Network Namespace isolates network configuration completely.
Each network namespace has its own network interfaces, IP addresses, routing tables, and firewall rules.
This enables functionality that would otherwise conflict, such as multiple containers binding to port 80 on the same physical host.
The Mount Namespace controls filesystem visibility, giving each container its own root filesystem (rootfs) that appears as the top of the directory tree.
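One way to observe namespaces on a live system is the /proc/<pid>/ns directory, where the kernel exposes one symbolic link per namespace a process belongs to. The sketch below is only an illustration and assumes a Linux host with /proc mounted; the helper name show_namespaces is hypothetical.

```shell
# Hypothetical helper: print the PID, network and mount namespace
# identifiers of a process. Defaults to the current shell.
show_namespaces() {
  pid="${1:-$$}"
  for ns in pid net mnt; do
    link="/proc/$pid/ns/$ns"
    if [ -L "$link" ]; then
      # readlink prints something like "pid:[4026531836]"
      printf '%s -> %s\n' "$ns" "$(readlink "$link")"
    else
      printf '%s -> (not available on this system)\n' "$ns"
    fi
  done
}
```

Two processes in the same container report the same identifiers, while a host process and a containerized process report different ones.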

While namespaces provide isolation, they do not inherently limit resource consumption.
Without additional mechanisms, a process in a container could consume all available CPU time or memory.
Control Groups (cgroups) solve this problem by providing resource limiting, accounting, and prioritization.
Understanding cgroups is essential because they form the foundation of resource management in all modern container platforms, from Docker to Kubernetes.

1.2. Understanding Control Groups in Depth

Control Groups, commonly abbreviated as cgroups, are a Linux kernel feature that organizes processes into hierarchical groups and applies resource constraints to those groups.
Think of cgroups as the traffic police of the Linux system: while namespaces create separate lanes of traffic (isolation), cgroups enforce speed limits and allocate road capacity (resource limits).

When you create a container and specify that it can use "2 CPU cores" and "1GB of RAM," these limits are enforced through cgroups.
The container runtime creates a cgroup for that container and configures the appropriate resource controllers.
If the containerized application tries to exceed these limits, the kernel's cgroup subsystem intervenes to enforce the constraints.

1.3. The cgroup Hierarchy

Cgroups are organized in a tree-like hierarchy, similar to how processes form a process tree.
Each cgroup can have child cgroups, and resource limits applied to a parent cgroup affect all its children.
This hierarchical structure enables sophisticated resource allocation strategies.
For example, you might create a parent cgroup allocating 50% of CPU to "production workloads" and 50% to "development workloads," then subdivide those allocations further among individual containers.

1.4. cgroup Controllers (Subsystems)

The power of cgroups comes from specialized controllers (also called subsystems), each responsible for managing a specific type of resource. The most critical controllers for container management include:

CPU Controller (cpu): This controller regulates access to CPU time. Administrators can use several mechanisms to control CPU allocation:
- CPU Shares: A proportional weight system where containers with higher shares get more CPU time when there's contention. For example, if Container A has 1024 shares and Container B has 512 shares, Container A receives twice as much CPU time during periods of high demand.
- CPU Quota: An absolute hard limit on CPU usage over a period. Setting a quota of 50000 microseconds per 100000 microsecond period limits the container to 50% of one CPU core.
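The arithmetic behind shares and quotas can be checked with a few lines of shell. This is a sketch using the numbers from the text; the helper names are hypothetical and no cgroup is touched.

```shell
# Percentage of ONE core allowed by a quota/period pair (microseconds).
quota_percent() {
  echo $(( $1 * 100 / $2 ))
}

# Percentage of CPU time a container gets under contention, given its own
# shares and the total shares of all competing containers.
share_percent() {
  echo $(( $1 * 100 / $2 ))
}

quota_percent 50000 100000            # prints 50
share_percent 1024 $(( 1024 + 512 ))  # prints 66 (integer division)
share_percent 512  $(( 1024 + 512 ))  # prints 33
```

Shares only matter when the CPU is contended; an idle host lets either container use more than its proportional slice, whereas a quota applies at all times.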

Memory Controller (memory): This controller manages memory allocation and prevents containers from consuming excessive RAM:
- Memory Limit: Sets a hard cap on the amount of RAM a cgroup can use. When this limit is reached, the kernel may trigger the Out-Of-Memory (OOM) killer to terminate processes within the cgroup.
- Memory Reservation: A soft limit that acts as a target. The kernel will attempt to reclaim memory from the cgroup when system-wide memory pressure occurs.
- Swap Control: Determines whether the cgroup can use swap space and how much.

I/O Controller (blkio in cgroups v1, io in cgroups v2): This controller limits disk read/write operations:
- I/O Weight: Similar to CPU shares, assigns proportional I/O bandwidth.
- Throttling: Sets absolute limits on read/write operations per second (IOPS) or bytes per second.

Network Bandwidth: While technically separate from the primary cgroup implementation, network bandwidth can also be controlled through Traffic Control (tc) in conjunction with cgroups, allowing administrators to limit network throughput per container.

PIDs Controller (pids): Limits the number of processes (PIDs) that can be created within a cgroup, preventing fork bombs and runaway process creation.

1.5. Practical cgroup Management

On modern Linux systems, cgroups are typically managed through the systemd init system, which uses cgroups extensively for service management. However, understanding direct cgroup manipulation is valuable for troubleshooting and advanced container configurations.

The cgroup filesystem is typically mounted at /sys/fs/cgroup/. Under cgroups v1, each controller has its own subdirectory there; under cgroups v2, there is a single unified tree. In either case, creating a new cgroup is as simple as creating a directory within that hierarchy.
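Direct manipulation can be sketched as follows. This is an illustration under several assumptions: a cgroups v2 host, root privileges, and the cpu and memory controllers enabled for child groups (via cgroup.subtree_control). The CGROUP_ROOT variable, the group name demo, and the function name are hypothetical.

```shell
# Location of the cgroup filesystem; overridable for experimentation.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

create_demo_cgroup() {
  group="$CGROUP_ROOT/demo"
  # Creating the directory creates the cgroup.
  mkdir -p "$group" || return 1
  # cpu.max takes "<quota> <period>" in microseconds: 50% of one core.
  echo "50000 100000" > "$group/cpu.max"
  # memory.max takes a byte count: 512 MiB here.
  echo $(( 512 * 1024 * 1024 )) > "$group/memory.max"
  # Move the current shell into the group; its children inherit membership.
  echo $$ > "$group/cgroup.procs"
}
```

On a real host the kernel creates the control files itself when the directory is made; the function only writes values into them.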

1.6. cgroups v1 vs v2

The Linux kernel has two major versions of the cgroup implementation.
cgroups v1 (the original implementation) allowed each controller to operate independently with separate hierarchies.
cgroups v2 (the modern unified hierarchy) consolidates all controllers into a single tree structure, simplifying management and improving consistency.

Most modern Linux distributions (such as Ubuntu 22.04+ and Debian 11+) have transitioned to cgroups v2 by default. Docker and other container runtimes now support both versions, though v2 offers improved performance and a cleaner API.
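To find out which version a given host uses, the filesystem type mounted at /sys/fs/cgroup is a common indicator. The helper below is a sketch that assumes GNU stat (typical on Linux); the function name is hypothetical.

```shell
# Report the cgroup version based on the filesystem type of /sys/fs/cgroup.
cgroup_version() {
  fstype="$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
  case "$fstype" in
    cgroup2fs) echo "v2" ;;       # unified hierarchy
    tmpfs)     echo "v1" ;;       # per-controller hierarchies under a tmpfs
    *)         echo "unknown" ;;  # not Linux, or not mounted
  esac
}
```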

1.7. How Container Runtimes Use cgroups

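When you run a container with resource limits, the container runtime (Docker, containerd, Podman, or LXC) performs roughly these operations behind the scenes: it creates a cgroup for the container, writes the requested limits into the relevant controller files, and places the container's processes in that cgroup so the kernel can enforce the limits.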

1.8. Example: Docker and cgroups

When you run the following Docker command:
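A representative invocation, assuming the limits used as an example in Section 1.2 (2 CPU cores and 1 GB of RAM), might look like the following. The container name limited-web, the nginx image, and the wrapper function are illustrative assumptions.

```shell
# Hypothetical wrapper around a representative command.
# --cpus and --memory are standard "docker run" flags.
run_limited_web() {
  docker run -d --name limited-web --cpus=2 --memory=1g nginx
}
```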

Docker creates a cgroup and configures it approximately like this:
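As a rough sketch, assume a container limited to 2 CPUs and 1 GB of RAM on a cgroups v2 host. The runtime then writes a quota/period pair to cpu.max and a byte count to memory.max. The helper below only computes and prints those values; its name is hypothetical, and the 100000 microsecond period is the usual default.

```shell
# Print the cgroup v2 values that correspond to "N CPUs" and "M GiB of RAM".
limits_to_cgroup_values() {
  cpus="$1"; gib="$2"; period=100000
  echo "cpu.max: $(( cpus * period )) $period"
  echo "memory.max: $(( gib * 1024 * 1024 * 1024 ))"
}

limits_to_cgroup_values 2 1
# cpu.max: 200000 100000
# memory.max: 1073741824
```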

You can inspect these settings directly:
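Two ways of looking at the limits are sketched below: what Docker recorded for the container, and what the kernel reports inside it. The container name is passed as an argument, the helper names are hypothetical, and the second helper assumes a cgroups v2 host where the container sees its own cgroup at /sys/fs/cgroup.

```shell
# Limits as recorded by Docker: NanoCpus (CPUs x 10^9) and Memory (bytes).
show_docker_limits() {
  docker inspect --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}' "$1"
}

# Limits as seen from inside the container.
show_cgroup_limits() {
  docker exec "$1" cat /sys/fs/cgroup/cpu.max /sys/fs/cgroup/memory.max
}
```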

Figure 2: Linux Namespaces and Cgroups - Namespaces provide isolation (PID, Network, Mount) while cgroups enforce resource limits (CPU, Memory)

1.9. Section 1 Checkpoint

Summary:

Namespaces: Provide isolation (Network, PID, Mount). The "Walls".
Control Groups (cgroups): Provide resource limiting (CPU, RAM). The "Police".
Containers share the Host Kernel; VMs use their own Kernel.

Reflection:

If a container crashes the kernel, does the host crash?
Why can't you run a Windows Container on a Linux Host (natively)?

Resources:

Red Hat: What are Linux Namespaces?

2. The Container Technologies Landscape

2.1. LXC (Linux Containers)

LXC provides System Containers.
The philosophy behind LXC is to offer a lightweight virtual machine experience without the overhead of hardware emulation.
An LXC container boots a full init system (like systemd), runs multiple services (SSH, Cron, Syslog), and persists data like a traditional server.
It is ideal for infrastructure consolidation where you require long-running servers but want the density and efficiency of containers.

2.2. Docker

Docker popularized Application Containers.
Unlike LXC, Docker allows you to package an application and its dependencies into a single runnable unit.
A Docker container typically runs a single process (such as Nginx or Python) and is ephemeral in nature.
Data is stored in external volumes, and the container itself can be destroyed and recreated easily.
Docker is the standard for microservices and modern software delivery.

2.3. Podman

Podman is a modern alternative to Docker, developed by Red Hat.
It maintains compatibility with Docker's commands and image format (OCI) but differs significantly in architecture.
Podman is daemonless; it does not require a background service running as root.
Instead, it starts containers directly as child processes of the user.
This architecture enhances security and natively supports "rootless" containers, allowing unprivileged users to run containers safely.

2.4. Apptainer (formerly Singularity)

Apptainer is designed specifically for High Performance Computing (HPC) and research environments.
In these environments, users run jobs on shared clusters where they do not have root access.
Apptainer accommodates this by encapsulating the entire environment into a single file (.sif) and running it with the user's existing privileges.
It prioritizes mobility of compute and integration with batch schedulers like Slurm.

2.5. Comparison Table

Figure 3: Container Technologies Landscape - LXC for system containers, Docker for application containers, Podman for secure daemonless containers, Apptainer for HPC workloads

Feature	LXC	Docker	Podman	Apptainer
Type	System Container	Application Container	Application Container	Compute Container
Philosophy	"Lightweight VM"	"Single Service"	"Single Service"	"Portable Application"
Management	`lxc-*` commands	`docker` CLI	`podman` CLI	`apptainer` CLI
Daemon	None (Process based)	Yes (dockerd)	No (Fork/Exec)	No
Network	System IP (Bridge)	Port Mapping	Port Mapping	Host Network (typ.)
Root Access	Required for setup	Required (Daemon)	No (Rootless)	No (Rootless)
Image Format	System Templates	OCI Layers	OCI Layers	Single File (.sif)
Primary Use	Infrastructure / VPS	Microservices	Secure Microservices	Scientific / Research

2.6. Section 2 Checkpoint

Summary:

LXC: System Containers (OS-like, persistent). Used for VPS/Infrastructure.
Docker: Application Containers (Single process, ephemeral). Used for Dev/Microservices.
Podman: Daemonless, secure alternative to Docker.

Reflection:

Why is "Daemonless" considered a security feature?
Which technology would you use to host a permanent MySQL server: LXC or Docker?

Resources:

Open Container Initiative (OCI)

3. Working with System Containers (LXC CLI)

3.1. Creating a Container

In the Docker ecosystem, users typically "pull" an image from a registry.
In the LXC ecosystem, the process involves "creating" a container from a template.
A template is a script or tarball that constructs the root filesystem for a specific Linux distribution.
The lxc-create command handles this process, downloading the necessary files to a directory on the host (typically /var/lib/lxc).
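Creating a container from the generic download template might look like this. It is a sketch that assumes LXC is installed and the command is run with sufficient privileges; the wrapper name, the container name passed in, and the distribution, release, and architecture values are illustrative.

```shell
# Hypothetical wrapper around lxc-create.
create_container() {
  name="$1"
  # Everything after "--" is passed to the download template itself:
  # -d distribution, -r release, -a architecture.
  lxc-create -n "$name" -t download -- -d ubuntu -r jammy -a amd64
}
```

Calling create_container web01 would build the root filesystem for a container named web01.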

Figure 4: LXC Container Lifecycle - From template download through creation, start, attach, stop, to destroy

3.2. Listing and Monitoring

To view the status of containers, the lxc-ls command is used. The --fancy flag provides a formatted table showing the state (RUNNING or STOPPED), IP addresses (if running), and autostart configuration.

3.3. Starting a Container

Booting an LXC container is significantly faster than booting a virtual machine because no separate kernel has to be initialized; the container uses the host's already running kernel. The lxc-start command simply starts the init system within the isolated namespace environment.

3.4. Accessing the Container

While it is possible to configure SSH for an LXC container, administrators often use lxc-attach to enter the container's namespace directly from the host.
This works similarly to jumping into a chroot environment but respects the namespace boundaries.
Once attached, you are the root user inside the container and can manage packages and services normally.

3.5. Stopping and Destroying

Containers should be stopped gracefully to allow services to terminate correctly. The lxc-stop command asks the container's init system to shut down cleanly before resorting to a forced kill. When a container is no longer needed, lxc-destroy removes its configuration and deletes the root filesystem directory.
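The commands from Sections 3.2 through 3.5 can be walked through end to end as sketched below. The function name and the default container name web01 are hypothetical, and the sketch assumes LXC is installed and the container already exists.

```shell
# Hypothetical walkthrough of the LXC lifecycle for one container.
lxc_lifecycle_demo() {
  name="${1:-web01}"
  lxc-start -n "$name"             # boot the container's init system
  lxc-ls --fancy                   # table of state, IPs and autostart
  lxc-attach -n "$name" -- ps aux  # run one command inside the container
  lxc-stop -n "$name"              # request a clean shutdown
  lxc-destroy -n "$name"           # remove config and root filesystem
}
```

Running lxc-attach -n web01 without a trailing command opens an interactive root shell instead.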

3.6. Section 3 Checkpoint

Summary:

Create: Builds a rootfs from a Template (lxc-create).
Start: Boots the init system inside the namespace (lxc-start).
Attach: Enters the namespace directly (lxc-attach).

Reflection:

How does lxc-attach differ from SSH?
Where are the container filesystems actually stored on the host?

Resources:

Linux Containers (LXC) Project

4. LXC in Proxmox VE (GUI Workflow)

4.1. Step 1: Downloading Templates

Before a container can be created, a template must be available on the configured storage.
In the Proxmox GUI, navigate to a storage that can hold container templates (such as local; block storage like local-lvm holds container disks, not templates).
The CT Templates section provides a built-in browser for downloading official templates for various distributions like Ubuntu, Debian, Alpine, and CentOS, as well as TurnKey Linux appliances which come pre-configured with software stacks.

4.2. Step 2: Creating a Container

The creation wizard guides you through the configuration.
Important settings include the Hostname, which sets the container's internal identity, and the Unprivileged option.
Unprivileged containers are the default and recommended choice; they use user namespaces to map the container's root user to a non-privileged user on the host, significantly reducing the impact of a potential container escape.
During Disk and CPU/Memory configuration, you set the resource limits that cgroups will enforce.

4.3. Step 3: Managing Resources Dynamically

One of the key advantages of containers is the ability to adjust resources without rebooting.
If a container is under memory pressure, you can navigate to the Resources tab in Proxmox and increase the Memory limit.
The change is applied instantly to the running container's cgroup.
This elasticity allows for highly efficient resource management compared to the static allocation often required for virtual machines.

4.4. Section 4 Checkpoint

Summary:

Proxmox uses standard LXC technology but wraps it in a GUI for ease of use.
Templates: Must be downloaded to storage before creation.
Unprivileged: Maps root inside container to non-root outside for security.

Reflection:

Why is "Unprivileged" the default?
How does dynamic resource resizing work with cgroups?

Resources:

Proxmox VE: Linux Container (LXC)

5. Working with Application Containers (Docker CLI)

5.1. Docker Architecture Components

The Docker platform comprises several interconnected components forming a complete ecosystem.
Understanding this architecture clarifies how Docker operates.
At the foundation, the Docker daemon (dockerd) runs as a persistent background service, managing Docker objects like images, containers, networks, and volumes.
The Docker CLI (docker) provides the familiar command-line interface.
When you run a command, the CLI translates it into REST API calls to the daemon.

Figure 5: Docker Architecture - Docker CLI communicates with Docker Daemon via REST API to manage images, containers, networks, and volumes

Docker Images serve as the templates from which containers instantiate.
An image is a read-only layered filesystem containing everything needed to run an application: base OS files, application code, runtime environments, and system libraries.
Registries are repositories that store and serve these images.
Docker Hub is the public registry hosting millions of images, while organizations may operate private registries for proprietary applications.

5.2. Running a Container

Docker's most fundamental operation is docker run, which combines checking if the image exists locally, pulling it if missing, creating a container, and starting it.
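The command discussed in this section, reconstructed from its flags (-d, --name web, -p 8080:80, and the nginx image), is sketched below. The wrapper function name is hypothetical.

```shell
# Start an Nginx container in the background, reachable on host port 8080.
start_web() {
  docker run -d --name web -p 8080:80 nginx
}
```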

Breaking this command down: -d runs the container detached (in the background).
--name web assigns a meaningful name, avoiding random identifiers.
-p 8080:80 maps host port 8080 to container port 80, making the web server accessible at http://localhost:8080.
The argument nginx specifies the image name; Docker pulls nginx:latest from Docker Hub by default.

5.3. Managing Container Lifecycle

Managing running containers involves a set of essential commands for inspection and control.
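A typical sequence of lifecycle commands is sketched below, assuming a container named web already exists. The function name is hypothetical; the subcommands are standard Docker CLI.

```shell
# Hypothetical walkthrough of common lifecycle commands.
docker_lifecycle_demo() {
  docker ps          # list running containers
  docker ps -a       # include stopped containers
  docker logs web    # show the container's stdout/stderr
  docker stop web    # graceful stop (SIGTERM, then SIGKILL after a timeout)
  docker start web   # start the same container again
  docker rm -f web   # force-remove it, running or not
}
```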

5.4. Inspecting and Debugging

The docker exec command is particularly useful for troubleshooting. It allows you to execute commands inside a running container. The -it flags allocate an interactive pseudo-TTY, giving you a shell prompt inside the container to inspect processes or check configurations.

Resource usage can be monitored using docker stats, which provides real-time information on CPU, memory, and network usage for running containers.
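Both commands can be wrapped in small helpers, as in this sketch. The function names are hypothetical, and sh is assumed to exist in the image (it does in most general-purpose images, but not in minimal ones).

```shell
# Open an interactive shell inside a running container.
open_shell() {
  docker exec -it "$1" sh
}

# Print one snapshot of resource usage instead of a live-updating view.
snapshot_stats() {
  docker stats --no-stream
}
```

For example, open_shell web would give a prompt inside a container named web.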

5.5. Building Custom Images with Dockerfiles

While pre-built images satisfy many needs, real applications require custom images. A Dockerfile is a text file containing instructions for building an image, where each filesystem-changing instruction (such as RUN, COPY, or ADD) adds a new layer.

Consider a simple Python web application. A Dockerfile might look like this:
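A minimal sketch of such a Dockerfile is written out by the helper below. Several details are assumptions made for illustration: the python:3.12-slim base image, an app.py entry point, a requirements.txt in the build context, the appuser account, and the helper function itself.

```shell
# Hypothetical helper: write an example Dockerfile into a directory.
write_example_dockerfile() {
  cat > "${1:-.}/Dockerfile" <<'EOF'
FROM python:3.12-slim
WORKDIR /app

# Copy the dependency list first so the install layer stays cached
# until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often, so it is copied last.
COPY . .

# Run as an unprivileged user instead of root.
RUN useradd --create-home appuser
USER appuser

CMD ["python", "app.py"]
EOF
}
```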

Each instruction has a specific purpose.
FROM establishes the base image.
WORKDIR sets the working directory.
The sequence of copying requirements.txt before the rest of the code leverages layer caching: if your application code changes but your dependencies do not, Docker reuses the layer where dependencies are installed, significantly speeding up rebuilds.
USER switches to a non-privileged user, a critical security best practice.

Figure 6: Dockerfile Image Layers - Each instruction creates a new layer; cached layers speed up rebuilds when only code changes

Building the image from this Dockerfile is done with the docker build command:
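With the tag myapp:1.0 and the current directory as the build context, the command can be sketched as follows. The wrapper function name is hypothetical.

```shell
# Build an image from the Dockerfile in the current directory.
build_myapp() {
  docker build -t myapp:1.0 .
}
```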

This reads the Dockerfile in the current directory (.) and builds an image tagged as myapp:1.0.

5.6. Section 5 Checkpoint

Summary:

Images: Read-only layers containing the app.
Containers: Runnable instances of images.
Dockerfile: Recipe for building images.

Reflection:

Why do we put the dependencies copy/install step before copying the app code in a Dockerfile?
What happens to data inside a Docker container when you delete it?

Resources:

Docker Curriculum

6. Working with Podman (Daemonless CLI)

6.1. Running a Container (Rootless)

By default, Podman runs containers as the user who invoked the command, mapping the user's UID to root inside the container. This is a significant security advantage: a process that escapes the container holds only that user's privileges on the host, not root's.
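A rootless run can be sketched as follows, assuming Podman is installed and the commands are executed as a regular user. The function names and the container name web are hypothetical; the image is given by its fully qualified name to avoid depending on registry search settings.

```shell
# Start Nginx as an unprivileged user. Host ports below 1024 are
# privileged, so an unprivileged host port is published instead.
run_rootless_web() {
  podman run -d --name web -p 8080:80 docker.io/library/nginx
}

# Show how UIDs inside the user namespace map to UIDs on the host.
show_uid_map() {
  podman unshare cat /proc/self/uid_map
}
```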

6.2. Pods

Podman allows you to manage "Pods" locally. A Pod is a group of containers that share the same network namespace (localhost), a concept directly compatible with Kubernetes.
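Creating a pod and running a container in it might look like the sketch below. The pod name webpod, the container name web, and the function name are hypothetical. Ports are published on the pod rather than on individual containers, because the containers share one network namespace.

```shell
# Hypothetical walkthrough: one pod with a single Nginx container.
pod_demo() {
  podman pod create --name webpod -p 8080:80
  podman run -d --pod webpod --name web docker.io/library/nginx
  podman pod ps   # list pods and their status
}
```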

6.3. Section 6 Checkpoint

Summary:

Daemonless: Podman runs as the user process, avoiding the "Root Daemon" risk.
Rootless: Allows unprivileged users to run containers safely.
Pods: Groups of containers sharing a network namespace (localhost).

Reflection:

How can a rootless container bind to port 80 (privileged port)?
Why is Podman considered "Kubernetes-friendly"?

Resources:

Podman: Getting Started

7. Future Preview: Kubernetes and kubectl

7.1. The Core Concepts (The K8s Dictionary)

Understanding K8s requires learning a new vocabulary, distinct from Docker's:

Docker Concept	Kubernetes Concept	Description
Container	Pod	A Pod is the smallest unit. It usually contains one container (like Nginx), but can contain helpers ("Sidecars"). Pods share a network namespace (localhost).
Volume	Volume / PVC	Storage that persists beyond the Pod's lifecycle.
Network	Service	A stable IP address/DNS name that sits in front of dynamic Pods. If a Pod dies and is replaced, the Service IP stays the same.
Compose File	Manifest	A YAML file describing the "Desired State" (e.g., "I want 3 copies of Nginx").

7.2. The Control Loop (Desired State)

Kubernetes is Declarative.
Unlike Docker, where you say "Run this container" (Imperative), in Kubernetes you say "I want 3 Nginx Pods" (Declarative).
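The Control Plane constantly checks:
1. What is the User's Desired State? (3 Pods)
2. What is the Actual State? (for example, only 2 Pods are running)
3. What must change so that the Actual State matches the Desired State? (start 1 more Pod)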
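The idea of comparing desired and actual state can be illustrated with a toy function. This is only a sketch of a single reconciliation step; it prints what a controller would decide and does not talk to any cluster. The function name is hypothetical.

```shell
# Compare desired and actual Pod counts and print the corrective action.
reconcile() {
  desired="$1"; actual="$2"
  if [ "$actual" -lt "$desired" ]; then
    echo "start $(( desired - actual )) pod(s)"
  elif [ "$actual" -gt "$desired" ]; then
    echo "stop $(( actual - desired )) pod(s)"
  else
    echo "nothing to do"
  fi
}

reconcile 3 2   # prints: start 1 pod(s)
reconcile 3 3   # prints: nothing to do
```

A real controller repeats this comparison continuously, which is why a deleted Pod reappears without anyone issuing a new command.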

7.3. The kubectl CLI

Start familiarizing yourself with these commands now. You will use them extensively in Week 11.

1. Creating Resources (Imperative)
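A command matching the flags analysed in this step is sketched below. The Pod name my-nginx and the wrapper function are hypothetical; the flags are the ones discussed in the analysis.

```shell
# Create a single Pod running the nginx image.
create_pod() {
  kubectl run my-nginx --image=nginx --restart=Never
}
```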

Command Analysis:
* run: Tells Kubernetes to create a single Pod.
* --image=nginx: Uses the standard Nginx image from Docker Hub.
* --restart=Never: Sets the Pod's restart policy to Never, so this is a simple one-off Pod whose container is not restarted automatically when it exits.

2. Inspecting Resources
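The two inspection commands can be sketched as follows. The function name is hypothetical, and the Pod name is passed in as an argument.

```shell
# List all Pods, then show details and recent events for one of them.
inspect_pod() {
  kubectl get pods
  kubectl describe pod "$1"
}
```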

Command Analysis:
* get: The universal command to list resources. You can use it for pods, nodes, services, etc.
* describe: Shows the "Event Log" for a specific resource. If your Pod creation failed (e.g., "ImagePullBackOff"), this command tells you why.

3. Scaling Applications
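Creating a Deployment with 3 replicas and then scaling it to 10 might look like this sketch. The Deployment name web and the function name are hypothetical; the replica counts follow the analysis in this step.

```shell
# Create a Deployment of 3 Nginx Pods, then raise the desired count to 10.
scale_demo() {
  kubectl create deployment web --image=nginx --replicas=3
  kubectl scale deployment web --replicas=10
}
```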

Command Analysis:
* create deployment: Instead of a single Pod, we create a Controller that manages Pods.
* --replicas=3: We tell K8s we want 3 identical copies. K8s will start 3 Pods immediately.
* scale: Changing this number updates the "Desired State." Kubernetes then starts 7 more Pods to reach 10.

4. Exposing to the World
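Exposing a Deployment through a NodePort Service can be sketched as follows. The Deployment name web and the function name are hypothetical; the flags are the ones analysed in this step.

```shell
# Create a NodePort Service in front of the Deployment's Pods.
expose_web() {
  kubectl expose deployment web --type=NodePort --port=80
}
```

The node port that Kubernetes assigns can then be read from the output of kubectl get service web.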

Command Analysis:
* expose: Creates a Service that fronts the Deployment.
* --type=NodePort: Opens a specific port (e.g., 32000) on every node in the cluster. This allows external traffic to reach your internal Pods.
* --port=80: The port the Service listens on. Because no --target-port is given, traffic is forwarded to the same port (80) on the container.

7.4. Self-Correction Checklist

Pod vs Container: A Pod wraps a container. K8s manages Pods, not containers directly.
Service: Without a Service, you cannot reliably talk to a Pod because its IP changes every time it restarts.
Magnum: We will use OpenStack Magnum to build the cluster, so we don't have to manage the Master Nodes ourselves.

8. Summary

9. Additional Resources

9.1. Video Tutorials

LXC in Proxmox: YouTube (Compares to VMs, setup guide).
Docker vs. Podman: YouTube (Explains differences, cloud use cases).
Docker Basics: YouTube (Beginner guide to Docker containers).

9.2. Documentation & Further Reading

Proxmox LXC Documentation
Docker Documentation
Podman Documentation
OpenStack Zun

10. Lab Exercises
