Week 8
Cloud Foundation (OpenStack)
OPS3 - Virtualization and Cloud Infrastructure
Welcome to Week 8!
1. Case Study: The "Nebula Inc." Startup
- To understand how a cloud is built, we will follow a continuous scenario for the next three weeks.
- You have been hired as the Lead Cloud Engineer for a new software startup called "Nebula Inc." Currently, they have no infrastructure: just a credit card and a dream.
- Your job is to build their Virtual Data Center (VDC) from scratch using OpenStack.
- This is not a theoretical exercise; you will be typing the actual commands that cloud administrators use daily to construct the digital fabric of the modern internet.
The Roadmap:
- Week 8 (Foundation): You must create the secure environment. This involves setting
up the company's "Tenant" (Project), hiring the staff (Users), purchasing the software licenses
(Images), and cabling the office (Networking).
- Week 9 (Compute): You will deploy their first web servers and secure them with
firewalls.
- Week 10 (Storage): You will attach persistent storage for their customer database.
By the end of this module, you will have a fully functional, multi-tier cloud application environment
running on infrastructure you defined yourself.
2. Deep Dive: Identity Management (Keystone)
2.1 The Authentication Workflow (The "Token Dance")
When you run a command like openstack server list, a complex sequence of events, often
called the "Token Dance," occurs in the background before you see any output.
Figure 1: The Keystone "Token Dance" - Documenting the 7-step process of authentication and
authorization
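To watch this sequence on a real cloud, a minimal sketch like the one below can help. It assumes admin credentials are already loaded as the usual `OS_*` environment variables (for example, from an RC file); the endpoint `http://controller:5000/v3`, the project `nebula`, and the password `CHANGE_ME` are hypothetical lab values. The client's `--debug` flag prints every HTTP request it makes, and the `curl` call replays the first step by hand: Keystone answers `POST /v3/auth/tokens` with the new token in the `X-Subject-Token` response header, and the client then presents that token to services such as Nova in an `X-Auth-Token` header. The `run` helper is a local convenience, not part of OpenStack: it only prints each command unless the script is run with `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Watch the Keystone "Token Dance" (sketch; endpoint, project, and password are hypothetical).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

KEYSTONE_URL="${KEYSTONE_URL:-http://controller:5000/v3}"  # hypothetical lab endpoint

# The whole dance: --debug prints every HTTP request/response the client makes.
run openstack --debug server list

# Step one by hand: trade a password for a project-scoped token.
# -i prints response headers; the token arrives in "X-Subject-Token".
run curl -si -X POST "$KEYSTONE_URL/auth/tokens" \
  -H "Content-Type: application/json" \
  -d '{"auth": {
        "identity": {"methods": ["password"],
          "password": {"user": {"name": "nebula_admin",
            "domain": {"name": "Default"}, "password": "CHANGE_ME"}}},
        "scope": {"project": {"name": "nebula",
          "domain": {"name": "Default"}}}}}'
```

Saved as, say, `token_dance.sh`, the script previews with `bash token_dance.sh` and executes with `DRY_RUN=0 bash token_dance.sh`.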
2.2 The Backend (Where are users stored?)
Keystone is modular and capable of integrating with existing enterprise systems. It can store users
locally or talk to external systems:
- SQL (Local): Users are stored in the OpenStack database (MariaDB). This is the
standard configuration for small deployments and our lab environment.
- LDAP / Active Directory (Enterprise): In large enterprises, you do not want to create separate accounts for every system. Keystone can plug directly into the corporate Active Directory. When a user logs in, Keystone checks the password against the Domain Controller (via LDAP), so staff use one set of corporate credentials for every system instead of a separate cloud password.
2.3 Token Providers (Fernet vs UUID)
The format of the token itself determines the performance of the cloud.
- UUID (Legacy): A random string stored in the Keystone Database. The problem with
this method is that every validation requires a Database Lookup. In massive clouds handling
10,000 requests per second, this would crush the database.
- Fernet (Modern): A cryptographic token format. The token carries the User ID, scope, and Expiry Time, encrypted and signed using secret keys held by the Keystone servers. The major benefit is that Keystone does not need to store the token in a database. To validate it, Keystone simply decrypts the token and checks its signature and expiry, which significantly reduces database load and lets Keystone scale horizontally: any Keystone node holding the same keys can validate any token.
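Switching a deployment to Fernet might look like the following sketch, run as root on a Keystone node. It assumes Keystone runs as the `keystone` user and group and uses the default key repository `/etc/keystone/fernet-keys/`; deployment tools often automate these steps. The local `run` helper prints the commands instead of executing them unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Set up and rotate Fernet keys on a Keystone node (sketch; run as root).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

KEY_DIR="/etc/keystone/fernet-keys"   # Keystone's default key repository

# Prerequisite in /etc/keystone/keystone.conf:
#   [token]
#   provider = fernet

# Create the initial keys, owned by the keystone service account.
run keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone

# Rotate regularly. Every Keystone node must share the same key set,
# which is why any node can validate any token without a token table.
run keystone-manage fernet_rotate --keystone-user keystone --keystone-group keystone

run ls -l "$KEY_DIR"
```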
2.4 The Hierarchy
- Domain: A high-level container (e.g., "Default" or "Corporate_A") for projects and users. Used for multi-tenant isolation.
- Project (Tenant): The workspace. Resources (VMs, Networks) belong to a Project.
- User: The human or service account.
- Role: The permission set. A User must have a Role on a Project to do
anything.
2.5 CLI Implementation (Case Study: Nebula Inc.)
- Now we apply this theory to our startup.
- "Nebula Inc." requires a dedicated, isolated environment where its developers can work without interfering with other departments.
- In OpenStack, we achieve this "Multi-Tenancy" by creating a specific Project.
- This project will act as the container for all their future VMs, networks, and storage volumes.
- It also allows us to set quotas (e.g., "Maximum 10 CPUs") to control their budget.
Step 1: Create the Project
- Explanation:
--domain default: Specifies that this project lives in the default domain.
--description: Metadata for admins.
Result: Creates a record in the projects table. Returns a UUID
(e.g., a1b2c3...).
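Step 1, together with the quota mentioned above, could be carried out with commands along these lines. This is a sketch: it assumes admin credentials are sourced and uses `nebula` as a hypothetical project name. The local `run` helper only prints each command unless the script runs with `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 1: create the Nebula Inc. project and cap its CPU budget (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

PROJECT="nebula"   # hypothetical project name

# Create the tenant in the default domain; the output includes its UUID.
run openstack project create \
  --domain default \
  --description "Nebula Inc. virtual data center" \
  "$PROJECT"

# Budget control: allow at most 10 vCPUs across the whole project.
run openstack quota set --cores 10 "$PROJECT"
```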
Step 2: Create the User
- Explanation:
--password-prompt: Asks for the password interactively with hidden input, so it never appears on screen or in the shell history.
Result: Creating a user identity does NOT grant access. The user is
currently "unemployed".
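Step 2 might be written as below. The sketch assumes admin credentials are sourced and reuses the account name `nebula_admin` from the user-management commands in this section. Because of `--password-prompt`, the client asks for the password interactively instead of reading it from the command line. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 2: create the Nebula Inc. administrator account (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

USER_NAME="nebula_admin"   # account name used throughout this section

# Create the identity; it has no role yet, so it cannot do anything.
run openstack user create --domain default --password-prompt "$USER_NAME"

# Confirm it exists and is enabled.
run openstack user show "$USER_NAME"
```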
Step 3: Assign the Role
- Explanation:
member: The standard permission level (can create VMs/Networks but cannot
delete other users).
Result: Creates a row in the role_assignments table linking
User+Project+Role.
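A sketch of Step 3 follows, assuming the hypothetical project `nebula` and the user `nebula_admin`. The role name `member` matches the explanation above; some older deployments name the equivalent role `_member_`, so listing the roles first is a cheap check. The local `run` helper only prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 3: grant nebula_admin the member role on the project (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

PROJECT="nebula"           # hypothetical project name
USER_NAME="nebula_admin"

# Check which role names this cloud defines (member vs. _member_).
run openstack role list

# Grant the role: this creates the User + Project + Role link.
run openstack role add --project "$PROJECT" --user "$USER_NAME" member
```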
2.6 Identity Verification & Management
- In a production cloud, security is an ongoing process, not a one-time setup.
- The "Principle of Least Privilege" says every user should hold only the access they need, so we must continuously verify that only the correct people have access to our sensitive data.
- Simply listing the users in the system is insufficient; a user might exist but have no access to anything, or hold roles on more projects than intended.
- To audit this, we must inspect the Role Assignments.
Auditing Access
- Explanation:
--names: Resolves UUIDs to human-readable names.
Result: Displays exactly which project the user can access and with what
level of authority.
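The audit can be run with `openstack role assignment list`. In the sketch below, which assumes the user `nebula_admin` and the hypothetical project `nebula`, the first query answers "what can this user reach?" and the second "who can reach this project?". The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Audit role assignments from both directions (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

# What can this user reach, and with which role?
run openstack role assignment list --user nebula_admin --names

# Who has access to this project? Look for unexpected names.
run openstack role assignment list --project nebula --names
```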
Managing Users
- Disabling a User:
```bash
# Block the account without deleting it (reverse with --enable)
openstack user set --disable nebula_admin
```
Explanation: Sets enabled=False in the DB. The user cannot request new tokens.
Section 2 Checkpoint
Summary:
- Keystone: The core authentication and authorization service; without it, nothing
works.
- Fernet Tokens: Modern, stateless tokens that improve performance by removing
database lookups.
- Role Assignments: The critical link that grants a User permission on a Project.
- Reflection:
1. Why do we use tokens (like Fernet) instead of just passing the username/password to every service? (Hint: Performance and Security.)
2. If you delete a user, their history disappears. Why might disabling a user be the safer choice?
Resources:
- OpenStack Keystone Guide
- AWS Identity and Access Management (IAM)
- Microsoft Entra ID (Azure AD)
3. Deep Dive: Image Management (Glance)
3.1 Understanding Disk Formats
Not all virtual disks are created equal. You must choose the right format for your cloud workload:
- RAW is a bit-for-bit copy of the disk. It offers the fastest performance because there is no format overhead, but it is space-inefficient: a 10GB drive takes up 10GB of physical space even if it is empty, which also makes it slow to copy over the network.
- QCOW2 (QEMU Copy On Write) is the standard format for OpenStack. It supports compression and "Thin Provisioning," meaning a 10GB drive with only 100MB of data takes up only about 100MB of physical space. Crucially, its copy-on-write design enables snapshots, allowing you to save the state of a VM almost instantly.
- ISO is a read-only archive used for installation media. While essential for building images, it is rarely used in cloud "boot-from-image" scenarios because we prefer pre-installed operating systems.
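Thin provisioning can be observed with `qemu-img` from the QEMU tools. The sketch below compares a QCOW2 file with a RAW copy of it; the file name assumes the CirrOS test image, a common lab choice, and any QCOW2 image would do. The local `run` helper prints the commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Compare QCOW2 and RAW sizes for the same virtual disk (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

IMG="cirros-0.6.2-x86_64-disk.img"   # assumed QCOW2 file; substitute your own

# "virtual size" is what the VM sees; "disk size" is what the file really uses.
run qemu-img info "$IMG"

# Convert to RAW: the result is a bit-for-bit image of the full virtual disk.
run qemu-img convert -f qcow2 -O raw "$IMG" disk.raw

# ls -l shows apparent sizes: the RAW file matches the full virtual size.
run ls -l "$IMG" disk.raw
```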
3.2 Glance Architecture
Glance is split into distinct components to separate the metadata from the actual data payload.
Figure 2: Glance Architecture - The separation of the API, Registry (Metadata), and Backend Store
(Data)
- Glance API: The front-end service that accepts user requests (e.g., "Upload this
image", "List images"). It verifies the user's token with Keystone before proceeding.
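- Glance Registry: An internal service that stores the metadata about images (Name, Size, Format, Owner) in the SQL database. (In recent OpenStack releases the separate Registry service has been removed and the Glance API writes this metadata to the database directly; the split between metadata and data remains.)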
- Backend Store: The driver responsible for storing the actual binary data
(the heavy bits). While this can be a local file system (/var/lib/glance), production
clouds typically use a distributed storage cluster like Ceph or an object store
like AWS S3 to ensure data durability and accessibility across all compute nodes.
3.3 CLI Implementation (Case Study: Nebula Inc.)
- For any software company, consistency is key.
- We cannot have one developer running Ubuntu 20.04 and another running Fedora 35, as this leads to the infamous "it works on my machine" problem.
- To solve this, Nebula Inc. enforces a Standard Operating Environment (SOE).
- We will upload a "Golden Image": a pre-approved, security-hardened operating system template that all staff must use.
Step 1: Download the Source
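- Explanation: We assume we are on the "Jumpbox" or Controller node. The Glance CLI uploads from a local file, so the image must first be downloaded onto the machine where you run the command; a browser download on your laptop is not visible to it.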
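Step 1 could be as simple as the download below. It assumes the small CirrOS test image, which labs commonly use as a stand-in for a real golden image; the version `0.6.2` is an assumption, so check the CirrOS download site for current releases. The local `run` helper prints the commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 1: fetch a source image onto the jumpbox/controller (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

VERSION="0.6.2"                                 # assumed CirrOS release
IMG="cirros-${VERSION}-x86_64-disk.img"

# -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name.
run curl -fLO "https://download.cirros-cloud.net/${VERSION}/${IMG}"

# Confirm the format before uploading; it should report "file format: qcow2".
run qemu-img info "$IMG"
```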
Step 2: Upload to Glance
- Explanation:
--disk-format qcow2: Defines how the bits are organized.
--container-format bare: Indicates no extra metadata wrapper (OVF) is around
the file.
--public: IMPORTANT. By default, images are "Private" (only visible to the
uploader). This flag makes it visible to all projects in the cloud (Nebula, Admin,
Testing, etc.).
--min-ram 64: A metadata tag, in MB. Nova checks this before booting. If a user tries to launch this image on a Flavor with only 32MB of RAM, Nova will block the request to prevent a crash.
Result: The file is streamed into the Glance Backend Store, and a UUID is
generated.
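Step 2 might then be written as follows, reusing the downloaded CirrOS file and naming the image `nebula-golden` (both assumptions). `--min-ram` takes a value in MB, and publishing an image with `--public` normally requires the admin role under Glance's default policy. The local `run` helper prints the command unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 2: upload the golden image to Glance (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

IMG="cirros-0.6.2-x86_64-disk.img"   # assumed local file from Step 1

run openstack image create \
  --disk-format qcow2 \
  --container-format bare \
  --public \
  --min-ram 64 \
  --file "$IMG" \
  nebula-golden                      # hypothetical image name
```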
3.4 Managing Images (Day 2 Operations)
Once images are uploaded, they are not static. You may need to update their metadata or remove obsolete
versions.
Listing Images
- Explanation: Returns a table of available images.
- Result: Check the ID, Name, and Status columns. Status should be active. If the status stays queued, Glance never received the image data, meaning the upload failed or never started.
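A sketch of the listing step, assuming the hypothetical image name `nebula-golden`. The `-c status -f value` options reduce the output to the bare status string, which is convenient in scripts. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# List images and check one image's status (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

# Full table: check the ID, Name, and Status columns.
run openstack image list

# Just the status of one image; on a healthy upload this is "active".
run openstack image show nebula-golden -c status -f value
```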
Updating Metadata (Properties)
Sometimes we forget a flag or need to deprecate an OS.
- Explanation: Adds a custom key-value pair to the image metadata. The Nova Scheduler uses this (through its image-properties filter, when enabled) to ensure the VM lands on x86_64 (Intel/AMD) hardware, not ARM.
- Result: Only the image metadata in the Glance database changes; the actual file is untouched.
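The metadata update could be sketched as below. The property `architecture=x86_64` is an assumption that fits the Intel-versus-ARM example above, and `nebula-golden` is a hypothetical image name. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Tag an image with its CPU architecture (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

IMAGE="nebula-golden"   # hypothetical image name

# Add (or overwrite) a key-value property; the image file itself is untouched.
run openstack image set --property architecture=x86_64 "$IMAGE"

# Inspect the stored properties.
run openstack image show "$IMAGE" -c properties
```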
Deleting Images
- Explanation: Marks the image for deletion.
- Result: The metadata is removed from the Registry, and the backend storage driver
(Ceph/File) is instructed to free the space.
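A cautious deletion sketch follows, assuming the hypothetical image name `nebula-golden`. Listing servers filtered with `--image` first shows whether any VM was booted from it; the final listing confirms the image is gone. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Delete an obsolete image after checking for dependent servers (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

IMAGE="nebula-golden"   # hypothetical image name

# Any servers booted from this image? (admins can add --all-projects to see every tenant)
run openstack server list --image "$IMAGE"

run openstack image delete "$IMAGE"
run openstack image list
```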
Section 3 Checkpoint
Summary:
- Glance: The image repository that provides boot disks to Nova.
- QCOW2: The preferred format for cloud images due to thin provisioning and snapshot
support.
- SOE: Standard Operating Environment ensuring consistency across all machines.
- Reflection:
1. If you have a slow 1Gb link interconnecting your data centers, which image format (RAW or QCOW2) would be faster to replicate? Why?
2. Why is it dangerous to make every image --public?
Resources:
- OpenStack Glance Guide
- AWS AMIs (Amazon Machine Images)
- Azure Compute Gallery
4. Deep Dive: Networking (Neutron)
4.1 What is SDN (Software Defined Networking)?
- In a traditional data center, network engineers rely on physical switches and routers to move traffic.
- In the cloud, we virtualize this entirely using Software Defined Networking (SDN).
- The core concept of SDN is the separation of the Control Plane (The Brain) from the Data Plane (The Muscle).
Figure 3: Neutron SDN Architecture - The separation of the Logical Control Plane (API) from the
Physical Data Plane (Open vSwitch)
- The Control Plane (Neutron API/Server) acts as the brain of the operation. When you execute a command to create a network or open port 80, you are communicating with the Control Plane. It calculates the necessary logic and updates the state of the cloud database, but it does not touch a single network packet itself.
- The Data Plane (OVS Agent/L2 Agent) sits on every compute node and acts as the muscle. It receives instructions from the Control Plane via a message bus (RabbitMQ) and implements them by programming the local virtual switch, which then moves the packets from your VM to the physical network card.
4.2 The Virtual Switch: Open vSwitch (OVS)
- In a physical rack, servers plug into a top-of-rack switch. In OpenStack, VMs plug into a virtual switch called Open vSwitch (OVS).
- Unlike a standard unmanaged switch that simply learns MAC addresses, OVS is a production-quality, multilayer virtual switch that uses Flow Tables.
- A Flow Table is a list of programmable rules that match specific packets (e.g., "If source IP is A and dest IP is B...") and apply specific actions (e.g., "...drop packet" or "...tag with VLAN 100").
- Neutron programs these flow tables dynamically to implement sophisticated features like Security Groups (Distributed Firewalls) and Virtual Routing.
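To see flow tables in practice, a sketch like the one below may help. On a compute node, `ovs-vsctl show` lists bridges and ports, and `ovs-ofctl dump-flows br-int` prints the rules Neutron programmed on its integration bridge. The hand-written rules go on a scratch bridge, `br-demo` (a hypothetical name), because Neutron owns `br-int`; they mirror the two examples above using made-up addresses. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Inspect and experiment with OVS flow tables (sketch; needs root on an OVS host).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

# Inspect what Neutron has programmed on a compute node.
run sudo ovs-vsctl show                 # bridges and ports
run sudo ovs-ofctl dump-flows br-int    # flows on Neutron's integration bridge

# Experiment on a scratch bridge instead: Neutron owns br-int.
BR="br-demo"                            # hypothetical bridge name
run sudo ovs-vsctl add-br "$BR"

# "If source IP is A and dest IP is B, drop the packet."
run sudo ovs-ofctl add-flow "$BR" "priority=100,ip,nw_src=10.0.0.5,nw_dst=10.0.0.6,actions=drop"

# "Tag traffic arriving on port 1 with VLAN 100, then switch it normally."
run sudo ovs-ofctl add-flow "$BR" "priority=50,in_port=1,actions=mod_vlan_vid:100,normal"

run sudo ovs-ofctl dump-flows "$BR"
```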
4.3 Under the Hood: The Linux Connection
Everything you learned in Week 4 applies here. Neutron uses standard Linux kernel features to build these
structures:
- Isolation = Namespaces: When you create a Router or a DHCP server, Neutron creates
a Linux Network Namespace (ip netns). This allows Project A and
Project B to both use "192.168.0.1" without conflict; they live in parallel, isolated universes.
- Cabling = Veth Pairs: When a VM "plugs in" to the OVS Bridge, Neutron creates a Virtual Ethernet (veth) pair, a virtual patch cable. One end sits on the VM side (linked to the VM's tap interface created by KVM), and the other connects to the OVS Bridge.
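The same building blocks can be assembled by hand, much as in Week 4. This sketch creates two namespaces that both use 192.168.0.1 without conflict, each holding one end of its own veth pair; the namespace and interface names are hypothetical, and root privileges are assumed. On a Neutron network node, `ip netns list` would instead show names such as `qrouter-<UUID>` and `qdhcp-<UUID>`. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Recreate Neutron-style isolation by hand (sketch; needs root; names are hypothetical).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

for ns in project_a project_b; do
  # Isolation: a private network stack per "project".
  run sudo ip netns add "$ns"
  # Cabling: a veth pair, with one end moved into the namespace.
  run sudo ip link add "veth-$ns" type veth peer name "br-$ns"
  run sudo ip link set "veth-$ns" netns "$ns"
  # The same IP in both namespaces, with no conflict.
  run sudo ip netns exec "$ns" ip addr add 192.168.0.1/24 dev "veth-$ns"
  run sudo ip netns exec "$ns" ip link set "veth-$ns" up
done

run sudo ip netns list
```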
4.4 Flow of Traffic (North-South vs East-West)
Designing a cloud network requires understanding the two primary directions of traffic flow, as they
traverse different paths through the infrastructure.
Figure 4: North-South vs East-West Traffic - Visualizing how traffic stays within the cloud versus
how it exits to the internet
- East-West Traffic refers to communication between VMs inside the same cloud environment (e.g., Web Server A talking to Database Server B). Ideally, this traffic never leaves the cloud's own infrastructure. It flows from the source VM through the local OVS Bridge and is typically encapsulated in a tunnel protocol like VXLAN to cross the physical network before arriving at the destination compute node.
- North-South Traffic refers to communication entering or leaving the cloud (e.g., a User accessing your Web Server from the Internet). This traffic must leave the virtual overlay network. It passes through the Neutron Router (which lives inside a Network Namespace) and exits via the external provider network gateway. Outbound traffic undergoes SNAT (Source NAT) to mask its private IP; inbound connections reach a VM through a floating IP, which the router translates with DNAT (Destination NAT).
4.5 CLI Implementation (Case Study: Nebula Inc.)
- Now that we understand the theory of pipelines and flows, it is time to build.
- Nebula Inc. requires a private, isolated network segment where their web servers can communicate safely.
- We will construct a topology consisting of a private Virtual Switch (nebula_net), an IP addressing scheme (nebula_subnet), and a Virtual Router (nebula_router) to connect to the outside world.
Step 1: Create the Switch (Network)
- Explanation: Initializes the logical switch in the database (Control Plane).
- Result: A network UUID is created. OVS is not touched yet.
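Step 1 in CLI form might read as follows; this is a sketch that assumes credentials for the Nebula Inc. project are sourced. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 1: create the private logical switch (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

# Control plane only: a database record, no packets yet.
run openstack network create nebula_net

# Note the UUID and status of the new network.
run openstack network show nebula_net -c id -c status
```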
Step 2: Define Addressing (Subnet)
- Explanation:
--network nebula_net: Attaches this address range (the subnet) to the switch.
Result: The DHCP Agent (Data Plane) spawns a dnsmasq process
in a namespace to serve IPs.
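A possible Step 2 follows. The range `192.168.50.0/24` is inferred from the gateway address `192.168.50.1` used in Step 4, and the DNS server `8.8.8.8` is an arbitrary assumption. DHCP is enabled on new subnets by default, which is what triggers the `dnsmasq` process described above. The local `run` helper prints the command unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 2: define the addressing for nebula_net (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

CIDR="192.168.50.0/24"   # inferred from the 192.168.50.1 gateway
GATEWAY="192.168.50.1"

run openstack subnet create \
  --network nebula_net \
  --subnet-range "$CIDR" \
  --gateway "$GATEWAY" \
  --dns-nameserver 8.8.8.8 \
  nebula_subnet
```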
Step 3: Build the Gateway (Router)
- Explanation: Creates a virtual router instance.
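Step 3 is a single call; the sketch below creates the router and shows its status. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 3: create the virtual router (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

run openstack router create nebula_router
run openstack router show nebula_router -c id -c status
```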
Step 4: Wiring (Interface Attachment)
- Explanation: This is the equivalent of plugging a patch cable from the Switch (nebula_net, through its nebula_subnet) into the Router's LAN port.
- Result: The L3 Agent creates a router namespace (named qrouter-<router-UUID>) and assigns it the gateway IP 192.168.50.1.
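Step 4 can be sketched as below; afterwards, `openstack port list --router` should show the new router port holding the gateway IP. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 4: plug nebula_subnet into the router's LAN side (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

run openstack router add subnet nebula_router nebula_subnet

# The router now owns a port on nebula_net with the gateway address.
run openstack port list --router nebula_router
```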
Step 5: Uplink (External Gateway)
- Explanation: Connects the Router's WAN port to the Provider Network
(public).
- Result: Enables the router to route traffic to the internet (North-South flow).
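Step 5 might look like this, assuming the provider network is named `public` as in the explanation above; names vary between clouds, and `openstack network list --external` shows the candidates. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Step 5: connect the router's WAN side to the provider network (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

EXT_NET="public"   # provider network name; check with: openstack network list --external

run openstack router set --external-gateway "$EXT_NET" nebula_router
run openstack router show nebula_router -c external_gateway_info
```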
4.6 Verification
Log in to Horizon -> Network -> Network Topology. You should see the Nebula Router
creating a bridge between the Blue (Private) line and the Red (Public) line.
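Without Horizon, a rough CLI equivalent of this check is sketched below. The namespace commands must run on the node hosting the L3 agent, and `REPLACE_WITH_ROUTER_UUID` is a placeholder for the ID reported by `openstack router show`. Inside the router namespace, Neutron typically names the LAN-side interfaces `qr-*` and the WAN-side interface `qg-*`. The local `run` helper prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Verify the Nebula topology from the CLI (sketch).
set -euo pipefail

# Preview mode by default: print each command instead of running it.
# Run with DRY_RUN=0 to execute against a real cloud.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

ROUTER_ID="${ROUTER_ID:-REPLACE_WITH_ROUTER_UUID}"   # placeholder: copy from router show

# Control-plane view: status and external gateway of the router.
run openstack router show nebula_router

# Data-plane view on the network node: the router is a namespace.
run sudo ip netns list
run sudo ip netns exec "qrouter-${ROUTER_ID}" ip addr   # qr-* (LAN) and qg-* (WAN) interfaces
```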
Section 4 Checkpoint
Summary:
- SDN: Separates the "Brain" (Neutron API) from the "Muscle" (OVS/Agents).
- OVS: Uses Flow Tables to direct traffic and enforce security, replacing physical
switch logic.
- Linux Foundations: Capabilities like Namespaces and Veth pairs are the building
blocks of the cloud.
- Reflection:
1. Recall: In Week 4, we used ip netns exec. How does that relate to a Neutron Router?
2. Flows: If you add a Security Group rule to allow SSH, what actually changes on the Compute Node?
Resources:
- OpenStack Neutron Guide
- AWS VPC (Virtual Private Cloud)
- Azure Virtual Network (VNet)
6. Industry Comparison: The "Polyglot" Cloud Engineer
6.1 Concept Mapping
| Concept | OpenStack Term | AWS Term | Azure Term |
|---|---|---|---|
| Identity Service | Keystone | IAM (Identity & Access Mgmt) | Microsoft Entra ID (Azure AD) |
| The "Container" | Project (Tenant) | Account | Subscription / Resource Group |
| Image Service | Glance | AMI Registry | Azure Compute Gallery |
| Network Service | Neutron | VPC (Virtual Private Cloud) | VNet (Virtual Network) |
| Routing | Neutron Router | Internet Gateway (IGW) | VPN Gateway / VNet Peering |
6.2 The "Standard Operating Environment" across Clouds
In Section 3, we discussed the "Golden Image." This strategy is universal.
- OpenStack: You use Packer to build a QCOW2 image and upload it to Glance.
- AWS: You use Packer to build an AMI and upload it to EC2.
- Azure: You use Packer to build a VHD and upload it to Azure Compute Gallery.
- The Conclusion: The tool (Packer) and the workflow (Build -> Validate -> Upload) are identical; only the target file format changes.
7. Summary
4. Lab Exercises
Summary
Review the key concepts covered in this week's material
Questions?