Week 11
Automation and Cloud API
OPS3 - Virtualization and Cloud Infrastructure
Welcome to DevOps
1. Advanced CLI Techniques
1.1 Formatting Output
The CLI natively supports JSON output, which provides a structured and predictable data format that scripting languages can easily parse. By appending --format json (or its short form -f json) to a list or show command, we strip away the visual formatting and receive raw data objects.
Command Analysis:
* --format json: Forces the CLI to output raw JSON data instead of an ASCII table. This is essential for piping data into tools like jq or Python scripts.
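The same idea can be applied from Python: append `-f json` to an `openstack` command and parse stdout with the standard `json` module instead of scraping the ASCII table. This is a minimal sketch. The helper name `openstack_json` is hypothetical, and the `runner` parameter exists only so the function can be exercised without a cloud; it defaults to `subprocess.run`.

```python
import json
import subprocess


def openstack_json(args, runner=subprocess.run):
    """Run an `openstack` CLI command with JSON output and return parsed data.

    `args` is the command without the leading "openstack",
    e.g. ["server", "list"]. `runner` defaults to subprocess.run.
    """
    cmd = ["openstack", *args, "-f", "json"]      # force machine-readable output
    result = runner(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)              # JSON text -> lists/dicts
```

For example, `openstack_json(["server", "list"])` should return a list of dictionaries keyed by the table's column names, such as `Name` and `Status`.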
1.2 Parsing with jq
jq is a lightweight command-line JSON processor that allows us to filter, slice, and map JSON data directly in the terminal. It acts as a bridge between the verbose API output and the specific strings (like UUIDs) needed for subsequent commands.
Code Analysis:
* $(): Command substitution; runs the inner command and substitutes its output, which the surrounding VAR=$(...) assignment stores in a variable.
* |: The pipe operator passes the output of the openstack command directly to jq.
* jq -r .id: Filters the JSON to find the key "id". The -r (raw) flag removes quotation marks, leaving just the UUID.
* select(): A powerful jq function that acts like a WHERE clause in SQL, allowing you to filter lists based on conditions.
In the example above, the -f json flag (the short form of --format json) forces OpenStack to output JSON. We then pipe this valid JSON to jq. The -r flag is crucial as it outputs "raw" strings without quotation marks, making the output ready for variable assignment. We also use the select function to filter the array, ignoring any servers that are building, paused, or shut down.
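For comparison, here is a Python equivalent of a jq pipeline along the lines of `.[] | select(.Status == "ACTIVE") | .ID`. That exact filter is an assumption, since the original command is not reproduced here. The sketch also assumes `openstack server list -f json` labels its keys with the table's capitalized column names (`ID`, `Status`).

```python
import json


def active_server_ids(list_json: str) -> list[str]:
    """Python equivalent of:
    openstack server list -f json | jq -r '.[] | select(.Status == "ACTIVE") | .ID'
    """
    servers = json.loads(list_json)              # JSON array -> list of dicts
    return [s["ID"] for s in servers             # .ID as a plain string (like -r)
            if s.get("Status") == "ACTIVE"]      # select(.Status == "ACTIVE")
```

The list comprehension's `if` clause plays the role of `select()`: it works like a SQL WHERE clause applied to each element of the array.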
1.3 Architectural Insight: Golden Images vs. Post-Boot Config
The OpenStack for Architects book details two competing strategies for deploying applications: Golden Images and Post-Boot Configuration.
Figure 1: Pet vs Cattle - Manual "Pet" servers require constant care, while Automated "Cattle" servers are replaceable and identical
- Golden Images (Immutable/Baked) involve installing all application dependencies, such as Apache, PHP, and custom code, into the Virtual Machine image before it is ever launched.
- This is typically done using tools like Packer.
- The primary advantage is speed; since the software is pre-installed, the VM is ready almost instantly upon boot.
- However, this method suffers from "Image Sprawl," where every minor code change requires building and uploading a new multi-gigabyte image to Glance, consuming storage and bandwidth.
- Post-Boot Configuration (Mutable/Runtime) takes a different approach.
- You launch a generic, "Vanilla" operating system image (like Ubuntu Cloud Image) and use automation tools to install software after the instance boots.
- While this results in a slower initial startup time as packages are downloaded and installed, it offers superior flexibility.
- A single small base image can serve thousands of different purposes.
- Modern cloud architecture typically favors a Hybrid Approach, using a base image for the OS and tools like Ansible for the final application configuration.
Section 1 Checkpoint
- Summary: JSON is the lingua franca of Cloud APIs, providing a structured format that is difficult for humans to read but trivial for machines to parse.
- To work effectively with this data in a shell environment, jq is an essential tool for extracting specific fields like resource IDs.
- Before writing any automation script, a cloud engineer must master the ability to retrieve clean, predictable data programmatically rather than relying on brittle text parsing methods like grep.
- Reflection: Consider why grep is a poor choice for parsing JSON data; a simple change in line breaks or spacing could break a script, whereas jq parses the data structure itself.
- Also, recall that the -r flag in jq strips quotes from the output, which is essential when passing values to other CLI commands.
Resources:
jq Tutorial
2. Cloud-Init: The Standard for Bootstrapping
2.1 How it Works: The Datasource
- The magic of Cloud-Init relies on a Datasource.
- On boot, Cloud-Init acts like a detective, probing the network to find out where it is running.
- In OpenStack (and AWS), it typically queries the Metadata Service at the "Magic IP" 169.254.169.254.
- If it receives a response, it downloads the instance metadata (a JSON document containing the instance's Hostname and SSH Keys) along with the User Data provided by the operator.
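To make the Datasource lookup concrete, here is a small sketch of what an instance could do by hand. It assumes the OpenStack-format endpoints under `/openstack/latest/` and a `meta_data.json` document with `hostname` and `public_keys` keys. The `fetch_metadata` call only works from inside a VM, so the `summarize` helper (a hypothetical name) is the part that runs anywhere.

```python
import json
import urllib.request

# OpenStack-format paths; EC2-compatible paths (/latest/meta-data/) also exist.
METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"
USER_DATA_URL = "http://169.254.169.254/openstack/latest/user_data"


def fetch_metadata(url=METADATA_URL, timeout=2.0):
    """Fetch the metadata document. Only reachable from inside an instance."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(meta: dict) -> dict:
    """Pick out the fields Cloud-Init uses for identity and access."""
    return {
        "hostname": meta.get("hostname"),
        "ssh_key_names": sorted(meta.get("public_keys", {})),  # key name -> key
    }
```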
2.2 Execution Stages
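- Cloud-Init does not run as a single script; it executes in distinct stages throughout the boot process to ensure dependencies are met:
1. Generator: Determines if cloud-init should run at all.
2. Local (Init): Finds the datasource and applies networking. This is critical because without networking, it cannot fetch further data.
3. Network (Init): With networking up, retrieves and processes the User Data and runs the early modules (for example, disk setup, write_files, and user creation).
4. Config: Runs modules that do not affect the rest of the boot process (for example, timezone or NTP settings).
5. Final: Runs as late in boot as possible: package installation, runcmd commands, and user scripts.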
2.3 The Cloud-Config Format
While User Data can be a simple Bash script, the preferred format is Cloud-Config. This is a declarative YAML syntax that abstractly defines what you want, rather than how to do it. To use this format, the input string must begin with the #cloud-config directive.
Config Analysis:
* #cloud-config: The required header telling Cloud-Init this is declarative YAML.
* packages: A list of software to install via the OS package manager (apt, yum).
* runcmd: A list of shell commands to execute after packages are installed. This is often used to start services or configure files.
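Because Cloud-Config is plain text, it can be generated from a script. This minimal sketch renders a document containing only `packages` and `runcmd`, and the function name is hypothetical. Each item is written as a JSON string literal, which is also a valid YAML double-quoted scalar, so characters such as `:` or `#` in a command cannot break the YAML.

```python
import json


def render_cloud_config(packages=(), commands=()):
    """Render a minimal #cloud-config document (packages + runcmd)."""
    lines = ["#cloud-config"]                    # required first-line header
    if packages:
        lines.append("packages:")
        lines += [f"  - {json.dumps(p)}" for p in packages]
    if commands:
        lines.append("runcmd:")                  # runs after packages are installed
        lines += [f"  - {json.dumps(c)}" for c in commands]
    return "\n".join(lines) + "\n"
```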
2.4 Common Patterns (The Cookbook)
Writing User Data requires understanding common patterns. Below are standard recipes frequently used in production.
Pattern 1: The Web Server
This pattern installs a web server, writes a custom index file, and ensures the service is running.
Pattern Analysis:
* write_files: Creates arbitrary files on the instance, such as configuration files or an index page. The content block allows multiline text.
* runcmd: Restarts the service to ensure the new configuration is applied.
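A possible rendering of the web server pattern, built from Python. This is a sketch under assumptions: the function name is hypothetical, it installs `nginx`, and it writes to `/var/www/html/index.html` (the usual web root on Ubuntu). The page is indented so it sits inside the YAML literal block introduced by `content: |`.

```python
def web_server_user_data(page_html: str) -> str:
    """Cloud-Config for the web server pattern: install, write index.html, restart."""
    # Indent every line of the page so it belongs to the `content: |` block.
    body = "\n".join("      " + line for line in page_html.splitlines())
    return (
        "#cloud-config\n"
        "packages:\n"
        "  - nginx\n"
        "write_files:\n"
        "  - path: /var/www/html/index.html\n"
        "    permissions: '0644'\n"
        "    content: |\n"
        f"{body}\n"
        "runcmd:\n"
        "  - systemctl restart nginx\n"
    )
```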
Pattern 2: The User Creator
This pattern creates a new user account, grants it sudo privileges without a password requirement, and injects an SSH public key for secure access.
Pattern Analysis:
* users: A dedicated module for user management.
* sudo: Grants password-less root access, critical for automated management tools like Ansible.
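One way to produce the user creator pattern is sketched below. The username `deployer` and the key in the usage line are placeholders, and `groups: sudo` assumes a Debian/Ubuntu image (RHEL-family images use `wheel`). Listing `default` first keeps the image's standard user alongside the new one.

```python
import json


def user_creator_config(username: str, public_key: str) -> str:
    """Cloud-Config for the user pattern: sudo without password plus an SSH key."""
    return (
        "#cloud-config\n"
        "users:\n"
        "  - default\n"                            # keep the image's default user
        f"  - name: {username}\n"
        "    groups: sudo\n"                       # Debian/Ubuntu admin group (assumption)
        "    shell: /bin/bash\n"
        "    sudo: ALL=(ALL) NOPASSWD:ALL\n"       # password-less sudo
        "    ssh_authorized_keys:\n"
        f"      - {json.dumps(public_key)}\n"      # quoted so key comments stay intact
    )


print(user_creator_config("deployer", "ssh-ed25519 AAAAC3Nza deployer@laptop"))
```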
Pattern 3: The Update
This pattern instructs the system to upgrade all installed packages on boot. Use this cautiously, as it significantly increases the boot time.
Pattern Analysis:
* package_upgrade: true: Forces an apt-get upgrade or yum update on first boot. While this improves security, it adds significant time to the boot process.
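Since the upgrade is a speed-versus-security trade-off, a deployment script might expose it as a switch. This is a hypothetical helper that always refreshes the package index (`package_update`) but performs the slow full upgrade only when asked.

```python
def update_config(upgrade_on_boot: bool) -> str:
    """Cloud-Config for the update pattern, with the slow upgrade made optional."""
    lines = ["#cloud-config", "package_update: true"]   # refresh the package index
    if upgrade_on_boot:
        lines.append("package_upgrade: true")           # full upgrade: safer, slower boot
    return "\n".join(lines) + "\n"
```

A lab VM might use `update_config(False)` for fast iteration, while a long-lived production VM might use `update_config(True)`.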
2.5 Using it in CLI
To inject this configuration, you save the YAML to a local file (e.g., setup.yaml) and pass it to the compute API during the server creation process.
Command Analysis:
* --user-data setup.yaml: Injects the contents of the file setup.yaml into the instance's metadata service. Cloud-Init reads this file upon first boot.
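The SDK route is roughly equivalent to `--user-data setup.yaml`. The sketch assumes the lower-level `conn.compute.create_server` passes `user_data` straight to Nova, which expects it base64-encoded. The higher-level cloud layer is believed to encode it for you, so check which one you call. The function names are illustrative.

```python
import base64
from pathlib import Path


def encode_user_data(path: str) -> str:
    """Read a Cloud-Config file and base64-encode it for the Nova API."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def create_with_user_data(conn, name, image_id, flavor_id, network_id, path):
    """Roughly what `openstack server create --user-data setup.yaml` does."""
    return conn.compute.create_server(
        name=name,
        image_id=image_id,
        flavor_id=flavor_id,
        networks=[{"uuid": network_id}],
        user_data=encode_user_data(path),
    )
```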
2.6 Troubleshooting (When things go wrong)
A common mistake is assuming that if a server boots, the automation worked. If your script fails (e.g., a syntax error in YAML), the server will still boot, but your app won't be there. To debug this, you must SSH into the server and check the logs:
- /var/log/cloud-init.log: The high-level log of what cloud-init attempted to do.
- /var/log/cloud-init-output.log: The raw stdout/stderr of your scripts. If your apt-get install failed, the error message will be here.
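A crude first-pass triage of these logs can be scripted. Inside the VM, `cloud-init status --long` also reports the overall result. The marker strings below are heuristic assumptions, not a complete list of cloud-init or apt error formats, so treat the output as a pointer into the log rather than a verdict.

```python
from pathlib import Path

# Heuristic markers; adjust for your distribution and tools.
ERROR_MARKERS = ("Traceback", "E: ", "ERROR", "FAIL", "failed")


def suspicious_lines(log_text: str, markers=ERROR_MARKERS) -> list[str]:
    """Return log lines that look like failures (a triage aid, not a parser)."""
    return [line for line in log_text.splitlines()
            if any(m in line for m in markers)]


def triage(path="/var/log/cloud-init-output.log"):
    """Run the heuristic against the raw script output log inside the VM."""
    return suspicious_lines(Path(path).read_text(errors="replace"))
```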
Section 2 Checkpoint
- Summary: Cloud-Init is the bridge between a generic OS image and a functional server.
- It relies on a Datasource (Metadata Service) to fetch configuration.
- It executes in strict Stages (Init -> Config -> Final) to ensure the network is ready before attempting to install software.
- Debugging automation failures requires inspecting the logs inside the VM, as errors here rarely stop the instance from booting.
Reflection: Why is the "Magic IP" (169.254.169.254) accessible from inside the VM without any internet access? (Hint: It is a Link-Local address routed explicitly by the Hypervisor/Neutron).
Resources:
Cloud-Init Documentation
3. Automating with Scripts
3.1 The "Bash Loop" (Imperative)
Imagine a scenario where you need to provision a cluster of five servers for a Load Balancing laboratory. Doing this manually is tedious and error-prone. A simple loop can automate the process effectively.
Script Analysis:
* for i in {1..5}: Creates a loop that runs 5 times, with variable $i set to 1, 2, 3, 4, 5.
* web-$i: Dynamically names the servers (web-1, web-2...) using the variable.
* --network private-net: Ensures all servers attach to the same internal network.
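The same loop can be expressed in Python as a dry run that prints the commands instead of executing them. The image and flavor names are placeholders; only `--network private-net` and the `web-$i` naming come from the text above.

```python
import shlex


def server_create_commands(count=5, prefix="web", image="ubuntu-22.04",
                           flavor="m1.small", network="private-net"):
    """Build the same commands as the Bash loop: for i in {1..5} ... web-$i."""
    return [
        ["openstack", "server", "create",
         "--image", image, "--flavor", flavor,
         "--network", network, f"{prefix}-{i}"]
        for i in range(1, count + 1)            # 1..5, like {1..5}
    ]


for cmd in server_create_commands():
    print(shlex.join(cmd))                      # dry run: print, don't execute
```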
3.2 Python Automation (The SDK)
While Bash scripts are useful for quick tasks, they often become unmaintainable "spaghetti code" when applied to complex systems. For professional cloud engineering, the OpenStack SDK (Python) provides a robust alternative.
- Why Python?
- Python offers several advantages over shell scripting.
- First, errors are handled gracefully through try/except blocks, preventing the script from crashing unexpectedly.
- Second, Python's native Data Structures, such as Dictionaries and Lists, are far easier to manipulate than string output parsed from CLI commands.
- Finally, the logic required for Idempotency—checking if a resource exists before attempting to create it—is significantly cleaner to implement.
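Here is a minimal sketch of that idempotency check using the SDK's `find_server` (with `ignore_missing=True` it returns `None` when nothing matches). The helper name `ensure_server` is hypothetical. Note that if several servers share the name, `find_server` raises an error rather than guessing.

```python
def ensure_server(compute, name, **create_args):
    """Create `name` only if no server with that name exists (idempotent).

    `compute` is conn.compute from openstacksdk. Returns (server, created).
    """
    existing = compute.find_server(name, ignore_missing=True)
    if existing is not None:
        return existing, False          # already there: do nothing
    return compute.create_server(name=name, **create_args), True
```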
3.2.1 Authentication (The clouds.yaml)
Hardcoding passwords into scripts is a major security risk. Instead, OpenStack uses a standardized configuration file named clouds.yaml to decouple credentials from code. When you run a script, the SDK searches for this file in a specific order of precedence:
1. The current working directory (./clouds.yaml)
2. The user's configuration directory (~/.config/openstack/clouds.yaml)
3. The system-wide directory (/etc/openstack/clouds.yaml)
This allows you to share your Python script with a colleague without accidentally sharing your password; they simply use their own clouds.yaml.
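To illustrate the "first match wins" precedence, here is a simplified model of the lookup. It is not openstacksdk's actual implementation, which has extra rules (such as `secure.yaml` and other file extensions). It assumes the `OS_CLIENT_CONFIG_FILE` environment variable, when set, takes priority over the standard locations.

```python
import os
from pathlib import Path


def candidate_clouds_yaml(cwd=None, home=None):
    """Illustrative search order for clouds.yaml (first existing file wins)."""
    cwd = Path(cwd or os.getcwd())
    home = Path(home or Path.home())
    paths = []
    if os.environ.get("OS_CLIENT_CONFIG_FILE"):          # explicit override
        paths.append(Path(os.environ["OS_CLIENT_CONFIG_FILE"]))
    paths += [
        cwd / "clouds.yaml",                             # 1. project directory
        home / ".config" / "openstack" / "clouds.yaml",  # 2. per-user
        Path("/etc/openstack/clouds.yaml"),              # 3. system-wide
    ]
    return paths


def find_clouds_yaml(**kwargs):
    """Return the first candidate that exists, or None."""
    return next((p for p in candidate_clouds_yaml(**kwargs) if p.is_file()), None)
```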
Example Content (clouds.yaml):
Explanation:
* clouds: The top-level key containing all cloud definitions.
* openstack: The specific profile name. In Python, we select this with cloud='openstack'.
* auth_url: The Keystone API endpoint. The SDK sends credentials here to get a token.
Connecting in Python:
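A minimal connection helper, assuming openstacksdk is installed (`pip install openstacksdk`) and clouds.yaml contains a profile named `openstack`. The `connect_fn` parameter is a hypothetical testing hook; by default the function simply calls `openstack.connect(cloud=...)`.

```python
def connect(cloud_name="openstack", connect_fn=None):
    """Open a session using a profile from clouds.yaml.

    `connect_fn` is only a testing hook; by default openstacksdk is used.
    """
    if connect_fn is None:
        import openstack                      # provided by openstacksdk
        connect_fn = openstack.connect
    return connect_fn(cloud=cloud_name)
```

Typical use is `conn = connect()`. Because `cloud_name` selects the profile, the same script can target another cloud with `connect("lab")`.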
3.2.2 Reading Resources (Listing Servers)
The SDK returns Objects, not text. This means you can access properties like .id or .status directly without complex parsing.
Code Analysis:
* conn.compute.servers(): Returns a generator (a lazily evaluated iterable, not a list) of Server objects.
* server.name: We access the data using dot-notation, which is far more robust and readable than grepping text output.
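A short sketch of working with the returned objects, using hypothetical helper names. A generator can only be consumed once, so the typical pattern is to pass `conn.compute.servers()` directly to one helper, or to wrap it in `list(...)` first if you need it twice.

```python
def server_table(servers):
    """Turn Server objects into (name, status) rows.

    Typical call: server_table(conn.compute.servers())
    """
    return [(server.name, server.status) for server in servers]


def active_names(servers):
    """The SDK counterpart of the jq select() filter from Section 1."""
    return [s.name for s in servers if s.status == "ACTIVE"]
```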
3.2.3 Creating Resources (The Clean Way)
Creating a server in Python allows us to wrap the logic in a Try/Except block to handle failures (like Quota errors) gracefully.
Code Analysis:
* create_server(): Accepts arguments as standard Python types (Strings, Lists).
* wait_for_server(): A helper function that pauses the script until the server enters the ACTIVE state, replacing manual sleep loops.
* try/except: If the cloud is full or the network ID is wrong, the script captures the error and prints a friendly message instead of crashing with a stack trace.
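One possible shape for the create-and-wait logic, assuming openstacksdk's `create_server` and `wait_for_server` (which raises an `SDKException` subclass on ERROR or timeout). The fallback to `Exception` exists only so the sketch can run where the SDK is not installed. The function name and IDs are illustrative.

```python
try:
    from openstack.exceptions import SDKException
except ImportError:            # lets this sketch run without openstacksdk
    SDKException = Exception


def create_web_server(compute, name, image_id, flavor_id, network_id):
    """Create a server and wait until it is ACTIVE; return None on failure."""
    try:
        server = compute.create_server(
            name=name,
            image_id=image_id,
            flavor_id=flavor_id,
            networks=[{"uuid": network_id}],
        )
        # Polls until the status is ACTIVE; raises on ERROR or after `wait` seconds.
        return compute.wait_for_server(server, wait=300)
    except SDKException as exc:  # quota exceeded, bad IDs, build failure...
        print(f"Could not create {name}: {exc}")
        return None
```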
4. Infrastructure as Code: Heat vs Terraform
4.1 The Two Giants
- Two primary Infrastructure as Code tools dominate the OpenStack landscape.
- Heat is the OpenStack Native orchestration engine.
- It is built directly into the platform, requires no external installation, and uses YAML templates.
- It is the ideal choice for pure OpenStack environments where external tool dependencies are undesirable.
- Terraform, created by HashiCorp, is the Industry Standard for multi-cloud provisioning.
4.2 Syntax Comparison (Creating a Server)
Option A: OpenStack Heat (HOT)
Option B: Terraform (HCL)
Comparison:
* Heat: Uses type: OS::Nova::Server and nested properties.
* Terraform: Uses resource "type" "name" and = assignment syntax. Both achieve the exact same result.
Note: In this course, we focus on Heat because it requires no external setup and allows you to understand the underlying OpenStack resource model directly. However, in a multi-cloud professional environment, Terraform is the tool you will most likely encounter.
Section 4 Checkpoint
- Summary: We have moved from Imperative scripts, where we define strict procedural steps, to Declarative IaC, where we define the target architecture.
- A critical property of these modern tools is Idempotency—the ability to execute the same script multiple times without causing errors or duplicating resources.
- If the resource already exists in the desired state, the tool simply does nothing.
- Reflection: Consider why a company using multiple cloud providers (e.g., AWS and on-prem OpenStack) would prefer Terraform over Heat.
- Also, think about the consequences of removing a resource definition from a Terraform file or Heat template; unlike a script which simply stops running, IaC tools will actively destroy the resource to ensure the real world matches your definition.
5. Orchestration with Heat (The Template Engine)
5.1 Anatomy of a Template
Heat uses YAML templates known as HOT (Heat Orchestration Templates). Every template follows a standard skeleton:
Structure Analysis:
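* Version: Always required. Defines the syntax version (HOT 2018-08-31 corresponds to the Rocky release; Queens introduced 2018-03-02, and later releases accept both).
* Parameters: Variables passed in (Input).
* Resources: The infrastructure to create (networks, servers, volumes); the core of every template.
* Outputs: Variables passed out (Return values).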
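The skeleton can be expressed as a Python dictionary, which is handy when generating templates. Heat also accepts templates written as JSON. The check below is a hypothetical helper, not part of Heat. It only verifies that the version key is present and flags misspelled top-level sections, such as `output` instead of `outputs`.

```python
import json

# Top-level sections a HOT template may contain.
HOT_SECTIONS = {"heat_template_version", "description", "parameter_groups",
                "parameters", "resources", "outputs", "conditions"}

skeleton = {
    "heat_template_version": "2018-08-31",
    "description": "Minimal skeleton",
    "parameters": {"image": {"type": "string", "description": "Image name or ID"}},
    "resources": {},
    "outputs": {},
}


def check_skeleton(template: dict) -> list[str]:
    """Return a list of problems; an empty list means the skeleton looks sane."""
    problems = []
    if "heat_template_version" not in template:
        problems.append("missing heat_template_version")
    problems += [f"unknown section: {k}" for k in sorted(set(template) - HOT_SECTIONS)]
    return problems


print(json.dumps(skeleton, indent=2))
```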
5.2 Building Blocks (Primitives)
Rather than writing a massive script immediately, let's look at how to create individual components.
Creating a Network
Resource Analysis:
* resources: The top-level keyword indicating the start of the infrastructure definition block.
* my_private_net: The Logical ID (Variable Name) used to reference this resource elsewhere in the template.
* type: The specific OpenStack resource class (e.g., OS::Neutron::Net).
* properties: Configuration specific to that resource (like the network name).
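The same network resource, built in Python to highlight the difference between the Logical ID (the dictionary key used for references) and the display name (a property). The helper name and the display name `private-net` are illustrative.

```python
def net_resource(logical_id: str, display_name: str) -> dict:
    """One entry for the `resources:` section: an OS::Neutron::Net."""
    return {logical_id: {"type": "OS::Neutron::Net",
                         "properties": {"name": display_name}}}


template = {
    "heat_template_version": "2018-08-31",
    "resources": net_resource("my_private_net", "private-net"),
}
```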
Creating a Security Group
Resource Analysis:
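* rules: OpenStack Security Groups are Default Deny for incoming traffic. No inbound traffic is allowed unless explicitly permitted here (new groups do allow all outbound traffic by default).
* protocol: The IP protocol the rule matches (tcp, udp, icmp).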
* port_range_min/max: The port range (80 to 80 means just port 80).
* remote_ip_prefix: Defines Who can access this port (The Source). 0.0.0.0/0 is CIDR notation for "The entire internet." For specific networks, you would use something like 192.168.1.0/24.
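A sketch of building security group rules in Python, using the standard `ipaddress` module to reject malformed CIDR strings before they reach Heat. The group name `web-sg` and the SSH source network are placeholders.

```python
import ipaddress


def ingress_rule(port: int, cidr: str = "0.0.0.0/0", protocol: str = "tcp") -> dict:
    """Build one OS::Neutron::SecurityGroup rule allowing a single port."""
    network = ipaddress.ip_network(cidr)    # ValueError on malformed CIDR or host bits set
    return {
        "protocol": protocol,
        "port_range_min": port,
        "port_range_max": port,             # min == max -> exactly one port
        "remote_ip_prefix": str(network),
    }


web_sg = {
    "type": "OS::Neutron::SecurityGroup",
    "properties": {
        "name": "web-sg",
        # Default deny for ingress: only the ports listed here are reachable.
        "rules": [ingress_rule(22, "192.168.1.0/24"), ingress_rule(80)],
    },
}
```

The prefix length sets how much of the address space a rule opens: `/24` covers 256 addresses, while `/0` covers every IPv4 address.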
Creating a Block Storage Volume
Resource Analysis:
* my_data_volume: The Logical ID.
* type: OS::Cinder::Volume: Explicitly creates a block device in Cinder.
* size: The capacity in Gigabytes (GB).
* name: The display name visible in the dashboard.
Creating a Virtual Machine
Resource Analysis:
* my_server: The Logical ID.
* type: OS::Nova::Server: The standard compute instance type.
* image / flavor: The Mandatory properties defining the specs.
* Note: There are many other optional properties not shown here, such as key_name (SSH Access), networks (Connectivity), security_groups (Firewall), and user_data (Cloud-Init Scripts). We will combine these in the Unified Stack example below.
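A small helper that keeps the two core properties explicit and adds optional ones only when a value is supplied. The image, flavor, and key names are placeholders.

```python
def server_resource(image, flavor, **optional):
    """OS::Nova::Server with image/flavor plus any optional properties.

    Optional keys (key_name, networks, security_groups, user_data, ...) are
    included only when not None, so the generated template stays minimal.
    """
    properties = {"image": image, "flavor": flavor}
    properties.update({k: v for k, v in optional.items() if v is not None})
    return {"type": "OS::Nova::Server", "properties": properties}


my_server = server_resource("ubuntu-22.04", "m1.small", key_name="lab-key",
                            user_data=None)
```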
5.3 The Unified Stack
The true power of Heat comes from combining these primitives using Intrinsic Functions.
- { get_resource: X }: Gets the ID of resource X.
- { get_param: Y }: Gets the value of user input Y.
- { get_attr: [Z, val] }: Gets a specific attribute (like an IP address) from resource Z.
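To make the three functions concrete, here is a toy resolver that substitutes them in a properties tree. It is an illustration only, not how Heat is implemented. It handles just `get_param`, `get_resource`, and two-element `get_attr` lists, and the sample IDs and IP address in the usage lines are made up.

```python
def resolve(value, params, resource_ids, attributes):
    """Toy substitution of get_param / get_resource / get_attr.

    params:       user inputs, e.g. {"image": "ubuntu-22.04"}
    resource_ids: logical ID -> physical ID created by the cloud
    attributes:   (logical ID, attribute name) -> value
    """
    if isinstance(value, dict) and len(value) == 1:
        fn, arg = next(iter(value.items()))
        if fn == "get_param":
            return params[arg]
        if fn == "get_resource":
            return resource_ids[arg]
        if fn == "get_attr":
            return attributes[(arg[0], arg[1])]
    if isinstance(value, dict):                  # ordinary mapping: recurse
        return {k: resolve(v, params, resource_ids, attributes) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, params, resource_ids, attributes) for v in value]
    return value                                 # plain scalar


props = {"image": {"get_param": "image"},
         "networks": [{"network": {"get_resource": "net"}}]}
print(resolve(props, {"image": "ubuntu"}, {"net": "uuid-1"}, {}))
```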
Full Deployment Example (deployment.yaml):
Stack Analysis:
* Floating IP: We created a FloatingIP resource on the public network and then an Association resource to link it to our server. This is how the server becomes accessible from your laptop.
* User Data: We embedded a Cloud-Config payload to install Docker and launch Nginx as a container. Heat injects this into Cloud-Init, which executes the declarative instructions on boot.
* Dependency Chain: The association depends on both the floating_ip and the web_instance. Heat infers this ordering from the get_resource references and creates the resources in the correct sequence.
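To see how a dependency order can be derived from references, the toy model below scans each resource for `get_resource`/`get_attr` references and explicit `depends_on` entries. It then sorts the resulting graph with the standard library's `graphlib` (Python 3.9+). This approximates the idea and is not Heat's actual engine.

```python
from graphlib import TopologicalSorter   # Python 3.9+


def references(value):
    """Yield logical IDs referenced through get_resource / get_attr."""
    if isinstance(value, dict):
        for key, arg in value.items():
            if key == "get_resource":
                yield arg
            elif key == "get_attr":
                yield arg[0]
            else:
                yield from references(arg)
    elif isinstance(value, list):
        for item in value:
            yield from references(item)


def creation_order(resources: dict) -> list[str]:
    """Order resources so every dependency is created first (toy model)."""
    graph = {}
    for name, body in resources.items():
        deps = set(references(body.get("properties", {})))
        depends_on = body.get("depends_on", [])
        deps |= {depends_on} if isinstance(depends_on, str) else set(depends_on)
        graph[name] = deps
    return list(TopologicalSorter(graph).static_order())
```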
5.4 The Terraform Translation (Rosetta Stone)
To prove that these skills are transferable, here is the exact same Nginx server we built in Heat, translated into Terraform. Notice that while the keywords differ (resources vs resource), the structural logic—defining a network, security group, and server with dependencies—is identical.
Terraform (main.tf)
Translation Analysis:
* References: Heat uses get_resource. Terraform uses resource_type.resource_name.id.
* Structure: Both tools define resources, properties, and dependencies. The syntax changes (YAML vs HCL), but the concepts are universal. By learning Heat, you are effectively learning the logic needed for Terraform.
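The reference translation can be sketched as a lookup table. The mapping below pairs Heat types with their Terraform OpenStack provider counterparts. It is partial and illustrative, and the resource name `web_sg` is a placeholder.

```python
# Heat resource type -> Terraform OpenStack provider resource type (partial).
HEAT_TO_TF = {
    "OS::Nova::Server": "openstack_compute_instance_v2",
    "OS::Neutron::Net": "openstack_networking_network_v2",
    "OS::Neutron::SecurityGroup": "openstack_networking_secgroup_v2",
    "OS::Neutron::FloatingIP": "openstack_networking_floatingip_v2",
    "OS::Cinder::Volume": "openstack_blockstorage_volume_v3",
}


def tf_reference(heat_ref: dict, resource_types: dict) -> str:
    """Translate {get_resource: X} into Terraform's <type>.<name>.id form."""
    logical_id = heat_ref["get_resource"]
    tf_type = HEAT_TO_TF[resource_types[logical_id]]
    return f"{tf_type}.{logical_id}.id"


print(tf_reference({"get_resource": "web_sg"}, {"web_sg": "OS::Neutron::SecurityGroup"}))
```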
5.5 Beyond Single VMs: Magnum (Kubernetes)
In Section 5.3, we installed Docker on a single VM. While fine for development, production requires clusters.
OpenStack Magnum is the service that bridges Heat and Containers.
- Orchestration: Magnum uses Heat under the hood to deploy a stack.
- Resources: It automatically creates the Master Nodes, Worker Nodes, Load Balancers, and Private Networks.
- Result: Instead of a VM with Docker, you get a fully manageable Kubernetes Cluster.
To deploy a production-grade Kubernetes cluster on OpenStack, we use the Magnum CLI. This happens in 3 phases:
Phase 1: Create the Cluster Template
This defines the "Shape" of the cluster (OS Image, Keypair, Network Driver).
Command Analysis:
* template create: Sets the blueprint.
* --image: The classic Heat-based Kubernetes driver requires a container-optimized image (historically Fedora Atomic, later Fedora CoreOS), not standard Ubuntu.
* --coe: Specifies the Container Orchestration Engine. Older Magnum releases also supported Docker Swarm and Apache Mesos, but Kubernetes is the standard.
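A dry-run sketch that assembles the template command in Python. The flag names reflect python-magnumclient's `openstack coe cluster template create` as we understand it; confirm them with `--help` on your installation. Every value (image, keypair, flavors, network driver) is a placeholder.

```python
import shlex


def cluster_template_cmd(name, image, keypair, external_network,
                         flavor="m1.medium", master_flavor="m1.medium",
                         network_driver="flannel"):
    """argv for `openstack coe cluster template create` (values are placeholders)."""
    return ["openstack", "coe", "cluster", "template", "create",
            "--image", image,
            "--keypair", keypair,
            "--external-network", external_network,
            "--flavor", flavor,
            "--master-flavor", master_flavor,
            "--network-driver", network_driver,
            "--coe", "kubernetes",
            name]


print(shlex.join(cluster_template_cmd("k8s-template", "fedora-coreos",
                                      "lab-key", "public")))
```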
Phase 2: Launch the Cluster
This triggers Heat to actually build the stack (VMs, Load Balancers, Security Groups).
Command Analysis:
* cluster create: The trigger. This tells Heat to start provisioning resources.
* --master-count: High Availability (HA) starts at 3 masters, but for labs, 1 is sufficient.
* --node-count: The number of workers where your actual Pods (like Nginx) will run.
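Cluster creation can take a long time, so scripts usually poll until the status reaches `CREATE_COMPLETE`. Below is a generic polling sketch. `get_status` is any callable you supply, for example one wrapping `openstack coe cluster show <name> -f value -c status`. The intervals are arbitrary defaults.

```python
import time


def wait_for_status(get_status, target="CREATE_COMPLETE",
                    failure="CREATE_FAILED", interval=30, timeout=1800,
                    sleep=time.sleep):
    """Poll until get_status() returns `target`; raise on failure or timeout."""
    waited = 0
    while True:
        status = get_status()
        if status == target:
            return status
        if status == failure:
            raise RuntimeError(f"cluster entered {failure}")
        if waited >= timeout:
            raise TimeoutError(f"still {status} after {timeout}s")
        sleep(interval)                 # avoid hammering the API
        waited += interval
```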
Phase 3: Configure Client Access
Once the cluster is CREATE_COMPLETE, we download the credentials to talk to it.
Command Analysis:
* cluster config: This command fetches the TLS certificates and API endpoints from OpenStack.
* export KUBECONFIG: Tells the kubectl tool where to find these credentials. Without this, kubectl doesn't know which cluster to talk to.
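The Python counterpart of `export KUBECONFIG=...` is to pass an environment to the `kubectl` subprocess. This sketch assumes `openstack coe cluster config <name> --dir <dir>` writes a file named `config` into that directory, which is what we understand it to do.

```python
import os
from pathlib import Path


def kube_env(config_dir: str) -> dict:
    """Environment for kubectl after `openstack coe cluster config <name> --dir <config_dir>`."""
    env = dict(os.environ)
    env["KUBECONFIG"] = str(Path(config_dir).resolve() / "config")
    return env
```

Usage would look like `subprocess.run(["kubectl", "get", "nodes"], env=kube_env("./k8s"), check=True)`. Scoping the variable to one subprocess avoids accidentally pointing every later shell command at the lab cluster.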
5.5.1 Step 4: Deploying Workloads (Pods vs VMs)
Now that the cluster is running, we stop talking to OpenStack (Heat) and start talking to Kubernetes (kubectl). Here is how we deploy Nginx with 3 Replicas (Load Balanced).
Kubernetes Manifest (nginx-deployment.yaml)
Manifest Analysis:
* Replicas: 3: Instead of creating web_server_01, web_server_02, etc., we simply ask for "3 copies". Kubernetes ensures they are always running.
* Service (LoadBalancer): Creating this object prompts the cluster's OpenStack cloud provider to request a real Load Balancer from Octavia, which distributes traffic across those 3 pods.
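The manifest can also be built as Python data and emitted as JSON, since `kubectl apply -f` accepts JSON as well as YAML. This is a sketch. The names and the `nginx:1.25` image tag are placeholders. The important detail is that the Deployment's selector, the Pod template labels, and the Service selector all use the same labels.

```python
import json

labels = {"app": "nginx"}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx"},
    "spec": {
        "replicas": 3,                               # "3 copies", kept running
        "selector": {"matchLabels": labels},         # which Pods this Deployment owns
        "template": {
            "metadata": {"labels": labels},
            "spec": {"containers": [{
                "name": "nginx",
                "image": "nginx:1.25",
                "ports": [{"containerPort": 80}],
            }]},
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "nginx"},
    "spec": {
        "type": "LoadBalancer",                     # asks the cloud for a real LB
        "selector": labels,                         # traffic goes to matching Pods
        "ports": [{"port": 80, "targetPort": 80}],
    },
}

# A v1 List wraps both objects so one `kubectl apply -f` call creates them.
manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}
print(json.dumps(manifest, indent=2))
```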
Section 5 Checkpoint
Summary: Heat templates allow us to define an entire infrastructure stack in a single file. By understanding the core structure (Parameters, Resources, Outputs) and the Building Blocks (Cinder, Nova, Neutron resources), we can assemble complex environments that are consistently reproducible.
Reflection: Why is it better to define the Security Group inside the template rather than assuming it already exists? (Hint: It makes the template "self-contained" and easier to deploy in a fresh project).
6. Configuration Management with Ansible
6.1 The Inventory
Ansible needs to know what it is managing. This is defined in an Inventory file. While it supports a simple INI format, YAML is preferred for clarity.
Example Inventory (hosts.yaml):
Inventory Analysis:
* all: The root group containing every server.
* children: Sub-groups (e.g., webservers, databases) allow you to target specific roles.
* ansible_host: Variable defining the actual IP to connect to.
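A YAML inventory maps directly onto nested Python dictionaries, which makes it easy to reason about how a target like `webservers` expands to hosts. The host names and IPs below are placeholders, and the lookup helpers are hypothetical. They only look one level below `all`.

```python
inventory = {
    "all": {
        "children": {
            "webservers": {"hosts": {"web1": {"ansible_host": "192.168.1.11"},
                                     "web2": {"ansible_host": "192.168.1.12"}}},
            "databases": {"hosts": {"db1": {"ansible_host": "192.168.1.21"}}},
        }
    }
}


def hosts_in(group: dict) -> dict:
    """Collect hosts from a group and all of its children (name -> vars)."""
    found = dict(group.get("hosts") or {})
    for child in (group.get("children") or {}).values():
        found.update(hosts_in(child or {}))
    return found


def addresses(inventory: dict, group_name: str) -> list[str]:
    """Resolve a target like `webservers` to the IPs Ansible would connect to."""
    root = inventory["all"]
    group = root if group_name == "all" else root["children"][group_name]
    return [(hostvars or {}).get("ansible_host", host)
            for host, hostvars in sorted(hosts_in(group).items())]
```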
6.2 Ad-Hoc Commands
For quick, one-off tasks, you don't need to write a script. You can simply "speak" to your cluster using the CLI.
Command Analysis:
* all / webservers: The target group from the inventory.
* -m ping: The Module to run. 'ping' in Ansible checks SSH connectivity and Python availability, not ICMP.
* -a: Arguments passed to the module.
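For scripting, ad-hoc calls follow the pattern `ansible <target> -i <inventory> -m <module> [-a <args>]`. This small builder mirrors that pattern. The inventory file name matches the example above, and the `uptime` command is only an illustration.

```python
import shlex


def adhoc(pattern, module, args=None, inventory="hosts.yaml"):
    """argv for an Ansible ad-hoc command, e.g. `ansible all -i hosts.yaml -m ping`."""
    cmd = ["ansible", pattern, "-i", inventory, "-m", module]
    if args:
        cmd += ["-a", args]                 # module arguments
    return cmd


print(shlex.join(adhoc("all", "ping")))
print(shlex.join(adhoc("webservers", "shell", "uptime")))
```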
6.3 Playbooks (The Core)
While Ad-Hoc commands are useful, the real power lies in Playbooks. These are YAML files that describe a complex set of tasks—a "play."
Example Playbook (site.yaml):
Playbook Analysis:
* apt, copy, service: These are Modules. They abstract away the OS details (e.g., you don't type apt-get install, you just say state: present).
* Idempotency: This is the most critical concept. If you run this playbook 100 times, it will only make changes the first time. On subsequent runs, it checks "Is Apache present?", sees "Yes", and does nothing. This makes it safe to run against production systems repeatedly.
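Idempotency can be seen in a toy model of a "package" module that acts only when the current state differs from the desired one and reports whether it changed anything. This is not Ansible's code, just the same contract in miniature.

```python
def ensure_package(installed: set, name: str, state: str = "present") -> bool:
    """Toy idempotent module: return True only if something changed."""
    if state == "present" and name not in installed:
        installed.add(name)          # like apt install, performed once
        return True
    if state == "absent" and name in installed:
        installed.remove(name)
        return True
    return False                     # already in the desired state -> "ok"


system = set()
runs = [ensure_package(system, "apache2") for _ in range(3)]
print(runs)   # [True, False, False]
```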
6.4 The Unified Pipeline (Integration)
The ultimate goal is to chain these tools together. A simple Bash script can act as the "glue" that triggers Heat to build the infrastructure, waits for the output, and then passes that information to Ansible for configuration.
Example: deploy.sh
Pipeline Analysis:
* Glue Code: Bash is used here not to manage resources, but to manage tools. It bridges the gap between Heat (Infrastructure) and Ansible (Config).
* Dynamic Inventory: Note how we create hosts.ini on the fly. Note: ansible_user=ubuntu assumes an Ubuntu image; adjust this for Rocky/CentOS (rocky) or Fedora (fedora).
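The inventory-generation step could also be written in Python. The IPs would come from the stack outputs (for example via `openstack stack output show <stack> <output> -f value -c output_value`). The group name and the `ubuntu` login are assumptions matching the note above.

```python
def hosts_ini(ips, group="webservers", user="ubuntu"):
    """Render an INI inventory from IPs returned by the Heat stack outputs.

    `user` assumes an Ubuntu cloud image; change it for other distributions.
    """
    lines = [f"[{group}]"]
    lines += [f"{ip} ansible_user={user}" for ip in ips]
    return "\n".join(lines) + "\n"


print(hosts_ini(["203.0.113.10", "203.0.113.11"]), end="")
```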
Section 6 Checkpoint
Summary: Ansible fills the gap of "Day 2 Operations." It uses an Inventory to group servers and Playbooks to define their configuration. Unlike a Bash script which runs blindly, Ansible is Idempotent—it only acts if the system is not in the desired state.
Reflection: Compare this to the Bash script in Section 3. If you ran that Bash script twice, it would try to create the servers again; because Nova does not enforce unique server names, you would end up with duplicate servers (or a quota error). If you run an Ansible playbook twice, it simply reports "OK" (No Change).
7. Version Control: Managing your Templates
7.1 Why Git?
- History: "Who changed the firewall rule last Tuesday?" Git tells you exactly who and why.
- Rollback: If a new template breaks production, git revert creates a new commit that undoes the faulty change, so you can quickly redeploy the last working version.
- Collaboration: Multiple engineers can work on the same stack without overwriting each other's files.
7.2 The Basic Workflow
Students are expected to manage their Capstone project using these commands:
Git Analysis:
* commit: This is your "Save Game" button. Make a commit every time you reach a stable state (e.g., "Heat template works", "Ansible connects").
* GitOps: In advanced environments, pushing a commit to a Git repository automatically triggers the deploy.sh pipeline we wrote above. This is known as GitOps.
7.3 Strategic Summary
To help you lock in the mental model of "Which Tool, When?", review this comparison:
| Tool | Phase | Scope |
| --- | --- | --- |
| Cloud-Init | Boot time | Single VM |
| Bash | Glue | Tool orchestration |
| Python SDK | API automation | Fine-grained logic |
| Heat | Infrastructure | Declarative Stacks |
| Magnum | Clusters | Platform-level |
| Ansible | Day-2 Ops | Fleet Management |
| Kubernetes | Workloads | Container Orchestration |
8. Summary and Next Steps
Course Conclusion
- Congratulations! You have completed the core curriculum for Operating Systems 3.
- We have progressed from Bare-Metal Virtualization to Cloud-Native Automation.
- Key Takeaway: The future of infrastructure is Declarative, Automated, and Scalable.
Checklist:
- Can you differentiate between Imperative (Bash) and Declarative (Heat) automation?
- Do you understand why Idempotency is critical for Day-2 operations?
- Can you explain the transition from Infrastructure-as-a-Service (Nova) to Platform-as-a-Service (Magnum)?
- Are you ready to use Git to manage your project templates?
9. Additional Resources
10. Lab Exercises