Week 11

Automation and Cloud API

OPS3 - Virtualization and Cloud Infrastructure

Welcome to DevOps

1. Advanced CLI Techniques

1.1 Formatting Output

The CLI natively supports JSON output, which provides a structured and predictable data format that scripting languages can easily parse. By appending --format json to any command, we strip away the visual formatting and receive raw data objects.

Command Analysis: * --format json: Forces the CLI to output raw JSON data instead of an ASCII table. This is essential for piping data into tools like jq or Python scripts.

1.2 Parsing with jq

jq is a lightweight command-line JSON processor that allows us to filter, slice, and map JSON data directly in the terminal. It acts as a bridge between the verbose API output and the specific strings (like UUIDs) needed for subsequent commands.

Code Analysis:
* $(): Command substitution; runs the inner command and captures its output, typically so it can be assigned to a variable.
* |: The pipe operator passes the output of the openstack command directly to jq.
* jq -r .id: Extracts the value of the "id" key. The -r (raw) flag strips the surrounding quotation marks, leaving a bare UUID that is ready for variable assignment.
* select(): A jq function that acts like a WHERE clause in SQL, filtering a list by a condition.

In the example above, the -f json flag (the short form of --format json) makes OpenStack emit JSON, which we pipe to jq. The select function then filters the array, ignoring any servers that are building, paused, or shut down.
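A pipeline of the kind analysed above usually looks like `openstack server list -f json | jq -r '.[] | select(.Status == "ACTIVE") | .ID'`. For comparison, here is a minimal Python sketch of the same filter. The sample JSON, server names, and IDs are made up for illustration, and the `ID`, `Name`, and `Status` keys are assumed to match the column names the CLI prints.

```python
import json

# Hypothetical output of `openstack server list -f json` (all values are made up).
cli_output = """
[
  {"ID": "11111111-aaaa-4bbb-8ccc-000000000001", "Name": "web-1", "Status": "ACTIVE"},
  {"ID": "11111111-aaaa-4bbb-8ccc-000000000002", "Name": "web-2", "Status": "SHUTOFF"},
  {"ID": "11111111-aaaa-4bbb-8ccc-000000000003", "Name": "web-3", "Status": "ACTIVE"}
]
"""

servers = json.loads(cli_output)

# Equivalent of jq's select(.Status == "ACTIVE"): keep only running servers.
active_ids = [s["ID"] for s in servers if s["Status"] == "ACTIVE"]

# Equivalent of jq -r: print bare strings, one per line, without quotes.
for server_id in active_ids:
    print(server_id)
```

The list comprehension plays the role of select(), and printing the plain string plays the role of the -r flag.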

1.3 Architectural Insight: Golden Images vs. Post-Boot Config

The OpenStack for Architects book details two competing strategies for deploying applications: Golden Images and Post-Boot Configuration.

Figure 1: Pet vs Cattle - Manual "Pet" servers require constant care, while Automated "Cattle" servers are replaceable and identical

Section 1 Checkpoint

Resources: jq Tutorial

2. Cloud-Init: The Standard for Bootstrapping

2.1 How it Works: The Datasource

2.2 Execution Stages

2.3 The Cloud-Config Format

While User Data can be a simple Bash script, the preferred format is Cloud-Config. This is a declarative YAML syntax that abstractly defines what you want, rather than how to do it. To use this format, the input string must begin with the #cloud-config directive.

Config Analysis:
* #cloud-config: The required header telling Cloud-Init this is declarative YAML.
* packages: A list of software to install via the OS package manager (apt, yum).
* runcmd: A list of shell commands to execute after packages are installed. This is often used to start services or configure files.
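To make the shape of such a document concrete, the sketch below assembles a small cloud-config payload in Python. The helper name `render_cloud_config`, the nginx package, and the systemctl command are illustrative assumptions, not part of the course material. Strings are quoted with `json.dumps`, which for plain ASCII text like this also yields valid YAML double-quoted scalars.

```python
import json

def render_cloud_config(packages, commands):
    """Return a cloud-config document as text (hypothetical helper)."""
    lines = ["#cloud-config"]  # the header must be the very first line
    lines.append("packages:")
    lines += [f"  - {json.dumps(p)}" for p in packages]
    lines.append("runcmd:")
    lines += [f"  - {json.dumps(c)}" for c in commands]
    return "\n".join(lines) + "\n"

# Assumed example values: install nginx, then enable and start it.
user_data = render_cloud_config(
    packages=["nginx"],
    commands=["systemctl enable --now nginx"],
)
print(user_data)
```

The printed text can be saved to a file and supplied as User Data when the instance is created.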

2.4 Common Patterns (The Cookbook)

Writing User Data requires understanding common patterns. Below are standard recipes frequently used in production.

Pattern 1: The Web Server This pattern installs a web server, writes a custom index file, and ensures the service is running.

Pattern Analysis:
* write_files: Creates files with the content you specify, such as a configuration file or a custom index page. The content block allows multiline text.
* runcmd: Restarts the service to ensure the new configuration is applied.

Pattern 2: The User Creator This pattern creates a new user account, grants it sudo privileges without a password requirement, and injects an SSH public key for secure access.

Pattern Analysis:
* users: A dedicated module for user management.
* sudo: Grants password-less root access, critical for automated management tools like Ansible.

Pattern 3: The Update This pattern instructs the system to upgrade all installed packages on boot. Use this cautiously, as it significantly increases the boot time.

Pattern Analysis: * package_upgrade: true: Forces an apt-get upgrade or yum update on first boot. While secure, it adds significant time to the boot process.

2.5 Using it in CLI

To inject this configuration, you save the YAML to a local file (e.g., setup.yaml) and pass it to the compute API during the server creation process.

Command Analysis: * --user-data setup.yaml: Injects the contents of the file setup.yaml into the instance's metadata service. Cloud-Init reads this file upon first boot.

2.6 Troubleshooting (When things go wrong)

A common mistake is assuming that if a server boots, the automation worked. If your script fails (e.g., a syntax error in YAML), the server will still boot, but your app won't be there. To debug this, you must SSH into the server and check the logs:
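On most images Cloud-Init writes its own log to /var/log/cloud-init.log and captures the output of your commands in /var/log/cloud-init-output.log; `cloud-init status --long` summarizes the result. As a rough sketch of what to look for, the Python snippet below pulls the suspicious lines out of a log. The helper name `find_problems` and the sample excerpt are made up for illustration.

```python
def find_problems(log_text, markers=("ERROR", "WARNING", "Traceback")):
    """Return the log lines that contain any of the given markers."""
    return [
        line for line in log_text.splitlines()
        if any(marker in line for marker in markers)
    ]

# Made-up excerpt; on a real instance, read /var/log/cloud-init.log instead.
sample_log = """\
2024-01-01 10:00:01,000 - util.py[DEBUG]: Reading from /proc/uptime
2024-01-01 10:00:02,000 - __init__.py[WARNING]: Unhandled non-multipart userdata
2024-01-01 10:00:03,000 - util.py[DEBUG]: cloud-init mode 'modules' took 0.1 seconds
"""

problems = find_problems(sample_log)
for line in problems:
    print(line)
```

A warning about unhandled user data, for instance, is a common sign that the #cloud-config header is missing or misspelled.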

Section 2 Checkpoint

Reflection: Why is the "Magic IP" (169.254.169.254) accessible from inside the VM without any internet access? (Hint: It is a Link-Local address routed explicitly by the Hypervisor/Neutron).

Resources: Cloud-Init Documentation

3. Automating with Scripts

3.1 The "Bash Loop" (Imperative)

Imagine a scenario where you need to provision a cluster of five servers for a Load Balancing laboratory. Doing this manually is tedious and error-prone. A simple loop can automate the process effectively.

Script Analysis:
* for i in {1..5}: Creates a loop that runs 5 times, with variable $i set to 1, 2, 3, 4, 5.
* web-$i: Dynamically names the servers (web-1, web-2...) using the variable.
* --network private-net: Ensures all servers attach to the same internal network.
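The same loop can be sketched in Python by building each command as an argument list. This is a dry run that only prints the commands. The image and flavor names are placeholders that will differ per cloud, and `build_create_command` is a hypothetical helper.

```python
import shlex

def build_create_command(index, image="ubuntu-22.04", flavor="m1.small",
                         network="private-net"):
    """Return the argument list that creates server web-<index>."""
    return [
        "openstack", "server", "create",
        "--image", image,      # placeholder image name
        "--flavor", flavor,    # placeholder flavor name
        "--network", network,
        f"web-{index}",
    ]

commands = [build_create_command(i) for i in range(1, 6)]  # 1..5 inclusive

for argv in commands:
    # Dry run: print only. To execute, use subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Passing a list of arguments instead of a formatted string avoids quoting problems if a name ever contains spaces.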

3.2 Python Automation (The SDK)

While Bash scripts are useful for quick tasks, they often become unmaintainable "spaghetti code" when applied to complex systems. For professional cloud engineering, the OpenStack SDK (Python) provides a robust alternative.

3.2.1 Authentication (The clouds.yaml)

Hardcoding passwords into scripts is a major security risk. Instead, OpenStack uses a standardized configuration file named clouds.yaml to decouple credentials from code. When you run a script, the SDK searches for this file in a specific order of precedence:
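As documented for openstacksdk, the search starts in the current directory, then tries ~/.config/openstack, then /etc/openstack, and the OS_CLIENT_CONFIG_FILE environment variable can point to an explicit file. The sketch below mimics that order in simplified form. `find_clouds_yaml` is a hypothetical helper, not an SDK function, and the real SDK handles more cases (such as a companion secure.yaml).

```python
import os
from pathlib import Path

def find_clouds_yaml(cwd=None, home=None, system_dir="/etc/openstack"):
    """Return the first clouds.yaml found in the search order, or None."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()

    override = os.environ.get("OS_CLIENT_CONFIG_FILE")
    candidates = ([Path(override)] if override else []) + [
        cwd / "clouds.yaml",                             # 1. current directory
        home / ".config" / "openstack" / "clouds.yaml",  # 2. per-user config
        Path(system_dir) / "clouds.yaml",                # 3. system-wide config
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None

print(find_clouds_yaml())
```

Because the current directory wins, a project-specific clouds.yaml placed next to a script takes precedence over the per-user file.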

Keeping credentials in clouds.yaml lets you share your Python script with a colleague without accidentally sharing your password; they simply use their own clouds.yaml.

Example Content (clouds.yaml):

Explanation:
* clouds: The top-level key containing all cloud definitions.
* openstack: The specific profile name. In Python, we select this with cloud='openstack'.
* auth_url: The Keystone API endpoint. The SDK sends credentials here to get a token.

Connecting in Python:
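A minimal sketch of the connection step, assuming the openstacksdk package is installed and clouds.yaml contains a profile named openstack. The wrapper `get_connection` is a hypothetical convenience function; the import sits inside it only so that the snippet loads on machines without the SDK.

```python
def get_connection(cloud_name="openstack"):
    """Open a connection using the named profile from clouds.yaml."""
    import openstack  # provided by the openstacksdk package

    return openstack.connect(cloud=cloud_name)

# Usage (needs openstacksdk, a clouds.yaml, and a reachable cloud):
#   conn = get_connection()
```

No password appears in the script; the SDK reads the credentials from the selected profile.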

3.2.2 Reading Resources (Listing Servers)

The SDK returns Objects, not text. This means you can access properties like .id or .status directly without complex parsing.

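Code Analysis:
* conn.compute.servers(): Returns a generator, an iterable that yields Server objects one at a time rather than building a complete list up front.
* server.name: We access the data as object attributes using dot-notation, which is cleaner and less fragile than parsing text output with grep.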
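The listing pattern can be sketched as follows. To keep the snippet runnable without a cloud, it is exercised with stand-in objects built from `SimpleNamespace`; with the real SDK you would pass the Connection returned by `openstack.connect(cloud="openstack")`. The function name `active_server_names` and the sample servers are illustrative assumptions.

```python
from types import SimpleNamespace

def active_server_names(conn):
    """Return the names of all servers whose status is ACTIVE."""
    # conn.compute.servers() yields Server objects lazily.
    return [
        server.name
        for server in conn.compute.servers()
        if server.status == "ACTIVE"
    ]

# Stand-ins that imitate the attributes used above (not real SDK objects).
fake_servers = [
    SimpleNamespace(name="web-1", status="ACTIVE"),
    SimpleNamespace(name="web-2", status="SHUTOFF"),
]
fake_conn = SimpleNamespace(
    compute=SimpleNamespace(servers=lambda: iter(fake_servers))
)

print(active_server_names(fake_conn))
```

The filter is an ordinary attribute comparison, with no text parsing involved.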

3.2.3 Creating Resources (The Clean Way)

Creating a server in Python allows us to wrap the logic in a Try/Except block to handle failures (like Quota errors) gracefully.

Code Analysis:
* create_server(): Accepts arguments as standard Python types (Strings, Lists).
* wait_for_server(): A helper function that pauses the script until the server enters the ACTIVE state, replacing manual sleep loops.
* try/except: If the cloud is full or the network ID is wrong, the script captures the error and prints a friendly message instead of crashing with a stack trace.
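A hedged sketch of this pattern using the SDK's compute proxy is shown below. The function name `create_web_server` and its parameters are illustrative. The fallback assignment of `SDKException` exists only so the snippet still loads where openstacksdk is not installed.

```python
try:
    from openstack.exceptions import SDKException
except ImportError:
    # openstacksdk is not installed; fall back so the sketch still loads.
    SDKException = Exception

def create_web_server(conn, name, image_id, flavor_id, network_id):
    """Create a server and wait until it is ACTIVE; return None on failure."""
    try:
        server = conn.compute.create_server(
            name=name,
            image_id=image_id,
            flavor_id=flavor_id,
            networks=[{"uuid": network_id}],
        )
        # Blocks until the server reaches ACTIVE, or raises on error/timeout.
        return conn.compute.wait_for_server(server)
    except SDKException as exc:
        print(f"Could not create {name}: {exc}")
        return None
```

Returning None on failure lets the calling code decide whether to retry, skip, or abort, instead of ending with a traceback.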

4. Infrastructure as Code: Heat vs Terraform

4.1 The Two Giants

4.2 Syntax Comparison (Creating a Server)

Option A: OpenStack Heat (HOT)

Option B: Terraform (HCL)

Comparison:
* Heat: Uses type: OS::Nova::Server and nested properties.
* Terraform: Uses resource "type" "name" and = assignment syntax.

Both achieve the exact same result.

Note: In this course, we focus on Heat because it requires no external setup and allows you to understand the underlying OpenStack resource model directly. However, in a multi-cloud professional environment, Terraform is the tool you will most likely encounter.

Section 4 Checkpoint

5. Orchestration with Heat (The Template Engine)

5.1 Anatomy of a Template

Heat uses YAML templates known as HOT (Heat Orchestration Templates). Every template follows a standard skeleton:

Structure Analysis:
* Version: Always required. The heat_template_version key selects the syntax version; 2018-08-31 corresponds to the Rocky release (Queens uses 2018-03-02).
* Parameters: Variables passed in (Input).
* Outputs: Variables passed out (Return values).

5.2 Building Blocks (Primitives)

Rather than writing a massive script immediately, let's look at how to create individual components.

Creating a Network

Resource Analysis:
* resources: The top-level keyword indicating the start of the infrastructure definition block.
* my_private_net: The Logical ID (Variable Name) used to reference this resource elsewhere in the template.
* type: The specific OpenStack resource class (e.g., OS::Neutron::Net).
* properties: Configuration specific to that resource (like the network name).

Creating a Security Group

Resource Analysis:
* rules: OpenStack Security Groups deny inbound traffic by default. Nothing reaches the instance unless a rule here explicitly permits it.
* protocol: The IP protocol the rule matches (tcp, udp, icmp).
* port_range_min/max: The port range (80 to 80 means just port 80).
* remote_ip_prefix: Defines who can access this port (the source). 0.0.0.0/0 is CIDR notation for any IPv4 address, i.e. "the entire internet." For specific networks, you would use something like 192.168.1.0/24.
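To see how these three fields combine, the sketch below evaluates a packet against a single ingress rule. It only models the matching logic, not Neutron itself. The function name `rule_allows`, the two example rules, and the 203.0.113.7 test address are assumptions for illustration.

```python
import ipaddress

def rule_allows(rule, protocol, port, source_ip):
    """Return True if the ingress rule permits this packet."""
    network = ipaddress.ip_network(rule["remote_ip_prefix"])
    return (
        protocol == rule["protocol"]
        and rule["port_range_min"] <= port <= rule["port_range_max"]
        and ipaddress.ip_address(source_ip) in network
    )

# HTTP open to every IPv4 address.
http_from_anywhere = {"protocol": "tcp", "port_range_min": 80,
                      "port_range_max": 80, "remote_ip_prefix": "0.0.0.0/0"}
# SSH restricted to one internal subnet.
ssh_from_office = {"protocol": "tcp", "port_range_min": 22,
                   "port_range_max": 22, "remote_ip_prefix": "192.168.1.0/24"}

print(rule_allows(http_from_anywhere, "tcp", 80, "203.0.113.7"))
print(rule_allows(ssh_from_office, "tcp", 22, "203.0.113.7"))
```

A packet is admitted only when protocol, port, and source all match; with no matching rule, the default-deny behaviour applies.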

Creating a Block Storage Volume

Resource Analysis:
* my_data_volume: The Logical ID.
* type: OS::Cinder::Volume: Explicitly creates a block device in Cinder.
* size: The capacity in Gigabytes (GB).
* name: The display name visible in the dashboard.

Creating a Virtual Machine

Resource Analysis:
* my_server: The Logical ID.
* type: OS::Nova::Server: The standard compute instance type.
* image / flavor: The mandatory properties defining the boot image and the hardware specs.

Note: There are many other optional properties not shown here, such as key_name (SSH Access), networks (Connectivity), security_groups (Firewall), and user_data (Cloud-Init Scripts). We will combine these in the Unified Stack example below.

5.3 The Unified Stack

The true power of Heat comes from combining these primitives using Intrinsic Functions.

Full Deployment Example (deployment.yaml):

Stack Analysis:
* Floating IP: We created a FloatingIP resource on the public network and then an Association resource to link it to our server. This is how the server becomes accessible from your laptop.
* User Data: We embedded a Cloud-Config payload to install Docker and launch Nginx as a container. Heat injects this into Cloud-Init, which executes the declarative instructions on boot.
* Dependency Chain: The association depends on both the floating_ip and the web_instance. Heat infers this from the references between resources and creates them in the correct order, so you do not have to sequence the steps yourself.
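The way an orchestrator can derive a creation order from references is sketched below with Python's standard `graphlib` module. This is a simplified model, not Heat's actual implementation. The `resources` dictionary is a hypothetical, stripped-down stand-in for the template, and the `get_attr` path used for port_id is an assumption.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def find_references(value):
    """Collect logical IDs named by get_resource or get_attr."""
    if isinstance(value, dict):
        refs = set()
        for key, inner in value.items():
            if key == "get_resource":
                refs.add(inner)
            elif key == "get_attr":
                refs.add(inner[0])  # first element is the resource name
            else:
                refs |= find_references(inner)
        return refs
    if isinstance(value, list):
        return set().union(*(find_references(v) for v in value))
    return set()

# Hypothetical, simplified resources section.
resources = {
    "association": {
        "type": "OS::Neutron::FloatingIPAssociation",
        "properties": {
            "floatingip_id": {"get_resource": "floating_ip"},
            "port_id": {"get_attr": ["web_instance", "addresses",
                                     "private-net", 0, "port"]},
        },
    },
    "floating_ip": {"type": "OS::Neutron::FloatingIP",
                    "properties": {"floating_network": "public"}},
    "web_instance": {"type": "OS::Nova::Server",
                     "properties": {"image": "ubuntu", "flavor": "m1.small"}},
}

# Map each resource to the set of resources it depends on.
graph = {name: find_references(body) for name, body in resources.items()}
creation_order = list(TopologicalSorter(graph).static_order())
print(creation_order)
```

Whatever order the resources are written in, the association always comes after the two resources it refers to.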

5.4 The Terraform Translation (Rosetta Stone)

To show that these skills are transferable, here is the same Nginx server we built in Heat, translated into Terraform. Notice that while the keywords differ (resources vs resource), the structural logic is identical: both define a network, a security group, and a server with dependencies between them.

Terraform (main.tf)

Translation Analysis:
* References: Heat uses get_resource. Terraform uses resource_type.resource_name.id.
* Structure: Both tools define resources, properties, and dependencies. The syntax changes (YAML vs HCL), but the concepts are universal. By learning Heat, you are effectively learning the logic needed for Terraform.

5.5 Beyond Single VMs: Magnum (Kubernetes)

In Section 5.3, we installed Docker on a single VM. While fine for development, production requires clusters.

OpenStack Magnum is the service that bridges Heat and Containers.

To deploy a production-grade Kubernetes cluster on OpenStack, we use the Magnum CLI. This happens in 3 phases:

Phase 1: Create the Cluster Template This defines the "Shape" of the cluster (OS Image, Keypair, Network Driver).

Command Analysis:
* template create: Sets the blueprint.
* --image: Magnum expects a container-optimized image (Fedora Atomic or CoreOS in older releases, Fedora CoreOS in newer ones), not standard Ubuntu.
* --coe: Specifies the container orchestration engine. Older Magnum releases also supported Docker Swarm and Apache Mesos, but Kubernetes is the standard.

Phase 2: Launch the Cluster This triggers Heat to actually build the stack (VMs, Load Balancers, Security Groups).

Command Analysis:
* cluster create: The trigger. This tells Heat to start provisioning resources.
* --master-count: High Availability (HA) starts at 3 masters, but for labs, 1 is sufficient.
* --node-count: The number of workers where your actual Pods (like Nginx) will run.

Phase 3: Configure Client Access Once the cluster is CREATE_COMPLETE, we download the credentials to talk to it.

Command Analysis:
* cluster config: This command fetches the TLS certificates and API endpoints from OpenStack.
* export KUBECONFIG: Tells the kubectl tool where to find these credentials. Without this, kubectl doesn't know which cluster to talk to.

5.5.1 Step 4: Deploying Workloads (Pods vs VMs)

Now that the cluster is running, we stop talking to OpenStack (Heat) and start talking to Kubernetes (kubectl). Here is how we deploy Nginx with 3 Replicas (Load Balanced).

Kubernetes Manifest (nginx-deployment.yaml)

Stack Analysis:
* Replicas: 3: Instead of creating web_server_01, web_server_02, etc., we simply ask for "3 copies". Kubernetes ensures they are always running.
* Service (LoadBalancer): This object talks to OpenStack Neutron/Octavia to provision a real Load Balancer that distributes traffic to those 3 pods.
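The two objects discussed above can be sketched as Python dictionaries and serialized to JSON, which kubectl accepts as well as YAML. The object names (nginx-deployment, nginx-service), the app label, and the unpinned nginx image are assumptions for illustration, not the course's original manifest.

```python
import json

labels = {"app": "nginx"}  # shared label that ties the Service to the Pods

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx-deployment"},
    "spec": {
        "replicas": 3,  # ask for three identical Pods
        "selector": {"matchLabels": labels},
        "template": {
            "metadata": {"labels": labels},
            "spec": {"containers": [{
                "name": "nginx",
                "image": "nginx",
                "ports": [{"containerPort": 80}],
            }]},
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "nginx-service"},
    "spec": {
        "type": "LoadBalancer",  # request an external load balancer
        "selector": labels,
        "ports": [{"port": 80, "targetPort": 80}],
    },
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}
print(json.dumps(manifest, indent=2))
```

The Service finds its Pods purely through the shared label, which is why the selector and the Pod template must agree.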

Section 5 Checkpoint

Summary: Heat templates allow us to define an entire infrastructure stack in a single file. By understanding the core structure (Parameters, Resources, Outputs) and the Building Blocks (Cinder, Nova, Neutron resources), we can assemble complex environments that are consistently reproducible.

Reflection: Why is it better to define the Security Group inside the template rather than assuming it already exists? (Hint: It makes the template "self-contained" and easier to deploy in a fresh project).

6. Configuration Management with Ansible

6.1 The Inventory

Ansible needs to know what it is managing. This is defined in an Inventory file. While it supports a simple INI format, YAML is preferred for clarity.

Example Inventory (hosts.yaml):

Inventory Analysis:
* all: The root group containing every server.
* children: Sub-groups (e.g., webservers, databases) allow you to target specific roles.
* ansible_host: Variable defining the actual IP to connect to.

6.2 Ad-Hoc Commands

For quick, one-off tasks, you don't need to write a script. You can simply "speak" to your cluster using the CLI.

Command Analysis:
* all / webservers: The target group from the inventory.
* -m ping: The Module to run. 'ping' in Ansible checks SSH connectivity and Python availability, not ICMP.
* -a: Arguments passed to the module.

6.3 Playbooks (The Core)

While Ad-Hoc commands are useful, the real power lies in Playbooks. These are YAML files containing one or more "plays," each of which maps a group of hosts to an ordered list of tasks.

Example Playbook (site.yaml):

Playbook Analysis:
* apt, copy, service: These are Modules. They let you declare the desired state rather than the command (e.g., you don't type apt-get install, you just say state: present). Note that apt itself targets Debian/Ubuntu; the generic package module covers other distributions.
* Idempotency: This is the most critical concept. If you run this playbook 100 times, it only makes changes when the system differs from the desired state, which is normally just the first run. On subsequent runs, it checks "Is Apache present?", sees "Yes", and does nothing. This makes it safe to run against production systems repeatedly.
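Idempotency is easy to demonstrate outside Ansible. The sketch below implements a check-then-act helper, similar in spirit to Ansible's lineinfile module but not its actual code. The function name `ensure_line`, the ports.conf file name, and the "Listen 8080" line are made up for the example.

```python
import tempfile
from pathlib import Path

def ensure_line(path, line):
    """Make sure `line` is in the file; return True only if a change was made."""
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state: do nothing
    with path.open("a") as handle:
        handle.write(line + "\n")
    return True

with tempfile.TemporaryDirectory() as workdir:
    config = Path(workdir) / "ports.conf"
    first_run = ensure_line(config, "Listen 8080")   # creates the line
    second_run = ensure_line(config, "Listen 8080")  # finds it, changes nothing
    final_text = config.read_text()

print(first_run, second_run)
```

A blind `echo "Listen 8080" >> ports.conf` would append a duplicate on every run; the check before the write is what makes repeated runs safe.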

6.4 The Unified Pipeline (Integration)

The ultimate goal is to chain these tools together. A simple Bash script can act as the "glue" that triggers Heat to build the infrastructure, waits for the output, and then passes that information to Ansible for configuration.

Example: deploy.sh

Pipeline Analysis:
* Glue Code: Bash is used here not to manage resources, but to manage tools. It bridges the gap between Heat (Infrastructure) and Ansible (Config).
* Dynamic Inventory: Note how we create hosts.ini on the fly.

Note: ansible_user=ubuntu assumes an Ubuntu image; adjust it to match the image's default user, for example rocky for Rocky Linux, centos for CentOS 7, or fedora for Fedora.
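The inventory-generation step can be sketched on its own. In a real pipeline the address would come from a Heat stack output (for example via `openstack stack output show`); here it is hard-coded with a documentation address. The helper name `render_inventory` and the webservers group name are assumptions.

```python
def render_inventory(group, addresses, user="ubuntu"):
    """Return INI inventory text with one host line per IP address."""
    lines = [f"[{group}]"]
    lines += [f"{ip} ansible_user={user}" for ip in addresses]
    return "\n".join(lines) + "\n"

# 203.0.113.10 stands in for the floating IP reported by the stack.
inventory_text = render_inventory("webservers", ["203.0.113.10"])
print(inventory_text)
```

Writing this text to hosts.ini and passing it to ansible-playbook with -i is what connects the infrastructure step to the configuration step.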

Section 6 Checkpoint

Summary: Ansible fills the gap of "Day 2 Operations." It uses an Inventory to group servers and Playbooks to define their configuration. Unlike a Bash script, which runs blindly, Ansible is Idempotent: it only acts if the system is not in the desired state.

Reflection: Compare this to the Bash script in Section 3. If you ran that Bash script twice, it would try to create the servers again, leaving you with duplicates (Nova permits servers with identical names) or a quota error. If you run an Ansible playbook twice, it simply reports "OK" (No Change).

7. Version Control: Managing your Templates

7.1 Why Git?

7.2 The Basic Workflow

Students are expected to manage their Capstone project using these commands:

Git Analysis:
* commit: This is your "Save Game" button. Make a commit every time you reach a stable state (e.g., "Heat template works", "Ansible connects").
* GitOps: In advanced environments, pushing a commit to the Git repository automatically triggers the deploy.sh pipeline we wrote above.

7.3 Strategic Summary

To help you lock in the mental model of "Which Tool, When?", review this comparison:

Tool        | Phase          | Scope
Cloud-Init  | Boot time      | Single VM
Bash        | Glue           | Tool orchestration
Python SDK  | API automation | Fine-grained logic
Heat        | Infrastructure | Declarative Stacks
Magnum      | Clusters       | Platform-level
Ansible     | Day-2 Ops      | Fleet Management
Kubernetes  | Workloads      | Container Orchestration

8. Summary and Next Steps

Course Conclusion

Checklist:

9. Additional Resources

10. Lab Exercises
