Linux for DevOps: Complete Beginner to Advanced Guide (With Real Projects & Commands)
A complete beginner's guide of 11 in-depth chapters covering every Linux skill you need to start your DevOps career, with real examples, diagrams, and projects.

LINUX FOR DEVOPS — COMPLETE 11 CHAPTERS · 22+ REAL PROJECTS

Linux for DevOps
Full Course with Case Scenarios
11 exhaustive chapters · 2+ production case projects per chapter · Line-by-line command explanations · Copy-paste ready
Chapter 1: Linux Kernel — The Heart of DevOps
Definition: Linux is an open-source, Unix-like kernel created by Linus Torvalds in 1991. It manages hardware, processes, memory, file systems, and I/O. Linux powers roughly 96% of cloud servers, all of the top 500 supercomputers, Android devices, and every major container runtime (Docker, Kubernetes).
1.1 Essential System Information Commands
$ uname -r
5.15.0-91-generic
$ cat /etc/os-release
NAME="Ubuntu" VERSION="22.04.3 LTS"
$ hostnamectl
Static hostname: ubuntu-server
Line-by-Line Explanation:
• uname -r — displays the kernel release version. DevOps engineers check this before deploying kernel modules.
• cat /etc/os-release — shows the distribution name and version. Used in automation scripts for distro-specific logic (apt vs yum).
• hostnamectl — shows the system hostname and OS details in one command.

CASE PROJECT 1.1: Production Server Audit Script
Scenario: You’re a DevOps engineer joining a company. You need to audit 100 production servers to inventory OS versions, kernel, CPU, RAM, and disk usage for capacity planning.
$ cat server-audit.sh
#!/bin/bash
echo "=== SERVER AUDIT REPORT ==="
echo "Hostname: $(hostname)"
echo "Kernel: $(uname -r)"
echo "OS: $(cat /etc/os-release | grep PRETTY_NAME | cut -d'"' -f2)"
echo "CPU Cores: $(nproc)"
echo "RAM: $(free -h | awk '/^Mem:/ {print $2}')"
echo "Disk /: $(df -h / | awk 'NR==2 {print $5}')"
echo "Uptime: $(uptime -p)"
Solution Logic:
grep PRETTY_NAME | cut -d'"' -f2 extracts the human-readable OS name. free -h | awk '/^Mem:/ {print $2}' gets total RAM. For a bulk audit across all servers, run: for server in $(cat servers.txt); do ssh "$server" 'bash -s' < server-audit.sh; done

CASE PROJECT 1.2: Kernel Version Compliance Checker
Scenario: Security team mandates kernel version 5.15+ across all production servers due to a CVE vulnerability. Write a script that flags non-compliant servers and fails CI/CD pipelines.
$ cat kernel-compliance.sh
#!/bin/bash
KERNEL=$(uname -r | cut -d'-' -f1)
REQUIRED="5.15"
if [[ "$(printf '%s\n' "$REQUIRED" "$KERNEL" | sort -V | head -n1)" != "$REQUIRED" ]]; then
echo "FAIL: Kernel $KERNEL is below $REQUIRED"
exit 1
else
echo "PASS: Kernel $KERNEL meets requirement"
exit 0
fi
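The sort -V comparison used above is easy to sanity-check interactively. A minimal sketch with hypothetical version strings (ver_ge is a helper introduced here for illustration):

```shell
#!/bin/bash
# ver_ge returns success (0) if version $1 is >= version $2.
# sort -V sorts versions naturally, so the smaller version comes first;
# if the required version ($2) is first, $1 meets or exceeds it.
ver_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

ver_ge "5.15.0" "5.15" && echo "5.15.0 meets the 5.15 baseline"
ver_ge "5.4.0"  "5.15" || echo "5.4.0 is below the 5.15 baseline"
```

Note that plain lexical comparison would get this wrong (the string "5.4.0" sorts after "5.15"), which is exactly why the compliance script uses sort -V.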
Logic Breakdown:
cut -d'-' -f1 strips the build suffix (e.g., "-91-generic"). sort -V performs natural version sorting. If the kernel version sorts below the required version, the script exits with an error (exit 1), blocking deployment pipelines. Integrate it as a pre-deployment check in Jenkins or GitHub Actions.

Chapter 2: Linux Filesystem Hierarchy & Permission Octal System
Definition: In Linux, everything is a file — including directories, devices, sockets, and processes. The Filesystem Hierarchy Standard (FHS) defines directories such as /etc (configuration), /var/log (logs), and /proc (process info). Permissions: read (r) = 4, write (w) = 2, execute (x) = 1. An octal mode combines these per owner/group/others (e.g., 755 = rwxr-xr-x, 644 = rw-r--r--).

CASE PROJECT 2.1: WordPress Security Hardening
Scenario: Your WordPress site was hacked because of world-writable files. Implement proper file permissions to prevent unauthorized modifications.
$ sudo find /var/www/html -type f -exec chmod 644 {} \;
$ sudo find /var/www/html -type d -exec chmod 755 {} \;
$ sudo chown -R www-data:www-data /var/www/html
$ sudo chmod 600 /var/www/html/wp-config.php
$ sudo chmod 640 /var/www/html/.htaccess
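You can confirm what a given octal mode actually produces with stat, which prints permissions in both octal (%a) and symbolic (%A) form. A quick scratch-directory demo (the file name mirrors the project above but the path is throwaway):

```shell
#!/bin/bash
# Create a scratch file, apply the restrictive mode from the project,
# and read the permissions back in octal and symbolic notation.
tmpdir=$(mktemp -d)
touch "$tmpdir/wp-config.php"
chmod 600 "$tmpdir/wp-config.php"

# %a = octal mode, %A = symbolic mode, %n = file name
stat -c '%a %A %n' "$tmpdir/wp-config.php"   # prints: 600 -rw------- <path>

rm -rf "$tmpdir"
```

stat -c is the GNU coreutils form used on Linux; on BSD/macOS the equivalent is stat -f.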
Detailed Permission Logic:
• find -type f -exec chmod 644 — all files become rw-r--r-- (owner read/write, group/others read only).
• find -type d -exec chmod 755 — directories become rwxr-xr-x (the execute bit is required to traverse a directory).
• chown -R www-data:www-data — the web server user owns all files.
• chmod 600 wp-config.php — the database-credentials file is readable and writable by its owner only.
• chmod 640 .htaccess — the owner can write, the group can only read, others get no access.

CASE PROJECT 2.2: Shared Development Environment with SGID
Scenario: 5 developers need to collaborate on a shared project directory. Any file created should automatically be writable by all team members.
$ sudo groupadd devteam
$ sudo usermod -aG devteam alice
$ sudo usermod -aG devteam bob
$ sudo usermod -aG devteam carol
$ sudo mkdir -p /opt/project
$ sudo chown -R :devteam /opt/project
$ sudo chmod 2775 /opt/project
$ sudo chmod 664 /opt/project/*
$ umask 002 # Add to each user's .bashrc
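Why umask 002 yields group-writable files: the umask is masked out of the default creation mode (666 for regular files, 777 for directories). A quick demonstration in a scratch directory:

```shell
#!/bin/bash
# Show how the umask determines the mode of newly created files.
tmpdir=$(mktemp -d)
(
  cd "$tmpdir"

  umask 002                   # 666 minus 002 -> new files get 664
  touch shared.txt
  stat -c '%a %n' shared.txt  # prints: 664 shared.txt

  umask 022                   # the common default: 666 minus 022 -> 644
  touch private.txt
  stat -c '%a %n' private.txt # prints: 644 private.txt
)
rm -rf "$tmpdir"
```

This is why the project adds umask 002 to each developer's .bashrc: without it, files land as 644 and teammates in devteam cannot edit them.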
Explanation: The SGID bit (the leading 2 in 2775) makes new files and subdirectories inherit the group 'devteam'.
chmod 2775 = rwxrwsr-x (the s marks SGID). umask 002 ensures new files are created with 664 permissions. This setup eliminates permission conflicts without requiring root intervention.

Chapter 3: Bash Scripting — Variables, Conditionals, Loops, Functions
Definition: Bash (Bourne Again SHell) is the most common Linux shell and scripting language. Scripts start with a #!/bin/bash shebang. Key concepts: variables ($VAR), command substitution ($(command)), conditionals (if/then/else), loops (for/while/until), functions, and exit codes (0 = success).

CASE PROJECT 3.1: Deployment Rollback System with 5-Version Retention
Scenario: Application deployment failed in production. Create a script that maintains last 5 versions and can rollback to any previous version instantly.
$ cat deploy-manager.sh
#!/bin/bash
set -euo pipefail
APP_DIR="/opt/myapp"
BACKUP_DIR="/opt/backups"
VERSION=$(date +%Y%m%d_%H%M%S)
# Backup current version before deployment
backup() {
tar -czf "$BACKUP_DIR/app_$VERSION.tar.gz" -C "$APP_DIR" .
echo "Backup created: app_$VERSION.tar.gz"
}
# Keep only last 5 backups
cleanup() {
ls -t $BACKUP_DIR/app_*.tar.gz | tail -n +6 | xargs -r rm -f
}
# Rollback to specific version
rollback() {
local backup_file=$1
if [ ! -f "$BACKUP_DIR/$backup_file" ]; then
echo "Backup $backup_file not found!"
return 1
fi
rm -rf "${APP_DIR:?}"/*
tar -xzf "$BACKUP_DIR/$backup_file" -C "$APP_DIR"
sudo systemctl restart myapp
echo "Rollback to $backup_file complete"
}
case ${1:-} in
backup) backup; cleanup;;
rollback) rollback "$2";;
*) echo "Usage: $0 {backup|rollback <backup-file>}"; exit 1;;
esac
Complete Logic Breakdown:
• set -euo pipefail — exit on any error, unset variable, or pipe failure (a production best practice).
• date +%Y%m%d_%H%M%S — produces a timestamp like 20250315_143022 for unique backup names.
• ls -t | tail -n +6 | xargs -r rm -f — sorts by modification time, skips the newest 5, deletes the rest.
• case ${1:-} — dispatches on the command-line argument (backup/rollback).
• Usage: ./deploy-manager.sh backup before a deployment; ./deploy-manager.sh rollback app_20250315_143022.tar.gz to restore.

CASE PROJECT 3.2: Automated Log Rotator with Size Threshold
Scenario: Application logs grow too fast and fill up disk space. Create a script that rotates logs when size exceeds 100MB and compresses old logs.
$ cat log-rotator.sh
#!/bin/bash
LOG_FILE="/var/log/myapp/application.log"
MAX_SIZE_MB=100
MAX_SIZE_BYTES=$((MAX_SIZE_MB * 1024 * 1024))
if [ -f "$LOG_FILE" ]; then
CURRENT_SIZE=$(stat -c%s "$LOG_FILE")
if [ $CURRENT_SIZE -gt $MAX_SIZE_BYTES ]; then
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
mv "$LOG_FILE" "${LOG_FILE}.${TIMESTAMP}"
gzip "${LOG_FILE}.${TIMESTAMP}"
touch "$LOG_FILE"
chown appuser:appuser "$LOG_FILE"
chmod 644 "$LOG_FILE"
# Send SIGHUP to reopen log file (for apps that support it)
kill -HUP $(pgrep -f "myapp") 2>/dev/null || true
echo "$(date): Log rotated (was ${CURRENT_SIZE} bytes)" >> /var/log/log-rotator.log
fi
fi
# Delete logs older than 30 days
find /var/log/myapp/ -name "*.gz" -mtime +30 -delete
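The stat -c%s size check at the heart of the rotator can be exercised safely against a throwaway file, with the threshold lowered to bytes for the demo:

```shell
#!/bin/bash
# Miniature version of the rotation logic: rotate when the file
# exceeds a (deliberately tiny) byte threshold.
tmpdir=$(mktemp -d)
LOG="$tmpdir/app.log"
MAX_BYTES=1024

head -c 2048 /dev/zero > "$LOG"   # write ~2 KB so we exceed the threshold

SIZE=$(stat -c%s "$LOG")
if [ "$SIZE" -gt "$MAX_BYTES" ]; then
  mv "$LOG" "$LOG.$(date +%Y%m%d_%H%M%S)"
  gzip "$LOG".*                   # compress the rotated copy
  touch "$LOG"                    # recreate an empty active log
fi

ls "$tmpdir"                      # app.log plus one timestamped .gz file
rm -rf "$tmpdir"
```

The real script adds ownership, permissions, and the SIGHUP reopen step on top of this core.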
Explanation:
stat -c%s gets the file size in bytes. If it exceeds 100 MB, the log is renamed with a timestamp and compressed with gzip. kill -HUP sends SIGHUP so the application reopens its log file (the standard log-rotation pattern). find -mtime +30 -delete removes compressed logs older than 30 days automatically.

Chapter 4: User Management, Sudo Access & SSH Hardening
Definition: Linux is a multi-user OS. Each user has a UID (User ID): UID 0 = root (superuser), UIDs 1-999 = system accounts, UIDs 1000+ = human users. Key files: /etc/passwd (user accounts), /etc/shadow (hashed passwords), /etc/sudoers (sudo permissions). SSH keys provide passwordless authentication using public-key cryptography.

CASE PROJECT 4.1: Bastion Host (Jump Server) Setup
Scenario: You need to secure access to 50 internal production servers. Only allow SSH through a single bastion host (jump server) with no direct internet access to internal servers.
# On Bastion Host:
$ sudo useradd -m -s /bin/bash jumpser
$ sudo mkdir -p /home/jumpser/.ssh
$ sudo chmod 700 /home/jumpser/.ssh
$ echo "AllowUsers jumpser" | sudo tee -a /etc/ssh/sshd_config
$ sudo systemctl restart sshd
# On Client Machine (~/.ssh/config):
Host bastion
HostName bastion.company.com
User jumpser
IdentityFile ~/.ssh/id_ed25519
Host 10.0.*.*
ProxyJump bastion
User ubuntu
IdentityFile ~/.ssh/id_ed25519
# Now connect directly:
$ ssh 10.0.1.10 # Automatically tunnels through bastion!
Explanation: Only the user 'jumpser' can SSH directly to the bastion; the AllowUsers directive restricts SSH access. On the client, the ProxyJump directive tunnels SSH through the bastion automatically. Internal servers have no public IP and accept SSH only from the bastion's private IP. This creates a secure DMZ pattern.

CASE PROJECT 4.2: Automated SSH Key Rotation (30-Day Security Policy)
Scenario: Security policy requires SSH keys to be rotated every 30 days. Automate key rotation across 100+ servers without locking yourself out.
$ cat rotate-ssh-keys.sh
#!/bin/bash
set -e
KEY_COMMENT="devops-$(date +%Y%m%d)"
KEY_PATH="$HOME/.ssh/id_ed25519_$KEY_COMMENT"
# Generate new key pair
ssh-keygen -t ed25519 -f "$KEY_PATH" -N "" -C "$KEY_COMMENT"
# Deploy to all servers
for server in $(cat servers.txt); do
echo "Rotating keys on $server..."
ssh-copy-id -i "${KEY_PATH}.pub" "$server"
# Remove old 'devops-' keys, but keep the one we just installed
ssh "$server" "sed -i '/devops-/{/$KEY_COMMENT/!d}' ~/.ssh/authorized_keys"
# Verify we can still connect
ssh -i "$KEY_PATH" "$server" "echo 'Connected with new key'"
done
# Update local SSH config to use new key
sed -i "s|IdentityFile.*id_ed25519|IdentityFile $KEY_PATH|g" ~/.ssh/config
echo "Key rotation complete. Old keys removed from all servers."
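The key-pruning sed expression is worth testing locally before pointing it at real authorized_keys files. A sketch with a throwaway file and fabricated key entries:

```shell
#!/bin/bash
# Demonstrate pruning: delete every 'devops-' key EXCEPT the one
# carrying the current rotation comment.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ssh-ed25519 AAAAold1 devops-20240101
ssh-ed25519 AAAAold2 devops-20240201
ssh-ed25519 AAAAnew1 devops-20250315
ssh-ed25519 AAAAsvc1 backup-service
EOF

KEY_COMMENT="devops-20250315"
# Address /devops-/ selects rotation keys; inside the block,
# /$KEY_COMMENT/!d deletes all of them except the current one.
sed -i "/devops-/{/$KEY_COMMENT/!d}" "$tmp"

cat "$tmp"   # keeps the devops-20250315 key and the unrelated backup-service key
rm -f "$tmp"
```

Non-rotation keys (here, backup-service) are untouched because they never match the /devops-/ address.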
Explanation: The script generates a new Ed25519 key pair with a dated comment. ssh-copy-id adds the new public key to each server, then sed removes every old 'devops-' key while keeping the one carrying the current comment. A verification step confirms the new key works on each server, preventing lockout.

Chapter 5: Process Management, Signals & systemd Services
๐ Definition: A process is a running instance of a program. Each process has a PID (Process ID). Signals are software interrupts sent to processes: SIGTERM (15) = graceful termination, SIGKILL (9) = force kill, SIGHUP (1) = reload config. systemd is the modern init system that manages services, dependencies, and logging.
CASE PROJECT 5.1: Auto-Healing Service with Restart Limits
Scenario: Your Node.js application crashes randomly due to memory issues. Configure systemd to auto-restart the service but prevent infinite crash loops.
$ sudo nano /etc/systemd/system/myapp.service
[Unit]
Description=My Node.js Application
After=network.target
Wants=docker.service
[Service]
Type=simple
User=appuser
Group=appuser
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/node /opt/myapp/server.js
Environment="NODE_ENV=production"
Environment="PORT=8080"
Restart=on-failure
RestartSec=10
StartLimitBurst=5
StartLimitIntervalSec=60
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now myapp
$ sudo systemctl status myapp
Explanation:
Restart=on-failure restarts only on a crash (not a normal stop). RestartSec=10 waits 10 seconds between restarts. StartLimitBurst=5 with StartLimitIntervalSec=60 allows at most 5 restarts in 60 seconds; beyond that, systemd stops trying. LimitNOFILE=65536 raises the file-descriptor limit for high-traffic apps.

CASE PROJECT 5.2: Memory Leak Detector with Auto-Restart
Scenario: Your application has a known memory leak. Monitor memory usage and restart automatically when it exceeds 2GB, before the server runs out of memory.
$ cat /opt/scripts/memory-monitor.sh
#!/bin/bash
SERVICE_NAME="myapp"
MEMORY_LIMIT_MB=2048
PID=$(pgrep -f "node.*server.js" | head -1)
if [ -n "$PID" ]; then
MEM_KB=$(ps -o rss= -p $PID 2>/dev/null)
if [ -n "$MEM_KB" ]; then
MEM_MB=$((MEM_KB / 1024))
if [ $MEM_MB -gt $MEMORY_LIMIT_MB ]; then
echo "$(date): Memory limit exceeded (${MEM_MB}MB > ${MEMORY_LIMIT_MB}MB), restarting..."
sudo systemctl restart $SERVICE_NAME
fi
fi
fi
# Schedule with cron every 5 minutes
$ sudo crontab -e
*/5 * * * * /opt/scripts/memory-monitor.sh >> /var/log/memory-monitor.log 2>&1
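The ps -o rss= reading can be tried against any live PID — here the current shell — to see the KB-to-MB conversion the monitor relies on. The 1 MB limit below is deliberately tiny so the comparison branch fires:

```shell
#!/bin/bash
# Read the Resident Set Size (RSS) of the current shell, in KB,
# then apply the monitor's conversion and threshold check.
MEM_KB=$(ps -o rss= -p $$)
MEM_MB=$((MEM_KB / 1024))
echo "Shell PID $$ is using ${MEM_KB} KB (~${MEM_MB} MB)"

LIMIT_MB=1   # artificially low limit, just to demonstrate the trigger
if [ "$MEM_MB" -ge "$LIMIT_MB" ]; then
  echo "Limit of ${LIMIT_MB} MB exceeded - a real monitor would restart the service here"
fi
```

RSS counts physical memory actually resident, which is what you want for leak detection; virtual size (vsz) would overstate usage.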
Explanation:
pgrep -f "node.*server.js" finds the process by its full command line. ps -o rss= prints the Resident Set Size in KB; the script converts it to MB and compares it with the limit. If exceeded, systemctl restart gracefully restarts the service. Cron runs the check every 5 minutes for proactive monitoring.

Chapter 6: Networking — IP, Ports, DNS, Firewall (UFW/iptables)
Definition: Core Linux networking commands: ip addr (show interfaces), ss (socket statistics), ping (ICMP reachability), dig (DNS lookup), curl (HTTP testing). UFW (Uncomplicated Firewall) is a friendly frontend for iptables. Common ports: 22 = SSH, 80 = HTTP, 443 = HTTPS, 3306 = MySQL, 5432 = PostgreSQL, 6379 = Redis.

CASE PROJECT 6.1: DDoS Mitigation with Rate Limiting
Scenario: Your web server is under SYN flood attack. Implement rate limiting to protect against DDoS while allowing legitimate traffic.
# Basic UFW rate limiting for SSH
$ sudo ufw limit ssh
# Advanced iptables rate limiting for HTTP/HTTPS
$ sudo iptables -A INPUT -p tcp --dport 80 -m limit --limit 25/minute --limit-burst 50 -j ACCEPT
$ sudo iptables -A INPUT -p tcp --dport 80 -j DROP
$ sudo iptables -A INPUT -p tcp --dport 443 -m limit --limit 25/minute --limit-burst 50 -j ACCEPT
$ sudo iptables -A INPUT -p tcp --dport 443 -j DROP
# Protect against SYN flood specifically
$ sudo iptables -A INPUT -p tcp --syn -m limit --limit 12/s --limit-burst 24 -j ACCEPT
$ sudo iptables -A INPUT -p tcp --syn -j DROP
# Save iptables rules persistently
$ sudo apt install iptables-persistent
$ sudo netfilter-persistent save
Explanation: ufw limit allows 6 connections per 30 seconds per source IP. The iptables rules rate-limit HTTP to 25 packets/minute with a burst of 50. The SYN-flood rules limit SYN packets to 12/second. Excess packets are dropped, preventing resource exhaustion while legitimate traffic passes.
CASE PROJECT 6.2: Internal Service Discovery Without DNS
Scenario: No DNS server is available in your development environment. Map internal services using /etc/hosts for service discovery.
# Add internal service mappings
$ echo "10.0.1.10 db.postgres.internal" | sudo tee -a /etc/hosts
$ echo "10.0.1.20 cache.redis.internal" | sudo tee -a /etc/hosts
$ echo "10.0.1.30 api.auth.internal" | sudo tee -a /etc/hosts
# Test connectivity
$ ping -c 2 db.postgres.internal
$ nc -zv db.postgres.internal 5432
# Ansible playbook to distribute hosts file across cluster
$ cat distribute-hosts.yml
- hosts: all
  tasks:
    - name: Add internal service mappings
      lineinfile:
        path: /etc/hosts
        line: "{{ item }}"
      loop:
        - "10.0.1.10 db.postgres.internal"
        - "10.0.1.20 cache.redis.internal"
Explanation: Manual hostname resolution via /etc/hosts bypasses DNS, which is useful for container networking, development environments, and air-gapped networks. nc -zv tests whether a port is open and reachable. The Ansible playbook distributes the same host mappings across the entire cluster for consistent service discovery.

Chapter 7: Package Management — apt, yum, dnf, and the DevOps Toolchain
Definition: Package managers automate software installation, updates, and dependency resolution. Ubuntu/Debian use apt (.deb packages); CentOS/RHEL use yum or dnf (.rpm packages). Key commands: apt update (refresh indexes), apt install, apt upgrade, apt autoremove (clean orphans).

CASE PROJECT 7.1: Offline Package Mirror for Air-Gapped Servers
Scenario: You have servers in a secure, air-gapped environment with no internet access. Download all required packages on an internet-connected machine and transfer them.
# On internet-connected machine (download only)
$ sudo apt-get update
$ sudo apt-get install --download-only nginx docker.io postgresql-14 redis
$ sudo tar -czf offline-packages.tar.gz /var/cache/apt/archives/*.deb
# Transfer tarball to air-gapped server (via USB/DVD)
$ sudo tar -xzf offline-packages.tar.gz -C /var/cache/apt/archives/
$ sudo dpkg -i /var/cache/apt/archives/*.deb
# Handle dependencies automatically (-f is shorthand for --fix-broken)
$ sudo apt-get install -f
# For RPM-based systems (CentOS/RHEL)
$ sudo yum install --downloadonly --downloaddir=/tmp/packages nginx docker
$ sudo rpm -ivh /tmp/packages/*.rpm
Explanation:
--download-only fetches packages without installing them. The tarball contains all the .deb files, dependencies included. On the air-gapped server, extract and run dpkg -i to install, then apt-get install -f to resolve any missing dependencies. Critical for secure/classified environments.

CASE PROJECT 7.2: Complete DevOps Workstation Provisioning Script
Scenario: New DevOps engineers join the team. Create a one-click script that installs all required tools: Docker, kubectl, Terraform, Ansible, Helm, AWS CLI.
$ cat provision-devops-tools.sh
#!/bin/bash
set -e
# Update system
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git vim jq unzip htop net-tools
# Install Docker
curl -fsSL https://get.docker.com | bash
sudo usermod -aG docker $USER
# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Install Terraform
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform -y
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Install AWS CLI v2
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install
# Install Ansible
sudo apt install -y python3-pip
pip3 install ansible
# Verify installations
echo "Docker: $(docker --version)"
echo "kubectl: $(kubectl version --client)"
echo "Terraform: $(terraform version)"
echo "Helm: $(helm version)"
echo "AWS CLI: $(aws --version)"
echo "Ansible: $(ansible --version | head -1)"
echo "DevOps workstation provisioned! Please log out and back in for Docker group permissions to apply."
Explanation: A one-click setup for the complete DevOps toolchain: Docker from the official convenience script, kubectl from the Kubernetes release server, Terraform from the HashiCorp APT repo, Helm from its official install script, AWS CLI v2 from Amazon, and Ansible via pip. The verification step confirms every tool installed correctly, so new engineers can start working immediately.
Chapter 8: Cron Jobs — Automating Scheduled Tasks
Definition: Cron is a time-based job scheduler. Syntax: minute hour day month weekday command. Special forms: */5 = every 5 units, 1-5 = a range, 0,12 = specific values, @reboot = run at startup. Always redirect output (command >> /var/log/job.log 2>&1) so failures are captured, and remember that cron runs with a minimal environment and PATH.

CASE PROJECT 8.1: Automated PostgreSQL Backup with Retention Policy
Scenario: Production database must be backed up daily at 2 AM. Keep 7 days of backups, compress them, and optionally upload to S3.
$ cat /opt/scripts/postgres-backup.sh
#!/bin/bash
BACKUP_DIR="/backups/postgres"
DB_NAME="production_db"
DB_USER="backup_user"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Perform backup
PGPASSWORD="secret" pg_dump -U $DB_USER -h localhost $DB_NAME | gzip > "$BACKUP_DIR/db_${DATE}.sql.gz"  # prefer ~/.pgpass or a secrets manager in production
# Delete backups older than 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete
# Optional: Upload to S3 (requires AWS CLI configured)
# aws s3 cp "$BACKUP_DIR/db_${DATE}.sql.gz" s3://mybucket/backups/
echo "$(date): Backup completed - db_${DATE}.sql.gz" >> /var/log/backup.log
# Add to crontab
$ sudo crontab -e
# Daily backup at 2:00 AM
0 2 * * * /opt/scripts/postgres-backup.sh >> /var/log/backup.log 2>&1
# Health check every 5 minutes (monitoring)
*/5 * * * * curl -fsS https://hc-ping.com/your-uuid || /opt/scripts/alert.sh
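The find -mtime +7 retention rule can be verified without waiting a week by back-dating files with touch -d (GNU touch). A scratch-directory sketch:

```shell
#!/bin/bash
# Create one "old" and one "new" backup file, then apply the exact
# retention expression the backup script uses.
tmpdir=$(mktemp -d)
touch -d '10 days ago' "$tmpdir/db_old.sql.gz"   # back-date the mtime
touch                  "$tmpdir/db_new.sql.gz"

# Delete anything modified more than 7 days ago
find "$tmpdir" -name 'db_*.sql.gz' -mtime +7 -delete

ls "$tmpdir"   # only db_new.sql.gz survives
rm -rf "$tmpdir"
```

Remember that -mtime +7 means "strictly more than 7 whole 24-hour periods ago", so a backup exactly 7 days old is still kept.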
Explanation:
pg_dump creates a SQL backup, piped through gzip for compression. find -mtime +7 -delete removes backups older than 7 days. Cron runs the job daily at 2 AM. The health-check ping notifies a monitoring service; if it fails, an alert script fires. Always capture output to a log for debugging.

CASE PROJECT 8.2: SSL Certificate Auto-Renewal (Let's Encrypt)
Scenario: Let's Encrypt certificates expire every 90 days. Automate renewal with pre and post hooks to avoid downtime.
$ sudo crontab -e
# Check certificate renewal twice daily (at 12 AM and 12 PM)
0 0,12 * * * /usr/bin/certbot renew --quiet --deploy-hook "systemctl reload nginx" --pre-hook "echo 'Renewal starting' >> /var/log/certbot.log" >> /var/log/certbot-renew.log 2>&1
# Alternative: Weekly renewal check with email notification
0 2 * * 0 /usr/bin/certbot renew --quiet && echo "SSL renewed on $(date)" | mail -s "SSL Certificate Status" admin@company.com
# For wildcard certificates (DNS challenge)
$ cat /opt/scripts/dns-renew.sh
#!/bin/bash
certbot renew --manual --preferred-challenges dns --quiet
systemctl reload nginx
systemctl reload haproxy
Explanation: certbot renew checks all installed certificates.
--deploy-hook runs after a successful renewal (here, reloading nginx); --pre-hook runs before the attempt. Logs are captured for auditing. Wildcard certificates use the DNS challenge, which requires manual intervention or DNS-provider API integration. Cron ensures certificates never expire unexpectedly.

Chapter 9: Log Analysis — grep, sed, awk Mastery
Definition: Logs are the language servers speak. grep (global regular expression print) searches for patterns, sed (stream editor) finds and replaces text, and awk processes structured data by column. Combine them with pipes (|) for powerful log-analysis pipelines.

CASE PROJECT 9.1: Nginx Error Analyzer with Alerting
Scenario: Monitor Nginx access logs for 5xx errors. Generate hourly report with top failing endpoints and IP addresses.
$ cat /opt/scripts/nginx-error-analyzer.sh
#!/bin/bash
LOG_FILE="/var/log/nginx/access.log"
REPORT_DIR="/var/reports"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$REPORT_DIR"
# Total 5xx errors in last hour
ERROR_COUNT=$(awk -v date="$(date --date='1 hour ago' '+%d/%b/%Y:%H')" '$4 ~ date && $9 >= 500' $LOG_FILE | wc -l)
if [ $ERROR_COUNT -gt 100 ]; then
echo "ALERT: High 5xx error rate: $ERROR_COUNT errors in last hour" | mail -s "High Error Rate" oncall@company.com
fi
# Top 10 failing endpoints (5xx)
awk '$9 >= 500 {print $7, $9}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 > "$REPORT_DIR/failing_endpoints_$DATE.txt"
# Top 10 IP addresses causing 5xx
awk '$9 >= 500 {print $1}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 > "$REPORT_DIR/top_ips_$DATE.txt"
# Real-time monitoring with tail -f and grep
# Use: tail -f /var/log/nginx/access.log | grep --line-buffered "HTTP/1.1\" 5[0-9][0-9]"
# One-liner for quick analysis
$ awk '$9 == 404 {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20
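The sort | uniq -c | sort -rn counting pipeline used throughout this chapter works on any line stream. A self-contained demo with fabricated request paths:

```shell
#!/bin/bash
# Count and rank repeated lines: sort groups duplicates together,
# uniq -c prefixes each group with its count, sort -rn ranks by count.
printf '%s\n' /api/users /health /api/users /api/users |
  sort | uniq -c | sort -rn
# Prints "/api/users" with count 3 first, then "/health" with count 1
# (uniq -c pads the counts with leading spaces).
```

The initial sort is essential: uniq only collapses adjacent duplicate lines, so unsorted input would undercount.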
Explanation:
awk -v date=... passes a shell variable into awk. $4 ~ date matches the timestamp field; $9 >= 500 filters 5xx errors. An alert triggers above 100 errors/hour. sort | uniq -c | sort -rn counts duplicates and sorts descending. Real-time monitoring with tail -f | grep --line-buffered shows errors as they happen.

CASE PROJECT 9.2: Failed SSH Login Intrusion Detection
Scenario: Detect brute force SSH attacks by analyzing auth.log. Block IPs with >10 failed attempts in 5 minutes.
$ cat /opt/scripts/ssh-attack-detector.sh
#!/bin/bash
AUTH_LOG="/var/log/auth.log"
THRESHOLD=10
TIME_WINDOW=5 # minutes
# Extract failed SSH attempts logged around $TIME_WINDOW minutes ago
# (matches a single log minute - an approximation of the window)
FAILED_IPS=$(grep "Failed password" $AUTH_LOG | grep "$(date --date="$TIME_WINDOW minutes ago" '+%b %d %H:%M')" | awk '{print $(NF-3)}' | sort | uniq -c | awk -v t=$THRESHOLD '$1 > t {print $2}')
# Block attacking IPs using ufw
for IP in $FAILED_IPS; do
if ! sudo ufw status | grep -q "$IP"; then
sudo ufw deny from $IP to any
echo "$(date): Blocked attacking IP $IP" >> /var/log/ssh-block.log
fi
done
# Alternative: anonymize IPs with sed for privacy compliance - run this on
# ARCHIVED copies only, never on the live auth.log the detector reads
# sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/REDACTED/g' auth.log.1 > auth.log.1.redacted
# Count unique attackers per day
grep "Failed password" $AUTH_LOG | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn | head -20
# Schedule to run every 5 minutes
$ sudo crontab -e
*/5 * * * * /opt/scripts/ssh-attack-detector.sh
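The $(NF-3) field trick can be checked against a sample auth.log line (fabricated here, with a documentation-range IP):

```shell
#!/bin/bash
# A typical sshd failure line; awk counts fields from the end:
# ssh2 = $NF, 52144 = $(NF-1), port = $(NF-2), the source IP = $(NF-3).
LINE='Mar 15 10:22:01 web1 sshd[1234]: Failed password for root from 203.0.113.7 port 52144 ssh2'

echo "$LINE" | awk '{print $(NF-3)}'   # prints: 203.0.113.7
```

Counting from the end is deliberate: fields near the start of the line (user name, "invalid user" prefixes) vary, while the trailing "from IP port N ssh2" tail is stable.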
Explanation:
grep "Failed password" finds failed SSH attempts. awk '{print $(NF-3)}' extracts the source IP (NF is the last field, so NF-3 is the fourth field from the end). The script counts attempts per IP and blocks any IP exceeding 10 failures in the window. Redacting IPs with sed on archived logs supports privacy compliance (GDPR). Cron runs the detector every 5 minutes for automatic attack mitigation.

Chapter 10: LVM (Logical Volume Manager) — Flexible Storage Management
Definition: LVM adds a layer of abstraction between physical disks and filesystems. Components: PV (Physical Volume) = a raw disk or partition, VG (Volume Group) = a pool of storage, LV (Logical Volume) = a virtual partition. Benefits: resizing without unmounting, snapshots, striping, mirroring. Commands: pvcreate, vgcreate, lvcreate, lvextend, then resize2fs (ext4) or xfs_growfs (XFS).

CASE PROJECT 10.1: Dynamic Volume Expansion for a Growing Database
Scenario: PostgreSQL database is running out of space on /var/lib/postgresql. Add new disk and extend LVM volume without downtime.
# Step 1: Add new disk to VM (e.g., /dev/sdb from cloud console)
# Step 2: Create physical volume
$ sudo pvcreate /dev/sdb
$ sudo pvs # Verify PV created
# Step 3: Extend volume group (assuming existing VG named 'vg_data')
$ sudo vgextend vg_data /dev/sdb
$ sudo vgs # Verify free space in VG
# Step 4: Extend logical volume (add 20GB)
$ sudo lvextend -L +20G /dev/vg_data/lv_postgres
$ # Or extend to use all free space: lvextend -l +100%FREE /dev/vg_data/lv_postgres
# Step 5: Resize filesystem online (no unmount needed!)
$ sudo resize2fs /dev/vg_data/lv_postgres # For ext4
# For XFS: sudo xfs_growfs /mount/point
# Step 6: Verify new size
$ df -h /var/lib/postgresql
$ sudo lvdisplay /dev/vg_data/lv_postgres
# Create LVM snapshot for backup (point-in-time)
$ sudo lvcreate -L 10G -s -n lv_postgres_snap /dev/vg_data/lv_postgres
$ sudo mount /dev/vg_data/lv_postgres_snap /mnt/snapshot
$ # Backup from snapshot, then remove: sudo lvremove vg_data/lv_postgres_snap
Explanation:
pvcreate /dev/sdb initializes the new disk as a PV. vgextend adds it to the existing VG. lvextend -L +20G grows the LV by 20 GB. resize2fs expands the filesystem online, with no downtime (critical for production). LVM snapshots provide consistent point-in-time backups without stopping database writes.

CASE PROJECT 10.2: LVM Thin Provisioning for Container Storage
Scenario: Docker/Kubernetes nodes need thin provisioning to over-allocate storage efficiently. Create thin pool for container volumes.
# Create thin pool (allocate metadata and data)
$ sudo pvcreate /dev/sdb /dev/sdc
$ sudo vgcreate vg_container /dev/sdb /dev/sdc
$ sudo lvcreate -L 1G -n thin_pool_meta vg_container
$ sudo lvcreate -L 100G -n thin_pool_data vg_container
$ sudo lvconvert --type thin-pool --poolmetadata vg_container/thin_pool_meta vg_container/thin_pool_data
# Create thin volumes (over-provisioned)
$ sudo lvcreate -V 50G -T vg_container/thin_pool_data -n container_vol1
$ sudo lvcreate -V 50G -T vg_container/thin_pool_data -n container_vol2
$ sudo lvcreate -V 50G -T vg_container/thin_pool_data -n container_vol3
# Format and mount
$ sudo mkfs.ext4 /dev/vg_container/container_vol1
$ sudo mount /dev/vg_container/container_vol1 /var/lib/docker
# Monitor thin pool usage
$ sudo lvs -a vg_container
$ sudo lvdisplay --maps vg_container/thin_pool_data
# Extend thin pool when data usage > 80%
$ sudo pvcreate /dev/sdd
$ sudo vgextend vg_container /dev/sdd
$ sudo lvextend -L +50G vg_container/thin_pool_data
Explanation: Thin provisioning allows over-allocation (here, 150 GB of volumes from a 100 GB pool).
lvconvert --type thin-pool creates the pool with a separate metadata LV. Thin volumes consume space only as data is written. Monitor pool usage with lvs -a and extend the pool online when it nears capacity. Ideal for container storage, where many volumes have low actual usage.

Chapter 11: Environment Variables & Configuration Management (12-Factor App)
Definition: Environment variables store configuration outside the code (the 12-Factor App methodology). Key concepts: export VAR=value (set), $VAR (access), env (list all), and .env files (load many variables at once). Never commit secrets to git; use .gitignore. In production, use a secret manager (HashiCorp Vault, AWS Secrets Manager).

CASE PROJECT 11.1: Multi-Environment Deployment with .env Files
Scenario: Same application runs in development, staging, and production with different databases, API keys, and log levels. Implement environment-specific configuration.
# Directory structure
/opt/myapp/
├── .env.base
├── .env.development
├── .env.staging
├── .env.production
└── load-config.sh
$ cat .env.base
APP_NAME=MyApplication
LOG_FORMAT=json
$ cat .env.development
APP_ENV=development
DATABASE_URL=postgresql://localhost/dev_db
DEBUG=true
LOG_LEVEL=debug
$ cat .env.production
APP_ENV=production
DATABASE_URL=postgresql://prod-db.internal:5432/app_prod
DEBUG=false
LOG_LEVEL=warn
API_KEY=prod-secret-key
$ cat load-config.sh
#!/bin/bash
ENV=${1:-development}
CONFIG_DIR="/opt/myapp"
if [ ! -f "$CONFIG_DIR/.env.$ENV" ]; then
echo "Environment file .env.$ENV not found!"
exit 1
fi
# Load base then environment-specific (override)
set -a
source "$CONFIG_DIR/.env.base"
source "$CONFIG_DIR/.env.$ENV"
set +a
# Validate required variables
for var in DATABASE_URL APP_ENV; do
if [ -z "${!var}" ]; then
echo "Required variable $var is not set for the $ENV environment!"
exit 1
fi
done
echo "Loaded configuration for $ENV environment"
echo " DATABASE_URL: ${DATABASE_URL}"
# Usage: source load-config.sh production
# Then run: node server.js
# Docker run with env file
$ docker run --env-file .env.production myapp:latest
# Kubernetes secret from env file
$ kubectl create secret generic app-config --from-env-file=.env.production
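The set -a export trick at the core of load-config.sh can be demonstrated end to end with a throwaway .env file:

```shell
#!/bin/bash
# Show that 'set -a' makes sourced assignments visible to child processes.
tmpenv=$(mktemp)
cat > "$tmpenv" <<'EOF'
APP_ENV=demo
LOG_LEVEL=debug
EOF

set -a              # every variable assigned from here on is exported
source "$tmpenv"
set +a              # stop auto-exporting

# A child process (a fresh sh) inherits the exported values
sh -c 'echo "child sees APP_ENV=$APP_ENV"'   # prints: child sees APP_ENV=demo
rm -f "$tmpenv"
```

Without set -a, sourcing the file would set the variables only in the current shell; the application process started afterwards would never see them.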
Explanation:
set -a automatically exports every variable assigned in the sourced files. ${!var} is indirect expansion (it yields the value of the variable whose name is stored in $var). Validation ensures required settings exist before the app starts. Never commit .env files to git; add them to .gitignore. Docker and Kubernetes both consume env files natively.

CASE PROJECT 11.2: Secure Secrets Management with HashiCorp Vault
Scenario: Production secrets (API keys, database passwords) cannot be stored in files or environment variables. Integrate with HashiCorp Vault for dynamic secrets.
# Start Vault dev server (testing only)
$ vault server -dev
$ export VAULT_ADDR='http://127.0.0.1:8200'
# Store a secret
$ vault kv put secret/myapp DATABASE_PASSWORD="s3cr3t" API_KEY="abc123"
# Retrieve secret
$ vault kv get -field=DATABASE_PASSWORD secret/myapp
# Application script to fetch secrets
$ cat /opt/scripts/fetch-secrets.sh
#!/bin/bash
export DATABASE_PASSWORD=$(vault kv get -field=DATABASE_PASSWORD secret/myapp)
export API_KEY=$(vault kv get -field=API_KEY secret/myapp)
exec node /opt/myapp/server.js
# systemd service with Vault integration
$ sudo nano /etc/systemd/system/myapp.service
[Service]
Environment="VAULT_ADDR=http://vault.internal:8200"
# systemd does not expand $(...) itself, so wrap the token login in a shell
ExecStartPre=/bin/sh -c '/usr/local/bin/vault login -method=token token="$(cat /etc/vault-token)"'
ExecStart=/opt/scripts/fetch-secrets.sh
# Environment variable best practices
$ # Never hardcode secrets in scripts
$ # Never commit .env files to git
$ echo ".env" >> .gitignore
$ echo "*.env" >> .gitignore
$ echo "secrets/" >> .gitignore
# Validate no secrets in git history
$ git log -p | grep -i "password\|secret\|key\|token"
Explanation: Vault provides dynamic, encrypted secrets with audit logging.
vault kv put stores secrets and vault kv get retrieves them, keeping credentials out of code and version control. systemd can fetch secrets before starting a service. Always gitignore env files and scan git history for accidentally committed secrets. This is production-grade security.