Linux for DevOps: Complete Beginner to Advanced Guide (With Real Projects & Commands)
A complete beginner's guide of 11 in-depth chapters covering every Linux skill you need to start your DevOps career, with real examples, diagrams, and projects.

LINUX FOR DEVOPS — COMPLETE 11 CHAPTERS · 22+ REAL PROJECTS

Linux for DevOps
Full Course with Case Scenarios
11 exhaustive chapters · 2+ production case projects per chapter · Line-by-line command explanations · Copy-paste ready
Chapter 1: Linux Kernel — The Heart of DevOps
Definition: Linux is an open-source, Unix-like kernel created by Linus Torvalds in 1991. It manages hardware, processes, memory, file systems, and I/O. Linux powers roughly 96% of cloud servers, all of the top 500 supercomputers, Android devices, and every major container runtime (Docker, Kubernetes).
1.1 Essential System Information Commands
$ uname -r
5.15.0-91-generic
$ cat /etc/os-release
NAME="Ubuntu" VERSION="22.04.3 LTS"
$ hostnamectl
Static hostname: ubuntu-server
Line-by-Line Explanation:
• uname -r — displays the kernel release version. DevOps engineers check this before deploying kernel modules.
• cat /etc/os-release — shows the distribution name and version. Used in automation scripts for distro-specific logic (apt vs yum).
• hostnamectl — shows the system hostname and OS details in one command.

CASE PROJECT 1.1: Production Server Audit Script
Scenario: You’re a DevOps engineer joining a company. You need to audit 100 production servers to inventory OS versions, kernel, CPU, RAM, and disk usage for capacity planning.
$ cat server-audit.sh
#!/bin/bash
echo "=== SERVER AUDIT REPORT ==="
echo "Hostname: $(hostname)"
echo "Kernel: $(uname -r)"
echo "OS: $(cat /etc/os-release | grep PRETTY_NAME | cut -d'"' -f2)"
echo "CPU Cores: $(nproc)"
echo "RAM: $(free -h | awk '/^Mem:/ {print $2}')"
echo "Disk /: $(df -h / | awk 'NR==2 {print $5}')"
echo "Uptime: $(uptime -p)"
Solution Logic:
grep PRETTY_NAME | cut -d'"' -f2 extracts the human-readable OS name. free -h | awk '/^Mem:/ {print $2}' gets total RAM. For a bulk audit across all servers, run: for server in $(cat servers.txt); do ssh "$server" 'bash -s' < server-audit.sh; done

CASE PROJECT 1.2: Kernel Version Compliance Checker
Scenario: Security team mandates kernel version 5.15+ across all production servers due to a CVE vulnerability. Write a script that flags non-compliant servers and fails CI/CD pipelines.
$ cat kernel-compliance.sh
#!/bin/bash
KERNEL=$(uname -r | cut -d'-' -f1)
REQUIRED="5.15"
if [[ "$(printf '%s\n' "$REQUIRED" "$KERNEL" | sort -V | head -n1)" != "$REQUIRED" ]]; then
echo "FAIL: Kernel $KERNEL is below $REQUIRED"
exit 1
else
echo "PASS: Kernel $KERNEL meets requirement"
exit 0
fi
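The sort -V comparison used above is easy to sanity-check interactively. A minimal sketch with hypothetical version strings (ver_ge is a helper introduced here for illustration):

```shell
#!/bin/bash
# ver_ge returns success (0) if version $1 is >= version $2.
# sort -V sorts versions naturally, so the smaller version comes first;
# if the required version ($2) is first, $1 meets or exceeds it.
ver_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

ver_ge "5.15.0" "5.15" && echo "5.15.0 meets the 5.15 baseline"
ver_ge "5.4.0"  "5.15" || echo "5.4.0 is below the 5.15 baseline"
```

Note that plain lexical comparison would get this wrong (the string "5.4.0" sorts after "5.15"), which is exactly why the compliance script uses sort -V.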
Logic Breakdown:
cut -d'-' -f1 strips the build suffix (e.g., "-91-generic"). sort -V performs natural version sorting. If the kernel version sorts below the required version, the script exits with an error (exit 1), blocking deployment pipelines. Integrate it as a pre-deployment check in Jenkins or GitHub Actions.

Chapter 2: Linux Filesystem Hierarchy & Permission Octal System
Definition: In Linux, everything is a file — including directories, devices, sockets, and processes. The Filesystem Hierarchy Standard (FHS) defines directories such as /etc (configuration), /var/log (logs), and /proc (process info). Permissions: read (r) = 4, write (w) = 2, execute (x) = 1. An octal mode combines these per owner/group/others (e.g., 755 = rwxr-xr-x, 644 = rw-r--r--).

CASE PROJECT 2.1: WordPress Security Hardening
Scenario: Your WordPress site was hacked because of world-writable files. Implement proper file permissions to prevent unauthorized modifications.
$ sudo find /var/www/html -type f -exec chmod 644 {} \;
$ sudo find /var/www/html -type d -exec chmod 755 {} \;
$ sudo chown -R www-data:www-data /var/www/html
$ sudo chmod 600 /var/www/html/wp-config.php
$ sudo chmod 640 /var/www/html/.htaccess
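You can confirm what a given octal mode actually produces with stat, which prints permissions in both octal (%a) and symbolic (%A) form. A quick scratch-directory demo (the file name mirrors the project above but the path is throwaway):

```shell
#!/bin/bash
# Create a scratch file, apply the restrictive mode from the project,
# and read the permissions back in octal and symbolic notation.
tmpdir=$(mktemp -d)
touch "$tmpdir/wp-config.php"
chmod 600 "$tmpdir/wp-config.php"

# %a = octal mode, %A = symbolic mode, %n = file name
stat -c '%a %A %n' "$tmpdir/wp-config.php"   # prints: 600 -rw------- <path>

rm -rf "$tmpdir"
```

stat -c is the GNU coreutils form used on Linux; on BSD/macOS the equivalent is stat -f.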
Detailed Permission Logic:
• find -type f -exec chmod 644 — all files become rw-r--r-- (owner read/write, group/others read only).
• find -type d -exec chmod 755 — directories become rwxr-xr-x (the execute bit is required to traverse a directory).
• chown -R www-data:www-data — the web server user owns all files.
• chmod 600 wp-config.php — the database-credentials file is readable and writable by its owner only.
• chmod 640 .htaccess — the owner can write, the group can only read, others get no access.

CASE PROJECT 2.2: Shared Development Environment with SGID
Scenario: 5 developers need to collaborate on a shared project directory. Any file created should automatically be writable by all team members.
$ sudo groupadd devteam
$ sudo usermod -aG devteam alice
$ sudo usermod -aG devteam bob
$ sudo usermod -aG devteam carol
$ sudo mkdir -p /opt/project
$ sudo chown -R :devteam /opt/project
$ sudo chmod 2775 /opt/project
$ sudo chmod 664 /opt/project/*
$ umask 002 # Add to each user's .bashrc
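Why umask 002 yields group-writable files: the umask is masked out of the default creation mode (666 for regular files, 777 for directories). A quick demonstration in a scratch directory:

```shell
#!/bin/bash
# Show how the umask determines the mode of newly created files.
tmpdir=$(mktemp -d)
(
  cd "$tmpdir"

  umask 002                   # 666 minus 002 -> new files get 664
  touch shared.txt
  stat -c '%a %n' shared.txt  # prints: 664 shared.txt

  umask 022                   # the common default: 666 minus 022 -> 644
  touch private.txt
  stat -c '%a %n' private.txt # prints: 644 private.txt
)
rm -rf "$tmpdir"
```

This is why the project adds umask 002 to each developer's .bashrc: without it, files land as 644 and teammates in devteam cannot edit them.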
Explanation: The SGID bit (the leading 2 in 2775) makes new files and subdirectories inherit the group 'devteam'.
chmod 2775 = rwxrwsr-x (the s marks SGID). umask 002 ensures new files are created with 664 permissions. This setup eliminates permission conflicts without requiring root intervention.

Chapter 3: Bash Scripting — Variables, Conditionals, Loops, Functions
Definition: Bash (Bourne Again SHell) is the most common Linux shell and scripting language. Scripts start with a #!/bin/bash shebang. Key concepts: variables ($VAR), command substitution ($(command)), conditionals (if/then/else), loops (for/while/until), functions, and exit codes (0 = success).

CASE PROJECT 3.1: Deployment Rollback System with 5-Version Retention
Scenario: Application deployment failed in production. Create a script that maintains last 5 versions and can rollback to any previous version instantly.
$ cat deploy-manager.sh
#!/bin/bash
set -euo pipefail
APP_DIR="/opt/myapp"
BACKUP_DIR="/opt/backups"
VERSION=$(date +%Y%m%d_%H%M%S)
# Backup current version before deployment
backup() {
tar -czf "$BACKUP_DIR/app_$VERSION.tar.gz" -C "$APP_DIR" .
echo "Backup created: app_$VERSION.tar.gz"
}
# Keep only last 5 backups
cleanup() {
ls -t $BACKUP_DIR/app_*.tar.gz | tail -n +6 | xargs -r rm -f
}
# Rollback to specific version
rollback() {
local backup_file=$1
if [ ! -f "$BACKUP_DIR/$backup_file" ]; then
echo "Backup $backup_file not found!"
return 1
fi
rm -rf "${APP_DIR:?}"/*
tar -xzf "$BACKUP_DIR/$backup_file" -C "$APP_DIR"
sudo systemctl restart myapp
echo "Rollback to $backup_file complete"
}
case ${1:-} in
backup) backup; cleanup;;
rollback) rollback "$2";;
*) echo "Usage: $0 {backup|rollback <backup-file>}"; exit 1;;
esac
Complete Logic Breakdown:
• set -euo pipefail — exit on any error, unset variable, or pipe failure (a production best practice).
• date +%Y%m%d_%H%M%S — produces a timestamp like 20250315_143022 for unique backup names.
• ls -t | tail -n +6 | xargs -r rm -f — sorts by modification time, skips the newest 5, deletes the rest.
• case ${1:-} — dispatches on the command-line argument (backup/rollback).
• Usage: ./deploy-manager.sh backup before a deployment; ./deploy-manager.sh rollback app_20250315_143022.tar.gz to restore.

CASE PROJECT 3.2: Automated Log Rotator with Size Threshold
Scenario: Application logs grow too fast and fill up disk space. Create a script that rotates logs when size exceeds 100MB and compresses old logs.
$ cat log-rotator.sh
#!/bin/bash
LOG_FILE="/var/log/myapp/application.log"
MAX_SIZE_MB=100
MAX_SIZE_BYTES=$((MAX_SIZE_MB * 1024 * 1024))
if [ -f "$LOG_FILE" ]; then
CURRENT_SIZE=$(stat -c%s "$LOG_FILE")
if [ $CURRENT_SIZE -gt $MAX_SIZE_BYTES ]; then
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
mv "$LOG_FILE" "${LOG_FILE}.${TIMESTAMP}"
gzip "${LOG_FILE}.${TIMESTAMP}"
touch "$LOG_FILE"
chown appuser:appuser "$LOG_FILE"
chmod 644 "$LOG_FILE"
# Send SIGHUP to reopen log file (for apps that support it)
kill -HUP $(pgrep -f "myapp") 2>/dev/null || true
echo "$(date): Log rotated (was ${CURRENT_SIZE} bytes)" >> /var/log/log-rotator.log
fi
fi
# Delete logs older than 30 days
find /var/log/myapp/ -name "*.gz" -mtime +30 -delete
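The stat -c%s size check at the heart of the rotator can be exercised safely against a throwaway file, with the threshold lowered to bytes for the demo:

```shell
#!/bin/bash
# Miniature version of the rotation logic: rotate when the file
# exceeds a (deliberately tiny) byte threshold.
tmpdir=$(mktemp -d)
LOG="$tmpdir/app.log"
MAX_BYTES=1024

head -c 2048 /dev/zero > "$LOG"   # write ~2 KB so we exceed the threshold

SIZE=$(stat -c%s "$LOG")
if [ "$SIZE" -gt "$MAX_BYTES" ]; then
  mv "$LOG" "$LOG.$(date +%Y%m%d_%H%M%S)"
  gzip "$LOG".*                   # compress the rotated copy
  touch "$LOG"                    # recreate an empty active log
fi

ls "$tmpdir"                      # app.log plus one timestamped .gz file
rm -rf "$tmpdir"
```

The real script adds ownership, permissions, and the SIGHUP reopen step on top of this core.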
Explanation:
stat -c%s gets the file size in bytes. If it exceeds 100 MB, the log is renamed with a timestamp and compressed with gzip. kill -HUP sends SIGHUP so the application reopens its log file (the standard log-rotation pattern). find -mtime +30 -delete removes compressed logs older than 30 days automatically.

Chapter 4: User Management, Sudo Access & SSH Hardening
Definition: Linux is a multi-user OS. Each user has a UID (User ID): UID 0 = root (superuser), UIDs 1-999 = system accounts, UIDs 1000+ = human users. Key files: /etc/passwd (user accounts), /etc/shadow (hashed passwords), /etc/sudoers (sudo permissions). SSH keys provide passwordless authentication using public-key cryptography.

CASE PROJECT 4.1: Bastion Host (Jump Server) Setup
Scenario: You need to secure access to 50 internal production servers. Only allow SSH through a single bastion host (jump server) with no direct internet access to internal servers.
# On Bastion Host:
$ sudo useradd -m -s /bin/bash jumpser
$ sudo mkdir -p /home/jumpser/.ssh
$ sudo chmod 700 /home/jumpser/.ssh
$ echo "AllowUsers jumpser" | sudo tee -a /etc/ssh/sshd_config
$ sudo systemctl restart sshd
# On Client Machine (~/.ssh/config):
Host bastion
HostName bastion.company.com
User jumpser
IdentityFile ~/.ssh/id_ed25519
Host 10.0.*.*
ProxyJump bastion
User ubuntu
IdentityFile ~/.ssh/id_ed25519
# Now connect directly:
$ ssh 10.0.1.10 # Automatically tunnels through bastion!
Explanation: Only the user 'jumpser' can SSH directly to the bastion; the AllowUsers directive restricts SSH access. On the client, the ProxyJump directive tunnels SSH through the bastion automatically. Internal servers have no public IP and accept SSH only from the bastion's private IP. This creates a secure DMZ pattern.

CASE PROJECT 4.2: Automated SSH Key Rotation (30-Day Security Policy)
Scenario: Security policy requires SSH keys to be rotated every 30 days. Automate key rotation across 100+ servers without locking yourself out.
$ cat rotate-ssh-keys.sh
#!/bin/bash
set -e
KEY_COMMENT="devops-$(date +%Y%m%d)"
KEY_PATH="$HOME/.ssh/id_ed25519_$KEY_COMMENT"
# Generate new key pair
ssh-keygen -t ed25519 -f "$KEY_PATH" -N "" -C "$KEY_COMMENT"
# Deploy to all servers
for server in $(cat servers.txt); do
echo "Rotating keys on $server..."
ssh-copy-id -i "${KEY_PATH}.pub" "$server"
# Remove old 'devops-' keys, but keep the one we just installed
ssh "$server" "sed -i '/devops-/{/$KEY_COMMENT/!d}' ~/.ssh/authorized_keys"
# Verify we can still connect
ssh -i "$KEY_PATH" "$server" "echo 'Connected with new key'"
done
# Update local SSH config to use new key
sed -i "s|IdentityFile.*id_ed25519|IdentityFile $KEY_PATH|g" ~/.ssh/config
echo "Key rotation complete. Old keys removed from all servers."
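The key-pruning sed expression is worth testing locally before pointing it at real authorized_keys files. A sketch with a throwaway file and fabricated key entries:

```shell
#!/bin/bash
# Demonstrate pruning: delete every 'devops-' key EXCEPT the one
# carrying the current rotation comment.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ssh-ed25519 AAAAold1 devops-20240101
ssh-ed25519 AAAAold2 devops-20240201
ssh-ed25519 AAAAnew1 devops-20250315
ssh-ed25519 AAAAsvc1 backup-service
EOF

KEY_COMMENT="devops-20250315"
# Address /devops-/ selects rotation keys; inside the block,
# /$KEY_COMMENT/!d deletes all of them except the current one.
sed -i "/devops-/{/$KEY_COMMENT/!d}" "$tmp"

cat "$tmp"   # keeps the devops-20250315 key and the unrelated backup-service key
rm -f "$tmp"
```

Non-rotation keys (here, backup-service) are untouched because they never match the /devops-/ address.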
Explanation: The script generates a new Ed25519 key pair with a dated comment. ssh-copy-id adds the new public key to each server, then sed removes every old 'devops-' key while keeping the one carrying the current comment. A verification step confirms the new key works on each server, preventing lockout.

Chapter 5: Process Management, Signals & systemd Services
๐ Definition: A process is a running instance of a program. Each process has a PID (Process ID). Signals are software interrupts sent to processes: SIGTERM (15) = graceful termination, SIGKILL (9) = force kill, SIGHUP (1) = reload config. systemd is the modern init system that manages services, dependencies, and logging.
CASE PROJECT 5.1: Auto-Healing Service with Restart Limits
Scenario: Your Node.js application crashes randomly due to memory issues. Configure systemd to auto-restart the service but prevent infinite crash loops.
$ sudo nano /etc/systemd/system/myapp.service
[Unit]
Description=My Node.js Application
After=network.target
Wants=docker.service
[Service]
Type=simple
User=appuser
Group=appuser
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/node /opt/myapp/server.js
Environment="NODE_ENV=production"
Environment="PORT=8080"
Restart=on-failure
RestartSec=10
StartLimitBurst=5
StartLimitIntervalSec=60
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now myapp
$ sudo systemctl status myapp
Explanation:
Restart=on-failure restarts only on a crash (not a normal stop). RestartSec=10 waits 10 seconds between restarts. StartLimitBurst=5 with StartLimitIntervalSec=60 allows at most 5 restarts in 60 seconds; beyond that, systemd stops trying. LimitNOFILE=65536 raises the file-descriptor limit for high-traffic apps.

CASE PROJECT 5.2: Memory Leak Detector with Auto-Restart
Scenario: Your application has a known memory leak. Monitor memory usage and restart automatically when it exceeds 2GB, before the server runs out of memory.
$ cat /opt/scripts/memory-monitor.sh
#!/bin/bash
SERVICE_NAME="myapp"
MEMORY_LIMIT_MB=2048
PID=$(pgrep -f "node.*server.js" | head -1)
if [ -n "$PID" ]; then
MEM_KB=$(ps -o rss= -p $PID 2>/dev/null)
if [ -n "$MEM_KB" ]; then
MEM_MB=$((MEM_KB / 1024))
if [ $MEM_MB -gt $MEMORY_LIMIT_MB ]; then
echo "$(date): Memory limit exceeded (${MEM_MB}MB > ${MEMORY_LIMIT_MB}MB), restarting..."
sudo systemctl restart $SERVICE_NAME
fi
fi
fi
# Schedule with cron every 5 minutes
$ sudo crontab -e
*/5 * * * * /opt/scripts/memory-monitor.sh >> /var/log/memory-monitor.log 2>&1
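The ps -o rss= reading can be tried against any live PID — here the current shell — to see the KB-to-MB conversion the monitor relies on. The 1 MB limit below is deliberately tiny so the comparison branch fires:

```shell
#!/bin/bash
# Read the Resident Set Size (RSS) of the current shell, in KB,
# then apply the monitor's conversion and threshold check.
MEM_KB=$(ps -o rss= -p $$)
MEM_MB=$((MEM_KB / 1024))
echo "Shell PID $$ is using ${MEM_KB} KB (~${MEM_MB} MB)"

LIMIT_MB=1   # artificially low limit, just to demonstrate the trigger
if [ "$MEM_MB" -ge "$LIMIT_MB" ]; then
  echo "Limit of ${LIMIT_MB} MB exceeded - a real monitor would restart the service here"
fi
```

RSS counts physical memory actually resident, which is what you want for leak detection; virtual size (vsz) would overstate usage.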
Explanation:
pgrep -f "node.*server.js" finds the process by its full command line. ps -o rss= prints the Resident Set Size in KB; the script converts it to MB and compares it with the limit. If exceeded, systemctl restart gracefully restarts the service. Cron runs the check every 5 minutes for proactive monitoring.

Chapter 6: Networking — IP, Ports, DNS, Firewall (UFW/iptables)
Definition: Core Linux networking commands: ip addr (show interfaces), ss (socket statistics), ping (ICMP reachability), dig (DNS lookup), curl (HTTP testing). UFW (Uncomplicated Firewall) is a friendly frontend for iptables. Common ports: 22 = SSH, 80 = HTTP, 443 = HTTPS, 3306 = MySQL, 5432 = PostgreSQL, 6379 = Redis.

CASE PROJECT 6.1: DDoS Mitigation with Rate Limiting
Scenario: Your web server is under SYN flood attack. Implement rate limiting to protect against DDoS while allowing legitimate traffic.
# Basic UFW rate limiting for SSH
$ sudo ufw limit ssh
# Advanced iptables rate limiting for HTTP/HTTPS
$ sudo iptables -A INPUT -p tcp --dport 80 -m limit --limit 25/minute --limit-burst 50 -j ACCEPT
$ sudo iptables -A INPUT -p tcp --dport 80 -j DROP
$ sudo iptables -A INPUT -p tcp --dport 443 -m limit --limit 25/minute --limit-burst 50 -j ACCEPT
$ sudo iptables -A INPUT -p tcp --dport 443 -j DROP
# Protect against SYN flood specifically
$ sudo iptables -A INPUT -p tcp --syn -m limit --limit 12/s --limit-burst 24 -j ACCEPT
$ sudo iptables -A INPUT -p tcp --syn -j DROP
# Save iptables rules persistently
$ sudo apt install iptables-persistent
$ sudo netfilter-persistent save
Explanation: ufw limit allows 6 connections per 30 seconds per source IP. The iptables rules rate-limit HTTP to 25 packets/minute with a burst of 50. The SYN-flood rules limit SYN packets to 12/second. Excess packets are dropped, preventing resource exhaustion while legitimate traffic passes.
CASE PROJECT 6.2: Internal Service Discovery Without DNS
Scenario: No DNS server is available in your development environment. Map internal services using /etc/hosts for service discovery.
# Add internal service mappings
$ echo "10.0.1.10 db.postgres.internal" | sudo tee -a /etc/hosts
$ echo "10.0.1.20 cache.redis.internal" | sudo tee -a /etc/hosts
$ echo "10.0.1.30 api.auth.internal" | sudo tee -a /etc/hosts
# Test connectivity
$ ping -c 2 db.postgres.internal
$ nc -zv db.postgres.internal 5432
# Ansible playbook to distribute hosts file across cluster
$ cat distribute-hosts.yml
- hosts: all
  tasks:
    - name: Add internal service mappings
      lineinfile:
        path: /etc/hosts
        line: "{{ item }}"
      loop:
        - "10.0.1.10 db.postgres.internal"
        - "10.0.1.20 cache.redis.internal"
Explanation: Manual hostname resolution via /etc/hosts bypasses DNS, which is useful for container networking, development environments, and air-gapped networks. nc -zv tests whether a port is open and reachable. The Ansible playbook distributes the same host mappings across the entire cluster for consistent service discovery.

Chapter 7: Package Management — apt, yum, dnf, and the DevOps Toolchain
Definition: Package managers automate software installation, updates, and dependency resolution. Ubuntu/Debian use apt (.deb packages); CentOS/RHEL use yum or dnf (.rpm packages). Key commands: apt update (refresh indexes), apt install, apt upgrade, apt autoremove (clean orphans).

CASE PROJECT 7.1: Offline Package Mirror for Air-Gapped Servers
Scenario: You have servers in a secure, air-gapped environment with no internet access. Download all required packages on an internet-connected machine and transfer them.
# On internet-connected machine (download only)
$ sudo apt-get update
$ sudo apt-get install --download-only nginx docker.io postgresql-14 redis
$ sudo tar -czf offline-packages.tar.gz /var/cache/apt/archives/*.deb
# Transfer tarball to air-gapped server (via USB/DVD)
$ sudo tar -xzf offline-packages.tar.gz -C /var/cache/apt/archives/
$ sudo dpkg -i /var/cache/apt/archives/*.deb
# Handle dependencies automatically (-f is shorthand for --fix-broken)
$ sudo apt-get install -f
# For RPM-based systems (CentOS/RHEL)
$ sudo yum install --downloadonly --downloaddir=/tmp/packages nginx docker
$ sudo rpm -ivh /tmp/packages/*.rpm
Explanation:
--download-only fetches packages without installing them. The tarball contains all the .deb files, dependencies included. On the air-gapped server, extract and run dpkg -i to install, then apt-get install -f to resolve any missing dependencies. Critical for secure/classified environments.

CASE PROJECT 7.2: Complete DevOps Workstation Provisioning Script
Scenario: New DevOps engineers join the team. Create a one-click script that installs all required tools: Docker, kubectl, Terraform, Ansible, Helm, AWS CLI.
$ cat provision-devops-tools.sh
#!/bin/bash
set -e
# Update system
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git vim jq unzip htop net-tools
# Install Docker
curl -fsSL https://get.docker.com | bash
sudo usermod -aG docker $USER
# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Install Terraform
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform -y
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Install AWS CLI v2
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install
# Install Ansible
sudo apt install -y python3-pip
pip3 install ansible
# Verify installations
echo "Docker: $(docker --version)"
echo "kubectl: $(kubectl version --client)"
echo "Terraform: $(terraform version)"
echo "Helm: $(helm version)"
echo "AWS CLI: $(aws --version)"
echo "Ansible: $(ansible --version | head -1)"
echo "DevOps workstation provisioned! Please log out and back in for Docker group permissions to apply."
Explanation: A one-click setup for the complete DevOps toolchain: Docker from the official convenience script, kubectl from the Kubernetes release server, Terraform from the HashiCorp APT repo, Helm from its official install script, AWS CLI v2 from Amazon, and Ansible via pip. The verification step confirms every tool installed correctly, so new engineers can start working immediately.
Chapter 8: Cron Jobs — Automating Scheduled Tasks
Definition: Cron is a time-based job scheduler. Syntax: minute hour day month weekday command. Special forms: */5 = every 5 units, 1-5 = a range, 0,12 = specific values, @reboot = run at startup. Always redirect output (command >> /var/log/job.log 2>&1) so failures are captured, and remember that cron runs with a minimal environment and PATH.

CASE PROJECT 8.1: Automated PostgreSQL Backup with Retention Policy
Scenario: Production database must be backed up daily at 2 AM. Keep 7 days of backups, compress them, and optionally upload to S3.
$ cat /opt/scripts/postgres-backup.sh
#!/bin/bash
BACKUP_DIR="/backups/postgres"
DB_NAME="production_db"
DB_USER="backup_user"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Perform backup
PGPASSWORD="secret" pg_dump -U $DB_USER -h localhost $DB_NAME | gzip > "$BACKUP_DIR/db_${DATE}.sql.gz"  # prefer ~/.pgpass or a secrets manager in production
# Delete backups older than 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete
# Optional: Upload to S3 (requires AWS CLI configured)
# aws s3 cp "$BACKUP_DIR/db_${DATE}.sql.gz" s3://mybucket/backups/
echo "$(date): Backup completed - db_${DATE}.sql.gz" >> /var/log/backup.log
# Add to crontab
$ sudo crontab -e
# Daily backup at 2:00 AM
0 2 * * * /opt/scripts/postgres-backup.sh >> /var/log/backup.log 2>&1
# Health check every 5 minutes (monitoring)
*/5 * * * * curl -fsS https://hc-ping.com/your-uuid || /opt/scripts/alert.sh
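The find -mtime +7 retention rule can be verified without waiting a week by back-dating files with touch -d (GNU touch). A scratch-directory sketch:

```shell
#!/bin/bash
# Create one "old" and one "new" backup file, then apply the exact
# retention expression the backup script uses.
tmpdir=$(mktemp -d)
touch -d '10 days ago' "$tmpdir/db_old.sql.gz"   # back-date the mtime
touch                  "$tmpdir/db_new.sql.gz"

# Delete anything modified more than 7 days ago
find "$tmpdir" -name 'db_*.sql.gz' -mtime +7 -delete

ls "$tmpdir"   # only db_new.sql.gz survives
rm -rf "$tmpdir"
```

Remember that -mtime +7 means "strictly more than 7 whole 24-hour periods ago", so a backup exactly 7 days old is still kept.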
Explanation:
pg_dump creates a SQL backup, piped through gzip for compression. find -mtime +7 -delete removes backups older than 7 days. Cron runs the job daily at 2 AM. The health-check ping notifies a monitoring service; if it fails, an alert script fires. Always capture output to a log for debugging.

CASE PROJECT 8.2: SSL Certificate Auto-Renewal (Let's Encrypt)
Scenario: Let's Encrypt certificates expire every 90 days. Automate renewal with pre and post hooks to avoid downtime.
$ sudo crontab -e
# Check certificate renewal twice daily (at 12 AM and 12 PM)
0 0,12 * * * /usr/bin/certbot renew --quiet --deploy-hook "systemctl reload nginx" --pre-hook "echo 'Renewal starting' >> /var/log/certbot.log" >> /var/log/certbot-renew.log 2>&1
# Alternative: Weekly renewal check with email notification
0 2 * * 0 /usr/bin/certbot renew --quiet && echo "SSL renewed on $(date)" | mail -s "SSL Certificate Status" admin@company.com
# For wildcard certificates (DNS challenge)
$ cat /opt/scripts/dns-renew.sh
#!/bin/bash
certbot renew --manual --preferred-challenges dns --quiet
systemctl reload nginx
systemctl reload haproxy
Explanation: certbot renew checks all installed certificates.
--deploy-hook runs after a successful renewal (here, reloading nginx); --pre-hook runs before the attempt. Logs are captured for auditing. Wildcard certificates use the DNS challenge, which requires manual intervention or DNS-provider API integration. Cron ensures certificates never expire unexpectedly.

Chapter 9: Log Analysis — grep, sed, awk Mastery
Definition: Logs are the language servers speak. grep (global regular expression print) searches for patterns, sed (stream editor) finds and replaces text, and awk processes structured data by column. Combine them with pipes (|) for powerful log-analysis pipelines.

CASE PROJECT 9.1: Nginx Error Analyzer with Alerting
Scenario: Monitor Nginx access logs for 5xx errors. Generate hourly report with top failing endpoints and IP addresses.
$ cat /opt/scripts/nginx-error-analyzer.sh
#!/bin/bash
LOG_FILE="/var/log/nginx/access.log"
REPORT_DIR="/var/reports"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$REPORT_DIR"
# Total 5xx errors in last hour
ERROR_COUNT=$(awk -v date="$(date --date='1 hour ago' '+%d/%b/%Y:%H')" '$4 ~ date && $9 >= 500' $LOG_FILE | wc -l)
if [ $ERROR_COUNT -gt 100 ]; then
echo "ALERT: High 5xx error rate: $ERROR_COUNT errors in last hour" | mail -s "High Error Rate" oncall@company.com
fi
# Top 10 failing endpoints (5xx)
awk '$9 >= 500 {print $7, $9}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 > "$REPORT_DIR/failing_endpoints_$DATE.txt"
# Top 10 IP addresses causing 5xx
awk '$9 >= 500 {print $1}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 > "$REPORT_DIR/top_ips_$DATE.txt"
# Real-time monitoring with tail -f and grep
# Use: tail -f /var/log/nginx/access.log | grep --line-buffered "HTTP/1.1\" 5[0-9][0-9]"
# One-liner for quick analysis
$ awk '$9 == 404 {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20
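The sort | uniq -c | sort -rn counting pipeline used throughout this chapter works on any line stream. A self-contained demo with fabricated request paths:

```shell
#!/bin/bash
# Count and rank repeated lines: sort groups duplicates together,
# uniq -c prefixes each group with its count, sort -rn ranks by count.
printf '%s\n' /api/users /health /api/users /api/users |
  sort | uniq -c | sort -rn
# Prints "/api/users" with count 3 first, then "/health" with count 1
# (uniq -c pads the counts with leading spaces).
```

The initial sort is essential: uniq only collapses adjacent duplicate lines, so unsorted input would undercount.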
Explanation:
awk -v date=... passes a shell variable into awk. $4 ~ date matches the timestamp field; $9 >= 500 filters 5xx errors. An alert triggers above 100 errors/hour. sort | uniq -c | sort -rn counts duplicates and sorts descending. Real-time monitoring with tail -f | grep --line-buffered shows errors as they happen.

CASE PROJECT 9.2: Failed SSH Login Intrusion Detection
Scenario: Detect brute force SSH attacks by analyzing auth.log. Block IPs with >10 failed attempts in 5 minutes.
$ cat /opt/scripts/ssh-attack-detector.sh
#!/bin/bash
AUTH_LOG="/var/log/auth.log"
THRESHOLD=10
TIME_WINDOW=5 # minutes
# Extract failed SSH attempts logged around $TIME_WINDOW minutes ago
# (matches a single log minute - an approximation of the window)
FAILED_IPS=$(grep "Failed password" $AUTH_LOG | grep "$(date --date="$TIME_WINDOW minutes ago" '+%b %d %H:%M')" | awk '{print $(NF-3)}' | sort | uniq -c | awk -v t=$THRESHOLD '$1 > t {print $2}')
# Block attacking IPs using ufw
for IP in $FAILED_IPS; do
if ! sudo ufw status | grep -q "$IP"; then
sudo ufw deny from $IP to any
echo "$(date): Blocked attacking IP $IP" >> /var/log/ssh-block.log
fi
done
# Alternative: anonymize IPs with sed for privacy compliance - run this on
# ARCHIVED copies only, never on the live auth.log the detector reads
# sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/REDACTED/g' auth.log.1 > auth.log.1.redacted
# Count unique attackers per day
grep "Failed password" $AUTH_LOG | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn | head -20
# Schedule to run every 5 minutes
$ sudo crontab -e
*/5 * * * * /opt/scripts/ssh-attack-detector.sh
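The $(NF-3) field trick can be checked against a sample auth.log line (fabricated here, with a documentation-range IP):

```shell
#!/bin/bash
# A typical sshd failure line; awk counts fields from the end:
# ssh2 = $NF, 52144 = $(NF-1), port = $(NF-2), the source IP = $(NF-3).
LINE='Mar 15 10:22:01 web1 sshd[1234]: Failed password for root from 203.0.113.7 port 52144 ssh2'

echo "$LINE" | awk '{print $(NF-3)}'   # prints: 203.0.113.7
```

Counting from the end is deliberate: fields near the start of the line (user name, "invalid user" prefixes) vary, while the trailing "from IP port N ssh2" tail is stable.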
Explanation:
grep "Failed password" finds failed SSH attempts. awk '{print $(NF-3)}' extracts the source IP (NF is the last field, so NF-3 is the fourth field from the end). The script counts attempts per IP and blocks any IP exceeding 10 failures in the window. Redacting IPs with sed on archived logs supports privacy compliance (GDPR). Cron runs the detector every 5 minutes for automatic attack mitigation.

Chapter 10: LVM (Logical Volume Manager) — Flexible Storage Management
Definition: LVM adds a layer of abstraction between physical disks and filesystems. Components: PV (Physical Volume) = a raw disk or partition, VG (Volume Group) = a pool of storage, LV (Logical Volume) = a virtual partition. Benefits: resizing without unmounting, snapshots, striping, mirroring. Commands: pvcreate, vgcreate, lvcreate, lvextend, then resize2fs (ext4) or xfs_growfs (XFS).

CASE PROJECT 10.1: Dynamic Volume Expansion for a Growing Database
Scenario: PostgreSQL database is running out of space on /var/lib/postgresql. Add new disk and extend LVM volume without downtime.
# Step 1: Add new disk to VM (e.g., /dev/sdb from cloud console)
# Step 2: Create physical volume
$ sudo pvcreate /dev/sdb
$ sudo pvs # Verify PV created
# Step 3: Extend volume group (assuming existing VG named 'vg_data')
$ sudo vgextend vg_data /dev/sdb
$ sudo vgs # Verify free space in VG
# Step 4: Extend logical volume (add 20GB)
$ sudo lvextend -L +20G /dev/vg_data/lv_postgres
$ # Or extend to use all free space: lvextend -l +100%FREE /dev/vg_data/lv_postgres
# Step 5: Resize filesystem online (no unmount needed!)
$ sudo resize2fs /dev/vg_data/lv_postgres # For ext4
# For XFS: sudo xfs_growfs /mount/point
# Step 6: Verify new size
$ df -h /var/lib/postgresql
$ sudo lvdisplay /dev/vg_data/lv_postgres
# Create LVM snapshot for backup (point-in-time)
$ sudo lvcreate -L 10G -s -n lv_postgres_snap /dev/vg_data/lv_postgres
$ sudo mount /dev/vg_data/lv_postgres_snap /mnt/snapshot
$ # Backup from snapshot, then remove: sudo lvremove vg_data/lv_postgres_snap
Explanation:
pvcreate /dev/sdb initializes the new disk as a PV. vgextend adds it to the existing VG. lvextend -L +20G grows the LV by 20 GB. resize2fs expands the filesystem online, with no downtime (critical for production). LVM snapshots provide consistent point-in-time backups without stopping database writes.

CASE PROJECT 10.2: LVM Thin Provisioning for Container Storage
Scenario: Docker/Kubernetes nodes need thin provisioning to over-allocate storage efficiently. Create thin pool for container volumes.
# Create thin pool (allocate metadata and data)
$ sudo pvcreate /dev/sdb /dev/sdc
$ sudo vgcreate vg_container /dev/sdb /dev/sdc
$ sudo lvcreate -L 1G -n thin_pool_meta vg_container
$ sudo lvcreate -L 100G -n thin_pool_data vg_container
$ sudo lvconvert --type thin-pool --poolmetadata vg_container/thin_pool_meta vg_container/thin_pool_data
# Create thin volumes (over-provisioned)
$ sudo lvcreate -V 50G -T vg_container/thin_pool_data -n container_vol1
$ sudo lvcreate -V 50G -T vg_container/thin_pool_data -n container_vol2
$ sudo lvcreate -V 50G -T vg_container/thin_pool_data -n container_vol3
# Format and mount
$ sudo mkfs.ext4 /dev/vg_container/container_vol1
$ sudo mount /dev/vg_container/container_vol1 /var/lib/docker
# Monitor thin pool usage
$ sudo lvs -a vg_container
$ sudo lvdisplay --maps vg_container/thin_pool_data
# Extend thin pool when data usage > 80%
$ sudo pvcreate /dev/sdd
$ sudo vgextend vg_container /dev/sdd
$ sudo lvextend -L +50G vg_container/thin_pool_data
Explanation: Thin provisioning allows over-allocation (here, 150 GB of volumes from a 100 GB pool).
lvconvert --type thin-pool creates the pool with a separate metadata LV. Thin volumes consume space only as data is written. Monitor pool usage with lvs -a and extend the pool online when it nears capacity. Ideal for container storage, where many volumes have low actual usage.

Chapter 11: Environment Variables & Configuration Management (12-Factor App)
Definition: Environment variables store configuration outside the code (the 12-Factor App methodology). Key concepts: export VAR=value (set), $VAR (access), env (list all), and .env files (load many variables at once). Never commit secrets to git; use .gitignore. In production, use a secret manager (HashiCorp Vault, AWS Secrets Manager).

CASE PROJECT 11.1: Multi-Environment Deployment with .env Files
Scenario: Same application runs in development, staging, and production with different databases, API keys, and log levels. Implement environment-specific configuration.
# Directory structure
/opt/myapp/
├── .env.base
├── .env.development
├── .env.staging
├── .env.production
└── load-config.sh
$ cat .env.base
APP_NAME=MyApplication
LOG_FORMAT=json
$ cat .env.development
APP_ENV=development
DATABASE_URL=postgresql://localhost/dev_db
DEBUG=true
LOG_LEVEL=debug
$ cat .env.production
APP_ENV=production
DATABASE_URL=postgresql://prod-db.internal:5432/app_prod
DEBUG=false
LOG_LEVEL=warn
API_KEY=prod-secret-key
$ cat load-config.sh
#!/bin/bash
ENV=${1:-development}
CONFIG_DIR="/opt/myapp"
if [ ! -f "$CONFIG_DIR/.env.$ENV" ]; then
echo "Environment file .env.$ENV not found!"
exit 1
fi
# Load base then environment-specific (override)
set -a
source "$CONFIG_DIR/.env.base"
source "$CONFIG_DIR/.env.$ENV"
set +a
# Validate required variables
for var in DATABASE_URL APP_ENV; do
if [ -z "${!var}" ]; then
echo "Required variable $var is not set for the $ENV environment!"
exit 1
fi
done
echo "Loaded configuration for $ENV environment"
echo " DATABASE_URL: ${DATABASE_URL}"
# Usage: source load-config.sh production
# Then run: node server.js
# Docker run with env file
$ docker run --env-file .env.production myapp:latest
# Kubernetes secret from env file
$ kubectl create secret generic app-config --from-env-file=.env.production
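The set -a export trick at the core of load-config.sh can be demonstrated end to end with a throwaway .env file:

```shell
#!/bin/bash
# Show that 'set -a' makes sourced assignments visible to child processes.
tmpenv=$(mktemp)
cat > "$tmpenv" <<'EOF'
APP_ENV=demo
LOG_LEVEL=debug
EOF

set -a              # every variable assigned from here on is exported
source "$tmpenv"
set +a              # stop auto-exporting

# A child process (a fresh sh) inherits the exported values
sh -c 'echo "child sees APP_ENV=$APP_ENV"'   # prints: child sees APP_ENV=demo
rm -f "$tmpenv"
```

Without set -a, sourcing the file would set the variables only in the current shell; the application process started afterwards would never see them.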
Explanation:
set -a automatically exports every variable assigned in the sourced files. ${!var} is indirect expansion (it yields the value of the variable whose name is stored in $var). Validation ensures required settings exist before the app starts. Never commit .env files to git; add them to .gitignore. Docker and Kubernetes both consume env files natively.

CASE PROJECT 11.2: Secure Secrets Management with HashiCorp Vault
Scenario: Production secrets (API keys, database passwords) cannot be stored in files or environment variables. Integrate with HashiCorp Vault for dynamic secrets.
# Start Vault dev server (testing only)
$ vault server -dev
$ export VAULT_ADDR='http://127.0.0.1:8200'
# Store a secret
$ vault kv put secret/myapp DATABASE_PASSWORD="s3cr3t" API_KEY="abc123"
# Retrieve secret
$ vault kv get -field=DATABASE_PASSWORD secret/myapp
# Application script to fetch secrets
$ cat /opt/scripts/fetch-secrets.sh
#!/bin/bash
export DATABASE_PASSWORD=$(vault kv get -field=DATABASE_PASSWORD secret/myapp)
export API_KEY=$(vault kv get -field=API_KEY secret/myapp)
exec node /opt/myapp/server.js
# systemd service with Vault integration
$ sudo nano /etc/systemd/system/myapp.service
[Service]
Environment="VAULT_ADDR=http://vault.internal:8200"
# systemd does not expand $(...) itself, so wrap the token login in a shell
ExecStartPre=/bin/sh -c '/usr/local/bin/vault login -method=token token="$(cat /etc/vault-token)"'
ExecStart=/opt/scripts/fetch-secrets.sh
# Environment variable best practices
$ # Never hardcode secrets in scripts
$ # Never commit .env files to git
$ echo ".env" >> .gitignore
$ echo "*.env" >> .gitignore
$ echo "secrets/" >> .gitignore
# Validate no secrets in git history
$ git log -p | grep -i "password\|secret\|key\|token"
Explanation: Vault provides dynamic, encrypted secrets with audit logging.
vault kv put stores secrets and vault kv get retrieves them, keeping credentials out of code and version control. systemd can fetch secrets before starting a service. Always gitignore env files and scan git history for accidentally committed secrets. This is production-grade security.