# Dan Arbaugh

**Staff Software Engineer**

- Email: dan@danarbaugh.com
- LinkedIn: https://linkedin.com/in/danarbaugh
- GitHub: https://github.com/danarbaugh
- Website: https://danarbaugh.com

**Specializations:** Cloud Infrastructure · Capacity Planning · Multi-Cloud Orchestration · Observability

---

## Summary

Staff-level engineer with 12+ years of experience, including 7+ years building and operating multi-cloud infrastructure at O'Reilly Media. I design and lead the systems that provision thousands of ephemeral cloud environments per week across Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure, managing capacity planning, resource lifecycle, and cost efficiency across a multi-tenant platform. The platform runs on Google Kubernetes Engine (GKE), with Terraform for infrastructure as code (IaC), Datadog for observability, and async Python for workflow orchestration. Google Cloud certified. Open source contributor in Go.

---

## Experience

### O'Reilly Media, Inc. — Senior Software Engineer
**December 2018 – Present** *(promoted April 2023)*

- Designed and built Cloud Labs provisioning on Google Kubernetes Engine (GKE): Terraform for organizational scaffolding, async Python with Celery for orchestration, cloud provider SDKs (GCP, AWS, Azure), and Postgres and Redis for state. The system manages the full lifecycle of ephemeral cloud accounts: provisioning, monitoring, recycling, and cleanup.
- Scaled the platform to thousands of session-based cloud environments per week, with many running concurrently. Designed the account pool and recycling architecture to optimize utilization and control spend across three providers.
- Architected provider-specific capacity strategies: account pools based on organizational units (OUs) with recycling for AWS, resource-group provisioning for Azure, and project-based allocation for GCP. Independently arrived at the same account pool pattern AWS uses internally for its own training environments.
- Built observability layer with Datadog: custom metrics, dashboards, alerting, and distributed tracing across the provisioning pipeline. Used tracing to find and fix bottlenecks, cutting environment startup time to under 30 seconds.
- Led Cloud Labs engineering: hiring, technical interviews, architecture decisions, cross-functional work with product and editorial to match capacity to demand.
- Managed Identity and Access Management (IAM) policies, access controls, and secrets rotation across all three clouds. Built abuse detection that tracks per-tenant resource consumption and flags anomalous usage.
- Conceived and shipped O'Reilly's first AI-powered Cloud Lab, an embedded coding agent (aider CLI + Amazon Bedrock) that lets learners work with Large Language Models (LLMs) within seconds of launch. Built additional LLM labs on Azure OpenAI and Google Cloud Vertex AI.
- Contributed resource handlers in Go to [aws-nuke](https://github.com/ekristen/aws-nuke), an open source tool for programmatic cleanup of AWS resources across organizational accounts.

### Ciholas, Inc. — Software Engineer
**April 2017 – December 2018**

- Built real-time front-end interfaces for an Ultra-Wideband (UWB) network monitoring system with millisecond-latency data visualization.

### Springleaf Financial Services — Programmer Analyst, Senior
**September 2014 – April 2017** *(promoted August 2015)*

- Developed and deployed a hybrid mobile app for financial services in a regulated environment.

---

## Technical Skills

**Languages:** Go, Python, TypeScript, JavaScript, SQL

**Cloud & Infrastructure:** Google Cloud Platform (GCP), Google Kubernetes Engine (GKE), Amazon Web Services (AWS), Microsoft Azure, Kubernetes, Docker, Terraform, multi-tenant SaaS, capacity planning, resource lifecycle management, Continuous Integration / Continuous Deployment (CI/CD)

**Observability:** Datadog (metrics, dashboards, alerting, custom integrations), distributed tracing, logging

**AI/ML:** Large Language Model (LLM) integration, prompt engineering, agent workflows, Retrieval-Augmented Generation (RAG), LiteLLM, Amazon Bedrock, Google Cloud Vertex AI

**Backend:** REST APIs, async Python, Celery, workflow orchestration, FastAPI, Django, Node.js, Postgres, Redis

**Open Source:** Contributor to [aws-nuke](https://github.com/ekristen/aws-nuke) (Go), cloud infrastructure automation

---

## Education & Certifications

- **B.S. Computer Science**, University of Southern Indiana (2014)
- **Google Cloud Associate Cloud Engineer**, issued December 2023, valid through December 2026
