- Designed and built Cloud Labs provisioning on GKE: Terraform for organizational scaffolding, async Python with Celery for orchestration, cloud provider SDKs (GCP, AWS, Azure), Postgres and Redis for state. The platform manages the full lifecycle of ephemeral cloud accounts: provisioning, monitoring, recycling, and cleanup.
- Scaled to thousands of concurrent, session-based cloud environments per week. Designed the account pool and recycling architecture to optimize utilization and control spend across three providers.
- Architected provider-specific capacity strategies: OU-based account pools with recycling for AWS, resource-group provisioning for Azure, project-based allocation for GCP. Independently arrived at the same account pool pattern AWS uses internally for its own training environments.
- Built observability layer with Datadog: custom metrics, dashboards, alerting, and distributed tracing across the provisioning pipeline. Used tracing to find and fix bottlenecks, cutting environment startup time to under 30 seconds.
- Led Cloud Labs engineering: hiring, technical interviews, architecture decisions, cross-functional work with product and editorial to match capacity to demand.
- Managed IAM policies, access controls, and secrets rotation across all three clouds. Built abuse detection to track per-tenant resource consumption and flag anomalous usage.
- Conceived and shipped O’Reilly’s first AI-powered Cloud Lab, an embedded coding agent (aider CLI + AWS Bedrock) that lets learners work with LLMs within seconds of launch. Built additional LLM labs on Azure OpenAI and GCP Vertex AI.
- Contributed resource handlers in Go to aws-nuke, an open source tool for programmatic cleanup of AWS resources across organizational accounts.
Summary
Staff-level engineer with 12+ years of experience and 7+ years building and operating multi-cloud infrastructure at O’Reilly Media. I design and lead the systems that provision thousands of ephemeral cloud environments per week across GCP, AWS, and Azure, managing capacity planning, resource lifecycle, and cost efficiency across a multi-tenant platform. The platform runs on GKE, with Terraform for IaC, Datadog for observability, and async Python for workflow orchestration. Google Cloud certified. Open source contributor in Go.
Experience
- Built real-time front-end interfaces for an Ultra-Wideband (UWB) network monitoring system with millisecond-latency data visualization.
- Developed and deployed a hybrid mobile app for financial services in a regulated environment.
Technical Skills
Languages: Go, Python, TypeScript, JavaScript, SQL
Cloud & Infrastructure: GCP (GKE), AWS, Azure, Kubernetes, Docker, Terraform, multi-tenant SaaS, capacity planning, resource lifecycle management, CI/CD
Observability: Datadog (metrics, dashboards, alerting, custom integrations), distributed tracing, logging
Backend: REST APIs, async Python, Celery, workflow orchestration, FastAPI, Django, Node.js, Postgres, Redis
Open Source: Contributor to aws-nuke (Go), cloud infrastructure automation
Education & Certifications
- B.S. Computer Science, University of Southern Indiana
- Google Cloud Associate Cloud Engineer