Jonathan Ng

Site Reliability Engineer

Toronto, Ontario

Systems Design Engineering @ University of Waterloo


Skills

Tools

ArgoCD, Atlassian, AWS, Datadog, FluxCD, GCP, Git, Grafana, Helm, Kubernetes, LaunchDarkly, Nx, Prometheus, Terraform

Languages

Bash, CSS, HTML, JavaScript, Python

Work Experience

DevOps Engineer (Observability and Internal Services) @ StackAdapt

(2024-05-28 - present)


  • Rolled out standardized metadata labelling across Kubernetes infrastructure, enabling service alert routing based on team ownership
  • Improved reliability of the observability stack by migrating Grafana and Grafana OnCall backends from SQLite to PostgreSQL and enabling Grafana to operate in high-availability mode
  • Rolled out Gatus and accompanying Prometheus alerts to enable endpoint monitoring and quick feedback to critical service teams

SRE II @ Tempo Software

(2022-01-05 - 2024-05-10)


  • Architected internal tooling that lets developers create fully scoped, prod-like ephemeral environments for feature testing; used by 50+ developers across 9+ teams, improving release confidence and increasing release frequency by 718%
  • Joined initiative as a developer to refactor auth service, enabling feature management via LaunchDarkly
  • Modularized legacy IaC to simplify provisioning of environments, incorporating GitOps by leveraging Terragrunt and GitHub Actions to promote changes between environments
  • Co-led the migration of the codebase and artifacts from GitLab to GitHub, consolidating 2 product lines under one organization
  • Directed the knowledge-sharing process for onboarding new hires, growing the team to 6 members
  • Implemented Datadog monitoring for Kubernetes services, improving coverage by enabling support for OpenTelemetry to capture custom metrics
  • Improved stability and consistency of production rollouts with readiness gates and pod disruption budgets, reducing release downtime to zero for services using ALB ingress controllers
  • Refactored release communication to align more closely with the GitOps strategy, enabling consistent notifications to stakeholders as features are promoted into production and increasing service coverage to 100%
  • Improved security of CI workflows across the organization by leveraging OIDC roles for AWS authentication, eliminating the need for access keys in CI and establishing a standard for repo- and service-based AWS access
  • Led initiative to refactor services to leverage internal communications where possible, increasing visibility into the product's inter-service communication
  • Directed new SSO permission strategy for standardizing AWS permissions for SREs and developers across the company, leaning into GitOps and management of IaC

Junior DevOps @ TimePlay (now Stream6ix)

(2021-04-18 - 2021-12-30)


  • Used Ansible to manage scaling game infrastructure on AWS for a projected 700% increase in player traffic
  • Deployed infrastructure using AWS API Gateway, Lambda and ECS, reducing resource spin-up times by 93% to 30s for a new line of on-demand games
  • Created CI/CD pipelines for Unity projects using Unity Cloud Build to leverage tailored support and integrations

DevOps Cloud Developer Intern @ Cryptonumerics (now Snowflake)

(2019-09-03 - 2019-12-20)


  • Designed OAS dataset retrieval API to support popular cloud storage solutions
  • Refactored Spectron testing suite, improving test consistency by 100% and cutting test time by 89%

Innovation Engineer Intern @ VIA Rail Canada

(2019-01-08 - 2019-04-30)


  • Developed a telemetry solution using Kibana to monitor train health and activity, leveraging data from existing sensors installed throughout the train and enabling engineers to diagnose issues remotely
  • Developed a Bash script to automate setup of the analytics solution across a scalable fleet of train cars
  • Applied beacon technology to map customer journeys through the train station via device pings

DevOps Engineer Intern @ TD Bank

(2017-09-05 - 2018-08-31)


  • Generated a daily health report for stakeholders by compiling Jenkins build and SonarQube results with Groovy
  • Coordinated migration of over 80 projects from several outdated instances of Jenkins, proposing a workflow to eliminate over 90% of the planned work
  • Architected and introduced an experimental shared library system on Jenkins to streamline continuous integration pipeline for over 5 project teams
  • Administered the Atlassian toolstack, providing support and provisioning across all cloud platforms

Backend Engineer Intern @ Rave

(2017-01-03 - 2017-04-28)


  • Migrated video service from Postgres to Google Cloud Datastore to improve consistency of transactions
  • Refactored location microservice, reducing Redis queries by 50%

Projects

Kube Cats

(2025-05-01 - present)


  • A fun Kubernetes workload visualizer using pixel cats, written in Go and React

Home Lab

(2023-07-01 - present)


  • A home lab server for learning and hosting passion projects

Pomodoro Timer

(2023-05-01 - 2023-05-02)


  • A simple pomodoro app to learn React

Free Games Notifier

(2021-06-01 - present)


  • A script to pull freebies from Epic, supporting Docker, Kubernetes and GitHub Actions

Personal Discord Bots

(2021-01-01 - 2022-02-01)


  • A personal Discord bot developed to provide information from various game and movie APIs

Luxify

(2020-01-01 - 2021-01-01)


  • A restock notification service using Facebook's messaging API and various online store APIs to notify users as soon as highly coveted items are in stock