Jonathan Ng
Site Reliability Engineer/maker-of-things
I build and keep distributed systems healthy — observability, internal platforms, and the automation that makes shipping calm. Based in Toronto, deep into homelab, metrics, and custom keyboards.

DevOps engineer & maker-of-things.
I'm a Site Reliability Engineer based in Toronto, Canada with an interest in homelab, observability, and game development. Day to day I work on metrics, logging, tracing, and the internal platforms that let teams ship with confidence. Off the clock you'll find me building custom keyboards, cooking, fishing, and playing far too many video games. See what I'm currently working on here, dive into my blog, or check out what I'm running in my homelab.
Where I've shipped.
A timeline of the teams I've helped keep reliable, fast, and observable.
- Rolled out standardized metadata labelling across Kubernetes infrastructure, enabling dynamic owner-based routing of service alerts
- Improved the reliability of the observability stack, including migrating the Grafana and Grafana OnCall backends from SQLite to PostgreSQL and enabling Grafana to run in high-availability mode
- Rolled out Gatus and accompanying Prometheus alerts to enable endpoint monitoring and quick feedback to critical service teams
- Architected internal tooling that lets developers spin up fully scoped, prod-like ephemeral environments for feature testing; used by 50+ developers across 9+ teams, improving release confidence and increasing release frequency by 718%
- Joined initiative as a developer to refactor auth service, enabling feature management via LaunchDarkly
- Modularized legacy IaC to simplify environment provisioning, adopting GitOps with Terragrunt and GitHub Actions to promote changes between environments
- Used Ansible to manage scaling game infrastructure on AWS for a projected 700% increase in player traffic
- Deployed infrastructure using AWS API Gateway, Lambda, and ECS to reduce resource spin-up times by 93%, down to 30s, for a new line of on-demand games
- Created CI/CD pipelines for Unity projects using Unity Cloud Build to leverage tailored support and integrations
- Designed OAS dataset retrieval API to support popular cloud storage solutions
- Refactored the Spectron testing suite, improving test consistency by 100% and cutting test time by 89%
- Developed a telemetry solution using Kibana to monitor train health and activity, leveraging data from existing sensors installed throughout the train and enabling engineers to diagnose issues remotely
- Developed a Bash script to automate setup of the analytics solution across a scalable fleet of train cars
- Applied beacon technology to map customer journeys through the train station via device pings
- Generated a daily health report for stakeholders by compiling Jenkins build and SonarQube results with Groovy
- Coordinated the migration of over 80 projects from several outdated Jenkins instances, proposing a workflow that cut the planned work by over 90%
- Architected and introduced an experimental shared library system on Jenkins to streamline continuous integration pipelines for over 5 project teams
- Migrated the video service from Postgres to Google Cloud Datastore to improve transaction consistency
- Refactored the location microservice, reducing Redis queries by 50%
Things I've built.
Side projects, tools, and experiments — mostly born in the homelab.
A restock notification service that uses Facebook's messaging API and various online store APIs to notify users as soon as highly coveted items are in stock
A year of commits.
2,265 contributions in the last year · Updated 2026-06-27
The stack I reach for.
Tools
Languages
Let's build something.
Got an idea, a role, or just want to talk homelabs and observability? My inbox is open.