Open to opportunities

Hritik Chaudhary

I build infrastructure that doesn't break — and scale systems that shouldn't slow down.

hritik@infra ~
$ whoami
Lead DevOps / SRE Engineer
$ cat stats.yml
experience: 6+ years
clusters: 26+ Kubernetes
uptime: 99.99%
cost_saved: $280K+
team_size: 8 engineers
$ echo $STATUS
open_to_opportunities: true
6+ Years Experience
26+ K8s Clusters
40+ Nodes Managed
99.99% Uptime SLA
$280K+ Cost Savings
300+ Workloads Migrated
65% MTTR Reduction
8 Engineers Led
$2M+ Budget Managed

Infrastructure at Scale.
Reliability by Design.

Lead DevOps/SRE Engineer with 6+ years building and scaling high-availability platforms across multi-cloud environments. I architect Kubernetes platforms at scale (40+ nodes, 26+ clusters), design GPU-capable cloud infrastructure, and deliver production-grade observability.

Currently leading an 8-member DevOps/SRE team at Eptura (Fortune 500 SaaS) across 4 time zones, managing a $2M+ annual multi-cloud budget. Extending deep platform engineering expertise into AI/ML infrastructure — model serving, GPU scheduling, and inference optimization.

I also build AI products independently through KraftAI, applying serverless inference patterns and modern AI APIs to solve real-world problems.

Multi-Cloud

AWS, Azure, GCP — production expertise across all three

AI Infrastructure

GPU scheduling, vLLM, Triton, inference at scale

Compliance

SOC2, ISO 27001, GDPR delivered ahead of schedule

Leadership

8-person team across 4 time zones, global async

Skills & Expertise

Enterprise-grade tools I use to build, scale, and secure infrastructure

Cloud & Compute

AWS (EKS, EC2, Fargate), Azure (AKS), GCP (GKE), GPU Instances, DigitalOcean

Containers & Orchestration

Kubernetes, Docker, Helm, Istio, Karpenter, Podman

AI/ML Infrastructure

vLLM, Triton Inference Server, Ray Serve, GPU Scheduling, NVIDIA Device Plugin

CI/CD & GitOps

Argo CD, GitHub Actions, Jenkins, GitLab CI, Azure DevOps, Flux

Infrastructure as Code

Terraform, Ansible, CloudFormation, Pulumi, CDK

Observability & APM

Prometheus, Grafana, Loki, ELK, Jaeger, Datadog

Security & DevSecOps

HashiCorp Vault, SAST/DAST, Snyk, Aqua, CIS Hardening, TLS/mTLS

Blockchain Infra

Ethereum (Geth), Bitcoin Core, IPFS, Dogecoin, Solidity, Hardhat

Programming & SRE

Python, Go, Bash, Node.js, TypeScript, SLOs/SLIs

Key Projects

Enterprise-scale implementations with measurable business impact

Eptura DevOps

Multi-Cloud Kubernetes Platform

Architected and operate multi-cloud Kubernetes platforms (EKS and AKS) across 26+ clusters and 40+ nodes for a Fortune 500 SaaS company. Migrated 300+ production workloads with zero downtime, improving resource utilization by 45%.

99.99% Uptime
300+ Workloads
45% Better Utilization

EKS, AKS, Karpenter, Argo CD, Terraform

Eptura Observability

Production Observability Stack

Architected Prometheus/Grafana/Loki monitoring for 40-node production environments with AI-powered log analysis and anomaly detection. Established an SLO-driven reliability culture, reducing MTTR by 65%.

65% MTTR Reduction
40+ Nodes Monitored
3x Deploy Frequency

Prometheus, Grafana, Loki, SLOs

RapidInnovation Blockchain

Blockchain Node Platform

Kubernetes-native platform for Ethereum, Bitcoin, and Dogecoin nodes with auto-scaling, chain-sync dashboards, RPC load balancing, and automated snapshot recovery. Handles 50M+ API calls/month.

50M+ API Calls/Mo
99.9% Uptime
Auto Scaling

Geth, Bitcoin Core, Kubernetes, IPFS

KraftAI AI Product

KraftAI — AI Products Venture

Solo AI products venture (kraftai.in) built with Gemini API, Next.js, and serverless inference patterns. Applying platform engineering expertise to build AI-powered tools and services for real-world use cases.

Solo Founder
AI First
Live Production

Next.js, Gemini API, Serverless, AI/ML

Open Source AI + K8s

K8s AI Incident Bot

AI-powered Kubernetes incident response bot that detects anomalies, correlates alerts, and suggests runbooks. Combines SRE best practices with ML-driven pattern recognition for faster incident resolution.

AI Driven
K8s Native
Auto Runbooks

Python, Kubernetes, AI/ML, Incident Response

Eptura FinOps

FinOps & Cloud Optimization

Delivered $280K+ in total savings through cloud spend governance, Karpenter-based node autoscaling, fixes for IP address exhaustion on EKS, and resource right-sizing across multi-cloud environments.

$280K+ Total Savings
$2M+ Budget
35% Cost Reduction

Karpenter, FinOps, Right-sizing, Spot Instances

Building the Future of AI Ops

Extending platform engineering expertise into AI infrastructure — solving the operational challenges of serving and scaling ML workloads in production

GPU Scheduling in Kubernetes

Working with NVIDIA GPU Operator, device plugin, MIG partitioning, and time-slicing for efficient GPU resource allocation across K8s clusters.

Model Serving Frameworks

Hands-on with vLLM, Triton Inference Server, and Ray Serve — deploying inference endpoints with autoscaling strategies for production traffic.

KraftAI — AI Products

Building a solo AI products venture using Gemini API, Next.js, and serverless inference patterns to solve real-world problems with AI.

GPU Observability & Cost

Applying Karpenter autoscaling, Prometheus observability, and FinOps expertise to GPU-specific infrastructure challenges and inference latency SLOs.

Work Experience

Building teams and delivering impact at scale

March 2024 — Present
Eptura (Fortune 500 SaaS)

Lead DevOps / SRE Engineer

Lead and mentor an 8-member DevOps/SRE team across 4 time zones. Manage a $2M+ annual multi-cloud budget with FinOps discipline. Architect multi-cloud Kubernetes platforms (EKS and AKS) across 26+ clusters, sustaining 99.99% production uptime. Migrated 300+ workloads with zero downtime. Standardized GitOps with Argo CD, increasing deploy frequency 3x. Delivered SOC2 compliance and GDPR readiness.

AWS, Azure, Kubernetes, Argo CD, Karpenter, SOC2, FinOps
August 2020 — March 2024
RapidInnovation

Senior DevOps Engineer

Built DevOps and SRE foundations for 15+ production environments with 99.9% uptime. Designed multi-cloud deployments across AWS, Azure, and GCP. Modernized infrastructure with Terraform and Kubernetes, generating $160K in annual savings. Optimized CI/CD pipelines, reducing build time by 62%. Built blockchain infrastructure (Ethereum, Dogecoin, and IPFS nodes). Automated 100+ hours/month of ops tasks. Mentored 5 engineers.

Multi-Cloud, Terraform, Blockchain, Ethereum, IPFS, CI/CD
2017 — 2021
ABES Institute of Technology

B.Tech Computer Science & Engineering

GPA: 8.2/10. Ghaziabad, India. Strong foundation in computer science, distributed systems, and software engineering principles.

What People Say

Endorsements from colleagues and leaders

Hritik's expertise in debugging complex Kubernetes issues was invaluable. He quickly identified the root cause in our pod networking configuration that had stumped the team for days.

Jonathan Smith
Lead Developer, Flush Project

Hritik's debugging and troubleshooting skills are world-class. He consistently resolved complex infrastructure issues that other engineers couldn't solve.

David Rogers
CTO, RapidInnovation

He managed our entire GitLab infrastructure for 300+ employees. His expertise achieved a 95% deployment success rate and transformed our release process.

Vineet Kulkarni
DevOps Manager, RapidInnovation

Let's Build Something Great

Available for Lead DevOps/SRE roles, AI infrastructure consulting, and platform engineering. Flexible across global time zones.


Let's Work Together

Let's build infrastructure that doesn't sleep.

Available for leadership roles, consulting, and technical collaborations across IST, CET/CEST, and global time zones. Open to international travel.
