About the Role
We’re hiring a Cloud Operations Engineer to join our growing infrastructure team. In this role, you’ll be responsible for monitoring, maintaining, and responding to incidents in our production cloud environment. You will play a key role in ensuring uptime, performance, and reliability of cloud-based systems across compute, networking, and storage. This is an ideal opportunity for candidates interested in the operational side of cloud infrastructure, incident response, and systems reliability, especially those with a passion for Linux, monitoring tools, and automation.
● Monitor health and performance of cloud infrastructure using tools like Prometheus, Grafana, ELK, and Zabbix.
● Perform L1–L2 troubleshooting of compute, network, and storage issues.
● Respond to infrastructure alerts and incidents with a sense of urgency and ownership.
● Execute standard operating procedures (SOPs) for issue mitigation and escalation.
● Contribute to writing and improving incident response playbooks and runbooks.
● Participate in root cause analysis (RCA) and post-incident reviews.
● Automate routine operations using scripting and Infrastructure-as-Code (IaC) tools.
Nice to Have (Not All Required)
We don’t expect you to have experience in every area. If you’re eager to learn and have a solid foundation in Linux or cloud, you’re encouraged to apply, even if you’re still gaining experience in some areas below:
● Operating Systems: Linux (Debian/Ubuntu/CentOS/Rocky Linux)
● Monitoring & Logging: Prometheus, Grafana, ELK, Zabbix, Nagios
● Infrastructure Troubleshooting Tools: top, htop, netstat, iostat, tcpdump
● Networking: DNS, NAT, VPN, Load Balancers
● Cloud Services: VM provisioning, disk management, firewall rules
● Automation & Scripting: Bash, Python, Git
● IaC Tools: Ansible, Terraform (good to have)
● Incident Response & RCA: Familiarity with escalation procedures and documentation best practices
● Pays strong attention to detail and can respond under pressure
● Has solid analytical and troubleshooting skills
● Is comfortable working in shifts and taking ownership of incidents
● Communicates clearly and collaborates well with cross-functional teams
● Is eager to learn cloud automation, reliability, and monitoring practices
● Hands-on experience in live cloud infrastructure operations
● Expertise in monitoring tools, alert handling, and system troubleshooting
● Real-world experience with DevOps practices, SOPs, and RCA processes
● Exposure to automation and Infrastructure-as-Code workflows
E2E Networks is the leading hyperscaler from India with a focus on scalable Cloud GPU infrastructure, and is listed on the National Stock Exchange (NSE). The company is known for providing accelerated cloud computing solutions, including cutting-edge Cloud GPUs like NVIDIA A100 GPUs and DGX Super Computing on the Cloud, making it the sole provider of advanced Cloud GPU capabilities in India.
E2E Networks' cloud computing solutions are built on the principles of affordability, assistance, accessibility, accommodative, and Atma Nirbhar Bharat (self-reliant India), which are collectively referred to as the 5As of E2E Cloud. The company has been instrumental in helping India become self-reliant in cloud infrastructure by offering a true public cloud platform that is multi-region, provides smart dedicated compute, and is designed to cater to the unique needs of Higher Education and Research, Enterprises, and the next generation of AI/ML startups in the country.
Our platform has further strengthened its position as the leading accelerated computing cloud platform from India by demonstrating its capabilities in AI/ML, NLP, Computer Vision and Generative AI on its Cloud GPU and DGX platforms. The company has earned its reputation as a trusted and reliable partner of choice for Higher Education and Research Institutions, Enterprises and AI/ML startups in India as well as globally.
E2E Networks was among the first few providers out of India to offer contactless computing with low latency. The company's advanced cloud computing solutions, including Cloud GPUs like NVIDIA H200 and H100, are aimed at helping India rise as an AI/ML superpower, transforming Higher Education, Research and Enterprises across industry and academia.