Job Description

TechGlobal Nusantara, a leading digital innovation company based in Jakarta, is seeking a highly skilled and passionate Site Reliability Engineer (SRE) to join our dynamic infrastructure team. In this critical role, you will bridge the gap between development and operations, ensuring the reliability, scalability, and efficiency of our production systems. You will be instrumental in automating infrastructure, monitoring system health, and responding to incidents to maintain an exceptional user experience for millions of our customers.
We offer a collaborative and forward-thinking work environment where you can leverage cutting-edge cloud technologies and make a tangible impact on our platform's success. If you thrive in a fast-paced environment and are passionate about building resilient systems, we want to hear from you.

Responsibilities

Design, implement, and maintain highly available, scalable, and fault-tolerant cloud infrastructure on AWS and GCP.
Develop and manage robust monitoring, alerting, and observability solutions using tools like Prometheus, Grafana, and Datadog.
Automate operational tasks, deployments, and configuration management using Infrastructure as Code (IaC) tools such as Terraform and Ansible.
Lead incident response, perform root cause analysis, and run blameless post-mortems to keep incidents from recurring.
Collaborate closely with software engineering teams to improve application performance, reliability, and deployment processes.
Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain system reliability.
Optimize system performance and cost-efficiency across all production environments.
Participate in an on-call rotation to provide 24/7 support for critical production systems.

Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
3-5 years of experience in a Site Reliability Engineering, DevOps, or Systems Administration role.
Strong hands-on experience with at least one major cloud provider (AWS, GCP, or Azure).
Proficiency in at least one scripting or programming language (e.g., Python, Go, Bash).
Deep understanding of containerization and orchestration technologies, specifically Docker and Kubernetes.
Experience with Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Pulumi.
Solid knowledge of Linux system administration, networking (TCP/IP, DNS, HTTP), and distributed systems concepts.
Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or Datadog.
Excellent problem-solving skills and the ability to work effectively under pressure during incidents.

