Sr SRE to collaborate with IP Network specialists/architects to troubleshoot and resolve issues, deploying automation & reliability initiatives on an i
S.i. Systèmes
Toronto, ON
Number of positions to fill: 1
Salary: To be discussed
Employment type: Contract
Posted on November 15, 2024
Start date: 1 position to fill as soon as possible
Description
Sr SRE to collaborate with IP Network specialists/architects to troubleshoot and resolve issues, deploying automation & reliability initiatives on an infrastructure set of 125+ servers for our large technology client - CREQ008170
Experienced SRE engineers with support experience only.
Is remote work available? Are there any required days in office? 3 days in office preferred but not mandatory (as this is a contractual position)
Responsibilities
- Lead delivery and support of a large-scale, Kubernetes-based IP Network Management platform.
- Day-to-day responsibilities include: collaborating with IP Network specialists/architects to troubleshoot and resolve issues; deploying automation and reliability initiatives on an infrastructure of 125+ servers for proactive monitoring, issue detection, and self-healing, leveraging the latest SRE toolkit/tech stack; working with the vendor to resolve platform-related issues; and developing a roadmap for the platform life cycle while challenging the vendor to quickly mitigate platform risks.
- Deploy features/fixes based on network specialists’ needs, and participate in the pager rotation for 24/7 support.
- Deeply understands business drivers and cross-departmental impacts
- Develops business cases to justify application-related capital investments
- Translates business requirements into technical requirements.
- Explains complicated technical issues in a simple way to all levels of the organization
- Leads system requirements gathering for scalable, robust, and optimized designs
- Provides input and direction to vendors to ensure optimal designs
- Provides analysis and recommendations for new software / infrastructure
- Evaluates test results to determine pass/fail status
- Supports the project team with defect resolution during test activities
Must Haves
- Support of Kubernetes-based platforms, with proven experience mitigating critical issues.
- Demonstrated experience with monitoring & observability (Zabbix/Dynatrace/Datadog for infrastructure monitoring and ELK stack for log aggregation + visualization + analysis).
- Fundamental knowledge of TCP/IP Networks - ideally in a telco environment.
2 rounds of interviews (in-person preferred but not mandatory, based on candidate location): an initial screening to gauge past experience and a second technical deep-dive.
What specific projects will be worked on? 24/7 support of an IP Network Management System plus platform lifecycle initiatives (new infra deployment and management, application patching and upgrades).
Requirements
Not specified