Sr SRE to collaborate with IP Network specialists/architects to troubleshoot and resolve issues, deploying automation & reliability initiatives on an i
S.i. Systems
Toronto, ON
Number of positions available: 1
Salary: To be discussed
Contract job
Published on November 15th, 2024
Starting date: 1 position to fill as soon as possible
Description
Sr SRE to collaborate with IP Network specialists/architects to troubleshoot and resolve issues and to deploy automation & reliability initiatives on an infrastructure set of 125+ servers for our large technology client - CREQ008170
Experienced SRE engineers with support experience only.
Is remote work available? Are there any required days in office? 3 days in office preferred but not mandatory (as this is a contractual position)
Responsibilities
- The position leads delivery & support of a large-scale, Kubernetes-based IP Network Management platform.
- Day-to-day responsibilities include: collaborating with IP Network specialists/architects to troubleshoot and resolve issues; deploying automation & reliability initiatives on an infrastructure set of 125+ servers for proactive monitoring, issue detection, and self-healing measures, leveraging the latest SRE toolkit/tech stack; working with the vendor to resolve platform-related issues; developing a roadmap for the platform life cycle; and challenging the vendor to quickly mitigate platform risks.
- Deploys features/fixes based on network specialists’ needs, and participates in the pager rotation for 24/7 support.
- Deeply understands business drivers and cross-departmental impacts
- Develops business cases to justify application related capital investments
- Translates business requirements into technical requirements.
- Explains complicated technical issues in simple terms to all levels of the organization
- Leads system requirements gathering for scalable, robust, and optimized designs
- Provides input and direction to vendors to ensure optimal designs
- Provides analysis and recommendations for new software / infrastructure
- Evaluates test results to determine pass/fail status
- Supports the project team with defect resolution during test activities
Must Haves
- Support of Kubernetes-based platforms, with proven experience mitigating critical issues.
- Demonstrated experience with monitoring & observability (Zabbix/Dynatrace/Datadog for infrastructure monitoring and ELK stack for log aggregation + visualization + analysis).
- Fundamental knowledge of TCP/IP Networks - ideally in a telco environment.
2 rounds of interviews (in-person preferred but not mandatory, based on candidate location), including an initial screening to gauge past experience and a second technical deep-dive.
What specific projects will be worked on? 24/7 support of an IP Network Management System + platform lifecycle initiatives (new infra deployment and management, application patching & upgrades)