Sr SRE to collaborate with IP Network specialists/architects to troubleshoot and resolve issues, deploying automation & reliability initiatives on an i

Toronto, ON
  • Number of positions to fill: 1

  • To be discussed
  • Contract position

  • Start date: 1 position to fill as soon as possible

Sr SRE to collaborate with IP Network specialists/architects to troubleshoot and resolve issues, deploying automation & reliability initiatives on an infrastructure set of 125+ servers for our large technology client (CREQ008170)


Experienced SRE engineers with support experience only.




Is remote work available? Are there any required days in office? 3 days in office preferred but not mandatory (as this is a contractual position)


Responsibilities


  • Lead delivery and support of a large-scale, Kubernetes-based IP Network Management platform.
  • Day-to-day responsibilities include: collaborating with IP Network specialists/architects to troubleshoot and resolve issues; deploying automation and reliability initiatives on an infrastructure set of 125+ servers for proactive monitoring, issue detection, and self-healing, using the latest SRE toolkit and tech stack; working with the vendor to resolve platform-related issues; and developing a roadmap for the platform life cycle while challenging the vendor to mitigate platform risks quickly.
  • Deploys features and fixes based on network specialists’ needs, and participates in the pager rotation for 24/7 support.
  • Deeply understands business drivers and cross-departmental impacts
  • Develops business cases to justify application-related capital investments
  • Translates business requirements into technical requirements
  • Explains complicated technical issues in simple terms to all levels of the organization
  • Leads system requirements gathering for scalable, robust, and optimized designs
  • Provides input and direction to vendors to ensure optimal designs
  • Provides analysis and recommendations for new software / infrastructure
  • Evaluates test results to determine pass/fail status
  • Supports the project team with defect resolution during test activities



Must Haves

  • Support of Kubernetes-based platforms, with proven experience mitigating critical issues.
  • Demonstrated experience with monitoring & observability (Zabbix/Dynatrace/Datadog for infrastructure monitoring and ELK stack for log aggregation + visualization + analysis).
  • Fundamental knowledge of TCP/IP Networks - ideally in a telco environment.



2 rounds of interviews (in-person preferred but not mandatory, depending on candidate location): an initial screening to gauge past experience and a second technical deep-dive.

What specific projects will be worked on? 24/7 support of an IP Network Management System + platform lifecycle initiatives (new infra deployment and management, application patching & upgrades)



Requirements

Education level

not specified

Years of experience

not specified

Written languages

not specified

Spoken languages

not specified