Sr SRE to collaborate with IP Network specialists/architects to troubleshoot and resolve issues, deploying automation & reliability initiatives on an infrastructure set of 125+ servers

Toronto, ON
  • Number of positions available: 1

  • To be discussed
  • Contract job

  • Starting date: as soon as possible

Sr SRE to collaborate with IP Network specialists/architects to troubleshoot and resolve issues, deploying automation & reliability initiatives on an infrastructure set of 125+ servers for our large technology client (CREQ008170).


Experienced SRE engineers with support experience only.




Is remote work available? Are there any required days in office? Three days in office are preferred but not mandatory, as this is a contract position.


Responsibilities


  • Leads delivery and support of a large-scale, Kubernetes-based IP Network Management platform.
  • Collaborates day to day with IP Network specialists/architects to troubleshoot and resolve issues.
  • Deploys automation and reliability initiatives on an infrastructure set of 125+ servers for proactive monitoring, issue detection, and self-healing, leveraging the latest SRE toolkit/tech stack.
  • Works with the vendor to resolve platform-related issues, develops the roadmap for the platform life cycle, and challenges the vendor to quickly mitigate platform risks.
  • Deploys features and fixes based on network specialists’ needs, and participates in the pager rotation for 24/7 support.
  • Deeply understands business drivers and cross-departmental impacts
  • Develops business cases to justify application related capital investments
  • Translates business requirements into technical requirements.
  • Explains complicated technical issues in simple terms to all levels of the organization
  • Leads system requirements gathering for scalable, robust, and optimized designs
  • Provides input and direction to vendors to ensure optimal designs
  • Provides analysis and recommendations for new software / infrastructure
  • Evaluates test results to determine pass/fail status
  • Supports the project team with defect resolution during test activities



Must Haves

  • Support of Kubernetes-based platforms, with proven experience mitigating critical issues.
  • Demonstrated experience with monitoring and observability (Zabbix/Dynatrace/Datadog for infrastructure monitoring, and the ELK stack for log aggregation, visualization, and analysis).
  • Fundamental knowledge of TCP/IP networks, ideally in a telco environment.



Two rounds of interviews (in-person preferred but not mandatory, depending on candidate location), including an initial screening to gauge past experience and a second technical deep-dive.

What specific projects will be worked on? 24/7 support of an IP Network Management System, plus platform lifecycle initiatives (new infrastructure deployment and management, application patching and upgrades).



Requirements

Level of education

undetermined

Work experience (years)

undetermined

Written languages

undetermined

Spoken languages

undetermined