Senior Site Reliability Engineer (SRE) to ensure security, availability, reliability, scalability, and high performance of mission-critical financial/banki

Toronto, ON
  • Number of positions available: 1

  • To be discussed
  • Contract job

  • Starting date: 1 position to fill as soon as possible

Our valued crown corporation client is seeking a Senior Site Reliability Engineer (SRE) to ensure security, availability, reliability, scalability, and high performance of mission-critical financial/banking applications!


Initial 1-year contract in Ottawa, ON (100% remote), with a strong possibility of extension to a total term of 3 years. 7.5 hours per day, Monday to Friday. On-call and overtime work may be required (with advance notice).


As the successful candidate, you will work with other application and operational experts to ensure the highest level of availability, reliability, security, and scalability of various financial applications and products.


Responsibilities:

  • Proactively monitor log feeds/flows/alerts for data interruptions and server health
  • Administer monitoring platforms and related infrastructure, working with IT specialists in cyber, network, storage, security, virtual infrastructure, platform, and database to deliver their logging requirements
  • Work with product/project/client teams to refine monitoring dashboards
  • Monitor performance of systems and service agreements to ensure that ongoing service-delivery standards, service levels and IT policies are met
  • Assess and refine current logs and, whenever possible, consider fine-tuning or explore opportunities for automation
  • Create and update management packs and scripts
  • Support migration of data sources & log traffic


Must-Have Skills:

  • Demonstrated experience with monitoring platforms such as Dynatrace, SCOM (2012 R2, 2016, 1807), and/or SolarWinds
  • Demonstrated experience working with log management platforms such as Syslog-NG and Splunk
  • Demonstrated experience using scripting languages such as Python and Bash, specifically for systems automation, as well as PowerShell
  • Proficiency in testing and managing high-availability environments, including performing disaster recovery tests on a regular basis
  • Understanding of load testing, monitoring, and performance management tools for every layer of the environment (services, networks, applications)


Nice to Have Skills:

  • Demonstrated experience with SCOM Report Customization and SQL Report Server Administration
  • Demonstrated experience with infrastructure scripting and automation or familiarity with Infrastructure as Code

Requirements

Level of education

undetermined

Work experience (years)

undetermined

Written languages

undetermined

Spoken languages

undetermined