Site Reliability Engineer Jobs at Fortude in Colombo, Sri Lanka.

The application deadline for this job has passed.
Exploreture
  • Post Date: July 24, 2023
Job Overview

Site Reliability Engineer

 


Primary job role

As a Site Reliability Engineer (SRE), you will be responsible for improving the overall reliability of applications by ensuring their availability, performance, and scalability. You should be able to gather the technical requirements from the DevOps team and the operational requirements from the Application Support team. Because the Site Reliability Engineer role is at the heart of solving production problems, you should be able to take a holistic approach to troubleshooting and delve deeply into technical details. You must acquire the necessary domain knowledge to troubleshoot and recover from outages effectively, monitor applications in production, and build alerts as required.

Main duties/responsibilities

  • Work closely with the application support team.
  • Monitor critical applications and services to minimize downtime and ensure their availability.
  • Collaborate with DevOps teams to maintain and monitor CI/CD pipelines.
  • Deploy new versions to production environments.
  • Work with project teams to ensure the reliability and maintainability of new and modified releases.
  • Provide input to risk management practices that will anticipate reliability-related incidents that could adversely impact operations.
  • Document processes and monitor application performance metrics.
  • Continuously improve proactive monitoring, alert configuration, and incident response processes to increase reliability and reduce Mean Time to Recovery (MTTR).
  • Optimize performance and cost efficiency through continuous monitoring, trend analysis, and fine-tuning.
  • Monitor for abnormal usage that could affect cost or performance, and take corrective action.
  • Proactively implement preventive measures to improve system reliability.
  • Maintain runbooks, Standard Operating Procedures (SOPs), diagrams, and documentation for swift incident response.
  • Conduct post-incident reviews to improve reliability and contribute to the development of resilience strategies.
  • Meet the Service Level Indicator (SLI) targets set for reliability objectives.

Experience

  • Experience in SRE/DevOps with a focus on Ops.
  • 2+ years of experience in AWS Cloud Infrastructure.
  • Familiarity with CI/CD pipelines and version control systems.
  • Experience with project management and issue-tracking tools such as Jira and SysAid.

Qualifications

  • Bachelor’s degree or an equivalent professional qualification.

Technical Knowledge

  • Fluency in key AWS services (EBS, S3, compute, storage, RDS, etc.).
  • Expertise in Kubernetes or any Container Orchestration System.
  • Knowledge of Infrastructure as Code (IaC).
  • Linux system administration knowledge.
  • Knowledge of RDBMS and Document databases.
  • Knowledge of monitoring tools, including AWS CloudWatch and New Relic.
  • Additional certification in Microsoft, Linux, Cisco, AWS or similar technologies is a plus.

Behavioral competencies

  • Communication
  • Teamwork and collaboration
  • Client orientation
  • Results and execution orientation
  • Analytical ability
  • Decision making

Equal Opportunity Employer

Everyone can grow at Fortude, regardless of their identity. Join us and be part of an organization where we’re all proud to belong.


Job Detail
  • Offered Salary: Not Specified
  • Career Level: Not Specified
  • Experience: 2 Years
  • Gender: Both
  • Industry: Computer and technology
  • Qualification: Bachelor's Degree