Senior Network Site Reliability Engineer (Virtual / Work from Home / Remote)

Location
Salem, Oregon
Remote Working
Remote Working
Job Type
Permanent
Posted
14 Jul 2022
Description

The senior network site reliability engineer is responsible for the big picture of how our network and applications relate to each other, and uses a breadth of tools and approaches to tackle a broad spectrum of problems. Site Reliability Engineering is an engineering discipline that combines software and systems engineering practices to design, build, and maintain large-scale production systems with high efficiency and availability. SREs help drive iterative improvement by reducing manual processes and shortening the problem resolution cycle.

Responsibilities

Humana is looking for a senior site reliability engineer (network) who will be responsible for building and deploying network automation, improving network reliability, and driving tool and service development to maintain and improve our service SLOs. This person is expected to have key knowledge in the following areas:
  • Proven experience managing various large-scale enterprise network topologies including LAN, WAN, Wireless, Network Security and Services. Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
  • Able to identify manual operational tasks and develop automation to solve problems in a modern SRE support model
  • Drive the collection of performance metrics that will inform automation, reduce network downtime, and enhance our decision-making capability
The senior network site reliability engineer installs, supports and/or maintains network monitoring and management tools. They will perform technical analysis of software, hardware and transmission facilities using various diagnostic tools in support of efficient network operations. They will also help drive the performance, reliability, and scalability of the enterprise network to support the growing and changing needs of the business. This person will also help influence the department's strategy in the areas of automation, telemetry and predictive analytics utilizing current and emerging technologies in the market.

The position will have the following responsibilities:
  • Analyze data to diagnose and identify root causes of network-specific events
  • Develop tools and services to automate the mitigation and remediation of network-specific events
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Lead significant production improvements in tooling, automation, and process
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Extract key performance metrics from SRE-related tools (Dynatrace, Splunk, BigPanda, ThousandEyes, etc.) and build associated dashboards
  • Diagnose performance issues in complex distributed applications using infrastructure and application telemetry
  • Identify manual operational tasks and develop automation to solve problems in a modern SRE support model
Required Qualifications:
  • Bachelor's degree
  • Five or more years of technical experience.
  • 2+ years working with a scripting language to develop automation processes.
  • 3+ years working with an enterprise application and network performance management solution (Dynatrace, SolarWinds, AppDynamics, ThousandEyes).
  • 3+ years working with an enterprise log aggregation solution (e.g., Splunk).
  • Ability to isolate network failures causing impact to services across LAN/WAN topologies and understand how these failures surface in the application layer.
  • Excellent written and verbal communication skills, along with strong presentation, social, and analytical skills.
Preferred Qualifications:
  • A strong networking background and familiarity with major routing/switching protocols and equipment is a bonus.
  • Familiarity with monitoring tools such as Splunk, Dynatrace, ThousandEyes, StealthWatch, BigPanda is a plus.
  • Hands-on experience with cloud service providers (Microsoft Azure, GCP, and AWS).
  • Strong scripting experience: Ansible, Python, Perl, Bash, Windows scripts (VBS/PowerShell).
Additional Information

Humana and its subsidiaries require vaccinated associates who work outside of their home to submit proof of vaccination, including COVID-19 boosters. Associates who remain unvaccinated must either undergo weekly negative COVID testing OR wear a mask at all times while in a Humana facility or while working in the field.

Scheduled Weekly Hours

40

Details

  • Job Reference: 658151979-2
  • Date Posted: 14 July 2022
  • Recruiter: Humana
  • Location: Salem, Oregon
  • Remote Working: Some remote working possible
  • Salary: On Application
  • Sector: Call Centre / Customer Service
  • Job Type: Permanent