Jobs for People with MS: National MS Society

Job Information

TEKsystems SRE Lead in Phoenix, Arizona

Please note:

  • Sponsorship/C2C is not available for this position.

  • Although the position is remote, candidates must be located in Phoenix, AZ.

Description:

SRE Lead - Third or Fifth shift

  • 3rd shift: Mon-Fri, 11pm-7:30am EST

  • 5th shift: Sat & Sun, 7pm-7am EST (2x12 hour shifts, followed by working through the week to reach 40 hours)

Role Overview: As an SRC Lead, you’ll be at the forefront of ensuring the reliability, availability, and performance of critical enterprise technology and security applications. Your leadership will drive operational excellence, foster collaboration, and elevate the overall reliability of our systems within the Site Reliability Center (SRC). You’ll work closely with cross-functional teams, mentor engineers, and contribute to the success of the organization.

NOTE FOR THE SKILLS/TECHNOLOGIES

Be knowledgeable enough to jump in, drive conversations to resolution, and escalate to Application System Managers/SMEs if needed (e.g., "Here is the problem, here is what we think is causing it, and here is the solution we propose. What do you want to do?").

Top Technologies:

• Monitoring and Debugging Tools (e.g. LogScale, Splunk, Dynatrace)

• DevOps pipeline (e.g. Git, Jenkins, Artifactory)

• Infrastructure (e.g. Red Hat Linux, OpenShift, Windows)

• Networking (e.g. DNS, Load-balancing, Network tracing, Firewall)

• Database (e.g. Oracle, SQL)

• API understanding & Web services technologies: (SOAP, JSON, REST)

• Directories (e.g. LDAP, Active Directory)

• Java

Secondary:

• Python/Java scripting, Ansible, and PowerShell for automation purposes

• Modern development technologies and tools: (Agile, CI/CD, Git, Jenkins)

• Kafka Event Streaming

• ETL/Informatica

Responsibilities Summary:

Production support, NOT new development. Troubleshoot highly technical problems, which may require assessing source code to analyze and resolve them. This requires advanced troubleshooting skills and the ability to adapt and create non-standard approaches to problem solving.

  • There are 185 applications and platforms combined in this space. Expertise is not expected in all of them, but emphasis will be placed on developing SME-level knowledge of the Criticality Level 0/1 mnemonics, which are reflected in the top skills.

We are looking for someone who is astute enough to see a problem and either fix it or escalate it to the SME teams, then learn from how they resolve it. Runbooks should then be updated accordingly.

Key responsibilities:

  • Create and maintain documentation to ensure knowledge accessibility.

  • Liaise with other application support teams and internal/external business and technical partners.

  • Provide ad hoc and on-demand reports.

  • Perform timely escalation of critical issues and proactively identify patterns of recurring issues to improve production.

  • Lead problem resolution, conduct root cause analysis, and establish processes that help prevent incidents.

  • Participate in the Incident and Problem Management processes as a resolver accountable for root cause analysis, resolution and reporting.

  • Provide guidance to all involved staff and vendors to drive a coordinated approach to results.

  • Reduce escalations to Level 3 based on incremental learning about applications.

Other Duties/Qualities:

  • Technical Acumen and System Familiarity: While the majority of the role involves management, the SRC Lead should possess a solid understanding of the systems and technical stacks they are supporting. They should be able to pull up dashboards, troubleshoot issues, and guide conversations related to system health. Additionally, they must effectively manage impact and risk.

  • System Monitoring and Health: Lead the production environment by monitoring availability and taking a holistic view of system health.

  • Quality and Time-to-Market: Drive improvements in reliability, quality, and time-to-market for software solutions.

  • Performance Optimization: Continuously optimize system performance, anticipating customer needs and innovating for excellence.

  • Operational Leadership: Provide primary operational support for large-scale distributed software applications.

  • Mentorship: Mentor and guide engineers within your shift team, fostering growth and technical expertise.

  • Stakeholder Communication: Manage team operations while effectively communicating with directors and other executives/CIOs who have a stake.

Qualifications:

  • Proactive Approach: Take a proactive approach to identifying problems, performance bottlenecks, and areas for improvement.

  • Leadership Experience: Demonstrated leadership in technical roles, preferably within Site Reliability Engineering (SRE) or DevOps.

  • Continuous Improvement: Foster a culture of continuous improvement and technical excellence, proactively identifying patterns of recurring issues to enhance stability and improve processes (automation opportunities, etc.).

Experience Level:

Expert Level

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.
