Jobs for People with MS: National MS Society

Job Information

T-Mobile USA, Inc.: Sr Site Reliability Engineer in Bellevue, Washington

Be unstoppable with us! T-Mobile is synonymous with innovation, and you could be part of the team that disrupted an entire industry! We reinvented customer service, brought real 5G to the nation, and now we're shaping the future of technology in wireless and beyond. Our work is as exciting as it is rewarding, so consider the career opportunity below as your invitation to grow with us, make big things happen with us, and above all, #BEYOU with us. Together, we won't stop!

Ready to become a part of the Uncarrier journey at T-Mobile? Our team is searching for our next Sr Site Reliability Engineer to play a crucial role in improving system reliability and resilience and in facilitating faster, more efficient software development and deployment. You'll apply your expertise to minimize manual effort and prevent operational incidents. Your expertise in programming and scripting languages, incident response management, and various tech tools will contribute to the adaptability and efficiency of our systems. By continuously learning, you'll be able to adapt to change and drive innovation.

Our team's expertise and partnerships contribute significantly to the stability and performance of T-Mobile's digital infrastructure. We pride ourselves on encouraging a culture of innovation, advocating for agile methodologies, and promoting transparency in all that we do. Join us in embodying the spirit of the 'Un-carrier' and make a tangible impact! Our team is dynamic, no two days are the same, and we are diverse, inclusive, and passionate about growth and transformation. If you're up to the challenge, apply today!

Responsibilities:

Enhance system reliability and resilience by identifying potential issues and implementing preventive measures.
Facilitate faster and more efficient software development and deployment by automating processes and reducing manual effort.
Review and participate in Root Cause Analysis (RCA) to identify system issues and prevent incident recurrence, collaborate with Problem Management teams on Corrective and Preventive actions to enhance system reliability and performance, and identify and prioritize items for the Core SRE backlog to ensure continuous improvement in system operations and stability.
Prevent operational incidents by utilizing strong problem-solving and analytical skills.
Contribute to the robustness and efficiency of systems by applying expertise in programming and scripting languages, incident response management, and various tech tools.
Adapt to changing circumstances and drive innovation by continuously learning new skills and technologies.

Knowledge, Skills, and Abilities:

4-7 years working in operations or development environments, solving customer-related issues and handling customer relationships (Required)
4-7 years developing software solutions with programming languages such as Java, C#, and SQL; experience with scripting languages such as Bash and PowerShell is a plus (Preferred)
Automation: Ability to automate processes and reduce manual effort (Required)
Incident Management: Understanding of incident response management and operational support (Required)
Experience crafting and maintaining CI/CD pipelines (Preferred)
Ability to learn new skills and technologies quickly and adapt to changing circumstances (Required)
Understanding of system reliability and resilience principles (Required)
Ability to drive innovation and improve software development and deployment processes (Preferred)
Experience with cloud native platforms (Preferred)

Education:

Bachelor's Degree in Computer Science, Engineering, or related field (Preferred)
Master's/Advanced Degree in Computer Science, Engineering, or related field (Preferred)

Licenses and Certifications (Preferred):

AWS Certified DevOps Engineer: This certification validates technical expertise in provisioning, operating, and running distributed application systems on
