Jobs for People with MS: National MS Society

Job Information

Meta Software Engineering Manager - AI Systems Co-Design in Menlo Park, California

Summary:

Meta is seeking a hands-on engineering manager to join the AI & Systems Co-Design team at Meta. The team works at the intersection of hardware, software and AI technologies, making direct contributions to Llama, DLRM, DCPerf, MTIA and many other cutting-edge open-source and internal infrastructure projects. The AI co-design team has established relationships with both academia and industry. We frequently collaborate with academia through internships and have a track record of publications at top AI, systems and architecture conferences. We partner closely with industry leaders to influence their roadmaps and build the best products for Meta’s infrastructure. Join us and be a part of the team that is shaping the future of Meta AI infrastructure!

Required Skills:

Software Engineering Manager - AI Systems Co-Design Responsibilities:

  1. Lead and support the communications team that works on collective communication libraries, and help enable at-scale performance for inference and training of our GenAI (Llama) and Ranking & Retrieval (DLRM) models

  2. Enable the growth of individual contributors, drive the technical roadmap together with technical leads, and expand the team's impact by developing new skill sets and capabilities

  3. Lead a high-performance team of engineers to deliver new capabilities and efficient compute systems for our fleet

  4. Technical management experience in systems architecture, performance, workload analysis and large-scale distributed systems

  5. Work cross-functionally across hardware and software/services teams to drive engineering efforts

Minimum Qualifications:

  1. Experience leading teams working on high-performance computing (HPC) and AI/ML systems, including:

     a. Communication libraries (e.g., NCCL, RCCL, UCC, MPI)

     b. GPU/ASIC-based kernel development and optimization (e.g., CUDA, ROCm)

     c. Distributed systems for large-scale training and serving

     d. Systems architecture and performance

     e. Large-scale distributed systems

  2. Experience running a large-scale program and dealing with ambiguity

Preferred Qualifications:

  1. Experience with collective communication, e.g., with one of the following libraries: NCCL, RCCL, Gloo, UCC, or MPI

  2. Network architecture

Public Compensation:

$177,000/year to $251,000/year + bonus + equity + benefits

Industry: Internet

Equal Opportunity:

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.

DirectEmployers