Jobs for People with MS: National MS Society


Job Information

Insight Global Senior High Performance Computing (HPC) Architect in Rockville, Maryland

Job Description

HOW A SENIOR HPC ARCHITECT WILL MAKE AN IMPACT:

  • Provide hands-on administration and support for two HPC clusters: a GPU-focused cluster with 4,000+ cores and a second cluster with 1,500+ cores, including monitoring the performance and health of both

  • Install and support bioinformatics applications for a large and diverse research community with needs in genomics, cryo-electron microscopy, and AI/ML

  • Architect and design HPC clusters, including designing new clusters and expanding existing components such as storage, InfiniBand, and compute

  • Monitor and report on cluster performance and generate data to show usage and trends

  • Perform troubleshooting and problem-solving for complex HPC operational and performance issues

  • Collaborate with researchers to guide them in effective use of the HPC resources, such as job scheduler submission, data formats, and building data workflows to effectively move data from scientific instruments to the HPC clusters for analysis.

  • Provide input to the Scientific Infrastructure team leader for setting priorities for cluster operations, scheduling policies, resources needed, etc.

  • Develop and maintain documentation and diagrams for the HPC clusters, review GitHub pull requests, and update content and training materials on the user wiki portal.

  • Teach and mentor team members on system design, best practices, and troubleshooting techniques.

*Based on experience, the pay range for this position is $65-80 per hour

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com .

To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/ .

Skills and Requirements

Education: BS/BA (or equivalent)

Required Experience: Minimum of 10 years related experience

Required Technical Skills:

  • Minimum of five years' experience as an engineer or architect with HPC technologies

  • Hands-on HPC architecture design experience, including storage, file system, InfiniBand, security, authentication, and compute architectures

  • Experience with Slurm job scheduling, including troubleshooting job status and optimizing submission scripts

  • Experience using Git to manage shared software configuration code bases

  • Hands-on experience with cloud-based services (e.g. Azure, AWS, GCP)

  • Minimum of five years' experience in Linux systems administration

  • Good understanding of storage administration and optimization, such as performing upgrades and defining RAID configurations

  • Good understanding of fundamental networking concepts and their practical applications

  • Experience with the Spack or EasyBuild package managers, including building packages from PyPI, R, and GitHub

  • Knowledge and experience in one or more scripting languages applicable to Linux (e.g. Bash, Perl, Python)

Security Clearance Level: Must be able to obtain an NIH Public Trust

Preferred Skills:

  • Experience administering Red Hat / CentOS-based systems

  • Experience working in a life-sciences-oriented environment

  • Experience configuring and using monitoring systems to monitor HPC clusters

  • Ability to determine meaningful metrics and usage data for monthly status reports and health dashboards

  • Experience with DevOps or DevSecOps methodologies, such as automation and configuration management

  • Strong troubleshooting skills

  • Experience standing up a cluster from the ground up

