Lawrence Livermore National Laboratory



High Performance Computing (HPC) Systems Engineer

Location:  Livermore, CA
Category:  Science & Engineering
Organization:  Computation
Posting Requirement:  External w/ US Citizenship
Job ID: 105137
Job Code: Science & Engineering MTS 2 (SES.2) / Science & Engineering MTS 3 (SES.3)
Date Posted: April 03 2019

Join us and make YOUR mark on the World!

Come join Lawrence Livermore National Laboratory (LLNL), where we apply science and technology to make the world a safer place. LLNL was named one of Glassdoor's Best Places to Work for 2019!

Do you love High Performance Computing (HPC)?  Would you like to work with four of the fastest HPC systems in the world?

We have an opening for a High Performance Computing (HPC) systems engineer to support HPC clusters.  You will apply comprehensive knowledge of HPC systems, including numerous high-speed, multi-petabyte Lustre file systems composed of Linux servers and high-performance RAID arrays, all connected via Ethernet and InfiniBand SANs.  You will independently contribute to technical projects using creativity and imagination.  This position is in the Livermore Computing (LC) Division within the Computation Directorate, supporting the LC Supercomputing Center.

This position will be filled at either the SES.2 or SES.3 level depending on your qualifications. Additional job responsibilities (outlined below) will be assigned if you are selected at the higher level.

Essential Duties
- Provide system administration support for Linux-based HPC, Network Attached Storage (NAS), infrastructure, and parallel file system servers and clusters.
- Participate in the design and implementation of multiple Linux-based HPC, infrastructure, and parallel file system servers and clusters.
- Build, configure, and maintain multiple RAID controllers and disk enclosure systems.
- Deploy and maintain InfiniBand fabrics for compute and storage networks.
- Monitor installation of software releases, operating system patches, and third-party utilities, with emphasis on overall system security.
- Work with other system engineers, Hotline, and Operations staff to improve the quality of service for end users.
- Troubleshoot moderately complex system issues and determine their root cause.
- Respond to system problems and user questions in person, via email, and via a trouble ticket system.
- Perform other duties as assigned.
In Addition at the SES.3 Level
- Analyze and tune the performance of complex computer, network, file system, and disk subsystems.
- Investigate, evaluate, test and recommend technical solutions for future systems.
- Develop tools and procedures to monitor and automate system tasks on servers and clusters.

Qualifications
- Bachelor’s degree in computer science or related field or the equivalent combination of education and related experience.
- Broad experience with Linux/Unix systems including installation, configuration, networking, backups, updates and patching, and system security.
- Broad experience with or knowledge of HPC environments and technologies such as InfiniBand, Slurm, and Lustre.
- Comprehensive knowledge of scripting and programming languages, such as Perl, Python, and bash/csh/ksh.
- Proficient with disk and storage systems, such as host-based RAID controllers, software RAID, and vendor RAID systems (e.g., NetApp, RAID Inc., DDN).
- Comprehensive experience with version control and configuration management systems, such as Subversion, Git, Puppet, and CFEngine.
- Ability to work off-hours and on-call (intermittently either as needed or as part of a rotation).
- Proficient communication and interpersonal skills, and the ability to work and communicate with other technical staff and end users.
In Addition at the SES.3 Level
- Significant experience with Linux/UNIX system administration in support of a number of independent but interrelated systems and software packages, containers, Kubernetes, and virtualization environments and tools such as KVM and VMware.
- Advanced knowledge of and significant experience providing innovative solutions to broadly defined tasks and problems.
- Advanced communication and interpersonal skills, and the ability to interact effectively with system developers and vendors with minimal direction.

Desired Qualifications
- Master’s degree in computer science or related field.
- Experience with local, parallel and distributed file systems such as XFS, ZFS, GPFS, Lustre, and with NAS platforms such as NetApp (cDot).
- Experience with OpenStack, Docker containers, and Kubernetes ecosystems; current Red Hat certifications.

Pre-Employment Drug Test:  External applicant(s) selected for this position will be required to pass a post-offer, pre-employment drug test.  This includes testing for marijuana use, because federal law applies to us as a federal contractor.

Security Clearance:  This position requires a Department of Energy (DOE) Q-level clearance.

If you are selected, we will initiate a Federal background investigation to determine if you meet eligibility requirements for access to classified information or matter. In addition, all L or Q cleared employees are subject to random drug testing.  Q-level clearance requires U.S. citizenship.  If you hold multiple citizenships (U.S. and another country), you may be required to renounce your non-U.S. citizenship before a DOE L or Q clearance will be processed/granted.

Note:   This is a Career Indefinite position. Lab employees and external candidates may be considered for this position.

About Us

Lawrence Livermore National Laboratory (LLNL), located in the San Francisco Bay Area (East Bay), is a premier applied science laboratory that is part of the National Nuclear Security Administration (NNSA) within the Department of Energy (DOE).  LLNL's mission is strengthening national security by developing and applying cutting-edge science, technology, and engineering that respond with vision, quality, integrity, and technical excellence to scientific issues of national importance.  The Laboratory has a current annual budget of about $2.1 billion and employs approximately 6,800 people.

 

LLNL is an affirmative action/equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, protected veteran status, age, citizenship, or any other characteristic protected by law.