Lawrence Livermore National Laboratory

Senior HPC DevOps/Storage Engineer

Location:  Livermore, CA
Category:  Science & Engineering
Organization:  Computing
Posting Requirement:  External w/ US Citizenship
Job ID: 105291
Job Code: Science & Engineering MTS 4 (SES.4)
Date Posted: May 06 2019

Join us and make YOUR mark on the World!

Come join Lawrence Livermore National Laboratory (LLNL), where we apply science and technology to make the world a safer place. LLNL is now one of Glassdoor's 2019 Best Places to Work!

We have an opening for a High Performance Computing (HPC) Development Operations (DevOps) Storage Engineer. You will combine software development and systems engineering with a focus on extreme scale high performance storage that provides systems capable of storing billions of files and hundreds of petabytes. You will work with a small team of DevOps engineers and developers to help architect, deploy, and manage the High Performance Storage Systems (HPSS) that provide a reliable, massively-distributed, long-term archival system for storing our irreplaceable data. This position is in the Livermore Computing (LC) Division within the Computation Directorate.

Essential Duties
- Perform hardware/software deployments, upgrades, configuration, monitoring, management, performance tuning, and ongoing support of HPSS in LC production archives.
- Perform software design, development, testing, and deployment of HPSS client interfaces.
- Apply site reliability engineering/systems engineering practices to manage and improve one or more production aspects of HPSS.
- Develop and maintain tools and utilities that aid in the operation, automation, and reliability of software-based administrative tasks associated with LC production archives.
- Analyze and tune multiple aspects of the archival service (e.g. automated monitoring, database design, networks, large-scale disk and/or tape subsystem performance), and collaborate to develop new ideas, modify approaches, and/or redefine customer requirements that impact organizational operations or directions for our future systems.
- Independently troubleshoot, determine root cause, and fix highly complex storage system issues that may involve interfacing with various technical staff across multiple organizations with differing levels of knowledge and expertise.
- Lead the deployment of software releases and third-party utilities and the patching of the various subsystems, with emphasis on overall system reliability, availability, and serviceability.
- Provide 24/7 customer support as a member of a rotating call list in a fast-paced and mission-critical environment.
- Perform other duties as assigned.

Qualifications
- Bachelor’s degree in Computer Science or related field or the equivalent combination of education and related experience.
- Substantial experience with C, Java, Python, or Perl programming (or any high-level programming language), and/or common shell scripting environments (e.g. bash).
- Ability to engage with technical staff and end-users, requiring deep technical knowledge and critical thinking necessary to effectively work with members of the Data Storage Group, HPSS development community, other LC staff, LC end-users, and to represent the Laboratory publicly (e.g. user groups and technical conferences).
- Experience setting priorities, developing new ideas, modifying approaches, and solving highly complex problems with minimal guidance and using independent judgment in a fast-paced, rapidly changing, customer-focused team environment.
- Highly advanced skills performing Linux/UNIX or storage systems administration (including software installations, updates/patching, configuration management, system security, networking, storage allocation), and/or advanced knowledge of Linux/UNIX systems programming, kernel internals, large scale application/kernel debugging techniques, and/or software testing/quality assurance techniques.
- Substantial experience with software version control and configuration management systems such as Git, Subversion, CFEngine, Puppet, etc.
- Expert verbal and written communication skills necessary to effectively collaborate in a team environment and present and explain technical information.
- Ability to work off-hours and on-call (intermittently either as needed or as part of a rotation).

Desired Qualifications
- Master’s degree in Computer Science or related field.
- Experience with high performance computing, large scale data centers, HPSS and/or other mass storage systems.
- Knowledge of one or more storage systems hardware components (e.g. Spectra Logic or Oracle robotics, Oracle/IBM tape drives, NetApp RAID, Qlogic HBAs, direct-attach fiber).

Pre-Employment Drug Test:  External applicant(s) selected for this position will be required to pass a post-offer, pre-employment drug test.  This includes testing for use of marijuana as Federal Law applies to us as a Federal Contractor.

Security Clearance:  This position requires a Department of Energy (DOE) Q-level clearance.

If you are selected, we will initiate a Federal background investigation to determine if you meet eligibility requirements for access to classified information or matter. In addition, all L or Q cleared employees are subject to random drug testing.  Q-level clearance requires U.S. citizenship.  If you hold multiple citizenships (U.S. and another country), you may be required to renounce your non-U.S. citizenship before a DOE L or Q clearance will be processed/granted.

Note:   This is a Career Indefinite position. Lab employees and external candidates may be considered for this position.

About Us

Lawrence Livermore National Laboratory (LLNL), located in the San Francisco Bay Area (East Bay), is a premier applied science laboratory that is part of the National Nuclear Security Administration (NNSA) within the Department of Energy (DOE).  LLNL's mission is strengthening national security by developing and applying cutting-edge science, technology, and engineering that respond with vision, quality, integrity, and technical excellence to scientific issues of national importance.  The Laboratory has a current annual budget of about $2.1 billion and employs approximately 6,800 people.


LLNL is an affirmative action/equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, protected veteran status, age, citizenship, or any other characteristic protected by law.