Lawrence Livermore National Laboratory



HPC DevOps Storage Engineer (mid-career)

Location:  Livermore, CA
Category:  Science & Engineering
Organization:  Computation
Posting Requirement:  External w/ US Citizenship
Job ID: 105290
Job Code: Science & Engineering MTS 2 (SES.2) / Science & Engineering MTS 3 (SES.3)
Date Posted: May 07 2019

Join us and make YOUR mark on the World!

Come join Lawrence Livermore National Laboratory (LLNL), where we apply science and technology to make the world a safer place. LLNL was named one of Glassdoor's 2019 Best Places to Work!

We have an opening for a High-Performance Computing (HPC) Development Operations (DevOps) Storage Engineer. You will combine software development and systems engineering with a focus on extreme-scale, high-performance storage: systems capable of storing billions of files and hundreds of petabytes. You will work with a small team of DevOps engineers and developers to help architect, deploy, and manage the High-Performance Storage Systems (HPSS) that provide a reliable, massively distributed, long-term archival system for storing our irreplaceable data. This position is in the Livermore Computing (LC) Division within the Computation Directorate.

This position will be filled at either the SES.2 or SES.3 level depending on your qualifications. Additional job responsibilities (outlined below) will be assigned if you are selected at the higher level.

Essential Duties
- Perform hardware/software deployments, upgrades, configuration, monitoring, management, performance tuning, and ongoing support of HPSS in LC production archives.
- Perform software design, development, testing, and deployment of HPSS client interfaces.
- Troubleshoot, determine root cause, and fix complex storage system issues in a team of technical staff having different levels and areas of expertise.
- Apply site reliability engineering/systems engineering practices to manage and improve one or more production aspects of HPSS.
- Develop and maintain tools and utilities that aid in the operation, automation, and reliability of software-based administrative tasks associated with LC production archives.
- Monitor and manage general system health, security incidents, and other archive events.
- Participate in installing software releases, subsystem patches, and third-party utilities, with emphasis on overall system reliability, availability, and serviceability.
- Provide 24/7 customer support as a member of a rotating call list in a fast-paced and mission-critical environment.
- Perform other duties as assigned.

In Addition at the SES.3 Level
- Independently troubleshoot, determine root cause, and fix highly complex storage system issues that may involve interfacing with various technical staff across multiple organizations with differing levels of knowledge and expertise.
- Analyze and tune multiple aspects of archive service (e.g. database design, networks, large-scale disk and/or tape subsystems performance).
- Investigate, evaluate, test, and recommend technical solutions for future systems.

Qualifications
- Bachelor’s degree in Computer Science, Computer Engineering or related field, or the equivalent combination of education and related experience.
- Broad experience and proficiency in C, Java, Python, or Perl programming (or any high-level programming language), and/or common shell scripting environments (e.g. bash).
- Ability to engage with technical staff and end-users, with the deep technical knowledge and critical thinking needed to work effectively with members of the Data Storage Group, the HPSS development community, other LC staff, and LC end-users, and to represent the Laboratory publicly (e.g. at user groups and technical conferences).
- Broad experience setting priorities and solving complex problems in a fast-paced, rapidly changing, customer-focused team environment with multiple competing priorities.
- Comprehensive skills performing Linux/UNIX or storage systems administration: software installations, updates and patching, configuration management, system security, networking, and storage allocation.
- Broad experience with software version control and configuration management systems such as Git, Subversion, CFEngine, Puppet, etc.
- Proficient verbal and written communication skills necessary to effectively collaborate in a team environment and present and explain technical information.
- Ability to work off-hours and on-call (intermittently either as needed or as part of a rotation).

In Addition at the SES.3 Level
- Advanced knowledge of Linux/UNIX systems programming and kernel internals, large scale application or kernel debugging techniques, and/or software testing/quality assurance techniques.
- Significant experience applying software engineering methods.
- Ability to provide innovative solutions to broadly defined tasks and work effectively with minimal guidance, using independent judgment.

Desired Qualifications
- Master's degree in Computer Science or related field.
- Experience with high performance computing, large scale data centers, HPSS and/or other mass storage systems.
- Knowledge of one or more storage systems hardware components (e.g. Spectra Logic or Oracle robotics, Oracle/IBM tape drives, NetApp RAID, QLogic HBAs, direct-attach fiber).

Pre-Employment Drug Test:  External applicant(s) selected for this position will be required to pass a post-offer, pre-employment drug test.  This includes testing for marijuana use, because federal law applies to LLNL as a federal contractor.

Security Clearance:  This position requires a Department of Energy (DOE) Q-level clearance.

If you are selected, we will initiate a Federal background investigation to determine if you meet eligibility requirements for access to classified information or matter. In addition, all L or Q cleared employees are subject to random drug testing.  Q-level clearance requires U.S. citizenship.  If you hold multiple citizenships (U.S. and another country), you may be required to renounce your non-U.S. citizenship before a DOE L or Q clearance will be processed/granted.

Note:   This is a Career Indefinite position. Lab employees and external candidates may be considered for this position.

About Us

Lawrence Livermore National Laboratory (LLNL), located in the San Francisco Bay Area (East Bay), is a premier applied science laboratory that is part of the National Nuclear Security Administration (NNSA) within the Department of Energy (DOE).  LLNL's mission is strengthening national security by developing and applying cutting-edge science, technology, and engineering that respond with vision, quality, integrity, and technical excellence to scientific issues of national importance.  The Laboratory has a current annual budget of about $2.1 billion and approximately 6,800 employees.

LLNL is an affirmative action/equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, protected veteran status, age, citizenship, or any other characteristic protected by law.