Requisition Id 16005
Overview:
The National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL), which hosts several of the world's most powerful computer systems, is seeking a highly qualified individual to play a key role in improving the security, performance, and reliability of the NCCS computing environments. This includes supporting one of the fastest supercomputers in the world, Frontier, along with numerous commodity clusters and specialized programs and partnerships. Frontier is one of the scientific research community's most powerful computational instruments for exploring solutions to some of today's most challenging problems. As an HPC Linux Systems Engineer, you will work within the HPC Scalable Systems Group inside the NCCS Systems Section to support numerous activities of the center.
The Scalable Systems group oversees, administers and supports system installation, deployment, acceptance, performance testing, upgrades, problem diagnosis, and troubleshooting of large-scale HPC computational resources. The Systems Section is within the National Center for Computational Sciences Division (NCCS). The HPC Systems Section is responsible for the division's computing, storage, networking, and infrastructure systems and services.
The NCCS provides state-of-the-art computational and data science infrastructure, coupled with dedicated technical and scientific professionals, to accelerate scientific discovery and engineering advances across a broad range of disciplines. NCCS hosts the Oak Ridge Leadership Computing Facility, one of DOE's National User Facilities.
Major Duties/Responsibilities:
• Installing, integrating, and administering HPC Linux clusters and high-speed networks
• Diagnosing system operational problems quickly and effectively
• Coordinating with vendors to resolve hardware and software problems
• Recommending, planning, and coordinating hardware and software changes with customer participation using change management processes
• Porting and writing system management tools
• Documenting system administration procedures for routine and complex tasks
• Participating in a 24-hour, 7-day on-call support rotation and off-hours maintenance windows
• Implementing and integrating systems into the NCCS environment and analyzing system performance
Deliver ORNL's mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service.
Basic Qualifications:
• Bachelor's degree in a scientific or technical field
• 8+ years of Linux systems experience
• Demonstrated experience working in classified environments, including a thorough understanding of security policies, compliance frameworks, and associated standard processes (e.g., NIST, DISA STIGs)
• Demonstrated experience designing and deploying HPC systems, ensuring they meet the computational needs and security requirements of a classified environment
Preferred Qualifications:
• Experience managing Linux operating systems in a large-scale system environment
• Solid understanding of networked computing environment concepts
• 8+ years of experience with Linux cluster administration
• Ability to develop and maintain programs and scripts that aid in the operation and automation of administrative tasks using various shell and scripting languages (bash, Python, Go)
• Experience with Lustre and GPFS file systems
• Experience with batch schedulers (particularly SLURM)
• Experience deploying and maintaining automated configuration management software such as Puppet
• Strong interpersonal and communication skills
• Ability to work as a team player
• Proactive and solution-oriented problem solver
• Prior project and/or team leadership experience
Special Requirement:
• Q clearance with SCI: This position requires the ability to obtain and maintain a Sensitive Compartmented Information (SCI) clearance from the Department of Energy. As such, this position is a Workplace Substance Abuse Program (WSAP) testing designated position. WSAP positions require passing a pre-placement drug test and participating in an ongoing random drug testing program. In addition, because of the SCI requirement, you may also be subject to random polygraph testing.
Benefits at ORNL:
As a U.S. Department of Energy (DOE) Office of Science national laboratory, ORNL has an impressive 80-year legacy of addressing the nation's most pressing challenges. Our team is made up of over 7,000 dedicated and innovative individuals! Our goal is to create an environment where a variety of perspectives and backgrounds are valued, ensuring ORNL is known as a top choice for employment. These principles are essential for supporting our broader mission to drive scientific breakthroughs and translate them into solutions for energy, environmental, and security challenges facing the nation.
ORNL offers competitive pay and benefits programs to attract and retain talented people. The laboratory offers many employee benefits, including medical and retirement plans and flexible work hours, to help you and your family live happy and healthy lives. Employee amenities such as on-site fitness, banking, and cafeteria facilities are also provided for convenience. In addition, we offer a flexible work environment that supports both the organization and the employee. A hybrid/onsite working arrangement may be available with this position.
Other benefits include the following:
• Prescription Drug Plan
• Dental Plan
• Vision Plan
• 401(k) Retirement Plan
• Contributory Pension Plan
• Life Insurance
• Disability Benefits
• Generous Vacation and Holidays
• Parental Leave
• Legal Insurance with Identity Theft Protection
• Employee Assistance Plan
• Flexible Spending Accounts
• Health Savings Accounts
• Wellness Programs
• Educational Assistance
• Relocation Assistance
• Employee Discounts