General Information
Job Description | SYS ADM 3 | Working Title | HPC Systems Administrator |
---|---|---|---|
Job Code | 007304 | Grade | 23 |
Department Name | RED Research Centers - D01238 | Department Head | Thomas Girke |
Supervisor | Thomas Girke | Effective Date | 01/23/2023 |
Position(s) Directly Supervised
Job Code | Title | FTE |
---|---|---|
Generic Scope
Experienced professional who knows how to apply theory and put it into practice with in-depth understanding of the professional field; independently performs the full range of responsibilities within the function; possesses broad job knowledge; analyzes problems/issues of diverse scope and determines solutions. |
Custom Scope
Applies skills as a seasoned, experienced systems infrastructure professional with a full understanding of industry best practices and campus, medical center or Office of the President policies and procedures to resolve a wide range of issues that are moderately complex in scope. Selects methods and techniques to obtain solutions. Evaluates new technologies including performing simple to moderate cost/benefit analyses. |
Department Custom Scope
As a team member in the High-Performance Computing Center (HPCC), the HPC Systems Administrator will manage state-of-the-art research computing infrastructure in support of the science conducted by researchers at UCR. The HPCC enables cutting-edge research in a wide range of science, engineering and biomedical disciplines by providing the computing hardware, software and expertise to enable pioneering discoveries. The incumbent will provide HPC user services support to university researchers, including porting, compiling, and debugging applications to run on Linux clusters; performing benchmarking activities to test new systems; developing specifications for procurement, and acceptance testing. Additionally, the HPC Systems Administrator will profile, analyze and optimize scientific applications for best performance; develop in-house tools and implement third-party software to support operating systems, high-performance interconnects, MPI, scientific applications and libraries; and provide email support for scientific users doing parallel and serial computation on HPCC's clusters. This includes providing assistance with account management, use of systems, use of compilers, software and tools, job submission to the scheduler, use of parallel file systems, and transferring data to and from the systems. The incumbent will hold training sessions to teach users best practices for running applications on large systems; develop and publish user and technical documentation on the use of systems; assist Linux cluster administrators with troubleshooting and repair of Linux clusters; communicate with user groups; analyze usage; develop forecasts; help with computation, storage and data management planning to meet user needs; identify new technologies, standards, and architectures relevant to providing scientific computing for research; and define requirements for future HPC systems. |
Education & Experience Requirements
Education Requirements
Degree | Requirement |
---|---|
Bachelor's degree in related area and/or equivalent experience/training. | Required |
Advanced degree. | Preferred |
Experience Requirements
Experience | Requirement |
---|---|
4 - 7 years of related experience. | Required |
Professional experience with the installation and systems administration of high-performance compute (HPC) systems including Linux clusters and parallel data storage systems based on Lustre, GPFS or similar. | Required |
Professional experience with a range of programming languages relevant for systems administration and HPC, including Bash, Python, Perl and C/C++. | Required |
Experience with Linux system administration in a professional environment including installing, repairing and upgrading server and network hardware such as InfiniBand networks. | Required |
License Requirements
Certification Requirements
Certification | Requirement |
---|---|
Educational Condition Requirements
Condition | Requirement |
---|---|
Key Responsibilities
Description | % Time |
---|---|
Writes and executes complex scripts and may write software in support of systems management, log analysis and other system administration duties for multiple integrated systems. | 20 |
Performs complex security control activities to prevent unauthorized access to networked resources. May assist with maintenance of security systems for network equipment and provide recommendations on network access controls. | 20 |
Compiles software using multiple compilers, including support for MPI, MKL, LAPACK, BLAS, and Boost. Resolves kernel-level, user account and driver compatibility issues. | 20 |
Defines, designs and implements systems, services and technology solutions. Proposes and implements system or device enhancements such as software, hardware and network configuration, updates and installations for projects or services of moderately complex scope. | 15 |
Manages systems and services for a facility of moderate size and makes recommendations for purchase or upgrade of new computer hardware, software and services. Performs moderately complex analysis to acquire, install, modify and support operating systems, databases, utilities and Internet / intranet-related tools. Plans, designs and implements moderately complex system updates and rollouts. May perform moderately complex networking tasks and interoperability assessments for interconnected servers or components of clusters for communication. | 15 |
Provides training for users of UCR's HPC infrastructure through in-person training, user workshops, and development of online user manuals. | 10 |
Knowledge, Skills & Abilities
Knowledge/Skill/Ability | Requirement |
---|---|
Demonstrated skills associated with adapting equipment and technology to serve user needs. Demonstrated comprehensive understanding of how system management actions affect other systems, system users and dependent/related functions. | Required |
Understanding of system performance monitoring and actions that can be taken to improve or correct performance. | Required |
Advanced knowledge of computer security best practices and policies including demonstrated experience securing server-based software. | Required |
Ability to write technical documentation in a clear and concise manner. | Required |
Ability to elicit and communicate technical and non-technical information in a clear and concise manner. | Required |
Knowledge of the design, development and application of technology and systems to meet business needs. | Required |
General knowledge of other areas of IT. Thorough understanding of and experience with systems-related issues and actions that can be taken to improve or correct performance. | Required |
Self-motivated and works independently and as part of a team. Demonstrates problem-solving skills. Able to learn effectively and meet deadlines. | Required |
Basic knowledge of how to apply technologies and systems to meet business needs. | Required |
Demonstrated experience writing and editing complex scripts used to perform system maintenance and administration. | Required |
Knowledge of software management and compilation under Linux OSs and their optimization for HPC environments. | Preferred |
Knowledge of Linux systems, kernels, and architectures of HPC systems, networks and large-scale storage systems. | Preferred |
Knowledge of queuing systems and workload managing software, such as Slurm, Torque, SGE or similar. | Preferred |
Special Requirements & Conditions
Special Condition | Requirement |
---|---|
Must pass a background check. | Required |
Other Special Requirements & Conditions
|
Level of Supervision Received
General Supervision |
Environment
Working Environment
Campus |
Other Requirements
Items Used
|
Physical Requirements
|
Mental Requirements
|
Environmental Requirements
|
Critical Position
Is Critical Position: Yes |