General Information
Job Description | SYS ADM 4 | Working Title | Senior HPC Systems Administrator |
---|---|---|---|
Job Code | 000520 | Grade | 25 |
Department Name | RED Research Centers - D01238 | Department Head | Thomas Girke |
Supervisor | Thomas Girke | Effective Date |
Position(s) Directly Supervised
Job Code | Title | FTE |
---|---|---|
007304 | SYS ADM 3 | 1.0 |
Generic Scope
Technical leader with a high degree of knowledge in the overall field and recognized expertise in specific areas; problem-solving frequently requires analysis of unique issues/problems without precedent and/or structure. May manage programs that include formulating strategies and administering policies, processes, and resources; functions with a high degree of autonomy. |
Custom Scope
Applies advanced systems infrastructure concepts and campus, medical center or Office of the President or institutional objectives to resolve highly complex issues where analysis of situations or data requires an in-depth evaluation of variable factors. Selects methods, techniques and evaluation criteria to obtain results. Gives presentations to associated team and other technical units. Evaluates new technologies including performing moderate to complex cost/benefit analyses. May lead a team of systems/infrastructure professionals. |
Department Custom Scope
The High-Performance Computing Center (HPCC) at the University of California, Riverside (UCR) has an opening for a Senior HPC Systems Administrator. In this exciting leadership position, you will manage state-of-the-art research computing infrastructure in support of the science conducted by researchers at UCR. The Senior HPC Administrator provides technical leadership for UCR's largest high-performance computing (HPC) infrastructure, manages a complex portfolio of responsibilities at a campus-wide level, and advises upper administration on strategic decisions in research computing. |
Education & Experience Requirements
Education Requirements
Degree | Requirement |
---|---|
Bachelor's degree in related area and/or equivalent experience/training. | Required |
Experience Requirements
Experience | Requirement |
---|---|
6 - 10 years of related experience. | Required |
Experience supervising a team of computational experts. | Preferred |
Experience configuring and fine-tuning job schedulers and resource managers (Slurm, PBS, etc.). | Preferred |
Minimum of 3 years of Linux and/or HPC administration in a professional environment. | Required |
Experience with parallel programming and computing on Linux clusters using C/C++, Fortran, Python, MPI, OpenMP, multithreading and multicore technologies on CPU and GPU architectures. | Preferred |
License Requirements
Certification Requirements
Certification | Requirement |
---|---|
Educational Condition Requirements
Condition | Requirement |
---|---|
Key Responsibilities
Description | % Time |
---|---|
Applies advanced systems/infrastructure concepts to define, design and implement highly complex systems, services and technology solutions. Proposes and implements highly complex system or device enhancements such as software, hardware and network configuration, updates and installations for projects or services of broad scope. | 40 |
Independently manages systems and services for a large facility, campuswide, medical center or Office of the President and/or institution-wide scope and makes recommendations for purchases or upgrades. Performs complex and advanced analysis to acquire, install, modify and support operating systems, databases, utilities and web-related tools. Selects methods and techniques to obtain solutions. Interacts with senior management. May perform complex network integration tasks and interoperability assessments for interconnected servers or components of clusters for communication. May lead a team of systems/infrastructure professionals. | 30 |
Specifies, writes and executes highly complex software and scripts to support systems management, log analysis and other system administration duties for multiple, highly integrated systems. | 15 |
Maintains complex security systems. Interprets and adopts campus, medical center or Office of the President, system and regulation-based security policies to control access to networked resources. Provides recommendations and requirements on network access controls. | 15 |
Knowledge, Skills & Abilities
Knowledge/Skill/Ability | Requirement |
---|---|
General knowledge of other areas of IT. Thorough understanding of and experience with systems-related issues and actions that can be taken to improve or correct performance. | Required |
Demonstrated skills associated with adapting equipment and technology to serve user needs. Demonstrated comprehensive understanding of how system management actions affect other systems, system users and dependent/related functions. | Required |
Advanced experience writing and editing the most complex scripts used to perform system maintenance and administration. | Required |
Basic knowledge of how to apply technologies and systems to meet business needs. | Required |
Ability to write technical documentation in a clear and concise manner. | Required |
Understanding of system performance monitoring and actions that can be taken to improve or correct performance. | Required |
Demonstrated advanced knowledge, skills and abilities associated with system problem identification and resolution. Experience with design, configuration, operation, repair, and tuning of technology systems. | Required |
Knowledge of the design, development and application of technology and systems to meet business needs. | Required |
Self-motivated and works independently and as part of a team. Demonstrates problem-solving skills. Able to learn effectively and meet deadlines. | Required |
Ability to elicit and communicate technical and non-technical information in a clear and concise manner. | Required |
Experience leading a team of IT professionals. | Required |
Advanced knowledge of computer security best practices and policies, including demonstrated experience securing the most complex server-based software. | Required |
Excellent team and outreach abilities to network and collaborate with key contacts outside their own area of expertise. | Required |
Fluency in two or more programming languages and environments used in research computing, such as Bash, Python, C/C++, R, Java, TensorFlow, PyTorch, Jupyter Notebooks, RStudio Server, and MATLAB. | Required |
Extensive experience teaching user workshops on HPC systems, Linux environments, programming languages, and big data management using local and/or cloud computing solutions. | Preferred |
Hands-on experience developing and maintaining web-based user documentation for HPC systems. | Preferred |
Advanced understanding of and hands-on experience in administering and optimizing HPC networks and switches, such as InfiniBand networks. | Preferred |
Advanced understanding of and hands-on experience in administering and optimizing parallel storage systems with many petabytes (1 PB = 1,000 TB) of storage space. | Preferred |
Advanced understanding of and hands-on experience in administering and optimizing complex HPC systems in large user environments with hundreds of users. | Preferred |
Special Requirements & Conditions
Special Condition | Requirement |
---|---|
Must pass a background check. | Required |
Other Special Requirements & Conditions
|
Level of Supervision Received
Direction |
Environment
Working Environment
Campus |
Other Requirements
Items Used
|
Physical Requirements
|
Mental Requirements
|
Environmental Requirements
|
Critical Position
Is Critical Position: Yes |