General Information


Job Description: SYS ADM 4
Working Title: Senior HPC Systems Administrator
Job Code: 000520
Grade: 25
Department Name: RED Research Centers - D01238
Department Head: Thomas Girke
Supervisor: Thomas Girke
Effective Date:
Position(s) Directly Supervised
Job Code: 007304 | Title: SYS ADM 3 | FTE: 1.0

Generic Scope
Technical leader with a high degree of knowledge in the overall field and recognized expertise in specific areas; problem-solving frequently requires analysis of unique issues/problems without precedent and/or structure. May manage programs that include formulating strategies and administering policies, processes, and resources; functions with a high degree of autonomy.

Custom Scope
Applies advanced systems infrastructure concepts and campus, medical center, Office of the President, or institutional objectives to resolve highly complex issues where analysis of situations or data requires an in-depth evaluation of variable factors. Selects methods, techniques and evaluation criteria to obtain results. Gives presentations to the associated team and other technical units. Evaluates new technologies, including performing moderately complex to complex cost/benefit analyses. May lead a team of systems/infrastructure professionals.

Department Custom Scope
The High-Performance Computing Center (HPCC) at the University of California, Riverside (UCR) has an opening for a Senior HPC Systems Administrator. In this exciting leadership position, you will manage state-of-the-art research computing infrastructure in support of the science conducted by researchers at UCR. The Senior HPC Systems Administrator provides technical leadership for UCR's largest high-performance computing (HPC) infrastructure, manages a complex portfolio of responsibilities at a campus-wide level, and advises upper administration on strategic decisions in research computing.

Education & Experience Requirements

Education Requirements
Degree Requirement
Bachelor's degree in a related area and/or equivalent experience/training. Required

Experience Requirements
Experience Requirement
6-10 years of related experience. Required
Experience supervising a team of computational experts. Preferred
Experience configuring and fine-tuning job schedulers and resource managers (Slurm, PBS, etc.). Preferred
Minimum of 3 years of Linux and/or HPC administration in a professional environment. Required
Experience with parallel programming and computing on Linux clusters using C/C++, Fortran, Python, MPI, OpenMP, multithreading and multicore technologies on CPU and GPU architectures. Preferred

License Requirements

Certification Requirements
Certification Requirement

Educational Condition Requirements
Condition Requirement

Key Responsibilities

Description % Time
Applies advanced systems/infrastructure concepts to define, design and implement highly complex systems, services and technology solutions. Proposes and implements highly complex system or device enhancements such as software, hardware and network configuration, updates and installations for projects or services of broad scope.
  • Manages UCR's largest pool of research data (many petabytes). Develops and implements load-balancing solutions to ensure optimal allocation of computing resources across thousands of data processing tasks. Optimizes parallel data storage systems and data backups. Integrates and optimizes complex internal HPC networks to support massive numbers of parallel computing tasks.
Time: 40%
Independently manages systems and services for a large facility or for campus-wide, medical center, Office of the President and/or institution-wide scope, and makes recommendations for purchases or upgrades. Performs complex and advanced analysis to acquire, install, modify and support operating systems, databases, utilities and web-related tools. Selects methods and techniques to obtain solutions. Interacts with senior management. May perform complex network integration tasks and interoperability assessments for interconnected servers or components of clusters for communication. May lead a team of systems/infrastructure professionals.
  • Gives presentations on HPC infrastructure to UCR's upper administration, including the Vice Chancellor for Research and Economic Development. Supervises staff and students working for the HPCC. Trains data analysis programmers from several data analysis cores in parallel computing.
Time: 30%
Specifies, writes and executes highly complex software and scripts to support systems management, log analysis and other system administration duties for multiple, highly integrated systems.
Time: 15%
Maintains complex security systems. Interprets and adopts campus, medical center, Office of the President, system-wide and regulation-based security policies to control access to networked resources. Provides recommendations and requirements on network access controls.
  • Develops data security and data governance policies. Develops new user policies and standards for managing large research data sets, as well as data security protocols. Oversees development of user tutorials and public teaching material for using the resources of UCR's HPCC efficiently.
Time: 15%

Knowledge, Skills & Abilities

Knowledge/Skill/Ability Requirement
General knowledge of other areas of IT. Thorough understanding of and experience with systems-related issues and actions that can be taken to improve or correct performance. Required
Demonstrated skills associated with adapting equipment and technology to serve user needs. Demonstrated comprehensive understanding of how system management actions affect other systems, system users and dependent/related functions. Required
Advanced experience writing and editing the most complex scripts used to perform system maintenance and administration. Required
Basic knowledge of how to apply technologies and systems to meet business needs. Required
Ability to write technical documentation in a clear and concise manner. Required
Understanding of system performance monitoring and actions that can be taken to improve or correct performance. Required
Demonstrated advanced knowledge, skills and abilities associated with system problem identification and resolution. Experience with design, configuration, operation, repair, and tuning of technology systems. Required
Knowledge of the design, development and application of technology and systems to meet business needs. Required
Self-motivated; works independently and as part of a team. Demonstrates problem-solving skills. Able to learn effectively and meet deadlines. Required
Ability to elicit and communicate technical and non-technical information in a clear and concise manner. Required
Experience leading a team of IT professionals. Required
Advanced knowledge of computer security best practices and policies, including demonstrated experience securing the most complex server-based software. Required
Excellent team and outreach abilities to network and collaborate with key contacts outside their own area of expertise. Required
Fluency in two or more programming languages and environments used in research computing, such as Bash, Python, C/C++, R, Java, TensorFlow, PyTorch, Jupyter Notebooks, RStudio Server, and MATLAB. Required
Extensive experience in instructing user workshops for HPC systems, usage of Linux environments, programming languages, and big data management using local and/or cloud computing solutions. Preferred
Ability and hands-on experience in developing and maintaining web-based user documentation for HPC systems. Preferred
Advanced understanding of and hands-on experience in administering and optimizing HPC networks and switches, such as InfiniBand networks. Preferred
Advanced understanding of and hands-on experience in administering and optimizing parallel storage systems with many petabytes (1 PB = 1,000 TB) of storage space. Preferred
Advanced understanding of and hands-on experience in administering and optimizing complex HPC systems in large user environments with hundreds of users. Preferred

Special Requirements & Conditions
Special Condition Requirement
Must pass a background check. Required

Other Special Requirements & Conditions

Level of Supervision Received
Direction

Environment

Working Environment
Campus

Other Requirements

Items Used
  • Standard Office Equipment
  • HPC servers and equipment

Physical Requirements
  • Bend : Occasionally
  • Sit : Constantly
  • Squat : Occasionally
  • Stand : Occasionally
  • Crawl : Occasionally
  • Walk : Frequently
  • Climb : N/A

Mental Requirements
  • Read/Comprehend : Constantly
  • Write : Constantly
  • Perform Calculations : Constantly
  • Communicate Orally : Constantly
  • Reason & Analyze : Constantly

Environmental Requirements
  • Is exposed to excessive noise : Yes
  • Is around moving machinery : No
  • Is exposed to marked changes in temperature and/or humidity : No
  • Drives motorized equipment : No
  • Works in confined quarters : No
  • Dust : No
  • Fumes : No
  • Other : There is occasional noise exposure due to cooling systems running in server room(s). The noise level is moderate.

Critical Position

Is Critical Position: Yes

More Information

General Campus Information

University of California, Riverside
900 University Ave.
Riverside, CA 92521
Tel: (951) 827-1012


Department Information

Human Resources
1160 University Ave.
Riverside, CA 92521

Fax: (951) 827-6493
E-mail: jobshelp@ucr.edu
