General Information


Job Description: SYS ADM 3
Working Title: HPC Systems Administrator
Job Code: 007304
Grade: 23
Department Name: RED Research Centers - D01238
Department Head: Thomas Girke
Supervisor: Thomas Girke
Effective Date: 01/23/2023
Position(s) Directly Supervised
Job Code Title FTE

Generic Scope
Experienced professional who knows how to apply theory and put it into practice with in-depth understanding of the professional field; independently performs the full range of responsibilities within the function; possesses broad job knowledge; analyzes problems/issues of diverse scope and determines solutions.

Custom Scope
Applies skills as a seasoned, experienced systems infrastructure professional with a full understanding of industry best practices and campus, medical center or Office of the President policies and procedures to resolve a wide range of issues that are moderately complex in scope. Selects methods and techniques to obtain solutions. Evaluates new technologies including performing simple to moderate cost/benefit analyses.

Department Custom Scope
As a team member in the High-Performance Computing Center (HPCC), the HPC Systems Administrator will manage state-of-the-art research computing infrastructure in support of the science conducted by researchers at UCR. The HPCC enables cutting-edge research in a wide range of science, engineering and biomedical disciplines by providing the computing hardware, software and expertise needed for pioneering discoveries. The incumbent will provide HPC user services support to university researchers, including porting, compiling, and debugging applications to run on Linux clusters; performing benchmarking activities to test new systems; and developing specifications for procurement and acceptance testing. Additionally, the HPC Systems Administrator will profile, analyze and optimize scientific applications for best performance; develop in-house tools and implement third-party software to support operating systems, high-performance interconnects, MPI, scientific applications and libraries; and provide email support for scientific users doing parallel and serial computation on the HPCC's clusters. This includes providing assistance with account management, use of systems, use of compilers, software and tools, job submission to the scheduler, use of parallel file systems, and transferring data to and from the systems. The incumbent will hold training sessions to teach users best practices for running applications on large systems; develop and publish user and technical documentation on the use of systems; and assist Linux cluster administrators with troubleshooting and repair of Linux clusters. The incumbent will also communicate with user groups; analyze usage; develop forecasts; help with computation, storage and data management planning to meet user needs; identify new technologies, standards, and architectures relevant to providing scientific computing for research; and define requirements for future HPC systems.

Education & Experience Requirements

Education Requirements
Degree Requirement
Bachelor's degree in related area and/or equivalent experience/training. Required
Advanced degree. Preferred

Experience Requirements
Experience Requirement
4 - 7 years of related experience. Required
Professional experience with the installation and systems administration of high-performance compute (HPC) systems including Linux clusters and parallel data storage systems based on Lustre, GPFS or similar. Required
Professional experience with a range of programming languages relevant for systems administration and HPC, including Bash, Python, Perl and C/C++. Required
Experience with Linux system administration in a professional environment including installing, repairing and upgrading server and network hardware such as InfiniBand networks. Required

License Requirements

Certification Requirements
Certification Requirement

Educational Condition Requirements
Condition Requirement

Key Responsibilities

Description % Time
Writes and executes complex scripts and may write software in support of systems management, log analysis and other system administration duties for multiple integrated systems.
% Time: 20
Performs complex security control activities to prevent unauthorized access to networked resources. May assist with maintenance of security systems for network equipment and provide recommendations on network access controls.
  • Maintains and optimizes the facility's network and security infrastructure. This includes the management of access control and information exchange protocols used by the different services offered by the facility to internal (UCR-wide) and external users.
% Time: 20
Compiles software using multiple compilers, including support for MPI, MKL, LAPACK, BLAS, and BOOST. Optimizes kernel-level issues, user accounts and driver compatibility.
% Time: 20
Defines, designs and implements systems, services and technology solutions. Proposes and implements system or device enhancements such as software, hardware and network configuration, updates and installations for projects or services of moderately complex scope.
% Time: 15
Manages systems and services for a facility of moderate size and makes recommendations for purchase or upgrade of new computer hardware, software and services. Performs moderately complex analysis to acquire, install, modify and support operating systems, databases, utilities and Internet/intranet-related tools. Plans, designs and implements moderately complex system updates and rollouts. May perform moderately complex networking tasks and interoperability assessments for interconnected servers or components of clusters for communication.
  • Performs systems administration, maintenance and upgrades of a high-performance research computer (HPC) cluster and parallel storage system with several petabytes (PBs) of disk storage used by over 300 scientists at UCR. This includes the installation and troubleshooting of Linux/Unix OSs, node deployment, and optimization of hardware for data storage and InfiniBand networks.
% Time: 15
Provides training for users of UCR's HPC infrastructure in the form of in-person training, instruction of user workshops, and development of online user manuals.
% Time: 10

Knowledge, Skills & Abilities

Knowledge/Skill/Ability Requirement
Demonstrated skills associated with adapting equipment and technology to serve user needs. Demonstrated comprehensive understanding of how system management actions affect other systems, system users and dependent/related functions. Required
Understanding of system performance monitoring and actions that can be taken to improve or correct performance. Required
Advanced knowledge of computer security best practices and policies including demonstrated experience securing server-based software. Required
Ability to write technical documentation in a clear and concise manner. Required
Ability to elicit and communicate technical and non-technical information in a clear and concise manner. Required
Knowledge of the design, development and application of technology and systems to meet business needs. Required
General knowledge of other areas of IT. Thorough understanding of and experience with systems-related issues and actions that can be taken to improve or correct performance. Required
Self-motivated and works independently and as part of a team. Demonstrates problem-solving skills. Able to learn effectively and meet deadlines. Required
Basic knowledge of how to apply technologies and systems to meet business needs. Required
Demonstrated experience writing and editing complex scripts used to perform system maintenance and administration. Required
Knowledge of software management and compilation under Linux OSs and their optimization for HPC environments. Preferred
Knowledge of Linux systems, kernels, and architectures of HPC systems, networks and large-scale storage systems. Preferred
Knowledge of queuing systems and workload managing software, such as Slurm, Torque, SGE or similar. Preferred

Special Requirements & Conditions
Special Condition Requirement
Must pass a background check. Required

Other Special Requirements & Conditions

Level of Supervision Received
General Supervision

Environment

Working Environment
Campus

Other Requirements

Items Used
  • Standard Office Equipment

Physical Requirements
  • Climb : N/A
  • Sit : Frequently
  • Crawl : N/A
  • Walk : Occasionally
  • Bend : Occasionally
  • Squat : N/A
  • Stand : Occasionally

Mental Requirements
  • Read/Comprehend : Frequently
  • Write : Frequently
  • Reason & Analyze : Frequently
  • Perform Calculations : Frequently
  • Communicate Orally : Frequently

Environmental Requirements
  • Is around moving machinery : No
  • Fumes : No
  • Is exposed to marked changes in temperature and/or humidity : No
  • Is exposed to excessive noise : No
  • Drives motorized equipment : No
  • Dust : No
  • Works in confined quarters : No

Critical Position

Is Critical Position: Yes

More Information

General Campus Information

University of California, Riverside
900 University Ave.
Riverside, CA 92521
Tel: (951) 827-1012


Department Information

Human Resources
1160 University Ave.
Riverside, CA 92521

Fax: (951) 827-6493
E-mail: jobshelp@ucr.edu
