CIRRUS

CIRRUS (Cloud Infrastructure for Remote Research, Universities, and Scientists) is a Kubernetes-based cloud platform hosted at NSF NCAR. It provides flexible, scalable compute resources that complement traditional HPC systems, public cloud services, and local infrastructure.


Getting Started Workflows

Choose your path based on your needs and experience level:

Basic Uses

  1. Learn the Basics: Start with CIRRUS Overview
  2. Understand Team Process: Review Team Interaction and Creating Tickets
  3. Explore Services: Try JupyterHub for interactive computing
  4. Request Access: Submit a service request

Advanced Uses

  1. Review Architecture: Understand Platform Overview and Core Services
  2. Plan Your Deployment: Study Adding Applications and Container Registry
  3. Set Up CI/CD: Configure GitHub Actions and Secrets Management
  4. Deploy Applications: Use GitOps workflow with Helm charts

Documentation Guide

This documentation is organized into focused sections to help you find what you need:

Introduction

Platform overview, services, and hardware specs. Start here to understand CIRRUS capabilities.

Interact with CIRRUS Team

Learn how to work with the CIRRUS team and get support.

  • agile methodology
    How the team works and manages requests. Use this to understand our development process.
  • create tickets
    How to submit requests and report issues. Use when you need help or want to request services.

Deploying Applications

Everything you need to containerize and deploy applications on CIRRUS.

  • create containers
    Step-by-step containerization guide. Perfect if you're new to containers or Docker.
  • adding applications
    GitOps deployment with Helm charts. Use when you're ready to deploy your application.

Container Registry

Store, manage, and secure your container images with Harbor.

  • harbor overview
    Container registry introduction. Use to understand how to store and manage container images.
  • image management
    Push/pull images, CLI usage. Use when you need to work with container images.
  • vulnerability scanner
    Security scanning for images. Use to ensure image security.

GitHub Actions

Automate your CI/CD workflows with GitHub Actions on CIRRUS.

  • runner scale sets
    Automated CI/CD setup. Use to automate builds and deployments.
  • best practices
    Security and operational guidelines. Use to secure your CI/CD pipelines.

Jupyter on CIRRUS

Interactive computing, data analysis, and research environments.

  • jupyterhub
    Interactive computing environment. Use for data analysis and research.
  • conda environments
    Custom Python environments. Use to manage dependencies.
  • gpu usage
    GPU computing with PyTorch/TensorFlow. Use for machine learning and AI workloads.
  • dask integration
    Distributed computing. Use to scale computations across nodes.
  • binder
    Reproducible research environments. Use to share interactive notebooks.

Secret Manager

Securely store and manage sensitive data like API keys and credentials.

  • openbao
    Secure credential storage. Use to manage API keys and secrets.

Service Level Agreements

Understand our service commitments and support levels.

  • slas
    Support levels and response times. Use to understand service commitments.

Frequent Issues

Troubleshooting guide for common problems and solutions.


Platform Overview

Kubernetes Foundation

CIRRUS is built on Kubernetes (K8s), the industry-standard container orchestration platform. Kubernetes provides:

  • High availability through automatic failover and load balancing
  • Self-healing infrastructure that automatically replaces failed components
  • Scalable workloads that can grow and shrink based on demand
  • Shared services like networking, storage, and security that new applications can leverage immediately

This resilient, open-source foundation makes CIRRUS ideal for hosting research applications, data analysis workflows, and collaborative tools.

Container Technology

CIRRUS applications run in containers: lightweight, portable packages that include everything needed to run an application. Containers offer:

  • Consistent environments across development, testing, and production
  • Faster deployment with pre-built dependencies
  • Resource efficiency by sharing the host operating system
  • Portability across different computing environments

Core Services

GitOps Deployment

CIRRUS uses GitOps for application deployment and management:

  1. Code repositories store application configurations as Helm charts
  2. Argo CD monitors repositories and automatically deploys changes
  3. Version control provides audit trails and rollback capabilities
  4. Collaborative workflows enable team-based development

For deployment guidance, see adding applications.
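
As an illustration of the GitOps flow described above, the following sketch builds an Argo CD Application manifest that points at a Helm chart in a Git repository. It is a minimal example under stated assumptions, not the CIRRUS team's actual configuration: the application name, repository URL, chart path, `argocd` namespace, `default` project, and automated sync settings are all hypothetical placeholders. On CIRRUS, this object may well be created for you as part of the onboarding process.

```python
import json


def argo_application(name, repo_url, chart_path, namespace, revision="main"):
    """Build an Argo CD Application manifest as a plain dict.

    Every argument is a placeholder chosen for illustration.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumption: Argo CD is installed in a namespace called "argocd".
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",  # assumption: the default Argo CD project
            "source": {
                "repoURL": repo_url,         # Git repository holding the chart
                "path": chart_path,          # directory of the Helm chart
                "targetRevision": revision,  # branch, tag, or commit to track
            },
            "destination": {
                # In-cluster API server address used by Argo CD.
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            # Automated sync: apply changes when the tracked revision moves.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


manifest = argo_application(
    name="my-app",  # hypothetical
    repo_url="https://github.com/example-org/my-app-helm.git",  # hypothetical
    chart_path="charts/my-app",  # hypothetical
    namespace="my-app",  # hypothetical
)
print(json.dumps(manifest, indent=2))
```

The manifest is printed as JSON, which Kubernetes tooling accepts alongside YAML. Because the tracked revision lives in Git, rolling back amounts to reverting the commit.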

Container Registry (Harbor)

Harbor provides secure, high-performance container image storage:

  • Local hosting reduces network latency and increases transfer speeds
  • Vulnerability scanning identifies security issues in container images
  • Access control manages who can push and pull images
  • Web interface available at https://hub.k8s.ucar.edu

Learn more: container registry
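
Harbor addresses images as `host/project/repository:tag`. The sketch below composes such a reference and the usual Docker commands for pushing to it. It assumes the registry host for the Docker CLI matches the web interface host listed above; the project, repository, and tag values are hypothetical.

```python
# Assumption: the registry endpoint is the same host as the web interface.
REGISTRY_HOST = "hub.k8s.ucar.edu"


def image_reference(project, repository, tag="latest"):
    """Return a fully qualified Harbor image reference."""
    return f"{REGISTRY_HOST}/{project}/{repository}:{tag}"


def push_commands(local_image, project, repository, tag):
    """Return the Docker CLI commands that tag and push a local image."""
    target = image_reference(project, repository, tag)
    return [
        f"docker login {REGISTRY_HOST}",
        f"docker tag {local_image} {target}",
        f"docker push {target}",
    ]


# Hypothetical project and image names, for illustration only.
for command in push_commands("my-app:dev", "my-project", "my-app", "1.0.0"):
    print(command)
```

Pinning an explicit tag such as `1.0.0` rather than `latest` keeps deployments reproducible.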

Secrets Management (OpenBao)

OpenBao securely stores sensitive data like API keys and credentials:

  • Encrypted storage protects secrets at rest
  • Secure injection into applications via External Secrets Operator
  • UCAR authentication using CIT credentials
  • Web interface available at https://bao.k8s.ucar.edu

Learn more: secret manager
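
To make the injection path more concrete, here is a sketch of an ExternalSecret manifest, the resource that asks the External Secrets Operator to copy values from a backend such as OpenBao into a Kubernetes Secret. The store name, secret path, property names, and refresh interval are hypothetical, and the `apiVersion` depends on the operator release installed on the cluster, so treat this as an outline rather than CIRRUS's exact setup.

```python
import json


def external_secret(name, namespace, store_name, remote_key, properties):
    """Build an ExternalSecret manifest as a plain dict.

    One Kubernetes Secret key is created per entry in `properties`.
    """
    return {
        # Assumption: newer operator releases may use a different API version.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",  # how often to re-read the backend
            # Assumption: a cluster-wide store fronts OpenBao.
            "secretStoreRef": {"name": store_name, "kind": "ClusterSecretStore"},
            "target": {"name": name},  # name of the resulting Kubernetes Secret
            "data": [
                {
                    "secretKey": prop,
                    "remoteRef": {"key": remote_key, "property": prop},
                }
                for prop in properties
            ],
        },
    }


secret_manifest = external_secret(
    name="my-app-credentials",  # hypothetical
    namespace="my-app",  # hypothetical
    store_name="openbao-store",  # hypothetical store name
    remote_key="my-app/api",  # hypothetical path in OpenBao
    properties=["api_key", "api_secret"],  # hypothetical property names
)
print(json.dumps(secret_manifest, indent=2))
```

With this pattern, the secret values never appear in the Git repository; only the reference to their location does.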

GitHub Actions Integration

GitHub Actions runners enable automated CI/CD workflows:

  • On-demand scaling provisions runners as needed
  • Secure execution in isolated environments
  • Direct integration with CIRRUS services
  • Container builds using remote BuildKit

Learn more: github actions

JupyterHub Environment

JupyterHub provides interactive computing capabilities:

  • GPU support with NVIDIA A2 and A10 Tensor Core GPUs
  • GLADE integration for direct access to research datasets
  • Dask Gateway for distributed computing workflows
  • Custom environments via Binder for reproducible research

Learn more: jupyter on CIRRUS
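
Once a notebook server is running, a quick check like the one below can confirm whether a GPU is visible. This is a minimal sketch that assumes PyTorch may or may not be installed in the active environment; the function name `describe_gpu` is made up for this example.

```python
def describe_gpu():
    """Return a one-line description of GPU visibility from PyTorch."""
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"
    # Report the name of the first visible device.
    return f"CUDA device 0: {torch.cuda.get_device_name(0)}"


print(describe_gpu())
```

If no device is reported, the server was probably started without a GPU allocation or with an environment that lacks a CUDA-enabled PyTorch build.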


Hardware Resources

CIRRUS runs on 18 nodes, split evenly between the Mesa Lab (ML) and NWSC:

Compute Specifications

Site-Specific Hardware

GPU Nodes (5 per site)

  Specification   NWSC                      ML
  Manufacturer    Supermicro                Supermicro
  Model           SYS-120U-TNR              SYS-120U-TNR
  CPU Type        2 × Intel Xeon Gold 6326  2 × Intel Xeon Gold 6326
  CPU Speed       2.90 GHz                  2.90 GHz
  CPU Cores       16 × 2                    16 × 2
  GPU             1 × NVIDIA A2             1 × NVIDIA A10
  CUDA Driver     575.51.03                 575.51.03
  CUDA Runtime    12.9                      12.9
  RAM             512 GB                    512 GB
  NICs            2 × 25G                   2 × 25G
  Storage         6 × 1.6TB NVMe            6 × 3.2TB NVMe

CPU Nodes (4 per site)

  Specification   NWSC                      ML
  Manufacturer    Dell                      Dell
  Model           PowerEdge R6615           PowerEdge R6615
  CPU Type        2 × AMD EPYC 9354P        2 × AMD EPYC 9354P
  CPU Speed       3.25 GHz                  3.25 GHz
  CPU Cores       32 × 1                    32 × 1
  RAM             512 GB                    512 GB
  NICs            2 × 25G                   2 × 25G
  Storage         8 × 1.6TB NVMe            8 × 1.6TB NVMe

Totals

  Resource Type   Total Quantity   Details
  CPU Cores       832 cores        AMD EPYC 9354P + Intel Xeon Gold 6326
  Memory          9.2 TB           512 GB per node
  GPU Nodes       10 nodes         5× NVIDIA A10, 5× NVIDIA A2
  Storage         246.4 TB         High-speed NVMe across all nodes
  Network         25G/10G          High-bandwidth interconnect
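
The node count, memory, and storage totals follow directly from the per-node specifications. The short calculation below reproduces them, assuming decimal units (1 TB = 1000 GB); the variable names are illustrative.

```python
# (node count, drives per node, drive size in GB), taken from the
# site-specific hardware specifications.
node_groups = {
    "NWSC GPU": (5, 6, 1600),
    "ML GPU": (5, 6, 3200),
    "NWSC CPU": (4, 8, 1600),
    "ML CPU": (4, 8, 1600),
}
RAM_PER_NODE_GB = 512

total_nodes = sum(count for count, _, _ in node_groups.values())
total_ram_gb = total_nodes * RAM_PER_NODE_GB
total_storage_gb = sum(
    count * drives * size_gb for count, drives, size_gb in node_groups.values()
)

print(f"nodes:   {total_nodes}")
print(f"memory:  {total_ram_gb / 1000:.1f} TB")
print(f"storage: {total_storage_gb / 1000:.1f} TB")
```

Working in whole gigabytes keeps the sums in integer arithmetic and avoids floating-point rounding before the final conversion to terabytes.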

Infrastructure Status

Operational (v1-beta production release)


Getting Started

Ready to deploy on CIRRUS? Here's your path forward:

  1. Review the service level agreements
  2. Containerize your application using our create containers guide
  3. Submit a deployment request by following our create tickets guide
  4. Deploy using our GitOps workflow with Helm charts

Need help? The CIRRUS team is here to assist with onboarding, troubleshooting, and optimization.