Service Level Agreements (SLAs)
This Service Level Agreement (SLA) outlines the relationship between the CIRRUS team, which provides the on-premise cloud infrastructure, and its recognized users: UCAR employees, visitors, and external collaborators authorized to use the on-premise cloud resources.
NSF NCAR | CISL operates compute, storage, and network hardware in robust data centers at multiple organizational facilities. The on-premise cloud lets users run approved use cases on those highly available, organizationally supported compute resources, including access to routable network space and the UCAR Domain Name System (DNS). These resources supplement computing needs that are not met by the HPC offering, the public cloud, or locally available resources.
Audience: Service Technical Staff, System Administrators, On & Off Site Personnel, and Authorized Affiliates
Recognized Customers: On & Off Site Personnel, and Authorized Affiliates
Important
Availability: The service is designed to operate 24/7. However, support is currently limited to business hours only.
Response Level and Service Definitions
Definitions
Response Times
Important
There is currently no after-hours support. All issues occurring after business hours will be triaged at the start of the next workday.
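The triage rule above can be sketched in a few lines of Python. This is only an illustration, assuming the business hours listed under Contact Information (08:00 to 17:00, Monday to Friday), timestamps already expressed in local time, and no holiday calendar. The function name `next_triage_time` is hypothetical and not part of any CIRRUS tooling.

```python
from datetime import datetime, time, timedelta

# Assumed business hours, taken from the Contact Information section.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(17, 0)


def next_triage_time(reported: datetime) -> datetime:
    """Return the earliest time an issue reported at `reported` would be triaged.

    Assumes `reported` is a naive local-time datetime and ignores holidays.
    """
    is_weekday = reported.weekday() < 5  # Monday=0 ... Friday=4
    if is_weekday and BUSINESS_START <= reported.time() < BUSINESS_END:
        # Reported during business hours: triage can start immediately.
        return reported

    day = reported.date()
    if reported.time() >= BUSINESS_START:
        # The current day's window has passed (or does not exist): move on a day.
        day += timedelta(days=1)
    while day.weekday() >= 5:
        # Skip Saturday and Sunday.
        day += timedelta(days=1)
    return datetime.combine(day, BUSINESS_START)


# An issue reported on Friday evening waits until Monday morning.
print(next_triage_time(datetime(2024, 1, 5, 18, 30)))  # 2024-01-08 08:00:00
```

The sketch only estimates when triage begins; actual response targets depend on the ticket's severity.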
Backup & Disaster Recovery Policy
CIRRUS follows Infrastructure as Code (IaC) practices. All applications deployed on the on-premise cloud are defined via code repositories and can be redeployed as needed.
- Application Backups: Applications themselves are not backed up individually; they are redeployed via Argo CD and source-controlled templates.
- Argo CD: Argo projects are backed up after changes, enabling project restoration in case of data loss.
- Container Images (Harbor): Images stored in Harbor are backed up to object storage and can be restored from there.
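To make the redeploy-from-source idea concrete, the sketch below builds an Argo CD `Application` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The manifest fields follow the upstream Argo CD `Application` resource. The repository URL, path, application name, namespaces, and the `default` project are placeholder assumptions, not CIRRUS's actual configuration.

```python
import json


def argo_application(name, repo_url, path, namespace, revision="main"):
    """Build an Argo CD Application manifest that tracks a Git repository.

    All argument values are supplied by the caller; nothing here reflects
    the real CIRRUS setup.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumption: Argo CD runs in the conventional "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            # Where the source-controlled templates live.
            "source": {
                "repoURL": repo_url,
                "targetRevision": revision,
                "path": path,
            },
            # Deploy into the same cluster that Argo CD runs in.
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            # Keep the cluster in sync with Git automatically.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    manifest = argo_application(
        name="example-app",                              # hypothetical
        repo_url="https://example.org/team/deploy.git",  # hypothetical
        path="charts/example-app",                       # hypothetical
        namespace="example",                             # hypothetical
    )
    print(json.dumps(manifest, indent=2))
```

Because the manifest fully describes the deployment, recreating it from the repository is enough to bring an application back, which is why individual application backups are unnecessary.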
Persistent Volume Backups
Persistent Volumes (PVs) in CIRRUS can be replicated across sites to improve resiliency.
To request PV replication for your application, please create a ticket.
Change Management
All changes must be submitted via a Jira ticket. For more information on this process, see the "create tickets" documentation.
Tickets are reviewed and prioritized by the CIRRUS Product Owner.
- Critical and Urgent tickets will be addressed based on SLA response times.
- Regular requests are reviewed during the team's bi-weekly planning sessions.
Contact Information
Business Hours: 08:00 - 17:00 MST, Monday - Friday
Primary Contact: Nick Cote
Secondary Contact: Submit a Jira Request
Off Hours Contact: Nick Cote and/or Jira Request
Monitoring & Reporting
For observability, the CIRRUS infrastructure leverages:
- Prometheus for metrics collection
- Grafana for visualization and dashboards
- Loki for centralized log aggregation
These tools work together to detect, surface, and alert the CIRRUS team to any operational issues within the platform.
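As a rough illustration of how such detection can work, the sketch below uses the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) with the standard `up` metric to list scrape targets that are currently down. The base URL and the helper names (`instant_query_url`, `down_targets`, `fetch_down_targets`) are hypothetical. This is not the CIRRUS team's alerting configuration, which would more typically live in Prometheus alerting rules.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def down_targets(response: dict) -> list[str]:
    """Extract 'job/instance' labels for series whose sample value is 0.

    `response` is the decoded JSON body of an instant query for `up`.
    """
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    return sorted(
        f"{series['metric'].get('job', '?')}/{series['metric'].get('instance', '?')}"
        for series in response["data"]["result"]
        # Each instant-vector sample is [timestamp, "value-as-string"].
        if float(series["value"][1]) == 0.0
    )


def fetch_down_targets(base_url: str) -> list[str]:
    """Query a live Prometheus server (requires network access)."""
    with urlopen(instant_query_url(base_url, "up == 0"), timeout=10) as resp:
        return down_targets(json.load(resp))


# Offline example using a hand-written response in the Prometheus format.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "node", "instance": "host-a:9100"}, "value": [1700000000, "0"]},
            {"metric": {"job": "node", "instance": "host-b:9100"}, "value": [1700000000, "1"]},
        ],
    },
}
print(down_targets(sample))  # ['node/host-a:9100']
```

Separating URL construction and response parsing from the network call keeps the logic easy to test without a running Prometheus server.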