Fix NPE in NASBackupProvider when no running KVM host is available by jmsperu · Pull Request #12805 · apache/cloudstack · GitHub
Skip to content

Fix NPE in NASBackupProvider when no running KVM host is available#12805

Merged
sureshanaparti merged 3 commits into
apache:4.22from
jmsperu:fix/nas-backup-null-host-npe
Mar 27, 2026
Merged

Fix NPE in NASBackupProvider when no running KVM host is available#12805
sureshanaparti merged 3 commits into
apache:4.22from
jmsperu:fix/nas-backup-null-host-npe

Conversation

@jmsperu

@jmsperu jmsperu commented Mar 12, 2026

Copy link
Copy Markdown
Collaborator

Rebased onto 4.22 as requested (previously #12680).

ResourceManager.findOneRandomRunningHostByHypervisor() can return null when no KVM host in the zone has status=Up (e.g. during management server startup, brief agent disconnections, or host state transitions).

NASBackupProvider.syncBackupStorageStats() and deleteBackup() call host.getId() without a null check, causing a NullPointerException that crashes the entire BackupSyncTask background job every sync interval.

This adds null checks in both methods:

  • syncBackupStorageStats: log a warning and return early
  • deleteBackup: throw CloudRuntimeException with a descriptive message

ResourceManager.findOneRandomRunningHostByHypervisor() can return null
when no KVM host in the zone has status=Up (e.g. during management
server startup, brief agent disconnections, or host state transitions).

NASBackupProvider.syncBackupStorageStats() and deleteBackup() call
host.getId() without a null check, causing a NullPointerException that
crashes the entire BackupSyncTask background job every sync interval.

This adds null checks in both methods:
- syncBackupStorageStats: log a warning and return early
- deleteBackup: throw CloudRuntimeException with a descriptive message
@DaanHoogland

Copy link
Copy Markdown
Contributor

@blueorangutan

Copy link
Copy Markdown

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov Bot commented Mar 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.60%. Comparing base (27bce46) to head (b3cb72f).
⚠️ Report is 12 commits behind head on 4.22.

Files with missing lines Patch % Lines
...rg/apache/cloudstack/backup/NASBackupProvider.java 0.00% 4 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #12805      +/-   ##
============================================
- Coverage     17.61%   17.60%   -0.01%     
- Complexity    15662    15674      +12     
============================================
  Files          5917     5917              
  Lines        531415   531606     +191     
  Branches      64973    64996      +23     
============================================
+ Hits          93588    93601      +13     
- Misses       427271   427446     +175     
- Partials      10556    10559       +3     
Flag Coverage Δ
uitests 3.70% <ø> (-0.01%) ⬇️
unittests 18.67% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 17121

@DaanHoogland

Copy link
Copy Markdown
Contributor

@jmsperu , probably due to the rebase:

20:18:12 [ERROR] /jenkins/workspace/acs-centos8-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.22.1.0-SNAPSHOT/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java:[558,12] error: cannot find symbol
20:18:12   symbol:   variable CollectionUtils
20:18:12   location: class NASBackupProvider

seems an import is missing.

@jmsperu

jmsperu commented Mar 18, 2026

Copy link
Copy Markdown
Collaborator Author

Good catch @DaanHoogland, the CollectionUtils import was dropped during the rebase. Fixed now — should build cleanly.

@DaanHoogland DaanHoogland added this to the 4.22.1 milestone Mar 19, 2026
@DaanHoogland

Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17198

@DaanHoogland DaanHoogland requested review from abh1sar and Copilot March 19, 2026 15:37
@DaanHoogland

Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan

Copy link
Copy Markdown

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a crash in the NAS backup provider’s background sync/deletion flows when no running KVM host is available in a zone (i.e., ResourceManager.findOneRandomRunningHostByHypervisor(...) returns null), preventing BackupSyncTask from dying due to NullPointerException.

Changes:

  • Add a null-host check in deleteBackup() and fail with a descriptive CloudRuntimeException.
  • Add early returns in syncBackupStorageStats() when there are no repositories and when no running KVM host can be found (with a warning log).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +475 to +477
if (host == null) {
throw new CloudRuntimeException(String.format("Unable to find a running KVM host in zone %d to delete backup %s", backup.getZoneId(), backup.getUuid()));
}
Comment on lines 562 to +566
final Host host = resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM, zoneId);
if (host == null) {
logger.warn("Unable to find a running KVM host in zone {} to sync backup storage stats", zoneId);
return;
}
@blueorangutan

Copy link
Copy Markdown

[SF] Trillian test result (tid-15701)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 70450 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12805-t15701-kvm-ol8.zip
Smoke tests completed. 141 look OK, 8 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_invalid_upgrade_kubernetes_cluster Failure 0.01 test_kubernetes_clusters.py
test_02_upgrade_kubernetes_cluster Failure 0.00 test_kubernetes_clusters.py
test_03_deploy_and_scale_kubernetes_cluster Failure 0.00 test_kubernetes_clusters.py
test_04_autoscale_kubernetes_cluster Failure 0.00 test_kubernetes_clusters.py
test_05_basic_lifecycle_kubernetes_cluster Failure 0.00 test_kubernetes_clusters.py
test_06_delete_kubernetes_cluster Failure 0.01 test_kubernetes_clusters.py
test_08_upgrade_kubernetes_ha_cluster Failure 0.00 test_kubernetes_clusters.py
test_10_vpc_tier_kubernetes_cluster Failure 0.00 test_kubernetes_clusters.py
test_11_test_unmanaged_cluster_lifecycle Error 0.00 test_kubernetes_clusters.py
test_12_test_deploy_cluster_different_offerings_per_node_type Failure 0.00 test_kubernetes_clusters.py
ContextSuite context=TestPrivateGwACL>:setup Error 0.00 test_privategw_acl.py
test_02_purge_expunged_api_vm_end_date Error 3.21 test_purge_expunged_vms.py
test_03_purge_expunged_api_vm_start_end_date Error 2.02 test_purge_expunged_vms.py
test_04_purge_expunged_api_vm_no_date Error 2.13 test_purge_expunged_vms.py
test_05_purge_expunged_vm_service_offering Error 1.50 test_purge_expunged_vms.py
test_06_purge_expunged_vm_background_task Error 361.51 test_purge_expunged_vms.py
test_CRUD_operations_userdata Error 1523.56 test_register_userdata.py
test_deploy_vm_with_registered_userdata Error 7.95 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_override_policy_allow Error 10.00 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_override_policy_append Error 10.46 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_override_policy_deny Error 10.10 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_params Error 9.14 test_register_userdata.py
test_link_and_unlink_userdata_to_template Error 8.74 test_register_userdata.py
test_user_userdata_crud Error 8.99 test_register_userdata.py
ContextSuite context=TestIsolatedNetworks>:setup Error 0.00 test_routers_network_ops.py
ContextSuite context=TestRedundantIsolateNetworks>:setup Error 0.00 test_routers_network_ops.py
ContextSuite context=TestRouterServices>:setup Error 0.00 test_routers.py
ContextSuite context=TestCpuCapServiceOfferings>:setup Error 0.00 test_service_offerings.py
ContextSuite context=TestServiceOfferings>:setup Error 0.29 test_service_offerings.py
ContextSuite context=TestSetSourceNatIp>:setup Error 0.00 test_set_sourcenat.py

@jmsperu

jmsperu commented Mar 20, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks @DaanHoogland for driving the packaging and smoke tests. All builds green and the 8 test failures are pre-existing (Kubernetes, userdata, private gw ACL) — unrelated to this change.

Is there anything else needed to get this into 4.22.1?

@jmsperu

jmsperu commented Mar 26, 2026

Copy link
Copy Markdown
Collaborator Author

@DaanHoogland @abh1sar Gentle ping — builds and tests are green. Is this ready to merge into 4.22?

@sureshanaparti

Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@abh1sar abh1sar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM

@abh1sar

abh1sar commented Mar 27, 2026

Copy link
Copy Markdown
Contributor

@DaanHoogland @abh1sar Gentle ping — builds and tests are green. Is this ready to merge into 4.22?

@jmsperu Please address the comment about import ordering. We'll get this merged after that.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17264

@rajujith rajujith left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Tested

@sureshanaparti

Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17267

@sureshanaparti sureshanaparti merged commit 6ca6aa1 into apache:4.22 Mar 27, 2026
24 of 26 checks passed
@boring-cyborg

boring-cyborg Bot commented Mar 27, 2026

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants