feat(rest-api): Support for adding Tenant information when reported Issue for Delete Instance#2829
feat(rest-api): Support for adding Tenant information when reported Issue for Delete Instance#2829hwadekar-nv wants to merge 1 commit into
Conversation
…ssue for Delete Instance
0fc7298 to
b086bab
Compare
Summary by CodeRabbit
WalkthroughA canonical delete-attribution JSON payload structure ( ChangesInstance Delete Attribution
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
|
🌿 Preview your docs: https://nvidia-preview-pull-request-2829.docs.buildwithfern.com/infra-controller |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-06-23 23:40:21 UTC | Commit: 0fc7298 |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rest-api/api/pkg/api/handler/instance_test.go`:
- Around line 9726-9750: Replace the conditional guard that checks if tsc.Calls
has elements and if Arguments has sufficient length with mandatory require
statements. Use require.NotEmpty(t, tsc.Calls) to ensure the test fails if calls
are missing when MachineHealthIssue is present, then safely access the Arguments
array without the length check. This makes the workflow-attribution assertions
mandatory rather than allowing them to be silently skipped, ensuring regressions
are caught deterministically.
In `@rest-api/api/pkg/api/handler/instance.go`:
- Around line 4906-4909: The error handling block in the transaction closure
(where derr is checked) is returning cutil.NewAPIErrorResponse(c,
http.StatusInternalServerError, "Failed to delete Instance", nil), but inside
WithTx closures you should return cutil.NewAPIError instead and let the outer
handler translate it after transaction unwind. Replace the
cutil.NewAPIErrorResponse call with cutil.NewAPIError, removing the context
parameter c and keeping only the status code, message, and data parameters.
In `@rest-api/api/pkg/api/model/instance_test.go`:
- Around line 2688-2695: The test currently only validates top-level keys in the
parsed JSON structure but does not check the nested structure of tenant_reported
to ensure the omitempty contract is enforced for the details field. After the
existing assertions that validate wantRawKeys and wantAbsentKeys at the top
level, add additional assertions that unmarshal the tenant_reported object from
the parsed map and verify that the details field is properly absent (not null)
when it should be omitted according to the omitempty semantics for nil or empty
detail cases.
In `@rest-api/api/pkg/api/model/instance.go`:
- Around line 1651-1678: The ToInstanceDeleteAttributionConfig method serializes
the config to JSON and returns it as a string, but does not validate that the
resulting serialized payload respects the DeleteAttributionIssueDetailsMaxLen
size contract. After the json.Marshal call succeeds and before the return
statement, add a length validation check to ensure the serialized string data
does not exceed DeleteAttributionIssueDetailsMaxLen. If the length exceeds the
limit, return an empty string and an appropriate error to prevent cross-service
contract violations when this attribution payload is used as Issue.Details
elsewhere.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: bc6cda04-ca4d-4afe-9d9b-12145bc8d98f
📒 Files selected for processing (4)
rest-api/api/pkg/api/handler/instance.gorest-api/api/pkg/api/handler/instance_test.gorest-api/api/pkg/api/model/instance.gorest-api/api/pkg/api/model/instance_test.go
| if len(tsc.Calls) > 0 && len(tsc.Calls[len(tsc.Calls)-1].Arguments) > 3 { | ||
| releaseReq := tsc.Calls[len(tsc.Calls)-1].Arguments[3].(*cwssaws.InstanceReleaseRequest) | ||
| require.NotNil(t, releaseReq.Issue) | ||
| require.NotNil(t, releaseReq.Issue.Details) | ||
| var issueDetails model.InstanceDeleteAttributionConfig | ||
| require.NoError(t, json.Unmarshal([]byte(releaseReq.Issue.Details), &issueDetails)) | ||
| assert.Equal(t, tt.args.reqOrg, issueDetails.InitiatedBy.Org) | ||
| assert.Equal(t, tt.args.reqUser.ID.String(), issueDetails.InitiatedBy.UserID) | ||
| assert.Equal(t, dinstance.TenantID.String(), issueDetails.InitiatedBy.TenantID) | ||
| assert.Equal(t, tt.args.reqOrg, issueDetails.InitiatedBy.TenantOrg) | ||
| require.NotNil(t, issueDetails.TenantReported) | ||
| assert.Equal(t, strings.ToUpper(tt.args.reqData.MachineHealthIssue.Category), issueDetails.TenantReported.Category) | ||
| if tt.args.reqData.MachineHealthIssue.Summary != nil { | ||
| assert.Equal(t, *tt.args.reqData.MachineHealthIssue.Summary, issueDetails.TenantReported.Summary) | ||
| } | ||
| if tt.args.reqData.MachineHealthIssue.Details != nil { | ||
| assert.Equal(t, *tt.args.reqData.MachineHealthIssue.Details, *issueDetails.TenantReported.Details) | ||
| } | ||
| if issueDetails.TenantReported.Details == nil { | ||
| assert.Nil(t, issueDetails.TenantReported.Details) | ||
| } else { | ||
| require.NotNil(t, issueDetails.TenantReported.Details) | ||
| assert.Equal(t, *issueDetails.TenantReported.Details, *tt.args.reqData.MachineHealthIssue.Details) | ||
| } | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Make workflow-attribution assertions mandatory for the machine-health test path.
The conditional guard around tsc.Calls allows this branch to be skipped silently. For cases where MachineHealthIssue is present, require call presence first (require.NotEmpty) and then assert arguments, so regressions fail deterministically.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rest-api/api/pkg/api/handler/instance_test.go` around lines 9726 - 9750,
Replace the conditional guard that checks if tsc.Calls has elements and if
Arguments has sufficient length with mandatory require statements. Use
require.NotEmpty(t, tsc.Calls) to ensure the test fails if calls are missing
when MachineHealthIssue is present, then safely access the Arguments array
without the length check. This makes the workflow-attribution assertions
mandatory rather than allowing them to be silently skipped, ensuring regressions
are caught deterministically.
| if derr != nil { | ||
| logger.Error().Err(derr).Msg("error building delete attribution config") | ||
| return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to delete Instance", nil) | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use transaction-scoped error construction inside the WithTx closure.
At Line 4908, this branch returns cutil.NewAPIErrorResponse(...) from inside the transaction closure. The closure should return cutil.NewAPIError(...) and let the outer handler path translate it after transaction unwind.
💡 Proposed fix
- return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to delete Instance", nil)
+ return cutil.NewAPIError(http.StatusInternalServerError, "Failed to delete Instance", nil)As per path instructions, "Inside transaction closures, return cutil.NewAPIError(code, msg, data) for errors; outside, return cutil.NewAPIErrorResponse(c, code, msg, data) directly."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if derr != nil { | |
| logger.Error().Err(derr).Msg("error building delete attribution config") | |
| return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to delete Instance", nil) | |
| } | |
| if derr != nil { | |
| logger.Error().Err(derr).Msg("error building delete attribution config") | |
| return cutil.NewAPIError(http.StatusInternalServerError, "Failed to delete Instance", nil) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rest-api/api/pkg/api/handler/instance.go` around lines 4906 - 4909, The error
handling block in the transaction closure (where derr is checked) is returning
cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to delete
Instance", nil), but inside WithTx closures you should return cutil.NewAPIError
instead and let the outer handler translate it after transaction unwind. Replace
the cutil.NewAPIErrorResponse call with cutil.NewAPIError, removing the context
parameter c and keeping only the status code, message, and data parameters.
Source: Path instructions
| var parsed map[string]json.RawMessage | ||
| require.NoError(t, json.Unmarshal([]byte(raw), &parsed)) | ||
| for _, key := range tt.wantRawKeys { | ||
| assert.Contains(t, parsed, key) | ||
| } | ||
| for _, key := range tt.wantAbsentKeys { | ||
| assert.NotContains(t, parsed, key) | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Add nested raw-key assertions for tenant_reported.details omission semantics.
The current raw JSON check only validates top-level keys. For the nil/empty details cases, the test should also assert that tenant_reported.details is absent (not null) to fully lock the omitempty contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rest-api/api/pkg/api/model/instance_test.go` around lines 2688 - 2695, The
test currently only validates top-level keys in the parsed JSON structure but
does not check the nested structure of tenant_reported to ensure the omitempty
contract is enforced for the details field. After the existing assertions that
validate wantRawKeys and wantAbsentKeys at the top level, add additional
assertions that unmarshal the tenant_reported object from the parsed map and
verify that the details field is properly absent (not null) when it should be
omitted according to the omitempty semantics for nil or empty detail cases.
| // ToInstanceDeleteAttributionConfig builds the delete attribution config from the API request. | ||
| func (idr *APIInstanceDeleteRequest) ToInstanceDeleteAttributionConfig(user *cdbm.User, instance *cdbm.Instance) (string, error) { | ||
| config := InstanceDeleteAttributionConfig{ | ||
| InitiatedBy: InstanceDeleteInitiatedBy{ | ||
| Org: instance.Tenant.Org, | ||
| UserID: user.ID.String(), | ||
| TenantID: instance.Tenant.ID.String(), | ||
| TenantOrg: instance.Tenant.Org, | ||
| }, | ||
| } | ||
| if idr.MachineHealthIssue != nil { | ||
| mhi := idr.MachineHealthIssue | ||
| reported := InstanceDeleteAttributionTenantReported{ | ||
| Category: strings.ToUpper(mhi.Category), | ||
| } | ||
| if mhi.Summary != nil { | ||
| reported.Summary = *mhi.Summary | ||
| } | ||
| if mhi.Details != nil && *mhi.Details != "" { | ||
| reported.Details = mhi.Details | ||
| } | ||
| config.TenantReported = &reported | ||
| } | ||
| data, err := json.Marshal(config) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| return string(data), nil |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Enforce the Issue.Details size contract before returning the serialized attribution payload.
At Line 1674, the serialized JSON can exceed DeleteAttributionIssueDetailsMaxLen even when request fields pass validation (Summary and Details can each be 1024). That creates a cross-service contract risk when this string is forwarded as Issue.Details.
💡 Proposed fix
data, err := json.Marshal(config)
if err != nil {
return "", err
}
+if len(data) > DeleteAttributionIssueDetailsMaxLen {
+ return "", fmt.Errorf("delete attribution payload exceeds %d bytes", DeleteAttributionIssueDetailsMaxLen)
+}
return string(data), nil🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rest-api/api/pkg/api/model/instance.go` around lines 1651 - 1678, The
ToInstanceDeleteAttributionConfig method serializes the config to JSON and
returns it as a string, but does not validate that the resulting serialized
payload respects the DeleteAttributionIssueDetailsMaxLen size contract. After
the json.Marshal call succeeds and before the return statement, add a length
validation check to ensure the serialized string data does not exceed
DeleteAttributionIssueDetailsMaxLen. If the length exceeds the limit, return an
empty string and an appropriate error to prevent cross-service contract
violations when this attribution payload is used as Issue.Details elsewhere.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
|
|
||
| // InstanceDeleteInitiatedBy records who initiated an Instance delete in the cloud API layer. | ||
| type InstanceDeleteInitiatedBy struct { | ||
| Org string `json:"org"` |
There was a problem hiding this comment.
What is the difference between org and tenant_org?
| } | ||
|
|
||
| // InstanceDeleteAttributionTenantReported captures tenant-reported issue fields included in delete attribution. | ||
| type InstanceDeleteAttributionTenantReported struct { |
There was a problem hiding this comment.
Can we just use APIMachineHealthIssue https://github.com/NVIDIA/infra-controller/blob/main/rest-api/api/pkg/api/model/machine.go#L72?
| // InstanceDeleteAttributionConfig is the canonical JSON payload persisted on delete and forwarded to Site when an issue is reported. | ||
| type InstanceDeleteAttributionConfig struct { | ||
| InitiatedBy InstanceDeleteInitiatedBy `json:"initiated_by"` | ||
| TenantReported *InstanceDeleteAttributionTenantReported `json:"tenant_reported,omitempty"` |
There was a problem hiding this comment.
Don't we already send this information when Tenant specifies it? Seems like we're duplicating the information?
| } | ||
|
|
||
| // ToInstanceDeleteAttributionConfig builds the delete attribution config from the API request. | ||
| func (idr *APIInstanceDeleteRequest) ToInstanceDeleteAttributionConfig(user *cdbm.User, instance *cdbm.Instance) (string, error) { |
There was a problem hiding this comment.
We should call this GetTenantAttributionDetails. Also let's pass user and tenant as argument, we don't seem to use Instance for anything other than accessing Tenant.
|
I’d recommend moving delete attribution out of Issue.details and into a new optional delete_attribution field on InstanceReleaseRequest. That keeps details backward-compatible as tenant-provided issue text, works for both issue and no-issue delete flows, and lets Core store attribution as a separate structured field instead of parsing escaped JSON from details. For all-delete audit coverage, the API should populate this field on every delete and also emit/persist structured audit metadata cloud-side. Something like: ` message DeleteAttribution { message InstanceReleaseRequest { |
|
Thanks @thossain-nv @sunilkumar-nvidia, while implementing this, I also felt to have separate field in proto instead using |
As detailed in the description provided here: #2107
This PR aims to address the following issues:
Type of Change
Testing