
REST client init aborts on non-JSON Content-Type and silently falls back to ZAPI for the rest of the controller lifetime #1140

@ashokmuthyalapati


Describe the bug

In v25.10.0 (and current master), Trident's go-swagger-generated ONTAP REST client aborts during its very first call ("initial call") if ONTAP returns a Content-Type the client does not have a JSON consumer for, with the canonical go-swagger error:

Error creating ONTAP REST API client for initial call. Falling back to ZAPI.
error="&{<nil>} (*models.ErrorResponse) is not supported by the TextConsumer,
       can be resolved by supporting TextUnmarshaler interface"

What is happening, end to end:

  1. The initial probe is dispatched through github.com/go-openapi/runtime's ClientOperation.Do.
  2. When the response Content-Type is text/* (or otherwise not registered for the operation), the runtime selects its default TextConsumer.
  3. TextConsumer.Consume(reader, target) only accepts *string, *[]byte, or encoding.TextUnmarshaler. The generated models.ErrorResponse is none of those, so the consumer returns the error string above verbatim.
  4. The wrapper around the initial REST call treats this consumer error as fatal, logs Falling back to ZAPI., and switches the backend to the legacy ZAPI client for the rest of the controller process — even though the underlying HTTP call may have completed normally and even though a retry would likely succeed.
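The rejection in step 3 can be illustrated with a small standard-library model. This is a simplified sketch, not the go-openapi source: `consumeText`, `tryDecode`, `errorResponse`, and `textError` are hypothetical stand-ins, and the type switch only approximates the behavior described above.

```go
package main

import (
	"encoding"
	"fmt"
	"io"
	"strings"
)

// errorResponse is a hypothetical stand-in for the generated
// models.ErrorResponse: a plain struct with no UnmarshalText method.
type errorResponse struct {
	Message *string
}

// textError is a hypothetical type that implements encoding.TextUnmarshaler.
type textError struct {
	Message string
}

func (t *textError) UnmarshalText(b []byte) error {
	t.Message = strings.TrimSpace(string(b))
	return nil
}

// consumeText is a simplified model of a text consumer (an assumption for
// illustration): it accepts a TextUnmarshaler or a *string and rejects
// every other target type.
func consumeText(r io.Reader, target any) error {
	b, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	switch t := target.(type) {
	case encoding.TextUnmarshaler:
		return t.UnmarshalText(b)
	case *string:
		*t = string(b)
		return nil
	default:
		return fmt.Errorf("%v (%T) is not supported by the text consumer", target, target)
	}
}

// tryDecode feeds a string body through consumeText.
func tryDecode(body string, target any) error {
	return consumeText(strings.NewReader(body), target)
}

func main() {
	body := "<html><body>boom</body></html>"

	// A plain struct is rejected, whatever the HTTP status was.
	fmt.Println("struct target:", tryDecode(body, &errorResponse{}))

	// A target implementing encoding.TextUnmarshaler is accepted.
	te := &textError{}
	fmt.Println("unmarshaler target:", tryDecode(body, te), te.Message)
}
```

The point of the model is that the decoding error carries no information about the HTTP exchange itself, which is why treating it as fatal in step 4 is questionable.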

The bug is therefore not "a noisy log line"; it is a one-shot, irreversible decision made on the basis of a body-decoding failure, and it is reachable in real environments any time ONTAP or an upstream proxy/LB returns a non-JSON body on /api/cluster (HTML auth challenge, plain-text proxy error, mid-upgrade response, etc.).

Environment

  • Trident version: v25.10.0
  • Kubernetes orchestrator: OpenShift
  • OS: RHEL CoreOS / RHEL 9.x worker nodes
  • NetApp backend type: ONTAP 9.15.1P7

To Reproduce

  1. Install Trident v25.10.0 with --https_rest against an ONTAP 9.x cluster whose management LIF responds to the very first REST call with a Content-Type other than application/hal+json / application/json. Common triggers:

    • A text/html body when the LIF fronts an auth challenge / session-establishment page (e.g., the Trident user is missing application: http access on its role, or an HTTP-only/proxy intercept sits in front of the cluster).
    • A text/plain error body produced by some load-balancers/proxies when an upstream is briefly unhealthy.
  2. Watch the controller log right after pod start:

    oc logs deploy/trident-controller -c trident-main -n trident -f \
      | grep -E 'Error creating ONTAP REST API client|TextConsumer|Falling back to ZAPI'
  3. Within the first few seconds the verbatim line shown in Describe the bug appears, after which every subsequent ONTAP API call from this controller is sent to .../servlets/netapp.servlets.admin.XMLrequest_filer instead of /api/....

The error is also reproducible without a Trident pod, using the generated REST client against a small mock server that returns any non-JSON Content-Type:

package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"

    httptransport "github.com/go-openapi/runtime/client"
    rtclient "github.com/netapp/trident/storage_drivers/ontap/api/rest/client"
    "github.com/netapp/trident/storage_drivers/ontap/api/rest/client/cluster"
)

func main() {
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
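        // text/html is shown here; text/plain follows the same TextConsumer path.
        // Other non-JSON types (e.g. application/octet-stream, application/xml)
        // may be routed to a different consumer and fail with a different message.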
        w.Header().Set("Content-Type", "text/html; charset=utf-8")
        w.WriteHeader(http.StatusInternalServerError)
        _, _ = w.Write([]byte("<html><body>boom</body></html>"))
    }))
    defer srv.Close()

    rt := httptransport.New(srv.Listener.Addr().String(), "/api", []string{"http"})
    c := rtclient.New(rt, nil)

    _, err := c.Cluster.ClusterGet(cluster.NewClusterGetParams(), nil)
    fmt.Println("err:", err)
    // err: &{<nil>} (*models.ErrorResponse) is not supported by the TextConsumer,
    //      can be resolved by supporting TextUnmarshaler interface
}

The text/html → TextConsumer path is taken whenever the response Content-Type does not match one of the operation's declared producers. models.ErrorResponse is the schema go-swagger picks for the default error response, and that struct does not implement encoding.TextUnmarshaler, so the consumer rejects the body and c.Cluster.ClusterGet returns the error string above. This is exactly what the controller log shows.
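When diagnosing which responses take this path, it can help to classify the Content-Type header before any body decoding is attempted. The following is a minimal sketch using only the standard library; `isJSONContentType` is a hypothetical helper, not part of Trident or go-openapi, and the "+json" suffix rule is an assumption chosen so that application/hal+json counts as JSON.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// isJSONContentType reports whether a Content-Type header value names a JSON
// media type such as application/json or application/hal+json.
// Hypothetical helper for illustration only.
func isJSONContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return false // missing or malformed header
	}
	return mediaType == "application/json" || strings.HasSuffix(mediaType, "+json")
}

func main() {
	for _, ct := range []string{
		"application/hal+json",
		"application/json; charset=utf-8",
		"text/html; charset=utf-8",
		"text/plain",
		"",
	} {
		fmt.Printf("%-36q json=%v\n", ct, isJSONContentType(ct))
	}
}
```

A check like this could be logged next to the status code so that a proxy-generated HTML or plain-text body is visible in the controller log instead of only the consumer error.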

Expected behavior

Two independent, additive fixes. Either one alone is sufficient; applying both is best.

  1. The REST client must not abort initialization on a body it cannot parse. Whether the initial probe's body decodes into *models.ErrorResponse is independent of whether the call itself succeeded. The wrapper that triggers the Falling back to ZAPI. branch should:

    • Use the HTTP status code as the source of truth: a 2xx response with an unparseable body should still mark REST as available (log a warning, not an error).
    • On non-2xx, fall back to a status-code-only error (e.g. fmt.Errorf("REST init returned %d: %s", resp.StatusCode, http.StatusText(resp.StatusCode))) instead of returning the raw consumer error.
  2. Make models.ErrorResponse implement encoding.TextUnmarshaler (and ideally encoding.BinaryUnmarshaler) so the TextConsumer can succeed on text/* bodies even when ONTAP or a fronting LB returns an unexpected content type. A minimal implementation:

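    // UnmarshalText implements encoding.TextUnmarshaler so that a text/* body
    // is stored as the error message. Requires the "strings" import.
    func (e *ErrorResponse) UnmarshalText(b []byte) error {
        if e.Error == nil {
            e.Error = &Error{}
        }
        msg := strings.TrimSpace(string(b))
        e.Error.Message = &msg
        return nil
    }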

This turns an opaque text payload into a populated ErrorResponse.Error.Message and lets the REST client surface a clean, structured error to its caller instead of aborting REST initialization entirely.
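The first fix (status code as the source of truth) could look roughly like the sketch below. It is written against plain net/http rather than the generated client, so `probeREST`, `probeMock`, and `probeResult` are hypothetical names, and Trident's real wrapper may be structured differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// probeResult is a hypothetical summary of the initial REST call.
type probeResult struct {
	Available bool   // the endpoint answered with a 2xx status
	Warning   string // non-fatal note, e.g. the body could not be decoded
}

// probeREST treats the HTTP status as authoritative and the body as
// best-effort. Hypothetical helper, not Trident's actual wrapper.
func probeREST(client *http.Client, url string) (probeResult, error) {
	resp, err := client.Get(url)
	if err != nil {
		return probeResult{}, fmt.Errorf("REST init request failed: %w", err)
	}
	defer resp.Body.Close()

	// Cap the read so an unexpected large body cannot exhaust memory.
	body, readErr := io.ReadAll(io.LimitReader(resp.Body, 1<<20))

	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		// Status-code-only error; the raw decoding error is not surfaced.
		return probeResult{}, fmt.Errorf("REST init returned %d: %s",
			resp.StatusCode, http.StatusText(resp.StatusCode))
	}

	res := probeResult{Available: true}
	if readErr != nil || !json.Valid(body) {
		res.Warning = "2xx response, but the body is not valid JSON"
	}
	return res, nil
}

// probeMock runs probeREST against a throwaway server that returns the
// given status, Content-Type, and body.
func probeMock(status int, contentType, body string) (probeResult, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", contentType)
		w.WriteHeader(status)
		_, _ = io.WriteString(w, body)
	}))
	defer srv.Close()
	return probeREST(srv.Client(), srv.URL+"/api/cluster")
}

func main() {
	cases := []struct {
		status   int
		ct, body string
	}{
		{http.StatusOK, "application/json", `{"name":"c1"}`},
		{http.StatusOK, "text/html", "<html>ok</html>"},
		{http.StatusInternalServerError, "text/html", "<html>boom</html>"},
	}
	for _, c := range cases {
		res, err := probeMock(c.status, c.ct, c.body)
		fmt.Printf("status=%d type=%s available=%v warning=%q err=%v\n",
			c.status, c.ct, res.Available, res.Warning, err)
	}
}
```

With this shape, an undecodable body on a 2xx response degrades to a warning, and a non-2xx response yields an error the caller could retry on instead of committing to ZAPI for the lifetime of the process.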
