feat(go/adbc/driver/flightsql): add logging for Flight SQL#4322
feat(go/adbc/driver/flightsql): add logging for Flight SQL#4322davidhcoe wants to merge 5 commits into
Conversation
davidhcoe
commented
May 18, 2026
- Adds Open Telemetry logging for Flight SQL
- Adds support for passing values as parameters to support TOML integration
| #if defined(_WIN32) | ||
| #include <windows.h> // Must come first | ||
|
|
||
| // These version macros gate which Win32 APIs the SDK headers declare. They MUST |
There was a problem hiding this comment.
I had to make this change to get the linter to work correctly.
| // the host process's environment. When neither this option nor the | ||
| // environment variable is set, the driver falls back to the | ||
| // process-global OpenTelemetry tracer provider. | ||
| OptionKeyTelemetryTracesExporter = "adbc.telemetry.traces_exporter" |
There was a problem hiding this comment.
This is so we can pass values using the driver manager / TOML values and not need to use environment variables.
| headers := md.Get("authorization") | ||
| if len(headers) > 0 { | ||
| b.mutex.Lock() | ||
| defer b.mutex.Unlock() |
There was a problem hiding this comment.
why the switch to an explicit unlock vs defer?
There was a problem hiding this comment.
Good call — switched back to defer Unlock(). Both methods now go through a small rotateAuth(headers ...) (previous []string, logger *slog.Logger) helper that holds the mutex with defer b.mutex.Unlock(), mutates b.hdrs, and returns a snapshot of the previous auth value plus the current logger. The structured log emission happens in HeadersReceived / SetHeader outside the critical section so we don't hold the lock across the slog call.
| // been open at the time of failure. Without these records the operator | ||
| // otherwise has only the bare gRPC EOF to work with, which carries no | ||
| // progress or location information. | ||
| func newRecordReader(ctx context.Context, alloc memory.Allocator, cl *flightsql.Client, info *flight.FlightInfo, clCache gcache.Cache, bufferSize int, logger *slog.Logger, opts ...grpc.CallOption) (rdr array.RecordReader, err error) { |
There was a problem hiding this comment.
this argument list is getting rather long, could we condense some of these arguments into a struct as a config thing?
There was a problem hiding this comment.
Done — extracted a recordReaderConfig struct (alloc, cl, info, clientCache, bufferSize, logger) and changed the signature to newRecordReader(ctx context.Context, cfg recordReaderConfig, opts ...grpc.CallOption). Logger is still optional (nil → no-op via safeLogger). Updated all 13 call sites.
| // the lengths of every column buffer in the batch. The total is useful for | ||
| // answering questions such as "did the stream fail after receiving 10 KB or | ||
| // 10 MB?" when triaging a mid-stream EOF. | ||
| func estimateBatchBytes(rec arrow.RecordBatch) int64 { |
There was a problem hiding this comment.
There was a problem hiding this comment.
Switched to util.TotalRecordSize. Removed the local estimateBatchBytes helper.
| // doGetWithLogger is the logging-aware DoGet implementation used by every | ||
| // caller in the driver. Earlier revisions exposed a plain `doGet` wrapper | ||
| // that supplied a nil logger; it had no remaining callers and was removed | ||
| // to satisfy the unused-function linter — every call site now passes its | ||
| // own *slog.Logger (or relies on safeLogger to materialize a no-op one). | ||
| // | ||
| // When a logger is provided every endpoint resolution attempt is logged | ||
| // individually so that failures hopping between Flight locations can be | ||
| // traced. The function also accumulates per-location attempt errors and, if | ||
| // every attempt fails, joins them into a single returned error so that the | ||
| // caller can see *all* the locations that were tried (and how each one | ||
| // failed) rather than just the last one. This is critical for diagnosing | ||
| // "[FlightSQL] error reading from server: EOF" reports where the empty | ||
| // "endpoint N: []" indicates the Location list was actually empty: with the | ||
| // enhanced error the operator can see whether multiple alternate locations | ||
| // were attempted and which one ultimately served (or failed) the request. |
There was a problem hiding this comment.
We can trim this? e.g. I don't think we need codebase archaeology here.
| // tracesExporter records the value of | ||
| // adbc.OptionKeyTelemetryTracesExporter that was supplied (if any) | ||
| // at database construction time. The tracer itself is already | ||
| // initialized by driverbase by the time SetOptions runs, so this | ||
| // field is retained purely so GetOption can echo back the | ||
| // configured value (callers expect "get returns what was set") and | ||
| // so SetOption can return a precise diagnostic when a caller tries | ||
| // to change the exporter after the database has been opened. | ||
| tracesExporter string | ||
| // tracesFolderPath records the value of | ||
| // adbc.OptionKeyTelemetryTracesFolderPath that was supplied at | ||
| // construction time. The on-disk RotatingFileWriter behind the | ||
| // "adbcfile" exporter is created during NewDatabase*, so this field | ||
| // is retained for symmetry with tracesExporter: GetOption can echo | ||
| // the configured value back, and SetOption can surface a precise | ||
| // diagnostic when a caller tries to retarget the folder after the | ||
| // writer is already running. | ||
| tracesFolderPath string |
There was a problem hiding this comment.
Given the stated purpose, why can't driverbase handle the get/set?
| var cnxnSupport support | ||
|
|
||
| info, err := cl.GetSqlInfo(ctx, []flightsql.SqlInfo{flightsql.SqlInfoFlightSqlServerTransaction}, d.timeout) | ||
| // Capture the resolved gRPC peer address from the first call so the |
There was a problem hiding this comment.
Is this actually stable? I thought there were circumstances under which gRPC would reconnect, or there may be round-robin DNS load balancing in play, etc. (Albeit I doubt anyone wants to try to deploy Flight SQL with such a setup...)
| // environment variable. See NewDatabaseImplBaseWithOptions for a | ||
| // constructor that also accepts a custom on-disk folder for the | ||
| // "adbcfile" exporter. | ||
| func NewDatabaseImplBaseWithExporter( |
There was a problem hiding this comment.
Do we need this, or can everyone just use the fully general version below?
| } | ||
|
|
||
| func (base *DatabaseImplBase) InitTracing(ctx context.Context, driverName string, driverVersion string) (err error) { | ||
| func (base *DatabaseImplBase) InitTracing(ctx context.Context, driverName string, driverVersion string) error { |
There was a problem hiding this comment.
Ditto here; this isn't a public API per se, so I would be ok just having InitTracingWithOptions or even amending the signature instead of adding another method
| // recordReaderConfig bundles the dependencies that newRecordReader | ||
| // needs to spin up its per-endpoint goroutines. Grouping them into a | ||
| // single value keeps the call sites compact and lets new fields be | ||
| // added without rippling another positional argument through every | ||
| // caller. The fields mirror the corresponding members on | ||
| // connectionImpl/statement so callers can populate the struct by | ||
| // straight field copies. |
There was a problem hiding this comment.
Can we clean up the various comments? I don't think we need a homework-style justification of every single choice made, e.g. here a config struct is perfectly natural and doesn't need more lines of justification than code.
| // maxLoggedTicketBytes limits how many bytes of a Flight ticket are emitted in | ||
| // log records. Tickets can be opaque server-defined blobs of arbitrary size | ||
| // (sometimes embedding large query plans), so we cap how much we include | ||
| // to keep log records reasonably sized while still being useful for | ||
| // correlation against server-side logs. | ||
| const maxLoggedTicketBytes = 32 |
There was a problem hiding this comment.
Tickets may possibly include sensitive info and so I think they shouldn't be logged at all.
| if isSensitiveHeader(k) { | ||
| continue | ||
| } |
There was a problem hiding this comment.
...is this check necessary? wellKnownCorrelationHeaders is an allowlist, no?
| for k, vals := range md { | ||
| lk := strings.ToLower(k) | ||
| if seen[lk] { | ||
| continue | ||
| } | ||
| if isSensitiveHeader(lk) { | ||
| continue | ||
| } | ||
| if strings.HasSuffix(lk, "-request-id") || | ||
| strings.HasSuffix(lk, "-trace-id") || | ||
| strings.HasSuffix(lk, "-query-id") || | ||
| strings.HasSuffix(lk, "-session-id") || | ||
| strings.HasSuffix(lk, "-activity-id") { | ||
| out = append(out, slog.Any(prefix+lk, vals)) | ||
| } | ||
| } |
There was a problem hiding this comment.
I feel like we simply shouldn't log headers that aren't allowlisted