Summary
DSE parameter encoding should be inferred from the candidate values at parse time in the framework (EnvParamSpec / EnvParams), instead of being decided ad hoc by each optimizer agent. Today the encoding.type field must be spelled out in TOML, and the only real inference that exists (log-scale detection) lives inside a single BO agent rather than in the framework.
Motivation / problem
- src/cloudai/configurator/env_params.py defines the encoding stack (Encoding protocol, CategoricalEncoding, LogEncoding, AnyEncoding discriminated union). Selecting a non-default encoding currently requires an explicit encoding = { type = "log" } in the config. Config authors have no way to know which type to fill in, and the field is easy to forget or mismatch.
- The one place that does infer parameter kind from values is an optimizer agent (BO's _detect_log_scale, which flips Ax's log_scale flag). That is framework logic that leaked into an agent:
  - It is agent-specific and not reused by GA / MAB / RL agents, which each re-parse the raw config in their own configure().
  - The taxonomy is generic ("given candidate values, pick an encoding") with nothing optimizer-specific about it, so every agent re-deriving it is duplication and a source of drift.
Encoding is a property of the parameter, not of the optimizer. It should be decided once, at the layer that already owns the parameter model.
Proposal
Infer the Encoding from candidate values when constructing EnvParamSpec / EnvParams, and make encoding.type an optional override.
Inference rules (mirroring the existing BO heuristic, generalized):
- all candidates are strings -> CategoricalEncoding
- numeric, length >= 3, all strictly positive, constant ratio within tolerance (geometric series) -> LogEncoding
- otherwise numeric -> ordinal/linear (categorical-by-index today; a dedicated ordinal/scalar encoding can follow)
Precedence: an explicit encoding in TOML always wins; inference only fills the unspecified case.
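The rules and precedence above could be sketched roughly as follows. This is a minimal illustration, not the actual env_params.py code: the function name infer_encoding_type, the string labels it returns, the rel_tol default, and the handling of bools and mixed-type lists are all assumptions made for the example.

```python
import math
from typing import Optional, Sequence, Union

Value = Union[str, int, float]


def _is_number(v: object) -> bool:
    # bool is a subclass of int in Python; treating it as non-numeric
    # is an assumption of this sketch, not something the proposal states.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def infer_encoding_type(
    values: Sequence[Value],
    explicit: Optional[str] = None,
    rel_tol: float = 1e-6,  # hypothetical tolerance; the proposal does not fix a value
) -> str:
    """Return "categorical", "log", or "ordinal" for a list of candidate values."""
    # Precedence: an explicit encoding from the config always wins.
    if explicit is not None:
        return explicit

    # Rule 1: all candidates are strings -> categorical.
    if all(isinstance(v, str) for v in values):
        return "categorical"

    # Mixed or otherwise non-numeric lists are not covered by the proposal;
    # falling back to categorical here is an assumption.
    if not all(_is_number(v) for v in values):
        return "categorical"

    # Rule 2: at least 3 strictly positive values with a constant ratio -> log.
    if len(values) >= 3 and all(v > 0 for v in values):
        ratios = [b / a for a, b in zip(values, values[1:])]
        if all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios):
            return "log"

    # Rule 3: any other numeric list -> ordinal/linear.
    return "ordinal"
```

With this sketch, infer_encoding_type([1, 2, 4, 8]) returns "log", infer_encoding_type([1, 2, 3]) returns "ordinal", and infer_encoding_type([0.0, 0.001], explicit="log") returns "log" because the override short-circuits inference.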
Edge cases
- drop_rate = [0.0, 0.001] (real PRT case) cannot be inferred as log: it has only 2 points and contains 0.0. It will be inferred as ordinal/linear. Authors who want log for such a series must either use a genuine >=3-point positive geometric series or set the explicit override. This is expected and should be documented.
- Zero / negative values disqualify log inference.
- Perfectly uniform diffs (arithmetic) -> not log.
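Since these disqualifications are meant to be documented, one option is a small helper that reports why a list does not qualify for log inference. The sketch below is only illustrative: why_not_log, its messages, and the rel_tol default are hypothetical and do not exist in the codebase.

```python
import math
from typing import Optional, Sequence


def why_not_log(values: Sequence[float], rel_tol: float = 1e-6) -> Optional[str]:
    """Return the reason log inference is rejected, or None if the list qualifies."""
    # Fewer than 3 points cannot establish a geometric series.
    if len(values) < 3:
        return f"only {len(values)} point(s); need at least 3"
    # Zero or negative values have no real logarithm.
    if any(v <= 0 for v in values):
        return "contains a zero or negative value"
    # A geometric series has the same ratio between every pair of neighbors.
    ratios = [b / a for a, b in zip(values, values[1:])]
    if not all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios):
        return "ratio between neighbors is not constant"
    return None
```

For the drop_rate case, why_not_log([0.0, 0.001]) reports the 2-point limit first; an arithmetic list such as [10, 20, 30] is rejected because its ratios (2.0 and 1.5) differ.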
Acceptance criteria
- Inference happens in env_params.py at parse time (EnvParamSpec / EnvParams.from_test), not in any agent.
- encoding.type is optional; an explicit value overrides inference.
- Unit tests cover each branch (strings, geometric->log, arithmetic->linear) plus the ambiguous/zero/2-point cases and the explicit-override precedence.
- Follow-up (separate, downstream): optimizer agents drop their ad-hoc inference and consume the framework-decided Encoding.