Arize Phoenix / OpenInference¶
Arize Phoenix, together with the broader OpenInference tracing standard, is whatifd's second supported trace source, shipped in v0.2. The adapter is tracer-neutral by construction: it consumes OpenInference-shaped span dictionaries from a caller-supplied provider, not a pinned Phoenix client. Anything that emits OpenInference spans (Phoenix proper, a custom OTLP collector, an offline JSONL dump) works with a ~5-line wiring callable.
What whatifd reads from Phoenix / OpenInference¶
For each selected trace:
- `input.value` — the agent's user-facing input (wrapped as `Sensitive[str]` at the boundary per cardinal #5).
- `output.value` — the agent's output (the original output the verdict is diffed against; also wrapped).
- `context.trace_id` + `parent_id` — used to group spans into traces and identify the root span.
- `openinference.span.kind` — used to identify LLM vs tool vs retrieval spans for the `ToolCache`.
- All other span attributes pass through to `RawTrace.metadata` unwrapped (cardinal #5: only intentional user content is `Sensitive`).
whatifd is read-only on Phoenix. It never writes spans back.
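Concretely, the span shape the adapter consumes looks like this. The attribute names follow OpenInference; the values, span IDs, and the small grouping helper are purely illustrative, not part of whatifd:

```python
# Illustrative OpenInference-shaped span dicts, as a spans_provider
# would yield them. Attribute names are the OpenInference conventions;
# all values here are invented.
root_span = {
    "context": {"trace_id": "t-123", "span_id": "s-1"},
    "parent_id": None,  # no parent => root span of the trace
    "openinference.span.kind": "AGENT",
    "input.value": "What's the refund policy?",          # wrapped as Sensitive[str]
    "output.value": "Refunds are accepted within 30 days.",  # also wrapped
    "custom.team": "support",  # passes through to RawTrace.metadata unwrapped
}

tool_span = {
    "context": {"trace_id": "t-123", "span_id": "s-2"},
    "parent_id": "s-1",
    "openinference.span.kind": "TOOL",
    "input.value": '{"query": "refund policy"}',
    "output.value": '{"days": 30}',
}

def group_by_trace(spans):
    """Group spans by context.trace_id; the root is the span with no parent."""
    traces = {}
    for span in spans:
        traces.setdefault(span["context"]["trace_id"], []).append(span)
    return traces

traces = group_by_trace([root_span, tool_span])
root = next(s for s in traces["t-123"] if s["parent_id"] is None)
```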
Install¶
```bash
uv pip install whatifd whatifd-phoenix
```
whatifd-phoenix itself is tracer-neutral and has no required SDK pin. If you want to read from a live Phoenix instance via arize-phoenix-client, install the optional extra:
```bash
uv pip install "whatifd-phoenix[live]"
```
The core whatifd package does not pull Phoenix — you can use Langfuse, your own adapter, or the synthetic whatifd.adapters.stub.StubTraceSource without ever installing this package.
Wiring a spans_provider¶
The adapter accepts a spans_provider: Callable[[], Iterable[dict]] that yields OpenInference-shaped span dictionaries. The shape decouples whatifd from any single Phoenix client SDK or transport layer.
From arize-phoenix-client (live Phoenix instance)¶
```python
from phoenix.client import Client  # pip package: arize-phoenix-client

from whatifd_phoenix import PhoenixTraceSource

def spans_provider():
    client = Client(base_url="https://your-phoenix-host")
    # Iterate however your Phoenix deployment exposes spans —
    # the API surface varies by version. The adapter cares only
    # that each yielded item is a dict with OpenInference attrs.
    for span in client.get_spans(project="my-project"):
        yield span.to_dict()

source = PhoenixTraceSource(spans_provider=spans_provider)
```
From a JSONL dump (offline / CI fixture)¶
```python
import json
from pathlib import Path

from whatifd_phoenix import PhoenixTraceSource

def spans_provider():
    with Path("spans.jsonl").open() as f:
        for line in f:
            yield json.loads(line)

source = PhoenixTraceSource(spans_provider=spans_provider)
```
From an OpenTelemetry collector¶
If your spans land in any OpenInference-emitting OTLP destination, the same pattern applies — implement a spans_provider that pulls them out as dicts. The adapter never assumes Phoenix-specific transport.
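For example, if all you have is an OTLP/JSON export (such as a collector's file exporter output), a `spans_provider` can flatten it into the dict shape the adapter expects. The nested field names below follow the OTLP/JSON encoding; the flattening helper itself is a sketch under that assumption, not part of `whatifd-phoenix`:

```python
import json
from pathlib import Path

def _flat_attrs(otlp_span):
    """Collapse OTLP's [{'key': ..., 'value': {'stringValue': ...}}]
    attribute list into a flat {key: value} dict (string values only,
    for brevity)."""
    return {
        kv["key"]: kv["value"].get("stringValue")
        for kv in otlp_span.get("attributes", [])
    }

def spans_provider(path="otlp_export.json"):
    payload = json.loads(Path(path).read_text())
    for resource in payload.get("resourceSpans", []):
        for scope in resource.get("scopeSpans", []):
            for span in scope.get("spans", []):
                yield {
                    "context": {
                        "trace_id": span["traceId"],
                        "span_id": span["spanId"],
                    },
                    # OTLP encodes "no parent" as an empty string.
                    "parent_id": span.get("parentSpanId") or None,
                    **_flat_attrs(span),
                }
```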
OpenInference attribute mapping¶
The adapter reads the standard OpenInference span attribute conventions:
| OpenInference attribute | Used for | Wrapped as `Sensitive`? |
|---|---|---|
| `context.trace_id` | Group spans into traces | no |
| `parent_id` | Identify root span per trace | no |
| `openinference.span.kind` | Classify span (LLM / tool / retriever / agent) | no |
| `input.value` | Trace input (user message) | yes |
| `output.value` | Trace output (agent response) | yes |
| any other attribute | Passed through to `RawTrace.metadata` | no |
The Sensitive[str] wrapping is enforced at the adapter boundary; downstream code that needs the raw value must call .unwrap(reason=...) and the unwrap is audit-logged.
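The wrapper pattern can be sketched as follows. Only the `Sensitive` name and the `.unwrap(reason=...)` call are taken from the text above; the class body, repr, and logger name are illustrative assumptions about whatifd's real implementation:

```python
import logging

audit_log = logging.getLogger("whatifd.audit")  # logger name is an assumption

class Sensitive:
    """Minimal sketch of a Sensitive[str]-style wrapper: the raw value is
    hidden from repr() and only reachable via an audit-logged unwrap."""

    def __init__(self, value: str):
        self._value = value

    def __repr__(self) -> str:
        return "Sensitive(<redacted>)"

    def unwrap(self, reason: str) -> str:
        # Every unwrap is recorded with the caller-supplied reason.
        audit_log.info("unwrap: %s", reason)
        return self._value

user_input = Sensitive("What's my account balance?")
```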
Selectors¶
The Phoenix adapter supports the same selector grammar as Langfuse for cohort filtering — see the Langfuse selector grammar. Selectors that depend on tracer-specific concepts (e.g., Langfuse-style score-based filters) are interpreted by the adapter against whatever attribute Phoenix exposes for the equivalent signal.
If a selector references a concept Phoenix doesn’t have a direct equivalent for, the adapter declares it unsupported at config-load time (cardinal #1: structural failure, not silent skip).
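The fail-at-load behaviour looks roughly like this. The selector names, the supported set, and the exception type are all invented for illustration; only the "raise at config-load time, never skip silently" contract comes from the text above:

```python
# Illustrative supported set; the real adapter's grammar differs.
SUPPORTED_SELECTORS = {"trace_id", "span_kind", "input_contains"}

class UnsupportedSelectorError(ValueError):
    """Raised at config-load time, before any trace is read (cardinal #1)."""

def validate_selectors(config: dict) -> None:
    """Declare unsupported selectors structurally instead of skipping them."""
    unsupported = set(config.get("selectors", {})) - SUPPORTED_SELECTORS
    if unsupported:
        raise UnsupportedSelectorError(
            f"Phoenix adapter cannot evaluate: {sorted(unsupported)}"
        )
```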
Tool-call cache¶
OpenInference tool spans surface as cache entries under whatifd’s default use-original policy. The cache key components are derived from the span’s openinference.span.kind == "tool" plus the span’s input.value hash. Tool-output retrieval reads output.value from the same span.
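The key derivation described above can be sketched like this. The hash choice and tuple layout are assumptions; the inputs (span kind plus a hash of `input.value`, output served from `output.value`) come from the text:

```python
import hashlib

def tool_cache_key(span: dict) -> tuple:
    """Cache key for a tool span: the span kind plus a digest of its
    input.value. Rejects non-tool spans."""
    if span.get("openinference.span.kind") != "tool":
        raise ValueError("not a tool span")
    digest = hashlib.sha256(span["input.value"].encode()).hexdigest()
    return ("tool", digest)

def cached_output(span: dict, cache: dict) -> str:
    """use-original policy: serve the output.value recorded on the span."""
    key = tool_cache_key(span)
    if key not in cache:
        cache[key] = span["output.value"]
    return cache[key]
```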
If a trace has incomplete tool spans (output missing, mismatched parent/child structure), whatifd records the trace as a structured replay failure (cardinal #1) in the report’s Replay validity section.
Scope of v0.2 support¶
| Feature | v0.2 | Notes |
|---|---|---|
| Read OpenInference spans from any provider | ✅ | The `spans_provider` contract is tracer- and transport-neutral. |
| Wrap user content as `Sensitive[str]` | ✅ | Per cardinal #5. |
| Conformance with the shared adapter harness | ✅ | 14 conformance tests (5 inherited from the harness + 9 adapter-specific). |
| Live-Phoenix recorded-cassette smoke | ❌ (v0.3) | Parity with `whatifd-langfuse` is planned for v0.3. |
| Phoenix-specific selector grammar | partial | Core selectors map cleanly; Phoenix-only signals (e.g., evaluation runs) are not yet first-class. |
| Multi-project Phoenix tenancy | ✅ | The `spans_provider` decides which projects are read; a project yielding no spans is declared empty. |
Limitations¶
- **The adapter doesn't ship a Phoenix client wrapper.** You supply the `spans_provider`. This is a deliberate design choice: pinning to a specific Phoenix SDK version would bind `whatifd-phoenix` to one transport, breaking neutrality. The trade-off is ~5 lines of wiring code at integration time.
- **OpenInference attribute conformance.** If your spans are emitted by a non-standard tracer that almost follows OpenInference, attribute mapping may surface gaps. The adapter validates the attributes it reads at trace-build time; gaps appear as structured `RawTrace` build errors.
- **Live-Phoenix verification is not yet structurally pinned.** v0.3 will land a `pytest-recording`-based cassette suite mirroring `whatifd-langfuse`'s discipline. Until then, validate your `spans_provider` against a fixture set before relying on it in CI.
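Until the cassette suite lands, a fixture check along these lines is a cheap guard. The required-attribute list mirrors the mapping table above; treating every attribute as required on every span is a simplification for illustration:

```python
# Attributes the adapter reads, per the mapping table above. For brevity
# this sketch requires them on every span, not just the root.
REQUIRED = ("context", "input.value", "output.value", "openinference.span.kind")

def check_fixture(spans):
    """Return (index, missing-attrs) pairs for spans a provider yields
    that lack attributes the adapter reads. Empty list means the fixture
    is safe to rely on in CI."""
    problems = []
    for i, span in enumerate(spans):
        missing = [attr for attr in REQUIRED if attr not in span]
        if missing:
            problems.append((i, missing))
    return problems
```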