00 / Methodology

Measurement Protocol

How Valoh measures brand presence in LLM environments.

This page documents what Valoh measures, how it collects each signal, and what it deliberately does not do. It's intended to be readable by a measurement lead, an MRC-track auditor, or a CFO asking what they're paying for.

Version: 1.2 │ Last revised: 2026-05-04 │ Status: public

01 / Overview

What we measure

Five signals, per brand, per model, per day.

Valoh tracks five signals about how a brand appears in conversational AI responses: mention rate, rank position, sentiment, competitor citations, and recommendation strength. These are the same signals that answer engine optimization (AEO) tools surface. Valoh's contribution is independence, transparent collection, and auditable retention of the underlying transcripts.

Every reported value can be traced back to a logged conversation. Each value is derived from observed responses; nothing is reported without a transcript behind it.

02 / Prompts

Prompt Construction

Frozen prompt sets, defined by category.

For each measured category, Valoh constructs a prompt set up front. A prompt set contains 40–60 representative consumer queries spanning awareness, comparison, and purchase-intent stages. Once defined, the set is locked for the measurement window.

2.1: Category-anchored, not brand-anchored.
Prompts ask about the category, not the brand directly: "What's the best running shoe for flat feet?" rather than "Tell me about Brand X." This is what produces unsolicited mentions.
2.2: Locked for the measurement window.
Clients do not see or influence prompt phrasing during a measurement window. Independence is a methodological requirement, not a courtesy.
2.3: Reviewed by category, on a published cadence.
Prompt sets are reviewed every six months for category drift. Revisions are versioned and noted in the report. Comparable measurement requires comparable prompts.
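
The rules above can be sketched as an immutable data structure. This is an illustration only, not Valoh's implementation: the names `Stage`, `Prompt`, `PromptSet`, and `names_a_brand` are hypothetical, and the version string format is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    AWARENESS = "awareness"
    COMPARISON = "comparison"
    PURCHASE_INTENT = "purchase_intent"


@dataclass(frozen=True)
class Prompt:
    text: str
    stage: Stage


@dataclass(frozen=True)
class PromptSet:
    """A versioned prompt set; frozen so it cannot be edited mid-window."""
    category: str
    version: str  # hypothetical format, bumped on each review
    prompts: tuple[Prompt, ...]  # a tuple, so items cannot be added or removed

    def __post_init__(self) -> None:
        if not 40 <= len(self.prompts) <= 60:
            raise ValueError("a prompt set holds 40-60 prompts")


def names_a_brand(prompt: Prompt, brands: list[str]) -> bool:
    """Flag prompts that name a brand and so are not category-anchored."""
    lowered = prompt.text.lower()
    return any(brand.lower() in lowered for brand in brands)
```

Assigning to any field of a constructed `PromptSet` raises `dataclasses.FrozenInstanceError`, which is one simple way to make "locked for the measurement window" hold in code as well as in policy.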

03 / Models

Model Coverage

Four models. Queried separately. Reported separately.

Models tracked, May 2026
Model	Provider	Access	Daily queries / brand
ChatGPT	OpenAI · GPT-class flagship	API, default settings	~50
Gemini	Google · flagship	API, default settings	~50
Claude	Anthropic · flagship	API, default settings	~50
Perplexity	Perplexity · default model	API, default settings	~50

Each model is queried independently. We do not average results across models; presence in ChatGPT and presence in Perplexity are different facts and are reported as such.
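
A minimal sketch of per-model reporting, assuming each logged response has already been reduced to a (model, mentioned) pair. The function name and input shape are hypothetical.

```python
from collections import defaultdict


def mention_rate_by_model(observations):
    """Return one mention rate per model.

    observations: iterable of (model_name, brand_mentioned) pairs.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for model, mentioned in observations:
        totals[model] += 1
        hits[model] += int(mentioned)
    # One rate per model; no cross-model average is ever computed.
    return {model: hits[model] / totals[model] for model in totals}
```

The function deliberately returns a mapping rather than a single number, so a pooled figure cannot be produced by accident.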

04 / Sampling

Sampling & Frequency

Daily refresh. 200+ queries per brand per day. 90-day retention.

200+: Queries logged per brand per day, distributed across four models.
24h: Refresh cadence; reports update once daily.
90 days: Raw transcripts retained at full fidelity for client audit.
12 mo: Aggregated signal data retained for trend analysis.

Sampling is calibrated to produce a 95% confidence interval of ±3 percentage points on mention rate, per model per category. Sample size scales upward where category breadth or low base rates demand it.
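
As a rough guide to what that calibration implies, the sketch below uses the textbook normal-approximation formula for a proportion. It assumes the ±3 figure is an absolute margin in percentage points. The page does not state which interval method Valoh uses, so treat this as an illustration rather than the actual calculation.

```python
import math
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def required_sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Observations needed so the interval half-width is at most `margin`.

    p is the assumed mention rate; p = 0.5 is the worst case.
    """
    z = z_score(confidence)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Interval half-width achieved with n observations."""
    return z_score(confidence) * math.sqrt(p * (1 - p) / n)
```

With the worst-case `p = 0.5`, `required_sample_size(0.03)` returns 1068, while `margin_of_error(50)` is roughly 0.14. Under these assumptions, a ±3-point interval at about 50 queries per model per day would need roughly three weeks of pooled observations rather than a single day. The page does not state the pooling window.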

05 / Signals

Signal Definitions

Each signal has a stable definition, a known unit, and a known method.

Formal signal definitions
Signal	Definition	Unit
Mention Rate	The percentage of category-anchored prompts in which the brand is named anywhere in the response.	% / day
Rank Position	When mentioned in an enumerated list, the average ordinal position of the brand within that list.	avg rank
Sentiment	Polarity of the language used to describe the brand within its surrounding sentences, scored on a continuous −1 to +1 scale.	−1 to +1
Competitor Citations	The set of named competitor brands appearing in the same response as the measured brand, with frequency.	named set
Recommendation Strength	A tier from 1 (actively recommended) to 4 (mentioned but not recommended), assigned by a defined rubric applied to the recommendation phrasing.	tier 1–4
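
The three programmatic signals can be sketched for a single response as follows. This is a simplified illustration: the whole-word regex match, the numbered-list pattern, and all function names are assumptions, and production extraction would need to handle brand aliases and other list formats.

```python
import re
from collections import Counter

# Matches numbered list items such as "1. Foo" or "2) Bar".
LIST_ITEM = re.compile(r"^\s*(\d+)[.)]\s+(.*)$")


def brand_mentioned(text: str, brand: str) -> bool:
    """True if the brand appears as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(brand)}\b", text, re.IGNORECASE) is not None


def rank_position(response: str, brand: str):
    """Ordinal of the first numbered list item naming the brand, else None."""
    for line in response.splitlines():
        match = LIST_ITEM.match(line)
        if match and brand_mentioned(match.group(2), brand):
            return int(match.group(1))
    return None


def competitor_citations(responses, brand, competitors) -> Counter:
    """Count competitors named in responses that also name the brand."""
    counts = Counter()
    for response in responses:
        if brand_mentioned(response, brand):
            counts.update(c for c in competitors if brand_mentioned(response, c))
    return counts
```

Mention rate is then the share of responses for which `brand_mentioned` is true, and the reported rank is the mean of the non-`None` values of `rank_position`.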

06 / Validation

Validation

Sentiment and recommendation tier are double-coded.

Mention rate, rank position, and competitor citations are extracted programmatically, so repeated extraction from the same transcript yields identical values and agreement is effectively 1.0 by construction. Sentiment and recommendation strength are subjective signals. Both are scored by an automated classifier, and a stratified sample is independently coded by a human reviewer. Disagreements are reviewed and resolved.

6.1: Sentiment. A 10% stratified sample of responses is human-reviewed daily. Inter-rater agreement is reported per category per quarter.
6.2: Recommendation strength. A 10% stratified sample is human-reviewed against a published rubric. Rubric updates are versioned.
6.3: Audit access. Clients may request the raw transcripts behind any reported signal value at any time within the 90-day retention window.
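
The page does not name the agreement statistic it reports. One common choice for two coders assigning categorical labels, such as recommendation tiers, is Cohen's kappa, sketched here as an assumption rather than as Valoh's documented method.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b) -> float:
    """Chance-corrected agreement between two coders over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both coders used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)
```

Here `labels_a` would be the classifier's tiers and `labels_b` the human reviewer's tiers for the same sampled responses. A continuous sentiment score would need a different statistic, or binning into categories first.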

07 / Boundaries

What Valoh deliberately does not do

A measurement firm cannot also be an optimization firm.
This is the line we hold.

We do not tell you what to write, what to publish, how to structure your content, or what to ask the model. We tell you what the model said.

—: No content recommendations. Valoh does not generate, suggest, or critique your content. We don't offer "remediation roadmaps" or copy edits.
—: No prompt optimization for clients. We don't help you write prompts that elicit better responses. We measure responses to prompts that real users would write.
—: No client influence on prompt sets. Clients do not see or revise prompts inside their measurement window. Prompt sets are reviewed publicly, not privately.
—: No managed-service optimization. We don't sell remediation engagements alongside measurement. You can use any optimization vendor you like; Valoh measures the result.
—: No proprietary aggregate scores. Valoh does not publish a "brand visibility score" of its own design. We report the underlying signals; the buyer decides what matters.

08 / Limitations

Known Limitations

Things this measurement does not, and cannot, capture.

8.1: Personalized model behavior. Valoh queries via API at default settings. User-personalized behavior (memory, custom instructions, prior conversations) is not captured.
8.2: Model drift. Models update without notice. We log the model identifier returned with each response so version-induced shifts can be attributed correctly, but we cannot prevent them.
8.3: Coverage gaps. Models we don't query are not measured. Coverage expands as the category matures; new additions are versioned and noted.
8.4: Causal claims. Valoh measures presence, not the cause of presence. We do not attribute changes in mention rate to specific brand actions.
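
To illustrate how the logged model identifier from 8.2 could separate version changes from other movement, the sketch below splits one model's mention rate by identifier. The `Transcript` fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transcript:
    day: date
    model: str      # product name, e.g. "Claude"
    model_id: str   # identifier returned with the response
    mentioned: bool


def mention_rate_by_model_id(transcripts, model):
    """Split one model's mention rate by the identifier each response carried."""
    stats = {}
    for t in transcripts:
        if t.model != model:
            continue
        first_seen, n, hits = stats.get(t.model_id, (t.day, 0, 0))
        stats[t.model_id] = (min(first_seen, t.day), n + 1, hits + int(t.mentioned))
    return {
        model_id: {"first_seen": first_seen, "n": n, "rate": hits / n}
        for model_id, (first_seen, n, hits) in stats.items()
    }
```

A change in rate that lines up with the `first_seen` date of a new identifier points to a model update. This only locates the shift in time, in keeping with 8.4. It does not explain it.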

09 / Access

Early Access

If this is the kind of measurement you've been looking for, leave your email.

Or run a free 7-day pulse on your brand