The evaluation framework is a tool that ships with the watsonx Orchestrate Agent Development Kit (ADK) and allows you to test, evaluate, and analyze native agents that you have created. To test an agent, you set up a file with the expected interaction responses and run the agent evaluation to check whether your agent's behavior matches those expectations.
External agents: External agents require TSV validation files but do not use the ADK validation framework. See the TSV File Requirements for External Agents section below and External Agent Onboarding for details.

Before you begin

  • You must have a native agent built with the Agent Development Kit.
  • If you run the framework with model-proxy and a WO_INSTANCE that points to a non-Dallas region, you can set the model override environment variable:
    export MODEL_OVERRIDE="meta-llama/llama-3-2-90b-vision-instruct"
    
  • Install the watsonx Orchestrate Agent Development Kit. For more information, see Installing the ADK.
  • Install the watsonx Orchestrate Developer Edition. For more information, see Installing the Developer Edition.
  • For more information about the evaluation framework, see the Evaluation framework overview.

Validating your native agent

Native agents are agents that were created with the watsonx Orchestrate Agent Development Kit or inside the watsonx Orchestrate platform. The validate-native command validates the native agent and its registered tools, collaborator agents, and knowledge bases against a set of inputs.

Running the validation

Prepare a TSV file with three columns:
  • The first column contains user stories.
  • The second column is the expected summary or output.
  • The third column is the name of the native agent that you want to validate.
For example:
example.tsv
My username is nwaters. I want to find out my timeoff schedule from: 2025-01-01 to: 2025-03-03.	Your timeoff schedule for 20250101 to 20250303 is: 20250105	hr_agent
The provided user stories and expected outputs are used to generate the JSON-formatted test cases that evaluate the agent. The generated test cases are saved at the path: <output-folder>/native_agent_evaluations/generated_test_data

Run the command:
orchestrate evaluations validate-native -t <path to data file tsv> -o <output folder>
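If you script the creation of your test data, the three-column TSV can be written with Python's standard csv module, which handles tab delimiters safely. A minimal sketch (the row mirrors the example above; the filename example.tsv is illustrative):

```python
import csv

# Each row: (user story, expected summary/output, native agent name)
test_cases = [
    (
        "My username is nwaters. I want to find out my timeoff schedule "
        "from: 2025-01-01 to: 2025-03-03.",
        "Your timeoff schedule for 20250101 to 20250303 is: 20250105",
        "hr_agent",
    ),
]

# newline="" is required so the csv module controls line endings itself
with open("example.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows(test_cases)
```

Pass the resulting file to the command with the -t flag as shown above.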

Preparing for submission

To prepare your native agent for submission, you must include all the result files from the validation and evaluation stages. Compress the results into a zip file:
company-name-onboarding-validation.zip/
├── native_agent_evaluations/
│   ├── generated_test_data/
│   │   ├── native_agent_evaluation_test_0.json
│   │   └── ...
│   ├── knowledge_base_summary_metrics/
│   ├── messages/
│   ├── summary_metrics.csv
│   └── ... # other relevant files for the evaluation
└── evaluations/
    ├── sample_agent.yaml
    ├── knowledge_base_summary_metrics.json
    ├── summary_metrics.csv
    └── ... # other relevant files for the evaluation
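One way to build the archive is with Python's zipfile module, archiving everything under the output folder you passed to validate-native so that the two result directories sit at the zip's top level. A sketch under assumed names ("results" stands in for your actual output folder, and the placeholder files stand in for real validation output):

```python
import zipfile
from pathlib import Path

# "results" stands in for the output folder you passed to
# `orchestrate evaluations validate-native -o <output folder>`.
output_dir = Path("results")

# Placeholder files so this sketch runs standalone; in practice these
# are produced by the validation and evaluation runs.
for sample in ("native_agent_evaluations/summary_metrics.csv",
               "evaluations/summary_metrics.csv"):
    p = output_dir / sample
    p.parent.mkdir(parents=True, exist_ok=True)
    p.touch()

with zipfile.ZipFile("company-name-onboarding-validation.zip", "w",
                     compression=zipfile.ZIP_DEFLATED) as zf:
    for path in output_dir.rglob("*"):
        if path.is_file():
            # Archive paths relative to the output folder so the zip's top
            # level holds native_agent_evaluations/ and evaluations/.
            zf.write(path, path.relative_to(output_dir))
```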

Next steps

After you prepare the evaluation files, you must package your agent. The validation results should be placed under the evaluations/ folder in your package structure:
.
├── agents/
│   └── my_agent.yaml
├── connections/
│   └── my_connections.yaml
├── offerings/
│   └── my_offering.yaml
├── tools/
│   └── sample_tool/
│       ├── tool.py
│       └── requirements.txt
└── evaluations/
    └── company-name-onboarding-validation.zip
For complete packaging and submission instructions, see Native Agent Onboarding.

TSV File Requirements for External Agents

External agents also require TSV validation files for testing, though they don’t use the ADK validation framework.

TSV File Format

Create a TSV file with three columns:
  • Column 1: User prompt or query
  • Column 2: Expected response or outcome
  • Column 3: Agent identifier (your external agent name)

Example

test.tsv
What is the weather in Paris?	The current weather in Paris is 18°C and sunny.	weather_agent
Find me a hotel in Tokyo	Here are available hotels in Tokyo: [list]	travel_agent
Book a flight to New York	I found 5 flights to New York. Which date would you prefer?	travel_agent
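Before submitting, it can help to sanity-check the file programmatically. A small Python sketch (the check_tsv helper and test.tsv filename are illustrative, not part of the ADK) that flags rows without exactly three non-empty columns:

```python
import csv

def check_tsv(path):
    """Return a list of problems found in a three-column validation TSV."""
    problems = []
    with open(path, newline="") as f:
        for lineno, row in enumerate(csv.reader(f, delimiter="\t"), start=1):
            if len(row) != 3:
                problems.append(f"line {lineno}: expected 3 columns, got {len(row)}")
            elif not all(col.strip() for col in row):
                problems.append(f"line {lineno}: empty column")
    return problems

# Write a sample file: one well-formed row from the example above,
# plus one malformed row that is missing the agent identifier.
with open("test.tsv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["What is the weather in Paris?",
                "The current weather in Paris is 18°C and sunny.",
                "weather_agent"])
    w.writerow(["Find me a hotel in Tokyo",
                "Here are available hotels in Tokyo: [list]"])

print(check_tsv("test.tsv"))  # → ['line 2: expected 3 columns, got 2']
```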

Submission Process

  1. Create your TSV file with comprehensive test cases covering your agent’s capabilities
  2. Test thoroughly with your external agent endpoint to ensure accuracy
  3. Submit to IBM by emailing your TSV file to: IBMAgentConnect@ibm.com
    • Subject line: “TSV Validation - [Your Agent Name]”
    • Include the TSV file as an attachment
  4. Include in submission: Upload the TSV file when submitting through the IBM Concierge app

Best Practices

  • Include diverse test cases covering different scenarios
  • Ensure expected responses match your agent’s actual behavior
  • Test edge cases and error handling
  • Keep responses concise but representative of actual output
  • Use consistent agent identifiers across all test cases
For complete external agent onboarding instructions, see External Agent Onboarding.