The evaluation framework is a tool included in the watsonx Orchestrate Agent Development Kit (ADK) that allows you to test, evaluate, and analyze native agents that you have created. To test your agents, you set up a file with the expected interaction responses and run the agent evaluation to check whether your agent matches the expectations.
External agents: External agents do not require validation through the ADK. Validation for external agents is handled during the submission process through the IBM Concierge app. See External Agent Onboarding for details.

Before you begin

  • You must have a native agent built with the Agent Development Kit.
  • If you run the framework with model-proxy and a WO_INSTANCE that points to a non-Dallas region, set the model override environment variable:
    export MODEL_OVERRIDE="meta-llama/llama-3-2-90b-vision-instruct"
    
  • Install the watsonx Orchestrate Agent Development Kit. For more information, see Installing the ADK.
  • Install the watsonx Orchestrate Developer Edition. For more information, see Installing the Developer Edition.
  • For more information about the evaluation framework, see the Evaluation framework overview.

Validating your native agent

Native agents refer to agents that were created with the watsonx Orchestrate Agent Development Kit or inside the watsonx Orchestrate platform. The validate-native command validates the native agent and its registered tools, collaborator agents, and knowledge bases against a set of inputs.

Running the validation

Prepare a TSV file with three columns:
  • The first column contains user stories.
  • The second column is the expected summary or output.
  • The third column is the name of the native agent that you want to validate.
For example:
example.tsv
My username is nwaters. I want to find out my timeoff schedule from: 2025-01-01 to: 2025-03-03.	Your timeoff schedule for 20250101 to 20250303 is: 20250105	hr_agent
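The three-column layout can also be generated programmatically. A minimal sketch in Python (the user story, expected output, and agent name below are illustrative placeholders, not required values):

```python
import csv

# Each row: user story, expected summary/output, native agent name.
rows = [
    (
        "My username is nwaters. I want to find out my timeoff schedule "
        "from: 2025-01-01 to: 2025-03-03.",
        "Your timeoff schedule for 20250101 to 20250303 is: 20250105",
        "hr_agent",
    ),
]

# Write the tab-separated test data file.
with open("example.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows(rows)
```

Add one row per test case; every row must keep the same three-column order.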
The provided user stories and expected outputs are used to generate the JSON-formatted test cases that evaluate the agent. The generated test cases are saved at the path: <output-folder>/native_agent_evaluations/generated_test_data

Run the command:
orchestrate evaluations validate-native -t <path to data file tsv> -o <output folder>

Preparing for submission

To prepare your native agent for submission, you must include all the result files from the validation and evaluation stages. Compress the results into a zip file:
company-name-onboarding-validation.zip/
├── native_agent_evaluations/
│   ├── generated_test_data/
│   │   ├── native_agent_evaluation_test_0.json
│   │   └── ...
│   ├── knowledge_base_summary_metrics/
│   ├── messages/
│   ├── summary_metrics.csv
│   └── ... # other relevant files for the evaluation
└── evaluations/
    ├── ... # other relevant files for the evaluation
    ├── sample_agent.yaml
    ├── knowledge_base_summary_metrics.json
    └── summary_metrics.csv
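One way to build the archive is to compress the evaluation output folder with the Python standard library. A sketch, assuming your validation results live in a folder named output (the folder you passed to -o; adjust the path to match your run):

```python
import shutil
from pathlib import Path

# The folder passed to -o when running validate-native.
results_dir = Path("output")
results_dir.mkdir(exist_ok=True)  # no-op when the results already exist

# Produces company-name-onboarding-validation.zip in the current directory,
# with the contents of results_dir at the archive root.
archive = shutil.make_archive(
    "company-name-onboarding-validation", "zip", root_dir=results_dir
)
print(archive)
```

Any zip tool works equally well, as long as the folder structure shown above is preserved inside the archive.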

Next steps

After you prepare the evaluation files, you must package your agent. The validation results should be placed under the evaluations/ folder in your package structure:
.
├── agents/
│   └── my_agent.yaml
├── connections/
│   └── my_connections.yaml
├── offerings/
│   └── my_offering.yaml
├── tools/
│   └── sample_tool/
│       ├── tool.py
│       └── requirements.txt
└── evaluations/
    └── company-name-onboarding-validation.zip
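Assembling this layout can also be scripted. A sketch, assuming the validation archive sits in the current directory (the package folder name my_agent_package is an illustrative placeholder):

```python
import shutil
from pathlib import Path

package = Path("my_agent_package")

# Create the folders expected in the submission package.
for sub in ("agents", "connections", "offerings", "tools/sample_tool", "evaluations"):
    (package / sub).mkdir(parents=True, exist_ok=True)

# Place the validation archive under evaluations/.
archive = Path("company-name-onboarding-validation.zip")
if archive.exists():
    shutil.copy(archive, package / "evaluations" / archive.name)
```

Your agent, connection, offering, and tool files then go into their respective folders before submission.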
For complete packaging and submission instructions, see Native Agent Onboarding.