The evaluation framework is a tool that comes with the watsonx Orchestrate Agent Development Kit (ADK) and allows you to test, evaluate, and analyze the agents that you have created. To test your agents, you set up a file with the expected interaction responses, and then run the agent evaluation to check whether your agent's behavior matches those expectations.
The validation step checks whether your external agent works without a native agent. To learn more about native agents and external agents, see Creating Agents.
After you build your external agent, follow these steps to validate it:
A user story describes the user's intention along with context information. You must provide all the relevant user information for the agent to process. For example:
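A hypothetical user story might read: "I am a new employee and I want to know how many vacation days I have left. My employee ID is 12345."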
Prepare a .tsv file with two columns. The first column contains user stories, and the second column contains the expected summary or output. For an example file, see the external agent validation folder example. Do not include a column header in the .tsv file.
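As an illustration, a two-line .tsv file might look like the following, where the two columns are separated by a literal tab character (shown here as <TAB>) and the content is entirely made up:

```
I am a new employee and I want to know how many vacation days I have left. My employee ID is 12345.<TAB>The agent returns the number of remaining vacation days for employee 12345.
I want to reset my laptop password. My username is jdoe.<TAB>The agent returns the password reset steps for user jdoe.
```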
The validate-external command validates your external agent against the chat completions schema for streamed events, and it stores the validation results, including the streamed events from the external agent, for later triaging and debugging.
You must use an external agent specification, such as the following:
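The exact fields depend on your ADK version; the following is only a minimal sketch with placeholder values (the name, description, and api_url are illustrative), so refer to the External agents documentation for the authoritative schema:

```yaml
# Illustrative external agent specification; all values are placeholders.
spec_version: v1
kind: external
name: my_external_agent
title: My External Agent
description: An external agent that answers HR policy questions.
api_url: https://my-external-agent.example.com/chat/completions
chat_params:
  stream: true
config:
  hidden: false
  enable_cot: false
nickname: my_external_agent
app_id: my_external_agent_app
```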
Then, run the validation command:
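A sketch of the invocation is shown below. Apart from --output, which is referenced later in this section, the flag names are illustrative and may differ in your ADK version, so check the command help for the exact options:

```bash
# Illustrative invocation; verify the exact flags with:
#   orchestrate evaluations validate-external --help
orchestrate evaluations validate-external \
  --tsv ./data/user_stories.tsv \
  --external-agent-config ./external_agent.yaml \
  --output ./results
```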
You must provide valid credentials to connect to your external agent.
The validation results are saved to a validation_results subfolder under the path provided for the --output flag.
Set the credentials in your .env file. To learn more about how to configure your .env file, see Setup the environment.
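For example, your .env file might contain an entry similar to the following. The variable name is hypothetical and depends on how your external agent authenticates:

```bash
# Hypothetical credential entry for the external agent.
EXTERNAL_AGENT_API_KEY=<your-api-key>
```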
The evaluation framework creates two files:
sample_block_validation_results.json
validation_results.json
The sample_block_validation_results.json validation prepends default messages to the user story. These messages act as context for the agent. Given n messages in total, the goal is to validate whether the external agent can properly handle an array of messages where the first n - 1 messages are the context and the nth message is the one that the external agent must respond to, given that context. The following default messages are prepended:
The validation files contain the following fields:
success: Boolean value that indicates whether the streamed events adhered to the expected schema.
logged-events: The streamed events from the external agent.
messages: The messages that were sent to the external agent.
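For illustration, a single entry in validation_results.json might look similar to the following sketch. The values are placeholders, and the actual shape of the streamed event payloads depends on your agent:

```json
{
  "success": true,
  "logged-events": [
    { "choices": [ { "delta": { "role": "assistant", "content": "..." } } ] }
  ],
  "messages": [
    { "role": "user", "content": "I am a new employee and I want to know how many vacation days I have left." }
  ]
}
```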
After validation, you can evaluate the agent by using the provided input. This evaluation checks whether the external agent works when added as a collaborator agent to a native agent.
Import the external agent to your tenant. For more information, see External agents. For example:
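For instance, by using the ADK CLI, where the file name is illustrative:

```bash
# Import the external agent definition into your tenant.
orchestrate agents import -f external_agent.yaml
```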
Add the external agent as a collaborator to your native agent. See the documentation for an in-depth guide. The following is an example of the native agent specification:
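A minimal sketch of such a native agent specification is shown below. The name, model, and instructions are placeholders; the key point is that the external agent is listed under collaborators:

```yaml
# Illustrative native agent that lists the external agent as a collaborator.
spec_version: v1
kind: native
name: supervisor_agent
llm: watsonx/meta-llama/llama-3-2-90b-vision-instruct
style: default
description: A native agent that delegates HR questions to the external agent.
instructions: >
  Use the my_external_agent collaborator to answer user questions.
collaborators:
  - my_external_agent
tools: []
```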
Import the native agent:
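For example, where the file name is illustrative:

```bash
orchestrate agents import -f native_agent.yaml
```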
Run the evaluation:
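The following is a sketch of the evaluation command. The flag names are illustrative and may differ in your ADK version, so check the command help for the exact options:

```bash
# Illustrative invocation; verify the exact flags with:
#   orchestrate evaluations evaluate --help
orchestrate evaluations evaluate \
  --test-paths ./data \
  --output-dir ./results
```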
After the results are generated, you can review them and keep iterating until you get satisfactory results from the evaluation.
To submit your results, you must include all the result files from the validation and evaluation stages. You must also provide valid credentials that can be used to test your agent.
Compress the results into a zip file to send them, as follows:
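For example, assuming that your validation and evaluation outputs were written under ./results (the folder name is illustrative):

```bash
# Bundle all result files into a single archive for submission.
zip -r results.zip ./results
```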