OpenEnv Flow Debugger

A real-world agentic debugging environment for Power Automate

This project is a small, easy-to-use debugging environment built with OpenEnv. It's inspired by the tricky real-world failures we hit in tools like Power Automate.

Our environment focuses on a super common issue: those annoying '400 BadRequest' errors that pop up when a condition in your automation flow has a syntax mistake.

The main idea here isn't to build a perfect smart agent right away. Instead, we want to create a clear, realistic, and expandable way to test and improve how agents fix bugs.

What You Need to Do

Imagine you have a Power Automate Flow that just failed.

It failed with an "HTTP 400 BadRequest" error in a "Condition" step, because the condition expression has a tiny syntax error.

Your job as the agent is to fix that broken condition expression so the flow runs successfully.

Each time you play (each "episode"), it's like facing a real-life debugging puzzle that automation engineers deal with all the time.

What You See (Observation Space)

At each step, you'll get some info in a JSON-like format. It includes:

case_id: A unique ID for this specific problem.
run_status: Tells you if the flow is still 'Failed' or 'Succeeded'.
failed_step: Which step caused the problem.
error: Details about the error, like the code and a message.
steps: A list of all the steps in the flow, showing their inputs and outputs.
attempts_left: How many more tries you have to fix it.

Example observation (kept simple):

case_id: CASE_001
run_status: Failed
failed_step: Condition_Check
error: code=400, message=BadRequest, details=InvalidTemplate: The expression is invalid
steps:
- Compose_Ext (Succeeded, outputs: xlsx)
- Condition_Check (Failed, expression: @equals(outputs('Compose_Ext'),'xlsx')
attempts_left: 3
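
For illustration, here is a rough sketch of how that observation might look as a Python dictionary. The top-level keys come from the field list above; the nested keys ("name", "status", "inputs", "outputs") and the exact broken expression (a missing closing parenthesis) are assumptions, not the environment's confirmed schema.

```python
import json

# Hypothetical layout: top-level keys follow the field list above;
# the nested keys are assumptions made for this sketch.
observation = {
    "case_id": "CASE_001",
    "run_status": "Failed",
    "failed_step": "Condition_Check",
    "error": {
        "code": 400,
        "message": "BadRequest",
        "details": "InvalidTemplate: The expression is invalid",
    },
    "steps": [
        {"name": "Compose_Ext", "status": "Succeeded", "outputs": "xlsx"},
        {
            "name": "Condition_Check",
            "status": "Failed",
            # Assumed bug for illustration: the closing parenthesis is missing.
            "inputs": {"expression": "@equals(outputs('Compose_Ext'),'xlsx'"},
        },
    ],
    "attempts_left": 3,
}


def failed_expression(obs):
    """Return the expression of the step named in obs['failed_step']."""
    for step in obs["steps"]:
        if step["name"] == obs["failed_step"]:
            return step["inputs"]["expression"]
    raise KeyError(obs["failed_step"])


print(json.dumps(observation, indent=2))
print(failed_expression(observation))
```

An agent would typically start by pulling out the failing expression like this before deciding what to patch.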

What You Can Do (Action Space - Just Starting!)

Right now, in this simple version, you can only do one type of action.

You can submit a patch_step action. This action targets the Condition_Check step and updates its inputs.expression field.

Example action:

action = patch_step
step = Condition_Check
field = inputs.expression
value = @equals(outputs('Compose_Ext'),'xlsx')

For now, your fix needs to be an exact match to what's expected for it to count as correct.
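
To make the exact-match rule concrete, here is a minimal sketch. The dictionary keys mirror the example action above, while the is_correct helper and its checks are hypothetical and not taken from the project's code.

```python
# Hypothetical action payload; key names mirror the example action above.
action = {
    "action": "patch_step",
    "step": "Condition_Check",
    "field": "inputs.expression",
    "value": "@equals(outputs('Compose_Ext'),'xlsx')",
}


def is_correct(action, gold_fix):
    """Exact string comparison against the expected fix; no normalization."""
    return (
        action.get("action") == "patch_step"
        and action.get("field") == "inputs.expression"
        and action.get("value") == gold_fix
    )


gold = "@equals(outputs('Compose_Ext'),'xlsx')"
print(is_correct(action, gold))

# Same logic, but with an extra space after the comma: exact match rejects it.
spaced = dict(action, value="@equals(outputs('Compose_Ext'), 'xlsx')")
print(is_correct(spaced, gold))
```

Under exact matching, even a harmless whitespace difference counts as a wrong fix, so an agent should avoid reformatting the expression beyond the repair itself.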

How You Get Graded (Reward Function)

Our scoring system is pretty straightforward:

+1.0 if you successfully fix the flow.
-0.1 for trying an incorrect fix (but you still have tries left).
-0.2 if you run out of tries without fixing it.

The game (episode) ends when the flow is fixed, or when you run out of chances.
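
The scoring rules above can be sketched as a small function. This assumes one reading of the rules: the final failed attempt earns -0.2 instead of -0.1, not both. The function name and signature are made up for this example.

```python
def step_reward(correct, attempts_left):
    """Score one submitted fix.

    attempts_left is the number of tries available before this attempt.
    Returns (reward, done, attempts_left_after).
    """
    remaining = attempts_left - 1
    if correct:
        return 1.0, True, remaining   # flow fixed: episode ends
    if remaining > 0:
        return -0.1, False, remaining  # wrong fix, tries remain
    return -0.2, True, remaining       # wrong fix, no tries left


# Simulate an agent that is wrong three times in a row.
attempts = 3
rewards = []
done = False
while not done:
    reward, done, attempts = step_reward(False, attempts)
    rewards.append(reward)
print(rewards)
```

With three attempts and no correct fix, this reading gives two -0.1 penalties followed by a final -0.2.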

The Problems (Dataset)

The specific bugs we're trying to fix are stored in a JSON file here:

flow_debugger_env/data/cases.json

Each problem includes the messed-up flow state, error details, and a hidden 'gold_fix' (the right answer) that the environment uses to check your work. You, the agent, never see this 'gold_fix'.
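
Here is a hedged sketch of how cases could be loaded and how the answer could be kept out of the agent's view. It assumes the file holds a JSON array of objects and that the answer sits under a top-level "gold_fix" key; the helper names are hypothetical.

```python
import json
from pathlib import Path

CASES_PATH = Path("flow_debugger_env/data/cases.json")


def load_cases(path=CASES_PATH):
    """Load all cases; assumes the file contains a JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_observation(case):
    """Copy a case without 'gold_fix' so the agent never sees the answer."""
    return {key: value for key, value in case.items() if key != "gold_fix"}


# Inline sample case (made up) so the sketch runs without the data file.
sample_case = {
    "case_id": "CASE_001",
    "run_status": "Failed",
    "failed_step": "Condition_Check",
    "gold_fix": "@equals(outputs('Compose_Ext'),'xlsx')",
}
visible = to_observation(sample_case)
print(sorted(visible))
```

Stripping the key when building the observation, instead of trusting the agent not to look, keeps the hidden answer on the environment side only.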

How to Run the Example

Just run the demo.py file from the main project folder like this:

python demo.py

The demo will pick a random bug, use a basic rule-based agent to try and fix the condition expression, and then show you how it went.
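
This README doesn't spell out which rules the demo agent uses. As one plausible example of a rule-based fix, the sketch below appends any missing closing parentheses while skipping parentheses inside single-quoted string literals. The function is hypothetical and is not the demo's actual logic.

```python
def balance_parentheses(expression):
    """Append missing ')' characters, ignoring parentheses inside '...' literals."""
    depth = 0
    in_string = False
    for ch in expression:
        if ch == "'":
            # Toggle on every quote; a doubled quote ('') toggles twice,
            # so it leaves the string state unchanged.
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")" and depth > 0:
                depth -= 1
    return expression + ")" * depth


broken = "@equals(outputs('Compose_Ext'),'xlsx'"
print(balance_parentheses(broken))
```

A rule like this only covers one family of syntax mistakes; other bugs (for example a missing quote or comma) would need their own rules.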

What This Can't Do Yet (Limitations)

This simple version is kept small on purpose:

It only deals with syntax errors in Condition expressions.
It doesn't actually run real Power Automate flows.
It doesn't connect to any outside services or APIs.
It's not doing fancy AI learning (like reinforcement learning) yet.

Keeping things simple means it's fast, predictable, and easy for us to build on later.

What's Next?

We could add more cool stuff later, like:

Figuring out errors in 'filter array' settings.
Dealing with 'null' values or wrong data types.
Fixing multiple steps at once.
Using smarter, AI-powered agents.
Training AI using special tools like TRL or Unsloth.
Adding 'Green Agent' wrappers.

Why We Made This

Debugging Power Automate flows is a real headache for a lot of people. This environment turns those everyday automation failures into a structured task for agents and a useful testbed for learning and experimenting with OpenEnv.
