Dear author,

I see that when evaluating on the two benchmarks (BugsInPy and TypeBugs), you call `gen_test_script('prompt_patches/bugsinpy/correctness_failed_cases.json', split=5, benchmark='bugsinpy')` in the evaluate.py script to generate .sh files, which are then executed inside PyTER's Docker container. However, there is a piece of logic I don't understand: how are the generated fixed patches substituted into the original buggy function? I can't find where this part is implemented. After following gen_test_script, it only generates some .sh scripts in the folder. So I'm curious how the step of embedding the fixed patch into the buggy function is accomplished.