12 comments

  • mlop99 3 hours ago
    Curious whether the behaviour-driven testing could be done by another LLM agent (or a group of agents), with one LLM agent testing another. Could that lead to a self-improving loop?
  • shailendra145 3 hours ago
    A powerful move beyond benchmarks — this paper redefines LLM evaluation through realistic, behavior-driven testing.
  • raj_maddipati 1 hour ago
    Excellent work
  • papz2k 3 hours ago
    Very interesting work.
  • harshv_03 1 hour ago
    Interesting
  • ankush9812 3 hours ago
    Nice Work
  • ashyash518 3 hours ago
    Nice work
  • saurabh_xen 3 hours ago
    Great work
  • quanta9 3 hours ago
    interesting