AI summary · 1 source · Today · 17:13

Researchers unveil new tools that help LM agents reason and verify more accurately

Four new research works on improving how LM agents reason were released at the same time. AgenticInterpBench is a new benchmark for evaluating whether agents can explain circuits in mechanistic interpretability; it contains 84 circuits with 163 annotations. VeryTrace is a framework that converts chain-of-thought into a compilable DSL so that errors are caught from the earliest step. PrologMCP is a server that lets an LM call a Prolog solver statefully over the Model Context Protocol (MCP). The DL-Lite work makes defeasible reasoning in description logic faster by using rational closure. All of them target the same problem: LMs often slip on deep multi-step reasoning tasks.

Original sources · 4

All original source links are included, so you can read each one in full and cross-check the information yourself.

arXiv · cs.AI · Yesterday · 04:00

Can Language Model Agents be Helpful Circuit Explainers in Mechanistic Interpretability?

arXiv · cs.AI · Yesterday · 04:00

VeryTrace: Verifying Reasoning Traces through Compilable Formalism and Structured Verification

arXiv · cs.AI · Yesterday · 04:00

Tractable Reasoning and Conjunctive Query Answering for Defeasible DL-Lite under Rational Closure