## Sources

1. [Databricks CTO Wins ACM Prize, Says AGI Is Already Here](https://awesomeagents.ai/news/zaharia-databricks-agi-claim-acm/)
2. [Best AI Models for Agentic Tool Use - April 2026](https://awesomeagents.ai/capabilities/agentic-tool-use/)
3. [Meta Muse Spark Launches, Ranks 4th Among Frontier Models](https://awesomeagents.ai/news/meta-muse-spark-frontier-debut/)
4. [MedGemma 1.5, Smarter MCTS, and Auditing AI Agents](https://awesomeagents.ai/science/medgemma-mcts-auditable-agents/)
5. [Best AI Chatbot Builders 2026: 6 Platforms Tested](https://awesomeagents.ai/tools/best-ai-chatbot-builders-2026/)
6. [Eclipse Raises $1.3B to Back and Build Physical AI](https://awesomeagents.ai/news/eclipse-ventures-physical-ai-1-3b-fund/)
7. [Microsoft MAI Models: Voice, Speech and Image Reviewed](https://awesomeagents.ai/reviews/review-microsoft-mai-models/)
8. [GLM-5.1 Tops SWE-Bench Pro With Zero NVIDIA Hardware](https://awesomeagents.ai/news/glm-5-1-swe-bench-pro-huawei-chips/)
9. [Utah Clears AI to Renew Psychiatric Meds Autonomously](https://awesomeagents.ai/news/utah-ai-psychiatric-meds-legion-health/)
10. [Claude Mythos Preview Finds Thousands of Zero-Days](https://awesomeagents.ai/news/claude-mythos-preview-zero-day-cybersecurity/)

---

### Best AI Chatbot Builders 2026: 6 Platforms Tested
**Author: James Kowalski**

*   **Market Evolution:** The chatbot builder market has shifted drastically from old decision-tree models to LLM-powered knowledge retrieval flows, creating a diverse set of use cases and vendor offerings [1]. 
*   **Top Platforms Evaluated:**
    *   **Botpress:** Best for technical teams and complex multi-channel agents. It features an LLM-agnostic architecture allowing routing to OpenAI, Anthropic, Mistral, or self-hosted models [2, 3]. It supports over 190 integrations and a visual flow editor with code hooks [3]. However, the free tier limits users to only 500 messages per month [4].
    *   **Voiceflow:** Best for voice and chat agents. It natively handles both digital chat and voice telephony (e.g., via Twilio) [2, 5]. It uses a Figma-like canvas interface and supports multiple LLM backends [5]. A notable downside is that extra editor seats cost $50/month each, which scales poorly for large teams [6].
    *   **Tidio:** Ideal for e-commerce support, offering a combined chatbot and live chat platform with a native Shopify integration [2, 6, 7]. Its Lyro AI can handle order statuses and product availability natively, but users are locked into Tidio's backend without custom LLM routing [7, 8].
    *   **Chatbase:** Best for quick knowledge-base bots, allowing deployment in under an hour by feeding it existing documentation [2, 8]. It supports over 15 AI models but lacks complex multi-turn conversation logic or external API triggering [9, 10].
    *   **ManyChat:** Specialized for social media automation (Instagram, WhatsApp, Facebook Messenger, TikTok) and is not designed for website chat [2, 10, 11]. It uses a contact-based pricing model that automatically scales as audience size grows [12].
*   **Pricing Considerations:** The true cost of these platforms often lies in add-ons like extra editor seats, AI conversation limits, or branding removal, rather than the base subscription price [13].
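A quick worked example makes the add-on effect concrete. Only the $50/seat figure comes from the review above; the base price, branding-removal fee, and AI overage below are placeholder assumptions for illustration.

```python
# Hypothetical illustration of how add-ons, not the base subscription,
# dominate total cost. Only the $50/seat figure is from the review;
# every other number here is an assumed placeholder.

def monthly_cost(base, extra_seats=0, seat_price=50.0,
                 branding_removal=0.0, ai_overage=0.0):
    """Total monthly spend: base subscription plus common add-ons."""
    return base + extra_seats * seat_price + branding_removal + ai_overage

base_only = monthly_cost(base=60.0)
with_addons = monthly_cost(base=60.0, extra_seats=4,
                           branding_removal=39.0, ai_overage=80.0)
print(base_only)    # 60.0
print(with_addons)  # 60 + 4*50 + 39 + 80 = 379.0
```

Under these assumed numbers, add-ons multiply the bill more than fivefold, which is the pattern the review warns about.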

### Best AI Models for Agentic Tool Use - April 2026
**Author: James Kowalski**

*   **Current Leaders:** **Claude Opus 4.6** holds the top combined position for agentic tasks, scoring 80.8% on SWE-bench Verified (software engineering) and 72.7% on OSWorld (autonomous computer use) [14, 15]. **Gemini 3.1 Pro** is a close competitor, scoring 80.6% on SWE-bench Verified and 75.0% on OSWorld, offering near-Opus quality at roughly half the API cost [15-17].
*   **Computer Use Specialist:** **GPT-5.4** ties with Gemini 3.1 Pro for the top spot in autonomous computer use (OSWorld) at 75.0%, crossing the human expert baseline of 72.4% [18-20]. 
*   **Function Calling vs. Agentic Workflow:** While smaller models like **GLM 4.5** and **Qwen3 32B** excel at narrow function calling (BFCL V3), they trail significantly on complex multi-step agentic workflows [21, 22]. Open-weight models are currently 20+ points behind the leaders on SWE-bench Verified [16, 23].
*   **Scaffolding Importance:** The choice of agent harness or scaffold has a massive impact on performance, affecting agentic scores by up to 22%, whereas swapping the underlying model only shifts scores by roughly 1% [24, 25]. 
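Narrow function calling of the kind BFCL measures boils down to mapping a model-emitted tool name plus JSON arguments onto a registered function. The sketch below is a generic harness-side dispatcher, not any vendor's API; the tool name and registry structure are invented for illustration.

```python
import json

# Minimal tool-dispatch loop of the kind function-calling benchmarks
# (e.g., BFCL) exercise: the model emits a tool name plus JSON args,
# and the harness routes the call. All tool names here are made up.

TOOLS = {}

def tool(fn):
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; a real tool would hit an API

def dispatch(model_output: str) -> str:
    """Parse a model-emitted call like '{"name": ..., "args": ...}'."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"error: unknown tool {call['name']!r}"
    return fn(**call["args"])

print(dispatch('{"name": "get_weather", "args": {"city": "Oslo"}}'))
# Sunny in Oslo
```

Complex agentic workflows layer planning, retries, and state on top of this loop, which is where the harness-choice effect cited above comes from.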

### Claude Mythos Preview Finds Thousands of Zero-Days
**Author: Elena Marchetti**

*   **Vulnerability Discovery:** Anthropic's restricted **Claude Mythos Preview** model autonomously discovered thousands of high-severity and critical zero-day vulnerabilities across major operating systems and web browsers [26, 27].
*   **Notable Exploits:** The model found a 27-year-old OpenBSD kernel bug and a 16-year-old FFmpeg flaw that had previously survived 5 million automated fuzzer runs [27-29]. It successfully chained multiple vulnerabilities into functioning exploits, including a six-packet ROP chain in FreeBSD produced for under $2,000 in API costs [30, 31].
*   **Emergent Capabilities:** Anthropic explicitly stated that the model was not trained for these security capabilities; rather, they emerged naturally as a consequence of general improvements in coding and reasoning [32, 33]. 
*   **Access Restrictions:** Due to safety risks, the model is not publicly available. It is limited to Project Glasswing partners and critical infrastructure organizations, priced at a premium of $25/$125 per million input/output tokens [27, 34, 35].
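At the quoted $25/$125 per million input/output tokens, a back-of-envelope budget is easy to compute. The token split below is illustrative, not the article's actual breakdown of the FreeBSD exploit run.

```python
# Back-of-envelope token budget at the article's quoted pricing:
# $25 per million input tokens, $125 per million output tokens.

IN_PER_M, OUT_PER_M = 25.0, 125.0

def api_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * IN_PER_M + output_tokens / 1e6 * OUT_PER_M

# An illustrative long agentic run: 40M input tokens, 8M output tokens.
cost = api_cost(40_000_000, 8_000_000)
print(cost)  # 40*25 + 8*125 = 2000.0
```

Even heavily input-skewed runs in the tens of millions of tokens land in the low thousands of dollars, consistent with the sub-$2,000 exploit figure cited above.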

### Databricks CTO Wins ACM Prize, Says AGI Is Already Here
**Author: Daniel Okafor**

*   **ACM Prize:** Matei Zaharia, co-founder and CTO of Databricks, won the 2026 ACM Prize in Computing for his foundational work on Apache Spark, Delta Lake, and MLflow, which underpin modern enterprise AI infrastructure [36-38].
*   **AGI Claims:** Zaharia controversially claims that "AGI is here already," arguing that current evaluations fail to recognize it because they incorrectly apply human standards (like the bar exam) to non-human systems [37, 39].
*   **Commercial Incentives:** The article notes that Zaharia's claims must be viewed through his commercial incentives. Databricks' AI workloads generated $1.4 billion in annualized revenue (26% of its business), and declaring AGI "arrived" drives enterprise infrastructure spending [40, 41].
*   **Industry Divide:** The tech industry is split on AGI. Infrastructure providers (like NVIDIA and Databricks) claim it has arrived, while consumer application leaders like Mark Zuckerberg label it "marketing speak," and researchers like Andrew Ng warn of AI bubbles [37, 42, 43].

### Eclipse Raises $1.3B to Back and Build Physical AI
**Author: Daniel Okafor**

*   **Fundraising and Focus:** Eclipse Ventures closed $1.3 billion across two funds (a $720M early-stage fund and a $591M growth-stage fund) specifically to invest in **"physical AI"**—applying machine learning to robotics, defense, and manufacturing [44, 45].
*   **Defense Dominance:** The firm's portfolio heavily targets U.S. DoD programs and defense contracts, backing companies like True Anomaly (autonomous spacecraft), Blue Water Autonomy (unmanned Navy ships), and VulcanForms (defense fabrication) [45, 46].
*   **Hands-On Model:** Unlike traditional generalist VC firms, Eclipse often builds companies from scratch by identifying gaps, assembling founding teams, and providing the massive capital required to cross from hardware prototypes to commercial production [47, 48].
*   **Market Drivers:** The pivot to physical AI is driven by falling hardware costs, advanced real-time ML decision-making, and persistent labor shortages in manufacturing and construction [49].

### GLM-5.1 Tops SWE-Bench Pro With Zero NVIDIA Hardware
**Author: Sophie Zhang**

*   **Hardware Milestone:** Z.ai released GLM-5.1, a 744B MoE model trained entirely on 100,000 Huawei Ascend 910B chips using the MindSpore framework, bypassing U.S. Entity List restrictions with **zero NVIDIA or U.S. silicon** [50-52].
*   **Coding Leadership:** The model claimed the top spot on the rigorous **SWE-bench Pro** benchmark with a score of 58.4, slightly edging out GPT-5.4 and Claude Opus 4.6 [50, 53].
*   **Agentic Training Loop:** GLM-5.1 features a post-training pipeline built on "asynchronous reinforcement learning," which lets the model engage in "break-and-repair" workflows and autonomously complete continuous coding tasks spanning up to eight hours [54].
*   **Trade-Offs:** While exceptional at software engineering, the model trails U.S. frontier models on pure math and science reasoning (e.g., GPQA-Diamond) [53, 55]. Additionally, its inference speed is notably slow at 44.3 tokens per second [56, 57].

### MedGemma 1.5, Smarter MCTS, and Auditing AI Agents
**Author: Elena Marchetti**

*   **MedGemma 1.5 (Google):** The new 4B parameter open medical AI model can natively process full **3D imaging volumes** (like CTs and MRIs) [58, 59]. It achieved massive improvements, with MRI classification accuracy jumping to 65% and pathology ROUGE-L scores rising from 0.02 to 0.49 [60, 61]. It also powers MedASR, which heavily outperforms standard models like Whisper on medical dictation [62].
*   **PRISM-MCTS:** A new reasoning framework that drastically improves Monte Carlo Tree Search efficiency [60, 63]. By using a shared memory and scoring intermediate reasoning steps, it halves the required trajectories on the difficult GPQA benchmark while outperforming existing search frameworks [60, 64].
*   **Auditable Agents:** An audit of six major open-source AI agent projects surfaced **617 security findings**, pointing to a severe lack of accountability and auditability in current agent deployments [60, 65-67]. The researchers also show that tamper-evident logging carries only a median overhead of 8.3 ms [60, 67].
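The standard construction behind tamper-evident logging is a hash chain, where each entry commits to the digest of the one before it. The audited projects' actual mechanisms are not detailed in the summary above, so the sketch below is a minimal generic version with invented field names.

```python
import hashlib
import json

# Hash-chained audit log: each entry commits to the previous entry's
# digest, so deleting or editing any record breaks every later hash.
# Generic sketch; not the audited projects' actual implementation.

GENESIS = "0" * 64

def append(log: list, event: dict) -> None:
    """Append an event, chaining its hash to the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    log.append({"event": event, "hash": digest})

def verify(log: list) -> bool:
    """Recompute the chain; any edit or deletion makes this False."""
    prev = GENESIS
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        if hashlib.sha256((prev + payload).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append(log, {"agent": "planner", "action": "tool_call", "tool": "search"})
append(log, {"agent": "executor", "action": "write_file"})
assert verify(log)
log[0]["event"]["tool"] = "delete"   # tamper with the first record
assert not verify(log)
```

Each append is one SHA-256 over a small payload, which is consistent with overhead in the low-millisecond range.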

### Meta Muse Spark Launches, Ranks 4th Among Frontier Models
**Author: Elena Marchetti**

*   **Ground-Up Rebuild:** Meta Superintelligence Labs released Muse Spark, a natively multimodal model built entirely from scratch over nine months [68-70]. It integrates voice, text, and image inputs directly to achieve high compute efficiency [70].
*   **Benchmark Performance:** The model ranks 4th on the Artificial Analysis Intelligence Index (score: 52), trailing behind Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6 [71]. 
*   **Strengths vs. Weaknesses:** It leads the pack on health and science benchmarks (scoring 42.8% on HealthBench Hard) and excels in visual understanding [71-73]. However, it critically underperforms in coding and abstract reasoning, scoring a 42.5 on ARC-AGI-2 compared to Gemini's 76.5 [71, 74]. 
*   **Launch Strategy:** Uncharacteristically for Meta, Muse Spark launched as a closed-source, proprietary system deployed directly into Meta's consumer apps (Facebook, Instagram, Threads), limiting independent safety and architecture audits [69, 71, 75].

### Microsoft MAI Models: Voice, Speech and Image Reviewed
**Author: Elena Marchetti**

*   **Strategic Independence:** Microsoft launched three in-house AI models running natively on their own MAIA 200 inference chips, indicating an effort to reduce their exclusive reliance on OpenAI infrastructure while securing lower costs for enterprise users [76-78].
*   **MAI-Transcribe-1:** The standout model of the trio. It boasts best-in-class accuracy, averaging a 3.8% Word Error Rate across 25 languages on the FLEURS benchmark, beating Whisper-large-v3 across the board while operating at half the GPU cost [79-81].
*   **MAI-Voice-1:** Extremely fast text-to-speech generation, capable of producing 60 seconds of audio in under one second, making it ideal for real-time voice agents [82]. Voice cloning features are present but require strict Microsoft approval [83].
*   **MAI-Image-2:** Though it outputs high-quality photorealistic images (ranking #3 on Arena.ai), its utility is crippled by severe restrictions, including a 15-image daily cap, rigid square-only aspect ratios, and overzealous content filtering [79, 84-86].

### Utah Clears AI to Renew Psychiatric Meds Autonomously
**Author: Elena Marchetti**

*   **Regulatory First:** Under a state-run regulatory sandbox pilot, Utah became the first government to allow an AI system (operated by Legion Health) to autonomously renew psychiatric medications without real-time human physician approval [87-89].
*   **Strict Limitations:** The AI is strictly limited to stable patients requesting renewals for 15 lower-risk, non-controlled psychiatric drugs (e.g., fluoxetine, sertraline) [90, 91]. It cannot handle new prescriptions, dose adjustments, or high-risk drugs like antipsychotics or lithium [90-92].
*   **Oversight Phasing:** The pilot follows a three-stage safety architecture: it begins with 250 prior-reviewed renewals (requiring a 98% agreement rate), moves to 1,000 retrospective audits (99% agreement), and then shifts to randomized monthly audits [90, 93, 94].
*   **Broader Implications:** This follows a parallel Utah program by Doctronic that already automates renewals for roughly 80% of chronic-condition medications [90, 95]. While innovative, the pilot faces concerns over undefined legal liability, opaque AI reasoning processes, and whether small supervised sample sizes can safely predict larger deployment success [96, 97].
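The staged-oversight gates described above reduce to a simple threshold check: a stage may only advance once its full sample is reviewed and the AI-physician agreement rate clears the bar. The sketch below encodes the stated Stage 1 and Stage 2 parameters; the function and field names are mine, not the pilot's.

```python
# Sketch of the pilot's staged-oversight gating as described: Stage 1
# requires 98% agreement over 250 prior-reviewed renewals, Stage 2
# requires 99% over 1,000 retrospective audits. Names are invented.

STAGES = [
    {"name": "prior_review", "sample": 250, "min_agreement": 0.98},
    {"name": "retrospective_audit", "sample": 1000, "min_agreement": 0.99},
]

def may_advance(stage: dict, agreed: int, reviewed: int) -> bool:
    """True once the full sample is reviewed and agreement clears the bar."""
    if reviewed < stage["sample"]:
        return False
    return agreed / reviewed >= stage["min_agreement"]

print(may_advance(STAGES[0], agreed=246, reviewed=250))   # 0.984 >= 0.98 -> True
print(may_advance(STAGES[1], agreed=989, reviewed=1000))  # 0.989 <  0.99 -> False
```

Note how much the 99% bar tightens things: at Stage 2, as few as eleven disagreements out of 1,000 audits would block progression to randomized monthly audits.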