<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://hiimmuc.github.io/Personal-AI-Digest/feed.xml" rel="self" type="application/atom+xml" /><link href="https://hiimmuc.github.io/Personal-AI-Digest/" rel="alternate" type="text/html" /><updated>2026-06-23T14:31:06+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/feed.xml</id><title type="html">AI Digest</title><subtitle>Personal daily AI &amp; tech research digest</subtitle><author><name>hiimmuc</name></author><entry><title type="html">Daily Digest 2026-06-23</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-23/" rel="alternate" type="text/html" title="Daily Digest 2026-06-23" /><published>2026-06-23T00:00:00+07:00</published><updated>2026-06-23T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-23/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's digest highlights a significant shift toward agentic workflows, focusing on autonomous research, industrial-scale diagnostics, and the optimization of multi-agent systems. There is a clear emphasis on bridging the gap between high-level reasoning and physical embodiment through spatial memory and 3D perception.</span>
</div>

<h2 id="global-trends">Global Trends</h2>

<h3 id="personal-interests">Personal Interests</h3>

<p class="section-desc">Papers discovered through your interest topics.</p>

<h4 id="embodied-ai">Embodied AI</h4>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23675">IMAGIN-4D: Image-Guided Controllable Interaction Generation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Sai Kumar Dwivedi, Federica Bogo, Buğra Tekin, Chenhongyi Yang, Nadine Bertsch, Tomas Hodan, Michael J. Black, Dimitrios Tzionas, Shreyas Hampali
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23675" target="_blank" rel="noopener noreferrer">2606.23675</a></p>
<p class="paper-detail"><strong>Authors:</strong> Sai Kumar Dwivedi, Federica Bogo, Buğra Tekin, Chenhongyi Yang, Nadine Bertsch, Tomas Hodan, Michael J. Black, Dimitrios Tzionas, Shreyas Hampali</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Generating human-object interactions (HOI) is central to character animation, robotics, AR/VR, and embodied AI. Recent HOI generation methods synthesize motion from text, object geometry, and sparse waypoints, controlling action semantics and object trajectories. However, these signals underspecify interaction: the same prompt and trajectory can produce different grasps, approach directions, body poses, object poses, contacts, and body-object layouts. We address this ambiguity with a reference image as a visual specification of the desired interaction snapshot. However, a single global image representation conflates distinct cues and conditions all frames on identical visual evidence. We therefore introduce IMAGIN-4D, a diffusion-based HOI generator that decomposes image conditioning spatio-temporally. For spatial conditioning, IMAGIN-4D extracts supervised interaction-state tokens for body pose, object pose, body-object contact, and spatial relationships at the depicted frame. For temporal conditioning, it computes frame-aware tokens by querying image patches per generated frame, allowing sequence segments to attend to different visual cues from the same image. To balance image, text, and waypoint cues, IMAGIN-4D uses role-aware conditioning: text, waypoints, and interaction-state tokens use separate AdaLN streams, while frame-aware visual tokens cross-attend with motion tokens. Since HOI motion datasets lack paired images, we build a synthetic motion-to-image rendering pipeline from FullBodyManipulation (FBM) and introduce an image-adherence metric to evaluate whether generated motions match the reference snapshot. Experiments on FBM and BEHAVE show that IMAGIN-4D improves fine-grained interaction control over single-token and uniformly image-conditioned baselines while preserving waypoint-following and motion quality. Code and models will be released at https://imagin4d.github.io.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces IMAGIN-4D, a diffusion-based framework that enables fine-grained control over human-object interactions (HOI) by decomposing a reference image into spatio-temporal conditioning tokens.</p>
<p><strong>Core Idea:</strong> To resolve the ambiguity of underspecified text and waypoint prompts, the model uses a reference image to specify precise interaction snapshots, avoiding the conflation of cues by decomposing the image into spatial states and temporal frame-aware tokens.</p>
<p><strong>Technique:</strong> The method employs a diffusion model with role-aware conditioning, utilizing separate AdaLN streams for text, waypoints, and interaction-state tokens, while using cross-attention for frame-aware visual tokens.</p>
<p><strong>Pipeline:</strong> Text prompt + Object geometry + Waypoints + Reference image → Spatio-temporal token extraction &amp; Role-aware conditioning → Diffusion-based motion generation</p>
<p><strong>Methodology:</strong> The authors developed a synthetic motion-to-image rendering pipeline to create a training dataset and introduced a new image-adherence metric to evaluate how well the generated motion matches the reference snapshot.</p>
<p><strong>Results:</strong> IMAGIN-4D outperforms single-token and uniformly image-conditioned baselines in fine-grained interaction control while maintaining high motion quality and waypoint adherence on the FBM and BEHAVE datasets.</p>
<p><strong>Limitations:</strong> The model relies on a synthetic motion-to-image rendering pipeline for training due to the lack of paired images in existing HOI datasets.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23675" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ro" title="Robotics (cs.RO)">Robotics (cs.RO)</span><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23565">HoloAgent-0: A Unified Embodied Agent Framework with 3D Spatial Memory</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Xiaolin Zhou, Liu Liu, Tingyang Xiao, Wei Feng, Fa Fu, Xinrui Meng, Xinjie Wang, Jialiang Han, Boyang Yu, Yun Du, Wei Sui, Zhizhong Su
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23565" target="_blank" rel="noopener noreferrer">2606.23565</a></p>
<p class="paper-detail"><strong>Authors:</strong> Xiaolin Zhou, Liu Liu, Tingyang Xiao, Wei Feng, Fa Fu, Xinrui Meng, Xinjie Wang, Jialiang Han, Boyang Yu, Yun Du, Wei Sui, Zhizhong Su</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">LLM agents follow a practical execution loop in digital environments: they reason over structured states, invoke tools, inspect feedback, and revise actions. Extending this loop to physical robots is difficult because physical execution is continuous, embodiment-dependent, uncertain, and constrained by safety. Existing embodied-AI systems have advanced manipulation, spatial understanding, navigation, and humanoid control, but these capabilities often remain specialized modules or loosely coupled decision loops. In this work, we introduce HoloAgent-0, a unified embodied agent framework for real-world robot deployment. Embodied AgentOS converts language instructions into executable skill graphs, schedules robot resources, monitors execution, and triggers clarification or re-planning from runtime feedback. HoloAgent-0 organizes heterogeneous robot models and controllers through three coupled layers: Embodied AgentOS for closed-loop execution, 3D spatial memory for physical world grounding, and embodied skills for robot action. We deploy HoloAgent-0 on real hardware and evaluate its spatial memory, long-horizon navigation, and closed-loop execution across motion generation, object search, cross-robot coordination, and mobile manipulation.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces HoloAgent-0, a unified framework that integrates 3D spatial memory with a closed-loop operating system to enable complex, long-horizon robot tasks.</p>
<p><strong>Core Idea:</strong> To bridge the gap between digital LLM reasoning and physical robot execution, the framework unifies heterogeneous robot controllers through a structured OS that manages skill graphs and spatial grounding.</p>
<p><strong>Technique:</strong> The framework employs a three-layer architecture consisting of Embodied AgentOS for execution logic, 3D spatial memory for environmental grounding, and a library of embodied skills for physical actions.</p>
<p><strong>Pipeline:</strong> Natural language instructions → Embodied AgentOS (Skill Graph conversion &amp; Resource Scheduling) → 3D Spatial Memory Grounding → Embodied Skills Execution → Runtime Feedback → Re-planning/Clarification</p>
<p><strong>Methodology:</strong> The authors developed a modular system that converts high-level goals into executable graphs, deployed it on real hardware, and evaluated it across navigation, object search, and mobile manipulation tasks.</p>
<p><strong>Results:</strong> The framework successfully demonstrated capabilities in long-horizon navigation, cross-robot coordination, and closed-loop mobile manipulation on real-world hardware.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the scalability of the 3D spatial memory in extremely large-scale dynamic environments or the latency overhead of the closed-loop re-planning cycle.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23565" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span><span class="cat-tag cat-ro" title="Robotics (cs.RO)">Robotics (cs.RO)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23293">Flow6D: Discrete-to-Continuous Flow Matching for Efficient and Accurate Category-Level 6D Pose Estimation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Mingyu Mei, Li Zhang, Zibo Dai, Han Sun, Xinyue Zhao, Huiliang Shen, Zaixing He
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23293" target="_blank" rel="noopener noreferrer">2606.23293</a></p>
<p class="paper-detail"><strong>Authors:</strong> Mingyu Mei, Li Zhang, Zibo Dai, Han Sun, Xinyue Zhao, Huiliang Shen, Zaixing He</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">6D pose estimation is a key task in computer vision and embodied AI, widely used in robotic manipulation, augmented reality, etc. Existing methods directly regress in a high-dimensional continuous space, facing two key challenges in category-level pose estimation: limited accuracy due to noise and local optima, and inefficient search over an infinite space that hinders real-time performance. This paper proposes Flow6D, a hierarchical flow matching framework with a two-stage discrete latent space localization-continuous pose regression strategy. Rotation and translation parameters are first discretized into bins, with a discrete flow matching model locking the latent space around the true pose to reduce search complexity. Then, by sampling in the latent space, a continuous flow matching model predicts local pose residuals to optimize the estimate and regress to an accurate pose. The framework also naturally extends to articulated objects, outperforming state-of-the-art methods on synthetic and real datasets with real-time inference at 70 FPS. Project website: https://flow6d.github.io/.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Flow6D, a hierarchical flow matching framework that addresses the accuracy and efficiency challenges of category-level 6D pose estimation by combining discrete latent space localization with continuous pose regression.</p>
<p><strong>Core Idea:</strong> The core idea is to decompose the 6D pose estimation task into a two-stage process: first narrowing down the search space using discrete flow matching and then refining the pose using continuous flow matching.</p>
<p><strong>Technique:</strong> The method utilizes a discrete-to-continuous flow matching strategy where rotation and translation are discretized into bins to lock the latent space before regressing local residuals.</p>
<p><strong>Pipeline:</strong> Input Image → Discrete Flow Matching (Latent Space Localization) → Continuous Flow Matching (Local Residual Regression) → Final 6D Pose</p>
<p><strong>Methodology:</strong> The framework discretizes pose parameters into bins to reduce search complexity and employs a hierarchical flow matching model to predict accurate pose residuals from the localized latent space.</p>
<p><strong>Results:</strong> Flow6D outperforms state-of-the-art methods on synthetic and real datasets, achieves real-time inference at 70 FPS, and naturally extends to articulated objects.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the sensitivity of the binning strategy to different object scales or the computational overhead of the hierarchical flow matching compared to single-stage regressors.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23293" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23256">P-JEPA: Procedural Video Representation Learning via Joint Embedding Predictive Architecture</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Felix Tristram, Stefano Gasperini, Benjamin Killeen, Marcel Walch, Christian Benz, Nassir Navab, Ghazal Ghazaei
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23256" target="_blank" rel="noopener noreferrer">2606.23256</a></p>
<p class="paper-detail"><strong>Authors:</strong> Felix Tristram, Stefano Gasperini, Benjamin Killeen, Marcel Walch, Christian Benz, Nassir Navab, Ghazal Ghazaei</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The increasing maturity of embodied AI platforms has driven a growing interest in procedural video representation learning to support intelligent assistance systems for complex, multi-step tasks. Leveraging large-scale latent predictive training, video foundation models capture video dynamics, enabling downstream tasks such as activity understanding, spatiotemporal localization, and predictive control. However, procedural videos include actions with long-range dependencies that these models do not support, due to the quadratic complexity of self-attention. Distinct actions, for example, may be visually similar despite appearing at different points in the procedure, such as turning the stove on versus off. Here, we propose a backbone-agnostic approach that learns long-duration video representations by reducing the problem to a dense, frame-aligned action space and predicting pooled masked latent vectors. This approach allows our Procedural Joint Embedding Predictive Architecture (P-JEPA) to ingest videos over 30 minutes long, enabling effective long-form understanding of procedural steps. We evaluate P-JEPA using features extracted with VJEPA2.1, TSM, and I3D over the EgoExo4D, EgoProceL, and Assembly101 datasets, finding that it consistently improves linear separability, streaming inference, and temporal action segmentation performance, achieving state-of-the-art results on EgoExo4D fine-grained action classification while using an order of magnitude fewer parameters than LLM-based methods and running in real time.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces P-JEPA, a backbone-agnostic framework for learning long-duration procedural video representations that overcomes the quadratic complexity of self-attention in long-form videos.</p>
<p><strong>Core Idea:</strong> The authors propose reducing long-duration video understanding to a dense, frame-aligned action space to capture long-range dependencies and distinguish visually similar actions occurring at different procedural stages.</p>
<p><strong>Technique:</strong> P-JEPA utilizes a Joint Embedding Predictive Architecture that predicts pooled masked latent vectors, allowing the model to process videos exceeding 30 minutes in length.</p>
<p><strong>Pipeline:</strong> Long-form procedural video → Feature extraction (VJEPA2.1, TSM, or I3D) → Dense frame-aligned action space mapping → Pooled masked latent vector prediction → Long-form representation learning</p>
<p><strong>Methodology:</strong> The model is evaluated across EgoExo4D, EgoProceL, and Assembly101 datasets using various backbones to measure linear separability, streaming inference, and temporal action segmentation.</p>
<p><strong>Results:</strong> P-JEPA achieved state-of-the-art results on EgoExo4D fine-grained action classification, outperformed LLM-based methods with an order of magnitude fewer parameters, and achieved real-time inference.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the specific performance trade-offs when using different backbone architectures or the scalability of the dense action space for extremely high-frequency actions.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23256" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ro" title="Robotics (cs.RO)">Robotics (cs.RO)</span><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.22971">Humanoid-OmniOcc: Stereo-Based Full-View Occupancy Dataset for Embodied AI</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Xianda Guo, Bohao Zhang, Chenwei Huang, Shiyuan Chen, Ruilin Wang, Yiqun Duan, Cong Yang, Qin Zou, Wei Sui
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.22971" target="_blank" rel="noopener noreferrer">2606.22971</a></p>
<p class="paper-detail"><strong>Authors:</strong> Xianda Guo, Bohao Zhang, Chenwei Huang, Shiyuan Chen, Ruilin Wang, Yiqun Duan, Cong Yang, Qin Zou, Wei Sui</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Occupancy prediction at voxel-level granularity is essential for safe robotic navigation and interaction in complex environments. Existing occupancy datasets, however, are predominantly designed for autonomous driving with vehicle-centric biases -- forward-facing cameras, far-field geometry, and static road priors -- limiting their applicability to embodied humanoid perception. We present Humanoid-OmniOcc, a large-scale panoramic stereo-based occupancy dataset tailored for humanoid robots. The dataset encompasses 15 diverse simulated indoor scenes and 5 real-world environments, yielding over 155K samples with broad scene and style diversity. Importantly, the dataset is designed around a Real2Sim2Real closed-loop paradigm: real sensor specifications drive physically accurate simulation, simulation produces large-scale annotated training data, and models trained in simulation are directly evaluated on real-world captures -- enabling iterative refinement of the sim-to-real pipeline. We further propose \textbf{H}umanoid \textbf{S}urround \textbf{S}tereo-guided \textbf{Occ}upancy model (Humanoid-OmniOcc) that exploits robust depth priors for accurate 2D-to-3D lifting. Extensive experiments show that Humanoid-OmniOcc consistently outperforms monocular baselines and generalizes well to both unseen simulated test scenes and real-world environments, validating the effectiveness of the Real2Sim2Real design. Code and data will be available upon acceptance at https://d-robotics-ai-lab.github.io/humanoid-omniocc.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Humanoid-OmniOcc, a large-scale panoramic stereo-based occupancy dataset and a corresponding model specifically designed for humanoid robots to overcome vehicle-centric biases in existing datasets.</p>
<p><strong>Core Idea:</strong> The research leverages a Real2Sim2Real closed-loop paradigm to bridge the gap between simulation and reality, providing high-granularity voxel-level occupancy for embodied AI.</p>
<p><strong>Technique:</strong> The authors propose the Humanoid Surround Stereo-guided Occupancy (Humanoid-OmniOcc) model, which utilizes robust depth priors from stereo vision for accurate 2D-to-3D lifting.</p>
<p><strong>Pipeline:</strong> Real sensor specifications → Physically accurate simulation → Large-scale annotated training data → Model training → Real-world evaluation</p>
<p><strong>Methodology:</strong> The methodology involves creating a diverse dataset of 15 simulated and 5 real-world environments, followed by training a stereo-guided model to predict 3D occupancy from panoramic views.</p>
<p><strong>Results:</strong> The Humanoid-OmniOcc model consistently outperforms monocular baselines and demonstrates strong generalization across unseen simulated scenes and real-world environments.</p>
<p><strong>Limitations:</strong> The study focuses on simulated and real-world indoor/diverse environments, but the scalability of the Real2Sim2Real pipeline to highly dynamic or extremely large-scale outdoor humanoid tasks remains an open area.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.22971" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="multi-agent-systems">Multi-Agent Systems</h4>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23664">MAS-PromptBench: When Does Prompt Optimization Improve Multi-Agent LLM Systems?</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Juyang Bai, Laixi Shi
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23664" target="_blank" rel="noopener noreferrer">2606.23664</a></p>
<p class="paper-detail"><strong>Authors:</strong> Juyang Bai, Laixi Shi</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Multi-agent systems (MAS) offer a scalable path forward for agentic AI, comprising multiple LLM-based agents, each assigned a system prompt and a position within a workflow that governs inter-agent coordination and output aggregation. System prompts thus form a critical and accessible optimization surface: they specify agents' roles and behaviors, enabling system-level improvements without model finetuning. Although prompt optimization has shown substantial potential for single LLMs, extending it to MAS poses distinct challenges, notably an exponentially growing search space. It remains unclear whether, when, and by how much prompt optimization improves MAS performance, and how sensitive such gains are to system configuration. In this work, we systematically study system-prompt optimization across a broad range of MAS setups varying in task, workflow, communication protocol, and team size, benchmarking two prompt optimizers that naturally extend state-of-the-art single-agent methods. The results reveal its potential to unlock significant gains while exposing open challenges, characterizing when and how much prompt optimization helps across diverse MAS settings.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper systematically investigates the impact of system-prompt optimization on Multi-Agent Systems (MAS), identifying when and how much optimization improves performance across diverse configurations.</p>
<p><strong>Core Idea:</strong> While prompt optimization is effective for single LLMs, MAS introduces an exponentially growing search space, making it unclear if optimizing individual agent prompts translates to significant system-level gains.</p>
<p><strong>Technique:</strong> The authors benchmark two prompt optimizers extended from state-of-the-art single-agent methods to evaluate their effectiveness across various MAS workflows.</p>
<p><strong>Pipeline:</strong> MAS configurations (task, workflow, protocol, team size) → Prompt optimization of individual agent system prompts → Performance evaluation of the aggregated MAS output.</p>
<p><strong>Methodology:</strong> The study employs a systematic benchmarking approach across a broad range of MAS setups, varying communication protocols and team sizes to measure sensitivity to system configuration.</p>
<p><strong>Results:</strong> The results reveal that prompt optimization can unlock significant gains in some MAS settings but also expose challenges regarding the sensitivity of these gains to specific system configurations.</p>
<p><strong>Limitations:</strong> The research highlights the challenge of the exponentially growing search space in MAS and leaves open questions regarding the optimal balance of optimization across complex workflows.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23664" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-eess" title="eess.SY">eess.SY</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23011">Robust Data-Driven Nash Equilibrium Seeking under Partial-Decision Information</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Linqi Wang, Yifei Li, Wenjie Liu, Yuzhou Wei, Gang Wang, Lihua Xie
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23011" target="_blank" rel="noopener noreferrer">2606.23011</a></p>
<p class="paper-detail"><strong>Authors:</strong> Linqi Wang, Yifei Li, Wenjie Liu, Yuzhou Wei, Gang Wang, Lihua Xie</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">This paper presents a data-driven framework for decentralized Nash equilibrium (NE) seeking in multi-agent systems with unknown linear dynamics subject to exogenous disturbances, operating under partial-decision information (where agents lack direct access to the decisions of all others) and equality constraints. The proposed framework integrates an NE model, a distributed communication protocol, an internal model for disturbance rejection, and a data-driven stabilization strategy. By reformulating the problem as a cooperative output regulation problem, we synthesize controllers directly from noisy input-state data via semi-definite programs (SDPs), providing formal guarantees for closed-loop stability and asymptotic convergence to the NE. The approach is further extended to a class of nonlinear systems with constant disturbances by leveraging integral control and describing nonlinearities via quadratic constraints. Numerical simulations involving unmanned aerial vehicle networks and a rotary-wing aerial vehicle formation validate the efficacy and robustness of the proposed method.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper proposes a robust data-driven framework for decentralized Nash equilibrium (NE) seeking in multi-agent systems with unknown linear dynamics, exogenous disturbances, and partial-decision information.</p>
<p><strong>Core Idea:</strong> The problem is reformulated as a cooperative output regulation problem, allowing for the synthesis of controllers directly from noisy input-state data while ensuring stability and convergence.</p>
<p><strong>Technique:</strong> The framework utilizes semi-definite programs (SDPs) to synthesize controllers and employs an internal model for disturbance rejection and quadratic constraints for nonlinear extensions.</p>
<p><strong>Pipeline:</strong> Noisy input-state data → Cooperative output regulation reformulation → Semi-definite programming (SDP) controller synthesis → Decentralized Nash equilibrium seeking</p>
<p><strong>Methodology:</strong> The authors integrate an NE model with a distributed communication protocol and a data-driven stabilization strategy, extending the linear approach to nonlinear systems via integral control.</p>
<p><strong>Results:</strong> The method provides formal guarantees for closed-loop stability and asymptotic convergence, validated through numerical simulations of UAV networks and rotary-wing aerial vehicle formations.</p>
<p><strong>Limitations:</strong> The current framework is primarily focused on systems with constant disturbances and specific classes of nonlinearities described by quadratic constraints.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23011" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="astro-ph.IM">astro-ph.IM</span><span class="cat-tag cat-physics" title="physics.soc-ph">physics.soc-ph</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.22859">AI Scientists as Engines of Discovery: A Case for Development within Reformed Institutions</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Raul Jimenez, Boris Bolliet, Francisco Villaescusa-Navarro, Rabih Zbib, Benjamin Wandelt, David N. Spergel, Thomas Meier, Jessica Montgomery, Hana Aliee, Licia Verde
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.22859" target="_blank" rel="noopener noreferrer">2606.22859</a></p>
<p class="paper-detail"><strong>Authors:</strong> Raul Jimenez, Boris Bolliet, Francisco Villaescusa-Navarro, Rabih Zbib, Benjamin Wandelt, David N. Spergel, Thomas Meier, Jessica Montgomery, Hana Aliee, Licia Verde</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Agentic artificial intelligence (AI) systems are beginning to assist, accelerate, and partially automate scientific discovery, performing tasks that span literature synthesis, code generation, data analysis, hypothesis proposal, and model criticism. We argue that this transition is qualitative rather than incremental, and that suitably designed multi-agent systems may evolve from passive computational tools into ``AI scientists'' that can expand the hypothesis-generating and verification capacity of science. Such systems must be developed and deployed within a scientific ecosystem fit for purpose: institutions must be redesigned for verification, accountability, interpretability, and dual-use safety. We sketch how multi-agent architectures, illustrated by the prototype framework \textit{Denario}, accelerate the discovery cycle and traverse model spaces beyond human reach; examine what this implies for authorship, peer review, and the enduring role of human scientists; and close with recommendations for governing AI as an epistemic actor rather than a mere instrument.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper argues for a paradigm shift from viewing AI as a passive tool to an 'AI scientist' capable of autonomous hypothesis generation and verification. It proposes a framework for integrating these agentic systems into reformed scientific institutions that prioritize accountability and safety.</p>
<p><strong>Core Idea:</strong> Agentic multi-agent systems can qualitatively expand the capacity of scientific discovery by traversing model spaces beyond human reach. This transition requires a fundamental redesign of scientific infrastructure to manage AI as an epistemic actor.</p>
<p><strong>Technique:</strong> The authors utilize multi-agent architectures, specifically the 'Denario' prototype framework, to automate and accelerate the scientific discovery cycle.</p>
<p><strong>Pipeline:</strong> Scientific data and literature → Multi-agent system (hypothesis generation, code execution, data analysis, model criticism) → Accelerated discovery and expanded model spaces</p>
<p><strong>Methodology:</strong> The paper combines conceptual framework development with a technical sketch of the Denario prototype to illustrate how multi-agent systems perform complex scientific tasks.</p>
<p><strong>Results:</strong> The framework demonstrates the ability to accelerate the discovery cycle and explore complex model spaces; it also identifies critical implications for authorship, peer review, and human-AI collaboration.</p>
<p><strong>Limitations:</strong> The paper leaves open questions regarding the specific governance of AI as an epistemic actor and the practicalities of ensuring dual-use safety in autonomous discovery.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.22859" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-21" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">21 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.22610">PaperClaw: Harnessing Agents for Autonomous Research and Human-in-the-Loop Refinement</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Weiwei Ye, Hangchen Liu, Dongyuan Li, Renhe Jiang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.22610" target="_blank" rel="noopener noreferrer">2606.22610</a></p>
<p class="paper-detail"><strong>Authors:</strong> Weiwei Ye, Hangchen Liu, Dongyuan Li, Renhe Jiang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language models have become capable reasoners and tool users that write and run code and search the literature, which makes automating the research process itself a realistic goal. We present PAPERCLAW, a harnessed multi-agent system that carries a project autonomously, from a field of study to a finished paper. PAPERCLAW curates a domain from a field's live literature, datasets, and code; brainstorms it into an idea with a pre-registered main-result contract; and drives a stoppable hypothesis map through an iterative propose, test, reflect loop that grows only from measured verdicts and halts once the evidence supports the idea, at which point it writes a venue-compliant paper. A full-lifecycle memory keeps each stage in a single living record, so a long run can be paused, inspected, and resumed without losing context. At the centre is an in-cycle research assistant with research tools and skills: it can drive the whole pipeline on its own, while the same interface lets a person step in at any stage, turning a first autonomous draft into a stronger paper through human-in-the-loop refinement. Throughout, PAPERCLAW keeps its output grounded and checkable, citing only references validated against open scholarly indexes and reporting results that genuinely ran. An evaluation with an LLM judge finds that PAPERCLAW produces strong papers both fully autonomously and with human-in-the-loop refinement.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces PAPERCLAW, a multi-agent system capable of autonomously conducting research from initial domain curation to the production of a venue-compliant paper. It features a full-lifecycle memory system and a human-in-the-loop interface for collaborative refinement.</p>
<p><strong>Core Idea:</strong> Automating the research lifecycle by using LLM agents to brainstorm, test hypotheses through iterative loops, and write papers while maintaining groundedness through validated citations and executed code.</p>
<p><strong>Technique:</strong> A harnessed multi-agent architecture utilizing a 'propose, test, reflect' loop, a pre-registered main-result contract, and a persistent memory record to maintain context across long-running tasks.</p>
<p><strong>Pipeline:</strong> Field of study → Literature/Dataset/Code curation → Idea brainstorming &amp; Contract registration → Iterative hypothesis testing → Paper writing → Final venue-compliant manuscript</p>
<p><strong>Methodology:</strong> The system uses an in-cycle research assistant to drive a stoppable hypothesis map, where each step is validated against open scholarly indexes and actual code execution results.</p>
<p><strong>Results:</strong> Evaluation using an LLM judge demonstrated that PAPERCLAW produces high-quality papers in both fully autonomous modes and through human-in-the-loop refinement.</p>
<p><strong>Limitations:</strong> The study relies on an LLM judge for evaluation and the system's success is dependent on the availability of open scholarly indexes and executable code environments.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.22610" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-20" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-se" title="Software Engineering (cs.SE)">Software Engineering (cs.SE)</span></span>
      <span class="paper-date">20 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.21963">Holmes: Multimodal Agentic Diagnosis for Mixed-Language Mobile Crashes at Industrial Scale</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Jia Li, Wenyuan Ma, Ting Peng, Haibin Zheng, Yuetang Deng
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.21963" target="_blank" rel="noopener noreferrer">2606.21963</a></p>
<p class="paper-detail"><strong>Authors:</strong> Jia Li, Wenyuan Ma, Ting Peng, Haibin Zheng, Yuetang Deng</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Diagnosing mobile crashes in ultra-large-scale industrial applications is a formidable challenge due to the sheer volume of code, the complexity of mixed-language environments, and the inability to reproduce failures locally. Traditional static analysis struggles with scalability, while existing LLM-based agents often rely on reproducible environments unavailable in post-mortem scenarios. We present Holmes, a multi-agent system that automates root cause analysis by synthesizing multimodal runtime signals--stack traces, logs, and thread states--to reconstruct failure contexts without reproduction. Holmes introduces a hierarchical Retrieve-Explore-Reason architecture that leverages low-level artifacts (e.g., registers, assembly) to bridge the semantic gap between open-source business logic and closed-source system frameworks. By dynamically compressing the search space using runtime clues, Holmes precisely navigates 70-million-line codebases to identify non-local defects. Evaluated on real-world crashes from WeChat, Holmes achieves 87.6% accuracy in function-level fault localization and reduces average investigation time by over 98% (to ~77 seconds), demonstrating its effectiveness in transforming labor-intensive debugging into an efficient verification workflow.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Holmes, a multi-agent system designed to automate root cause analysis for mobile crashes in ultra-large-scale, mixed-language industrial environments without requiring local reproduction.</p>
<p><strong>Core Idea:</strong> Holmes reconstructs failure contexts by synthesizing multimodal runtime signals and navigating massive codebases using a hierarchical architecture that bridges the gap between business logic and system frameworks.</p>
<p><strong>Technique:</strong> The system employs a hierarchical Retrieve-Explore-Reason architecture that utilizes low-level artifacts like registers and assembly to dynamically compress the search space.</p>
<p><strong>Pipeline:</strong> Multimodal runtime signals (stack traces, logs, thread states) → Retrieve-Explore-Reason multi-agent synthesis → Function-level fault localization</p>
<p><strong>Methodology:</strong> The authors developed a multi-agent framework that processes real-world crash data from a 70-million-line codebase, using runtime clues to navigate complex dependencies and identify non-local defects.</p>
<p><strong>Results:</strong> Achieved 87.6% accuracy in function-level fault localization and reduced average investigation time by over 98% (to approximately 77 seconds).</p>
<p><strong>Limitations:</strong> The paper focuses on post-mortem analysis where reproduction is impossible, potentially leaving open questions regarding its performance in scenarios where dynamic reproduction is feasible.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.21963" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="vision-language-models">Vision-Language Models</h4>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="cs.GR">cs.GR</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23679">Semantic Browsing: Controllable Diversity for Image Generation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Sara Dorfman, Maya Vishnevsky, Omer Dahary, Or Patashnik, Daniel Cohen-Or
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23679" target="_blank" rel="noopener noreferrer">2606.23679</a></p>
<p class="paper-detail"><strong>Authors:</strong> Sara Dorfman, Maya Vishnevsky, Omer Dahary, Or Patashnik, Daniel Cohen-Or</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Modern text-to-image models excel in visual fidelity and prompt adherence. However, this strict adherence comes at the cost of diversity: generated samples tend to collapse into a single visual interpretation. Existing methods to improve diversity produce outputs driven by incidental variations rather than meaningful design choices. This motivates a new variant of the diversity task where structure is enforced on the generated samples. We introduce a method for controlled diversity that enables Semantic Browsing, where users can navigate structured image galleries and experience creative exploration through a systematic traversal of meaningful, interpretable axes of variation. Achieving this level of semantic control requires a deep understanding of the scene. We exploit the fact that recent text-to-image models are trained on elaborated captions, effectively decoupling semantic decision-making from pixel generation. This enables a paradigm shift: instead of relying on stochastic variation within the text-to-image model, we induce diversity directly at the text level. By leveraging rich textual representations, we allow a Vision Language Model (VLM) to operate on the full scene context. To overcome the generic outputs typical of standard VLMs, we employ an agentic workflow that explicitly enforces structured variation attuned to the original prompt. We demonstrate that our method produces diverse and navigable design spaces where every variation corresponds to a specific, user-understandable semantic decision.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces 'Semantic Browsing,' a method for controllable diversity in image generation that allows users to navigate structured galleries based on meaningful, interpretable axes of variation.</p>
<p><strong>Core Idea:</strong> Instead of relying on stochastic noise for diversity, the authors induce variation directly at the text level by decoupling semantic decision-making from pixel generation.</p>
<p><strong>Technique:</strong> The authors employ an agentic workflow using a Vision Language Model (VLM) to manipulate rich textual representations and enforce structured variations attuned to the original prompt.</p>
<p><strong>Pipeline:</strong> User Prompt → VLM Agentic Workflow (Semantic Variation) → Elaborated Textual Descriptions → Text-to-Image Model → Structured Image Gallery</p>
<p><strong>Methodology:</strong> The method leverages the fact that modern models are trained on elaborated captions, using a VLM to systematically traverse scene contexts and generate diverse but semantically linked prompts.</p>
<p><strong>Results:</strong> The method produces navigable design spaces where every variation corresponds to a specific, user-understandable semantic decision, overcoming the 'mode collapse' of standard text-to-image models.</p>
<p><strong>Limitations:</strong> The approach relies on the quality of the VLM's scene understanding and the ability of the text-to-image model to faithfully translate complex, elaborated captions into pixels.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23679" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23611">Data Selection Through Iterative Self-Filtering for Vision-Language Settings</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Andrei Liviu Nicolicioiu, Sarvjeet Singh Ghotra, Morgane M. Moss, Aaron Courville
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23611" target="_blank" rel="noopener noreferrer">2606.23611</a></p>
<p class="paper-detail"><strong>Authors:</strong> Andrei Liviu Nicolicioiu, Sarvjeet Singh Ghotra, Morgane M. Moss, Aaron Courville</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The availability of large amounts of clean data is paramount to training neural networks. However, at large scales, manual oversight is impractical, resulting in sizeable datasets that can be very noisy. Attempts to mitigate this obstacle to producing performant vision-language models have so far involved heuristics, curated reference datasets, and using pre-trained models. Here we propose a novel, bootstrapped method in which a CLIP model is trained on an evolving, self-selected dataset. This evolving dataset constitutes a balance of filtered, highly probable clean samples as well as diverse samples from the entire distribution. Our proposed Self-Filtering method iterates between training the model and selecting a subsequently improved data mixture. Training on vision-language datasets filtered by the proposed approach improves downstream performance without the need for additional data or pre-trained models.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a bootstrapped self-filtering method for vision-language data selection that improves model performance without requiring manual oversight, curated reference sets, or external pre-trained models.</p>
<p><strong>Core Idea:</strong> The authors propose an iterative process where a model trains on a dataset it helps curate, balancing high-probability clean samples with diverse samples from the broader distribution.</p>
<p><strong>Technique:</strong> The method employs an iterative self-filtering loop where a CLIP model is trained on an evolving mixture of data, refining the selection criteria in each cycle.</p>
<p><strong>Pipeline:</strong> Raw noisy vision-language dataset → Iterative CLIP training and self-filtering → Balanced mixture of clean and diverse samples → Improved downstream model performance</p>
<p><strong>Methodology:</strong> The researchers developed a bootstrapping framework that alternates between model training and data selection, ensuring the training set evolves to include both high-quality filtered data and representative distribution diversity.</p>
<p><strong>Results:</strong> The proposed Self-Filtering method improves downstream performance on vision-language tasks without the need for additional data or pre-trained models.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the potential for model collapse or the specific risks of over-filtering diversity during the iterative bootstrapping process.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23611" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23494">Brain-Adapter: A Dual-Stream Vision-Language MIL Framework for Comprehensive 3D CT Diagnosis of Acute Intracranial Pathologies</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zhenyu Yi, Zhiyun Song, Yusong Sun, Zelin Liu, Manman Fei, Zhenhao Li, Jiaxuan Zhao, Xu Han, Lichi Zhang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23494" target="_blank" rel="noopener noreferrer">2606.23494</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zhenyu Yi, Zhiyun Song, Yusong Sun, Zelin Liu, Manman Fei, Zhenhao Li, Jiaxuan Zhao, Xu Han, Lichi Zhang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Automated diagnosis of 3D brain CT scans is essential for critical care, yet it remains challenging due to the heavy reliance on manual annotations and the limited semantic understanding of conventional models. While 2D foundation vision-language models (VLMs) have shown remarkable generalization, effectively transferring their representational power to 3D volumes remains an open problem. In this paper, we propose Brain-Adapter, a novel dual-stream multiple instance learning (MIL) framework that leverages pre-trained 2D biomedical VLMs and raw diagnostic reports for robust scan-level multi-label classification. Specifically, we introduce a Text-Conditioned Attention (TCA) mechanism, utilizing raw diagnostic sentences as semantic queries to dynamically align visual cues with specific disease concepts. Concurrently, a parallel visual MIL stream captures global scan characteristics, supervised by structured labels extracted via a Large Language Model (LLM). To ensure representation coherence, a consistency constraint enforces synergy between the two streams. During inference, an Uncertainty-Aware Refinement (UAR) module dynamically calibrates and fuses these dual-stream predictions to resolve ambiguous cases. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art 3D models and standard MIL approaches. By eliminating the reliance on dense annotations, Brain-Adapter provides a highly scalable and clinically viable solution for 3D acute intracranial pathology analysis.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Brain-Adapter, a dual-stream Multiple Instance Learning (MIL) framework that enables robust 3D brain CT multi-label classification by leveraging 2D vision-language models and raw diagnostic reports.</p>
<p><strong>Core Idea:</strong> The framework bridges the gap between 2D foundation models and 3D medical volumes by using a dual-stream approach that aligns visual cues with semantic text queries while maintaining global scan consistency.</p>
<p><strong>Technique:</strong> The primary techniques include a Text-Conditioned Attention (TCA) mechanism for semantic alignment, a consistency constraint for stream synergy, and an Uncertainty-Aware Refinement (UAR) module for final prediction fusion.</p>
<p><strong>Pipeline:</strong> 3D Brain CT Scans &amp; Raw Diagnostic Reports → Dual-Stream MIL (TCA-based Semantic Stream + Global Visual Stream) → Consistency Constraint &amp; Uncertainty-Aware Refinement → Multi-label Pathology Diagnosis</p>
<p><strong>Methodology:</strong> The model uses a Text-Conditioned Attention mechanism to query visual features using raw sentences, a parallel MIL stream supervised by LLM-extracted labels, and a consistency loss to ensure coherent representations.</p>
<p><strong>Results:</strong> Brain-Adapter significantly outperforms state-of-the-art 3D models and standard MIL approaches, providing a scalable solution that eliminates the need for dense manual annotations.</p>
<p><strong>Limitations:</strong> The study does not explicitly detail the computational overhead of the dual-stream architecture or the potential sensitivity of the UAR module to specific types of noise in raw diagnostic reports.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23494" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23487">CADRE: Stable, Parameter Efficient Adaptation of Medical Vision Language Models with Bounded Forgetting and Prior Drift</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Amrita Singh, Rishabh Jha
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23487" target="_blank" rel="noopener noreferrer">2606.23487</a></p>
<p class="paper-detail"><strong>Authors:</strong> Amrita Singh, Rishabh Jha</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Medical vision-language models (VLMs) such as BiomedCLIP generalize broadly, but adapting them to a clinical service is as much a safety problem as an accuracy one. Updating a deployed model for a new imaging modality can fail silently in two ways that harm patients: it can forget modalities it already handled (catastrophic forgetting), and it can drift from its trustworthy pretrained prior toward modality-specific shortcuts. We study parameter-efficient continual adaptation through these two properties rather than leaderboard accuracy, presenting CADRE: a frozen-backbone framework combining low-rank adaptation (LoRA) with an online, self-scaling, similarity-aware elastic weight consolidation term that bounds retained-competence loss, and an anchor-to-prior penalty bounding embedding drift from the frozen prior. Two short guarantees, a bound on total consolidation mass and a scale-invariance property, remove the scale-related sources of vanilla EWC's order fragility. Using breast cancer across three maximally dissimilar modalities (histopathology, ultrasound, chest radiography) as a controlled cross-modality stress test, under a multi-seed, multi-order protocol with paired significance testing and training approximately 0.23% of parameters, CADRE attains the highest accuracy, SPQ, and backward transfer and the lowest forgetting among adapting methods, reducing forgetting roughly sevenfold versus the strongest regularized baseline (0.075 to 0.011; paired p=0.023) and achieving positive backward transfer where every baseline is negative. We frame these as stability properties aligned with clinical-safety desiderata, not a deployment guarantee; robustness to distribution shift and adversarial inputs is out of scope.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces CADRE, a parameter-efficient framework for adapting medical vision-language models that simultaneously mitigates catastrophic forgetting and prior drift to ensure clinical safety.</p>
<p><strong>Core Idea:</strong> Instead of optimizing solely for leaderboard accuracy, the authors prioritize stability by bounding retained-competence loss and preventing the model from drifting toward modality-specific shortcuts.</p>
<p><strong>Technique:</strong> CADRE combines Low-Rank Adaptation (LoRA) with a self-scaling, similarity-aware Elastic Weight Consolidation (EWC) term and an anchor-to-prior penalty.</p>
<p><strong>Pipeline:</strong> Medical images and text → Frozen backbone with LoRA layers → Similarity-aware EWC and anchor-to-prior regularization → Stable, multi-modality adapted VLM.</p>
<p><strong>Methodology:</strong> The authors conducted a cross-modality stress test using breast cancer data across histopathology, ultrasound, and chest radiography, evaluating performance using multi-seed, multi-order protocols.</p>
<p><strong>Results:</strong> CADRE achieved the highest accuracy and backward transfer while reducing forgetting roughly sevenfold (0.075 to 0.011) compared to the strongest regularized baseline, achieving positive backward transfer where all baselines were negative.</p>
<p><strong>Limitations:</strong> The framework does not provide deployment guarantees and does not address robustness to distribution shifts or adversarial inputs.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23487" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23206">CFPO: Counterfactual Policy Optimization for Multimodal Reasoning</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zhangyuan Yu, Wanran Sun, Guangjing Yang, Xiaohu Wu, Qicheng Lao
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23206" target="_blank" rel="noopener noreferrer">2606.23206</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zhangyuan Yu, Wanran Sun, Guangjing Yang, Xiaohu Wu, Qicheng Lao</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in multimodal reasoning. However, prevailing reinforcement learning (RL) paradigms lack explicit counterfactual enhancement and causal learning mechanisms. This fundamental deficiency results in severe grounding failures, manifesting as a tendency to ignore visual evidence in favor of language priors or exhibiting hallucination drift during long chain-of-thought reasoning. To address this root cause, we propose CounterFactual Policy Optimization (CFPO), a novel framework that enforces causal consistency between visual perception and textual reasoning. CFPO introduces a cross-modal counterfactual enhancement mechanism, which regularizes the policy by maximizing the discrepancy between the model's predictions and those from a counterfactual state where critical visual cues are suppressed. This approach seamlessly integrates with standard algorithms like GRPO and DAPO without requiring external reward models or additional supervision. Extensive experiments demonstrate that CFPO significantly improves reasoning fidelity, achieving consistent gains of 3.17%-6.25% over standard RL baselines and 1.32%-2.13% over the state-of-the-art perception-aware method (PAPO). Code is available at https://github.com/Raven-July/CFPO.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces CFPO, a novel reinforcement learning framework that enforces causal consistency between visual perception and textual reasoning in Large Vision-Language Models (LVLMs).</p>
<p><strong>Core Idea:</strong> The core idea is to mitigate grounding failures and hallucinations by regularizing the model to distinguish between actual visual evidence and language priors through counterfactual reasoning.</p>
<p><strong>Technique:</strong> The technique involves a cross-modal counterfactual enhancement mechanism that maximizes the prediction discrepancy between the original state and a state where critical visual cues are suppressed.</p>
<p><strong>Pipeline:</strong> Multimodal Input (Image + Text) → Counterfactual State Generation (Visual Cue Suppression) → Policy Optimization (GRPO/DAPO with CFPO Regularization) → Causal-Consistent Reasoning Output</p>
<p><strong>Methodology:</strong> CFPO integrates with standard RL algorithms by penalizing the model if its reasoning remains unchanged when key visual information is removed, thereby forcing the model to rely on actual visual evidence.</p>
<p><strong>Results:</strong> CFPO achieved consistent gains of 3.17%-6.25% over standard RL baselines and 1.32%-2.13% over the state-of-the-art perception-aware method (PAPO).</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of generating counterfactual states or the specific criteria for identifying which visual cues are 'critical' for suppression.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23206" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    <a class="paper-action-btn gh-btn" href="https://github.com/Raven-July/CFPO" target="_blank" rel="noopener noreferrer" title="View code on GitHub" aria-label="View code on GitHub"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg><span>Code</span></a>
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ro" title="Robotics (cs.RO)">Robotics (cs.RO)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23157">Bridging Semantics and Kinematics: A Modular Framework for Zero-Shot Robotic Manipulation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Ali Alabbas, Dipshikha Das, Camillo Murgia, Sainul Ansary, Alaa Elkamash, Philip Long
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23157" target="_blank" rel="noopener noreferrer">2606.23157</a></p>
<p class="paper-detail"><strong>Authors:</strong> Ali Alabbas, Dipshikha Das, Camillo Murgia, Sainul Ansary, Alaa Elkamash, Philip Long</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">This paper presents a modular training-free framework for zero-shot, language-guided robotic manipulation in semi-structured environments. The architecture bridges the gap between high-level reasoning and low-level kinematics by decomposing the vision-action pipeline into three stages: visual perception, semantic interpretation, and task execution. To overcome the spatial ambiguity and semantic hallucinations inherent in standard Vision-Language Models (VLMs), the perception module employs FastSAM and Set-of-Mark (SoM) prompting to dynamically generate grounded, alphanumeric visual anchors. The same foundation model then operates purely as a Large Language Model (LLM) to act as a semantic router, translating unconstrained human directives into verifiable, reconfigurable configurations. Finally, these configurations are dynamically parsed by a Task Orchestrator into MoveIt Task Constructor (MTC) to generate collision-free trajectories. The framework is evaluated across two zero-shot experimental setups: unconstrained open-world sequential manipulation and dense relational spatial reasoning, achieving a 62% end-to-end task success rate across both scenarios, demonstrating its capacity to reliably execute complex physical actions without domain-specific training or manual coordinate programming.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a modular, training-free framework that enables zero-shot, language-guided robotic manipulation by bridging high-level semantic reasoning with low-level kinematic execution.</p>
<p><strong>Core Idea:</strong> The framework decomposes the vision-action pipeline into three distinct stages to eliminate spatial ambiguity and semantic hallucinations common in standard Vision-Language Models.</p>
<p><strong>Technique:</strong> It utilizes FastSAM and Set-of-Mark (SoM) prompting for grounded visual anchoring, an LLM as a semantic router for configuration generation, and MoveIt Task Constructor (MTC) for trajectory planning.</p>
<p><strong>Pipeline:</strong> Unconstrained human directives → Visual perception (FastSAM/SoM) &amp; Semantic routing (LLM) → Task Orchestration (MTC) → Collision-free trajectories</p>
<p><strong>Methodology:</strong> The authors evaluated the framework in two zero-shot scenarios: open-world sequential manipulation and dense relational spatial reasoning, testing the system's ability to generalize without domain-specific training.</p>
<p><strong>Results:</strong> The framework achieved a 62% end-to-end task success rate across both zero-shot experimental setups.</p>
<p><strong>Limitations:</strong> The framework operates in semi-structured environments and may face challenges in highly dynamic or unstructured settings where visual anchors are difficult to maintain.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23157" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23132">T-VSS: Test-Time Visual Subspace Steering for Adversarial Robustness of Vision-Language Models</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Jaehyuk Jang, Minseok Seo. Seungju Cho, Kangwook Ko, Changick Kim
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23132" target="_blank" rel="noopener noreferrer">2606.23132</a></p>
<p class="paper-detail"><strong>Authors:</strong> Jaehyuk Jang, Minseok Seo. Seungju Cho, Kangwook Ko, Changick Kim</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Vision-language models (VLMs) achieve strong zero-shot recognition, but they remain highly vulnerable to adversarial perturbations. Recent test-time adaptations improve robustness without retraining, but they do not directly adapt the corrupted visual representation itself. Prompt-based methods adapt the learnable text prompts, while input-space methods optimize pixels or padding at test time. These approaches can improve predictions, but they do so through an indirect and expensive optimization path. We propose Test-time Visual Subspace Steering (T-VSS), a lightweight defense that performs test-time adaptation directly in the visual feature space. T-VSS first builds a sample-specific low-rank subspace from multi-view feature residuals anchored at the attacked image. It then learns a shared feature correction within this subspace using reliability-weighted entropy minimization. By constraining adaptation to a compact visual geometry, T-VSS steers attacked features toward more stable and discriminative predictions while avoiding noisy full-space updates. Experiments on fine-grained, ImageNet, and ImageNet-OOD benchmarks show that T-VSS improves adversarial robustness while maintaining competitive clean accuracy and better efficiency than prior test-time adaptations.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces T-VSS, a lightweight test-time defense that improves the adversarial robustness of Vision-Language Models (VLMs) by directly steering visual features in a low-rank subspace.</p>
<p><strong>Core Idea:</strong> Instead of optimizing pixels or text prompts, the method performs direct feature-space adaptation by constraining updates to a sample-specific low-rank subspace to avoid noisy full-space updates.</p>
<p><strong>Technique:</strong> T-VSS utilizes multi-view feature residuals to construct a low-rank subspace and applies reliability-weighted entropy minimization to learn a shared feature correction.</p>
<p><strong>Pipeline:</strong> Attacked Image → Multi-view Feature Extraction → Low-rank Subspace Construction → Reliability-weighted Entropy Minimization → Corrected Visual Features → Robust Prediction</p>
<p><strong>Methodology:</strong> The approach builds a sample-specific subspace from feature residuals anchored at the attacked image and optimizes a correction vector within this compact geometry to steer features toward stable predictions.</p>
<p><strong>Results:</strong> T-VSS improves adversarial robustness across fine-grained, ImageNet, and ImageNet-OOD benchmarks while maintaining competitive clean accuracy and superior efficiency compared to prior test-time adaptations.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of multi-view feature extraction or the scalability of subspace construction for extremely high-dimensional feature spaces.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23132" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.23087">FlowTrain: Flow-Based Decoupled Training for Industrial-Grade Vision-Language Models</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zhida Jiang, Zhaolong Xing, Yang Pei, Xiaolong Chen, Yuanhang Xiao, Chengzhi Huang, Xiyu Liu, Haopeng Liu, Qingyuan Sang, Lingfeng Zhou, Jiaxing Wang, Zicheng Zhang, Wenzhe Wang, Xinyu Liu, Yan Li, Zhen Chen, Ke Zhang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.23087" target="_blank" rel="noopener noreferrer">2606.23087</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zhida Jiang, Zhaolong Xing, Yang Pei, Xiaolong Chen, Yuanhang Xiao, Chengzhi Huang, Xiyu Liu, Haopeng Liu, Qingyuan Sang, Lingfeng Zhou, Jiaxing Wang, Zicheng Zhang, Wenzhe Wang, Xinyu Liu, Yan Li, Zhen Chen, Ke Zhang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Industrial-grade distributed training of vision-language models (VLMs) remains far less efficient than that of unimodal LLMs. Existing solutions either follow a monolithic design that assigns uniform parallelism to heterogeneous modules or adopt a disaggregated deployment that separates modules while executing them as a batch-synchronized pipeline. In this paper, we highlight that the above solutions are still not sufficient, and VLM training can be further decoupled. To this end, we present FlowTrain, a flow-based decoupled training framework that reformulates VLM training as a producer-consumer dataflow coordinated through a unified memory pool. The encoder and backbone can progress independently over a global virtual address space. Since this execution decoupling fundamentally changes the optimization objective of allocation and scheduling, FlowTrain further introduces a heterogeneous parallel allocator that assigns module-specific parallelism strategies by solving a throughput matching problem. The dynamic packing scheduler is used to construct balanced microbatches at runtime according to the actual LLM-side computation cost. Extensive experiments on real-world workloads show that FlowTrain achieves over 50% MFU and up to 1.7x throughput improvement, narrowing the efficiency gap to LLM-only training.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces FlowTrain, a flow-based decoupled training framework that significantly improves the training efficiency of vision-language models (VLMs) by decoupling the execution of heterogeneous modules.</p>
<p><strong>Core Idea:</strong> VLM training is reformulated as a producer-consumer dataflow where the encoder and backbone progress independently over a unified memory pool rather than as a batch-synchronized pipeline.</p>
<p><strong>Technique:</strong> The framework employs a heterogeneous parallel allocator to solve a throughput matching problem and a dynamic packing scheduler to construct balanced microbatches based on real-time computation costs.</p>
<p><strong>Pipeline:</strong> Raw data → Unified memory pool → Independent encoder/backbone execution → Throughput-matched parallel allocation → Dynamic microbatch packing → Optimized VLM training</p>
<p><strong>Methodology:</strong> FlowTrain replaces monolithic parallelism with a decoupled dataflow architecture, utilizing a global virtual address space and a dynamic scheduling mechanism to handle the varying computational demands of different VLM components.</p>
<p><strong>Results:</strong> Achieved over 50% Model Flops Utilization (MFU) and up to a 1.7x throughput improvement on real-world workloads, narrowing the efficiency gap between VLM and LLM training.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the overhead of managing the unified memory pool or the scalability limits of the dynamic packing scheduler under extreme hardware heterogeneity.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.23087" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.22999">Black-Box Continual Learning for Vision-Language Models</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yuting Li, Weihang Fang, Haoyuan Gao, Linghe Kong, Yexin Li, Lichao Sun, Weiran Huang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.22999" target="_blank" rel="noopener noreferrer">2606.22999</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yuting Li, Weihang Fang, Haoyuan Gao, Linghe Kong, Yexin Li, Lichao Sun, Weiran Huang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The rapid deployment of Vision-Language Models (VLMs) in dynamic environments necessitates the ability to learn continuously without forgetting. However, traditional continual learning (CL) settings often rely on white-box paradigms, which is increasingly invalidated by the shift toward cloud-hosted models. In this paper, we introduce Black-CL, a more realistic benchmark for VLMs that enforces three primary real-world challenges: weight and architecture inaccessibility, constrained computation, and task-agnostic inference. The learner can query only output embeddings or logits, with no gradient flow through or structural modification of the backbone. Current CL methodologies, which rely on backbone backpropagation or complex parameter expansion, are fundamentally incompatible with these constraints. Under this setting, we propose BETA, a simple yet effective baseline built on the key insight that solely optimizing textual prototypes can navigate the complexities of CL. BETA integrates three core components: Semantic Projection Accumulation (SPA) for incremental knowledge acquisition, Latent Distribution Replay (LDR) for anchoring the embedding space against catastrophic forgetting, and Test-Time Prototype Adaptation (TTPA) for dynamic, instance-aware boundary refinement. Extensive experiments across ten diverse datasets and various backbones demonstrate that BETA significantly outperforms existing black-box tuners. Remarkably, with only 0.05 M trainable parameters, a 180--3000$\times$ reduction compared to competitive methods, BETA achieves performance on par with or even exceeding white-box CL methods. We believe Black-CL and BETA provide a foundational framework for future advancements in continual learning and accelerates the transition of continual learning from academia to real-world systems.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Black-CL, a realistic black-box continual learning benchmark for Vision-Language Models (VLMs), and proposes BETA, a parameter-efficient framework that achieves white-box performance under strict constraints.</p>
<p><strong>Core Idea:</strong> Continual learning for VLMs should be evaluated in a black-box setting where the backbone is inaccessible, focusing on optimizing textual prototypes rather than model weights.</p>
<p><strong>Technique:</strong> BETA utilizes Semantic Projection Accumulation (SPA), Latent Distribution Replay (LDR), and Test-Time Prototype Adaptation (TTPA) to manage knowledge acquisition and prevent catastrophic forgetting.</p>
<p><strong>Pipeline:</strong> Input (New Task Data) → Process (Embedding Extraction → SPA Knowledge Accumulation → LDR Space Anchoring → TTPA Boundary Refinement) → Output (Updated Textual Prototypes)</p>
<p><strong>Methodology:</strong> The authors developed a framework that operates solely on output embeddings or logits, avoiding gradient flow through the backbone while using a minimal set of trainable parameters.</p>
<p><strong>Results:</strong> BETA outperformed existing black-box tuners across ten datasets, achieving performance on par with white-box methods while using only 0.05M trainable parameters (a 180-3000x reduction).</p>
<p><strong>Limitations:</strong> The study focuses on black-box constraints which may not account for scenarios where partial weight access or architectural modifications are permitted.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.22999" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-22" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span><span class="cat-tag cat-default" title="Computer Science and Game Theory (cs.GT)">Computer Science and Game Theory (cs.GT)</span></span>
      <span class="paper-date">22 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.22918">Each Judge Its Own Yardstick: Discovering Per-VLM Taxonomies for Physical Video Evaluation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yu Cao, Ziquan Liu, Zhensong Zhang, Jiankang Deng, Shaogang Gong, Jifei Song
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.22918" target="_blank" rel="noopener noreferrer">2606.22918</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yu Cao, Ziquan Liu, Zhensong Zhang, Jiankang Deng, Shaogang Gong, Jifei Song</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Maintaining physical consistency in video generators and world models increasingly relies on vision-language models (VLMs) as automated judges that provide reward signals, ranking decisions, and data-filtering criteria. Yet VLMs differ substantially in training data and architecture, encoding physical phenomena through distinct internal representations. A single global evaluation schema therefore gives every VLM the same axes of competence, regardless of what each can actually perceive. We propose JudgeFit, an iterative refinement procedure that discovers a per-VLM evaluation taxonomy. An initial taxonomy is constructed by prompting the target VLM to enumerate physics errors on a small set of videos and clustering the resulting descriptions. The taxonomy is then refined through a diagnostic step: we calibrate the VLM's per-dimension scores to human physical-commonsense ratings, diagnose which dimensions it scores unreliably or redundantly, and prompt an LLM to repair them, iterating until convergence. We further instantiate this procedure as a benchmark and apply it to 16 VLMs spanning eight model families. The refined taxonomy outperforms the global-schema baseline on held-out videos for every VLM tested, with a mean relative improvement of approximately 32%. Beyond aggregate accuracy, the per-VLM profiles expose model-specific blind spots that overall rankings cannot anticipate, with reliability patterns differing markedly across model families.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces JudgeFit, a framework to discover and refine model-specific evaluation taxonomies for physical video consistency, moving beyond one-size-fits-all evaluation schemas.</p>
<p><strong>Core Idea:</strong> Different Vision-Language Models (VLMs) possess unique internal representations of physics; therefore, evaluation criteria should be tailored to each model's specific perceptual capabilities.</p>
<p><strong>Technique:</strong> An iterative refinement procedure that combines VLM-generated error descriptions, clustering, human-aligned calibration, and LLM-based taxonomy repair.</p>
<p><strong>Pipeline:</strong> Small set of videos → VLM error enumeration → Clustering → Human-aligned calibration → LLM-based taxonomy repair → Iterative convergence → Per-VLM evaluation taxonomy</p>
<p><strong>Methodology:</strong> The authors applied JudgeFit to 16 VLMs across eight families, comparing the performance of per-VLM taxonomies against a global-schema baseline using human physical-commonsense ratings.</p>
<p><strong>Results:</strong> The per-VLM taxonomy outperformed the global-schema baseline on held-out videos for all 16 models, achieving a mean relative improvement of approximately 32%.</p>
<p><strong>Limitations:</strong> The study focuses on physical consistency and does not explore how these taxonomies might adapt to other non-physical video attributes or real-time evaluation constraints.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.22918" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h2 id="tech-news">Tech News</h2>

<h3 id="ai-safety">AI Safety</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 23 Ju</span>
  </div>
  <a class="news-title" href="https://openai.com/index/daybreak-securing-the-world/" target="_blank" rel="noopener noreferrer">OpenAI DayBreak – GPT-5.5-Cyber</a>
  <p class="news-summary">OpenAI introduced &#x27;DayBreak,&#x27; a specialized initiative focused on securing the digital landscape against cyber threats. The announcement highlights the development of GPT-5.5-Cyber, a model specifically optimized for cybersecurity tasks and defense.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Cybersecurity</span><span class="news-tag">GPT-5.5</span><span class="news-tag">LLM</span><span class="news-tag">AI Safety</span></div>
    <a class="news-read-btn" href="https://openai.com/index/daybreak-securing-the-world/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-22</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ud0rft/nondeterministic_vulnerability_detection/" target="_blank" rel="noopener noreferrer">Non-deterministic Vulnerability Detection Benchmark System [P]</a>
  <p class="news-summary">A developer is seeking feedback on a benchmark system designed to test LLM vulnerability detection. The project uses &#x27;hidden&#x27; Juliet code to remove known CWE patterns and employs LLM-generated comments to study how sentiment and plain English can manipulate an AI&#x27;s ability to identify security flaws.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM Security</span><span class="news-tag">Vulnerability Detection</span><span class="news-tag">Benchmarking</span><span class="news-tag">AI Safety</span><span class="news-tag">Cybersecurity</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ud0rft/nondeterministic_vulnerability_detection/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="agentic-ai">Agentic AI</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 23 Ju</span>
  </div>
  <a class="news-title" href="https://news.ycombinator.com/item?id=48641160" target="_blank" rel="noopener noreferrer">Ask HN: Anthropic banned me from using Claude Code and I don&#x27;t know what to do</a>
  <p class="news-summary">A user reported being banned from using Anthropic&#x27;s Claude Code tool, sparking a discussion on platform policies and usage limits. The thread explores potential reasons for account restrictions and the challenges of using agentic coding tools. It highlights the friction between automated tool usage and provider safety guardrails.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Claude Code</span><span class="news-tag">Anthropic</span><span class="news-tag">Agentic AI</span><span class="news-tag">LLM usage</span><span class="news-tag">Platform Policies</span></div>
    <a class="news-read-btn" href="https://news.ycombinator.com/item?id=48641160" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--nvidia">NVIDIA Technical Blog</span>
    <span class="news-date">2026-06-23</span>
  </div>
  <a class="news-title" href="https://developer.nvidia.com/blog/how-telcos-build-autonomous-networks-with-agentic-ai/" target="_blank" rel="noopener noreferrer">How Telcos Build Autonomous Networks with Agentic AI</a>
  <p class="news-summary">Telecom operators are transitioning from basic AI integration to autonomous network management using Agentic AI. The blog explores how these agents can automate complex network operations, customer care, and back-office workflows to improve efficiency. It highlights the shift toward self-healing and self-optimizing infrastructure.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Agentic AI</span><span class="news-tag">Telecom</span><span class="news-tag">Network Automation</span><span class="news-tag">NVIDIA</span><span class="news-tag">Enterprise AI</span></div>
    <a class="news-read-btn" href="https://developer.nvidia.com/blog/how-telcos-build-autonomous-networks-with-agentic-ai/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computer-vision">Computer Vision</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 23 Ju</span>
  </div>
  <a class="news-title" href="https://blog.roboflow.com/yolo26/" target="_blank" rel="noopener noreferrer">An Introduction to YOLO26</a>
  <p class="news-summary">This article provides an introduction to YOLOv26, a significant iteration in the popular You Only Look Once object detection framework. It likely covers architectural improvements, performance benchmarks, and practical applications for real-time computer vision tasks.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Object Detection</span><span class="news-tag">Computer Vision</span><span class="news-tag">YOLO</span><span class="news-tag">Real-time AI</span></div>
    <a class="news-read-btn" href="https://blog.roboflow.com/yolo26/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 23 Ju</span>
  </div>
  <a class="news-title" href="https://arxiv.org/abs/2606.03748" target="_blank" rel="noopener noreferrer">Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models</a>
  <p class="news-summary">Ultralytics has introduced YOLO26, a unified real-time end-to-end vision model designed for high-performance tasks. The model aims to streamline complex vision pipelines by integrating multiple capabilities into a single architecture. It represents a significant advancement in efficient, real-time object detection and spatial understanding.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">YOLO26</span><span class="news-tag">Computer Vision</span><span class="news-tag">Real-time Inference</span><span class="news-tag">Ultralytics</span><span class="news-tag">Object Detection</span></div>
    <a class="news-read-btn" href="https://arxiv.org/abs/2606.03748" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-23</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ud8ovs/just_landed_a_computer_vision_internship_heres/" target="_blank" rel="noopener noreferrer">Just landed a Computer Vision internship, here&#x27;s the preparation list I used [D]</a>
  <p class="news-summary">A user shared a comprehensive preparation checklist for landing Computer Vision internships, covering core math, ML fundamentals, and specialized CV topics. The resource is designed to be actionable and can be compressed into a 7-day study plan for job seekers.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Computer Vision</span><span class="news-tag">Machine Learning</span><span class="news-tag">Career Advice</span><span class="news-tag">Internship Prep</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ud8ovs/just_landed_a_computer_vision_internship_heres/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computing-systems">Computing Systems</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 23 Ju</span>
  </div>
  <a class="news-title" href="https://kreya.app/blog/new-http-query-method-explained/" target="_blank" rel="noopener noreferrer">The new HTTP QUERY method explained</a>
  <p class="news-summary">This article explains the technical nuances and implementation of a new HTTP query method. It focuses on how these protocols facilitate data exchange and request handling in modern web architectures.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">HTTP</span><span class="news-tag">Networking</span><span class="news-tag">Web Development</span><span class="news-tag">Protocols</span><span class="news-tag">Computing Systems</span></div>
    <a class="news-read-btn" href="https://kreya.app/blog/new-http-query-method-explained/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 23 Ju</span>
  </div>
  <a class="news-title" href="https://jchri.st/blog/in-praise-of-memcached/" target="_blank" rel="noopener noreferrer">In praise of memcached</a>
  <p class="news-summary">The article explores the enduring utility and architectural simplicity of Memcached in modern infrastructure. It highlights how its minimalist design provides high-performance caching that remains relevant despite the rise of more complex distributed systems.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Caching</span><span class="news-tag">Infrastructure</span><span class="news-tag">Distributed Systems</span><span class="news-tag">Performance</span><span class="news-tag">Software Engineering</span></div>
    <a class="news-read-btn" href="https://jchri.st/blog/in-praise-of-memcached/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="general">General</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/DeepLearning</span>
    <span class="news-date">2026-06-23</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/deeplearning/comments/1ud89nr/alignment_processes_in_neural_networks/" target="_blank" rel="noopener noreferrer">Alignment processes in neural networks?</a>
  <p class="news-summary">A researcher proposes a method to test if ReLU activation decisions align with training data by comparing a standard neural network against one where ReLU gates are replaced by Locality Sensitive Hashing (LSH). Preliminary results on toy models showed unexpected behavior, potentially suggesting complex internal dynamics in how neural networks gate information. The post includes links to research directions and FreeBasic code for further exploration.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Neural Networks</span><span class="news-tag">ReLU</span><span class="news-tag">Locality Sensitive Hashing</span><span class="news-tag">Model Interpretability</span><span class="news-tag">Deep Learning Research</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/deeplearning/comments/1ud89nr/alignment_processes_in_neural_networks/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="llm">LLM</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 23 Ju</span>
  </div>
  <a class="news-title" href="https://swelljoe.com/post/will-it-mythos/" target="_blank" rel="noopener noreferrer">Will It Mythos?</a>
  <p class="news-summary">The post discusses the &#x27;Will It Mythos&#x27; project, which explores the capabilities and limitations of AI models in generating complex, consistent mythological frameworks. It touches upon the creative boundaries of generative models and how they handle deep world-building.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Generative AI</span><span class="news-tag">World Building</span><span class="news-tag">Creative AI</span><span class="news-tag">LLM capabilities</span></div>
    <a class="news-read-btn" href="https://swelljoe.com/post/will-it-mythos/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 23 Ju</span>
  </div>
  <a class="news-title" href="https://arxiv.org/abs/2606.16140" target="_blank" rel="noopener noreferrer">VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO</a>
  <p class="news-summary">VibeThinker is a 3B parameter model that achieves reasoning performance surpassing Claude 3 Opus 4.5. The breakthrough is attributed to a novel combination of Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO). This demonstrates that efficient RL techniques can significantly boost reasoning capabilities in smaller, more accessible models.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM</span><span class="news-tag">RL</span><span class="news-tag">Reasoning</span><span class="news-tag">Model Efficiency</span><span class="news-tag">SFT</span></div>
    <a class="news-read-btn" href="https://arxiv.org/abs/2606.16140" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h2 id="github-trending">
  <svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="20" height="20" style="vertical-align:middle;margin-right:6px"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg>
  GitHub Trending
</h2>

<p class="section-desc">Trending repositories on GitHub filtered and scored for relevance to your interests.</p>

<h3 id="agentic-ai-1">Agentic AI</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/NVIDIA/skills" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">NVIDIA</span><span class="gh-sep">/</span><strong class="gh-repo">skills</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a collection of modular skills for AI agents published by NVIDIA, enabling agents to perform complex tasks. It is highly relevant for research into Agentic AI and Multi-Agent Systems as it provides the building blocks for functional, goal-oriented autonomous behavior.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">NVIDIA</span><span class="gh-tag">AI Agents</span><span class="gh-tag">Foundation Models</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-23</span>
      <a class="gh-visit-btn" href="https://github.com/NVIDIA/skills" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/JCodesMore/ai-website-cloner-template" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">JCodesMore</span><span class="gh-sep">/</span><strong class="gh-repo">ai-website-cloner-template</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a template for using AI coding agents to clone websites with a single command. It is highly relevant to the user&#x27;s interest in Agentic AI and LLMs as it demonstrates practical application of autonomous agents in software engineering tasks.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Coding Agents</span><span class="gh-tag">TypeScript</span><span class="gh-tag">Automation</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-23</span>
      <a class="gh-visit-btn" href="https://github.com/JCodesMore/ai-website-cloner-template" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/heygen-com/hyperframes" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">heygen-com</span><span class="gh-sep">/</span><strong class="gh-repo">hyperframes</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a framework for rendering video from HTML, specifically designed for AI agents to interact with and generate visual content. It is highly relevant for Agentic AI and Human-Computer Interaction as it enables agents to manipulate web-based interfaces and produce visual outputs.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">Generative Models</span><span class="gh-tag">Human-Computer Interaction</span><span class="gh-tag">Multimodal</span><span class="gh-tag">Web Rendering</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-23</span>
      <a class="gh-visit-btn" href="https://github.com/heygen-com/hyperframes" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/virattt/ai-hedge-fund" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">virattt</span><span class="gh-sep">/</span><strong class="gh-repo">ai-hedge-fund</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository appears to be a high-profile project focused on building an AI-driven hedge fund team. It is highly relevant to the user&#x27;s interest in Agentic AI and Multi-Agent Systems as it likely explores autonomous agents collaborating on complex financial tasks.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">LLMs</span><span class="gh-tag">Finance</span><span class="gh-tag">Autonomous Agents</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-23</span>
      <a class="gh-visit-btn" href="https://github.com/virattt/ai-hedge-fund" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/unclecode/crawl4ai" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">unclecode</span><span class="gh-sep">/</span><strong class="gh-repo">crawl4ai</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Crawl4AI is an open-source web crawler specifically designed to output LLM-friendly content, making it ideal for data ingestion in RAG pipelines. It is highly relevant for building Agentic AI systems that require real-time web browsing and structured data extraction.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">web scraping</span><span class="gh-tag">RAG</span><span class="gh-tag">LLM</span><span class="gh-tag">data ingestion</span><span class="gh-tag">Agentic AI</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-23</span>
      <a class="gh-visit-btn" href="https://github.com/unclecode/crawl4ai" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="speech">Speech</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/jamiepine/voicebox" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">jamiepine</span><span class="gh-sep">/</span><strong class="gh-repo">voicebox</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Speech</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Voicebox is an open-source AI voice studio that enables users to clone, dictate, and create high-quality synthetic speech. It is highly relevant to the user&#x27;s interest in speech, generative models, and multimodal learning.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">speech synthesis</span><span class="gh-tag">voice cloning</span><span class="gh-tag">generative AI</span><span class="gh-tag">TTS</span><span class="gh-tag">multimodal</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-23</span>
      <a class="gh-visit-btn" href="https://github.com/jamiepine/voicebox" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's digest highlights a significant shift toward agentic workflows, focusing on autonomous research, industrial-scale diagnostics, and the optimization of multi-agent systems. There is a clear emphasis on bridging the gap between high-level reasoning and physical embodiment through spatial memory and 3D perception.]]></summary></entry><entry><title type="html">Daily Digest 2026-06-22</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-22/" rel="alternate" type="text/html" title="Daily Digest 2026-06-22" /><published>2026-06-22T00:00:00+07:00</published><updated>2026-06-22T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-22/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's updates focus on enhancing the operational utility of large language models through specialized memory architectures, engineering frameworks, and domain-specific data processing.</span>
</div>

<h2 id="github-trending">
  <svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="20" height="20" style="vertical-align:middle;margin-right:6px"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg>
  GitHub Trending
</h2>

<p class="section-desc">Trending repositories on GitHub filtered and scored for relevance to your interests.</p>

<h3 id="agentic-ai">Agentic AI</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/calesthio/OpenMontage" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">calesthio</span><span class="gh-sep">/</span><strong class="gh-repo">OpenMontage</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">OpenMontage is an agentic video production system that leverages over 500 agent skills to automate complex video workflows. It is highly relevant as it demonstrates a sophisticated multi-agent architecture for creative content generation using LLMs and vision tools.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">Agentic AI</span><span class="gh-tag">Generative Models</span><span class="gh-tag">Video Production</span><span class="gh-tag">LLMs</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/calesthio/OpenMontage" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/bytedance/deer-flow" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">bytedance</span><span class="gh-sep">/</span><strong class="gh-repo">deer-flow</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Deer-flow is a long-horizon SuperAgent harness designed to handle complex tasks spanning minutes to hours by coordinating subagents, memories, and sandboxed tools. It is highly relevant as it provides a robust framework for multi-agent systems and autonomous agentic workflows.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Autonomous Agents</span><span class="gh-tag">Tool Use</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/bytedance/deer-flow" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/topoteretes/cognee" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">topoteretes</span><span class="gh-sep">/</span><strong class="gh-repo">cognee</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Cognee provides a self-hosted knowledge graph engine designed to give AI agents persistent long-term memory across sessions. It is highly relevant for building complex multi-agent systems that require structured, relational data retrieval beyond simple vector search.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">knowledge graph</span><span class="gh-tag">long-term memory</span><span class="gh-tag">agents</span><span class="gh-tag">RAG</span><span class="gh-tag">multi-agent systems</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/topoteretes/cognee" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/ZhuLinsen/daily_stock_analysis" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">ZhuLinsen</span><span class="gh-sep">/</span><strong class="gh-repo">daily_stock_analysis</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository implements an LLM-powered multi-market stock analysis system that integrates real-time news and market data. It is highly relevant as it demonstrates Agentic AI workflows, automated decision-making, and RAG-style data processing for financial analysis.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">LLM</span><span class="gh-tag">Agentic AI</span><span class="gh-tag">RAG</span><span class="gh-tag">Financial Analysis</span><span class="gh-tag">Automation</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/ZhuLinsen/daily_stock_analysis" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/DeusData/codebase-memory-mcp" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">DeusData</span><span class="gh-sep">/</span><strong class="gh-repo">codebase-memory-mcp</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a high-performance MCP server that indexes codebases into a persistent knowledge graph for LLM interaction. It is highly relevant for Agentic AI and RAG workflows as it enables agents to perform sub-millisecond queries on large codebases with significantly reduced token usage.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">RAG</span><span class="gh-tag">Agentic AI</span><span class="gh-tag">Knowledge Graph</span><span class="gh-tag">LLM</span><span class="gh-tag">Computing Systems</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/DeusData/codebase-memory-mcp" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/koala73/worldmonitor" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">koala73</span><span class="gh-sep">/</span><strong class="gh-repo">worldmonitor</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 3/5">★★★<span class="gh-relevance-empty">★★</span> <span class="gh-rel-num">3/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a real-time intelligence dashboard that utilizes AI-powered news aggregation and geopolitical monitoring. It is relevant to the user&#x27;s interest in Agentic AI and LLMs as it likely employs automated agents to synthesize complex information into a unified situational awareness interface.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Information Retrieval</span><span class="gh-tag">Real-time Monitoring</span><span class="gh-tag">Data Aggregation</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/koala73/worldmonitor" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="computing-systems">Computing Systems</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/palmier-io/palmier-pro" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">palmier-io</span><span class="gh-sep">/</span><strong class="gh-repo">palmier-pro</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Computing Systems</span>
      <span class="gh-relevance" title="Relevance 3/5">★★★<span class="gh-relevance-empty">★★</span> <span class="gh-rel-num">3/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This is a macOS video editor specifically designed to integrate AI workflows into the creative process. It is relevant to the user&#x27;s interest in Human-Computer Interaction and the practical application of generative models in software systems.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Human-Computer Interaction</span><span class="gh-tag">Video Editing</span><span class="gh-tag">Swift</span><span class="gh-tag">AI Tools</span><span class="gh-tag">Multimedia</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/palmier-io/palmier-pro" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="general">General</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/microsoft/ML-For-Beginners" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">microsoft</span><span class="gh-sep">/</span><strong class="gh-repo">ML-For-Beginners</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">General</span>
      <span class="gh-relevance" title="Relevance 3/5">★★★<span class="gh-relevance-empty">★★</span> <span class="gh-rel-num">3/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a comprehensive 12-week curriculum covering the fundamentals of classic machine learning. While it doesn&#x27;t focus on advanced topics like Agentic AI or Robotics, it serves as a foundational prerequisite for understanding the underlying principles of the user&#x27;s core interests.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">machine learning</span><span class="gh-tag">education</span><span class="gh-tag">beginners</span><span class="gh-tag">jupyter notebook</span><span class="gh-tag">fundamentals</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/microsoft/ML-For-Beginners" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="llm">LLM</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/asgeirtj/system_prompts_leaks" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">asgeirtj</span><span class="gh-sep">/</span><strong class="gh-repo">system_prompts_leaks</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">LLM</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository serves as a comprehensive collection of extracted system prompts from major LLM providers like Anthropic, OpenAI, Google, and xAI. It is highly relevant for understanding the underlying instructions that shape model behavior, safety guardrails, and agentic capabilities.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">system prompts</span><span class="gh-tag">LLM</span><span class="gh-tag">prompt engineering</span><span class="gh-tag">AI safety</span><span class="gh-tag">model behavior</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/asgeirtj/system_prompts_leaks" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/ed-donner/llm_engineering" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">ed-donner</span><span class="gh-sep">/</span><strong class="gh-repo">llm_engineering</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">LLM</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides practical materials for mastering LLM engineering, covering core concepts and implementation techniques. It is highly relevant for understanding the foundational engineering required to build and deploy large language model applications.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">llm</span><span class="gh-tag">engineering</span><span class="gh-tag">fine-tuning</span><span class="gh-tag">notebooks</span><span class="gh-tag">foundation models</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/ed-donner/llm_engineering" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="rl">RL</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/THUDM/slime" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">THUDM</span><span class="gh-sep">/</span><strong class="gh-repo">slime</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">RL</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Slime is a post-training framework specifically designed for Reinforcement Learning (RL) scaling in Large Language Models. It is highly relevant as it addresses the core mechanics of aligning and scaling LLMs through advanced RL techniques.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">RL Scaling</span><span class="gh-tag">LLM Post-training</span><span class="gh-tag">Reinforcement Learning</span><span class="gh-tag">Foundation Models</span><span class="gh-tag">Agentic AI</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-22</span>
      <a class="gh-visit-btn" href="https://github.com/THUDM/slime" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's updates focus on enhancing the operational utility of large language models through specialized memory architectures, engineering frameworks, and domain-specific data processing.]]></summary></entry><entry><title type="html">Daily Digest 2026-06-21</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-21/" rel="alternate" type="text/html" title="Daily Digest 2026-06-21" /><published>2026-06-21T00:00:00+07:00</published><updated>2026-06-21T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-21/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's digest highlights a growing tension between the rapid scaling of AI-generated content and the practical, organizational challenges of integrating these models into reliable workflows. The focus shifts from raw model capabilities toward the infrastructure, trust, and "slop" produced by mass automation.</span>
</div>

<h2 id="tech-news">Tech News</h2>

<h3 id="agentic-ai">Agentic AI</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-22</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1uc7mk8/maybe_the_ai_race_isnt_about_models_at_all_but/" target="_blank" rel="noopener noreferrer">Maybe the AI race isn’t about models at all, but about trust and organizational intelligence</a>
  <p class="news-summary">The post argues that as AI intelligence becomes commoditized, the primary competitive moat will shift from model benchmarks to &#x27;organizational intelligence.&#x27; It suggests that the real challenge for enterprises lies in the &#x27;Reality Layer&#x27;—integrating AI into complex workflows with governance, auditability, and trust. This perspective positions enterprise software and institutional integration as more durable moats than the underlying foundation models.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Enterprise AI</span><span class="news-tag">AI Governance</span><span class="news-tag">Model Commoditization</span><span class="news-tag">Organizational Intelligence</span><span class="news-tag">AI Strategy</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1uc7mk8/maybe_the_ai_race_isnt_about_models_at_all_but/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-22</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ucabxf/has_ai_adoption_at_work_matched_the_hype/" target="_blank" rel="noopener noreferrer">Has AI adoption at work matched the hype?</a>
  <p class="news-summary">The post explores the practical reality of AI integration in corporate environments versus the surrounding hype. It seeks to distinguish between the success of off-the-shelf tools like ChatGPT and Claude versus the development of custom internal workflows and agentic systems. The discussion aims to identify which approach yields better ROI and smoother adoption for teams.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Adoption</span><span class="news-tag">Enterprise AI</span><span class="news-tag">LLMs</span><span class="news-tag">Workflow Automation</span><span class="news-tag">Agentic AI</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ucabxf/has_ai_adoption_at_work_matched_the_hype/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-22</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ucdory/what_setup_do_you_use_for_always_on_ai/" target="_blank" rel="noopener noreferrer">What Setup Do You Use for &quot;always on&quot; AI</a>
  <p class="news-summary">A user is seeking sustainable infrastructure recommendations for hosting &#x27;always-on&#x27; AI agents to avoid the limitations of local hardware and expiring cloud credits. They are currently using Claude&#x27;s remote session features but want a more permanent solution for continuous background processing.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Agentic AI</span><span class="news-tag">Cloud Computing</span><span class="news-tag">LLM Infrastructure</span><span class="news-tag">Remote Sessions</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ucdory/what_setup_do_you_use_for_always_on_ai/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1uc4ict/why_selfreflection_react_loops_fail_on/" target="_blank" rel="noopener noreferrer">Why self-reflection ReAct loops fail on long-horizon tasks, and the AgentOS verification architecture we built to fix it.</a>
  <p class="news-summary">The post critiques the &#x27;ReAct&#x27; paradigm, arguing that self-reflection leads to &#x27;pseudo-correctness&#x27; because agents share the same blind spots as their original reasoning. To solve this, the authors developed AgentOS, a kernel that orchestrates a 150-agent asynchronous swarm where independent verifiers audit sub-agents in isolated context windows.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Agentic AI</span><span class="news-tag">LLM</span><span class="news-tag">Multi-Agent Systems</span><span class="news-tag">Reasoning</span><span class="news-tag">AgentOS</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1uc4ict/why_selfreflection_react_loops_fail_on/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubskzk/did_ai_deep_research_get_lazy/" target="_blank" rel="noopener noreferrer">Did AI Deep Research get lazy?</a>
  <p class="news-summary">A user reports a significant decrease in the processing time and depth of &#x27;Deep Research&#x27; features across both ChatGPT and Gemini. The user notes that while the AI previously spent 20-30 minutes synthesizing hundreds of sources, it now completes tasks in under 7 minutes, leading to concerns about reduced output quality.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM</span><span class="news-tag">Agentic AI</span><span class="news-tag">Model Performance</span><span class="news-tag">User Experience</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubskzk/did_ai_deep_research_get_lazy/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubu3cy/where_do_you_see_prediction_and_decisionmaking/" target="_blank" rel="noopener noreferrer">Where do you see prediction and decision-making separating in AI systems?</a>
  <p class="news-summary">The discussion explores the conceptual and practical boundary between AI systems that merely provide predictions and those integrated into active decision-making workflows. It highlights the ambiguity that arises as models become more responsive and are embedded deeper into real-world operational processes. The thread seeks to identify where the line is drawn as AI moves from an advisory tool to an autonomous agent.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Ethics</span><span class="news-tag">Agentic AI</span><span class="news-tag">Decision Theory</span><span class="news-tag">Machine Learning</span><span class="news-tag">AI Systems</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubu3cy/where_do_you_see_prediction_and_decisionmaking/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computer-vision">Computer Vision</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubk0io/brands_using_aigenerated_influencers_to_promote/" target="_blank" rel="noopener noreferrer">Brands using AI-generated influencers to promote products on social media | AI (artificial intelligence) | The Guardian</a>
  <p class="news-summary">Brands are increasingly adopting AI-generated influencers to market products on social media platforms. This trend leverages synthetic media to create controllable, cost-effective brand ambassadors that can interact with audiences at scale. The shift raises questions regarding authenticity, digital ethics, and the evolving role of generative AI in marketing.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Generative AI</span><span class="news-tag">Synthetic Media</span><span class="news-tag">Marketing</span><span class="news-tag">Computer Vision</span><span class="news-tag">Social Media</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubk0io/brands_using_aigenerated_influencers_to_promote/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computing-systems">Computing Systems</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Mon, 22 Ju</span>
  </div>
  <a class="news-title" href="https://github.com/openai/codex/issues/28224" target="_blank" rel="noopener noreferrer">Codex logging bug may write TBs to local SSDs</a>
  <p class="news-summary">A logging bug in the OpenAI Codex repository has been identified that can cause excessive data output, potentially writing terabytes of logs to local SSDs. This issue highlights a critical infrastructure vulnerability in how large-scale models handle telemetry and logging.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Logging</span><span class="news-tag">Infrastructure</span><span class="news-tag">Bug Report</span><span class="news-tag">OpenAI</span><span class="news-tag">Codex</span></div>
    <a class="news-read-btn" href="https://github.com/openai/codex/issues/28224" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubm0q3/utah_data_center_brute_forced_through_to_approval/" target="_blank" rel="noopener noreferrer">Utah Data Center Brute Forced Through to Approval Despite Widespread Popular Opposition</a>
  <p class="news-summary">A data center project in Utah received government approval despite 71% local opposition regarding water scarcity and environmental impacts. The project bypassed standard regulatory channels by exploiting the Military Installation Development Authority (MIDA), a mechanism that could potentially be replicated in other states to fast-track infrastructure.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Data Centers</span><span class="news-tag">Infrastructure</span><span class="news-tag">AI Regulation</span><span class="news-tag">Resource Scarcity</span><span class="news-tag">Government Policy</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubm0q3/utah_data_center_brute_forced_through_to_approval/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubwkd5/ai_is_making_crypto_security_cheaper_faster_and/" target="_blank" rel="noopener noreferrer">AI is making crypto security cheaper, faster and harder to ignore</a>
  <p class="news-summary">The post discusses how AI technologies are revolutionizing the cryptocurrency security landscape by lowering costs and increasing the speed of threat detection. It highlights the shift toward proactive security measures that are becoming increasingly difficult for developers and platforms to overlook.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Cybersecurity</span><span class="news-tag">Cryptocurrency</span><span class="news-tag">AI Integration</span><span class="news-tag">Threat Detection</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubwkd5/ai_is_making_crypto_security_cheaper_faster_and/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="general">General</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1uc0kmx/conflict_of_interest/" target="_blank" rel="noopener noreferrer">Conflict of Interest</a>
  <p class="news-summary">A Reddit post highlights potential conflicts of interest regarding Peter Thiel&#x27;s influence over the AI landscape through Founders Fund. The post notes that Persona Identities, a company linked to Thiel&#x27;s portfolio, serves as the primary identity verification partner for both Anthropic&#x27;s Claude and OpenAI.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Venture Capital</span><span class="news-tag">AI Governance</span><span class="news-tag">Ethics</span><span class="news-tag">Identity Verification</span><span class="news-tag">Founders Fund</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1uc0kmx/conflict_of_interest/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="llm">LLM</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubzzi3/if_you_use_more_than_one_ai_model_how_do_you_keep/" target="_blank" rel="noopener noreferrer">If you use more than one AI model, how do you keep your context straight across them?</a>
  <p class="news-summary">A user highlights the friction of &#x27;context switching&#x27; when utilizing multiple LLMs for different tasks, such as writing versus reasoning. The primary challenge is the repetitive need to re-brief each model on project background, which leads to inconsistent outputs and fragmented information. The post seeks community strategies for maintaining a single source of truth across disparate AI environments.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM</span><span class="news-tag">Context Window</span><span class="news-tag">Prompt Engineering</span><span class="news-tag">Workflow Optimization</span><span class="news-tag">Multi-Model Usage</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubzzi3/if_you_use_more_than_one_ai_model_how_do_you_keep/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubpzj7/my_personal_experience_from_last_4_years_about_ai/" target="_blank" rel="noopener noreferrer">My personal experience from last 4 years about AI</a>
  <p class="news-summary">A developer with four years of experience argues that prompt engineering is no longer a significant competitive advantage as LLMs have become more resilient to poor input. Instead, the real &#x27;moat&#x27; for businesses lies in high-quality, clean, and comprehensive data pipelines. The author emphasizes that providing deep context through organized data is the key to achieving actual ROI in AI implementation.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Prompt Engineering</span><span class="news-tag">Data Engineering</span><span class="news-tag">LLM Implementation</span><span class="news-tag">Business AI</span><span class="news-tag">Data Quality</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubpzj7/my_personal_experience_from_last_4_years_about_ai/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="mlops">MLOps</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-22</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ucbhzf/ive_been_interviewing_ai_engineers_and_i_honestly/" target="_blank" rel="noopener noreferrer">I’ve been interviewing AI engineers and I honestly didn’t expect it to feel this disconnected from reality</a>
  <p class="news-summary">A veteran developer highlights a growing disconnect between AI candidates&#x27; theoretical knowledge and the practical demands of production engineering. The post notes that while many can build impressive demos, they struggle with the &#x27;chaotic&#x27; reality of shipping reliable, production-ready systems. This underscores a significant skills gap in the current AI hiring landscape.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Engineering</span><span class="news-tag">MLOps</span><span class="news-tag">Talent Gap</span><span class="news-tag">Production AI</span><span class="news-tag">Software Engineering</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ucbhzf/ive_been_interviewing_ai_engineers_and_i_honestly/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="nlp">NLP</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubzc6m/ai_might_make_me_fail_my_class/" target="_blank" rel="noopener noreferrer">AI might make me fail my class</a>
  <p class="news-summary">A college student reports that their entirely human-written research paper was flagged as 100% AI-generated by multiple detection tools. The post highlights the growing issue of &#x27;false positives&#x27; in AI detection software and the potential for academic repercussions due to unreliable technology.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Detection</span><span class="news-tag">LLM</span><span class="news-tag">Academic Integrity</span><span class="news-tag">False Positives</span><span class="news-tag">NLP</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubzc6m/ai_might_make_me_fail_my_class/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubnaqo/the_surge_of_slopsince_the_release_of_chatgpt35/" target="_blank" rel="noopener noreferrer">The Surge of Slop—since the release of ChatGPT-3.5 in late 2022, the number of e-books published on Amazon has skyrocketed, tripling by late 2025. A new scientific analysis shows that this is entirely due to the rise of AI-generated books, which now far outnumber human-written books. [The Economist]</a>
  <p class="news-summary">A surge in AI-generated content is flooding digital marketplaces, with Amazon e-book publications tripling since late 2022. Data from Deezer further highlights this trend, showing that AI music now accounts for 44% of new uploads, with many listeners unable to distinguish it from human-made content.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Generative AI</span><span class="news-tag">Content Saturation</span><span class="news-tag">LLMs</span><span class="news-tag">AI Music</span><span class="news-tag">Data Analysis</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubnaqo/the_surge_of_slopsince_the_release_of_chatgpt35/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-22</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ucet7o/the_outreach_system_my_friend_used_to_generate/" target="_blank" rel="noopener noreferrer">The Outreach System My Friend Used to Generate $235K for His Web Agency</a>
  <p class="news-summary">A web agency owner scaled his business to $235K by transitioning from high-volume, generic email outreach to an automated, AI-driven system. Using a tool called Swokei, he now leverages automated website analysis to generate personalized outreach messages based on specific design and SEO flaws. This shift demonstrates the practical application of AI in automating personalized B2B sales and lead generation.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Generative AI</span><span class="news-tag">Automation</span><span class="news-tag">Sales Tech</span><span class="news-tag">NLP</span><span class="news-tag">Lead Generation</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ucet7o/the_outreach_system_my_friend_used_to_generate/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1uc3q7e/most_multihop_rag_goes_stale_the_moment_your_data/" target="_blank" rel="noopener noreferrer">Most multi-hop RAG goes stale the moment your data changes, what about a training-free approach that skips the graph rebuild?</a>
  <p class="news-summary">A new open-source framework called MOTHRAG addresses the scalability issues of GraphRAG by performing multi-hop reasoning at query time over a plain dense index. This training-free approach eliminates the need for costly knowledge graph reconstructions or model retraining when data updates, maintaining high accuracy while reducing costs. The system uses a deterministic ensemble of reasoning arms to produce auditable, proof-tree-structured answers.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">RAG</span><span class="news-tag">Knowledge Graphs</span><span class="news-tag">LLMs</span><span class="news-tag">Multi-hop Reasoning</span><span class="news-tag">Open Source</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1uc3q7e/most_multihop_rag_goes_stale_the_moment_your_data/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="robotics">Robotics</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1ubxb6b/why_an_ai_company_cleaned_my_new_york_city/" target="_blank" rel="noopener noreferrer">Why an AI company cleaned my New York City apartment for free</a>
  <p class="news-summary">A user shared a story about an AI company providing free apartment cleaning services in New York City. The incident highlights the practical application of physical automation and the potential for AI companies to engage in real-world service testing.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Robotics</span><span class="news-tag">Automation</span><span class="news-tag">Physical AI</span><span class="news-tag">Real-world Application</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1ubxb6b/why_an_ai_company_cleaned_my_new_york_city/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's digest highlights a growing tension between the rapid scaling of AI-generated content and the practical, organizational challenges of integrating these models into reliable workflows. The focus shifts from raw model capabilities toward the infrastructure, trust, and "slop" produced by mass automation.]]></summary></entry><entry><title type="html">Daily Digest 2026-06-20</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-20/" rel="alternate" type="text/html" title="Daily Digest 2026-06-20" /><published>2026-06-20T00:00:00+07:00</published><updated>2026-06-20T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-20/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's digest highlights a shift toward local model optimization and the practical deployment of open-source alternatives to proprietary systems. The focus remains on balancing performance with sovereignty, safety, and architectural efficiency.</span>
</div>

<h2 id="tech-news">Tech News</h2>

<h3 id="ai-safety">AI Safety</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sun, 21 Ju</span>
  </div>
  <a class="news-title" href="https://support.claude.com/en/articles/14328960-identity-verification-on-claude" target="_blank" rel="noopener noreferrer">Identity verification on Claude</a>
  <p class="news-summary">Anthropic has implemented identity verification measures for users of the Claude platform. This move is likely aimed at reducing abuse, ensuring compliance with safety regulations, and managing access to high-compute resources.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Identity Verification</span><span class="news-tag">Anthropic</span><span class="news-tag">Claude</span><span class="news-tag">AI Safety</span><span class="news-tag">Governance</span></div>
    <a class="news-read-btn" href="https://support.claude.com/en/articles/14328960-identity-verification-on-claude" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="agentic-ai">Agentic AI</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sun, 21 Ju</span>
  </div>
  <a class="news-title" href="https://github.com/raiyanyahya/recall" target="_blank" rel="noopener noreferrer">Show HN: Recall – Local project memory for Claude Code</a>
  <p class="news-summary">Recall is a local project designed to provide long-term memory for Claude Code, allowing the agent to persist context across different sessions. It enables the AI to remember previous interactions, project-specific details, and user preferences by storing them in a local database. This tool aims to enhance the utility of agentic coding workflows by reducing the need to re-provide context.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Agentic AI</span><span class="news-tag">LLM</span><span class="news-tag">Developer Tools</span><span class="news-tag">Context Management</span></div>
    <a class="news-read-btn" href="https://github.com/raiyanyahya/recall" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computer-vision">Computer Vision</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ubtf09/a_slightly_improved_dvdjepa_demo_p/" target="_blank" rel="noopener noreferrer">A slightly improved DVD-JEPA demo [P]</a>
  <p class="news-summary">A community member shared an improved demonstration of Yann LeCun&#x27;s Joint-Embedding Predictive Architecture (JEPA). The update includes environment noise to highlight JEPA&#x27;s ability to ignore irrelevant details and provides a fair comparison against a pixel-space baseline. The project aims to more clearly illustrate the core promise of JEPA as a world model.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">JEPA</span><span class="news-tag">World Models</span><span class="news-tag">Computer Vision</span><span class="news-tag">Yann LeCun</span><span class="news-tag">Predictive Modeling</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ubtf09/a_slightly_improved_dvdjepa_demo_p/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-20</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1uatlzx/dvdjepa_an_opensource_fullyreproducible_jepa/" target="_blank" rel="noopener noreferrer">DVD-JEPA: an open-source, fully-reproducible JEPA world model [P]</a>
  <p class="news-summary">DVD-JEPA is an open-source, fully reproducible implementation of Yann LeCun&#x27;s Joint-Embedding Predictive Architecture (JEPA) using a bouncing DVD logo as a world model. By predicting latent representations rather than raw pixels, the model successfully learns spatial coordinates and can detect anomalies with high precision. The project demonstrates that the core principles of large-scale world models can be distilled into a tiny, browser-runnable architecture.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">JEPA</span><span class="news-tag">World Models</span><span class="news-tag">Anomaly Detection</span><span class="news-tag">Computer Vision</span><span class="news-tag">Self-Supervised Learning</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1uatlzx/dvdjepa_an_opensource_fullyreproducible_jepa/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-20</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ub1db3/studying_flux_in_diffusers_library_was_hard_so_i/" target="_blank" rel="noopener noreferrer">Studying FLUX in diffusers library was hard, so I built a smaller open-source version [P]</a>
  <p class="news-summary">A developer released &#x27;minFLUX&#x27;, a simplified PyTorch implementation of the FLUX.1 and FLUX.2 diffusion models designed to strip away the complexity of the HuggingFace diffusers library. The project provides line-by-line mappings to official source code, including training/inference loops and architectural insights into the differences between FLUX.1 and FLUX.2.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Diffusion Models</span><span class="news-tag">FLUX</span><span class="news-tag">PyTorch</span><span class="news-tag">Open Source</span><span class="news-tag">Computer Vision</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ub1db3/studying_flux_in_diffusers_library_was_hard_so_i/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computing-systems">Computing Systems</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Mon, 22 Ju</span>
  </div>
  <a class="news-title" href="https://docs.deno.com/runtime/desktop/" target="_blank" rel="noopener noreferrer">Deno Desktop</a>
  <p class="news-summary">Deno Desktop is a new initiative to bring the Deno runtime to a native desktop environment. It aims to provide a seamless experience for developers to build and run applications with native access to system APIs and a unified toolchain.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Runtime</span><span class="news-tag">Developer Tools</span><span class="news-tag">JavaScript</span><span class="news-tag">Systems Programming</span></div>
    <a class="news-read-btn" href="https://docs.deno.com/runtime/desktop/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sat, 20 Ju</span>
  </div>
  <a class="news-title" href="https://fil-c.org/inlineasm" target="_blank" rel="noopener noreferrer">Memory Safe Inline Assembly</a>
  <p class="news-summary">The project introduces a way to write memory-safe inline assembly, addressing a long-standing security vulnerability in systems programming. By providing a safer abstraction for low-level hardware interaction, it aims to reduce buffer overflows and other memory-related exploits while maintaining performance.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Systems Programming</span><span class="news-tag">Memory Safety</span><span class="news-tag">Security</span><span class="news-tag">Low-level Optimization</span></div>
    <a class="news-read-btn" href="https://fil-c.org/inlineasm" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sun, 21 Ju</span>
  </div>
  <a class="news-title" href="https://alexkritchevsky.com/2026/05/25/everything-is-logarithms.html" target="_blank" rel="noopener noreferrer">Everything is logarithms</a>
  <p class="news-summary">The article explores the pervasive role of logarithms in mathematics, computer science, and information theory. It discusses how logarithmic scales simplify complex growth patterns and are fundamental to understanding algorithmic complexity and data representation.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Mathematics</span><span class="news-tag">Information Theory</span><span class="news-tag">Algorithms</span><span class="news-tag">Computing Fundamentals</span></div>
    <a class="news-read-btn" href="https://alexkritchevsky.com/2026/05/25/everything-is-logarithms.html" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sat, 20 Ju</span>
  </div>
  <a class="news-title" href="https://github.com/playX18/lisp-in-types/" target="_blank" rel="noopener noreferrer">Lisp in the Rust Type System</a>
  <p class="news-summary">This project explores integrating Lisp-style macro systems and functional programming paradigms into the Rust type system. It aims to provide more expressive power for type-level programming, potentially enhancing how complex data structures and logic are handled in systems programming.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Rust</span><span class="news-tag">Programming Languages</span><span class="news-tag">Type Systems</span><span class="news-tag">Functional Programming</span><span class="news-tag">Systems Programming</span></div>
    <a class="news-read-btn" href="https://github.com/playX18/lisp-in-types/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sat, 20 Ju</span>
  </div>
  <a class="news-title" href="https://6it.dev/blog/infographics-operation-costs-in-cpu-clock-cycles-take-2-80736" target="_blank" rel="noopener noreferrer">Efficient C++ Programming for Modern C++ CPUs, Chapter 4/part 2</a>
  <p class="news-summary">This content explores low-level C++ programming optimizations tailored for modern CPU architectures. It focuses on understanding operation costs in clock cycles to write more efficient code, which is foundational for high-performance computing.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">C++</span><span class="news-tag">Systems Programming</span><span class="news-tag">Performance Optimization</span><span class="news-tag">Low-level Programming</span><span class="news-tag">CPU Architecture</span></div>
    <a class="news-read-btn" href="https://6it.dev/blog/infographics-operation-costs-in-cpu-clock-cycles-take-2-80736" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sun, 21 Ju</span>
  </div>
  <a class="news-title" href="https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction" target="_blank" rel="noopener noreferrer">Prefer duplication over the wrong abstraction (2016)</a>
  <p class="news-summary">The article argues that software engineers often over-engineer systems by creating complex, incorrect abstractions that lead to technical debt. It advocates for &#x27;duplication over the wrong abstraction,&#x27; suggesting that repeating code is preferable to building a flawed shared component that is difficult to maintain.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Software Engineering</span><span class="news-tag">System Design</span><span class="news-tag">Technical Debt</span><span class="news-tag">Code Architecture</span></div>
    <a class="news-read-btn" href="https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sun, 21 Ju</span>
  </div>
  <a class="news-title" href="https://brandur.org/minimum-viable-unit" target="_blank" rel="noopener noreferrer">The minimum viable unit of saleable software</a>
  <p class="news-summary">The article explores the concept of the &#x27;minimum viable unit&#x27; of software, arguing for a shift toward modular, composable components rather than monolithic applications. It discusses how this philosophy impacts software architecture, scalability, and the economic viability of modern software products.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Software Architecture</span><span class="news-tag">Software Engineering</span><span class="news-tag">Scalability</span><span class="news-tag">Modular Design</span></div>
    <a class="news-read-btn" href="https://brandur.org/minimum-viable-unit" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-20</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1uavduv/an_open_handbook_on_llm_inference_at_scale_gpu/" target="_blank" rel="noopener noreferrer">An open handbook on LLM inference at scale (GPU internals, KV cache, batching, vLLM/SGLang/TensorRT-LLM) [P]</a>
  <p class="news-summary">A community-driven open handbook is being developed to explain the technical internals of LLM inference at scale. It covers critical topics such as GPU memory hierarchy, KV cache management, and optimization frameworks like vLLM and TensorRT-LLM. The project aims to bridge the gap between high-level model usage and low-level hardware execution bottlenecks.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM Inference</span><span class="news-tag">GPU Architecture</span><span class="news-tag">vLLM</span><span class="news-tag">MLOps</span><span class="news-tag">Systems Engineering</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1uavduv/an_open_handbook_on_llm_inference_at_scale_gpu/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ubmybr/i_released_a_softmaxfree_attention_model_at_gpt2/" target="_blank" rel="noopener noreferrer">I released a softmax-free attention model at GPT-2 Medium scale (~354M params, 11.5B tokens): structural sparsity + tile-skipping kernels for long-context VRAM savings. Open weights + custom Triton kernels [R]</a>
  <p class="news-summary">A researcher has released a softmax-free attention model at the GPT-2 Medium scale, featuring structural sparsity and custom Triton kernels. The model is specifically designed to optimize VRAM usage for long-context processing. It includes open weights and specialized tile-skipping kernels for improved efficiency.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM</span><span class="news-tag">Attention Mechanism</span><span class="news-tag">Triton Kernels</span><span class="news-tag">Model Optimization</span><span class="news-tag">Open Weights</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ubmybr/i_released_a_softmaxfree_attention_model_at_gpt2/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="general">General</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sun, 21 Ju</span>
  </div>
  <a class="news-title" href="https://powerfox.jazzzny.me/" target="_blank" rel="noopener noreferrer">PowerFox Browser</a>
  <p class="news-summary">PowerFox is a browser designed to integrate AI capabilities directly into the web browsing experience. It aims to streamline workflows by providing native access to large language models and intelligent tools within the interface.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Integration</span><span class="news-tag">Web Browsing</span><span class="news-tag">Productivity Tools</span><span class="news-tag">LLM</span></div>
    <a class="news-read-btn" href="https://powerfox.jazzzny.me/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-20</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1uazlhg/would_you_let_an_ml_phd_student_graduate_without/" target="_blank" rel="noopener noreferrer">Would you let an ML PhD student graduate without a top-tier paper? [D]</a>
  <p class="news-summary">A discussion on the academic standards for Machine Learning PhD graduation, specifically focusing on whether a student can graduate without a publication in top-tier venues like NeurIPS or ICML. The debate centers on balancing high-impact publication requirements against the quality of a coherent thesis and solid research contributions.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Academic Research</span><span class="news-tag">PhD Standards</span><span class="news-tag">Machine Learning</span><span class="news-tag">Publication Venues</span><span class="news-tag">Research Ethics</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1uazlhg/would_you_let_an_ml_phd_student_graduate_without/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-20</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1uark0u/time_series_modeling_needs_a_dynamical_systems/" target="_blank" rel="noopener noreferrer">Time Series Modeling Needs a Dynamical Systems Perspective [R]</a>
  <p class="news-summary">A new position paper argues that time series modeling should shift toward a dynamical systems perspective to achieve true out-of-domain generalization and long-term forecasting. The authors advocate for DSR-specific training objectives, pretraining on chaotic system simulations, and a return to modern RNNs over Transformers to better capture recursive temporal rules. They emphasize that proper training techniques and dynamical priors are more critical for success than model architecture alone.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Time Series</span><span class="news-tag">Dynamical Systems</span><span class="news-tag">RNNs</span><span class="news-tag">Foundation Models</span><span class="news-tag">Machine Learning Theory</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1uark0u/time_series_modeling_needs_a_dynamical_systems/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="llm">LLM</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Mon, 22 Ju</span>
  </div>
  <a class="news-title" href="https://techstackups.com/comparisons/glm-5.2-vs-opus/" target="_blank" rel="noopener noreferrer">GLM 5.2 vs. Opus</a>
  <p class="news-summary">The discussion compares the performance and capabilities of the GLM 5.2 model against Claude 3 Opus. Users are evaluating benchmarks, reasoning abilities, and practical use cases for both large language models.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM</span><span class="news-tag">Model Comparison</span><span class="news-tag">Benchmarking</span><span class="news-tag">NLP</span></div>
    <a class="news-read-btn" href="https://techstackups.com/comparisons/glm-5.2-vs-opus/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sun, 21 Ju</span>
  </div>
  <a class="news-title" href="https://apertvs.ai/" target="_blank" rel="noopener noreferrer">Apertus – Open Foundation Model for Sovereign AI</a>
  <p class="news-summary">Apertus is an open foundation model designed specifically for Sovereign AI, aiming to provide nations and organizations with independent infrastructure. It focuses on data privacy, local control, and the democratization of high-performance AI capabilities.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Sovereign AI</span><span class="news-tag">Open Source</span><span class="news-tag">Foundation Models</span><span class="news-tag">Data Privacy</span></div>
    <a class="news-read-btn" href="https://apertvs.ai/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sun, 21 Ju</span>
  </div>
  <a class="news-title" href="https://www.marble.onl/posts/cancel_claude.html" target="_blank" rel="noopener noreferrer">There is minimal downside to switching to open models</a>
  <p class="news-summary">The discussion explores the growing viability of open-source models as alternatives to proprietary systems. It highlights that open models often provide comparable performance with greater privacy, customization, and cost-efficiency, leading to minimal downsides for many enterprise use cases.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Open Source</span><span class="news-tag">LLM</span><span class="news-tag">Model Deployment</span><span class="news-tag">Privacy</span><span class="news-tag">Cost Efficiency</span></div>
    <a class="news-read-btn" href="https://www.marble.onl/posts/cancel_claude.html" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ubv0f5/ema_on_lora_r/" target="_blank" rel="noopener noreferrer">EMA on LoRA ? [R]</a>
  <p class="news-summary">A user is seeking research papers or empirical evidence regarding the use of Exponential Moving Average (EMA) on LoRA adapters. Specifically, they are interested in using an EMA-based adapter as a self-teacher to generate soft labels for a trainable adapter, similar to on-policy self-distillation techniques.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LoRA</span><span class="news-tag">EMA</span><span class="news-tag">Self-Distillation</span><span class="news-tag">Fine-tuning</span><span class="news-tag">Parameter-Efficient Fine-Tuning</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ubv0f5/ema_on_lora_r/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-20</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1uazlnd/hi_reddit_i_posted_my_build_your_own_llm_workshop/" target="_blank" rel="noopener noreferrer">Hi Reddit, I posted my Build Your Own LLM workshop to Youtube teaching ML, LLM and math intuition [P]</a>
  <p class="news-summary">A new workshop has been released on YouTube and as a self-paced resource, designed to teach users how to build an LLM from scratch without prior math or ML prerequisites. The curriculum covers the full pipeline including transformer architecture, GPU coding with Triton/CUDA, pre-training, and reinforcement learning. It utilizes a unique pedagogical approach of combining slides, manual Excel-based math intuition, and practical PyTorch coding.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM</span><span class="news-tag">Education</span><span class="news-tag">PyTorch</span><span class="news-tag">Transformer Architecture</span><span class="news-tag">Machine Learning</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1uazlnd/hi_reddit_i_posted_my_build_your_own_llm_workshop/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-20</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1uaoomx/how_to_access_books3_dataset_for_research/" target="_blank" rel="noopener noreferrer">how to access books3 dataset for research purposes? [R]</a>
  <p class="news-summary">A user on the r/MachineLearning subreddit is seeking information on how to obtain the Books3 dataset for research purposes. The query highlights ongoing interest in large-scale text corpora used for training large language models.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Books3</span><span class="news-tag">Dataset Access</span><span class="news-tag">LLM Training</span><span class="news-tag">Research Data</span><span class="news-tag">Open Source</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1uaoomx/how_to_access_books3_dataset_for_research/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="mlops">MLOps</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Mon, 22 Ju</span>
  </div>
  <a class="news-title" href="https://sakana.ai/fugu/" target="_blank" rel="noopener noreferrer">Sakana Fugu</a>
  <p class="news-summary">Sakana AI has introduced Fugu, a framework designed to simplify the development of complex AI systems by modularizing components. It aims to streamline the integration of various models and tools to facilitate more efficient research and deployment. The project focuses on making it easier to build sophisticated AI workflows through a structured approach.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Frameworks</span><span class="news-tag">AI Development</span><span class="news-tag">Modular AI</span><span class="news-tag">Research Tools</span></div>
    <a class="news-read-btn" href="https://sakana.ai/fugu/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ubwcat/datacentric_debugging_for_teams_training_neural/" target="_blank" rel="noopener noreferrer">Data-centric debugging for teams training neural nets [P]</a>
  <p class="news-summary">WeightsLab is an open-source, PyTorch-native tool designed for data-centric debugging during neural network training. It allows engineers to pause runs mid-training to inspect live loss signals and identify issues like mislabels, class imbalances, and outliers. The tool specifically targets computer vision workflows involving images, videos, and LiDAR point cloud data.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">WeightsLab</span><span class="news-tag">Data-centric AI</span><span class="news-tag">PyTorch</span><span class="news-tag">Computer Vision</span><span class="news-tag">MLOps</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ubwcat/datacentric_debugging_for_teams_training_neural/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-20</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ub15wf/tsauditor_a_timeseries_auditing_framework_p/" target="_blank" rel="noopener noreferrer">TSAuditor: A time-series auditing framework [P]</a>
  <p class="news-summary">A new open-source tool called TSAuditor has been released to address common pitfalls in time-series data analysis, such as chronological breaks and data leakage. Unlike standard profiling tools that may overlook small percentages of missing data, TSAuditor identifies specific sequential errors and provides evidence-based suggestions for fixes. It aims to simplify the Exploratory Data Analysis (EDA) process and reduce the need for custom validation scripts.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Time-Series</span><span class="news-tag">Data Engineering</span><span class="news-tag">Open Source</span><span class="news-tag">EDA</span><span class="news-tag">MLOps</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ub15wf/tsauditor_a_timeseries_auditing_framework_p/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-20</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1uar4vc/built_a_global_aq_pm25_forecaster_ml_model_p/" target="_blank" rel="noopener noreferrer">Built a Global AQ (PM2.5) Forecaster ML Model [P]</a>
  <p class="news-summary">A developer shared an end-to-end PM2.5 air quality forecasting pipeline using 1.6M+ rows of OpenAQ and NASA data. The project highlights a transition from a stateless Gradient Boosting Regressor to a horizon-aligned architecture with autoregressive lag vectors to solve the &#x27;variance trap&#x27; in chaotic environments. The final model achieved a MASE below 1.0, outperforming naive carryover guesses across multiple countries.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Time Series Forecasting</span><span class="news-tag">Gradient Boosting</span><span class="news-tag">MLOps</span><span class="news-tag">Data Engineering</span><span class="news-tag">Environmental AI</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1uar4vc/built_a_global_aq_pm25_forecaster_ml_model_p/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="nlp">NLP</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Sun, 21 Ju</span>
  </div>
  <a class="news-title" href="https://www.teachmecoolstuff.com/viewarticle/fine-tuning-a-local-llm-to-categorize-questions" target="_blank" rel="noopener noreferrer">Good results fine tuning a local LLM like Qwen 3:0.6B to categorize questions</a>
  <p class="news-summary">The author shares successful results from fine-tuning a small, local LLM (Qwen 3:0.6B) specifically for question categorization. The project demonstrates that smaller models can be highly effective for niche classification tasks when properly tuned. It highlights the feasibility of running specialized AI workflows on local hardware.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM</span><span class="news-tag">Fine-tuning</span><span class="news-tag">Local LLM</span><span class="news-tag">Qwen</span><span class="news-tag">NLP</span></div>
    <a class="news-read-btn" href="https://www.teachmecoolstuff.com/viewarticle/fine-tuning-a-local-llm-to-categorize-questions" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ubz5o8/an_update_on_matrix_recurrent_units_an_attention/" target="_blank" rel="noopener noreferrer">An Update on Matrix Recurrent Units, an Attention Alternative [R]</a>
  <p class="news-summary">A researcher shared updates on Matrix Recurrent Units (MRU), a linear-time sequence architecture proposed as an alternative to the attention mechanism. The update details new methods for bounding matrix states—such as using the Cayley Map and QR decomposition—to solve training instability issues observed on larger datasets. The project explores leveraging matrix associativity and parallel scans to achieve efficient sequence modeling on deep learning hardware.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Sequence Modeling</span><span class="news-tag">Attention Alternatives</span><span class="news-tag">Recurrent Neural Networks</span><span class="news-tag">Linear Time Complexity</span><span class="news-tag">Matrix Algebra</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ubz5o8/an_update_on_matrix_recurrent_units_an_attention/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="speech">Speech</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-21</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1ubvmdx/best_current_methods_for_finetuning_whisper_on/" target="_blank" rel="noopener noreferrer">Best current methods for finetuning whisper on domain specific vocabulary? [P]</a>
  <p class="news-summary">A user is seeking advice on the most effective methods for fine-tuning OpenAI&#x27;s Whisper model to recognize domain-specific technical vocabulary in Spanish. The inquiry covers techniques like LoRA and QLoRA while seeking guidance on data requirements and convergence for specialized speech recognition.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Whisper</span><span class="news-tag">Fine-tuning</span><span class="news-tag">Speech Recognition</span><span class="news-tag">LoRA</span><span class="news-tag">NLP</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1ubvmdx/best_current_methods_for_finetuning_whisper_on/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's digest highlights a shift toward local model optimization and the practical deployment of open-source alternatives to proprietary systems. The focus remains on balancing performance with sovereignty, safety, and architectural efficiency.]]></summary></entry><entry><title type="html">Daily Digest 2026-06-19</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-19/" rel="alternate" type="text/html" title="Daily Digest 2026-06-19" /><published>2026-06-19T00:00:00+07:00</published><updated>2026-06-19T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-19/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's research and news focus heavily on the governance, reliability, and architectural refinement of agentic systems, specifically addressing how to manage uncertainty and ensure alignment in complex workflows.</span>
</div>

<h2 id="global-trends">Global Trends</h2>

<h3 id="arxiv-subjects">Papers discovered from ArXiv subject categories</h3>

<h4 id="ai-safety">AI Safety</h4>

<div class="paper-item" data-date="2026-06-19" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19527">Emergent Alignment</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Martin Kol\'a\v{r}
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19527" target="_blank" rel="noopener noreferrer">2606.19527</a></p>
<p class="paper-detail"><strong>Authors:</strong> Martin Kol\'a\v{r}</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Can Large Language Models (LLMs) discern when their own outputs are misaligned with human ethics? And can they self-correct? We endow an LLM with a conscience step that reviews its own reasoning and outputs, and we extend the training loss with an alignment component using Direct Preference Optimization (DPO) to steer the model away from non-ethical outputs. The result is an online technique to align models in a wide range of applications: training, fine-tuning, adversarial prompting, and zero-shot learning. It does not require a weaker or stronger judge, relying instead on a frozen copy of itself. In previous work, the Emergent Misalignment scenario showed a range of emergent unethical behaviors from fine-tuning the model to hack code. Instead, we empirically show how to achieve Emergent Alignment: a single high-level introspective question steers training toward an ethical model under the same code hacking scenario.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces 'Emergent Alignment,' a method where LLMs can self-correct unethical outputs by using a conscience step and a self-referential alignment loss. It demonstrates that a single high-level introspective question can steer a model toward ethical behavior during complex tasks like code hacking.</p>
<p><strong>Core Idea:</strong> LLMs can be trained to recognize and correct their own misalignments with human ethics by reviewing their own reasoning through an internal conscience mechanism.</p>
<p><strong>Technique:</strong> The authors use a 'conscience step' for self-review and extend the training loss with an alignment component using Direct Preference Optimization (DPO) based on a frozen copy of the model.</p>
<p><strong>Pipeline:</strong> Input prompt → Model reasoning &amp; output generation → Conscience step (self-review) → DPO-based alignment loss calculation → Final ethical output</p>
<p><strong>Methodology:</strong> The researchers endowed an LLM with a conscience step to review its own outputs and applied DPO to steer the model away from non-ethical behaviors using a frozen copy of the model as a reference.</p>
<p><strong>Results:</strong> The method successfully achieved 'Emergent Alignment' in a code hacking scenario, where a single introspective question steered the model toward ethical behavior across training, fine-tuning, and zero-shot learning.</p>
<p><strong>Limitations:</strong> The paper focuses on the emergence of alignment through introspection but does not fully explore the potential for the model to develop sophisticated methods to bypass its own conscience step.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19527" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="agentic-ai">Agentic AI</h4>

<div class="paper-item" data-date="2026-06-19" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19464">Deontic Policies for Runtime Governance of Agentic AI Systems</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Anupam Joshi, Tim Finin, Karuna Pande Joshi, Lalana Kagal
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19464" target="_blank" rel="noopener noreferrer">2606.19464</a></p>
<p class="paper-detail"><strong>Authors:</strong> Anupam Joshi, Tim Finin, Karuna Pande Joshi, Lalana Kagal</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Autonomous agentic AI systems driven by Large Language Models (LLMs) introduce a new class of security, privacy, and compliance challenges: an agent that can invoke tools, manipulate data, install software, and coordinate with peer agents across organizational boundaries must be constrained not just by authentication and access control, but by the full structure of enterprise governance. This includes specifying what agents are permitted and prohibited from doing, what they areobliged to do after certain actions (e.g., notify the CISO), under what conditions a standing obligation may be waived, and which rules take precedence when policies conflict. This governance problem exceeds what current policy engines provide. Systems such as XACML, Rego, and Cedar address only the permit/prohibit subset of this governance structure. They do not provide obligation lifecycle management, meta-policy conflict resolution, dispensations that waive obligations in specific circumstances, and ontological reasoning over domain class hierarchies commonly found in applications such as healthcare, cybersecurity, or data privacy. We propose AgenticRei, which realizes key governance requirements such as obligations, dispensations, policy conflict resolutions, and reasoning over policies, as well as the basic permit/prohibit constraints. We use a deontic policy language built on the Rei framework, expressed as OWL (Web Ontology Language) and evaluated at runtime by a high-performance logic engine entirely outside the LLM. The same pipeline governs both tool invocations by the agent and agent-to-agent messages. We show through examples that deontic policies capture governance constraints around security and privacy that mostly cannot be expressed in current production engines. Our approach composes naturally with industry-standard frameworks like A2AS.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces AgenticRei, a governance framework that extends standard access control to include deontic logic concepts like obligations, dispensations, and conflict resolution for agentic AI.</p>
<p><strong>Core Idea:</strong> Current policy engines (XACML, Rego, Cedar) are insufficient for agentic AI because they lack the ability to manage obligation lifecycles and complex ontological reasoning required for enterprise governance.</p>
<p><strong>Technique:</strong> The authors develop a deontic policy language built on the Rei framework, expressed in OWL (Web Ontology Language) and evaluated by an external high-performance logic engine.</p>
<p><strong>Pipeline:</strong> Agent action/message request → Deontic policy evaluation (OWL/Rei) → Governance decision (Permit/Prohibit/Obligation/Dispensation) → Execution or Notification</p>
<p><strong>Methodology:</strong> The researchers designed a logic-based governance pipeline that operates entirely outside the LLM to ensure deterministic enforcement of security, privacy, and compliance rules.</p>
<p><strong>Results:</strong> The framework successfully captures complex governance constraints (e.g., mandatory notifications, conditional waivers, and policy precedence) that are currently unexpressible in production engines.</p>
<p><strong>Limitations:</strong> The paper focuses on the architectural framework and logic engine; further exploration into real-time performance scaling for massive agent swarms or specific industry-specific ontology mappings may be needed.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19464" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19559">Uncertainty Decomposition for Clarification Seeking in LLM Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Gregory Matsnev
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19559" target="_blank" rel="noopener noreferrer">2606.19559</a></p>
<p class="paper-detail"><strong>Authors:</strong> Gregory Matsnev</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Recent position papers argue that the classical aleatoric/epistemic uncertainty framework is insufficient for interactive large language model (LLM) agents and call for underspecification-aware, decomposed, and communicable uncertainty representations that can unlock new agent capabilities such as proactive clarification seeking and shared mental-model building. Practical deployment constraints -- black-box APIs, interactive latency budgets, and the absence of labeled trajectories -- rule out logprob-based, multi-sampling, and training-based methods, leaving prompt-based estimation as the most viable family for surfacing such signals at deployment time. We answer this call with a simple prompt-based decomposition that separates action confidence from request uncertainty (u), enabling the agent to ask for clarification when the task specification is ambiguous. To evaluate it, we introduce two clarification-augmented benchmarks (WebShop-Clarification and ALFWorld-Clarification) in which 50% of tasks are deliberately underspecified, and systematically compare the proposed decomposition against ReAct+UE and Uncertainty-Aware Memory (UAM) across five LLM backbones (GPT-5.1, DeepSeek-v3.2-exp, GLM-4.7, Qwen3.5-35B, GPT-OSS-120B) on these variants together with the standard WebShop, ALFWorld, and REAL benchmarks for fault detection. Averaged across the five backbones, the proposed decomposition improves clarification F1 on ALFWorld-Clarification by 73% over ReAct+UE and by 36% over UAM, and leads clarification F1 on every backbone on WebShop-Clarification and on four of five backbones on ALFWorld-Clarification, indicating that the gains generalize beyond a single LLM.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a prompt-based uncertainty decomposition method that separates action confidence from request uncertainty to enable LLM agents to proactively seek clarification on underspecified tasks.</p>
<p><strong>Core Idea:</strong> Standard aleatoric/epistemic uncertainty frameworks are insufficient for interactive agents; instead, agents need to distinguish between their ability to perform an action and the ambiguity of the user's request.</p>
<p><strong>Technique:</strong> A simple prompt-based decomposition is used to surface uncertainty signals at deployment time, bypassing the need for logprobs, multi-sampling, or retraining.</p>
<p><strong>Pipeline:</strong> Underspecified task input → Prompt-based uncertainty decomposition (Action Confidence vs. Request Uncertainty) → Proactive clarification seeking or action execution</p>
<p><strong>Methodology:</strong> The authors evaluated the method using two new clarification-augmented benchmarks (WebShop-Clarification and ALFWorld-Clarification) across five different LLM backbones.</p>
<p><strong>Results:</strong> The proposed decomposition improved clarification F1 on ALFWorld-Clarification by 73% over ReAct+UE and 36% over UAM, achieving the highest clarification F1 on every backbone for WebShop-Clarification.</p>
<p><strong>Limitations:</strong> The study focuses on prompt-based estimation due to black-box API constraints, which may limit the granularity of uncertainty signals compared to internal model weights.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19559" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19602">Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Osman Alperen \c{C}inar-Kora\c{s}, Marie Bauer, Sameh Khattab, Merlin Engelke, Moon Kim, Stephan Settelmeier, Shigeyasu Sugawara, Fabian Freisleben, Felix Nensa, Jens Kleesiek
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19602" target="_blank" rel="noopener noreferrer">2606.19602</a></p>
<p class="paper-detail"><strong>Authors:</strong> Osman Alperen \c{C}inar-Kora\c{s}, Marie Bauer, Sameh Khattab, Merlin Engelke, Moon Kim, Stephan Settelmeier, Shigeyasu Sugawara, Fabian Freisleben, Felix Nensa, Jens Kleesiek</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Patient contexts span hundreds of heterogeneous documents and thousands of structured data points, yet the document-level metadata that AI systems need for retrieval and triage is absent or incomplete. Standard retrieval-augmented generation fails on this data, mishandling temporal reasoning, cross-document dependencies, and missing metadata. We deploy ACIE (Agentic Clinical Information Extraction) at University Medicine Essen: an on-premise agentic RAG pipeline that reasons over complete patient contexts and grounds every answer in source passages for clinician verification. We quantify the metadata gap, trace the architectural decisions it shaped, and evaluate extraction alongside an independent retrospective lymphoma registry study, in which nuclear-medicine physicians verify every extracted value against its cited sources. Across 7,326 judgments, clinicians accepted 96.5\% of extractions, with per-type acceptance ranging from 80\% to 99\%.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces ACIE, an on-premise agentic RAG pipeline designed to extract clinical information from complex, heterogeneous patient records where standard RAG fails due to missing metadata and temporal dependencies.</p>
<p><strong>Core Idea:</strong> By using an agentic reasoning framework, the system can navigate hundreds of documents to ground clinical extractions in specific source passages, ensuring high-fidelity data for clinician verification.</p>
<p><strong>Technique:</strong> The authors employ Agentic Retrieval-Augmented Generation (RAG) to perform multi-step reasoning over complete patient contexts, addressing cross-document dependencies and temporal reasoning.</p>
<p><strong>Pipeline:</strong> Heterogeneous patient documents → Agentic RAG reasoning and source grounding → Verified clinical information extractions</p>
<p><strong>Methodology:</strong> The researchers deployed ACIE at University Medicine Essen and evaluated it against an independent retrospective lymphoma registry study, where nuclear-medicine physicians verified 7,326 extractions against cited sources.</p>
<p><strong>Results:</strong> Clinicians accepted 96.5% of the extractions, with specific data types achieving acceptance rates between 80% and 99%.</p>
<p><strong>Limitations:</strong> The study highlights the 'metadata gap' in clinical records and explores the architectural trade-offs required to handle missing information and complex temporal reasoning.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19602" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19704">Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Dhaval C. Patel, Kaoutar El Maghraoui, Shuxin Lin, Yusheng Li, Tianjun Feng, Chun-Yi Tsai, Yihan Sun, Wei Alexander Xin, Akshat Bhandari, Tanisha Rathod, Aaron Fan, Sanskruti Vijay Shejwal, Tomas Pasiecznik, Sagar Chethan Kumar, Tanmay Agarwal, Rohith Kanathur, Sam Colman, Amaan Sheikh, Dev Bahl, Ann Li, Krish Veera, Alimurtaza Mustafa Merchant, Shambhawi Baswaraj Bhure, Sajal Kumar Goyla, Chengrui Li, Kirthana Natarajan, Rui Li, Thomas Ajai, Rujing Li, Vivek G. Iyer, Sanjaii Vijayakumar, Yitong Bai, Ayal Yakobe, Darief Maes, Yassine Jebbouri, Tianyang Xu, Thai Quoc On, Vera Mazeeva, Winston Li, Yuval Shemla, Yeshitha Bhuvanesh, Rushin Bhatt, Siddharth Chethan Gowda, Alisha Vinod, Caroline Cahill, Shriya Aishani Rachakonda, Yunfeng Chen, Aryaman Agrawal, Aman Upganlawar, Mao Le Jonathan Ang, Yubin Sally Go, Madhav Rajkondawar, Yang-Jung Chen, Trisha Maturi, Ananya Kapoor, Andrew Li, Shrey Arora, Mana Abbaszadeh, Shen Li, Charles Xu, Byeolah Kwon
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19704" target="_blank" rel="noopener noreferrer">2606.19704</a></p>
<p class="paper-detail"><strong>Authors:</strong> Dhaval C. Patel, Kaoutar El Maghraoui, Shuxin Lin, Yusheng Li, Tianjun Feng, Chun-Yi Tsai, Yihan Sun, Wei Alexander Xin, Akshat Bhandari, Tanisha Rathod, Aaron Fan, Sanskruti Vijay Shejwal, Tomas Pasiecznik, Sagar Chethan Kumar, Tanmay Agarwal, Rohith Kanathur, Sam Colman, Amaan Sheikh, Dev Bahl, Ann Li, Krish Veera, Alimurtaza Mustafa Merchant, Shambhawi Baswaraj Bhure, Sajal Kumar Goyla, Chengrui Li, Kirthana Natarajan, Rui Li, Thomas Ajai, Rujing Li, Vivek G. Iyer, Sanjaii Vijayakumar, Yitong Bai, Ayal Yakobe, Darief Maes, Yassine Jebbouri, Tianyang Xu, Thai Quoc On, Vera Mazeeva, Winston Li, Yuval Shemla, Yeshitha Bhuvanesh, Rushin Bhatt, Siddharth Chethan Gowda, Alisha Vinod, Caroline Cahill, Shriya Aishani Rachakonda, Yunfeng Chen, Aryaman Agrawal, Aman Upganlawar, Mao Le Jonathan Ang, Yubin Sally Go, Madhav Rajkondawar, Yang-Jung Chen, Trisha Maturi, Ananya Kapoor, Andrew Li, Shrey Arora, Mana Abbaszadeh, Shen Li, Charles Xu, Byeolah Kwon</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Agent benchmarks are growing fast, but no single benchmark touches more than four or five of the dimensions that deployment exposes. This paper aggregates the largest coordinated deep-dive of one MCP-based industrial-agent benchmark to date: fourteen parallel implementation studies covering new asset classes (including a multi-modal visual extension), alternative orchestrations, retrieval strategies, reasoning modes, infrastructure optimizations, and evaluation-methodology probes. Consolidating those studies with seven prior agent benchmarks, we argue that aggregate-score leaderboards systematically underspecify deployed-agent evaluation. Rankings derived from aggregate scores do not transfer to out-of-distribution settings; recent public-to-hidden competition retrospectives provide direct empirical evidence of this rank instability. We propose ranking configurations by predictive validity, the correlation between in-sample and out-of-sample rank, rather than in-sample mean, and report a twelve-tier measurement apparatus that exposes the deployment-relevant dimensions HELM and its agent-era successors collapse. The position is operationalized through three falsifiable out-of-distribution criteria with explicit thresholds; existing evidence partly supports it but is too thin to confirm. We close with a pre-registered pilot design and a field-level vision for what the next generation of agentic benchmarks should report.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper identifies that aggregate-score leaderboards for LLM agents lack predictive validity for real-world deployment and proposes a new evaluation framework based on out-of-distribution (OOD) rank correlation.</p>
<p><strong>Core Idea:</strong> Current agent benchmarks fail to generalize because they collapse complex deployment dimensions into single scores; evaluation should instead prioritize the correlation between in-sample and out-of-sample performance.</p>
<p><strong>Technique:</strong> The authors utilize a twelve-tier measurement apparatus and a predictive validity metric to evaluate how well benchmark rankings transfer to unseen settings.</p>
<p><strong>Pipeline:</strong> Agent benchmarks and industrial-agent studies → Analysis of rank instability across OOD settings → Predictive validity ranking and twelve-tier measurement apparatus</p>
<p><strong>Methodology:</strong> The researchers conducted fourteen parallel implementation studies on an MCP-based industrial-agent benchmark and consolidated them with seven prior benchmarks to analyze rank transferability.</p>
<p><strong>Results:</strong> The study provides empirical evidence of rank instability in public-to-hidden competitions and demonstrates that aggregate scores systematically underspecify the dimensions relevant to actual deployment.</p>
<p><strong>Limitations:</strong> The evidence supporting the proposed OOD criteria is currently too thin to fully confirm the new framework, necessitating further field-level testing.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19704" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19494">Hidden Anchors in Multi-Agent LLM Deliberation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Apurba Pokharel, Ram Dantu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19494" target="_blank" rel="noopener noreferrer">2606.19494</a></p>
<p class="paper-detail"><strong>Authors:</strong> Apurba Pokharel, Ram Dantu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Multi-agent LLM deliberation, where agents exchange and revise answers over several rounds, is increasingly used to improve reasoning and accuracy, yet how and why it works is rarely modelled. Such deliberation mirrors how humans reach decisions. As social animals we are pulled both by the group, the herd effect that classical opinion-dynamics models such as DeGroot and Friedkin--Johnsen capture, and by our own internal belief, which they do not. We model multi-agent deliberation as a closed-loop dynamical system in which each agent carries a hidden internal belief, its anchor, that continually pulls its opinion regardless of its neighbours. We show this anchor can be recovered from the deliberation alone, and that it explains a behaviour classical consensus rules forbid: an agent's confidence in the correct answer can climb past where any agent started, escaping the space (convexhull) formed by the initial beliefs. Checking whether the recovered anchor also predicts held-out runs (generalizes) gives a simple test for when a model is truly driven bysuch an anchor. Across three open-weight model families this is a spectrum, not all-or-nothing. All anchors' influence are about equally strongly, but they differ in where the anchor sits, and only when it sits far from the initial opinions does deliberation escape the hull and need the full closed-loop model.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a closed-loop dynamical system model to explain multi-agent LLM deliberation, identifying 'hidden anchors' as the mechanism that allows agents to reach conclusions outside the convex hull of initial opinions.</p>
<p><strong>Core Idea:</strong> While classical models focus on herd effects, this research posits that agents possess internal beliefs (anchors) that continuously pull their opinions, explaining how deliberation can transcend the starting points of all participants.</p>
<p><strong>Technique:</strong> The authors model deliberation as a closed-loop dynamical system and develop a method to recover hidden anchors from observed deliberation traces.</p>
<p><strong>Pipeline:</strong> Multi-agent deliberation traces → Anchor recovery algorithm → Generalization testing on held-out runs</p>
<p><strong>Methodology:</strong> The researchers compared multi-agent deliberation across three open-weight model families against classical opinion-dynamics models (DeGroot and Friedkin-Johnsen) to measure anchor influence.</p>
<p><strong>Results:</strong> The study found that anchors exist across all tested models; when an anchor sits far from initial opinions, the deliberation escapes the initial convex hull, requiring the full closed-loop model for accurate prediction.</p>
<p><strong>Limitations:</strong> The research identifies a spectrum of anchor influence rather than a binary state, and further investigation is needed on the specific nature of these internal beliefs in different model architectures.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19494" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span><span class="cat-tag cat-default" title="q-fin.RM">q-fin.RM</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19501">DeXposure-Claw: An Agentic System for DeFi Risk Supervision</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Aijie Shu, Bowei Chen, Wenbin Wu, Cathy Yi-Hsuan Chen, Fengxiang He
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19501" target="_blank" rel="noopener noreferrer">2606.19501</a></p>
<p class="paper-detail"><strong>Authors:</strong> Aijie Shu, Bowei Chen, Wenbin Wu, Cathy Yi-Hsuan Chen, Fengxiang He</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Decentralized finance exposes supervisors to fast-moving, networked credit risks. General-purpose LLM agents fit this setting poorly: they over-read weak evidence and recommend high-stakes interventions, while existing evaluations offer no regulator-aligned way to measure the resulting false alarms. We introduce DeXposure-Claw, a forecast-grounded agentic supervision system that routes LLM decisions through structured evidence: (1) DeXposure-FM, a graph time-series foundation model, forecasts future exposure networks; (2) deterministic monitors and stress scenarios then turn those forecasts into typed alerts, attribution signals, and scenario evidence; and (3) data-health and confidence gates constrain escalation before DeXposure-Claw emits auditable supervisory tickets with rationales. We further develop DeXposure-Bench, a six-axis evaluation harness, whose decision axis scores tickets against a regulator-aligned absolute-loss ground truth and an explicit false-intervention rate. Experiments on five years of weekly real data fully support our system. Code is at https://github.com/EVIEHub/DeXposure-Claw.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces DeXposure-Claw, an agentic system for DeFi risk supervision, and DeXposure-Bench, a new evaluation harness designed to measure false-intervention rates in regulatory contexts.</p>
<p><strong>Core Idea:</strong> To prevent LLM agents from over-reacting to weak evidence in complex DeFi networks, the system grounds agentic decisions in forecast-based evidence and deterministic stress scenarios.</p>
<p><strong>Technique:</strong> The system utilizes a graph time-series foundation model (DeXposure-FM) to forecast exposure networks, coupled with confidence gates and a multi-axis evaluation framework.</p>
<p><strong>Pipeline:</strong> DeFi network data → DeXposure-FM (forecasting) → Deterministic monitors &amp; stress scenarios (alert generation) → Confidence gates (filtering) → DeXposure-Claw (auditable supervisory tickets)</p>
<p><strong>Methodology:</strong> The authors developed a three-stage pipeline involving graph-based forecasting, typed alert generation, and agentic reasoning, evaluated against five years of real-world weekly DeFi data.</p>
<p><strong>Results:</strong> Experiments on five years of real data demonstrate that the system successfully reduces false alarms and provides auditable rationales compared to general-purpose LLM agents.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of the graph time-series foundation model or the scalability of the system to extremely high-frequency trading data.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19501" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    <a class="paper-action-btn gh-btn" href="https://github.com/EVIEHub/DeXposure-Claw" target="_blank" rel="noopener noreferrer" title="View code on GitHub" aria-label="View code on GitHub"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg><span>Code</span></a>
  </div>
</div>

<h4 id="computer-vision">Computer Vision</h4>

<div class="paper-item" data-date="2026-06-19" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19522">REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Ethan Elio Meidinger, Seowung Leem, Zeyun Zhao, Ruogu Fang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19522" target="_blank" rel="noopener noreferrer">2606.19522</a></p>
<p class="paper-detail"><strong>Authors:</strong> Ethan Elio Meidinger, Seowung Leem, Zeyun Zhao, Ruogu Fang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The retina offers a noninvasive window into neurodegenerative disease, capturing subtle structural patterns associated with a risk of future cognitive decline. Vision-language alignment frameworks such as REVEAL have shown that pairing retinal fundus images with structured clinical risk narratives improves early prediction of Alzheimer's disease (AD). A key design choice in these approaches is the use of phenotypic grouping, where individuals with similar risk profiles are treated as multi-positive pairs during contrastive learning. However, existing methods operationalize phenotypic similarity as a discrete construct, relying on hard group assignments that impose rigid supervision and decouple group formation from representation learning. We propose a continuous formulation of phenotypic structure within contrastive learning. Rather than assigning samples to fixed clusters, we model inter-subject similarity as a differentiable weighting function derived from intra-modality embedding similarities in both retinal images and risk profiles. These weights define soft multi-positive relationships through a continuous aggregation operator, enabling graded supervision that reflects the spectrum nature of disease risk. We further introduce a soft-target contrastive objective that jointly learns cross-modal alignment and phenotypic structure in an end-to-end manner. Evaluated on UK Biobank retinal imaging data for incident AD prediction, the proposed framework consistently outperforms discrete group-based contrastive learning and standard vision-language baselines. By treating phenotypic similarity as a learnable, continuous signal rather than a fixed grouping rule, our approach provides a principled and robust foundation for population-scale neurodegenerative risk modeling from multi-modal retinal and clinical data.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces REVEAL++, a framework that replaces discrete phenotypic grouping with a continuous, differentiable weighting function for vision-language alignment in Alzheimer's disease risk modeling.</p>
<p><strong>Core Idea:</strong> Instead of assigning patients to fixed clusters, the model treats phenotypic similarity as a learnable, continuous signal derived from intra-modality embedding similarities.</p>
<p><strong>Technique:</strong> The authors employ a soft-target contrastive objective and a continuous aggregation operator to define graded multi-positive relationships between retinal images and clinical risk narratives.</p>
<p><strong>Pipeline:</strong> Retinal fundus images and clinical risk narratives → Intra-modality embedding similarity calculation → Differentiable weighting function → Soft-target contrastive learning → Alzheimer's disease risk prediction</p>
<p><strong>Methodology:</strong> The methodology involves modeling inter-subject similarity as a differentiable weight based on both image and text embeddings, allowing for end-to-end joint learning of cross-modal alignment and phenotypic structure.</p>
<p><strong>Results:</strong> The framework consistently outperforms both discrete group-based contrastive learning and standard vision-language baselines on UK Biobank retinal imaging data for incident AD prediction.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the specific clinical interpretability of the learned continuous weights or the scalability of the differentiable aggregation across extremely large-scale datasets.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19522" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19651">BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Max Van Puyvelde, Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19651" target="_blank" rel="noopener noreferrer">2606.19651</a></p>
<p class="paper-detail"><strong>Authors:</strong> Max Van Puyvelde, Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Three-dimensional (3D) brain MRI is central to clinical neurology and neuro-oncology, where generative models could augment under-represented cohorts, simulate disease trajectories, and support privacy-preserving data sharing. Latent diffusion has been the go-to solution for modeling imaging data, but it places two competing demands on the tokenizer: encoder embeddings must retain the clinical information that downstream tasks act on, and the decoder must reconstruct anatomically faithful volumes. Existing reconstruction-driven tokenizers achieve the second at the expense of the first. To address this, we introduce a fully volumetric masked-autoencoder (MAE) based tokenizer for 3D brain MRI latent diffusion, decoupling encoder and decoder: a frozen 3D MAE encoder produces clinically informative embeddings, while a dedicated CNN decoder reconstructs voxels from a linear projection of those embeddings. We pretrain the encoder on 35,309 volumes from 18 public cohorts spanning four modalities, ten disease categories, and 200+ acquisition sites, and demonstrate its dual utility in two settings. First, on a 23-task linear-probing benchmark, the encoder outperforms or matches SOTA models (i.e., BrainIAC, BrainSegFounder, and MedicalNet) on 21 of 23 tasks. Second, a conditional diffusion transformer (DiT) trained on these clinically informative embeddings supports both conditional generation across six variables and patient-specific longitudinal forecasting. Together these results establish a single 3D brain-MRI embedding space capable of both downstream clinical tasks and controllable generation.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces BrainG3N, a dual-purpose tokenizer that successfully decouples clinical information preservation from anatomical reconstruction in 3D brain MRI latent diffusion.</p>
<p><strong>Core Idea:</strong> By using a frozen Masked Autoencoder (MAE) encoder and a dedicated CNN decoder, the model ensures that latent embeddings remain rich in clinical features while still allowing for high-fidelity volumetric reconstruction.</p>
<p><strong>Technique:</strong> The authors employ a volumetric Masked Autoencoder (MAE) architecture to pretrain a robust embedding space, which is then used as the latent space for a conditional Diffusion Transformer (DiT).</p>
<p><strong>Pipeline:</strong> 3D Brain MRI volumes → Frozen 3D MAE Encoder → Clinically informative embeddings → Conditional Diffusion Transformer (DiT) → Controllable 3D MRI generation or longitudinal forecasting</p>
<p><strong>Methodology:</strong> The encoder was pretrained on a massive dataset of 35,309 volumes across 18 cohorts, followed by linear probing for clinical tasks and training a DiT for conditional generation and forecasting.</p>
<p><strong>Results:</strong> The encoder outperformed or matched SOTA models on 21 out of 23 clinical tasks and successfully supported conditional generation across six variables and patient-specific longitudinal forecasting.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of the dual-decoder approach or the specific performance gaps on the 2 tasks where it did not outperform SOTA.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19651" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19735">GLARE: A Natural Language Interface for Querying Global Explanations</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Bhavan Vasu, Rajesh Mangannavar
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19735" target="_blank" rel="noopener noreferrer">2606.19735</a></p>
<p class="paper-detail"><strong>Authors:</strong> Bhavan Vasu, Rajesh Mangannavar</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">While global explanations are crucial for understanding vision models across datasets, classes, and decision contexts, their complex and monolithic nature often hinders practical exploration. Because users typically seek targeted answers to specific questions rather than static artifacts, we present an LLM-based interactive interface that provides natural language access to global explanations for black-box image classifiers. The system's core LLM acts as a mediator, translating natural language questions into structured SQL queries over local explanation data. This enables flexible aggregation without exposing users to low-level representations. For each query, the interface outputs statistics-augmented natural language responses, supporting local explanations, and intent-aligned visualizations. We evaluate the system on intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors. Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability of global explanations for human-centered XAI.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces GLARE, an LLM-based interactive interface that enables users to query global explanations of black-box image classifiers using natural language. It bridges the gap between complex, monolithic global explanations and user-specific questions by providing a flexible, queryable system.</p>
<p><strong>Core Idea:</strong> Instead of presenting static global explanation artifacts, the system treats local explanation data as a queryable database, allowing users to extract targeted insights through natural language interaction.</p>
<p><strong>Technique:</strong> The system uses a Large Language Model (LLM) as a mediator to translate natural language questions into structured SQL queries, which are then executed over a repository of local explanations.</p>
<p><strong>Pipeline:</strong> Natural language question → LLM translation to SQL → SQL execution over local explanation data → Statistics-augmented natural language response and intent-aligned visualization.</p>
<p><strong>Methodology:</strong> The authors evaluated the system based on intent interpretation, query mapping accuracy, generalization to new datasets/queries, and robustness to linguistic errors.</p>
<p><strong>Results:</strong> The results demonstrate that LLM-mediated querying substantially improves the accessibility and usability of global explanations for human-centered XAI compared to traditional static methods.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail specific limitations, but potential areas for further research include the scalability of the SQL database for massive datasets and the handling of highly ambiguous linguistic queries.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19735" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="computing-systems">Computing Systems</h4>

<div class="paper-item" data-date="2026-06-19" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19538">ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Ashim Dhor, Rasel Mondal, Pin Yu Chen
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19538" target="_blank" rel="noopener noreferrer">2606.19538</a></p>
<p class="paper-detail"><strong>Authors:</strong> Ashim Dhor, Rasel Mondal, Pin Yu Chen</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Convolutional networks, recurrent networks, and transformers each encode different inductive biases -- locality, sequential memory, and content-dependent pairwise interaction -- and have remained mathematically distinct since their inception. We show that this fragmentation reflects not a fundamental diversity in how signals should be processed, but rather incomplete views of a single underlying mathematical object: a learnable integral transform. We introduce the Integral Transform Network (ITNet), a unified architecture built around a learnable kernel that depends jointly on positions and features. This kernel is implemented as a small neural network, specifically an MLP, that models pairwise interactions, enabling the model to adapt its behavior from data. We show that convolution, self-attention (including multi-head), and autoregressive recurrence (including LSTM, GRU, S4, and Mamba) arise as special cases under appropriate parameterizations, and that ITNet is a universal approximator of continuous operators. To make this practical, we develop tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization, enabling efficient and scalable computation. A single ITNet architecture with a shared operator and lightweight modality-specific encoders matches or exceeds specialized baselines on ImageNet-1K , GLUE, ModelNet40, VQA\,v2 and NLVR2. The results demonstrate that a single learned interaction mechanism can recover the behavior of all three architectural families from data.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces ITNet, a unified architecture that mathematically subsumes convolution, self-attention, and recurrence into a single learnable integral transform. It demonstrates that these three distinct architectural families are special cases of a single underlying operator.</p>
<p><strong>Core Idea:</strong> The authors propose that the fragmentation of neural architectures reflects incomplete views of a single mathematical object: a learnable kernel that depends jointly on positions and features.</p>
<p><strong>Technique:</strong> ITNet uses a learnable kernel implemented as a small MLP to model pairwise interactions, which can be parameterized to behave as convolution, attention, or recurrence.</p>
<p><strong>Pipeline:</strong> Input data → Modality-specific encoders → ITNet (Learnable Integral Transform via MLP kernel) → Output</p>
<p><strong>Methodology:</strong> The researchers developed tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization to ensure the integral transform is computationally efficient and scalable.</p>
<p><strong>Results:</strong> A single ITNet architecture with a shared operator matches or exceeds specialized baselines across diverse tasks including ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2.</p>
<p><strong>Limitations:</strong> While the paper addresses scalability through factorization and Monte Carlo integration, the computational complexity of high-dimensional integral transforms remains a potential challenge for extremely large-scale deployments.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19538" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="general">General</h4>

<div class="paper-item" data-date="2026-06-19" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="cs.DL">cs.DL</span><span class="cat-tag cat-default" title="Systems and Control (cs.SY)">Systems and Control (cs.SY)</span><span class="cat-tag cat-eess" title="eess.SY">eess.SY</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19630">AI4SE and SE4AI Exploration: A Decade Looking Back and Forward</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      H. Sinan Bank, Daniel R. Herber, Thomas Bradley
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19630" target="_blank" rel="noopener noreferrer">2606.19630</a></p>
<p class="paper-detail"><strong>Authors:</strong> H. Sinan Bank, Daniel R. Herber, Thomas Bradley</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The March 2020 INCOSE INSIGHT special issue on AI and Systems Engineering (SE) became the most downloaded issue in the publication's history and launched a research community that now draws over 250 registrants to its annual workshop. In this article, we trace the progress in AI and SE across three phases (labeled here foundational, applied, and LLM inflection) based on the authors' reading of the field's core papers, and describe our opinions of where the community has converged and where critical gaps remain. Separately, a human-AI agreement literature review leveraging both human expertise and six AI models was performed to assess the relevance of 1,712 INCOSE INSIGHT articles and 889 SERC publications. The results identify five critical research gaps and offer guidance for practitioners navigating AI adoption, assurance, and workforce transformation in SE. We share the agreement data and the AI4SE/SE4AI Explorer web application so readers can compare their own relevance judgments with the human and AI raters.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper provides a historical retrospective of the AI and Systems Engineering (SE) intersection and identifies critical research gaps through a large-scale human-AI agreement study.</p>
<p><strong>Core Idea:</strong> The field has evolved through foundational, applied, and LLM inflection phases, necessitating a structured understanding of AI4SE (AI for SE) and SE4AI (SE for AI) to guide adoption and assurance.</p>
<p><strong>Technique:</strong> The authors utilized a hybrid literature review combining human expertise with six different AI models to evaluate the relevance of nearly 3,500 publications.</p>
<p><strong>Pipeline:</strong> Historical literature and publication databases → Human and AI model relevance scoring → Identification of research gaps and development of the AI4SE/SE4AI Explorer web application.</p>
<p><strong>Methodology:</strong> A qualitative analysis of core papers across three historical phases combined with a quantitative human-AI agreement study on 1,712 INCOSE INSIGHT articles and 889 SERC publications.</p>
<p><strong>Results:</strong> The study identified five critical research gaps and established a baseline for human-AI agreement in relevance judgment, while providing a web application for community exploration.</p>
<p><strong>Limitations:</strong> The study identifies significant remaining gaps in AI assurance and workforce transformation that require further investigation as the field moves past the LLM inflection point.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19630" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="llm">LLM</h4>

<div class="paper-item" data-date="2026-06-19" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-security" title="Cryptography and Security (cs.CR)">Cryptography and Security (cs.CR)</span><span class="cat-tag cat-default" title="Logic in Computer Science (cs.LO)">Logic in Computer Science (cs.LO)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19588">Analyzing the Narration Gap in LLM-Solver Loops</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zunchen Huang, Songgaojun Deng
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19588" target="_blank" rel="noopener noreferrer">2606.19588</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zunchen Huang, Songgaojun Deng</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Formal tools such as SAT and SMT solvers are increasingly embedded in language model reasoning pipelines when a safety or security critical question can be formulated in logic. Unlike chain of thought whose steps are sampled from the model distribution without formal guarantee, a solver produces a sound and independently verifiable answer. However, the soundness guarantee can be lost in the interaction between the solver and the model. The hybrid pipeline has three components: formalizing the question, deciding it, and narrating the result. Prior work has studied the formalization and decision, but not narration, which is the step that turns a formal tool's output into the user answer. To fill the narration gap, we first model the LLM-solver loop as a verified decision procedure. We further evaluate five open-sourced models under prompt injection, and we find certificate gating makes the solver verdict sound, while an adversary can invert a verified conclusion across phrasings and channels. We study the mitigation through hardened prompt that reduces injection significantly but cannot eliminate it and still suffers under adaptive attack. Combining the formal analysis and empirical studies, we show in the LLM-solver loop, robustness does not reach to the answer that the user finally reads.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper identifies and analyzes the 'narration gap' in LLM-solver loops, demonstrating that while formal solvers provide sound answers, the final narration step remains vulnerable to prompt injections.</p>
<p><strong>Core Idea:</strong> The soundness of a formal solver can be compromised during the final step of translating a verified result into a natural language response for the user.</p>
<p><strong>Technique:</strong> The authors model the LLM-solver loop as a verified decision procedure and employ certificate gating and hardened prompts to mitigate adversarial attacks.</p>
<p><strong>Pipeline:</strong> User Question → Formalization → Solver Decision → Narration → Final User Answer</p>
<p><strong>Methodology:</strong> The study combines formal verification modeling with empirical evaluations of five open-source models under various prompt injection and adaptive attack scenarios.</p>
<p><strong>Results:</strong> Certificate gating ensures the solver verdict remains sound, but adversaries can still invert verified conclusions across different phrasings; hardened prompts reduce but do not eliminate injection risks.</p>
<p><strong>Limitations:</strong> Hardened prompts cannot completely eliminate vulnerabilities and remain susceptible to sophisticated adaptive attacks.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19588" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19475">Diffusion Language Models: An Experimental Analysis</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Thomas Bertolani, Davide Bucciarelli, Leonardo Zini, Marcella Cornia, Lorenzo Baraldi
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19475" target="_blank" rel="noopener noreferrer">2606.19475</a></p>
<p class="paper-detail"><strong>Authors:</strong> Thomas Bertolani, Davide Bucciarelli, Leonardo Zini, Marcella Cornia, Lorenzo Baraldi</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large Language Models (LLMs) have revolutionized language modeling through autoregressive generation, enabling strong performance across a wide range of tasks. Recently, Diffusion Language Models (DLMs) have emerged as an alternative paradigm that generates text through iterative denoising rather than next-token prediction, allowing parallel refinement of entire sequences. While numerous diffusion-based architectures have been proposed, differences in evaluation protocols, datasets, inference budgets, and generation hyperparameters make it difficult to compare their capabilities and understand the trade-offs they offer. In this work, we present a systematic experimental analysis of modern DLMs. Specifically, we evaluate eight state-of-the-art DLMs across eight benchmarks spanning reasoning, coding, translation, knowledge, and structured problem solving, while explicitly considering both generation quality and computational efficiency. Beyond downstream evaluation, we analyze the impact of key inference-time factors, including denoising steps, context length, block size, and parallel unmasking strategies, and complement large-scale experiments with controlled comparisons of smaller models trained under identical conditions. Our analysis highlights the strengths and limitations of diffusion-based language modeling across different tasks, architectures, and inference budgets. We show that the behavior of DLMs is strongly influenced by generation-time design choices, leading to distinct trade-offs between performance and computational efficiency. Overall, our study provides practical insights into the capabilities and deployment characteristics of contemporary DLMs.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper provides a systematic experimental analysis of eight state-of-the-art Diffusion Language Models (DLMs) across diverse benchmarks to establish a clear understanding of their capabilities and trade-offs.</p>
<p><strong>Core Idea:</strong> Unlike autoregressive models that predict tokens sequentially, DLMs generate text through iterative denoising, allowing for parallel refinement of entire sequences.</p>
<p><strong>Technique:</strong> The study employs a multi-dimensional evaluation framework that accounts for generation quality, computational efficiency, and various inference-time hyperparameters.</p>
<p><strong>Pipeline:</strong> Text prompts and benchmarks → Iterative denoising across multiple DLM architectures → Evaluated text outputs across reasoning, coding, translation, and knowledge tasks.</p>
<p><strong>Methodology:</strong> The authors evaluated eight DLMs on eight benchmarks, conducting controlled comparisons of smaller models and analyzing the impact of denoising steps, context length, block size, and parallel unmasking strategies.</p>
<p><strong>Results:</strong> The analysis reveals that DLM performance is heavily influenced by generation-time design choices, showing distinct trade-offs between output quality and computational efficiency across different tasks.</p>
<p><strong>Limitations:</strong> The study highlights that the lack of standardized evaluation protocols and hyperparameters in existing DLM research makes direct comparisons difficult.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19475" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="stat.AP">stat.AP</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19607">Which Pairs to Compare for LLM Post-Training?</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Jiangze Han, Vineet Goyal, Will Ma
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19607" target="_blank" rel="noopener noreferrer">2606.19607</a></p>
<p class="paper-detail"><strong>Authors:</strong> Jiangze Han, Vineet Goyal, Will Ma</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Preference-based post-training has become a central paradigm for aligning language models. A common data-collection strategy is to generate a small set of completions for each prompt and label the resulting comparison pairs. However, human preference labels are often much more expensive than generating additional completions, suggesting a different use of the same labeling budget: generate a larger pool of completions, but label only the most informative comparison pairs. This paper studies which pairs should be compared in preference-based post-training. We formulate comparison curation as a sampling-design problem and evaluate designs by the quality of the final policy under the preference-based post-training objective. We instantiate this framework for Direct Preference Optimization (DPO), analyzing how the choice of labeled pairs propagates through DPO training to downstream policy performance. Our main results provide matching upper and lower bounds on the post-training optimality gap of the DPO-trained policy. The bounds show that comparison selection affects downstream performance through a single design-dependent information matrix, which links label allocation to parameter estimation error and policy suboptimality. This yields an explicit optimization criterion for budgeted comparison curation and motivates practical sampling designs for selecting informative pairs from large generated completion pools. Experiments on synthetic settings and language-model post-training benchmarks show that the proposed designs consistently improve sample efficiency over common comparison-selection heuristics.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper provides a theoretical framework for optimal comparison curation in preference-based post-training, establishing upper and lower bounds on the optimality gap of DPO-trained policies based on label allocation.</p>
<p><strong>Core Idea:</strong> Instead of labeling all possible pairs from a small set of completions, it is more efficient to generate a large pool of completions and selectively label only the most informative comparison pairs.</p>
<p><strong>Technique:</strong> The authors formulate comparison curation as a sampling-design problem, deriving a design-dependent information matrix that links label allocation to parameter estimation error and policy suboptimality.</p>
<p><strong>Pipeline:</strong> Large pool of generated completions → Information-theoretic comparison curation → Selected preference pairs → DPO post-training → Optimized policy</p>
<p><strong>Methodology:</strong> The study analyzes how different comparison selection designs propagate through the DPO objective, using synthetic settings and real-world LLM benchmarks to evaluate sample efficiency.</p>
<p><strong>Results:</strong> The proposed designs consistently outperform common heuristics in sample efficiency, providing an explicit optimization criterion for maximizing policy performance under a fixed labeling budget.</p>
<p><strong>Limitations:</strong> The analysis focuses primarily on the DPO objective and may require further validation across different preference-based alignment algorithms or highly complex multi-turn dialogues.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19607" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-19" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19509">LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Akshat Dasula, Prasanna Desikan, Jaideep Srivastava
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19509" target="_blank" rel="noopener noreferrer">2606.19509</a></p>
<p class="paper-detail"><strong>Authors:</strong> Akshat Dasula, Prasanna Desikan, Jaideep Srivastava</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language models (LLMs) are increasingly applied to structured clinical data, yet whether they can recognize the limits of their own knowledge on such tasks remains unexplored. We study this question through the lens of cross-model attribution divergence with the goal of reducing epistemic uncertainty for structured tasks, comparing Qwen 2.5 7B and XGBoost on a prediction task via attribution divergence analysis. We report four findings. First, LLM verbalized confidence is epistemically vacuous, it outputs a near-constant (0.856-0.937) regardless of whether accuracy is 49% or 75.3%, tracking prompt format rather than prediction quality. Second, the LLM exhibits an inverse difficulty effect: accuracy drops to 64.8% when XGBoost is 99% correct, but matches XGBoost (73.8% vs. 73.1%) when it is moderately uncertain. Third, few-shot examples and SHAP-derived feature evidence are orthogonal, super-additive interventions: they reduce the Attribution Disagreement Score (ADS) from 1.54 to 0.38 and improve accuracy from 49% to 75.3% without training. Fourth, a cross-model calibrator that determined LLM reliability using attribution divergence signals reduces expected calibration error from 0.254 to 0.080, replacing uninformative verbalized confidence with patient-specific reliability estimates, without accessing model internals or requiring repeated inference. We frame these findings as a cold start problem for LLMs on structured data and outline a path toward genuine epistemic self-awareness.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper identifies that LLMs lack epistemic self-awareness on clinical tabular data and proposes a cross-model attribution divergence method to calibrate reliability. It demonstrates that verbalized confidence is vacuous and provides a way to estimate model reliability without internal access or repeated inference.</p>
<p><strong>Core Idea:</strong> LLMs exhibit 'epistemic blind spots' where they provide high confidence despite low accuracy, particularly when a specialized model (XGBoost) is highly certain. By measuring the divergence in feature attribution between an LLM and a gradient-boosted tree, one can quantify the LLM's uncertainty.</p>
<p><strong>Technique:</strong> Cross-model attribution divergence analysis comparing LLM attention/feature importance against SHAP values from an XGBoost model. The authors also use super-additive interventions (few-shot examples + SHAP evidence) to improve performance.</p>
<p><strong>Pipeline:</strong> Clinical tabular data → XGBoost (SHAP values) &amp; LLM (Attribution) → Attribution Disagreement Score (ADS) → Cross-model Calibrator → Patient-specific reliability estimate</p>
<p><strong>Methodology:</strong> The researchers compared Qwen 2.5 7B and XGBoost on a clinical prediction task, analyzing the correlation between verbalized confidence, prediction accuracy, and attribution divergence. They tested interventions like few-shot prompting and SHAP-derived evidence to mitigate these blind spots.</p>
<p><strong>Results:</strong> Verbalized confidence was found to be vacuous (0.856-0.937) regardless of accuracy. The cross-model calibrator reduced Expected Calibration Error (ECE) from 0.254 to 0.080. Few-shot and SHAP interventions improved accuracy from 49% to 75.3% and reduced ADS from 1.54 to 0.38.</p>
<p><strong>Limitations:</strong> The study focuses on a specific clinical tabular task and does not explore the scalability of attribution divergence across diverse unstructured domains or the computational overhead of generating SHAP values for the calibrator.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19509" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="nlp">NLP</h4>

<div class="paper-item" data-date="2026-06-19" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19626">Toten: Knowledge-Based Ontological Tokenization Of Physical Quantities And Technical Notation In Brazilian Portuguese</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Antonio de Sousa Leit\~ao Filho; Allan Kardec Duailibe Barros Filho; Fabr\'icio Saul Lima; Selby Mykael Lima dos Santos; Rejani Bandeira Vieira Sousa
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19626" target="_blank" rel="noopener noreferrer">2606.19626</a></p>
<p class="paper-detail"><strong>Authors:</strong> Antonio de Sousa Leit\~ao Filho; Allan Kardec Duailibe Barros Filho; Fabr\'icio Saul Lima; Selby Mykael Lima dos Santos; Rejani Bandeira Vieira Sousa</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Byte-Pair Encoding tokenization is statistically efficient for vocabulary compression, but semantically blind to structured technical entities, fragmenting physical quantities, numbers, units, and symbolic expressions into lexically arbitrary subwords. We present TOTEN, a knowledge-based ontological tokenization framework that replaces statistical derivation with declarative classification grounded in a formal ontology of engineering entities (OEE). We formalize TOTEN as the triple : the ontology gathers types, structural principles, composition relations, and preservable invariants; the classification function maps raw text into typed regions; and the instantiator family yields a self-descriptive structured representation. Robustness derives from deterministic coupling with three external oracles: Pint (dimensional), Unicode Character Database (typographic), and RSLP (Portuguese morphology). Intrinsic evaluation covers four properties verifiable by construction -- ontological atomicity, dimensional equivalence, typographic robustness, and numerical reconstruction -- over an internal, physically validated benchmark (EngQuant, N=800) and four Brazilian Portuguese external corpora (N=1771 eligible cases). We also report detection recall, distinguishing coverage from conditional atomicity. Against eight state-of-the-art baselines, TOTEN achieves unit ontological atomicity in all contrasts and numerical reconstruction of 0.775-0.904 on external corpora, vs. 0.627-0.703 for the best baseline (Quantulum3); on EngQuant, 0.780 vs. 0.340. Differences are statistically significant (McNemar with Holm correction). Spearman correlation between internal and external rankings confirms concurrent validity of the control benchmark. Dimensional equivalence shows statistical parity with Pint, the oracle from which the system inherits dimensional authority.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces TOTEN, a knowledge-based ontological tokenization framework that preserves the semantic integrity of physical quantities and technical notations in Brazilian Portuguese. It outperforms state-of-the-art statistical tokenizers by ensuring ontological atomicity and numerical reconstruction of engineering entities.</p>
<p><strong>Core Idea:</strong> Replace statistically derived subword tokenization (like BPE) with a declarative, ontology-grounded classification system to prevent the fragmentation of structured technical data.</p>
<p><strong>Technique:</strong> TOTEN utilizes a triple framework consisting of a formal ontology of engineering entities (OEE), a classification function for typed regions, and an instantiator family for structured representation.</p>
<p><strong>Pipeline:</strong> Raw Brazilian Portuguese technical text → Ontological classification (OEE) + External Oracles (Pint, Unicode, RSLP) → Self-descriptive structured tokens</p>
<p><strong>Methodology:</strong> The authors developed a deterministic system coupled with three external oracles and evaluated it against eight baselines using an internal benchmark (EngQuant) and four external Brazilian Portuguese corpora.</p>
<p><strong>Results:</strong> TOTEN achieved unit ontological atomicity in all contrasts and significantly higher numerical reconstruction scores (0.775-0.904) compared to the best baseline (0.627-0.703) on external corpora, and 0.780 vs. 0.340 on the EngQuant benchmark.</p>
<p><strong>Limitations:</strong> The study focuses specifically on Brazilian Portuguese and engineering entities; the paper also distinguishes between detection recall and conditional atomicity, suggesting coverage remains a factor.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19626" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="rl">RL</h4>

<div class="paper-item" data-date="2026-06-19" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span><span class="cat-tag cat-default" title="Systems and Control (cs.SY)">Systems and Control (cs.SY)</span><span class="cat-tag cat-eess" title="eess.SY">eess.SY</span></span>
      <span class="paper-date">19 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19683">Exit-and-Join Dynamics for Decentralized Coalition Formation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Quanyan Zhu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19683" target="_blank" rel="noopener noreferrer">2606.19683</a></p>
<p class="paper-detail"><strong>Authors:</strong> Quanyan Zhu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">This paper studies coalition formation as a decentralized dynamical process driven by unilateral exit-and-join decisions. Agents evaluate local moves using the Aumann-Dreze value, so payoffs are computed within the agent's current coalition rather than through a globally negotiated coalition structure. The resulting model links cooperative payoff allocation with noncooperative best-response behavior: a terminal partition is precisely a coalition structure with no admissible, individually profitable exit-and-join deviation. We establish equilibrium characterizations, identify conditions under which the dynamics admit scalar Lyapunov or exact-potential representations, and analyze how switching and acceptance costs shape local stability. Numerical experiments test finite-time stabilization, cost sensitivity, and a special convex-game benchmark.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a decentralized dynamical model for coalition formation where agents make unilateral exit-and-join decisions based on local payoff evaluations. It establishes equilibrium characterizations and identifies conditions for Lyapunov and potential-based stability in these dynamics.</p>
<p><strong>Core Idea:</strong> Coalition formation is modeled as a noncooperative best-response process where agents move to maximize their own payoffs using the Aumann-Dreze value, leading to a terminal partition where no profitable moves remain.</p>
<p><strong>Technique:</strong> The study employs game-theoretic analysis to link cooperative payoff allocation with decentralized dynamics, utilizing Lyapunov functions and potential theory to analyze stability.</p>
<p><strong>Pipeline:</strong> Initial coalition structure → Agent-level local payoff evaluation (Aumann-Dreze) → Unilateral exit-and-join decisions → Terminal stable partition</p>
<p><strong>Methodology:</strong> The authors develop a theoretical framework for decentralized dynamics, prove existence of equilibrium conditions, and conduct numerical experiments on cost sensitivity and finite-time stabilization.</p>
<p><strong>Results:</strong> The dynamics reach a terminal partition equivalent to a stable coalition structure; the model demonstrates how switching and acceptance costs influence local stability and convergence.</p>
<p><strong>Limitations:</strong> The paper focuses on unilateral moves and local evaluations, potentially leaving open questions regarding multi-agent coordinated negotiations or complex global constraints.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19683" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h3 id="personal-interests">Personal Interests</h3>

<p class="section-desc">Papers discovered through your interest topics.</p>

<h4 id="multi-agent-systems">Multi-Agent Systems</h4>

<div class="paper-item" data-date="2026-06-18" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-default" title="q-fin.RM">q-fin.RM</span><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="nlin.AO">nlin.AO</span><span class="cat-tag cat-physics" title="physics.soc-ph">physics.soc-ph</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.20485">Optimal Order of Multi-Agent and General Many-Body Systems</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Jake J. Xia
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.20485" target="_blank" rel="noopener noreferrer">2606.20485</a></p>
<p class="paper-detail"><strong>Authors:</strong> Jake J. Xia</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">This paper develops a general framework for analyzing multi-agent systems with feedback loops between agents actions and collective observations. The framework is built on two fundamental agent-level variables: power, which measures agent influence on collective outcomes, and response functions, which determine how agents react to observations. We derive how macroscopic properties, including total power, useful power, entropy, order, fragility, and mobility, emerge from these two variables of heterogeneous agents. To study the trade off between growth and resilience, we introduce a system-level utility function parameterized by a risk-appetite coefficient and derive an optimal degree of order that balances productivity, stability, and adaptability. The analysis suggests that stronger synchronization can increase collective output but may also increase systemic fragility and reduce mobility. We further argue that order, entropy, information, and useful energy are task-dependent and system-relative concepts whose meanings depend on the objectives of the system. By measuring and designing agent power distributions and response functions, it may be possible to better understand, predict, and optimize collective behavior and identify the conditions under which collective intelligence and optimal order emerge.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper develops a general framework to analyze multi-agent systems by deriving macroscopic properties from two fundamental agent-level variables: power and response functions. It introduces a system-level utility function to identify the optimal balance between collective productivity, stability, and adaptability.</p>
<p><strong>Core Idea:</strong> Collective behaviors like order, entropy, and fragility emerge from the interplay between an agent's influence (power) and its reaction to collective observations (response functions). The study posits that order is task-dependent and that there is a fundamental trade-off between synchronization-driven growth and systemic resilience.</p>
<p><strong>Technique:</strong> The author employs a mathematical framework to derive macroscopic properties from heterogeneous agent variables and uses a risk-appetite parameterized utility function to optimize system states.</p>
<p><strong>Pipeline:</strong> Agent-level variables (power and response functions) → Framework analysis of macroscopic properties (order, entropy, fragility) → Utility function optimization → Optimal degree of order</p>
<p><strong>Methodology:</strong> The research uses a theoretical framework to model feedback loops between individual actions and collective observations, deriving analytical expressions for system-level metrics.</p>
<p><strong>Results:</strong> Stronger synchronization increases collective output but simultaneously increases systemic fragility and reduces mobility; the optimal degree of order is determined by the specific risk-appetite and objectives of the system.</p>
<p><strong>Limitations:</strong> The paper notes that concepts like entropy and useful energy are system-relative and task-dependent, implying that the 'optimal' state varies significantly depending on the specific goals of the system.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.20485" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19758">SIGMA: Skill-Incidence Graphs for Compositional Multi-Agent Design</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Kun Zeng, Yu Huo, Siyu Zhang, Yuecheng Zhuo, Yuquan Lu, Haoyue Liu, Siyue Chen, Xiaoying Tang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19758" target="_blank" rel="noopener noreferrer">2606.19758</a></p>
<p class="paper-detail"><strong>Authors:</strong> Kun Zeng, Yu Huo, Siyu Zhang, Yuecheng Zhuo, Yuquan Lu, Haoyue Liu, Siyue Chen, Xiaoying Tang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Existing graph-based multi-agent system (MAS) designers mainly improve collaboration by optimizing communication topologies over predefined agents, roles, or groups. However, because each node remains a closed-set entity, these methods struggle to generalize to tasks that require unseen combinations of capabilities. We propose SIGMA, a skill-incidence graph framework that constructs agents as task-conditioned bundles of reusable skills. Given a task and a skill library, SIGMA predicts a skill-agent incidence matrix, composes agent node embeddings from selected skills, and decodes a communication topology over the constructed agents. During execution, skill-specific mailboxes route messages to the relevant assigned capabilities, making the incidence structure directly operational. Across six reasoning and coding benchmarks with three base LLMs, SIGMA achieves the best average performance and improves over CARD, the strongest non-compositional topology-based baseline, by 2.06, 2.36, and 1.75 points, respectively. It also shows stronger robustness to unseen skill libraries, with an average performance drop of only 0.96 points. These results suggest that compositional node construction is a complementary and important axis for multi-agent design beyond communication topology optimization. Code is available at https://anonymous.4open.science/r/SIGMA-2338/.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces SIGMA, a framework that shifts multi-agent system (MAS) design from optimizing fixed agent roles to dynamically composing agents as bundles of reusable skills. It demonstrates that compositional node construction is a critical axis for improving MAS performance on complex, unseen tasks.</p>
<p><strong>Core Idea:</strong> Instead of treating agents as closed-set entities, SIGMA treats them as task-conditioned bundles of skills derived from a shared library. This allows the system to generalize to new tasks by dynamically constructing the necessary agent capabilities.</p>
<p><strong>Technique:</strong> The framework utilizes a skill-incidence graph to predict which skills are needed for a task, composes agent embeddings from these skills, and decodes a communication topology. It employs skill-specific mailboxes to route messages directly to the relevant capabilities during execution.</p>
<p><strong>Pipeline:</strong> Task and skill library → Skill-agent incidence matrix prediction → Agent node embedding composition → Communication topology decoding → Skill-specific message routing</p>
<p><strong>Methodology:</strong> The authors evaluated SIGMA across six reasoning and coding benchmarks using three base LLMs, comparing it against non-compositional topology-based baselines like CARD. They also tested the framework's robustness against unseen skill libraries.</p>
<p><strong>Results:</strong> SIGMA achieved the best average performance across all benchmarks, outperforming the strongest baseline (CARD) by 1.75 to 2.36 points. It also showed high robustness to unseen skill libraries, with an average performance drop of only 0.96 points.</p>
<p><strong>Limitations:</strong> The paper focuses on the composition of agents from a skill library but does not deeply explore the optimal way to automatically discover or refine the underlying skill library itself.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19758" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h2 id="tech-news">Tech News</h2>

<h3 id="agentic-ai-1">Agentic AI</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/DeepLearning</span>
    <span class="news-date">2026-06-19</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/deeplearning/comments/1u9rojg/does_anyone_know_how_to_make_a_small_language/" target="_blank" rel="noopener noreferrer">Does anyone know how to make a small language model use tools like websearch while avoiding &quot;catastrophic forgetness&quot; i think its called .. this my first attempt to make my own model by training it on my own data</a>
  <p class="news-summary">A user is seeking technical guidance on integrating web search tools into a small, custom-trained language model. They are specifically looking for methods to enable tool-use capabilities while preventing &#x27;catastrophic forgetting&#x27; during the training process.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Small Language Models</span><span class="news-tag">Tool Use</span><span class="news-tag">Catastrophic Forgetting</span><span class="news-tag">Fine-tuning</span><span class="news-tag">Agentic AI</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/deeplearning/comments/1u9rojg/does_anyone_know_how_to_make_a_small_language/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computing-systems-1">Computing Systems</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Fri, 19 Ju</span>
  </div>
  <a class="news-title" href="https://letsencrypt.status.io/#2026" target="_blank" rel="noopener noreferrer">Let&#x27;s Encrypt has been down most of today</a>
  <p class="news-summary">Let&#x27;s Encrypt, a major certificate authority, experienced significant downtime throughout the day. This outage impacts the issuance and renewal of SSL/TLS certificates, potentially disrupting secure connections for numerous websites and services.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Cybersecurity</span><span class="news-tag">Infrastructure</span><span class="news-tag">Networking</span><span class="news-tag">Web Services</span></div>
    <a class="news-read-btn" href="https://letsencrypt.status.io/#2026" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Fri, 19 Ju</span>
  </div>
  <a class="news-title" href="https://simonwillison.net/2026/Jun/18/datasette-apps/" target="_blank" rel="noopener noreferrer">Datasette Apps: Host custom HTML applications inside Datasette</a>
  <p class="news-summary">Datasette has introduced a new feature allowing users to host custom HTML applications directly within the platform. This enables developers to build interactive front-ends and custom interfaces for their data without needing a separate web server. It simplifies the process of creating data-driven tools and dashboards.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Data Engineering</span><span class="news-tag">Web Development</span><span class="news-tag">Databases</span><span class="news-tag">Tooling</span></div>
    <a class="news-read-btn" href="https://simonwillison.net/2026/Jun/18/datasette-apps/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/DeepLearning</span>
    <span class="news-date">2026-06-18</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/deeplearning/comments/1u9mg7z/pragmatiq_opensource_implementation_of/" target="_blank" rel="noopener noreferrer">pragmatiq: open-source implementation of PRAGMA-style banking event-sequence models</a>
  <p class="news-summary">The pragmatiq project provides an open-source implementation of PRAGMA-style banking event-sequence models. It enables the conversion of timestamped key-value user histories into embeddings for applications like AML graph experiments, LoRA fine-tuning, and explainability. The repository includes synthetic data, PyTorch encoders, and CPU-first training tools to make the research path more accessible.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Open Source</span><span class="news-tag">Embeddings</span><span class="news-tag">Financial AI</span><span class="news-tag">PyTorch</span><span class="news-tag">AML</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/deeplearning/comments/1u9mg7z/pragmatiq_opensource_implementation_of/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="llm-1">LLM</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/DeepLearning</span>
    <span class="news-date">2026-06-19</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/deeplearning/comments/1u9nb38/article_gemma_4_inference_architecture_and/" target="_blank" rel="noopener noreferrer">[Article] Gemma 4 – Inference, Architecture, and Practical Insights</a>
  <p class="news-summary">This article provides a deep dive into Google DeepMind&#x27;s Gemma 4 model, focusing on its architectural improvements and enhanced open-source capabilities. It includes practical insights on inference, technical developments, and a functional Gradio application for deployment.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Gemma 4</span><span class="news-tag">Google DeepMind</span><span class="news-tag">LLM Architecture</span><span class="news-tag">Open Source AI</span><span class="news-tag">Inference</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/deeplearning/comments/1u9nb38/article_gemma_4_inference_architecture_and/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h2 id="github-trending">
  <svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="20" height="20" style="vertical-align:middle;margin-right:6px"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg>
  GitHub Trending
</h2>

<p class="section-desc">Trending repositories on GitHub filtered and scored for relevance to your interests.</p>

<h3 id="ai-safety-1">AI Safety</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/p-e-w/heretic" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">p-e-w</span><span class="gh-sep">/</span><strong class="gh-repo">heretic</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">AI Safety</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Heretic is a tool for fully automatic censorship removal from transformer-based language models using directional ablation (abliteration). It optimizes for minimal refusal rates while maintaining low KL divergence, ensuring the model retains its original intelligence without manual intervention.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">llm</span><span class="gh-tag">abliteration</span><span class="gh-tag">AI safety</span><span class="gh-tag">transformer</span><span class="gh-tag">fine-tuning</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-18</span>
      <a class="gh-visit-btn" href="https://github.com/p-e-w/heretic" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="agentic-ai-2">Agentic AI</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/zai-org/GLM-5" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">zai-org</span><span class="gh-sep">/</span><strong class="gh-repo">GLM-5</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository introduces GLM-5, a flagship model series specifically optimized for long-horizon agentic engineering and complex coding tasks. It features a 1M-token context window, an IndexShare architecture for efficient long-context processing, and advanced reasoning capabilities for multi-step tool use and iterative problem-solving.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">agentic-ai</span><span class="gh-tag">llm</span><span class="gh-tag">long-horizon</span><span class="gh-tag">coding</span><span class="gh-tag">transformer</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-18</span>
      <a class="gh-visit-btn" href="https://github.com/zai-org/GLM-5" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/yifanfeng97/Hyper-Extract" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">yifanfeng97</span><span class="gh-sep">/</span><strong class="gh-repo">Hyper-Extract</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Hyper-Extract provides a framework for transforming unstructured text into complex hypergraph structures using LLMs. It is highly relevant for building knowledge-rich agentic systems and RAG pipelines that require sophisticated relational data beyond simple triples.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">knowledge-graph</span><span class="gh-tag">LLM</span><span class="gh-tag">RAG</span><span class="gh-tag">information-extraction</span><span class="gh-tag">hypergraph</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-18</span>
      <a class="gh-visit-btn" href="https://github.com/yifanfeng97/Hyper-Extract" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/Kilo-Org/kilocode" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">Kilo-Org</span><span class="gh-sep">/</span><strong class="gh-repo">kilocode</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Kilo is an all-in-one agentic engineering platform designed to build and iterate on open-source coding agents. It is highly relevant as it provides a framework for developing autonomous AI agents capable of complex software engineering tasks.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">AI Developer Tools</span><span class="gh-tag">Coding Agents</span><span class="gh-tag">TypeScript</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-18</span>
      <a class="gh-visit-btn" href="https://github.com/Kilo-Org/kilocode" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/github/spec-kit" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">github</span><span class="gh-sep">/</span><strong class="gh-repo">spec-kit</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Spec Kit is a toolkit for Spec-Driven Development, which transforms product specifications into executable code using AI coding agents. It is highly relevant for Agentic AI as it provides a structured framework for managing LLM-driven software engineering workflows.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Software Engineering</span><span class="gh-tag">Copilot</span><span class="gh-tag">Spec-Driven Development</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-18</span>
      <a class="gh-visit-btn" href="https://github.com/github/spec-kit" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/google-research/timesfm" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">google-research</span><span class="gh-sep">/</span><strong class="gh-repo">timesfm</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">TimesFM is a pretrained time-series foundation model from Google Research designed for high-accuracy forecasting. It is highly relevant due to its recent integration of &#x27;Agent Skills&#x27; and support for fine-tuning via LoRA, making it a core component for agentic time-series analysis.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">foundation models</span><span class="gh-tag">time-series</span><span class="gh-tag">agentic AI</span><span class="gh-tag">fine-tuning</span><span class="gh-tag">transformer</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-17</span>
      <a class="gh-visit-btn" href="https://github.com/google-research/timesfm" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/cocoindex-io/cocoindex-code" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">cocoindex-io</span><span class="gh-sep">/</span><strong class="gh-repo">cocoindex-code</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a lightweight, AST-based code search engine designed to optimize context engineering for coding agents. It is highly relevant as it improves the efficiency and speed of LLM-based agents by reducing token usage during codebase navigation.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">coding-agents</span><span class="gh-tag">context-engineering</span><span class="gh-tag">LLM</span><span class="gh-tag">AST</span><span class="gh-tag">RAG</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/cocoindex-io/cocoindex-code" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/DeusData/codebase-memory-mcp" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">DeusData</span><span class="gh-sep">/</span><strong class="gh-repo">codebase-memory-mcp</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a high-performance MCP server that indexes codebases into a persistent knowledge graph for LLM interaction. It is highly relevant for Agentic AI and RAG workflows as it enables agents to perform sub-millisecond queries across large codebases with minimal token overhead.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">RAG</span><span class="gh-tag">Knowledge Graph</span><span class="gh-tag">MCP</span><span class="gh-tag">Code Intelligence</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-13</span>
      <a class="gh-visit-btn" href="https://github.com/DeusData/codebase-memory-mcp" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/calesthio/OpenMontage" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">calesthio</span><span class="gh-sep">/</span><strong class="gh-repo">OpenMontage</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">OpenMontage is an open-source agentic system that transforms AI coding assistants into full video production studios using 500+ agent skills. It is highly relevant as it implements complex multi-agent workflows to orchestrate video generation, editing, and multimodal content creation.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">Generative Models</span><span class="gh-tag">Video Production</span><span class="gh-tag">Multimodal</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-05-07</span>
      <a class="gh-visit-btn" href="https://github.com/calesthio/OpenMontage" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/microsoft/qlib" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">microsoft</span><span class="gh-sep">/</span><strong class="gh-repo">qlib</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 3/5">★★★<span class="gh-relevance-empty">★★</span> <span class="gh-rel-num">3/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Qlib is an AI-oriented quantitative investment platform that supports various ML paradigms including reinforcement learning and supervised learning. It is particularly relevant due to its integration with RD-Agent to automate the research and development process using agentic workflows.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">reinforcement learning</span><span class="gh-tag">Agentic AI</span><span class="gh-tag">MLOps</span><span class="gh-tag">quantitative-finance</span><span class="gh-tag">machine-learning</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-04-22</span>
      <a class="gh-visit-btn" href="https://github.com/microsoft/qlib" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="computer-vision-1">Computer Vision</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/roboflow/rf-detr" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">roboflow</span><span class="gh-sep">/</span><strong class="gh-repo">rf-detr</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Computer Vision</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">RF-DETR is a state-of-the-art real-time object detection and instance segmentation model architecture. It is highly relevant for Embodied AI and Robotics as it provides high-performance visual perception capabilities for navigating and interacting with physical environments.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">object-detection</span><span class="gh-tag">instance-segmentation</span><span class="gh-tag">DETR</span><span class="gh-tag">computer-vision</span><span class="gh-tag">real-time</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-18</span>
      <a class="gh-visit-btn" href="https://github.com/roboflow/rf-detr" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/Lightricks/LTX-2" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">Lightricks</span><span class="gh-sep">/</span><strong class="gh-repo">LTX-2</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Computer Vision</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides the official inference and LoRA training tools for LTX-2, a high-quality audio-video generative model. It is highly relevant for interests in generative models, multimodal learning, and diffusion-based video synthesis.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">generative-ai</span><span class="gh-tag">video-generation</span><span class="gh-tag">diffusion</span><span class="gh-tag">multimodal</span><span class="gh-tag">ltx-2</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-17</span>
      <a class="gh-visit-btn" href="https://github.com/Lightricks/LTX-2" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="computing-systems-2">Computing Systems</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/alibaba/zvec" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">alibaba</span><span class="gh-sep">/</span><strong class="gh-repo">zvec</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Computing Systems</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Zvec is a high-performance, in-process vector database designed for low-latency similarity search and hybrid retrieval. It is highly relevant for RAG and Agentic AI workflows as it allows for efficient embedding storage and multi-query capabilities directly within an application&#x27;s memory space.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">vector-database</span><span class="gh-tag">RAG</span><span class="gh-tag">similarity-search</span><span class="gh-tag">hybrid-retrieval</span><span class="gh-tag">computing-systems</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-17</span>
      <a class="gh-visit-btn" href="https://github.com/alibaba/zvec" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="general-1">General</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/owainlewis/awesome-artificial-intelligence" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">owainlewis</span><span class="gh-sep">/</span><strong class="gh-repo">awesome-artificial-intelligence</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">General</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This is a comprehensive curated repository of educational resources including courses, books, and research papers across the entire AI spectrum. It is highly relevant as a foundational directory for exploring the user&#x27;s broad interests in machine learning, deep learning, and robotics.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">curated-list</span><span class="gh-tag">educational-resources</span><span class="gh-tag">machine-learning</span><span class="gh-tag">deep-learning</span><span class="gh-tag">research-papers</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-05-17</span>
      <a class="gh-visit-btn" href="https://github.com/owainlewis/awesome-artificial-intelligence" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's research and news focus heavily on the governance, reliability, and architectural refinement of agentic systems, specifically addressing how to manage uncertainty and ensure alignment in complex workflows.]]></summary></entry><entry><title type="html">Daily Digest 2026-06-18</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-18/" rel="alternate" type="text/html" title="Daily Digest 2026-06-18" /><published>2026-06-18T00:00:00+07:00</published><updated>2026-06-18T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-18/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's digest highlights a significant shift toward long-horizon planning and stateful memory in embodied agents, alongside advancements in verifiable reasoning and specialized industrial applications.</span>
</div>

<h2 id="global-trends">Global Trends</h2>

<h3 id="arxiv-subjects">Papers discovered from ArXiv subject categories</h3>

<h4 id="ai-safety">AI Safety</h4>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="cs.CY">cs.CY</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18936">SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Linghao Feng, Yinqian Sun, Dongqi Liang, Sicheng Shen, Chenfei Yan, Yuxuan Peng, Yilin Zhao, Haibo Tong, Kai Li, FeiFei Zhao, Yi Zeng
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18936" target="_blank" rel="noopener noreferrer">2606.18936</a></p>
<p class="paper-detail"><strong>Authors:</strong> Linghao Feng, Yinqian Sun, Dongqi Liang, Sicheng Shen, Chenfei Yan, Yuxuan Peng, Yilin Zhao, Haibo Tong, Kai Li, FeiFei Zhao, Yi Zeng</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language models (LLMs) are increasingly embedded in AI for Science (AI4Science) workflows, from scientific question answering and literature analysis to laboratory planning and autonomous discovery. This progress creates an urgent need for safety benchmarks that evaluate not only scientific competence, but also whether models recognize and avoid risks in high-stakes scientific contexts. Existing AI4Science safety datasets cover several disciplines and task formats, leaving the underlying risk dimensions underspecified. We introduce \textbf{SciRisk-Bench}, a benchmark designed to evaluate AI4Science safety from two complementary perspectives: explicit risk dimensions and scientific disciplines. SciRisk-Bench covers 7 disciplines, 31 subdisciplines and 10 risk dimensions. In the experimental section, we evaluate both mainstream LLMs and science-oriented LLMs across risk dimensions, disciplines, and sub-disciplines, enabling fine-grained diagnosis of where scientific models remain unsafe.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces SciRisk-Bench, a comprehensive benchmark designed to evaluate the safety of Large Language Models in AI4Science contexts by focusing on specific risk dimensions and scientific disciplines.</p>
<p><strong>Core Idea:</strong> Current AI4Science safety datasets lack specified risk dimensions; SciRisk-Bench addresses this by providing a structured framework to diagnose model safety across 7 disciplines and 10 risk dimensions.</p>
<p><strong>Technique:</strong> The authors developed a multi-dimensional evaluation framework that categorizes risks into 10 distinct dimensions across 31 subdisciplines to enable fine-grained safety diagnosis.</p>
<p><strong>Pipeline:</strong> Scientific queries/scenarios → SciRisk-Bench evaluation framework (Risk Dimensions &amp; Disciplines) → Fine-grained safety diagnosis of LLMs</p>
<p><strong>Methodology:</strong> The researchers curated a dataset covering 7 disciplines and 10 risk dimensions, then evaluated both mainstream and science-oriented LLMs to identify specific safety vulnerabilities.</p>
<p><strong>Results:</strong> The benchmark enables a fine-grained diagnosis of where scientific models remain unsafe across various disciplines and subdisciplines, highlighting specific risk-dimension failures.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the specific types of risks included in the 10 dimensions or the specific performance gaps of the evaluated models in the abstract.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18936" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="agentic-ai">Agentic AI</h4>

<div class="paper-item" data-date="2026-06-18" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18385">CaVe-VLM-CoT: An Interpretable Vision-Language Model Framework</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Sneha Rao, Shaina Raza, Dhanesh Ramachandram
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18385" target="_blank" rel="noopener noreferrer">2606.18385</a></p>
<p class="paper-detail"><strong>Authors:</strong> Sneha Rao, Shaina Raza, Dhanesh Ramachandram</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Vision-Language Models (VLMs) remain prone to hallucinations, producing fluent but visually unfaithful outputs. Existing chain-of-thought and retrieval-augmented methods only partially address this, as they neither enforce step-level citation grounding nor route verification failures back to retrieval for correction. We present CaVe-VLM-CoT, a modular reflection-based agentic-RAG framework that enforces evidence-grounded reasoning through a five-stage closed-loop pipeline: Extractor, Retriever, Solver, Citation Injector, and Verifier, in which detected ungrounded claims trigger structured feedback to the Extractor for targeted re-retrieval. Since no existing framework jointly measures retrieval quality, step-wise citation faithfulness, and cross-modal grounding, we propose a suite of 23 component-wise metrics across all stages, anchored by CaVeScore, a composite metric weighting accuracy, citation precision and recall, attribution, and evidence grounding. Without any architectural or prompt modifications, CaVe-VLM-CoT achieves 87.1\% accuracy and 56.6\% CaVeScore on ScienceQA , and 55.2\% accuracy and 35.7\% CaVeScore on MMMU (30 subjects).</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces CaVe-VLM-CoT, a modular reflection-based agentic-RAG framework designed to mitigate hallucinations in Vision-Language Models by enforcing step-level citation grounding. It also proposes a comprehensive suite of 23 component-wise metrics and a composite CaVeScore to evaluate retrieval quality and cross-modal grounding.</p>
<p><strong>Core Idea:</strong> The core idea is to create a closed-loop reasoning system where ungrounded claims are detected and routed back to the retrieval stage for correction, ensuring every step of the chain-of-thought is anchored in visual evidence.</p>
<p><strong>Technique:</strong> The framework employs a five-stage agentic pipeline (Extractor, Retriever, Solver, Citation Injector, and Verifier) that utilizes a reflection mechanism to trigger targeted re-retrieval upon verification failure.</p>
<p><strong>Pipeline:</strong> Visual/Textual Input → Extractor → Retriever → Solver → Citation Injector → Verifier → (Feedback Loop to Extractor if ungrounded) → Final Grounded Output</p>
<p><strong>Methodology:</strong> The authors developed a modular agentic-RAG architecture that evaluates each reasoning step for faithfulness to retrieved evidence. They established a multi-dimensional evaluation framework using 23 metrics to measure accuracy, citation precision, and evidence grounding.</p>
<p><strong>Results:</strong> CaVe-VLM-CoT achieved 87.1% accuracy and a 56.6% CaVeScore on ScienceQA, and 55.2% accuracy with a 35.7% CaVeScore on MMMU (30 subjects) without requiring architectural or prompt modifications.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of the multi-stage reflection loop or the potential latency introduced by the iterative re-retrieval process.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18385" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span><span class="cat-tag cat-se" title="Software Engineering (cs.SE)">Software Engineering (cs.SE)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18543">CEO-Bench: Can Agents Play the Long Game?</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Haozhe Chen, Karthik Narasimhan, Zhuang Liu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18543" target="_blank" rel="noopener noreferrer">2606.18543</a></p>
<p class="paper-detail"><strong>Authors:</strong> Haozhe Chen, Karthik Narasimhan, Zhuang Liu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring information in noisy environments; (3) adapting to a changing world; (4) orchestrating multiple moving parts toward a coherent goal. We introduce CEO-Bench, which evaluates these capabilities together by simulating a representative real-world task: operating a startup for 500 days. An agent manages pricing, marketing, budgeting, and many other aspects of a fictional company through a programmable Python interface, operating in the same environment and facing the same challenges as a human CEO. Success demands analyzing noisy, interconnected business databases, translating signals into sound strategy, and coordinating many decisions with programming. The strongest agents write sophisticated code that simulates customer cohorts to forecast future cash and mines negotiation history to uncover hidden customer preferences. Even so, most state-of-the-art models struggle in this environment. Only Claude Opus 4.8 and GPT-5.5 finish above the $1M starting balance, and neither consistently turns a profit. CEO-Bench takes a first step toward measuring the intelligence required to drive sustained, adaptive progress over time.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces CEO-Bench, a new benchmark designed to evaluate the ability of LLM agents to manage long-horizon, complex, and multi-faceted real-world tasks. It shifts the focus from isolated task execution to sustained, adaptive decision-making in a dynamic environment.</p>
<p><strong>Core Idea:</strong> To test true agentic intelligence, models must be evaluated on their ability to navigate uncertainty, handle noisy data, adapt to changing conditions, and orchestrate multiple moving parts over a long period.</p>
<p><strong>Technique:</strong> The authors developed a programmable Python interface that simulates a startup environment, requiring agents to manage pricing, marketing, and budgeting over a 500-day period.</p>
<p><strong>Pipeline:</strong> Startup environment state → Agent analysis of noisy databases and history → Strategy formulation and code execution → Environment update and feedback loop → Final financial outcome</p>
<p><strong>Methodology:</strong> The benchmark simulates a fictional company where agents must interact with interconnected business databases and write code to forecast trends and mine customer preferences. Success is measured by the agent's ability to maintain and grow a starting capital balance over 500 days.</p>
<p><strong>Results:</strong> Most state-of-the-art models failed to maintain the initial $1M balance; only Claude Opus 4.8 and GPT-5.5 finished above the starting balance, and neither consistently achieved profitability.</p>
<p><strong>Limitations:</strong> The current results show that even the strongest models struggle with sustained profit, highlighting a significant gap in the ability of agents to drive long-term, adaptive progress.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18543" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18890">Skill-Guided Continuation Distillation for GUI Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zhimin Fan, Hongwei Yu, Yeqing Shen, Haolong Yan, Guozhen Peng, Tianhao Peng, Yudong Zhang, Xiaowen Zhang, Kaijun Tan, Zheng Ge, Xiangyu Zhang, Daxin Jiang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18890" target="_blank" rel="noopener noreferrer">2606.18890</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zhimin Fan, Hongwei Yu, Yeqing Shen, Haolong Yan, Guozhen Peng, Tianhao Peng, Yudong Zhang, Xiaowen Zhang, Kaijun Tan, Zheng Ge, Xiangyu Zhang, Daxin Jiang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Improving GUI agents typically relies on behavior cloning on expert trajectories. However, as the current policy deviates from the expert policy, it inevitably encounters policy-induced off-trajectory states during closed-loop execution, i.e., states that fall outside the expert trajectories. Since expert trajectories provide no demonstrations for these unseen states, such states receive no effective supervision, leaving the policy unable to select the correct action. To close this supervision gap, we propose Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework. SGCD first runs the plain policy without skill guidance for a few steps to reach realistic off-trajectory states. From these states, a skill-guided policy then completes the task and produces successful continuations, which are mixed with expert trajectories to supply supervision over policy-induced off-trajectory states. The skills are extracted from both successful and failed rollouts, consisting of Continuation Plans, Critical Targets, Failure Traps, and Success Criteria. On OSWorld-Verified, SGCD improves the success rate of three base models from the low-30\% range to over 50\%, demonstrating its effectiveness and generality.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework that addresses the supervision gap in GUI agents caused by policy-induced off-trajectory states.</p>
<p><strong>Core Idea:</strong> By generating successful continuations from states reached by the current policy but absent in expert data, the model can learn to recover from its own mistakes.</p>
<p><strong>Technique:</strong> The method uses a skill-guided policy to complete tasks from off-trajectory states, extracting Continuation Plans, Critical Targets, Failure Traps, and Success Criteria to provide supervision.</p>
<p><strong>Pipeline:</strong> Expert trajectories and policy-induced off-trajectory states → Skill-guided completion of tasks → Mixed supervision data → Policy distillation and self-improvement</p>
<p><strong>Methodology:</strong> SGCD runs a plain policy to reach off-trajectory states, then uses a skill-guided policy to produce successful continuations which are mixed with original expert trajectories for training.</p>
<p><strong>Results:</strong> SGCD improved the success rate of three base models on OSWorld-Verified from the low-30% range to over 50%.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of the iterative distillation process or the potential for compounding errors if the skill-guided policy fails to find a valid continuation.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18890" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span><span class="cat-tag cat-ir" title="Information Retrieval (cs.IR)">Information Retrieval (cs.IR)</span><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18947">Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Emmanuel Aboah Boateng, Kyle MacDonald, Amardeep Kumar, Siddharth Kodwani, Sudeep Das
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18947" target="_blank" rel="noopener noreferrer">2606.18947</a></p>
<p class="paper-detail"><strong>Authors:</strong> Emmanuel Aboah Boateng, Kyle MacDonald, Amardeep Kumar, Siddharth Kodwani, Sudeep Das</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Production LLM agents increasingly depend on real-time search, yet native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider boundary. This coupling makes grounding hard to inspect, tune, reuse, or port, and can trigger Search-Induced Verbosity that breaks strict output contracts. We present Decoupled Search Grounding (DSG), a vendor-agnostic boundary that moves grounding outside the reasoning model through an MCP-compatible gateway, exposing provider routing, source-aware context rendering, configured fallback, retrieval-depth control, and exact plus semantic caching as first-class controls. Across five frontier models on SimpleQA, FreshQA, and HotpotQA, native search leads on recency-sensitive FreshQA, but DSG exposes a stronger frontier when control matters: on SimpleQA it nearly matches native accuracy (86.1% vs. 87.7%) at 91% lower search cost, preserves concise answer contracts, and reaches a 99.4% warm-cache hit rate with 68% lower latency. Deployed as a shared production grounding layer for large-scale agentic workloads with interchangeable models, DSG matches or slightly exceeds native-search accuracy on an e-commerce query-understanding (QIU) workload while cutting search cost by over 98%. Real-time grounding is best treated as an optimizable interface boundary, not a fixed model feature.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Decoupled Search Grounding (DSG), a vendor-agnostic architecture that separates search retrieval and grounding from the reasoning model's internal logic. This allows for granular control over costs, latency, and output formatting while enabling model interchangeability.</p>
<p><strong>Core Idea:</strong> Grounding should be treated as an optimizable interface boundary rather than a fixed model feature to prevent search-induced verbosity and high costs.</p>
<p><strong>Technique:</strong> DSG utilizes an MCP-compatible gateway to expose first-class controls such as provider routing, source-aware context rendering, retrieval-depth control, and exact plus semantic caching.</p>
<p><strong>Pipeline:</strong> User Query → DSG Gateway (Routing, Retrieval, Context Rendering, Caching) → Reasoning Model → Final Output</p>
<p><strong>Methodology:</strong> The authors evaluated DSG against native search grounding across five frontier models on SimpleQA, FreshQA, and HotpotQA benchmarks, as well as a production e-commerce query-understanding (QIU) workload.</p>
<p><strong>Results:</strong> DSG achieved 86.1% accuracy on SimpleQA (vs. 87.7% native) at 91% lower search cost, reached a 99.4% warm-cache hit rate with 68% lower latency, and cut search costs by over 98% on QIU workloads.</p>
<p><strong>Limitations:</strong> Native search still leads on recency-sensitive tasks like FreshQA, suggesting that the decoupling may slightly trade off extreme recency for control and efficiency.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18947" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-hci" title="Human-Computer Interaction (cs.HC)">Human-Computer Interaction (cs.HC)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18413">Searching for Synergy in Shared Workspace Human-AI Collaboration</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Nachiket Kotalwar, Rohini Das, Carolyn Rose
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18413" target="_blank" rel="noopener noreferrer">2606.18413</a></p>
<p class="paper-detail"><strong>Authors:</strong> Nachiket Kotalwar, Rohini Das, Carolyn Rose</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Automated AI agents are increasingly capable, yet many scientific and professional tasks require human judgment and contextual expertise. We study shared-workspace human-AI teams, where AI agents and human collaborators must coordinate responsibilities before submitting a final answer. Using the Collaborative Gym environment with DiscoveryBench tasks, we examine when adding simulated human collaborators improves performance and when process loss turns additional collaborators into coordination overhead. Across 1,482 sessions, adding relevant collaborators can lower performance when teams lack structure to coordinate their contributions. We then evaluate scaffolding that combines shared group memory with simulated human-in-the-loop (HITL) gates, where selected actions require approval from a designated simulated participant. This scaffolding yields higher mean performance, most clearly in three-person teams, with clearer responsibility signals and stronger routing of expertise to team actions. Overall, how human-AI teams coordinate and integrate expertise matters as much as the capability available to them.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper investigates the dynamics of human-AI collaboration in shared workspaces, identifying how team structure and coordination scaffolding impact performance. It demonstrates that adding collaborators can lead to process loss without proper responsibility signals and proposes a scaffolding method to mitigate this.</p>
<p><strong>Core Idea:</strong> The effectiveness of human-AI teams depends as much on the coordination structure and integration of expertise as it does on the individual capabilities of the agents.</p>
<p><strong>Technique:</strong> The researchers developed a scaffolding framework that combines shared group memory with simulated human-in-the-loop (HITL) gates to manage responsibility and route expertise.</p>
<p><strong>Pipeline:</strong> Shared-workspace tasks → Collaborative Gym environment with DiscoveryBench tasks → Team coordination with shared memory and HITL gates → Final performance evaluation</p>
<p><strong>Methodology:</strong> The study analyzed 1,482 sessions in a simulated environment to compare team performance with and without coordination scaffolding across different team sizes.</p>
<p><strong>Results:</strong> Adding relevant collaborators can lower performance due to coordination overhead; however, the proposed scaffolding yielded higher mean performance, particularly in three-person teams, by providing clearer responsibility signals.</p>
<p><strong>Limitations:</strong> The study relies on simulated human collaborators and may not fully capture the nuances of real-world human-AI interaction dynamics.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18413" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18746">What Must Generalist Agents Remember?</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Khurram Yamin, Namrata Deka, Maitreyi Swaroop, Albert Ting, Jeff Schneider, Bryan Wilder
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18746" target="_blank" rel="noopener noreferrer">2606.18746</a></p>
<p class="paper-detail"><strong>Authors:</strong> Khurram Yamin, Namrata Deka, Maitreyi Swaroop, Albert Ting, Jeff Schneider, Bryan Wilder</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">This paper develops a formal account of what generalist agents must store in memory in order to act near-optimally across multiple environments and goals. It shows that when two domains share an observational bottleneck but require incompatible optimal actions, any uniformly near-optimal policy must induce distinct memory distributions at that bottleneck. The result yields a separation theorem: sufficiently successful agents cannot rely only on current state observations, but must preserve domain-relevant information in memory. The paper further shows that if an agent's memory contains enough information to estimate values for related goals, then that memory can be used to approximately reconstruct the agent's local transition dynamics. Together, these results characterize memory as the substrate that supports domain disambiguation, transition-model reconstruction, and planning for generalist agents.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper provides a formal theoretical framework characterizing the necessary memory requirements for generalist agents to achieve near-optimal performance across diverse environments. It establishes a separation theorem proving that agents must store domain-relevant information to disambiguate environments with shared observational bottlenecks.</p>
<p><strong>Core Idea:</strong> Generalist agents require memory to distinguish between different domains that appear identical in current observations but require different actions. Furthermore, memory sufficient for goal-value estimation can be leveraged to reconstruct local transition dynamics.</p>
<p><strong>Technique:</strong> The authors use a formal mathematical account and a separation theorem to analyze the relationship between observational bottlenecks, memory distributions, and policy optimality.</p>
<p><strong>Pipeline:</strong> Multi-domain observations → Memory storage of domain-relevant information → Domain disambiguation and transition-model reconstruction → Near-optimal action selection</p>
<p><strong>Methodology:</strong> The study employs theoretical analysis to demonstrate that uniformly near-optimal policies must induce distinct memory distributions at observational bottlenecks. It also explores the relationship between value estimation and transition dynamics reconstruction.</p>
<p><strong>Results:</strong> The research establishes a separation theorem showing that successful agents cannot rely solely on current state observations. It identifies memory as the fundamental substrate for domain disambiguation, transition-model reconstruction, and planning.</p>
<p><strong>Limitations:</strong> The paper focuses on the theoretical requirements of memory rather than specific architectural implementations or empirical benchmarks for large-scale generalist models.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18746" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="cs.CY">cs.CY</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18803">ProfiLLM: Utility-Aligned Agentic User Profiling for Industrial Ride-Hailing Dispatch</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Tengfei Lyu, Zirui Yuan, Xu Liu, Kai Wan, Zihao Lu, Li Ma, Hao Liu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18803" target="_blank" rel="noopener noreferrer">2606.18803</a></p>
<p class="paper-detail"><strong>Authors:</strong> Tengfei Lyu, Zirui Yuan, Xu Liu, Kai Wan, Zihao Lu, Li Ma, Hao Liu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Bringing Large Language Models (LLMs) into industrial ride-hailing dispatch as semantic feature extractors over platform-scale behavioral logs is a compelling but under-explored data systems problem. Production matching pipelines remain dominated by structured numerical features, yet decisive behavioral signals (e.g., a driver's habitual aversion to certain regions) are inherently contextual and naturally expressible as LLM-generated user profiles. However, scaling such profiling to a live, millisecond-latency dispatcher faces three intertwined constraints rarely addressed together: on a platform with millions of daily orders, logs exceed any LLM's context window by orders of magnitude; most users are long-tail, with too few interactions for per-user profiling; and surface-fluent profiles do not necessarily improve downstream prediction utility. We present ProfiLLM, an agentic LLM data pipeline that operationalizes utility-aligned user profiling for production matching systems through two modules. (1) Tool-Augmented Global Knowledge Mining equips an LLM agent with 27 analytical tools to mine platform-scale data, producing reusable global knowledge, adaptive user clustering rules, and region-level supply-demand priors. (2) Utility-Aligned Profile Exploration generates multiple candidate profiles per cluster, evaluates them via a lightweight downstream utility proxy, iteratively refines the best candidates and constructs preference pairs for DPO fine-tuning. Deployed on DiDi's production dispatcher, ProfiLLM achieves up to +6.14% relative AUC improvement in outcome prediction, up to +4.35% GMV gain in dispatching simulation, and consistent improvements in a 14-day online A/B test including +0.47% GMV, +0.33% Completion Rate, and -0.82% Cancel-Before-Accept rate.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces ProfiLLM, an agentic LLM data pipeline that extracts semantic user profiles from large-scale behavioral logs to improve industrial ride-hailing dispatching.</p>
<p><strong>Core Idea:</strong> The authors propose bridging the gap between unstructured behavioral signals and production matching systems by using LLMs to generate utility-aligned profiles that capture contextual preferences like regional aversions.</p>
<p><strong>Technique:</strong> The system utilizes a tool-augmented LLM agent for global knowledge mining and a utility-aligned exploration module that uses a lightweight proxy to refine profiles for DPO fine-tuning.</p>
<p><strong>Pipeline:</strong> Platform-scale behavioral logs → Tool-Augmented Global Knowledge Mining &amp; Utility-Aligned Profile Exploration → LLM-generated user profiles &amp; DPO-fine-tuned models → Improved dispatching outcomes.</p>
<p><strong>Methodology:</strong> The methodology involves mining global priors and clustering rules using 27 analytical tools, followed by iterative profile refinement based on downstream utility proxies and DPO fine-tuning.</p>
<p><strong>Results:</strong> Achieved +6.14% relative AUC improvement, +4.35% GMV gain in simulation, and +0.47% GMV with -0.82% Cancel-Before-Accept rate in a 14-day online A/B test.</p>
<p><strong>Limitations:</strong> The paper notes the challenges of scaling to millions of daily orders, handling long-tail users with sparse data, and ensuring surface-fluent profiles translate to actual prediction utility.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18803" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18874">Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zijian Wang, Hanqi Li, Ziyue Yang, Zijian Hu, Shenghan Zuo, Yunzhe Zhang, Da Ma, Danyu Luo, Chenrun Wang, Jing Peng, Tiancheng Huang, Sijia Guo, Huayang Wang, Zichen Zhu, Senyu Han, Yilu Cao, Kai Yu, Lu Chen
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18874" target="_blank" rel="noopener noreferrer">2606.18874</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zijian Wang, Hanqi Li, Ziyue Yang, Zijian Hu, Shenghan Zuo, Yunzhe Zhang, Da Ma, Danyu Luo, Chenrun Wang, Jing Peng, Tiancheng Huang, Sijia Guo, Huayang Wang, Zichen Zhu, Senyu Han, Yilu Cao, Kai Yu, Lu Chen</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts, so that generated mechanisms can be grounded, executed, tested and revised without losing their evidential basis. We identify claim drift as a failure mode of automated research, where runnable artifacts no longer support the mechanism originally claimed. Across training-free memory systems, graph-structured traffic forecasting and multi-scale physics-informed neural networks, Xcientist preserves traceable trajectories from problem formulation to mechanism design, validation and bounded revision. These results suggest that AI scientists should be evaluated not only by their final artifacts, but by whether their synthesis and validation processes remain attributable, inspectable and scientifically accountable.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Xcientist, a research harness that externalizes the implicit reasoning of AI scientists into inspectable, contract-governed processes to ensure scientific accountability.</p>
<p><strong>Core Idea:</strong> By treating research synthesis and validation as persistent artifacts rather than hidden model inferences, the system prevents 'claim drift' and ensures that final mechanisms remain grounded in their original evidential basis.</p>
<p><strong>Technique:</strong> The system utilizes a research harness to organize literature evidence, idea states, implementation plans, and repair traces as traceable, persistent artifacts.</p>
<p><strong>Pipeline:</strong> Problem Formulation → Literature Evidence Synthesis → Idea State Generation → Implementation Planning → Experimental Validation → Bounded Revision → Final Artifact</p>
<p><strong>Methodology:</strong> The authors evaluated Xcientist across three diverse domains: training-free memory systems, graph-structured traffic forecasting, and multi-scale physics-informed neural networks.</p>
<p><strong>Results:</strong> Xcientist successfully preserved traceable trajectories from formulation to validation, demonstrating that the system maintains scientific accountability and prevents the decoupling of runnable artifacts from their claimed mechanisms.</p>
<p><strong>Limitations:</strong> The paper focuses on the framework for externalizing synthesis and does not fully explore the scalability of the harness across extremely large-scale, multi-year longitudinal research projects.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18874" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="computer-vision">Computer Vision</h4>

<div class="paper-item" data-date="2026-06-18" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18271">NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Juan Manuel Delfa Victoria, Taran Cyriac John, Andrew W. Herson
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18271" target="_blank" rel="noopener noreferrer">2606.18271</a></p>
<p class="paper-detail"><strong>Authors:</strong> Juan Manuel Delfa Victoria, Taran Cyriac John, Andrew W. Herson</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue. The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue. Results across ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery (including uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument) demonstrate the feasibility of running foundation models on satellite-class edge computers to invert the conventional acquire-then-downlink-everything bandwidth profile through semantic compression of Earth observations in-orbit.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper presents the first in-orbit demonstration of a zero-shot vision-language model (VLM) performing autonomous multi-modal inference on a Low Earth Orbit (LEO) spacecraft. It demonstrates the feasibility of using onboard foundation models to perform semantic compression of Earth observation data.</p>
<p><strong>Core Idea:</strong> By processing imagery onboard using a VLM, the system replaces the 'acquire-then-downlink-everything' model with a system that only transmits high-level semantic descriptions and actionable intelligence.</p>
<p><strong>Technique:</strong> The system utilizes a local Gemma 3 vision-language model orchestrated by a graph-based state machine (LangGraph) to coordinate dedicated agents for detection and natural-language dialogue.</p>
<p><strong>Pipeline:</strong> Raw Earth imagery → Hardware-accelerated GPU inference (Gemma 3) → Scene classification and text description → Natural-language dialogue/re-tasking via plain-English prompts</p>
<p><strong>Methodology:</strong> The authors deployed NAVI-Orbital on a LEO spacecraft, validating it through ground benchmarking on the AID dataset, Flatsat simulation, and live in-orbit captures of previously unseen imagery without fine-tuning.</p>
<p><strong>Results:</strong> Achieved 88.16% accuracy on the 7,960-image AID benchmark and successfully performed live in-orbit inference on uncorrected YAM-9 imagery.</p>
<p><strong>Limitations:</strong> The paper focuses on zero-shot performance and does not detail the specific power/thermal constraints of the satellite-class edge computer or the long-term reliability of the graph-based state machine in space.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18271" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18950">RTSGameBench: An RTS Benchmark for Strategic Reasoning by Vision-Language Models</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      San Kim, Daechul Ahn, Reokyoung Kim, Hyeonbeom Choi, Seungyeon Jwa, Jonghyun Choi
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18950" target="_blank" rel="noopener noreferrer">2606.18950</a></p>
<p class="paper-detail"><strong>Authors:</strong> San Kim, Daechul Ahn, Reokyoung Kim, Hyeonbeom Choi, Seungyeon Jwa, Jonghyun Choi</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Modern Vision-Language Models (VLMs) often struggle with strategic reasoning, i.e., anticipating and influencing other agents' actions, under uncertainty in competitive and cooperative settings. Real-time strategy (RTS) games can be a natural testbed for diagnosing this limitation, as they demand coordination with allies, adaptation to opponents' strategy, and long-horizon planning under partial observability. However, existing RTS benchmarks offer limited evaluation scope, lack systematic competency diagnosis, and remain fixed in the pre-designed scenario coverage. To address these limitations, we present RTSGameBench, which is built on Beyond All Reason, a large-scale RTS game with an expanded battlefield that demands broader strategy diversity than the existing testbeds. The proposed benchmark provides evaluations through diverse gameplay across various matchup structures, diagnostic assessment via mini-games, each targeting an individual strategic competency, and extensible coverage via a self-evolving generation framework that converts free-form queries into new mini-games, improving over successive cycles. Additionally, for VLMs to operate in large-scale RTS games, we provide RTSGameAgent that manages units by an FSM with agentic memory. We empirically validate that multiple state-of-the-art VLMs do not perform well when matchups demand tighter coordination, multiagent coordination and when task scale increases.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces RTSGameBench, a comprehensive benchmark designed to evaluate and diagnose the strategic reasoning capabilities of Vision-Language Models (VLMs) in complex, large-scale real-time strategy environments.</p>
<p><strong>Core Idea:</strong> RTS games serve as a natural testbed for assessing a VLM's ability to handle long-horizon planning, partial observability, and multi-agent coordination under uncertainty.</p>
<p><strong>Technique:</strong> The authors developed a self-evolving generation framework to create new mini-games from free-form queries and an RTSGameAgent system using Finite State Machines (FSM) with agentic memory.</p>
<p><strong>Pipeline:</strong> Free-form queries → Self-evolving generation framework → New mini-games → VLM evaluation via RTSGameAgent → Strategic competency diagnosis</p>
<p><strong>Methodology:</strong> The researchers built a benchmark on the 'Beyond All Reason' game, utilizing diverse matchup structures and specific mini-games to isolate and assess individual strategic competencies.</p>
<p><strong>Results:</strong> Empirical validation shows that state-of-the-art VLMs struggle significantly with tight coordination, multi-agent cooperation, and increasing task scales.</p>
<p><strong>Limitations:</strong> The study highlights the current inability of VLMs to perform complex strategic reasoning at scale, leaving open questions on how to improve long-horizon planning and multi-agent coordination.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18950" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="general">General</h4>

<div class="paper-item" data-date="2026-06-18" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18988">ThinkDeception: A Progressive Reinforcement Learning Framework for Interpretable Multimodal Deception Detection</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Jinhao Song, Shan Liang, Yiqun Yue, Zhuhuayang Zhang, Tianqi Gao
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18988" target="_blank" rel="noopener noreferrer">2606.18988</a></p>
<p class="paper-detail"><strong>Authors:</strong> Jinhao Song, Shan Liang, Yiqun Yue, Zhuhuayang Zhang, Tianqi Gao</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Multimodal deception detection is critical for identifying fraudulent intentions, yet existing approaches predominantly rely on end to end black--box paradigms. These methods suffer from a severe lack of interpretability failing to provide transparent reasoning trajectories and struggling to explicitly capture the subtle, cross modal inconsistencies inherent in deceptive behaviors. To transcend these limitations, we propose ThinkDeception, a novel and interpretable multimodal deception detection framework. As a pioneering effort, it introduces Multimodal Large Language Models (MLLMs) into this domain, transforming deception detection from a traditional binary classification task into an explicit cognitive reasoning process. Facilitated by the first meticulously annotated step--by--step multimodal Chain of Thought (CoT) dataset, we develop a foundational model, ThinkDeception Base, empirically validating the critical role of modal inconsistency in decoding deception. Building upon this foundation, our core innovation lies in proposing Visual-Audio Consistency Group Relative Policy Optimization(VAC--GRPO) equipped with a progressive training strategy. Distinct from standard GRPO, we stratify the training data into four progressive difficulty tiers, guiding the model through a psychologically grounded easy--to--hard cognitive transition. By innovatively coupling this dynamic curriculum scheduler with a multi dimensional, process aware reward mechanism and a reflective learning paradigm, we significantly elevate the model's overall reasoning quality. Extensive experiments on mainstream benchmarks demonstrate that ThinkDeception establishes a new SOTA, significantly outperforming existing methods in both detection accuracy and rationale quality. Ultimately, this work successfully drives the field of deception detection toward interpretable, multimodal cognitive reasoning.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces ThinkDeception, a framework that transforms multimodal deception detection from a black-box classification task into an interpretable cognitive reasoning process using MLLMs and a novel step-by-step CoT dataset.</p>
<p><strong>Core Idea:</strong> The core idea is to leverage multimodal Chain of Thought (CoT) and a progressive training strategy to explicitly capture cross-modal inconsistencies and provide transparent reasoning trajectories for detecting deception.</p>
<p><strong>Technique:</strong> The authors propose Visual-Audio Consistency Group Relative Policy Optimization (VAC-GRPO) combined with a progressive difficulty curriculum and a multi-dimensional, process-aware reward mechanism.</p>
<p><strong>Pipeline:</strong> Multimodal input (Visual/Audio) → Progressive CoT Reasoning (Easy-to-Hard) → VAC-GRPO Optimization → Interpretable Deception Detection &amp; Rationale</p>
<p><strong>Methodology:</strong> The methodology involves creating a meticulously annotated multimodal CoT dataset, training a base model to identify modal inconsistencies, and refining it via a reflective learning paradigm with a dynamic curriculum scheduler.</p>
<p><strong>Results:</strong> ThinkDeception establishes a new State-of-the-Art (SOTA) on mainstream benchmarks, significantly outperforming existing methods in both detection accuracy and the quality of generated rationales.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of the progressive training strategy or the scalability of the meticulously annotated CoT dataset to diverse, real-world deception scenarios.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18988" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="llm">LLM</h4>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span><span class="cat-tag cat-default" title="Logic in Computer Science (cs.LO)">Logic in Computer Science (cs.LO)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18557">DeFAb: A Verifiable Benchmark for Defeasible Abduction in Foundation Models</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Patrick Cooper, Alvaro Velasquez
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18557" target="_blank" rel="noopener noreferrer">2606.18557</a></p>
<p class="paper-detail"><strong>Authors:</strong> Patrick Cooper, Alvaro Velasquez</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">A rule-based logic solver resolves every instance in our benchmark in under 50 microseconds with 100% accuracy; the best frontier language model reaches 65% at best and drops to 23.5% under rendering-robust evaluation (worst case over four surface renderings). We introduce DeFAb (Defeasible Abduction Benchmark), a dataset and generation pipeline that converts four decades of publicly funded knowledge bases into formally grounded instances for defeasible abduction: constructing hypotheses that explain anomalies by overriding defaults while preserving unrelated expectations. Because every hypothesis must pass polynomial-time checks for valid derivation, conservativity, and minimality, DeFAb makes logical rigor the instrument for measuring creativity and theoretical reasoning, scoring the disciplined construction of theory revisions rather than fluent but theory-destroying prose. The pipeline pairs taxonomic hierarchies (OpenCyc, YAGO, Wikidata) with behavioral property graphs (ConceptNet, UMLS) to produce 372,648+ instances across 33.75M materialized rules from 18 sources, in three levels with polynomial-time verifiable gold standards. Four frontier models do not reliably internalize defeasible reasoning: rendering-robust Level 2 accuracy is 7.8-23.5%; chain-of-thought variance (~36 pp) exceeds any inter-model gap; and a matched contamination control isolates a +19.4 pp Level 3 gap. We further release DeFAb-Hard (a 235-instance Level 3 difficulty variant; best model 53.3% vs 100% symbolic) and CONJURE (a kernel-verified transformative-creativity variant of 560 Lean 4/Mathlib instances whose gold answers are definitions the proof kernel did not previously contain, judge-free verifier; a pilot finds zero novel concepts). The same verifier doubles as an exact reward for preference optimization (DPO, RLVR/GRPO). Released under MIT at https://huggingface.co/datasets/PatrickAllenCooper/DeFAb.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces DeFAb, a verifiable benchmark for defeasible abduction that uses formal logic to measure a model's ability to construct hypotheses that explain anomalies while preserving unrelated expectations.</p>
<p><strong>Core Idea:</strong> The authors propose that logical rigor—specifically valid derivation, conservativity, and minimality—should be the primary metric for evaluating theoretical reasoning and creativity in foundation models.</p>
<p><strong>Technique:</strong> The authors developed a generation pipeline that converts large-scale knowledge bases into formally grounded instances with polynomial-time verifiable gold standards.</p>
<p><strong>Pipeline:</strong> Taxonomic hierarchies (OpenCyc, YAGO, Wikidata) and behavioral property graphs (ConceptNet, UMLS) → Rule materialization and instance generation → Formally verifiable defeasible abduction tasks.</p>
<p><strong>Methodology:</strong> The researchers evaluated frontier language models against a rule-based logic solver across three difficulty levels, using rendering-robust evaluation and contamination controls to ensure accuracy.</p>
<p><strong>Results:</strong> A rule-based solver achieved 100% accuracy in under 50 microseconds, while the best frontier models reached only 65% (dropping to 23.5% under robust evaluation), showing high variance in Chain-of-Thought reasoning.</p>
<p><strong>Limitations:</strong> The CONJURE pilot found zero novel concepts in the transformative-creativity variant, and the significant performance gap between symbolic solvers and LLMs suggests a fundamental struggle with internalizing defeasible reasoning.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18557" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="mlops">MLOps</h4>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19079">ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Enrico Cassano, Micha{\l} Brzozowski, Zuzanna Dubanowska, Paolo Mandica, Neo Christopher Chung
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19079" target="_blank" rel="noopener noreferrer">2606.19079</a></p>
<p class="paper-detail"><strong>Authors:</strong> Enrico Cassano, Micha{\l} Brzozowski, Zuzanna Dubanowska, Paolo Mandica, Neo Christopher Chung</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The increasing deployment of parameter-efficient fine-tuning (PEFT) has led to model ecosystems in which a single backbone is paired with many task-specialized adapters. In this setting, inference-time queries often arrive without task labels, requiring the system to automatically select the most appropriate adapter from a growing and heterogeneous adapter pool. Existing routing methods either depend on access to adapter internals, such as weight decompositions or gradient-based statistics, or require additional router training, which limits scalability and portability as new adapters are added. We introduce ARIADNE, a training-free, adapter-agnostic routing framework for dynamic adapter selection at inference time. ARIADNE represents each adapter through a set of centroids computed from embeddings of its training set, capturing the data distribution associated with that adapter. Given an unlabeled input, it selects an adapter by measuring proximity to these centroids in latent space. Because routing is performed entirely in the input embedding space, ARIADNE is compatible with arbitrary PEFT methods and requires no modification to the adapters or training procedures. Primarily evaluated with Llama 3.2 1B Instruct on 23 diverse NLP tasks, ARIADNE recovers 97.44% of the upper bound performance. Scaling to 44 tasks, it achieves 89.7% average selection accuracy, without additional training or access to adapter internals.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces ARIADNE, a training-free and adapter-agnostic routing framework that enables dynamic selection of task-specialized adapters from a heterogeneous pool without requiring access to adapter internals or additional training.</p>
<p><strong>Core Idea:</strong> The framework treats adapter selection as a proximity problem in the latent space, representing each adapter by the distribution of its training data rather than its internal weights.</p>
<p><strong>Technique:</strong> ARIADNE computes a set of centroids from the embeddings of each adapter's training set and selects the best adapter by measuring the distance between the input query's embedding and these centroids.</p>
<p><strong>Pipeline:</strong> Unlabeled input query → Embedding generation → Proximity measurement against adapter centroids → Optimal adapter selection</p>
<p><strong>Methodology:</strong> The authors evaluated ARIADNE on Llama 3.2 1B Instruct across 23 and 44 diverse NLP tasks, comparing selection accuracy against an upper bound of perfect task knowledge.</p>
<p><strong>Results:</strong> ARIADNE recovered 97.44% of the upper bound performance on 23 tasks and achieved 89.7% average selection accuracy on 44 tasks without any additional training.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the performance impact of very large adapter pools or the computational overhead of calculating distances across a massive number of centroids.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19079" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="rl">RL</h4>

<div class="paper-item" data-date="2026-06-18" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19047">RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Ruishan Fang, Siyuan Lu, Chenyi Zhuang, Tao Lin
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19047" target="_blank" rel="noopener noreferrer">2606.19047</a></p>
<p class="paper-detail"><strong>Authors:</strong> Ruishan Fang, Siyuan Lu, Chenyi Zhuang, Tao Lin</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. We observe that the gradient signal in GRPO concentrates on tasks with the highest rollout reward variance, a consequence of the Popoviciu upper bound. Consequently, samples near the agent's capability boundary -- where successes and failures are roughly balanced -- contribute disproportionately large policy gradients. As training progresses, this boundary continuously shifts, which gradually depletes the pool of informative samples in a static dataset. We propose RODS (Reward-driven Online Data Synthesis) to resolve this depletion. RODS closes the loop between RL training and data generation by repurposing the progress reward variance as a practical, zero-cost boundary detector that requires no extra inference beyond the rollouts already computed for training. It continuously identifies such boundary samples, synthesizes new multi-turn variants matching their structural complexity (e.g., API topology and dependency depth) via a skill-aligned resampling pipeline, and manages a dynamic replay buffer that co-evolves with the policy. Starting from 400 human seeds and maintaining an active training pool of ~800 samples, RODS achieves comparable performance to a 17K-sample offline pipeline while requiring roughly 20x fewer trajectories, and improves over fixed-data RL and environment augmentation in our controlled setting.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces RODS, a framework that addresses the depletion of informative samples in multi-turn tool-use RL by dynamically synthesizing new data that aligns with the agent's evolving capability boundary.</p>
<p><strong>Core Idea:</strong> The authors identify that policy gradients are most effective on samples where success and failure are balanced (high reward variance), and they propose using this variance as a zero-cost signal to identify and synthesize new training data in real-time.</p>
<p><strong>Technique:</strong> RODS utilizes a reward-driven online data synthesis pipeline that identifies boundary samples, generates structurally similar multi-turn variants via skill-aligned resampling, and manages a dynamic replay buffer.</p>
<p><strong>Pipeline:</strong> Static human seeds → RL rollouts → Reward variance detection → Skill-aligned resampling → Dynamic replay buffer → Policy update</p>
<p><strong>Methodology:</strong> The method repurposes the Popoviciu upper bound logic to detect samples near the agent's capability boundary and uses a resampling pipeline to maintain structural complexity (API topology and dependency depth) in synthesized data.</p>
<p><strong>Results:</strong> RODS achieves performance comparable to a 17K-sample offline pipeline using only 400 human seeds and ~800 active samples, requiring roughly 20x fewer trajectories than standard offline methods.</p>
<p><strong>Limitations:</strong> The paper focuses on controlled settings and does not extensively explore the scalability of the resampling pipeline across vastly different tool domains or the long-term stability of the dynamic replay buffer.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19047" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18686">ForecastBench-Sim: A Simulated-World Forecasting Benchmark</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Jaeho Lee, Nick Merrill, Ezra Karger
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18686" target="_blank" rel="noopener noreferrer">2606.18686</a></p>
<p class="paper-detail"><strong>Authors:</strong> Jaeho Lee, Nick Merrill, Ezra Karger</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Forecasting benchmarks for general-purpose AI systems usually inherit the constraints of the real world: outcomes resolve slowly, tail events are rare, and counterfactual questions are difficult to score. We introduce ForecastBench-Sim, a simulated-world forecasting benchmark built on game rollouts from Freeciv, a turn-based strategy game modelled on the Civilization series. Forecasters receive a fixed world report (a structured snapshot of the current game state) and answer questions about hidden future states; the benchmark then continues the simulation and scores forecasts. Because the world is simulated, the same setup can generate continuous or binary forecasting questions at arbitrary time horizons, paired intervention worlds for conditional or causal questions, and resolved examples of rare or disruptive outcomes. We describe the benchmark pipeline, question families, scoring protocol, and release artifacts, and report validation slices from model evaluations and an anonymized human pilot. ForecastBench-Sim is intended to complement real-world forecasting benchmarks by providing controlled, immediately resolvable tasks for studying probabilistic reasoning under dynamic world states.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces ForecastBench-Sim, a new forecasting benchmark based on simulated game rollouts to overcome the limitations of real-world forecasting data. It provides a controlled environment for evaluating probabilistic reasoning, causal inference, and tail-event prediction.</p>
<p><strong>Core Idea:</strong> By using a turn-based strategy game (Freeciv) as a world simulator, the authors create a sandbox where future states are immediately resolvable and counterfactual scenarios can be perfectly controlled.</p>
<p><strong>Technique:</strong> The benchmark utilizes game rollouts to generate structured world reports and corresponding future states, allowing for the automated generation of continuous, binary, and conditional forecasting questions.</p>
<p><strong>Pipeline:</strong> Game state snapshot (World Report) → Forecasting question generation → Model/Human prediction → Simulation rollout → Automated scoring against ground truth.</p>
<p><strong>Methodology:</strong> The authors developed a pipeline to generate diverse question families (e.g., causal, rare events) and validated the benchmark using both large language model evaluations and an anonymized human pilot.</p>
<p><strong>Results:</strong> The benchmark successfully provides immediately resolvable tasks for studying dynamic world states and offers a scalable way to generate high-quality, diverse forecasting data that real-world datasets lack.</p>
<p><strong>Limitations:</strong> The benchmark is limited to the dynamics of a simulated game environment, which may not fully capture the complexity or noise of real-world socio-economic systems.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18686" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="robotics">Robotics</h4>

<div class="paper-item" data-date="2026-06-18" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18786">R2D-RL: A RoboCup 2D Soccer Environment for Multi-Agent Reinforcement Learning</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Haobin Qin, Baofeng Zhang, Hidehisa Akiyama, Keisuke Fujii
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18786" target="_blank" rel="noopener noreferrer">2606.18786</a></p>
<p class="paper-detail"><strong>Authors:</strong> Haobin Qin, Baofeng Zhang, Hidehisa Akiyama, Keisuke Fujii</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Robot soccer is a challenging testbed for multi-agent reinforcement learning because it combines partial observability, cooperative and adversarial interaction, sparse rewards, and long-horizon tactical behavior. RoboCup 2D Soccer Simulation (RCSS2D) provides a mature robot-soccer platform, but its competition-oriented server-client architecture is difficult to use directly with modern Python-based MARL workflows. We introduce R2D-RL, a reinforcement learning environment that connects RCSS2D and HELIOS-based player clients to a Python MARL interface through shared-memory communication and cycle-level synchronization. R2D-RL supports full-field and scenario-based training with configurable opponents, Base discrete and Hybrid parameterized action spaces, action masks, expected possession value (EPV)-based reward shaping, and parallel execution. We provide front-goal scenarios and an 11-vs-11 full-field benchmark, together with baseline results.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The authors introduce R2D-RL, a new reinforcement learning environment that bridges the RoboCup 2D Soccer Simulation (RCSS2D) with modern Python-based multi-agent reinforcement learning (MARL) workflows.</p>
<p><strong>Core Idea:</strong> The core idea is to overcome the limitations of the competition-oriented server-client architecture of RCSS2D by enabling high-performance, synchronized communication for MARL training.</p>
<p><strong>Technique:</strong> The framework utilizes shared-memory communication and cycle-level synchronization to connect RCSS2D and HELIOS-based player clients to a Python interface.</p>
<p><strong>Pipeline:</strong> RCSS2D Simulation State → Shared-Memory Communication → Python MARL Interface (Agent Decision) → HELIOS Player Clients → Action Execution in RCSS2D</p>
<p><strong>Methodology:</strong> The environment supports full-field and scenario-based training with configurable action spaces, action masks, and EPV-based reward shaping for complex tactical behavior.</p>
<p><strong>Results:</strong> The paper provides a benchmark for front-goal scenarios and 11-vs-11 full-field play, establishing baseline results for multi-agent training in a realistic soccer environment.</p>
<p><strong>Limitations:</strong> The paper focuses on the infrastructure and initial benchmarks, leaving the exploration of more complex long-horizon tactical behaviors as an open area for future research.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18786" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18847">WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yehang Zhang, Jianchong Su, Haojian Huang, Yifan Chang, Tianhao Zhou, Xinli Xu, Yingjie Xu, Yinchuan Li, Zexi Li, Ying-Cong Chen
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18847" target="_blank" rel="noopener noreferrer">2606.18847</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yehang Zhang, Jianchong Su, Haojian Huang, Yifan Chang, Tianhao Zhou, Xinli Xu, Yingjie Xu, Yinchuan Li, Zexi Li, Ying-Cong Chen</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance. It constructs temporally extended household traces with dialogues, actions, execution feedback, object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions. Experiments reveal persistent challenges in partial observability, overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces WorldLines, a new benchmark for long-horizon stateful embodied agents, and ObsMem, a memory framework designed for visibility-aware state tracking.</p>
<p><strong>Core Idea:</strong> Embodied agents require the ability to maintain and utilize long-term memories of user routines and dynamic world states to perform complex household tasks over extended periods.</p>
<p><strong>Technique:</strong> The authors develop an observer-grounded memory framework (ObsMem) that maintains visibility-aware memories and action-native state trails to handle partial observability and state changes.</p>
<p><strong>Pipeline:</strong> Household traces (dialogues, actions, feedback, state changes) → Evidence-linked sample conversion → Memory QA and Embodied Task Planning evaluation</p>
<p><strong>Methodology:</strong> The researchers constructed temporally extended household traces to create a project-driven benchmark and evaluated agent performance using a memory-augmented architecture.</p>
<p><strong>Results:</strong> Experiments highlight significant challenges in handling partial observability and overwritten world states, while demonstrating that ObsMem provides a superior reference architecture for state-aware decision-making.</p>
<p><strong>Limitations:</strong> The study identifies persistent difficulties in effectively translating long-term memories into executable embodied plans in dynamic environments.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18847" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-18" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">18 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18888">Generative-Model Predictive Planning for Navigation in Partially Observable Environments</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Thomas Quilter, Yifan Zhu, Guorui Quan, Mingfei Sun, Samuel Kaski
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18888" target="_blank" rel="noopener noreferrer">2606.18888</a></p>
<p class="paper-detail"><strong>Authors:</strong> Thomas Quilter, Yifan Zhu, Guorui Quan, Mingfei Sun, Samuel Kaski</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Navigation in partially observable environments presents a significant challenge for autonomous agents, requiring effective decision-making with limited sensory information in unknown environments. Belief-based methods, particularly those using neural networks to approximate the belief space, often fail to capture the inherent multimodality of belief spaces, especially in high-dimensional cases with perceptual aliasing. While generative models present a compelling alternative, they typically require substantial data or expert demonstrations and lack explicit mechanisms for long-term planning. In this paper, we introduce BeliefDiffusion, a novel framework that combines the benefits of both generation and planning. BeliefDiffusion leverages diffusion models to explicitly characterize multimodal belief distributions and utilizes Model Predictive Control (MPC) to simultaneously plan ahead. It consists of two steps: (1) Imagining plausible environment configurations based on observation history and (2) Planning efficient navigation strategies across an aggregated configurations. Through extensive experiments in synthetic map environments, we demonstrate that BeliefDiffusion significantly outperforms both model-free reinforcement learning baselines and other generative approaches in navigation success rate and path efficiency. Our results validate that explicitly incorporating multimodal belief representations into planning enables more robust navigation in partially observable settings.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces BeliefDiffusion, a framework that integrates diffusion models with Model Predictive Control (MPC) to handle multimodal belief distributions in partially observable navigation tasks.</p>
<p><strong>Core Idea:</strong> By combining generative modeling to represent complex environment uncertainties with MPC for long-term planning, the method overcomes the limitations of unimodal belief approximations and data-heavy generative models.</p>
<p><strong>Technique:</strong> The framework uses diffusion models to imagine plausible environment configurations from observation history and employs MPC to plan navigation strategies across these aggregated configurations.</p>
<p><strong>Pipeline:</strong> Observation history → Diffusion-based environment imagination → Aggregated configuration planning via MPC → Navigation actions</p>
<p><strong>Methodology:</strong> The authors developed a two-step process: first, generating a distribution of possible environment states using a diffusion model, and second, optimizing paths across these states using a predictive control loop.</p>
<p><strong>Results:</strong> BeliefDiffusion significantly outperformed model-free reinforcement learning and existing generative approaches in both navigation success rate and path efficiency in synthetic map environments.</p>
<p><strong>Limitations:</strong> The paper focuses on synthetic map environments, leaving the scalability and robustness of the diffusion-based imagination in complex, real-world dynamic environments as an open question.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18888" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h3 id="personal-interests">Personal Interests</h3>

<p class="section-desc">Papers discovered through your interest topics.</p>

<h4 id="multi-agent-systems">Multi-Agent Systems</h4>

<div class="paper-item" data-date="2026-06-17" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-default" title="stat.AP">stat.AP</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.19294">Accelerating Network-Agent Dispersion: Territorial Behavior and Directionally Biased Lazy Random Walks</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Li Zeng, Steve Alpern
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.19294" target="_blank" rel="noopener noreferrer">2606.19294</a></p>
<p class="paper-detail"><strong>Authors:</strong> Li Zeng, Steve Alpern</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Territorial behavior can greatly accelerate decentralized agent dispersion on networks. This paper studies a network-agent dispersion problem in which m autonomous agents move in discrete time on a connected graph and seek a configuration in which no two agents occupy the same node. We focus on the dispersion case m = n, where successful configurations contain exactly one agent per node. In the baseline model, each agent follows a lazy random walk with a common laziness parameter p. This process defines a finite absorbing Markov chain, and the expected absorption time is used to measure dispersion efficiency. We introduce two local behavioral extensions: territorial behavior, in which an agent that is alone at a node claims that node and repels later arrivals, and directional bias, in which agents share a preferred direction of movement on paths and cycles. Exact calculations on three-agent path and cycle networks and Monte Carlo simulations on larger instances show that territorial behavior substantially reduces expected dispersion time, with larger relative reductions as network size increases. Directional bias alone has limited effect in most small-network cases, but when combined with territorial behavior it can produce large additional speedups. In particular, the simulations show reductions of 99.22% on L100 and 97.48% on C100 when all agents start from one node. These results show how simple local movement rules can strongly affect global dispersion time in decentralized networked multi-agent systems.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper demonstrates how incorporating territorial behavior and directional bias into local agent movement rules significantly accelerates the time required for decentralized agents to disperse across a network.</p>
<p><strong>Core Idea:</strong> Simple local behavioral rules, specifically territoriality and directional bias, can drastically reduce the expected absorption time in a network-agent dispersion problem compared to standard lazy random walks.</p>
<p><strong>Technique:</strong> The authors model the dispersion as a finite absorbing Markov chain and analyze the expected absorption time using exact calculations and Monte Carlo simulations.</p>
<p><strong>Pipeline:</strong> Network graph and initial agent positions → Local movement rules (Lazy Random Walk + Territoriality + Directional Bias) → Final configuration with one agent per node</p>
<p><strong>Methodology:</strong> The study compares a baseline lazy random walk model against two extensions: territorial behavior (repelling arrivals) and directional bias (shared movement preferences) on path and cycle networks.</p>
<p><strong>Results:</strong> Territorial behavior substantially reduces dispersion time, with combined rules achieving up to 99.22% reduction on L100 and 97.48% on C100 when all agents start from a single node.</p>
<p><strong>Limitations:</strong> Directional bias alone has limited effect on small networks, and the study focuses specifically on path and cycle topologies.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.19294" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h2 id="tech-news">Tech News</h2>

<h3 id="computer-vision-1">Computer Vision</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Thu, 18 Ju</span>
  </div>
  <a class="news-title" href="https://www.midjourney.com/medical/blogpost" target="_blank" rel="noopener noreferrer">Midjourney Medical</a>
  <p class="news-summary">The discussion explores the application of Midjourney&#x27;s generative capabilities within the medical field. It covers potential use cases such as medical illustration, anatomical visualization, and the ethical considerations of synthetic imagery in healthcare.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Generative AI</span><span class="news-tag">Medical Imaging</span><span class="news-tag">Computer Vision</span><span class="news-tag">Ethics</span></div>
    <a class="news-read-btn" href="https://www.midjourney.com/medical/blogpost" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computing-systems">Computing Systems</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Thu, 18 Ju</span>
  </div>
  <a class="news-title" href="https://x86ecosystem.org/resource/ai-compute-extensions-ace-specification/" target="_blank" rel="noopener noreferrer">[x86] AI Compute Extensions (ACE) Specification</a>
  <p class="news-summary">The AI Compute Extensions (ACE) specification introduces a standardized set of instructions for x86 processors to accelerate AI workloads. It aims to improve efficiency and performance for deep learning operations by providing hardware-level optimizations. This initiative seeks to bridge the gap between general-purpose CPUs and specialized AI accelerators.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">x86</span><span class="news-tag">Hardware Acceleration</span><span class="news-tag">AI Infrastructure</span><span class="news-tag">CPU Architecture</span><span class="news-tag">Deep Learning</span></div>
    <a class="news-read-btn" href="https://x86ecosystem.org/resource/ai-compute-extensions-ace-specification/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="llm-1">LLM</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Thu, 18 Ju</span>
  </div>
  <a class="news-title" href="https://blog.alexellis.io/local-ai-is-not-opus/" target="_blank" rel="noopener noreferrer">Local Qwen isn&#x27;t a worse Opus, it&#x27;s a different tool</a>
  <p class="news-summary">The article argues against the common misconception that local models like Qwen are inferior versions of proprietary models like Claude 3 Opus. Instead, it highlights that local models serve as distinct tools with unique advantages in privacy, cost, and customization. It encourages users to evaluate models based on their specific use cases rather than direct performance parity.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Local LLMs</span><span class="news-tag">Qwen</span><span class="news-tag">Model Comparison</span><span class="news-tag">Open Source AI</span></div>
    <a class="news-read-btn" href="https://blog.alexellis.io/local-ai-is-not-opus/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/DeepLearning</span>
    <span class="news-date">2026-06-18</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/deeplearning/comments/1u8s5ek/seeking_peer_review_comprehensive_mathematical/" target="_blank" rel="noopener noreferrer">Seeking Peer Review: Comprehensive Mathematical Derivations of GPT-2 Backpropagation (Index-Form)</a>
  <p class="news-summary">A user on the r/DeepLearning subreddit has shared a comprehensive mathematical derivation of the GPT-2 backpropagation process using index-form notation. The post seeks peer review and feedback from the community to verify the accuracy of the complex calculus involved in the model&#x27;s training mechanics.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">GPT-2</span><span class="news-tag">Backpropagation</span><span class="news-tag">Mathematics</span><span class="news-tag">Deep Learning</span><span class="news-tag">Neural Networks</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/deeplearning/comments/1u8s5ek/seeking_peer_review_comprehensive_mathematical/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="mlops-1">MLOps</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/DeepLearning</span>
    <span class="news-date">2026-06-18</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/deeplearning/comments/1u8smwv/litellm_stability_announcement/" target="_blank" rel="noopener noreferrer">LiteLLM Stability Announcement</a>
  <p class="news-summary">LiteLLM has released a stability announcement regarding its library, which is widely used for managing multiple LLM APIs. This update is significant for developers looking for reliable integration across different model providers. It highlights the project&#x27;s maturation in the production environment.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LiteLLM</span><span class="news-tag">LLM</span><span class="news-tag">MLOps</span><span class="news-tag">API Integration</span><span class="news-tag">Production AI</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/deeplearning/comments/1u8smwv/litellm_stability_announcement/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="robotics-1">Robotics</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/DeepLearning</span>
    <span class="news-date">2026-06-18</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/deeplearning/comments/1u8s9ud/i_built_a_cli_tool_to_diff_robotics_datasets_at/" target="_blank" rel="noopener noreferrer">I built a CLI tool to diff robotics datasets at the episode level (so you can figure out why your imitation learning model regressed)</a>
  <p class="news-summary">A new CLI tool called EpisodeVault has been released to help robotics researchers debug imitation learning regressions by diffing datasets at the episode level. It uses DuckDB and PyArrow to provide sub-second analysis of task distributions and quality metrics without loading raw video data. Key features include anomaly detection, custom Python-based quality metrics, and HuggingFace Hub integration.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Robotics</span><span class="news-tag">MLOps</span><span class="news-tag">Imitation Learning</span><span class="news-tag">Data Engineering</span><span class="news-tag">LeRobot</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/deeplearning/comments/1u8s9ud/i_built_a_cli_tool_to_diff_robotics_datasets_at/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h2 id="github-trending">
  <svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="20" height="20" style="vertical-align:middle;margin-right:6px"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg>
  GitHub Trending
</h2>

<p class="section-desc">Trending repositories on GitHub filtered and scored for relevance to your interests.</p>

<h3 id="agentic-ai-1">Agentic AI</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/bytedance/UI-TARS-desktop" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">bytedance</span><span class="gh-sep">/</span><strong class="gh-repo">UI-TARS-desktop</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a multimodal AI agent stack designed for GUI interaction and computer use. It is highly relevant as it integrates Vision-Language Models (VLMs) with agent infrastructure to perform complex tasks across desktop environments.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">multimodal</span><span class="gh-tag">gui-agent</span><span class="gh-tag">computer-use</span><span class="gh-tag">vision-language-models</span><span class="gh-tag">agentic-ai</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-18</span>
      <a class="gh-visit-btn" href="https://github.com/bytedance/UI-TARS-desktop" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/microsoft/RD-Agent" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">microsoft</span><span class="gh-sep">/</span><strong class="gh-repo">RD-Agent</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">RD-Agent is a framework designed to automate high-value research and development processes by using AI agents to drive data-driven AI development. It aligns perfectly with interests in Agentic AI and LLMs by automating complex workflows like data mining and model development.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Automation</span><span class="gh-tag">Data Science</span><span class="gh-tag">R&amp;D</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/microsoft/RD-Agent" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/rohitg00/ai-engineering-from-scratch" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">rohitg00</span><span class="gh-sep">/</span><strong class="gh-repo">ai-engineering-from-scratch</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">A comprehensive, end-to-end curriculum and repository for building AI systems from scratch, covering everything from linear algebra and backpropagation to autonomous swarms. It is highly relevant as it provides a deep-dive into the underlying mechanics of LLMs, agents, and reinforcement learning through hands-on implementation.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Reinforcement Learning</span><span class="gh-tag">Deep Learning</span><span class="gh-tag">Multi-Agent Systems</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-14</span>
      <a class="gh-visit-btn" href="https://github.com/rohitg00/ai-engineering-from-scratch" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/alexzhang13/rlm" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">alexzhang13</span><span class="gh-sep">/</span><strong class="gh-repo">rlm</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository introduces Recursive Language Models (RLMs), a paradigm where LLMs programmatically decompose tasks and recursively call themselves within a code-based REPL environment. It is highly relevant as it moves beyond standard tool-calling toward a more flexible, scalable architecture for complex agentic workflows and infinite context handling.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Recursive Inference</span><span class="gh-tag">CodeAct</span><span class="gh-tag">Multi-Agent Systems</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-06</span>
      <a class="gh-visit-btn" href="https://github.com/alexzhang13/rlm" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/continuedev/continue" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">continuedev</span><span class="gh-sep">/</span><strong class="gh-repo">continue</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Continue is an open-source coding agent that provides a CLI, VS Code extension, and JetBrains plugin for AI-assisted development. It is highly relevant as it implements agentic workflows for software engineering and serves as a foundation for LLM-based developer tools.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">developer-tools</span><span class="gh-tag">coding-agent</span><span class="gh-tag">open-source</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-18</span>
      <a class="gh-visit-btn" href="https://github.com/continuedev/continue" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/google-research/timesfm" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">google-research</span><span class="gh-sep">/</span><strong class="gh-repo">timesfm</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">TimesFM is a decoder-only foundation model developed by Google Research for time-series forecasting. It is highly relevant as it supports agentic calling and includes specific support for agents, making it a core component for time-series tasks within multi-agent systems.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">foundation models</span><span class="gh-tag">time-series</span><span class="gh-tag">transformer</span><span class="gh-tag">agentic AI</span><span class="gh-tag">fine-tuning</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-17</span>
      <a class="gh-visit-btn" href="https://github.com/google-research/timesfm" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/mattpocock/skills" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">mattpocock</span><span class="gh-sep">/</span><strong class="gh-repo">skills</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a collection of &#x27;skills&#x27; designed to improve the reliability and alignment of coding agents like Claude Code. It addresses common failure modes in agentic workflows by providing structured prompts for requirements gathering, issue tracking, and domain modeling.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Software Engineering</span><span class="gh-tag">Prompt Engineering</span><span class="gh-tag">Coding Agents</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-17</span>
      <a class="gh-visit-btn" href="https://github.com/mattpocock/skills" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/Panniantong/Agent-Reach" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">Panniantong</span><span class="gh-sep">/</span><strong class="gh-repo">Agent-Reach</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Agent-Reach provides a unified CLI tool that allows AI agents to access and scrape data from major social and content platforms without API fees. It is highly relevant for building autonomous agents that require real-time web browsing and multi-platform information retrieval.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">ai-agents</span><span class="gh-tag">web-scraping</span><span class="gh-tag">llm-tools</span><span class="gh-tag">automation</span><span class="gh-tag">information-retrieval</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/Panniantong/Agent-Reach" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/DeusData/codebase-memory-mcp" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">DeusData</span><span class="gh-sep">/</span><strong class="gh-repo">codebase-memory-mcp</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a high-performance MCP server that indexes codebases into a persistent knowledge graph for LLM interaction. It is highly relevant for Agentic AI and RAG workflows as it enables agents to perform sub-millisecond queries across large codebases with minimal token overhead.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">RAG</span><span class="gh-tag">Knowledge Graph</span><span class="gh-tag">MCP</span><span class="gh-tag">Code Intelligence</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-13</span>
      <a class="gh-visit-btn" href="https://github.com/DeusData/codebase-memory-mcp" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/calesthio/OpenMontage" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">calesthio</span><span class="gh-sep">/</span><strong class="gh-repo">OpenMontage</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">OpenMontage is an open-source agentic system that transforms AI coding assistants into full video production studios using 500+ agent skills. It is highly relevant as it implements complex multi-agent workflows to orchestrate video generation, editing, and multimodal content creation.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">Generative Models</span><span class="gh-tag">Video Production</span><span class="gh-tag">Multimodal</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-05-07</span>
      <a class="gh-visit-btn" href="https://github.com/calesthio/OpenMontage" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="computer-vision-2">Computer Vision</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/roboflow/rf-detr" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">roboflow</span><span class="gh-sep">/</span><strong class="gh-repo">rf-detr</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Computer Vision</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">RF-DETR is a state-of-the-art real-time object detection and instance segmentation model architecture. It is highly relevant for Embodied AI and Robotics as it provides high-performance visual perception capabilities for navigating and interacting with physical environments.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">object-detection</span><span class="gh-tag">instance-segmentation</span><span class="gh-tag">DETR</span><span class="gh-tag">computer-vision</span><span class="gh-tag">real-time</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-17</span>
      <a class="gh-visit-btn" href="https://github.com/roboflow/rf-detr" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/PaddlePaddle/PaddleOCR" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">PaddlePaddle</span><span class="gh-sep">/</span><strong class="gh-repo">PaddleOCR</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Computer Vision</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">PaddleOCR is a high-performance toolkit for converting images and PDFs into structured text, supporting over 100 languages. It is highly relevant for RAG pipelines and multimodal learning as it provides the foundational document parsing needed to feed structured data into LLMs.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">OCR</span><span class="gh-tag">Computer Vision</span><span class="gh-tag">RAG</span><span class="gh-tag">Document Parsing</span><span class="gh-tag">Multimodal</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/PaddlePaddle/PaddleOCR" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="mlops-2">MLOps</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/fivetran/great_expectations" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">fivetran</span><span class="gh-sep">/</span><strong class="gh-repo">great_expectations</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">MLOps</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Great Expectations is a popular open-source framework for validating and monitoring data quality using &#x27;Expectations&#x27; (unit tests for data). It is highly relevant for MLOps as it ensures data integrity in production pipelines, which is critical for reliable model training and inference.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">data quality</span><span class="gh-tag">MLOps</span><span class="gh-tag">data engineering</span><span class="gh-tag">data validation</span><span class="gh-tag">pipeline testing</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-17</span>
      <a class="gh-visit-btn" href="https://github.com/fivetran/great_expectations" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="speech">Speech</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/OpenBMB/VoxCPM" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">OpenBMB</span><span class="gh-sep">/</span><strong class="gh-repo">VoxCPM</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Speech</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">VoxCPM2 is a tokenizer-free text-to-speech model designed for high-quality multilingual speech generation and voice cloning. It is highly relevant to the user&#x27;s interest in speech, multimodal learning, and generative models.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">text-to-speech</span><span class="gh-tag">voice-cloning</span><span class="gh-tag">multimodal</span><span class="gh-tag">generative models</span><span class="gh-tag">speech-synthesis</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-10</span>
      <a class="gh-visit-btn" href="https://github.com/OpenBMB/VoxCPM" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's digest highlights a significant shift toward long-horizon planning and stateful memory in embodied agents, alongside advancements in verifiable reasoning and specialized industrial applications.]]></summary></entry><entry><title type="html">Daily Digest 2026-06-17</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-17/" rel="alternate" type="text/html" title="Daily Digest 2026-06-17" /><published>2026-06-17T00:00:00+07:00</published><updated>2026-06-17T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-17/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's digest is dominated by the evolution of agentic workflows, specifically focusing on multi-agent architectures, self-evolving capabilities, and the rigorous benchmarking of complex decision-making in specialized domains.</span>
</div>

<h2 id="global-trends">Global Trends</h2>

<h3 id="arxiv-subjects">Papers discovered from ArXiv subject categories</h3>

<h4 id="agentic-ai">Agentic AI</h4>

<div class="paper-item" data-date="2026-06-17" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ir" title="Information Retrieval (cs.IR)">Information Retrieval (cs.IR)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17209">Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Sidhaarth Murali, Jo\~ao Coelho, Jingjie Ning, Jo\~ao Magalh\~aes, Bruno Martins, Chenyan Xiong
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17209" target="_blank" rel="noopener noreferrer">2606.17209</a></p>
<p class="paper-detail"><strong>Authors:</strong> Sidhaarth Murali, Jo\~ao Coelho, Jingjie Ning, Jo\~ao Magalh\~aes, Bruno Martins, Chenyan Xiong</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Test-time scaling for agentic search typically increases depth (i.e., more turns and tokens per trajectory) or breadth (i.e., more parallel rollouts). Here we focus on breadth scaling, showing that standard parallel sampling yields diminishing returns, tracing this to query redundancy at the first turn. When models issue similar first queries across rollouts, the threads retrieve overlapping evidence, and subsequent turns are conditioned on this shared retrieval. We address this limitation with DivInit, a training-free intervention at the first turn. Rather than sampling k independent first queries, DivInit draws n candidates from a single call, picks k &lt; n diverse seeds, and runs them as parallel trajectories. Across five open-weight models and eight benchmarks, DivInit consistently improves over standard parallel sampling, with average gains of five to seven points on multi-hop QA at matched compute. Code available at https://github.com/cxcscmu/diverse-query-initialization</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces DivInit, a training-free intervention that improves agentic search performance by addressing query redundancy in parallel sampling. It demonstrates that diversifying the initial queries significantly enhances breadth scaling for multi-hop QA tasks.</p>
<p><strong>Core Idea:</strong> Standard parallel sampling suffers from diminishing returns because similar initial queries lead to overlapping evidence retrieval. By ensuring diverse starting points, the model can explore a wider range of information across parallel trajectories.</p>
<p><strong>Technique:</strong> DivInit generates a larger pool of candidate queries from a single model call and selects a subset of diverse seeds to initialize parallel search threads.</p>
<p><strong>Pipeline:</strong> Initial query prompt → Generate n candidate queries → Select k diverse seeds → Execute k parallel search trajectories → Aggregate results</p>
<p><strong>Methodology:</strong> The authors evaluated DivInit across five open-weight models and eight benchmarks, comparing it against standard parallel sampling at matched compute levels.</p>
<p><strong>Results:</strong> DivInit consistently outperformed standard parallel sampling, achieving average gains of five to seven points on multi-hop QA benchmarks.</p>
<p><strong>Limitations:</strong> The paper focuses primarily on breadth scaling at the first turn and does not extensively explore the impact of diversity on deeper, subsequent turns of the search process.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17209" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    <a class="paper-action-btn gh-btn" href="https://github.com/cxcscmu/diverse-query-initialization" target="_blank" rel="noopener noreferrer" title="View code on GitHub" aria-label="View code on GitHub"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg><span>Code</span></a>
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17328">MemTrace: Probing What Final Accuracy Misses in Long-Term Memory</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Xianxuan Long, Zhikai Chen, Shenglai Zeng, Shouren Wang, Kai Guo, Jiliang Tang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17328" target="_blank" rel="noopener noreferrer">2606.17328</a></p>
<p class="paper-detail"><strong>Authors:</strong> Xianxuan Long, Zhikai Chen, Shenglai Zeng, Shouren Wang, Kai Guo, Jiliang Tang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">LLM agents increasingly maintain long-term memory of user facts across sessions. Yet such memory is usually evaluated by aggregating accuracy over question rows or episodes. Because this approach scores question rows independently, even when several questions probe the same fact, it cannot show how that fact behaves as conditions change. We introduce MemTrace, a benchmark whose unit of measurement is the knowledge point: a single typed fact about the user, rather than an individual question. MemTrace probes each fact along three controlled dimensions: memory age, defined by how many sessions ago the fact appeared in the history; question type, covering current state, earlier state, and trajectory of change; and evidence condition, covering present, missing, and contradicted-by-false-premise settings. Evaluating 13 memory-system configurations across four paradigms, we find that similar pooled accuracy hides different failures: recovering a fact's current and earlier states does not imply tracking how it changed, and safe abstention does not imply correcting a false premise. The dominant bottleneck is evidence use, not retrieval: when systems fail, the evidence was retrievable 10 times more often than it was missing. These results suggest that improving long-term memory requires better use of reachable evidence, not simply more storage or retrieval.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces MemTrace, a new benchmark that shifts the evaluation of long-term memory from individual question accuracy to the persistence and evolution of specific knowledge points.</p>
<p><strong>Core Idea:</strong> Current evaluation methods mask failures in tracking how facts change over time by aggregating independent question scores; MemTrace exposes these nuances by probing facts across age, question type, and evidence conditions.</p>
<p><strong>Technique:</strong> The authors developed a structured benchmark that tracks 'knowledge points' across three dimensions: memory age (sessions), question type (state vs. trajectory), and evidence condition (present, missing, or contradicted).</p>
<p><strong>Pipeline:</strong> User facts across multiple sessions → MemTrace benchmark (controlled dimensions) → Knowledge point-based accuracy analysis</p>
<p><strong>Methodology:</strong> The researchers evaluated 13 memory-system configurations across four paradigms, measuring how systems handle facts under varying temporal and contextual conditions.</p>
<p><strong>Results:</strong> Pooled accuracy hides specific failures; the study found that systems often fail to track trajectories even if they remember states, and the primary bottleneck is evidence usage rather than retrieval (evidence was retrievable 10x more often than it was missing).</p>
<p><strong>Limitations:</strong> The study focuses on the use of reachable evidence, leaving open questions on how to specifically optimize the reasoning logic that processes retrieved information.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17328" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="cs.NI">cs.NI</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17368">Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Shengli Zhang, Deen Ma, Zibin Lin, Taotao Wang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17368" target="_blank" rel="noopener noreferrer">2606.17368</a></p>
<p class="paper-detail"><strong>Authors:</strong> Shengli Zhang, Deen Ma, Zibin Lin, Taotao Wang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language models have accelerated the transition from passive conversational assistants to autonomous agents that can understand goals, plan actions, invoke tools, and execute multi-step tasks. Yet the capability of a single agent remains constrained by its local data, tool permissions, runtime environment, and governance boundary. This paper studies distributed general-purpose agent networks: open peer-to-peer networks in which heterogeneous agents deployed on personal devices, edge nodes, or autonomous computing environments can discover one another, establish trust, negotiate cooperation rules, and execute open-ended tasks. We argue that such networks cannot be obtained by simply combining existing peer-to-peer overlays with conventional multi-agent systems. Unlike traditional P2P networks, agent networks must propagate semantic declarations about intentions, capabilities, states, and cooperation constraints. We therefore propose a layered architecture centered on a protocol adaptation layer that connects upper-level task semantics with lower-level network operations. Based on this architecture, the paper identifies three core mechanism problems: semantic announcement propagation for collaborator discovery, verifiable identity and multi-topic reputation for cooperation governance, and semantic-gradient mechanism design for open task execution. For each problem, we present a technical route, including bodyless gossip with sequential logs, BAID-based identity binding with MG-EigenTrust reputation, and a Stackelberg-style mechanism-generation loop driven by semantic attribution feedback. We further report prototype overhead results for BAID-style tiered verification and mechanism-level simulations of MG-EigenTrust under cross-topic disguise-collusion attacks. The resulting framework provides a system-level foundation for open, trustworthy, and scalable agent collaboration.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper proposes a layered architecture and a system-level framework for distributed general-purpose agent networks, enabling heterogeneous agents to discover, trust, and cooperate on open-ended tasks.</p>
<p><strong>Core Idea:</strong> Unlike traditional P2P networks, agent networks require a protocol adaptation layer to propagate semantic declarations (intentions, capabilities, and constraints) alongside standard network operations.</p>
<p><strong>Technique:</strong> The authors introduce three core mechanisms: bodyless gossip with sequential logs for discovery, BAID-based identity binding with MG-EigenTrust for reputation, and a Stackelberg-style mechanism-generation loop for task execution.</p>
<p><strong>Pipeline:</strong> Task semantics → Protocol adaptation layer → Semantic announcement propagation &amp; reputation verification → Semantic-gradient mechanism design → Collaborative task execution</p>
<p><strong>Methodology:</strong> The research combines architectural design with technical route development, followed by prototype overhead analysis and mechanism-level simulations against cross-topic disguise-collusion attacks.</p>
<p><strong>Results:</strong> The study provides prototype overhead results for tiered verification and demonstrates the robustness of the MG-EigenTrust reputation system under specific collusion attack scenarios.</p>
<p><strong>Limitations:</strong> The paper focuses on the foundational framework and prototype simulations, leaving the full-scale deployment and real-world scalability of the semantic-gradient mechanism as an area for further exploration.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17368" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17453">MapSatisfyBench: Benchmarking Satisfaction-Aware Map Agents through Behavior-Grounded Implicit Decision Factors</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Lubin Bai, Mengyu Cao, Sixue Wang, Zhongwei Wan, Yue Pan, Jiale Hou, Xiang Li, Xiuyuan Zhang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17453" target="_blank" rel="noopener noreferrer">2606.17453</a></p>
<p class="paper-detail"><strong>Authors:</strong> Lubin Bai, Mengyu Cao, Sixue Wang, Zhongwei Wan, Yue Pan, Jiale Hou, Xiang Li, Xiuyuan Zhang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language model agents are increasingly integrated into map services. Since map services are embedded in everyday-life scenarios rather than professional task settings, users often express their needs informally, resulting in underspecified queries with many unspoken needs, namely, implicit decision factors that are critical for user satisfaction. Although clarification is an effective way to mitigate this issue, it increases user burden in daily interaction, and a capable agent should first proactively recover such factors from available information sources. However, evaluating this ability is challenging. The first challenge is to determine which implicit decision factors are suitable for evaluation. A factor is evaluable only if it affects user acceptance and can be recovered from information available to the agent before it responds. Second, user satisfaction cannot be reliably represented by a single reference answer, requiring a benchmark that converts satisfaction-relevant factors into objective and quantifiable evaluation targets. To address these challenges, we propose a restore-identify-filter framework that reconstructs complete user needs from behavior-chain evidence, identifies implicit decision factors, and retains only those supported by pre-query evidence. Building on this methodology, we construct MapSatisfyBench from large-scale, real-world anonymized user data and annotate ground truth from five dimensions and enables full-chain evaluation of satisfaction-aware map agents. Experiments show that current agents generally perform well on explicit task completion, but remain limited in satisfying implicit decision factors and proactively acquiring the evidence needed for satisfaction-aware decisions. These findings establish MapSatisfyBench as a benchmark for shifting map-agent evaluation from task completion toward satisfaction-aware spatial decision making.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces MapSatisfyBench, the first benchmark for evaluating satisfaction-aware map agents by identifying and quantifying implicit decision factors in underspecified user queries.</p>
<p><strong>Core Idea:</strong> Map agents should proactively recover unspoken user needs from available information rather than relying solely on clarification, shifting the evaluation metric from simple task completion to satisfaction-aware spatial decision making.</p>
<p><strong>Technique:</strong> The authors propose a 'restore-identify-filter' framework to reconstruct complete user needs from behavior-chain evidence and isolate evaluable implicit factors.</p>
<p><strong>Pipeline:</strong> Underspecified user queries and behavior-chain evidence → restore-identify-filter framework → identified implicit decision factors → MapSatisfyBench evaluation</p>
<p><strong>Methodology:</strong> The researchers constructed a large-scale dataset from real-world anonymized data, annotating ground truth across five dimensions to enable full-chain evaluation of agent performance.</p>
<p><strong>Results:</strong> Experiments reveal that while current agents excel at explicit task completion, they struggle to satisfy implicit decision factors and proactively acquire the necessary evidence for satisfaction-aware decisions.</p>
<p><strong>Limitations:</strong> The study focuses on factors recoverable from pre-query evidence and may not fully capture satisfaction factors that require real-time, dynamic interaction or subjective preferences beyond available data.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17453" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17454">Dissecting model behavior through agent trajectories</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Gaurav Gupta, Vatshank Chaturvedi, Jun Huan, Anoop Deoras
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17454" target="_blank" rel="noopener noreferrer">2606.17454</a></p>
<p class="paper-detail"><strong>Authors:</strong> Gaurav Gupta, Vatshank Chaturvedi, Jun Huan, Anoop Deoras</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">AI agent performance is not just a modeling problem, it is fundamentally a systems problem. The advanced capabilities of models are realized through agent harnesses. Therefore, a gap between model assumptions and harness behavior can easily prevent the model's full capabilities from translating into agent performance. We formalize this as the `intent-execution' gap: the mismatch between what the model intends and what the harness executes, and vice versa. We argue that minimizing this intent-execution gap is as important as other aspects of harness design such as tools and execution loops. To illustrate the impact of this harness-model alignment, we develop a simple and customizable harness called `Simple Strands Agent' (SSA). SSA aims to find the bulk of common patterns which generalize across different model families (such as Claude, Gemini, GPT, Grok, Qwen), as well as a small number of model-specific preferences. We make two contributions: (i) we $\textbf{reproduce or improve on the pass@1}$ performance reported by diverse model-provider families on popular agentic benchmarks (SWE-Pro, SWE-Verified and Terminal-Bench-2), and (ii) building on an $\textbf{analysis of 138k trajectories generated by SSA}$, we look beyond the $\texttt{pass@1}$ numbers which tend to be relatively even across frontier models. By representing agent trajectories in code state-spaces, we observe model-level differences in problem-solving behavior. Finer-grained metrics such as edit frequency, testing activity, and phase-transitions reveal how individual models allocate effort across different stages of autonomous problem solving.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces the 'intent-execution' gap framework to analyze how agent harnesses affect model performance and provides a new benchmark analysis using the Simple Strands Agent (SSA) harness.</p>
<p><strong>Core Idea:</strong> AI agent performance is a systems problem where a mismatch between model intent and harness execution (the intent-execution gap) can hinder a model's full capabilities.</p>
<p><strong>Technique:</strong> The authors developed the Simple Strands Agent (SSA) harness and analyzed 138k trajectories by representing agent behaviors in code state-spaces.</p>
<p><strong>Pipeline:</strong> Agentic benchmarks (SWE-Pro, SWE-Verified, Terminal-Bench-2) → Simple Strands Agent (SSA) harness execution → Trajectory analysis in code state-spaces → Finer-grained behavioral metrics</p>
<p><strong>Methodology:</strong> The researchers reproduced/improved pass@1 scores across multiple model families and performed a large-scale trajectory analysis to identify model-specific behaviors like edit frequency and testing activity.</p>
<p><strong>Results:</strong> The study revealed that while pass@1 scores are similar across frontier models, models differ significantly in effort allocation, phase-transitions, and testing activity during autonomous problem-solving.</p>
<p><strong>Limitations:</strong> The study focuses on a specific set of popular agentic benchmarks and may not capture all nuances of every possible agentic use case.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17454" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17459">Can LLMs Be CEOs? Benchmarking Strategic Resource Reallocation with Multi-Role Agent Simulation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yuyang Dai, Xueqing Peng, Lingfei Qian, Zhuohan Xie
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17459" target="_blank" rel="noopener noreferrer">2606.17459</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yuyang Dai, Xueqing Peng, Lingfei Qian, Zhuohan Xie</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Evaluating the decision-making capabilities of large language models (LLMs) is a growing research priority, yet existing benchmarks focus on isolated cognitive tasks such as reasoning, knowledge retrieval, and economic rationality in stylized settings. These evaluations overlook the defining challenge of real executive decision-making: integrating conflicting recommendations from specialized stakeholders under information asymmetry, organizational constraints, and temporal dependencies. We introduce \textsc{CEO-Bench}, a multi-agent benchmark that evaluates LLMs on CEO-level strategic resource reallocation -- the process of redirecting capital across business units in a multi-round, constraint-rich organizational environment. In \textsc{CEO-Bench}, LLM agents receive conflicting advice from four role-conditioned C-suite advisors (CFO, CTO, COO, CMO), each with private signals and distinct priorities, and must synthesize these into a concrete allocation plan evaluated along four dimensions: role integration, conditional boldness, history-sensitive judgment, and plan validity. Experiments across five frontier models on 13 scenarios reveal that all models achieve high structural validity but diverge sharply on strategic calibration -- the hardest capability layer. We identify systematic failure modes including single-advisor capture, conservative default under ambiguity, and historical amnesia, and uncover a structural integration-boldness tradeoff: models that engage more deeply with conflicting perspectives tend to produce less decisive action. These findings delineate the current capability boundary of LLMs as organizational decision-makers and inform the design of future AI-assisted executive systems.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces CEO-Bench, a multi-agent benchmark designed to evaluate LLMs on high-level strategic resource reallocation by simulating complex organizational decision-making.</p>
<p><strong>Core Idea:</strong> The study shifts the focus from isolated cognitive tasks to the executive challenge of synthesizing conflicting stakeholder advice under information asymmetry and organizational constraints.</p>
<p><strong>Technique:</strong> The authors utilize a multi-role agent simulation where an LLM 'CEO' must process private signals and distinct priorities from four specialized C-suite advisors (CFO, CTO, COO, CMO).</p>
<p><strong>Pipeline:</strong> Conflicting advisor signals and organizational constraints → LLM synthesis and strategic reasoning → Resource allocation plan evaluated on integration, boldness, history, and validity.</p>
<p><strong>Methodology:</strong> The researchers benchmarked five frontier models across 13 scenarios, measuring performance across four dimensions: role integration, conditional boldness, history-sensitive judgment, and plan validity.</p>
<p><strong>Results:</strong> Models achieved high structural validity but struggled with strategic calibration, exhibiting failure modes like single-advisor capture, conservative defaults, and a tradeoff where deeper integration led to less decisive action.</p>
<p><strong>Limitations:</strong> The study highlights the current capability boundary of LLMs in complex organizational roles and leaves open the design of robust AI-assisted executive systems.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17459" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17546">SEAGym: An Evaluation Environment for Self-Evolving LLM Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Congjie Zheng, Chuanyi Xue, Bin Liang, Jun Yang, Changshui Zhang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17546" target="_blank" rel="noopener noreferrer">2606.17546</a></p>
<p class="paper-detail"><strong>Authors:</strong> Congjie Zheng, Chuanyi Xue, Bin Liang, Jun Yang, Changshui Zhang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Self-evolving LLM-based agents improve mainly by changing their agent harness: the structured execution layer around a base model, including prompts, memory, tools, middleware, runtime state, and the model-tool interaction loop. Existing evaluations often reduce this process to isolated task scores or a single sequential curve, obscuring whether an update produces reusable improvement, overfits recent tasks, increases cost, or harms older behavior. We introduce SEAGym, an evaluation environment for measuring agent harness updates across training, validation, test, replay, and cost records. SEAGym turns Harbor-compatible benchmarks into dynamic self-evolution task sources with train batches, frozen update-validation, held-out ID and OOD transfer views, replay diagnostics, and saved snapshot and metric records. Instantiating SEAGym on Terminal-Bench 2.0 and HLE, we compare ACE, TF-GRPO, and AHE under a shared epoch/batch protocol. The results show that these evaluation views provide complementary signals about the evolution process: frequent updates may fail to improve held-out performance, useful intermediate snapshots may collapse later, and source diversity and model backend can affect harness reliability.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces SEAGym, a comprehensive evaluation environment designed to measure the evolution of LLM agent harnesses rather than just isolated task scores. It provides a structured framework to track reusable improvements, overfitting, cost, and regression across multiple evaluation views.</p>
<p><strong>Core Idea:</strong> Self-evolving agents improve by updating their execution layers (prompts, memory, tools), but current metrics fail to capture the dynamics of these updates. SEAGym addresses this by treating agent evolution as a dynamic process requiring training, validation, and replay diagnostics.</p>
<p><strong>Technique:</strong> SEAGym transforms standard benchmarks into dynamic task sources featuring train batches, frozen validation sets, held-out ID/OOD transfer views, and replay diagnostics to monitor harness reliability.</p>
<p><strong>Pipeline:</strong> Harbor-compatible benchmarks → SEAGym dynamic task source conversion → Multi-view evaluation (Train, Val, Test, Replay, Cost) → Evolution analysis</p>
<p><strong>Methodology:</strong> The authors instantiated SEAGym on Terminal-Bench 2.0 and HLE to compare ACE, TF-GRPO, and AHE under a shared epoch/batch protocol. They analyzed the evolution process through complementary signals like held-out performance and snapshot stability.</p>
<p><strong>Results:</strong> The evaluation views revealed that frequent updates can fail to improve held-out performance, useful intermediate snapshots can collapse over time, and both source diversity and model backends significantly impact harness reliability.</p>
<p><strong>Limitations:</strong> The study highlights that harness reliability is sensitive to model backends and source diversity, suggesting a need for more robust methods to ensure stable evolution across different architectures.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17546" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="llm">LLM</h4>

<div class="paper-item" data-date="2026-06-17" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17312">Quantifying Consistency in LLM Logical Reasoning via Structural Uncertainty</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Baishali Chaudhury, Mengdie Flora Wang, Hyunji Hayley Park, Rahul Ghosh, Sungmin Hong, Jae Oh Woo
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17312" target="_blank" rel="noopener noreferrer">2606.17312</a></p>
<p class="paper-detail"><strong>Authors:</strong> Baishali Chaudhury, Mengdie Flora Wang, Hyunji Hayley Park, Rahul Ghosh, Sungmin Hong, Jae Oh Woo</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language models can arrive at the same answer through reasoning paths that are unstable, contradictory, or difficult to rank consistently -- a failure mode especially prevalent in multi-step deductive reasoning. Existing methods assess reliability primarily through output dispersion -- measuring how much sampled answers differ -- but this discards a complementary signal: whether the model can consistently rank competing reasoning candidates. We propose structural uncertainty, a consistency-aware framework derived from the stability of self-preference-induced rankings over sampled reasoning solutions. Given a query, we generate multiple candidate solutions and ask the model to judge pairwise preferences among its own outputs. We aggregate self-preferences into ranking distributions via Bradley-Terry modeling with PageRank, and decompose the signal into two entropy-based components: across-trial ranking instability and within-trial candidate ambiguity. Across five LLMs and eight benchmarks, structural signals provide information complementary to answer dispersion: on logical and mathematical reasoning tasks, the combination improves identification of unreliable instances, while on factual retrieval the structural signal collapses toward uniformity, diagnosing a regime boundary where reasoning-level consistency evaluation is uninformative. The two components relate differently to accuracy: within-trial ambiguity correlates positively with correctness -- consistent with settings where multiple plausible solution paths remain competitive -- while across-trial instability correlates negatively, signaling unreliable reasoning. Structural uncertainty is best understood not as a universal confidence estimator, but as a regime-sensitive evaluator of logical reasoning consistency.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces 'structural uncertainty,' a new framework that evaluates LLM reliability by measuring the stability of a model's self-preference rankings over multiple reasoning paths.</p>
<p><strong>Core Idea:</strong> Beyond measuring output dispersion (how much answers differ), the authors argue that a model's ability to consistently rank its own reasoning candidates provides a critical signal for identifying logical instability.</p>
<p><strong>Technique:</strong> The authors use Bradley-Terry modeling combined with PageRank to aggregate pairwise self-preferences into ranking distributions, which are then decomposed into entropy-based components.</p>
<p><strong>Pipeline:</strong> Query → Generate multiple candidate solutions → Model performs pairwise preference judgments → Aggregate preferences via Bradley-Terry/PageRank → Decompose into ranking instability and candidate ambiguity → Evaluate reliability.</p>
<p><strong>Methodology:</strong> The researchers tested five LLMs across eight benchmarks, decomposing the structural uncertainty signal into across-trial ranking instability and within-trial candidate ambiguity to correlate with accuracy.</p>
<p><strong>Results:</strong> Structural uncertainty complements answer dispersion in identifying unreliable logical and mathematical reasoning; across-trial instability correlates negatively with accuracy, while within-trial ambiguity correlates positively.</p>
<p><strong>Limitations:</strong> The structural signal collapses toward uniformity in factual retrieval tasks, indicating that this specific consistency evaluation is uninformative outside of logical reasoning regimes.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17312" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17289">Nothing from Something: Can a Language Model Discover 0?</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Phoebe Zeng, Thomas L. Griffiths, Brenden M. Lake
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17289" target="_blank" rel="noopener noreferrer">2606.17289</a></p>
<p class="paper-detail"><strong>Authors:</strong> Phoebe Zeng, Thomas L. Griffiths, Brenden M. Lake</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">AI systems based on artificial neural networks are being developed with aspirations of pushing the boundary of human mathematical knowledge. A key question for these systems is how much they can reach beyond their training data. Mathematical discovery requires a strong form of out of distribution generalization; the ability to hypothesize genuinely new - and potentially logically more powerful - mathematical structures. It has been hypothesized that language abilities support such generalizations in human cognition. In this work, we use simple arithmetic as a case study for examining how modern AI models could expand their mathematical horizons, evaluating whether these models can independently discover the concept of "zero". We show that We show that (1) language models of a GPT-2 size are unable to perform this generalization at test time regardless of language pretraining, but (2) models can improve substantially after training on tens or hundreds of examples of zero. Additionally, we find that language pretraining reduces the number of required examples by approximately $50\%$, showing that language abilities can scaffold mathematical discovery in neural models.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper investigates whether language models can independently discover the mathematical concept of 'zero' from non-zero arithmetic and quantifies how language pretraining scaffolds this discovery.</p>
<p><strong>Core Idea:</strong> Mathematical discovery requires out-of-distribution generalization, and the authors test if language abilities can help neural models hypothesize new structures like zero.</p>
<p><strong>Technique:</strong> The study uses a controlled arithmetic environment to evaluate the transition from non-zero arithmetic to the inclusion of zero in model reasoning.</p>
<p><strong>Pipeline:</strong> Arithmetic data without zero → Language model pretraining and few-shot fine-tuning → Evaluation of zero-concept discovery</p>
<p><strong>Methodology:</strong> The researchers compared GPT-2 sized models across different training regimes, measuring the number of examples required to achieve zero-based generalization with and without language pretraining.</p>
<p><strong>Results:</strong> GPT-2 models could not discover zero at test time alone, but required tens to hundreds of examples to learn it; language pretraining reduced the required examples by approximately 50%.</p>
<p><strong>Limitations:</strong> The study is limited to simple arithmetic as a case study and uses a relatively small model size (GPT-2).</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17289" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span><span class="cat-tag cat-default" title="cs.CY">cs.CY</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17443">Incumbent Advantage: Brand Bias and Cognitive Manipulation Dynamics in LLM Recommendation Systems</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Xi Chu, Yupeng Hou
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17443" target="_blank" rel="noopener noreferrer">2606.17443</a></p>
<p class="paper-detail"><strong>Authors:</strong> Xi Chu, Yupeng Hou</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language models (LLMs) are becoming a major way for consumers to find products, but we do not yet understand how brands compete in this new channel. We study brand dynamics in LLM recommendations using skincare products -- a category where consumers cannot easily judge quality before buying and must rely on brand reputation -- across three commercial LLMs (GPT-4o-mini, Claude Sonnet, Gemini 3 Flash), with a robustness check on search goods. In three experiments, we find: (1) a Conditional Monopoly where well-known brands get recommended 100% of the time (IAI = 10.0) when all products have the same specifications, but this dominance disappears with less than a +0.1-star rating advantage for a competitor; (2) authority-style marketing language, including fabricated clinical-evidence claims, breaks this monopoly at a Bias Surplus Value equal to +0.17 rating points, with each model responding differently; and (3) a social dilemma in multi-brand GEO competition: when all brands adopt the same optimization strategy, individual payoff falls from +0.802 to +0.007 in our payoff proxy, and non-participating brands receive zero recommendations in our tests. Our results suggest that generative engine optimization (GEO) should be studied not only as a security risk, but also as an emerging marketing practice that shapes market competition.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper identifies and quantifies the 'Incumbent Advantage' in LLM recommendations, demonstrating how brand bias creates a conditional monopoly that can be disrupted by specific marketing tactics.</p>
<p><strong>Core Idea:</strong> LLMs exhibit a strong preference for well-known brands when specifications are equal, but this bias is fragile and can be manipulated through authority-style marketing language and Generative Engine Optimization (GEO).</p>
<p><strong>Technique:</strong> The study employs a multi-model comparative analysis across GPT-4o-mini, Claude Sonnet, and Gemini 3 Flash using controlled experiments on skincare and search goods.</p>
<p><strong>Pipeline:</strong> Product specifications and brand data → LLM recommendation prompts with varying ratings and marketing language → Recommendation frequency and payoff analysis</p>
<p><strong>Methodology:</strong> The authors conducted three experiments measuring Incumbent Advantage Index (IAI), Bias Surplus Value, and payoff proxies in a multi-brand GEO competition scenario.</p>
<p><strong>Results:</strong> Well-known brands received 100% of recommendations (IAI = 10.0) under equal specs; authority-style language broke this monopoly at a +0.17 rating point surplus; and uniform GEO strategies led to a payoff collapse from +0.802 to +0.007.</p>
<p><strong>Limitations:</strong> The study focuses on specific commercial LLMs and the skincare category, leaving open questions about how these dynamics scale across different industries or more complex consumer decision-making processes.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17443" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-se" title="Software Engineering (cs.SE)">Software Engineering (cs.SE)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17507">LLM-as-Judge in Education: A Curriculum-Grounded Marking Pipeline</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Xiwei Xu, Chen Wang, Jacky Jiang, Phil Yang, Qian Fu, Mohan Dhall, Wenjie Zhang, Liming Zhu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17507" target="_blank" rel="noopener noreferrer">2606.17507</a></p>
<p class="paper-detail"><strong>Authors:</strong> Xiwei Xu, Chen Wang, Jacky Jiang, Phil Yang, Qian Fu, Mohan Dhall, Wenjie Zhang, Liming Zhu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Generative AI and large language models (LLMs) are increasingly applied to question generation and automated assessment. However, deploying LLMs in preparation for high-stakes exams requires more than prompt engineering; it demands software pipelines that systematically ground model outputs in authorised curriculum artefacts and marking guidelines issued by education authorities. This paper presents a curriculum-grounded, configurable LLM-as-Judge pipeline for question-level marking, co-developed with an industrial partner, to support exam preparation for university admission. The pipeline identifies the relevant topics, subtopics, and cognitive demand of a question, and assembles verifiable and authorised context to support LLM judgement. Curriculum intent is operationalised through concrete syllabus artefacts, including prescribed verbs and outcomes, performance band descriptors, glossary definitions, and marking-guideline principles. A staged LLM workflow is employed to first generate question-specific rubrics, capturing structured expectations of performance, and then derive and evaluate marking criteria used to allocate marks to student responses. This design improves consistency, transparency, and alignment with official marking practices. Preliminary evaluation shows that the proposed LLM-as-Judge pipeline delivers marking outcomes comparable to human tutors, while yielding justifications that are more traceable to authorised curriculum artefacts and marking standards. The pipeline has also been integrated into an online study platform, where early deployment data provide initial insights into operational usage and manual overrides.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a curriculum-grounded, configurable LLM-as-Judge pipeline designed for high-stakes exam preparation that aligns automated marking with official educational standards.</p>
<p><strong>Core Idea:</strong> To ensure reliability in automated assessment, LLM outputs must be systematically grounded in authorized curriculum artifacts and marking guidelines rather than relying solely on prompt engineering.</p>
<p><strong>Technique:</strong> The system uses a staged LLM workflow to operationalize curriculum intent by extracting syllabus artifacts and generating question-specific rubrics before evaluating student responses.</p>
<p><strong>Pipeline:</strong> Student response and question → Identification of curriculum topics/cognitive demands → Assembly of authorized context (syllabus, descriptors, glossaries) → Generation of question-specific rubrics → LLM-based marking and justification generation → Final marks and traceable feedback.</p>
<p><strong>Methodology:</strong> The researchers co-developed a software pipeline with an industrial partner, integrated it into an online study platform, and evaluated its performance against human tutor marking.</p>
<p><strong>Results:</strong> The pipeline delivered marking outcomes comparable to human tutors while providing justifications that were more traceable to official curriculum artifacts and marking standards.</p>
<p><strong>Limitations:</strong> The study relies on preliminary evaluation and early deployment data, leaving the long-term scalability and impact of manual overrides on system refinement as open areas for study.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17507" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="nlp">NLP</h4>

<div class="paper-item" data-date="2026-06-17" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17220">When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Mingxu Tao, Jiawei Hu, Xian Zhou, Wenpeng Hu, Jiajun Cheng, Yunbo Cao, Zhunchen Luo, Guotong Geng
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17220" target="_blank" rel="noopener noreferrer">2606.17220</a></p>
<p class="paper-detail"><strong>Authors:</strong> Mingxu Tao, Jiawei Hu, Xian Zhou, Wenpeng Hu, Jiajun Cheng, Yunbo Cao, Zhunchen Luo, Guotong Geng</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Legal case retrieval remains challenging due to the complexity of legal language and the need for precise lexical alignment between queries and relevant cases. Although dense retrieval models have achieved notable progress, empirical studies show that BM25 continues to serve as a strong baseline in this domain. It motivates us to propose a self-evolving framework for rule-driven query rewriting that enhances BM25 without any parameter training. The framework equips an LLM-based agent with an automatic evaluation environment, enabling it to iteratively create rewriting rules, plan validation experiments over rule combinations, and eliminate ineffective rules based on historical feedbacks. We evaluate our method on the Chinese legal case retrieval benchmark LeCaRD-v2. Experimental results demonstrate that the proposed framework outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection, particularly when powered by a highcapacity core LLM. We also conduct detailed analyses to investigate the mechanisms underlying self-evolution. Our findings reveal that LLM's capabilities to leverage previous experimental results and its intrinsic knowledge of rule elimination play critical roles in refining the rule set via self-evolution.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a self-evolving framework that automatically generates and refines query rewriting rules for legal case retrieval without any parameter training. It demonstrates that an LLM-based agent can outperform human-designed rules by iteratively learning from an automated evaluation environment.</p>
<p><strong>Core Idea:</strong> Enhance the performance of the BM25 retrieval baseline by using an LLM agent to autonomously evolve a set of query rewriting rules through a feedback loop of generation, validation, and elimination.</p>
<p><strong>Technique:</strong> A self-evolving agentic framework that utilizes an LLM to create rewriting rules, plan validation experiments on rule combinations, and prune ineffective rules based on historical feedback.</p>
<p><strong>Pipeline:</strong> Legal Query → LLM-based Agent (Rule Generation &amp; Validation) → Iterative Rule Evolution → Optimized Rewriting Rules → Enhanced BM25 Retrieval</p>
<p><strong>Methodology:</strong> The authors developed an automatic evaluation environment where an LLM agent iteratively proposes rules, tests them against the LeCaRD-v2 benchmark, and uses historical results to refine the rule set.</p>
<p><strong>Results:</strong> The framework outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection, especially when using high-capacity LLMs.</p>
<p><strong>Limitations:</strong> The effectiveness of the framework is highly dependent on the reasoning capabilities of high-capacity core LLMs and the quality of the automatic evaluation environment.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17220" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="rl">RL</h4>

<div class="paper-item" data-date="2026-06-17" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17591">Closing the Feedback Loop: From Experience Extraction to Insight Governance in Verbal Reinforcement Learning</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yanwei Cui, Xing Zhang, Yulong Zhang, Li Shao, Xiaofeng Shi, Guanghui Wang, Peiyang He
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17591" target="_blank" rel="noopener noreferrer">2606.17591</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yanwei Cui, Xing Zhang, Yulong Zhang, Li Shao, Xiaofeng Shi, Guanghui Wang, Peiyang He</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Training-free verbal reinforcement learning enables LLM agents to learn from world feedback -- objective signals such as dynamic task outcomes, market returns, or demand forecasts -- by extracting verbal rules from experience and injecting them as context, updating the agent's behavior without parameter changes. However, in non-stationary environments these agents face a retention-forgetting dilemma: retaining stale insights causes negative transfer, while discarding them causes catastrophic forgetting when conditions recur. We identify four requirements for navigating this dilemma -- outcome-driven evaluation, persistent structured evidence, non-monotonic knowledge lifecycle, and compositional governance -- and show that existing methods invest heavily in experience extraction while underinvesting in insight governance. We propose a three-layer architecture -- rules, evidence, and skills -- connected by a feedback-driven curation loop that closes the governance gap. Rules capture distilled experience from world outcomes; evidence logs track each rule's reliability across episodes; skills govern which rules to apply, how to resolve conflicts, and when to abstain. On financial forecasting as a case study, where world feedback is naturally abundant, noisy, and non-stationary, we show that the same accumulated experience either degrades performance below the zero-shot baseline or dramatically improves accuracy and risk-adjusted returns, depending on whether the curation loop is present.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper identifies the 'retention-forgetting dilemma' in verbal reinforcement learning and proposes a three-layer architecture to govern the lifecycle of extracted insights.</p>
<p><strong>Core Idea:</strong> To prevent negative transfer from stale data and catastrophic forgetting of recurring patterns, agents need a systematic governance mechanism to curate, track, and apply experience-based rules.</p>
<p><strong>Technique:</strong> A three-layer architecture consisting of Rules (distilled experience), Evidence (reliability logs), and Skills (governance logic) connected by a feedback-driven curation loop.</p>
<p><strong>Pipeline:</strong> World feedback (outcomes) → Experience extraction (Rules) → Evidence logging → Skill-based curation → Contextual injection → Updated agent behavior</p>
<p><strong>Methodology:</strong> The authors define four requirements for insight governance and evaluate their proposed architecture using a financial forecasting case study involving noisy, non-stationary data.</p>
<p><strong>Results:</strong> The curation loop prevents performance degradation below zero-shot baselines and significantly improves accuracy and risk-adjusted returns compared to uncurated experience accumulation.</p>
<p><strong>Limitations:</strong> The study focuses primarily on financial forecasting; the scalability of the curation loop to highly complex, multi-modal environments remains an open question.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17591" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17405">Treatment Response Optimized Clinical Decision Support AI System via Digital Twin Simulation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Xinyu Qin, Anil K. Sood, Ruiheng Yu, Sara Corvigno, Elaine Stur, Lu Wang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17405" target="_blank" rel="noopener noreferrer">2606.17405</a></p>
<p class="paper-detail"><strong>Authors:</strong> Xinyu Qin, Anil K. Sood, Ruiheng Yu, Sara Corvigno, Elaine Stur, Lu Wang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Clinical decision support AI systems (CDSASs) must adapt to evolving patient conditions in real-time while adhering to strict safety constraints. We present an online adaptive framework that integrates Treatment Effect (TE) estimation to quantify clinical benefits, a patient Digital Twin (DT) to simulate treatment trajectories, and Reinforcement Learning (RL) for sequential decision-making. The AI system is initially trained on historical medical records and operates in a continuous learning loop. To ensure safety, a rule-based module monitors vital signs and blocks contraindicated treatments. Cases with strong internal model disagreement are flagged for clinician review, simulated in our experiments via a pre-trained outcome model. We validate our framework using both a synthetic clinical simulator and a real-world ovarian cancer dataset from The Cancer Genome Atlas (TCGA). In both simulated and clinical settings, our method demonstrated superior effectiveness and stability in recommending treatments compared to standard computational baselines. Furthermore, the AI system maintains low latency and requires expert consultation for only a minority of cases in our experimental validation, demonstrating its potential as a safe, clinician-supervised tool for personalized medicine that continuously improves through practical use.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces an online adaptive clinical decision support AI system that combines Treatment Effect (TE) estimation, patient Digital Twins (DT), and Reinforcement Learning (RL) to provide safe, real-time personalized treatment recommendations.</p>
<p><strong>Core Idea:</strong> The system simulates potential treatment trajectories using a patient-specific Digital Twin to optimize sequential decision-making while maintaining safety through rule-based monitoring and clinician-in-the-loop flags.</p>
<p><strong>Technique:</strong> The framework utilizes Reinforcement Learning for sequential decision-making, integrated with a Digital Twin for simulation and a rule-based safety module for real-time constraint enforcement.</p>
<p><strong>Pipeline:</strong> Historical medical records and real-time patient data → Treatment Effect estimation, Digital Twin simulation, and RL-based decision-making with rule-based safety checks → Optimized treatment recommendations and clinician alerts.</p>
<p><strong>Methodology:</strong> The authors developed an online learning framework validated on both a synthetic clinical simulator and a real-world ovarian cancer dataset from The Cancer Genome Atlas (TCGA).</p>
<p><strong>Results:</strong> The method demonstrated superior effectiveness and stability compared to standard baselines, maintained low latency, and required expert consultation for only a minority of cases.</p>
<p><strong>Limitations:</strong> The study relies on a pre-trained outcome model to simulate internal model disagreements and the real-world applicability depends on the quality of available historical data for Digital Twin initialization.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17405" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="robotics">Robotics</h4>

<div class="paper-item" data-date="2026-06-17" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17574">DeepInsight: A Unified Evaluation Infrastructure Across the Physical AI Stack</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Siyi Li, Chunyu Sun, Jiahao Zhang, Yuchen Kang, Wuliang Wang, Yu Qiu, Rui Jiang, Haitao Cui, Jie Chen
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17574" target="_blank" rel="noopener noreferrer">2606.17574</a></p>
<p class="paper-detail"><strong>Authors:</strong> Siyi Li, Chunyu Sun, Jiahao Zhang, Yuchen Kang, Wuliang Wang, Yu Qiu, Rui Jiang, Haitao Cui, Jie Chen</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Evaluating a Physical AI stack spans operators that differ by more than three orders of magnitude -- from a single foundation-model decoding step to thousands of physics ticks of whole-body control -- varying orthogonally in modality, reward semantics, and resource profile. No existing framework spans this range, so the stack is evaluated today by stitching together separate harnesses that share neither runtime nor scoring, preserving each segment's local validity but losing the shared identity needed to diagnose cross-layer regressions. We present DeepInsight, an evaluation infrastructure that serves this full spectrum on a single runtime. Rather than homogenize the regimes, it preserves their heterogeneity behind three narrow abstractions -- task, resource, and result -- each realized as one invariant shared by every subsystem: one episode driver, one resource-handle protocol implemented by every expensive backend (LLM inference and sandboxed runtimes alike), and one trace identity scheme under which every event is written. Deployed in production across all three layers of an embodied humanoid stack, this single set of invariants onboards new benchmarks largely by configuration. Where mature peer orchestrators exist -- at the foundation-model end -- it reproduces published references and peer-framework readings within their own spread, runs the same suites faster on a single node, and scales near-linearly across nodes. Its distinctive return is diagnostic: because every layer writes into one shared trace, a regression that begins in one layer and surfaces in another stays localizable on that trace -- a cross-layer payoff no federation of per-segment harnesses can reproduce.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces DeepInsight, a unified evaluation infrastructure capable of assessing the entire Physical AI stack, from foundation model decoding to whole-body control, on a single runtime.</p>
<p><strong>Core Idea:</strong> Instead of stitching together disparate evaluation harnesses, DeepInsight preserves the heterogeneity of different AI layers while unifying them through three invariant abstractions: task, resource, and result.</p>
<p><strong>Technique:</strong> The framework implements a single episode driver, a universal resource-handle protocol for diverse backends (LLMs and sandboxed runtimes), and a shared trace identity scheme for all events.</p>
<p><strong>Pipeline:</strong> Heterogeneous AI layers (LLM inference, physics ticks, control) → Unified DeepInsight runtime (shared drivers, resource protocols, and trace identity) → Localizable cross-layer diagnostic traces.</p>
<p><strong>Methodology:</strong> The authors deployed the infrastructure across a three-layer embodied humanoid stack, configuring it to onboard new benchmarks while reproducing existing peer-framework results.</p>
<p><strong>Results:</strong> DeepInsight enables cross-layer regression localization on a single trace, reproduces published references within its own spread, and achieves near-linear scaling across multiple nodes.</p>
<p><strong>Limitations:</strong> The abstract does not explicitly state limitations, but the scope is currently focused on the three layers of an embodied humanoid stack.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17574" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-17" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17577">Surrogate Assisted Pedestrian Protection Design via a Foundation Model Orchestrated Workflow</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Osamu Ito, Akihiko Katagiri, Yoshikazu Nakagawa, Shin Saeki, Jun Shiraishi, Masato Sasaki
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17577" target="_blank" rel="noopener noreferrer">2606.17577</a></p>
<p class="paper-detail"><strong>Authors:</strong> Osamu Ito, Akihiko Katagiri, Yoshikazu Nakagawa, Shin Saeki, Jun Shiraishi, Masato Sasaki</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">AI-driven engineering workflows face particular challenges in crash safety design: unlike aerodynamics, crash events involve highly nonlinear contact dynamics, material nonlinearity, and discrete state transitions that are difficult to capture with data-driven surrogate models. To the best of our knowledge, we present the first foundation model--orchestrated workflow for crash safety design that enables surrogate-assisted exploration for pedestrian protection, reducing evaluation time from hours per CAE simulation to seconds.   The workflow integrates four components: (1) a surrogate trained on CAE crash simulations to predict pedestrian leg injury metrics from design parameters, achieving an average $R^2=0.87$ and providing distribution-free conformal prediction intervals; (2) multiobjective evolutionary search (NSGA-II) to discover diverse feasible parameter sets under user-specified constraints; (3) a morphing-based geometry generator that maps parameters to topology-preserving 3D shapes; and (4) a natural-language interface in which an LLM orchestrates the workflow and a vision--language model supports semantic comparison of generated designs.   In an automotive front-bumper case study, the workflow produces 35 distinct safety-compliant alternatives from a single exploration, a process that would require weeks with conventional CAE iteration. These results suggest that foundation models can serve as integration layers between ML surrogates and physics-based simulation, helping bring AI capabilities to safety-critical engineering domains.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper presents the first foundation model-orchestrated workflow for crash safety design, enabling surrogate-assisted exploration for pedestrian protection. It demonstrates a significant reduction in evaluation time from hours per CAE simulation to seconds.</p>
<p><strong>Core Idea:</strong> Foundation models can serve as an integration layer between machine learning surrogates and physics-based simulations to automate complex engineering workflows.</p>
<p><strong>Technique:</strong> The workflow integrates a CAE-trained surrogate model, a multiobjective evolutionary algorithm (NSGA-II), a morphing-based geometry generator, and a natural-language/vision-language model interface.</p>
<p><strong>Pipeline:</strong> User natural language constraints → LLM orchestration → NSGA-II optimization via surrogate model → Morphing-based 3D geometry generation → VLM-supported semantic comparison of designs</p>
<p><strong>Methodology:</strong> The authors developed a surrogate model achieving an R2 of 0.87 with conformal prediction intervals, coupled with an automated pipeline that maps design parameters to topology-preserving 3D shapes.</p>
<p><strong>Results:</strong> The workflow produced 35 distinct safety-compliant bumper alternatives from a single exploration, a task that would typically require weeks of conventional CAE iterations.</p>
<p><strong>Limitations:</strong> The study focuses on a specific automotive front-bumper case and the generalizability of the foundation model orchestration to other highly nonlinear crash dynamics remains to be fully explored.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17577" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="speech">Speech</h4>

<div class="paper-item" data-date="2026-06-17" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span><span class="cat-tag cat-default" title="cs.SD">cs.SD</span></span>
      <span class="paper-date">17 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17339">SpeechDx: A Multi-Task Benchmark for Clinical Speech AI</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Sejal Bhalla, Larry Kieu, Aina Merchant, Eyal de Lara, Alex Mariakakis
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17339" target="_blank" rel="noopener noreferrer">2606.17339</a></p>
<p class="paper-detail"><strong>Authors:</strong> Sejal Bhalla, Larry Kieu, Aina Merchant, Eyal de Lara, Alex Mariakakis</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Speech offers a uniquely informative window into health by simultaneously engaging neurological, motor, respiratory, and vocal systems. Current clinical speech AI methods have largely progressed through isolated condition-specific studies, making results difficult to compare and generalization difficult to assess. We introduce SpeechDx, a large-scale benchmark for clinical speech AI spanning 12 datasets and 27 tasks across diverse health conditions. To enable evaluation across shared clinical mechanisms, SpeechDx structures tasks by the stage of speech production they disrupt: conceptualization, formulation, and articulation. The benchmark tests generalization by including tasks with limited labeled data and evaluating the same health condition across multiple datasets, distinguishing clinically meaningful patterns from dataset artefacts. We systematically evaluate 12 state-of-the-art audio encoders across all tasks and under zero-shot cross-condition transfer. Results show that large-scale speech models represent the strongest overall baselines, domain-specific models improve performance only on closely matched tasks, and no current representation generalizes reliably across the clinical speech landscape. SpeechDx establishes a shared evaluation framework for tracking progress toward general-purpose clinical speech representations</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The authors introduce SpeechDx, a large-scale multi-task benchmark comprising 12 datasets and 27 tasks designed to evaluate the generalization of clinical speech AI.</p>
<p><strong>Core Idea:</strong> By structuring tasks around the stages of speech production (conceptualization, formulation, and articulation), the benchmark allows for the evaluation of shared clinical mechanisms across diverse health conditions.</p>
<p><strong>Technique:</strong> The benchmark utilizes a multi-task evaluation framework that tests audio encoders on both high-resource and low-resource tasks, including zero-shot cross-condition transfer.</p>
<p><strong>Pipeline:</strong> Clinical speech audio → Multi-task evaluation across 12 datasets and 27 tasks → Performance metrics for clinical representation generalization</p>
<p><strong>Methodology:</strong> The researchers systematically evaluated 12 state-of-the-art audio encoders across the SpeechDx benchmark to distinguish between dataset-specific artifacts and clinically meaningful patterns.</p>
<p><strong>Results:</strong> Large-scale speech models provided the strongest overall baselines, while domain-specific models only improved performance on closely matched tasks; no current representation generalizes reliably across the entire clinical landscape.</p>
<p><strong>Limitations:</strong> The study highlights that current models lack reliable cross-condition generalization, leaving an open need for general-purpose clinical speech representations.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17339" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h3 id="personal-interests">Personal Interests</h3>

<p class="section-desc">Papers discovered through your interest topics.</p>

<h4 id="multi-agent-systems">Multi-Agent Systems</h4>

<div class="paper-item" data-date="2026-06-16" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.18065">Intelligence Entropy Principle and the ADE Stability Engineering Framework</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Dexing Liu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.18065" target="_blank" rel="noopener noreferrer">2606.18065</a></p>
<p class="paper-detail"><strong>Authors:</strong> Dexing Liu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">As LLM-driven multi-agent systems (MAS) transition from lab to production, system behavior exhibits nonlinear degradation. We introduce the Intelligence Entropy Principle: probability-driven systems spontaneously drift toward disorder, formalized as S(t) = S0 * exp(alpha*t/Cm), where Cm is a model capability coefficient we propose. Lyapunov analysis yields the stabilization condition lambda &gt; alpha/Cm. We construct the ADE (Agent Delivery Engineering) four-layer framework (L1 Physical Laws through L4 User Adaptation) with 23 core components. Validation spans 100K-scale experiments and 33.6 days of production monitoring. We propose a Five-Layer Disorder Taxonomy unifying failures under structural collapse, and present Elastic Organization as an original MAS morphology. Results: channel fracture reduced from 69-98% to near 0%; system death probability below 0.02%.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces the Intelligence Entropy Principle to model nonlinear degradation in multi-agent systems and proposes the ADE framework to engineer system stability.</p>
<p><strong>Core Idea:</strong> Multi-agent systems spontaneously drift toward disorder over time, a phenomenon that can be mathematically modeled and mitigated through specific stability engineering.</p>
<p><strong>Technique:</strong> The authors use Lyapunov analysis to derive stabilization conditions and develop a four-layer ADE framework with 23 core components and an Elastic Organization morphology.</p>
<p><strong>Pipeline:</strong> Multi-agent system behavior → Intelligence Entropy modeling and Lyapunov analysis → ADE framework application → Stabilized production system</p>
<p><strong>Methodology:</strong> The research combines theoretical formalization of entropy drift with empirical validation across 100K-scale experiments and 33.6 days of production monitoring.</p>
<p><strong>Results:</strong> Channel fracture was reduced from 69-98% to near 0%, and the system death probability was maintained below 0.02%.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the specific constraints of the model capability coefficient (Cm) across different types of non-LLM agents.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.18065" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17962">A Neuro-Symbolic Approach to Strategy Synthesis for Strategic Logics</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Marco Aruta, Vadim Malvone, Aniello Murano, Domenico Parente, Luca Rizzuti
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17962" target="_blank" rel="noopener noreferrer">2606.17962</a></p>
<p class="paper-detail"><strong>Authors:</strong> Marco Aruta, Vadim Malvone, Aniello Murano, Domenico Parente, Luca Rizzuti</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Reasoning about what agents can achieve through strategic interaction is a core challenge in Multi-Agent Systems (MAS). Logics for strategic ability, such as ATL, provide rigorous methods, but their adoption is often hindered by the computational cost of strategy synthesis. We introduce a neuro-symbolic framework that integrates large language models (LLMs) into the model-checking pipeline for MAS. The LLM acts as a strategy-generation oracle, proposing candidate strategies that are then formally validated by a standard MAS model checker. This generate-and-certify architecture uses LLM guidance to navigate large combinatorial strategy spaces while preserving formal soundness: generated strategies are accepted only when certified by the verifier. We instantiate the framework for bounded strategic reasoning in NatATL and introduce the first NatATL strategy-synthesis dataset, consisting of 4211 instances. Experiments with an open-weight Qwen3-32B model show that our certified pipeline achieves 92\% accuracy on strategy-synthesis outcomes.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a neuro-symbolic framework for strategy synthesis in Multi-Agent Systems (MAS) and provides the first NatATL strategy-synthesis dataset.</p>
<p><strong>Core Idea:</strong> The authors propose a 'generate-and-certify' architecture that combines the creative reasoning of Large Language Models (LLMs) with the formal rigor of model checkers.</p>
<p><strong>Technique:</strong> The framework uses an LLM as a strategy-generation oracle to propose candidates, which are then formally validated by a standard MAS model checker to ensure soundness.</p>
<p><strong>Pipeline:</strong> MAS problem instance → LLM strategy generation → Formal model checker verification → Certified strategy output</p>
<p><strong>Methodology:</strong> The researchers instantiated the framework for bounded strategic reasoning in NatATL and evaluated it using an open-weight Qwen3-32B model on a new dataset of 4,211 instances.</p>
<p><strong>Results:</strong> The certified pipeline achieved 92% accuracy on strategy-synthesis outcomes using the Qwen3-32B model.</p>
<p><strong>Limitations:</strong> The paper focuses on bounded strategic reasoning and the scalability of the LLM-guided search in even larger combinatorial spaces remains an area for further exploration.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17962" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="Databases (cs.DB)">Databases (cs.DB)</span><span class="cat-tag cat-se" title="Software Engineering (cs.SE)">Software Engineering (cs.SE)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.17915">Trustworthy Self-Composable Big-Data-as-a-Service: An LLM-Orchestrated Multi-Agent Framework for Automated Data Engineering, AutoML, MLOps Deployment, and Drift-Aware Lifecycle Optimization</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Aueaphum Aueawatthanaphisut, Badri Raj Lamichhane
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.17915" target="_blank" rel="noopener noreferrer">2606.17915</a></p>
<p class="paper-detail"><strong>Authors:</strong> Aueaphum Aueawatthanaphisut, Badri Raj Lamichhane</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Big-Data-as-a-Service (BDaaS) platforms require re liable automation across data ingestion, cleaning, feature engi neering, model development, deployment, and post-deployment monitoring. However, existing LLM-based data science agents and AutoML systems mainly focus on isolated workflow stages, leaving limited support for lifecycle-level orchestration, artifact governance, human oversight, and drift-aware adaptation. This paper proposes a trustworthy self-composable BDaaS frame work based on LLM-orchestrated multi-agent collaboration. The proposed architecture decomposes the BDaaS lifecycle into specialized agents for data ingestion, data cleaning, feature engineering, AutoML training, model evaluation, MLOps de ployment, monitoring, and drift detection. A central LLM or chestration layer coordinates agent execution, validates interme diate outputs, manages workflow context, and enables dynamic workflow composition. The framework also incorporates shared artifact governance, reproducibility support, human-in-the-loop checkpoints, and drift-aware feedback loops. A prototype-based evaluation is conducted using controlled tabular benchmark datasets with missing values, categorical variables, outliers, class imbalance, and simulated covariate drift. Compared with manual ML, AutoML-only, and single-agent LLM baselines, the pro posed multi-agent BDaaS pipeline achieves competitive predictive performance while improving lifecycle-level reliability, including workflow completion, artifact traceability, deployment readiness, reproducibility, and drift recovery. The results suggest that LLM-orchestrated multi-agent systems can extend conventional AutoML toward trustworthy, adaptive, and production-oriented BDaaS lifecycle automation.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a trustworthy, self-composable Big-Data-as-a-Service (BDaaS) framework that automates the entire machine learning lifecycle through LLM-orchestrated multi-agent collaboration. It addresses the limitations of isolated AutoML systems by integrating artifact governance, human-in-the-loop checkpoints, and drift-aware lifecycle optimization.</p>
<p><strong>Core Idea:</strong> Decompose the complex BDaaS lifecycle into specialized autonomous agents coordinated by a central LLM orchestration layer to ensure end-to-end reliability and traceability.</p>
<p><strong>Technique:</strong> An LLM-orchestrated multi-agent architecture where a central controller manages specialized agents for ingestion, cleaning, feature engineering, AutoML, deployment, and monitoring.</p>
<p><strong>Pipeline:</strong> Raw Big Data → LLM-Orchestrated Multi-Agent Pipeline (Ingestion, Cleaning, Feature Engineering, AutoML, Deployment, Monitoring) → Production-Ready Models with Drift Recovery</p>
<p><strong>Methodology:</strong> The authors developed a prototype framework and evaluated it against manual ML, AutoML-only, and single-agent LLM baselines using tabular benchmark datasets with simulated covariate drift.</p>
<p><strong>Results:</strong> The framework achieved competitive predictive performance while significantly improving lifecycle-level reliability, including higher workflow completion rates, better artifact traceability, and superior drift recovery compared to baselines.</p>
<p><strong>Limitations:</strong> The study is based on a prototype evaluation using controlled tabular datasets, leaving questions regarding scalability to massive unstructured data and the computational costs of continuous LLM orchestration.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.17915" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h2 id="tech-news">Tech News</h2>

<h3 id="agentic-ai-1">Agentic AI</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--nvidia">NVIDIA Technical Blog</span>
    <span class="news-date">2026-06-16</span>
  </div>
  <a class="news-title" href="https://developer.nvidia.com/blog/building-ai-agents-for-ar-glasses-and-xr-devices-with-nvidia-xr-ai/" target="_blank" rel="noopener noreferrer">Building AI Agents for AR Glasses and XR Devices with NVIDIA XR AI</a>
  <p class="news-summary">NVIDIA is addressing the infrastructure gap for AR and XR wearable devices by introducing tools to build integrated AI agents. The initiative focuses on enabling real-time AI experiences by combining live sensor data with large language models. It provides developers with the necessary framework to deploy sophisticated, context-aware agents on wearable hardware.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">XR</span><span class="news-tag">AI Agents</span><span class="news-tag">NVIDIA</span><span class="news-tag">Wearables</span><span class="news-tag">Edge Computing</span></div>
    <a class="news-read-btn" href="https://developer.nvidia.com/blog/building-ai-agents-for-ar-glasses-and-xr-devices-with-nvidia-xr-ai/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computing-systems">Computing Systems</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Wed, 17 Ju</span>
  </div>
  <a class="news-title" href="https://runtimewire.com/article/openai-leaked-financials-altman-compute-burn" target="_blank" rel="noopener noreferrer">Leaked OpenAI financials show $38.5B loss and compute burn</a>
  <p class="news-summary">Leaked financial documents from OpenAI reveal a staggering $38.5 billion loss, highlighting the massive capital expenditure required for infrastructure. The data underscores the extreme &#x27;compute burn&#x27; associated with training and maintaining large-scale frontier models. This provides a clear look at the economic sustainability and high-cost barriers of the current AI arms race.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">OpenAI</span><span class="news-tag">Financials</span><span class="news-tag">Compute</span><span class="news-tag">Infrastructure</span><span class="news-tag">AI Economics</span></div>
    <a class="news-read-btn" href="https://runtimewire.com/article/openai-leaked-financials-altman-compute-burn" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 16 Ju</span>
  </div>
  <a class="news-title" href="https://nlnet.nl/news/2026/20260616-67-new-projects.html" target="_blank" rel="noopener noreferrer">NLnet announces funding for 67 more open-source projects</a>
  <p class="news-summary">NLnet has announced funding for 67 new open-source projects aimed at strengthening the digital infrastructure of the internet. This initiative focuses on decentralized technologies, privacy, and open-source software development. The funding supports a diverse range of projects to ensure a more resilient and open digital ecosystem.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Open Source</span><span class="news-tag">Infrastructure</span><span class="news-tag">Decentralization</span><span class="news-tag">Funding</span><span class="news-tag">Digital Rights</span></div>
    <a class="news-read-btn" href="https://nlnet.nl/news/2026/20260616-67-new-projects.html" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="llm-1">LLM</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 16 Ju</span>
  </div>
  <a class="news-title" href="https://writings.stephenwolfram.com/2026/06/launching-version-15-of-wolfram-language-mathematica-built-in-useful-ai-lots-of-new-core-functionality/" target="_blank" rel="noopener noreferrer">Wolfram Language and Mathematica Version 15, AI Assistant, Symbolic Music, More</a>
  <p class="news-summary">Wolfram has launched Version 15 of the Wolfram Language and Mathematica, featuring a built-in AI assistant and significant core functionality updates. The release emphasizes the integration of AI with symbolic computation, including new capabilities for symbolic music and advanced mathematical modeling. It represents a major step in merging traditional computational intelligence with modern LLM capabilities.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Wolfram Language</span><span class="news-tag">Mathematica</span><span class="news-tag">AI Assistant</span><span class="news-tag">Symbolic Computation</span><span class="news-tag">Software Update</span></div>
    <a class="news-read-btn" href="https://writings.stephenwolfram.com/2026/06/launching-version-15-of-wolfram-language-mathematica-built-in-useful-ai-lots-of-new-core-functionality/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h2 id="github-trending">
  <svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="20" height="20" style="vertical-align:middle;margin-right:6px"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg>
  GitHub Trending
</h2>

<p class="section-desc">Trending repositories on GitHub filtered and scored for relevance to your interests.</p>

<h3 id="agentic-ai-2">Agentic AI</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/moorcheh-ai/memanto" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">moorcheh-ai</span><span class="gh-sep">/</span><strong class="gh-repo">memanto</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Memanto provides a persistent, local memory layer for AI agents, enabling them to recall past interactions and maintain long-term context. It is highly relevant as it addresses the &#x27;forgetfulness&#x27; of LLMs by providing a seamless, no-backend-required memory infrastructure for multi-agent systems.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">agent-memory</span><span class="gh-tag">long-term-memory</span><span class="gh-tag">RAG</span><span class="gh-tag">LLM-memory</span><span class="gh-tag">stateful-ai</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/moorcheh-ai/memanto" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/andrewyng/aisuite" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">andrewyng</span><span class="gh-sep">/</span><strong class="gh-repo">aisuite</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Aisuite provides a unified interface for multiple LLM providers and a high-level Agents API for building tool-augmented systems. It is highly relevant as it offers a production-ready framework for multi-turn loops, toolkits, and Model Context Protocol (MCP) integration.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">Tool Use</span><span class="gh-tag">Foundation Models</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-14</span>
      <a class="gh-visit-btn" href="https://github.com/andrewyng/aisuite" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/rohitg00/ai-engineering-from-scratch" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">rohitg00</span><span class="gh-sep">/</span><strong class="gh-repo">ai-engineering-from-scratch</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">A comprehensive, end-to-end curriculum that teaches AI engineering by building components from scratch, including backpropagation, attention mechanisms, and autonomous swarms. It is highly relevant as it provides a deep-dive into the underlying mechanics of LLMs and multi-agent systems.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">Deep Learning</span><span class="gh-tag">AI Engineering</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-14</span>
      <a class="gh-visit-btn" href="https://github.com/rohitg00/ai-engineering-from-scratch" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/datawhalechina/hello-agents" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">datawhalechina</span><span class="gh-sep">/</span><strong class="gh-repo">hello-agents</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">A comprehensive, systematic tutorial for building AI-native agents from scratch, covering core principles, classic paradigms like ReAct, and multi-agent systems. It is highly relevant as it provides a deep dive into agentic workflows, memory systems, and even Agentic RL (SFT to GRPO) training.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">LLM</span><span class="gh-tag">RAG</span><span class="gh-tag">Agentic AI</span><span class="gh-tag">Reinforcement Learning</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-11</span>
      <a class="gh-visit-btn" href="https://github.com/datawhalechina/hello-agents" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/karpathy/autoresearch" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">karpathy</span><span class="gh-sep">/</span><strong class="gh-repo">autoresearch</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository implements an autonomous research framework where AI agents iteratively modify and optimize LLM training code (nanochat) over fixed time budgets. It is highly relevant as it explores the frontier of self-improving systems and automated machine learning (AutoML) through agentic workflows.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">MLOps</span><span class="gh-tag">Autonomous Agents</span><span class="gh-tag">Foundation Models</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-03-26</span>
      <a class="gh-visit-btn" href="https://github.com/karpathy/autoresearch" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/Panniantong/Agent-Reach" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">Panniantong</span><span class="gh-sep">/</span><strong class="gh-repo">Agent-Reach</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Agent-Reach provides a unified CLI tool that allows AI agents to access and scrape data from major social and content platforms without API fees. It is highly relevant for building autonomous agents that require real-time web browsing and multi-platform information retrieval.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">ai-agents</span><span class="gh-tag">web-scraping</span><span class="gh-tag">llm-tools</span><span class="gh-tag">automation</span><span class="gh-tag">information-retrieval</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/Panniantong/Agent-Reach" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/openai/openai-cookbook" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">openai</span><span class="gh-sep">/</span><strong class="gh-repo">openai-cookbook</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides comprehensive guides and production-ready examples for utilizing the OpenAI API. It is highly relevant for exploring LLM capabilities, RAG patterns, and building agentic workflows.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">LLM</span><span class="gh-tag">RAG</span><span class="gh-tag">OpenAI API</span><span class="gh-tag">Agentic AI</span><span class="gh-tag">Python</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/openai/openai-cookbook" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/google-research/timesfm" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">google-research</span><span class="gh-sep">/</span><strong class="gh-repo">timesfm</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">TimesFM is a decoder-only foundation model developed by Google Research for time-series forecasting. It is highly relevant as it supports agentic calling and includes specific support for agents, making it a core component for time-series tasks within multi-agent systems.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">foundation models</span><span class="gh-tag">time-series</span><span class="gh-tag">transformer</span><span class="gh-tag">agentic AI</span><span class="gh-tag">fine-tuning</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-12</span>
      <a class="gh-visit-btn" href="https://github.com/google-research/timesfm" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="computing-systems-1">Computing Systems</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/alibaba/zvec" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">alibaba</span><span class="gh-sep">/</span><strong class="gh-repo">zvec</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Computing Systems</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Zvec is a high-performance, in-process vector database designed for low-latency similarity search and hybrid retrieval. It is highly relevant for RAG and Agentic AI workflows as it allows for efficient embedding storage and multi-query capabilities directly within an application&#x27;s memory space.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">vector-database</span><span class="gh-tag">RAG</span><span class="gh-tag">similarity-search</span><span class="gh-tag">hybrid-retrieval</span><span class="gh-tag">computing-systems</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-17</span>
      <a class="gh-visit-btn" href="https://github.com/alibaba/zvec" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="general">General</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/mrdbourke/zero-to-mastery-ml" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">mrdbourke</span><span class="gh-sep">/</span><strong class="gh-repo">zero-to-mastery-ml</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">General</span>
      <span class="gh-relevance" title="Relevance 3/5">★★★<span class="gh-relevance-empty">★★</span> <span class="gh-rel-num">3/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository contains comprehensive educational materials for a foundational machine learning and data science course. It is relevant as it provides the core mathematical and practical groundwork necessary for understanding more advanced topics like deep learning and robotics.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">machine-learning</span><span class="gh-tag">data-science</span><span class="gh-tag">deep-learning</span><span class="gh-tag">education</span><span class="gh-tag">jupyter-notebook</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2024-10-30</span>
      <a class="gh-visit-btn" href="https://github.com/mrdbourke/zero-to-mastery-ml" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="llm-2">LLM</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/langchain-ai/rag-from-scratch" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">langchain-ai</span><span class="gh-sep">/</span><strong class="gh-repo">rag-from-scratch</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">LLM</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a comprehensive, step-by-step guide to building Retrieval-Augmented Generation (RAG) systems from the ground up. It is highly relevant as it covers core components like indexing, retrieval, and generation, which are foundational for building advanced LLM applications and agentic systems.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">RAG</span><span class="gh-tag">LLM</span><span class="gh-tag">NLP</span><span class="gh-tag">Vector Embeddings</span><span class="gh-tag">In-context Learning</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2025-06-26</span>
      <a class="gh-visit-btn" href="https://github.com/langchain-ai/rag-from-scratch" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="mlops">MLOps</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">vllm-project</span><span class="gh-sep">/</span><strong class="gh-repo">vllm</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">MLOps</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">vLLM is a high-throughput inference engine that optimizes LLM serving through PagedAttention and continuous batching. It is a foundational tool for deploying large language models efficiently, supporting diverse architectures and hardware for production-grade AI systems.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">LLM serving</span><span class="gh-tag">PagedAttention</span><span class="gh-tag">MLOps</span><span class="gh-tag">Inference Optimization</span><span class="gh-tag">Distributed Systems</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-17</span>
      <a class="gh-visit-btn" href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="speech-1">Speech</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/OpenBMB/VoxCPM" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">OpenBMB</span><span class="gh-sep">/</span><strong class="gh-repo">VoxCPM</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Speech</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">VoxCPM2 is a tokenizer-free text-to-speech model designed for high-quality multilingual speech generation and voice cloning. It is highly relevant to the user&#x27;s interest in speech, multimodal learning, and generative models.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">text-to-speech</span><span class="gh-tag">voice-cloning</span><span class="gh-tag">multimodal</span><span class="gh-tag">generative models</span><span class="gh-tag">speech-synthesis</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-10</span>
      <a class="gh-visit-btn" href="https://github.com/OpenBMB/VoxCPM" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's digest is dominated by the evolution of agentic workflows, specifically focusing on multi-agent architectures, self-evolving capabilities, and the rigorous benchmarking of complex decision-making in specialized domains.]]></summary></entry><entry><title type="html">Daily Digest 2026-06-16</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-16/" rel="alternate" type="text/html" title="Daily Digest 2026-06-16" /><published>2026-06-16T00:00:00+07:00</published><updated>2026-06-16T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-16/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's digest highlights a shift toward the operational reliability of autonomous agents, focusing on safety benchmarks, multi-agent trust dynamics, and the structural integrity of agentic workflows.</span>
</div>

<h2 id="global-trends">Global Trends</h2>

<h3 id="arxiv-subjects">Papers discovered from ArXiv subject categories</h3>

<h4 id="ai-safety">AI Safety</h4>

<div class="paper-item" data-date="2026-06-16" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15034">OSGuard: A Benchmark for Safety in Computer-Use Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Mina Mohammadmirzaei, Jeffrey Flanigan
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15034" target="_blank" rel="noopener noreferrer">2606.15034</a></p>
<p class="paper-detail"><strong>Authors:</strong> Mina Mohammadmirzaei, Jeffrey Flanigan</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Computer-use agents are increasingly evaluated by whether they complete realistic desktop and web tasks. However, task success alone can miss failures in which an agent reaches the nominal goal through an unsafe shortcut. We introduce OSGuard, a dual-granularity benchmark suite for evaluating safety in computer-use agents under benign, unchanged user instructions. OSGuard contains an action-level benchmark for local guardrail decisions and a risk-augmented execution suite for end-to-end evaluation. The action-level benchmark consists of contextualized proposed actions labeled as allowed, unrelated, or unsafe, each judged relative to the original instruction and current interface state. The execution suite contains manually constructed OSWorld-derived task variants in which the original task remains achievable, but the environment is modified to introduce latent hazards such as destructive overwrites, etc. Each variant is paired with augmented evaluators that retain the original task-success criterion while adding explicit state-based safety invariants, allowing us to distinguish safe completions from unsafe completions that satisfy the nominal task objective. Our experimental results on OSGuard show that current multimodal guardrails can perform well on isolated action judgments, while risk-augmented execution exposes remaining gaps between local oversight and reliable end-to-end safety. This dual-granularity design enables more precise diagnosis of whether models can both recognize unsafe proposed actions and improve full-task safety when deployed as guardrails.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces OSGuard, a dual-granularity benchmark designed to evaluate the safety of computer-use agents by distinguishing between successful task completion and unsafe shortcuts.</p>
<p><strong>Core Idea:</strong> Safety evaluation must go beyond task success to identify instances where agents achieve goals through hazardous actions, requiring both action-level oversight and end-to-end risk-augmented testing.</p>
<p><strong>Technique:</strong> The authors developed a dual-granularity framework consisting of an action-level benchmark for local guardrail decisions and a risk-augmented execution suite for end-to-end safety evaluation.</p>
<p><strong>Pipeline:</strong> User instruction and interface state → Proposed agent action → Action-level safety judgment (allowed/unrelated/unsafe) OR Risk-augmented task execution → Task success + safety invariant verification.</p>
<p><strong>Methodology:</strong> The authors manually constructed OSWorld-derived task variants with latent hazards and paired them with augmented evaluators that check for state-based safety invariants alongside nominal task completion.</p>
<p><strong>Results:</strong> Experimental results show that while multimodal guardrails perform well on isolated action judgments, risk-augmented execution reveals significant gaps in reliable end-to-end safety.</p>
<p><strong>Limitations:</strong> The study highlights the remaining gap between local oversight and full-task safety, suggesting that recognizing an unsafe action does not always translate to safe end-to-end behavior.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15034" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14838">A Definition of Good Explanations and the Challenges Explaining LLM Outputs</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Louis Mahon, Elliot Ford, Callum Hackett
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14838" target="_blank" rel="noopener noreferrer">2606.14838</a></p>
<p class="paper-detail"><strong>Authors:</strong> Louis Mahon, Elliot Ford, Callum Hackett</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">How to define a good explanation is a long-standing philosophical debate which has found recent renewed interest in the context of AI outputs. Explainability is crucial for AI adoption in many contexts, but in order to produce good explanations of AI systems, we must first have an understanding of what good explanations are. In this paper we propose a definition inspired by the notion of counterfactual explanations, however we argue that one must also take into account the interlocutor's prior beliefs in each fact that could be offered in an explanation. We explore the ramifications of this definition for AI explainability and, in particular, why LLM outputs are difficult to produce good explanations for.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper proposes a formal definition of a 'good explanation' that integrates counterfactual reasoning with the interlocutor's prior beliefs. It further identifies specific structural challenges in providing such explanations for Large Language Model (LLM) outputs.</p>
<p><strong>Core Idea:</strong> A good explanation is not just a causal link but a piece of information that changes an interlocutor's belief state by addressing the specific facts they already hold as true.</p>
<p><strong>Technique:</strong> The authors utilize a philosophical framework combining counterfactual logic with epistemic modeling of the user's prior knowledge.</p>
<p><strong>Pipeline:</strong> User's prior beliefs + AI output → Counterfactual analysis of facts → Explanation that addresses belief gaps → Informed user understanding</p>
<p><strong>Methodology:</strong> The authors conduct a theoretical and philosophical analysis to derive a definition of explainability and apply this definition to evaluate the current state of LLM interpretability.</p>
<p><strong>Results:</strong> The research highlights that LLMs are difficult to explain because their internal processes often lack the clear, discrete causal chains required to satisfy the proposed definition of a 'good' explanation.</p>
<p><strong>Limitations:</strong> The paper focuses on the theoretical definition and conceptual challenges, leaving the practical implementation of these explanations in real-time AI systems as an open question.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14838" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="Computer Science and Game Theory (cs.GT)">Computer Science and Game Theory (cs.GT)</span><span class="cat-tag cat-physics" title="physics.soc-ph">physics.soc-ph</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15078">Cognitive Debt: AI as Intellectual Leverage and the Dynamics of Systemic Fragility</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Shuchen Meng
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15078" target="_blank" rel="noopener noreferrer">2606.15078</a></p>
<p class="paper-detail"><strong>Authors:</strong> Shuchen Meng</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">We develop a formal theory of cognitive debt: the stock of unverified reasoning obligations that accumulates when individuals use AI as a substitute rather than a complement for first-principles cognition. The model features two state variables per agent, cognitive capital and cognitive debt, and a multiplicative production technology in which cognitive capital functions as collateral that determines the return to AI adoption. We establish six propositions. Rational agents incur positive cognitive debt because the costs are deferred, partially external, and masked by short-run productivity gains. Tranquil periods lower subjective risk assessments, raise AI substitution intensity, and compound leverage, generating a cognitive Minsky moment in which subjective risk falls while true systemic fragility rises. Expected crisis losses are convex in aggregate leverage. Post-crisis, output-target pressure can produce a false-correction loop in which agents patch AI failures with more AI. The decentralised equilibrium over-adopts substitutive AI relative to the social optimum because of systemic risk, cognitive public goods, and arms-race externalities. In a two-type heterogeneous-agent economy, high-cognitive-capital agents adopt AI more intensively and may eventually erode their unaided cognitive capital below that of initially lower-skilled agents.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a formal theory of 'cognitive debt' to model how using AI as a substitute for first-principles reasoning creates systemic fragility. It identifies a 'cognitive Minsky moment' where deferred costs and short-run productivity gains mask rising systemic risk.</p>
<p><strong>Core Idea:</strong> AI adoption functions as intellectual leverage where cognitive capital acts as collateral; over-reliance on AI creates a stock of unverified reasoning obligations that can lead to non-linear systemic collapses.</p>
<p><strong>Technique:</strong> The authors develop a formal economic model featuring two state variables (cognitive capital and cognitive debt) and a multiplicative production technology.</p>
<p><strong>Pipeline:</strong> AI adoption as substitution → accumulation of unverified reasoning obligations (cognitive debt) → masked short-run productivity gains → systemic fragility → cognitive Minsky moment/crisis.</p>
<p><strong>Methodology:</strong> The research employs a formal economic modeling approach, establishing six propositions and analyzing a two-type heterogeneous-agent economy to determine equilibrium outcomes.</p>
<p><strong>Results:</strong> Key findings include: rational agents incur debt due to deferred costs; tranquil periods compound leverage; crisis losses are convex in aggregate leverage; and high-capital agents may eventually erode their unaided skills below those of lower-skilled agents.</p>
<p><strong>Limitations:</strong> The model focuses on theoretical equilibrium and systemic dynamics, leaving open questions regarding specific empirical measurements of 'cognitive capital' and the exact threshold of the Minsky moment in real-world systems.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15078" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="agentic-ai">Agentic AI</h4>

<div class="paper-item" data-date="2026-06-16" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14885">Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yi Lu, Zhuofeng Li, Ping Nie, Haoxiang Zhang, Yuyu Zhang, Kai Zou, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14885" target="_blank" rel="noopener noreferrer">2606.14885</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yi Lu, Zhuofeng Li, Ping Nie, Haoxiang Zhang, Yuyu Zhang, Kai Zou, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Agentic search over large corpora relies on retriever-mediated interfaces (e.g., BM25 or ColBERT) for scalable candidate discovery. While effective at ranking relevant documents, these interfaces expose evidence only as ranked results or bounded document views, limiting agents' ability to reorganize material and verify constraints across documents. Direct Corpus Interaction (DCI) addresses this limitation by exposing shell-executable corpus operations for flexible search, filtering, comparison, and verification. However, full-corpus terminal commands become slow and unstable as the corpus grows, degrading performance and efficiency. We introduce DR-DCI, a retriever-steered DCI framework that treats retrieval as an agent-callable action for expanding a local workspace. Rather than operating directly over the full corpus, the agent dynamically pulls relevant documents into an evolving workspace and conducts DCI operations within it. This design combines retriever-level recall with DCI-style precision: retrieval keeps exploration scalable, while DCI preserves the local operations needed for effective evidence resolution. Experiments show that DR-DCI is both effective and efficient across scales. On Browsecomp-Plus, DR-DCI reaches 71.2\% accuracy, improving over raw DCI and ablated variants by up to 8.3 points while reducing tool usage, wall time, and estimated cost. With workspace-preserving context reset, accuracy further improves to 73.3\%. In corpus-scaling experiments, DR-DCI remains effective from 100K to 10M documents, whereas raw DCI becomes unstable and BM25 performs substantially worse. DR-DCI also scales to a 20M-scale file-per-document Wiki-18 QA setting, achieving an average score of 63.0 across six benchmarks and outperforming retrieval-based and trained search-agent baselines. Ablation analysis further shows that ranked previews and inter-document DCI are key to performance.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces DR-DCI, a framework that enables agents to perform complex, multi-document operations over large corpora by combining retriever-based scalability with direct corpus interaction (DCI) precision.</p>
<p><strong>Core Idea:</strong> Instead of operating on a full corpus (too slow) or just ranked results (too limited), the agent dynamically expands a local workspace by pulling relevant documents into a shell-executable environment.</p>
<p><strong>Technique:</strong> The framework treats retrieval as an agent-callable action to populate a workspace, allowing the agent to perform flexible search, filtering, and cross-document verification within a manageable subset of data.</p>
<p><strong>Pipeline:</strong> Large Corpus → Retriever-steered Workspace Expansion → Agent-led DCI Operations (filtering, comparison, verification) → Final Answer</p>
<p><strong>Methodology:</strong> The authors implemented a retriever-steered DCI framework and evaluated it on Browsecomp-Plus and Wiki-18 datasets, comparing it against raw DCI, BM25, and trained search-agent baselines.</p>
<p><strong>Results:</strong> DR-DCI achieved 71.2% accuracy on Browsecomp-Plus (improving over raw DCI by up to 8.3 points) and maintained stability across 10M documents, outperforming retrieval-based baselines on Wiki-18.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the specific overhead of workspace management or the potential for 'workspace pollution' if the agent fails to prune irrelevant documents during expansion.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14885" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-default" title="cs.CY">cs.CY</span><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14923">Trust Between AI Agents: Measuring Formation, Breakage, and Recovery, with Implications for Governing Multi-Agent Systems</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yujiao Chen
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14923" target="_blank" rel="noopener noreferrer">2606.14923</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yujiao Chen</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">As language-model agents increasingly work in teams, each agent must decide how much to trust its teammates. Yet we lack a standard way to measure trust between AI agents. We propose a behavioral measure based on costly verification. In a cooperative survival game, checking a teammate's work consumes resources, while trusting a wrong answer can be fatal. Relative to a memoryless version of the same model, reduced verification provides an observable measure of trust. Using this framework, we study trust formation, breakage, and recovery across six frontier model snapshots. When paired with a consistently reliable teammate, four snapshots (Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.1, and Gemini 3.1 Pro) reduce verification by roughly 60-85%, whereas two smaller snapshots show little or no such adjustment. Failures reverse this discount, but models differ in how they respond. Some concentrate renewed scrutiny on the culprit, while others become more cautious toward the entire team. Recovery is slower than formation, and clustered failures sustain suspicion far longer than the same number of failures spread apart. These differences have practical consequences. Models that form trust verify less, decide more quickly, and achieve higher payoffs in our environment. By contrast, persistent over-verification is associated with indecision rather than safety. Our results show that trust dispositions can be measured before deployment and suggest that calibration, rather than maximal suspicion, should be the central concern in the governance of multi-agent AI systems.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a behavioral framework to measure trust between AI agents using costly verification and evaluates how frontier models form, break, and recover trust in cooperative environments.</p>
<p><strong>Core Idea:</strong> Trust can be quantified by observing an agent's willingness to forgo costly verification of a teammate's actions based on past reliability.</p>
<p><strong>Technique:</strong> A cooperative survival game where agents must balance the resource cost of verification against the risk of accepting incorrect information.</p>
<p><strong>Pipeline:</strong> Multi-agent cooperative survival game → Observation of verification frequency relative to a memoryless baseline → Quantification of trust formation, breakage, and recovery dynamics.</p>
<p><strong>Methodology:</strong> The authors compared six frontier model snapshots, measuring the reduction in verification frequency when paired with reliable vs. unreliable teammates.</p>
<p><strong>Results:</strong> Four frontier models (Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.1, and Gemini 3.1 Pro) reduced verification by 60-85% with reliable partners; recovery from trust breakage is slower than formation, and clustered failures sustain suspicion longer than isolated ones.</p>
<p><strong>Limitations:</strong> The study focuses on a specific cooperative survival game and may not fully capture trust dynamics in diverse real-world multi-agent architectures.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14923" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14935">PrologMCP: A Standardized Prolog Tool Interface for LLM Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Agnieszka Mensfelt, Adarsh Prabhakaran, Adrian Haret, Vince Trencsenyi, Kostas Stathis
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14935" target="_blank" rel="noopener noreferrer">2606.14935</a></p>
<p class="paper-detail"><strong>Authors:</strong> Agnieszka Mensfelt, Adarsh Prabhakaran, Adrian Haret, Vince Trencsenyi, Kostas Stathis</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Frontier reasoning-tuned language models still fail on deductive tasks at depth, and the cost of improved performance through extended internal reasoning scales poorly. Symbolic delegation offers a complementary route: a language model translates the problem, while a solver performs the inference. However, current autoformalization pipelines for logic programming are typically bespoke integrations tied to particular tasks or agents. We introduce PrologMCP, a task-agnostic, open-source server that exposes Prolog as a stateful tool through the Model Context Protocol (MCP). Its compact tool interface, structured error reporting, and per-session isolation make the translate-run-inspect-repair loop a reusable primitive for MCP-capable agents. We evaluate a formalizer agent enhanced with PrologMCP against standard and reasoning LLMs (Claude Sonnet 4.6, GPT-4.1, and o4-mini) on two subsets of PARARULE-Plus: a general-purpose sample and a more challenging one targeting a specific failure mode of natural-language reasoning. On the general sample, the formalizer matches or exceeds reasoning LLMs (accuracy 1.00 vs.\ 1.00 / 0.998), with the largest gains over standard models (0.762 for GPT-4.1). On the challenging subset, the formalizer remains near-perfect (1.00 / 0.99) while reasoning LLMs drop to 0.95 / 0.94. These results suggest that delegating inference to Prolog via MCP is a robust and inspectable alternative to extended natural-language reasoning.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces PrologMCP, a task-agnostic, open-source server that exposes Prolog as a stateful tool via the Model Context Protocol (MCP) to enable symbolic delegation for LLM agents.</p>
<p><strong>Core Idea:</strong> Instead of relying on expensive and potentially unreliable internal reasoning in LLMs for deductive tasks, the system uses the LLM to translate problems into logic programs while delegating the actual inference to a symbolic solver.</p>
<p><strong>Technique:</strong> The authors implement a standardized tool interface that provides structured error reporting and per-session isolation, creating a reusable 'translate-run-inspect-repair' loop for MCP-capable agents.</p>
<p><strong>Pipeline:</strong> Natural language problem → LLM translation to Prolog → PrologMCP execution → Error reporting/Inspection → LLM repair (if needed) → Final result</p>
<p><strong>Methodology:</strong> The authors evaluated a formalizer agent equipped with PrologMCP against standard and reasoning LLMs (Claude Sonnet 4.6, GPT-4.1, and o4-mini) using two subsets of the PARARULE-Plus dataset.</p>
<p><strong>Results:</strong> The PrologMCP-enhanced formalizer achieved near-perfect accuracy (1.00) on both general and challenging subsets, significantly outperforming standard models (e.g., 0.762 for GPT-4.1) and maintaining stability where reasoning LLMs saw performance drops.</p>
<p><strong>Limitations:</strong> The paper focuses on logic programming and does not explicitly address the scalability of the translation step for extremely large-scale knowledge bases or non-logic symbolic domains.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14935" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15077">Risk-Aware LLM Agents for Geospatial Data Retrieval: Design and Preliminary Adversarial Evaluation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Kyle Gao, Joel Cumming, Jonathan Li, Linlin Xu, David A. Clausi
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15077" target="_blank" rel="noopener noreferrer">2606.15077</a></p>
<p class="paper-detail"><strong>Authors:</strong> Kyle Gao, Joel Cumming, Jonathan Li, Linlin Xu, David A. Clausi</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">We present an LLM-driven framework for retrieving remote sensing data from cloud-based geospatial catalogues using natural language queries. The system converts user intent into structured API calls, enabling efficient access to satellite imagery and environmental datasets. The architecture integrates three agents: Guardrail for safety and policy enforcement, General-QA for intent interpretation, and Recommender-Analyst for schema-aware API call generation. This coordinated design ensures reliable, semantically aligned interaction with external data services. The modular framework is portable across platforms through API schema substitution and supports applications in environmental monitoring, disaster response, and climate analysis. It establishes a scalable interface between user intent and geospatial infrastructure, enabling streamlined and automated Earth observation workflows. Preliminary experiments under adversarial multi-turn settings show that prompt-level safety instructions improve robustness, although rare high-impact failures persist in API manipulation scenarios and highlight the need for adaptive, system-level defenses that balance safety, usability, and cost efficiency, which motivates the use of our intercept-level Guardrail agent.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a multi-agent LLM framework for retrieving remote sensing data from cloud-based geospatial catalogues using natural language queries. It also provides a preliminary adversarial evaluation to assess the robustness of these agents against prompt-level manipulations.</p>
<p><strong>Core Idea:</strong> The system bridges the gap between natural language intent and complex geospatial API schemas by using a coordinated multi-agent architecture that balances safety, intent interpretation, and schema-aware execution.</p>
<p><strong>Technique:</strong> A modular three-agent architecture consisting of a Guardrail agent for policy enforcement, a General-QA agent for intent interpretation, and a Recommender-Analyst agent for generating structured API calls.</p>
<p><strong>Pipeline:</strong> Natural language query → Guardrail (Safety Check) → General-QA (Intent Interpretation) → Recommender-Analyst (API Call Generation) → Geospatial Data Retrieval</p>
<p><strong>Methodology:</strong> The authors designed a portable framework using API schema substitution and conducted preliminary adversarial multi-turn experiments to test system robustness and safety.</p>
<p><strong>Results:</strong> Prompt-level safety instructions improved system robustness, though the study identified rare high-impact failures in API manipulation scenarios.</p>
<p><strong>Limitations:</strong> Persistent high-impact failures in specific API manipulation scenarios highlight the need for more adaptive, system-level defenses that balance safety, usability, and cost.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15077" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15107">Towards Verifiable Agentic Data Science: Solving Irregular TSQA Via Tool-Grounded Reasoning</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Sanhorn Chen, Xiaoyang Chen, Boyu Liu, Roy Zhao
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15107" target="_blank" rel="noopener noreferrer">2606.15107</a></p>
<p class="paper-detail"><strong>Authors:</strong> Sanhorn Chen, Xiaoyang Chen, Boyu Liu, Roy Zhao</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Time series data in real-world deployments is overwhelmingly irregular. Observations are asynchronous, missing values are informative rather than random, and sampling frequencies vary across sensors and operational windows. However, existing Time Series Question Answering (TSQA) benchmarks mostly assume regularly sampled inputs, leaving a fundamental gap in understanding how large language models (LLMs) and AI agents perform under irregular conditions. To bridge this gap, we introduce IRTS-ToolBench, a benchmark of 1,700 questions spanning 10 task types across 13 domains. IRTS-ToolBench is designed to be used independently by any researcher working on LLM-based irregular time series analysis, providing standardized inputs and a reproducible evaluation protocol. Code can be found in https://github.com/SanhornC/IRTS-ToolBench.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces IRTS-ToolBench, a comprehensive benchmark of 1,700 questions across 13 domains designed to evaluate LLM-based agents on irregular time series data.</p>
<p><strong>Core Idea:</strong> Existing TSQA benchmarks assume regular sampling, but real-world data is often asynchronous and irregular; this work addresses that gap by providing a standardized evaluation for irregular time series analysis.</p>
<p><strong>Technique:</strong> The authors utilize a tool-grounded reasoning framework to enable agents to handle complex, non-uniform temporal data through external tool interaction.</p>
<p><strong>Pipeline:</strong> Irregular time series data → Tool-grounded reasoning agent → Verifiable answers to TSQA questions</p>
<p><strong>Methodology:</strong> The researchers developed a benchmark spanning 10 task types and 13 domains, providing a reproducible protocol for testing how LLMs handle missing values and varying sampling frequencies.</p>
<p><strong>Results:</strong> The benchmark provides a standardized evaluation protocol and a dataset of 1,700 questions to measure the performance gap between regular and irregular TSQA tasks.</p>
<p><strong>Limitations:</strong> The abstract does not specify specific performance metrics or limitations, but the scope is currently focused on the benchmark's creation and the general problem of irregular sampling.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15107" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    <a class="paper-action-btn gh-btn" href="https://github.com/SanhornC/IRTS-ToolBench" target="_blank" rel="noopener noreferrer" title="View code on GitHub" aria-label="View code on GitHub"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg><span>Code</span></a>
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15231">Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zhengbo Zhang, Changtao Miao, Jinbo Su, Zhaowen Zhou, Chunxia Zhang, Xukai Wang, Ruiqi Liu, Kaiyuan Zheng, Jiansheng Cai, Bo Zhang, Zhe Li, Shiming Xiang, Ying Yan
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15231" target="_blank" rel="noopener noreferrer">2606.15231</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zhengbo Zhang, Changtao Miao, Jinbo Su, Zhaowen Zhou, Chunxia Zhang, Xukai Wang, Ruiqi Liu, Kaiyuan Zheng, Jiansheng Cai, Bo Zhang, Zhe Li, Shiming Xiang, Ying Yan</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Multimodal large language models (MLLMs) have demonstrated impressive capabilities in many visual tasks, but they often struggle with factual grounding when confronted with complex, open-world scenarios. While recent multimodal deep search agents attempt to address this issue by utilizing external tools, the visual-native search paradigm remains underexplored. Existing methods primarily rely on simple images with explicit semantics and text-only evidence trajectories, limiting the agent's ability to perform multi-hop, cross-modal reasoning and search. To address these limitations, we propose Visual-Seeker, a visual-native multimodal deep search agent via active visual reasoning. Rather than treating vision as a static input, our agent actively attends to fine-grained visual details, dynamically harvests visual evidence throughout the search process. To unlock its visual-native potential, we design an active visual reasoning data pipeline and synthesize 5K high-quality multimodal trajectories for model training. Extensive experiments demonstrate the state-of-the-art performance across five challenging multimodal search benchmarks, even surpassing several proprietary models, validating robust visual-native reasoning and search in real-world web environments. The code and data can be accessed at: https://github.com/ZhengboZhang/Visual-Seeker.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Visual-Seeker, a visual-native multimodal deep search agent that performs active visual reasoning to handle complex, open-world scenarios. It provides a new active visual reasoning data pipeline and a high-quality dataset of 5K multimodal trajectories.</p>
<p><strong>Core Idea:</strong> Instead of treating vision as a static input, the agent dynamically harvests fine-grained visual evidence throughout the search process to perform multi-hop, cross-modal reasoning.</p>
<p><strong>Technique:</strong> The method employs an active visual reasoning framework that allows the agent to attend to specific visual details and synthesize visual evidence trajectories rather than relying on text-only evidence.</p>
<p><strong>Pipeline:</strong> Complex multimodal queries → Active visual reasoning &amp; dynamic evidence harvesting → Multi-hop cross-modal search → Factually grounded answers</p>
<p><strong>Methodology:</strong> The authors developed a specialized data pipeline to synthesize 5K high-quality multimodal trajectories and trained the agent to perform visual-native search across diverse web environments.</p>
<p><strong>Results:</strong> Achieved state-of-the-art performance across five challenging multimodal search benchmarks, surpassing several proprietary models in real-world web environments.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of active visual reasoning or the scalability of the synthesized data pipeline to even more diverse web domains.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15231" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    <a class="paper-action-btn gh-btn" href="https://github.com/ZhengboZhang/Visual-Seeker" target="_blank" rel="noopener noreferrer" title="View code on GitHub" aria-label="View code on GitHub"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg><span>Code</span></a>
  </div>
</div>

<h4 id="computer-vision">Computer Vision</h4>

<div class="paper-item" data-date="2026-06-16" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15038">Fusion is not one-size-fits-all: Cross-Modal Representation Alignment for Time-to-Event Modeling</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zhemin Zhang, Weijie Chen, David Le, Amara Tariq, Alex Wallace, Matthew Stib, Juan Maria Farina, Chadi Ayoub, Reza Arsanjani, Imon Banerjee
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15038" target="_blank" rel="noopener noreferrer">2606.15038</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zhemin Zhang, Weijie Chen, David Le, Amara Tariq, Alex Wallace, Matthew Stib, Juan Maria Farina, Chadi Ayoub, Reza Arsanjani, Imon Banerjee</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Accurate time-to-event (TTE) prediction from multimodal clinical data remains challenging due to modality imbalance and distribution shift. We introduce a foundation model-driven framework for cross-modal representation alignment between CT imaging and longitudinal EHR data, designed to generalize across tasks and institutions. CT and EHR modalities are encoded independently using domain-specific foundation models and aligned in a shared latent space through four principled fusion strategies: late fusion, contrastive alignment, cross-attention, and co-attention. We evaluate two clinically distinct TTE tasks: pulmonary embolism (PE) mortality and cardiovascular disease (CVD) outcomes, on large-scale multi-institutional cohorts (PE: N=3,099 train; 1,098 internal; 435 external; CVD: N=2,951 train; 837 internal; 682 external). Fusion consistently improves concordance index by 1.5-5.4% over unimodal baselines when modalities contribute comparably. Overall, contrastive multimodal fusion, particularly with CLMBR representations, provided the most consistent and statistically robust improvements, especially for PE mortality prediction. For MACE, cross-attention (one-hot) achieved the highest internal performance and image-guided co-attention achieved the best external performance. We therefore introduce a generalizable foundation model-based cross-modal alignment framework and provide the first systematic analysis of fusion behavior under modality imbalance in TTE prediction. Our results establish task-aware multimodal alignment as a necessary design principle for robust generalization and scalable clinical deployment.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a foundation model-driven framework for cross-modal representation alignment between CT imaging and longitudinal EHR data for time-to-event (TTE) modeling. It provides the first systematic analysis of how different fusion strategies behave under modality imbalance across diverse clinical tasks.</p>
<p><strong>Core Idea:</strong> Multimodal fusion is not one-size-fits-all; different clinical tasks (e.g., PE mortality vs. CVD outcomes) require specific alignment strategies to achieve robust generalization across institutions.</p>
<p><strong>Technique:</strong> The authors employ four principled fusion strategies—late fusion, contrastive alignment, cross-attention, and co-attention—to align independent domain-specific foundation model embeddings into a shared latent space.</p>
<p><strong>Pipeline:</strong> CT imaging and longitudinal EHR data → Domain-specific foundation model encoding → Cross-modal representation alignment (Late Fusion, Contrastive, Cross-Attention, or Co-Attention) → Time-to-Event (TTE) prediction</p>
<p><strong>Methodology:</strong> The framework was evaluated on two distinct clinical tasks (PE mortality and CVD outcomes) using large-scale multi-institutional cohorts, comparing unimodal baselines against various multimodal fusion strategies.</p>
<p><strong>Results:</strong> Fusion improved the concordance index by 1.5-5.4% over unimodal baselines. Contrastive multimodal fusion (with CLMBR) was most robust for PE mortality, while cross-attention and image-guided co-attention performed best for MACE.</p>
<p><strong>Limitations:</strong> The study highlights that fusion performance is highly dependent on modality contribution and task-specific characteristics, suggesting that a single universal fusion architecture may not be optimal for all clinical applications.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15038" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="computing-systems">Computing Systems</h4>

<div class="paper-item" data-date="2026-06-16" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15179">CONCORD: Asynchronous Sparse Aggregation for Device-Cloud RAG under Document Isolation</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Xuedong Hu, Zhiqing Tang, Zhi Yao, Tian Wang, Weijia Jia
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15179" target="_blank" rel="noopener noreferrer">2606.15179</a></p>
<p class="paper-detail"><strong>Authors:</strong> Xuedong Hu, Zhiqing Tang, Zhi Yao, Tian Wang, Weijia Jia</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Retrieval-augmented generation (RAG) has emerged as a pivotal technique for improving language models by incorporating external knowledge at inference time. As device-cloud collaborative inference makes it feasible to deploy small language models on edge devices, a new setting arises in which private documents remain on the device and public knowledge resides in the cloud. Privacy and policy constraints often forbid raw document exchange, creating a document-isolated dual-end RAG setting. However, existing methods rely on frequent remote synchronization and dense evidence transfer, limiting throughput under realistic latency and bandwidth conditions. To address this issue, we propose CONCORD, an asynchronous sparse aggregation framework for dual-end RAG under document isolation. CONCORD treats the cloud as an asynchronously arriving evidence source rather than a continuously synchronized co-generator. Specifically, we introduce waiting debt control to decide whether each decoding step should continue waiting for remote participation based on the observed return of waiting. We also design a certificate-guided minimal supplementation mechanism that requests only the remote evidence needed to determine the current greedy decision. Steps that consult the cloud preserve the same greedy token as dense dual-end aggregation, while the remaining steps commit locally without remote evidence. Experiments on Natural Questions and WikiText-2 show that CONCORD improves end-to-end throughput over baselines by $1.66\times$ and $2.15\times$, respectively, while reducing per-token communication by over two orders of magnitude and maintaining comparable answer quality and perplexity.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces CONCORD, an asynchronous sparse aggregation framework designed for device-cloud collaborative RAG where private documents are isolated on the device and public knowledge is in the cloud.</p>
<p><strong>Core Idea:</strong> Instead of continuous synchronization, the cloud is treated as an asynchronous evidence source, allowing the device to proceed locally when remote evidence is not critical for the current decoding step.</p>
<p><strong>Technique:</strong> The framework employs waiting debt control to manage remote participation and a certificate-guided minimal supplementation mechanism to request only the specific evidence needed for greedy token decisions.</p>
<p><strong>Pipeline:</strong> Private device documents and public cloud knowledge → Asynchronous sparse aggregation with waiting debt control and certificate-guided supplementation → High-throughput RAG inference with document isolation.</p>
<p><strong>Methodology:</strong> CONCORD optimizes communication by deciding whether to wait for cloud evidence based on historical return rates and only requesting remote data when it is necessary to change the current greedy token selection.</p>
<p><strong>Results:</strong> Improved end-to-end throughput by 1.66x on Natural Questions and 2.15x on WikiText-2, while reducing per-token communication by over two orders of magnitude with comparable answer quality.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the performance impact of extreme network jitter or the scalability of the certificate-guided mechanism as the number of remote evidence sources increases.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15179" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="general">General</h4>

<div class="paper-item" data-date="2026-06-16" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span><span class="cat-tag cat-default" title="cs.SI">cs.SI</span><span class="cat-tag cat-ml" title="Machine Learning (Statistics) (stat.ML)">Machine Learning (Statistics) (stat.ML)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14892">Relational Structural Causal Models</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Adiba Ejaz, Elias Bareinboim
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14892" target="_blank" rel="noopener noreferrer">2606.14892</a></p>
<p class="paper-detail"><strong>Authors:</strong> Adiba Ejaz, Elias Bareinboim</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">An artificial intelligence must have a model of its environment that is causal, supporting reasoning about interventions and counterfactuals, and also combinatorial, supporting generalization to unseen combinations of objects. In this work, we formally study when and how such a model can be learned. We develop relational structural causal models, extending structural causal models (Pearl 2009) to settings where objects and their relations vary. First, we show how answers to not only causal but also observational queries about unseen combinations of objects can not be identified without further assumptions. To enable such identification--including in the presence of unobserved confounding--we define relational causal graphs and derive symbolic identification criteria. Finally, we propose relational neural causal models, a provably correct approach that outperforms non-relational baselines on simulated traffic scenes with varying cars, signals, and pedestrians.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Relational Structural Causal Models (RSCMs), a framework that extends structural causal models to environments with varying numbers and types of objects. It provides symbolic identification criteria for relational queries and a provably correct neural implementation.</p>
<p><strong>Core Idea:</strong> To achieve true AI reasoning, models must combine causal reasoning (interventions/counterfactuals) with combinatorial generalization (handling unseen combinations of objects).</p>
<p><strong>Technique:</strong> The authors develop relational causal graphs and derive symbolic identification criteria to determine when queries about unseen object combinations are identifiable, even with unobserved confounding.</p>
<p><strong>Pipeline:</strong> Relational environment data → Relational Causal Graph &amp; Symbolic Identification → Relational Neural Causal Model → Causal/Counterfactual reasoning on unseen object combinations.</p>
<p><strong>Methodology:</strong> The authors formally analyze the identifiability of relational queries, derive mathematical criteria for identification, and implement a neural architecture to test these models on simulated traffic scenes.</p>
<p><strong>Results:</strong> The proposed relational neural causal models are provably correct and outperform non-relational baselines on simulated traffic scenes involving varying cars, signals, and pedestrians.</p>
<p><strong>Limitations:</strong> The paper notes that answers to certain queries about unseen combinations cannot be identified without further assumptions, highlighting the boundaries of the current framework.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14892" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14997">AI Engram: In Search of Memory Traces in Artificial Intelligence</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Jea Kwon, Dong-Kyum Kim, Jiwon Kim, Yonghyun Kim, Woong Kook, Meeyoung Cha
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14997" target="_blank" rel="noopener noreferrer">2606.14997</a></p>
<p class="paper-detail"><strong>Authors:</strong> Jea Kwon, Dong-Kyum Kim, Jiwon Kim, Yonghyun Kim, Woong Kook, Meeyoung Cha</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Memory formation is fundamental to intelligence, yet whether deep neural networks preserve identifiable memory traces analogous to biological memory units remains an open question. This work introduces a geometric framework to identify such "AI engrams" by formalizing the neuroscientific criteria of specificity, reactivation, sufficiency, and necessity into a constrained inverse problem. We derive a closed-form estimator that isolates individual memory traces from globally entangled parameters, and show that this biologically-derived solution corresponds to a natural gradient update on the parameter manifold. AI engrams enable surgical manipulation of learned knowledge: any subset of memories can be composed or erased through linear arithmetic, without iterative optimization. Experiments ranging from simple MLPs to LLMs demonstrate the causal validity and substantial scalability of AI engrams. Together, these results bridge theories of biological memory and artificial representation learning and offer geometric insight into how deep networks simultaneously support functional specificity within distributed storage.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a geometric framework to identify and isolate 'AI engrams'—specific memory traces within deep neural networks—bridging biological memory theories with artificial representation learning.</p>
<p><strong>Core Idea:</strong> The authors propose that memories in neural networks can be treated as distinct, manipulatable units that can be composed or erased through linear arithmetic rather than iterative optimization.</p>
<p><strong>Technique:</strong> The study formalizes neuroscientific criteria (specificity, reactivation, sufficiency, and necessity) into a constrained inverse problem to derive a closed-form estimator for memory traces.</p>
<p><strong>Pipeline:</strong> Neural network parameters → Geometric inverse problem formulation → Closed-form engram estimator → Surgical memory manipulation (composition/erasure)</p>
<p><strong>Methodology:</strong> The researchers derived a mathematical estimator that corresponds to a natural gradient update on the parameter manifold and validated it across various architectures.</p>
<p><strong>Results:</strong> The framework successfully isolated individual memories in MLPs and LLMs, demonstrating causal validity and scalability for surgical knowledge manipulation without retraining.</p>
<p><strong>Limitations:</strong> The paper leaves open the extent of engram stability over long-term continuous learning and the complexity of isolating highly overlapping, non-linear semantic concepts.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14997" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15096">VGPT-RSI for RH-Adjacent Formal Progress: Boundary Certificates, Verified Finite Lagarias Inequalities, and Explicit Failure Localization</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zhixin Hu, Tao Xu, Xiaodian Sun, Li Jin, Momiao Xiong
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15096" target="_blank" rel="noopener noreferrer">2606.15096</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zhixin Hu, Tao Xu, Xiaodian Sun, Li Jin, Momiao Xiong</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The Riemann Hypothesis remains one of the central unsolved problems in mathematics. Rather than claiming proof, we investigate whether a verifiable AI-assisted reasoning system can produce reliable, formally checked partial progress while explicitly identifying the remaining mathematical obstructions. We apply the Verifiable Growing Physical Transformer with Recursive Self-Improvement (VGPT-RSI) to two RH-adjacent certification tasks. First, we construct and verify a finite RH-boundary certificate for inequality on a parameterized safe lower curve over a region. The numerical boundary curve is converted into a certificate-backed lower curve, audited using outward-rounded interval arithmetic and Arb/FLINT ball arithmetic, and then checked in Rocq/CoqInterval for the parameterized theorem. Second, we initiate a formal Lagarias-route certificate. Lagarias criterion states that RH is equivalent to the global inequality. We formalize the finite quantity and produce a Coq-checked finite certificate. The final system identifies the exact unresolved mathematical bottlenecks: formalizing the Lagarias equivalence, proving the global tail theorem beyond any finite cutoff, and potentially reducing counterexamples to colossally abundant or related extremal integers. These results demonstrate that VGPT-RSI can produce certified RH-adjacent formal progress, organize proof dependencies, and avoid overclaiming when the remaining obstruction is genuinely mathematical.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces VGPT-RSI to produce verifiable, formally checked partial progress on Riemann Hypothesis (RH) adjacent problems while explicitly identifying remaining mathematical obstructions. It demonstrates that AI-assisted systems can generate certified certificates for finite boundaries and Lagarias inequalities without overclaiming a full proof.</p>
<p><strong>Core Idea:</strong> Instead of attempting a full proof of the Riemann Hypothesis, the system focuses on generating verifiable certificates for finite regions and identifying the specific mathematical bottlenecks that prevent a complete proof.</p>
<p><strong>Technique:</strong> The system utilizes the Verifiable Growing Physical Transformer with Recursive Self-Improvement (VGPT-RSI) combined with interval arithmetic (Arb/FLINT) and formal verification in Rocq/CoqInterval.</p>
<p><strong>Pipeline:</strong> RH-adjacent mathematical problems → VGPT-RSI reasoning and certificate generation → Interval arithmetic auditing and formal verification in Coq → Certified partial progress and explicit failure localization.</p>
<p><strong>Methodology:</strong> The authors applied the VGPT-RSI system to two tasks: constructing a finite RH-boundary certificate for a parameterized safe lower curve and initiating a formal Lagarias-route certificate for finite quantities.</p>
<p><strong>Results:</strong> Successfully produced a Coq-checked finite certificate for a parameterized theorem and a finite Lagarias-route certificate; identified specific bottlenecks including the formalization of the Lagarias equivalence and the global tail theorem.</p>
<p><strong>Limitations:</strong> The system cannot yet prove the global tail theorem beyond any finite cutoff or formalize the full Lagarias equivalence, which remain genuine mathematical obstructions.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15096" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15273">Feature Attribution in Directed Acyclic Graphs Using Edge Intervention</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Qiheng Sun, Junxu Liu, Xiaokai Mao, Haocheng Xia, Jinfei Liu, Kui Ren, Haibo Hu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15273" target="_blank" rel="noopener noreferrer">2606.15273</a></p>
<p class="paper-detail"><strong>Authors:</strong> Qiheng Sun, Junxu Liu, Xiaokai Mao, Haocheng Xia, Jinfei Liu, Kui Ren, Haibo Hu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Shapley value-based feature attribution methods face challenges in scenarios involving complex feature interactions and causal relationships, even when a causal structure is provided. Existing methods typically adopt a node-centric view, attributing importance solely to individual features. Consequently, they often fail to simultaneously capture the externality and exogenous influence of features, leading to unreasonable interpretations. To overcome these limitations, we propose a novel feature attribution method called DAG-SHAP, which is based on edge intervention. DAG-SHAP treats each feature edge as an individual attribution object, ensuring that both externality and exogenous contributions of features are appropriately captured. Additionally, we introduce an approximation method for efficiently computing DAG-SHAP. Extensive experiments on both real and synthetic datasets validate the effectiveness of DAG-SHAP. Our code is available at https://github.com/ZJU-DIVER/DAG-SHAP.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces DAG-SHAP, a novel feature attribution method that shifts the focus from node-centric to edge-centric attribution in Directed Acyclic Graphs (DAGs). It effectively captures both the externality and exogenous influences of features in complex causal structures.</p>
<p><strong>Core Idea:</strong> Instead of attributing importance to individual features (nodes), the method treats each feature edge as an individual attribution object to account for complex interactions.</p>
<p><strong>Technique:</strong> The method utilizes edge intervention within a Shapley value framework and introduces an approximation method to ensure computational efficiency.</p>
<p><strong>Pipeline:</strong> Causal DAG and model → Edge intervention-based Shapley value calculation → Feature edge importance scores</p>
<p><strong>Methodology:</strong> The authors define attribution based on the impact of intervening on specific edges in a DAG, then develop an approximation algorithm to handle the high dimensionality of edge-based calculations.</p>
<p><strong>Results:</strong> Extensive experiments on real and synthetic datasets demonstrate that DAG-SHAP provides more reasonable and accurate interpretations compared to traditional node-centric Shapley value methods.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the scalability limits of the approximation method on extremely large, dense graphs or its performance on non-DAG structures.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15273" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    <a class="paper-action-btn gh-btn" href="https://github.com/ZJU-DIVER/DAG-SHAP" target="_blank" rel="noopener noreferrer" title="View code on GitHub" aria-label="View code on GitHub"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg><span>Code</span></a>
  </div>
</div>

<h4 id="llm">LLM</h4>

<div class="paper-item" data-date="2026-06-16" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15029">Metric Match: A Subset Selection Approach to Evaluating LLM Judge Reliability</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Alyssa Unell, Natalie Dullerud, Naomi Boneh, Meena Jagadeesan, Tatsu Hashimoto, Nigam Shah, Sanmi Koyejo
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15029" target="_blank" rel="noopener noreferrer">2606.15029</a></p>
<p class="paper-detail"><strong>Authors:</strong> Alyssa Unell, Natalie Dullerud, Naomi Boneh, Meena Jagadeesan, Tatsu Hashimoto, Nigam Shah, Sanmi Koyejo</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">LLM judges are used to reduce the need for costly human labor in evaluating open-ended text generation. However, the reliability of these judges depends critically on their alignment with human raters -- a property that itself depends on costly human annotations. In this work, we develop a method (Metric Match) for estimating correlation-based reliability metrics of LLM judges from limited annotations. Metric Match selects a subset of samples for human annotation such that the subset matches the population reliability metric with respect to acquired synthetic labels. We empirically show that Metric Match achieves a win-rate of 0.838 against random subset selection across four different correlation metrics and 15 datasets, with an 18.7% decrease in average estimation error and reduces annotation needs by 32.5%. We provide a cost model and highlight a medical case study where our method saves $1,041.67 compared to random selection for expert annotation. Further, we shift our task from reliability estimation to reliability classification of whether a given judge is above a deployment threshold, outperforming random selection with Metric Match. All project code is publicly available, and we additionally provide an installable package for ease of use.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Metric Match, a method to estimate the reliability of LLM judges using significantly fewer human annotations by selecting a representative subset of samples.</p>
<p><strong>Core Idea:</strong> Instead of random sampling, the authors propose selecting a subset of data that specifically matches the population's correlation metrics based on synthetic labels.</p>
<p><strong>Technique:</strong> The technique utilizes a subset selection strategy that optimizes for correlation-based reliability metrics, reducing the human annotation burden while maintaining estimation accuracy.</p>
<p><strong>Pipeline:</strong> Open-ended text generation samples → Synthetic label generation → Metric Match subset selection → Human annotation → Reliability estimation/classification</p>
<p><strong>Methodology:</strong> The researchers compared Metric Match against random subset selection across 15 datasets and four correlation metrics, including a cost model and a medical case study.</p>
<p><strong>Results:</strong> Achieved a 0.838 win-rate against random selection, an 18.7% decrease in average estimation error, a 32.5% reduction in annotation needs, and a cost saving of $1,041.67 in a medical case study.</p>
<p><strong>Limitations:</strong> The study focuses on correlation-based reliability and does not explicitly address non-linear alignment or specific biases inherent in the synthetic labels used for selection.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15029" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15199">CogGuard: Cognitive and Operational Profiling for Proactive Warning in Edge Intelligent Services</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zhi Yao, Weihao Chen, Zhiqing Tang, Hanshuai Cui, Qianli Ma, Weijia Jia, Wei Zhao
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15199" target="_blank" rel="noopener noreferrer">2606.15199</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zhi Yao, Weihao Chen, Zhiqing Tang, Hanshuai Cui, Qianli Ma, Weijia Jia, Wei Zhao</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Proactive warning is an important capability for edge intelligent services, where the system predicts whether a subject will successfully complete an incoming task under strict latency and privacy constraints. Such prediction depends on both long-term static attributes and short-term dynamic states derived from historical interaction logs. Recent Large Language Models (LLMs) offer strong long-context reasoning for constructing structured profiles from these logs, but existing solutions face two challenges for edge deployment: (1) profiling methods are typically domain-specific and lack a reusable abstraction across service scenarios, and (2) fine-tuning alignment models on heterogeneous edge clusters incurs high synchronization overhead due to the variance in input sequence lengths. To address these challenges, we propose CogGuard, a proactive-warning framework for edge intelligent services. CogGuard decouples offline LLM-based profile construction from online Small Language Model (SLM)-based score prediction through a shared static-dynamic profile-to-score pipeline, and instantiates it in two representative scenarios: educational performance warning and operational task outcome warning. For efficient profile construction, we design scenario-specific profiling methods with prefix-aligned KV-cache reuse to reduce repeated encoding overhead. For edge-side model alignment, we propose a length-aware distributed fine-tuning strategy with contrastive regularization to mitigate workload imbalance on heterogeneous clusters. Experiments on education and operation datasets show that CogGuard reduces profile construction time by up to 48% and distributed fine-tuning time by 19%, while achieving MAEs of 13.4 and 5.9, respectively, on 100-point-scale warning tasks. In the largest educational setting, CogGuard reduces prediction error by 15.4% compared with the strongest baseline.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces CogGuard, a proactive-warning framework for edge intelligent services that decouples offline profile construction from online score prediction to handle long-context reasoning and heterogeneous edge constraints.</p>
<p><strong>Core Idea:</strong> CogGuard leverages Large Language Models (LLMs) for offline profile construction and Small Language Models (SLMs) for online prediction, utilizing a shared profile-to-score pipeline to balance reasoning depth with edge deployment efficiency.</p>
<p><strong>Technique:</strong> The framework employs prefix-aligned KV-cache reuse for efficient profile construction and a length-aware distributed fine-tuning strategy with contrastive regularization for model alignment on heterogeneous clusters.</p>
<p><strong>Pipeline:</strong> Historical interaction logs → LLM-based profile construction (offline) → Shared profile-to-score pipeline → SLM-based score prediction (online) → Proactive warning</p>
<p><strong>Methodology:</strong> The authors designed scenario-specific profiling methods for education and operation tasks, implementing a distributed fine-tuning approach to mitigate workload imbalance caused by varying input sequence lengths.</p>
<p><strong>Results:</strong> CogGuard reduced profile construction time by up to 48% and distributed fine-tuning time by 19%, achieving MAEs of 13.4 and 5.9 on 100-point-scale tasks and a 15.4% reduction in prediction error in large educational settings.</p>
<p><strong>Limitations:</strong> The study focuses on two specific representative scenarios (education and operation), leaving the generalizability of the profiling methods across other diverse domains as an open question.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15199" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-16" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.15258">Mask-Proof: An LLM-based Automated Data Curation Pipeline on Mathematical Proofs</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Jierui Zhang, Siyuan Tan, Xinhang Li, Longzhuangzhi Lin, Dailin Li, Chengfeng Gu, Xinping Li, Yaxian Hao, Shengjia Liang, Yuxiang Ren, Wenhao Liu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.15258" target="_blank" rel="noopener noreferrer">2606.15258</a></p>
<p class="paper-detail"><strong>Authors:</strong> Jierui Zhang, Siyuan Tan, Xinhang Li, Longzhuangzhi Lin, Dailin Li, Chengfeng Gu, Xinping Li, Yaxian Hao, Shengjia Liang, Yuxiang Ren, Wenhao Liu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language models (LLMs) are increasingly capable of mathematical problem solving and can even assist with research-level proofs, yet we still lack a scalable and reproducible way to measure step-level reasoning in long proofs across diverse sources. This evaluation gap limits trustworthy AI assistance in proof-certified scientific progress. Existing evaluations often emphasize final answers or rely on costly expert grading, while end-to-end proof generation remains open-ended and hard to verify automatically. We introduce Mask-Proof, a pipeline that turns real proofs into automatically checkable masked-step tasks. It masks key formula steps, provides the necessary surrounding context, and evaluates model reconstructions with an LLM-based equivalence judge using repeated votes for stability. The resulting Mask-ProofBench contains 292 curated problems across diverse research areas. Experiments with 17 models show that reasoning-enhanced models outperform standard models by 12% to 27%. Our evaluator achieves 96.8% agreement with expert annotators, enabling faithful, reproducible, and comparable measurement of step-level mathematical reasoning. Benchmark, annotations, and code are available at https://github.com/weating/Mask-Proof.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Mask-Proof, an automated pipeline and benchmark (Mask-ProofBench) designed to evaluate step-level reasoning in long mathematical proofs using LLM-based automated grading.</p>
<p><strong>Core Idea:</strong> By transforming real mathematical proofs into masked-step tasks, the authors create a scalable and reproducible way to measure intermediate reasoning rather than just final answers.</p>
<p><strong>Technique:</strong> The method utilizes an LLM-based equivalence judge with repeated voting to verify if a model's reconstructed formula is mathematically equivalent to the original masked step.</p>
<p><strong>Pipeline:</strong> Real mathematical proofs → Masking key formula steps with surrounding context → LLM reconstruction → LLM-based equivalence judging with repeated voting → Final reasoning score</p>
<p><strong>Methodology:</strong> The authors curated 292 problems across diverse research areas and evaluated 17 models, comparing standard models against reasoning-enhanced versions.</p>
<p><strong>Results:</strong> Reasoning-enhanced models outperformed standard models by 12% to 27%, and the automated evaluator achieved a 96.8% agreement rate with human experts.</p>
<p><strong>Limitations:</strong> The study focuses on step-level reconstruction rather than full end-to-end proof generation, and the reliability of the LLM judge may still vary on highly complex, novel logic.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.15258" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    <a class="paper-action-btn gh-btn" href="https://github.com/weating/Mask-Proof" target="_blank" rel="noopener noreferrer" title="View code on GitHub" aria-label="View code on GitHub"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg><span>Code</span></a>
  </div>
</div>

<h4 id="nlp">NLP</h4>

<div class="paper-item" data-date="2026-06-16" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">16 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14941">Semantics-Enhanced Retrieval-Augmented Time Series Forecasting</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Shiqiao Zhou, Zipeng Wu, Holger Sch\"oner, Edouard Fouch\'e, IAG Wilson, Shuo Wang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14941" target="_blank" rel="noopener noreferrer">2606.14941</a></p>
<p class="paper-detail"><strong>Authors:</strong> Shiqiao Zhou, Zipeng Wu, Holger Sch\"oner, Edouard Fouch\'e, IAG Wilson, Shuo Wang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Time series forecasting models often benefit from historical patterns. Inspired by Retrieval-Augmented Generation (RAG), recent research explored retrieving relevant historical time series segments to enhance forecasting. However, relying solely on time series similarity is often insufficient for retrieval under non-stationarity. To address this, we propose a multimodal approach: a \textbf{S}emantics-\textbf{E}nhanced \textbf{R}etrieval-\textbf{A}ugmented Time Series \textbf{F}orecasting framework, SERAF. Unlike mainstream approaches that depend only on time series similarity, SERAF conducts dual retrieval over the time series and their self-generated textual descriptions. It retrieves two complementary sets of historical patterns and corresponding futures, which are selectively and jointly used to guide future predictions. Experiments across seven real-world datasets demonstrate the effectiveness of SERAF in bridging numerical and semantic views of time series compared with state-of-the-art baselines.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces SERAF, a multimodal framework that enhances time series forecasting by integrating both numerical similarity and semantic descriptions into a retrieval-augmented architecture.</p>
<p><strong>Core Idea:</strong> To overcome the limitations of relying solely on time series similarity in non-stationary environments, the model retrieves complementary historical patterns from both numerical and textual perspectives.</p>
<p><strong>Technique:</strong> The framework employs a dual-retrieval mechanism that retrieves historical segments based on time series similarity and self-generated textual descriptions to guide future predictions.</p>
<p><strong>Pipeline:</strong> Input time series → Generate textual descriptions → Dual retrieval (numerical similarity + semantic matching) → Jointly select and fuse retrieved patterns → Forecast future values</p>
<p><strong>Methodology:</strong> The methodology involves creating a multimodal retrieval system where historical data is indexed by both its raw signal and its semantic meaning, allowing for a more robust selection of relevant past contexts.</p>
<p><strong>Results:</strong> Experiments across seven real-world datasets demonstrate that SERAF outperforms state-of-the-art baselines by effectively bridging numerical and semantic views.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of generating textual descriptions or the specific criteria for the 'selective' joint use of retrieved sets.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14941" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h3 id="personal-interests">Personal Interests</h3>

<p class="section-desc">Papers discovered through your interest topics.</p>

<h4 id="multi-agent-systems">Multi-Agent Systems</h4>

<div class="paper-item" data-date="2026-06-15" data-relevance="3">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 3 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span><span class="rel-dot"></span></span><span class="rel-score">3/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.16710">Misinformation Propagation in Benign Multi-Agent Systems</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Jonas Becker, Jan Philip Wahle, Terry Ruas, Bela Gipp
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.16710" target="_blank" rel="noopener noreferrer">2606.16710</a></p>
<p class="paper-detail"><strong>Authors:</strong> Jonas Becker, Jan Philip Wahle, Terry Ruas, Bela Gipp</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Multi-agent systems, in which multiple large language model agents solve problems through turn-based interaction, are increasingly deployed in high-stakes settings such as medical diagnosis, legal analysis, and forensic decision-making. Their reliability can be at risk when single agents reason from incorrect or misleading context, e.g., from tool calls, since errors may propagate through agent interactions. This work studies this risk by injecting intent-based misinformation into benign single-agent and multi-agent systems across reasoning, knowledge, and alignment tasks. We find that misinformation can degrade single-agent performance and persists across multi-agent debate, with agents often retaining answers introduced by misinformed peers. Nevertheless, multi-agent debate reduces the resulting performance degradation compared to single-agent prompting, especially when most agents are not exposed to misinformation. Robustness depends on group composition and decision protocol. Consensus can be more stable than voting under peer pressure, while majorities can often steer misinformed agents back toward correct answers. Our results show that misinformation robustness in multi-agent systems depends on the underlying model and also on how agents exchange information and aggregate decisions.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper investigates how intent-based misinformation propagates within multi-agent systems and identifies how group composition and decision protocols influence robustness.</p>
<p><strong>Core Idea:</strong> While misinformation can persist across multi-agent interactions, multi-agent debate can mitigate performance degradation compared to single-agent systems, particularly when a majority of agents remain uncorrupted.</p>
<p><strong>Technique:</strong> The researchers injected intent-based misinformation into reasoning, knowledge, and alignment tasks to observe error propagation across different agent interaction models.</p>
<p><strong>Pipeline:</strong> Misinformed context input → Multi-agent interaction/debate → Aggregated decision output</p>
<p><strong>Methodology:</strong> The study compares single-agent performance against multi-agent systems using various decision protocols (consensus vs. voting) and group compositions under injected misinformation.</p>
<p><strong>Results:</strong> Multi-agent debate reduces performance degradation compared to single-agent prompting; consensus protocols offer more stability than voting under peer pressure; and majorities can successfully steer misinformed agents toward correct answers.</p>
<p><strong>Limitations:</strong> Robustness is highly dependent on the specific underlying model and the specific method of information exchange and aggregation.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.16710" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h2 id="tech-news">Tech News</h2>

<h3 id="ai-safety-1">AI Safety</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 16 Ju</span>
  </div>
  <a class="news-title" href="https://www.economist.com/by-invitation/2026/06/15/humanity-isnt-ready-for-the-coming-intelligence-explosion" target="_blank" rel="noopener noreferrer">Humanity isn&#x27;t ready for the coming intelligence explosion</a>
  <p class="news-summary">The article discusses the existential risks and societal unpreparedness regarding a potential &#x27;intelligence explosion&#x27; driven by rapid AI advancement. It explores the gap between technological capabilities and our current regulatory, ethical, and cognitive frameworks. The piece emphasizes the need for proactive safety measures before AGI reaches a point of no return.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Safety</span><span class="news-tag">AGI</span><span class="news-tag">Existential Risk</span><span class="news-tag">Ethics</span><span class="news-tag">Intelligence Explosion</span></div>
    <a class="news-read-btn" href="https://www.economist.com/by-invitation/2026/06/15/humanity-isnt-ready-for-the-coming-intelligence-explosion" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-16</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u777te/ai_billionaires_want_to_control_every_aspect_of/" target="_blank" rel="noopener noreferrer">AI Billionaires Want to Control EVERY Aspect of Your Life | Aaron Bastani Meets Karen Hao</a>
  <p class="news-summary">The post discusses a conversation between Aaron Bastani and Karen Hao regarding the concentration of power among AI billionaires. It explores the societal implications and potential risks of tech giants exerting excessive control over various aspects of human life through AI integration.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Ethics</span><span class="news-tag">Tech Policy</span><span class="news-tag">Data Privacy</span><span class="news-tag">AI Governance</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u777te/ai_billionaires_want_to_control_every_aspect_of/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computing-systems-1">Computing Systems</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 16 Ju</span>
  </div>
  <a class="news-title" href="https://devblogs.microsoft.com/oldnewthing/20260615-00/?p=112419" target="_blank" rel="noopener noreferrer">The time the x86 emulator team found code so bad they fixed it during emulation</a>
  <p class="news-summary">The x86 emulator team encountered legacy code so poorly written that they chose to fix the original source code during the emulation process. This highlights extreme technical debt and the complexities of maintaining hardware-level software compatibility.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Emulation</span><span class="news-tag">Software Engineering</span><span class="news-tag">Technical Debt</span><span class="news-tag">x86 Architecture</span><span class="news-tag">Systems Programming</span></div>
    <a class="news-read-btn" href="https://devblogs.microsoft.com/oldnewthing/20260615-00/?p=112419" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 16 Ju</span>
  </div>
  <a class="news-title" href="https://twitter.com/ID_AA_Carmack/status/2064095424420487226" target="_blank" rel="noopener noreferrer">John Carmack on Fabrice Bellard</a>
  <p class="news-summary">John Carmack, the legendary programmer behind Doom and id Software, shared his admiration for Fabrice Bellard, the creator of FFmpeg and QEMU. The discussion highlights the profound impact of high-performance systems programming and efficient software engineering on the broader technology landscape.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Software Engineering</span><span class="news-tag">Systems Programming</span><span class="news-tag">High Performance Computing</span><span class="news-tag">Programming Culture</span></div>
    <a class="news-read-btn" href="https://twitter.com/ID_AA_Carmack/status/2064095424420487226" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 16 Ju</span>
  </div>
  <a class="news-title" href="https://www.narracomm.com/amazon-announces-multibillion-dollar-data-center-in-missouri/" target="_blank" rel="noopener noreferrer">Amazon Announces Multibillion-Dollar Data Center in Missouri</a>
  <p class="news-summary">Amazon has announced a multibillion-dollar investment to construct a new data center in Missouri. This infrastructure expansion is designed to bolster cloud computing capabilities and support the growing demand for high-performance computing and AI workloads.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Infrastructure</span><span class="news-tag">Cloud Computing</span><span class="news-tag">Data Centers</span><span class="news-tag">AWS</span><span class="news-tag">Hardware</span></div>
    <a class="news-read-btn" href="https://www.narracomm.com/amazon-announces-multibillion-dollar-data-center-in-missouri/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-16</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1u73c5r/quicktok_a_faster_tokenizer_exact_and/" target="_blank" rel="noopener noreferrer">quicktok: a faster tokenizer (exact and byte-identical with tiktoken) [P]</a>
  <p class="news-summary">A new C++ tokenizer called &#x27;quicktok&#x27; has been released, offering byte-identical results to OpenAI&#x27;s tiktoken while significantly improving performance. By utilizing data structure engineering like 2-byte tries and hand-compiled pretokenizers, it achieves speeds up to 11x faster than the original tiktoken library. It supports major models including Llama-3, Qwen2.5, and GPT-4o (o200k).</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Tokenization</span><span class="news-tag">LLM Infrastructure</span><span class="news-tag">C++</span><span class="news-tag">Performance Optimization</span><span class="news-tag">NLP</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1u73c5r/quicktok_a_faster_tokenizer_exact_and/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="general-1">General</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/MachineLearning</span>
    <span class="news-date">2026-06-15</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/MachineLearning/comments/1u6x8al/how_the_brains_learn_r/" target="_blank" rel="noopener noreferrer">How the brains learn [R]</a>
  <p class="news-summary">Researchers have proposed a framework for neocortex learning based on error-driven predictive learning via temporal derivatives and competitive kinase synaptic plasticity. This model, implemented in the Axon neural simulation framework, aims to provide a biologically plausible alternative to backpropagation that could significantly improve training efficiency.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Neuroscience</span><span class="news-tag">Neural Networks</span><span class="news-tag">Biologically Plausible AI</span><span class="news-tag">Axon Framework</span><span class="news-tag">Learning Algorithms</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/MachineLearning/comments/1u6x8al/how_the_brains_learn_r/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="llm-1">LLM</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--hn">Hacker News</span>
    <span class="news-date">Tue, 16 Ju</span>
  </div>
  <a class="news-title" href="http://ishmeetbindra.com/posts/reviews-have-become-expensive-rewrites-have-become-cheap/" target="_blank" rel="noopener noreferrer">Reviews have become expensive, rewrites have become cheap</a>
  <p class="news-summary">The author argues that the rise of LLMs has shifted the software development paradigm by making code generation nearly instantaneous. Consequently, the cost of producing a rewrite is now negligible, shifting the human bottleneck from writing code to the high-cost cognitive effort of reviewing and verifying it.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM</span><span class="news-tag">Software Engineering</span><span class="news-tag">Productivity</span><span class="news-tag">AI Impact</span></div>
    <a class="news-read-btn" href="http://ishmeetbindra.com/posts/reviews-have-become-expensive-rewrites-have-become-cheap/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h2 id="github-trending">
  <svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="20" height="20" style="vertical-align:middle;margin-right:6px"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg>
  GitHub Trending
</h2>

<p class="section-desc">Trending repositories on GitHub filtered and scored for relevance to your interests.</p>

<h3 id="agentic-ai-1">Agentic AI</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/trycua/cua" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">trycua</span><span class="gh-sep">/</span><strong class="gh-repo">cua</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides the core infrastructure for Computer-Use Agents, enabling AI to interact with full desktop environments across multiple operating systems. It is highly relevant as it provides the sandboxing, SDKs, and benchmarks necessary for developing and evaluating autonomous agents in human-computer interaction scenarios.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Computer-Use</span><span class="gh-tag">Agentic AI</span><span class="gh-tag">Human-Computer Interaction</span><span class="gh-tag">Sandboxing</span><span class="gh-tag">Multimodal</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/trycua/cua" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/shuvonsec/claude-bug-bounty" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">shuvonsec</span><span class="gh-sep">/</span><strong class="gh-repo">claude-bug-bounty</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository implements an autonomous agentic system for cybersecurity, specifically for bug bounty hunting. It leverages Claude Code to perform reconnaissance, vulnerability scanning, and report generation, aligning with interests in Agentic AI and LLM-driven automation.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Cybersecurity</span><span class="gh-tag">Automation</span><span class="gh-tag">Claude Code</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/shuvonsec/claude-bug-bounty" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/OpenBB-finance/OpenBB" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">OpenBB-finance</span><span class="gh-sep">/</span><strong class="gh-repo">OpenBB</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 3/5">★★★<span class="gh-relevance-empty">★★</span> <span class="gh-rel-num">3/5</span></span>
    </div>
  </div>
  <p class="gh-summary">OpenBB is a comprehensive financial data platform that provides structured data and tools for analysts and quantitative researchers. It is highly relevant for Agentic AI as it serves as a foundational data layer for building autonomous financial agents and LLM-powered trading systems.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">financial data</span><span class="gh-tag">quantitative analysis</span><span class="gh-tag">agentic AI</span><span class="gh-tag">data platform</span><span class="gh-tag">python</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/OpenBB-finance/OpenBB" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="llm-2">LLM</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/cheahjs/free-llm-api-resources" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">cheahjs</span><span class="gh-sep">/</span><strong class="gh-repo">free-llm-api-resources</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">LLM</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a curated list of free LLM inference APIs, which is essential for developers building agentic systems and RAG applications. It serves as a foundational resource for accessing the large language models that power the user&#x27;s interests in multi-agent systems and chatbots.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">LLM</span><span class="gh-tag">API</span><span class="gh-tag">Foundation Models</span><span class="gh-tag">Agents</span><span class="gh-tag">Inference</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/cheahjs/free-llm-api-resources" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="mlops">MLOps</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/pathwaycom/llm-app" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">pathwaycom</span><span class="gh-sep">/</span><strong class="gh-repo">llm-app</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">MLOps</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides production-ready cloud templates for RAG and AI pipelines, focusing on synchronizing live data from various enterprise sources. It is highly relevant for MLOps and Agentic AI as it addresses the infrastructure challenges of maintaining real-time data for LLM applications.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">RAG</span><span class="gh-tag">MLOps</span><span class="gh-tag">LLM</span><span class="gh-tag">Data Pipelines</span><span class="gh-tag">Enterprise AI</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/pathwaycom/llm-app" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/mikeroyal/Self-Hosting-Guide" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">mikeroyal</span><span class="gh-sep">/</span><strong class="gh-repo">Self-Hosting-Guide</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">MLOps</span>
      <span class="gh-relevance" title="Relevance 3/5">★★★<span class="gh-relevance-empty">★★</span> <span class="gh-rel-num">3/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a comprehensive guide for self-hosting software, including infrastructure for hosting LLMs and private web servers. It is relevant for MLOps and infrastructure setup, specifically for users looking to deploy models on-premises or in private clouds.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">self-hosting</span><span class="gh-tag">infrastructure</span><span class="gh-tag">LLM deployment</span><span class="gh-tag">MLOps</span><span class="gh-tag">private cloud</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-16</span>
      <a class="gh-visit-btn" href="https://github.com/mikeroyal/Self-Hosting-Guide" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's digest highlights a shift toward the operational reliability of autonomous agents, focusing on safety benchmarks, multi-agent trust dynamics, and the structural integrity of agentic workflows.]]></summary></entry><entry><title type="html">Daily Digest 2026-06-15</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-15/" rel="alternate" type="text/html" title="Daily Digest 2026-06-15" /><published>2026-06-15T00:00:00+07:00</published><updated>2026-06-15T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-15/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's research focuses on the transition from isolated chatbots to persistent, autonomous agents, with a heavy emphasis on robust orchestration, safety protocols, and verifiable memory systems.</span>
</div>

<h2 id="global-trends">Global Trends</h2>

<h3 id="arxiv-subjects">Papers discovered from ArXiv subject categories</h3>

<h4 id="ai-safety">AI Safety</h4>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span><span class="cat-tag cat-ai" title="Multiagent Systems (cs.MA)">Multiagent Systems (cs.MA)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13715">WorkBench Revisited: Workplace Agents Two Years On</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Olly Styles
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13715" target="_blank" rel="noopener noreferrer">2606.13715</a></p>
<p class="paper-detail"><strong>Authors:</strong> Olly Styles</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The best agent on WorkBench in March 2024, GPT-4, completed 43% of tasks and took an unintended harmful action, such as emailing the wrong person, on 26% of them. We re-visit the benchmark in June 2026 and find that the best agent to date, Claude Opus 4.8, completes 89% and takes an unintended harmful action on 2.5%. Aside from this considerable progress in frontier agent performance, three things stand out. First, capability and safety go together on WorkBench rather than trade off, so the models that finish the most tasks also do the least unintended damage. Second, while several classes of error have been totally eliminated, frontier models still make some basic mistakes that occasionally result in irreversible harm, such as sending an email to the wrong person. Third, the rise of open-weight models has drastically lowered costs for a performance level that was previously only accessible to proprietary models, while frontier costs have stayed relatively stable. We release an updated version of the benchmark with data and code quality improvements, new model scores, and analysis of agent progress on WorkBench since 2024.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper provides a longitudinal evaluation of agentic performance on the WorkBench benchmark, demonstrating significant improvements in both task completion and safety over a two-year period.</p>
<p><strong>Core Idea:</strong> The study explores the evolution of frontier models, finding that capability and safety are positively correlated rather than in trade-off, while highlighting the democratization of high-performance agents through open-weight models.</p>
<p><strong>Technique:</strong> The authors updated the WorkBench benchmark with improved data and code quality to re-evaluate frontier models like Claude Opus 4.8 against previous benchmarks.</p>
<p><strong>Pipeline:</strong> WorkBench tasks → Frontier model execution → Performance and safety evaluation → Comparative analysis of progress and costs.</p>
<p><strong>Methodology:</strong> A comparative longitudinal study measuring task completion rates and the frequency of unintended harmful actions across different model generations from 2024 to 2026.</p>
<p><strong>Results:</strong> Task completion improved from 43% (GPT-4) to 89% (Claude Opus 4.8), while unintended harmful actions dropped from 26% to 2.5%.</p>
<p><strong>Limitations:</strong> Despite progress, frontier models still make basic mistakes that can result in irreversible harm, such as sending emails to the wrong recipients.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13715" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13884">Capability Minimization as a Safety Primitive: Risk-Aware Causal Gating for Least-Privilege LLM Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Laxmipriya Ganesh Iyer, Rahul Suresh Babu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13884" target="_blank" rel="noopener noreferrer">2606.13884</a></p>
<p class="paper-detail"><strong>Authors:</strong> Laxmipriya Ganesh Iyer, Rahul Suresh Babu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Modern decision systems increasingly rely on learned components whose outputs may be confident yet wrong, exposing downstream actions to costly errors. We introduce Risk-Aware Causal Gating (RACG), a framework that decides whether to act on, defer, or abstain from a model's prediction by combining causal effect estimation with calibrated risk control. RACG models the causal pathway from candidate actions to outcomes and gates each decision according to an estimated counterfactual risk rather than raw predictive confidence. To make gating reliable, we derive distribution-free bounds on the probability of acting under high-risk conditions and show how these bounds translate into operating thresholds that satisfy user-specified safety constraints. We further propose an adaptive gating policy that adjusts to distribution shift by monitoring discrepancies between predicted and realized outcomes, tightening the gate when causal assumptions appear violated. Across simulated interventions and real-world decision benchmarks, RACG reduces high-cost errors substantially while preserving most of the utility of an ungated policy, and it outperforms confidence-based and selective-prediction baselines at matched abstention rates. Our results indicate that explicitly separating causal risk from predictive uncertainty yields decision systems that are both safer and more transparent, offering a principled mechanism for trustworthy automation in high-stakes settings.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Risk-Aware Causal Gating (RACG), a framework that prioritizes safety by gating model actions based on counterfactual risk rather than raw predictive confidence. It provides a principled mechanism for least-privilege LLM agents by minimizing the capabilities of a model to perform high-risk actions.</p>
<p><strong>Core Idea:</strong> The core idea is to separate predictive uncertainty from causal risk, ensuring that a system abstains from an action if the potential negative outcome is high, even if the model is confident in its prediction.</p>
<p><strong>Technique:</strong> The technique combines causal effect estimation with distribution-free risk bounds to create a gating mechanism that adjusts to distribution shifts by monitoring discrepancies between predicted and realized outcomes.</p>
<p><strong>Pipeline:</strong> Model Prediction → Causal Pathway Modeling → Counterfactual Risk Estimation → Distribution-Free Bound Calculation → Adaptive Gating Policy → Action (Act, Defer, or Abstain)</p>
<p><strong>Methodology:</strong> The authors derive mathematical bounds on the probability of high-risk actions and implement an adaptive policy that tightens safety thresholds when causal assumptions are violated.</p>
<p><strong>Results:</strong> RACG substantially reduces high-cost errors while preserving most utility, outperforming confidence-based and selective-prediction baselines at matched abstention rates across simulated and real-world benchmarks.</p>
<p><strong>Limitations:</strong> The effectiveness of the framework depends on the accuracy of the underlying causal pathway modeling and the availability of data to monitor realized outcomes for adaptive gating.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13884" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13873">Natively Unlearnable Large Language Models</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Gaurav R. Ghosal, Pratyush Maini, Aditi Raghunathan
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13873" target="_blank" rel="noopener noreferrer">2606.13873</a></p>
<p class="paper-detail"><strong>Authors:</strong> Gaurav R. Ghosal, Pratyush Maini, Aditi Raghunathan</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Unlearning aims to remove the influence of specific training data sources, but this has proved challenging because the contributions of different sources are entangled within the model. Isolating source contributions to disjoint parameters makes removal easier, though it obstructs joint learning across sources. We propose NULLs (Natively Unlearnable LLMs), a model class that satisfies the two opposing goals of isolating source-specific contributions and learning jointly across sources, by training a set of shared backbone neurons alongside a pool of sparsely activated sinks. During training, information specific to a source naturally concentrates in its sinks while information shared across sources accumulates in the backbone. A source is then unlearned at deployment by disabling its corresponding sinks, with no gradient updates and no access to the retained data. We show that NULLs scales to Wikipedia's ~6M articles, isolating each as an independent source. Unlearning a single article removes knowledge specific to it while preserving facts shared with semantically related articles, closely matching retraining from scratch. We note that unlearning with NULLs is also robust: in a case study of unlearning the Harry Potter books, NULLs resists both adversarial extraction and relearning that reverses post-hoc unlearning. Finally, NULLs preserves general language capabilities, matching a standard transformer on downstream benchmarks. Together, these results suggest that source-level unlearning need not be an afterthought. It can be built natively into LLM training while retaining the benefits of shared representation learning.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces NULLs (Natively Unlearnable LLMs), a model architecture that enables source-specific unlearning without sacrificing the benefits of joint representation learning. It demonstrates that unlearning can be built natively into the training process rather than as a post-hoc correction.</p>
<p><strong>Core Idea:</strong> The authors propose a dual-component architecture where shared information is stored in a backbone while source-specific information is isolated in sparsely activated sinks.</p>
<p><strong>Technique:</strong> The model utilizes a set of shared backbone neurons and a pool of sparsely activated sinks, allowing for the removal of specific data sources by simply disabling their corresponding sinks at deployment.</p>
<p><strong>Pipeline:</strong> Training data (shared and source-specific) → Dual-path training (backbone for shared info, sinks for source-specific info) → Deployment with sink-disabling for unlearning → Output (unlearned model with preserved general capabilities).</p>
<p><strong>Methodology:</strong> The researchers trained a model on Wikipedia's ~6M articles, treating each as an independent source, and evaluated the effectiveness of unlearning through knowledge removal, adversarial extraction, and relearning tests.</p>
<p><strong>Results:</strong> NULLs successfully removed knowledge specific to individual articles while preserving shared facts, matched the performance of retraining from scratch, resisted adversarial extraction, and maintained general language capabilities on downstream benchmarks.</p>
<p><strong>Limitations:</strong> The paper does not extensively explore the computational overhead of maintaining the sink pool or the potential for information leakage between sinks during the joint learning phase.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13873" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="agentic-ai">Agentic AI</h4>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span><span class="cat-tag cat-cv" title="Computer Vision and Pattern Recognition (cs.CV)">Computer Vision and Pattern Recognition (cs.CV)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13707">Orchestra-o1: Omnimodal Agent Orchestration</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Fan Zhang, Vireo Zhang, Shengju Qian, Haoxuan Li, Hao Wu, Jinyang Wu, Donghao Zhou, Zhihong Zhu, Zheng Lian, Xin Wang, Pheng-Ann Heng
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13707" target="_blank" rel="noopener noreferrer">2606.13707</a></p>
<p class="paper-detail"><strong>Authors:</strong> Fan Zhang, Vireo Zhang, Shengju Qian, Haoxuan Li, Hao Wu, Jinyang Wu, Donghao Zhou, Zhihong Zhu, Zheng Lian, Xin Wang, Pheng-Ann Heng</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The recent success of agent swarms has shifted the paradigm of large language model (LLM)-based agents from single-agent workflows to multi-agent systems, highlighting the importance of agent orchestration for task decomposition and collaboration. However, existing orchestration frameworks are limited to a narrow set of modalities and struggle to generalize to more complex settings where heterogeneous modalities coexist and interact. This limitation becomes particularly pronounced in omnimodal scenarios, where tasks require the unified understanding and coordination of diverse inputs such as text, image, audio, and video. In this work, we propose Orchestra-o1, an omnimodal agent orchestration framework designed to support efficient agent collaboration across multiple modalities. Orchestra-o1 introduces a unified orchestration mechanism that enables modality-aware task decomposition, online sub-agent specialization, and parallel sub-task execution. This scalable design allows agent systems to effectively tackle complex real-world tasks involving heterogeneous information sources, surpassing the second-best approach by 10.3% accuracy on the OmniGAIA benchmark. Furthermore, we introduce decision-aligned group relative policy optimization (DA-GRPO), an efficient agentic reinforcement learning approach for training Orchestra-o1-8B, which also achieves state-of-the-art performance against all existing open-source omnimodal agents.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Orchestra-o1, a scalable omnimodal agent orchestration framework that enables efficient collaboration across heterogeneous modalities like text, image, audio, and video. It also proposes DA-GRPO, a decision-aligned reinforcement learning approach for training omnimodal agents.</p>
<p><strong>Core Idea:</strong> The core idea is to move beyond single-modality agent swarms by creating a unified orchestration mechanism that can decompose and execute tasks involving diverse, interacting information sources.</p>
<p><strong>Technique:</strong> The framework utilizes modality-aware task decomposition, online sub-agent specialization, and parallel sub-task execution, optimized via Decision-Aligned Group Relative Policy Optimization (DA-GRPO).</p>
<p><strong>Pipeline:</strong> Omnimodal inputs (text, image, audio, video) → Modality-aware task decomposition → Online sub-agent specialization → Parallel sub-task execution → Unified final output</p>
<p><strong>Methodology:</strong> The authors developed a scalable orchestration design to handle heterogeneous data and trained a 8B parameter model using a novel reinforcement learning objective that aligns group policies with decision-making.</p>
<p><strong>Results:</strong> Orchestra-o1 surpassed the second-best approach by 10.3% accuracy on the OmniGAIA benchmark and achieved state-of-the-art performance against all existing open-source omnimodal agents.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail specific limitations, but the scope is currently focused on the OmniGAIA benchmark and the scalability of the 8B parameter model.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13707" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13710">Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Hongming Piao, Chi Liu, Mengzhuo Chen, Yan Shu, Derek Li, Ying Wei, Bryan Dai
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13710" target="_blank" rel="noopener noreferrer">2606.13710</a></p>
<p class="paper-detail"><strong>Authors:</strong> Hongming Piao, Chi Liu, Mengzhuo Chen, Yan Shu, Derek Li, Ying Wei, Bryan Dai</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Deep research and agent evolution serve as de-facto tasks for AI agents in real-world applications toward artificial general intelligence. The former enables autonomous retrieval and integration of information in open-ended environments to tackle open-ended research tasks, yet it is constrained by the static parametric deep research capabilities of agent systems. The latter allows agents to autonomously interact with the environment to gain experiences that evolve model capabilities. However, its effectiveness has been widely validated only on verifiable tasks with standard answers, leaving a gap with open-ended research tasks. To bridge these two critical tasks, we propose the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which leverages hybrid-mode reinforcement learning to facilitate the collaborative evolution of a proposer, solver and judge based on web-scale knowledge, moving toward autonomous evolving agents in open-ended tasks and environments. Extensive experiments on three long-form deep research benchmarks demonstrate that the 8B model trained via HOTE surpasses the strongest static open 8-32B models as well as those trained by state-of-the-art deep research training methods with less time overhead, and further verify that the evolution of all three modules in HOTE is indispensable.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which enables AI agents to autonomously evolve their capabilities for open-ended research tasks by bridging deep research and agent evolution.</p>
<p><strong>Core Idea:</strong> The core idea is to move beyond static parametric capabilities by creating a collaborative evolutionary loop where three distinct agent roles (proposer, solver, and judge) improve simultaneously based on web-scale knowledge.</p>
<p><strong>Technique:</strong> The framework utilizes hybrid-mode reinforcement learning to facilitate the co-evolution of a proposer, solver, and judge in an open-ended environment.</p>
<p><strong>Pipeline:</strong> Open-ended research task → Hybrid-mode RL evolution of Proposer, Solver, and Judge modules → Enhanced deep research capabilities</p>
<p><strong>Methodology:</strong> HOTE employs a tri-evolutionary approach where a proposer generates tasks, a solver attempts to research them, and a judge evaluates the results, with all three modules evolving concurrently through reinforcement learning.</p>
<p><strong>Results:</strong> An 8B model trained via HOTE outperformed the strongest static open 8-32B models and state-of-the-art deep research training methods with lower time overhead across three long-form benchmarks.</p>
<p><strong>Limitations:</strong> The paper focuses on the evolution of these three specific modules, leaving open questions regarding the scalability of the tri-evolutionary framework to even more complex multi-agent ecosystems.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13710" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13949">Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Hexuan Yu, Chaoyu Zhang, Heng Jin, Shanghao Shi, Ning Zhang, Y. Thomas Hou, Wenjing Lou
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13949" target="_blank" rel="noopener noreferrer">2606.13949</a></p>
<p class="paper-detail"><strong>Authors:</strong> Hexuan Yu, Chaoyu Zhang, Heng Jin, Shanghao Shi, Ning Zhang, Y. Thomas Hou, Wenjing Lou</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Modern LLM-powered autonomous agents increasingly rely on rich user interface (UI) state observations to achieve reliable action grounding in complex digital environments. However, many deployments transmit the full UI state to remote inference servers even when most elements are irrelevant to the current task, which can leak sensitive but unnecessary context such as authentication codes, private notifications, and background application states. We propose MINIM, a trusted local broker that performs privacy-aware minimization on the client side before any observation leaves the device. Grounded in Contextual Integrity (CI), MINIM learns a dual-score representation for each UI element by predicting an inherent sensitivity score (s) and a task-conditioned necessity score (n). These scores drive a ternary disclosure policy that keeps essential elements, abstracts sensitive attributes when needed, and removes task-irrelevant content. We optimize a CI-aware objective that penalizes necessity errors more strongly on high-risk content, enabling aggressive pruning while preserving task-critical information. Experiments on real-world UI observations derived from WebArena show that MINIM substantially reduces task-irrelevant sensitive leakage while preserving task-critical semantic context and the interactive affordances required for reliable agent actions.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces MINIM, a trusted local broker that performs privacy-aware minimization of UI states to prevent sensitive data leakage when using LLM-powered autonomous agents.</p>
<p><strong>Core Idea:</strong> The core idea is to filter UI observations on the client side by balancing privacy preservation with task necessity, ensuring only relevant information is sent to remote inference servers.</p>
<p><strong>Technique:</strong> MINIM utilizes a dual-score representation (sensitivity and necessity) grounded in Contextual Integrity (CI) to implement a ternary disclosure policy for UI elements.</p>
<p><strong>Pipeline:</strong> Raw UI State → Local Broker (Sensitivity &amp; Necessity Scoring) → Ternary Disclosure Policy (Keep/Abstract/Remove) → Minimized UI Observation → Remote Inference Server</p>
<p><strong>Methodology:</strong> The authors developed a CI-aware objective function that penalizes necessity errors more heavily on high-risk content, optimizing the trade-off between privacy and agent performance.</p>
<p><strong>Results:</strong> MINIM substantially reduces task-irrelevant sensitive leakage while preserving task-critical semantic context and interactive affordances for reliable agent actions on the WebArena benchmark.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of local processing or the potential for 'abstraction' to lose subtle but necessary cues for complex edge cases.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13949" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14200">When Should Agent Trust Be Conditional? Characterizing and Attacking Skill-Conditional Reputation in Agent Swarms</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yihan Xia, Taotao Wang
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14200" target="_blank" rel="noopener noreferrer">2606.14200</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yihan Xia, Taotao Wang</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Open platforms increasingly route tasks among heterogeneous LLM agents--differing in base model, scaffold, and tool stack--whose competence varies sharply by skill: an agent excellent at one skill may be useless at another. The standard reputation approach summarizes each agent by a single global trust score, but that scalar is the wrong object here, because routing every task to the globally most-trusted agent leaves the value of specialization unclaimed. We study skill-conditional trust R(i | k)--the trust to place in agent i for a task requiring skill k, rather than one score per agent--and pose three falsifiable questions: when is conditioning worth it, how much cross-skill evidence should be borrowed, and whether that borrowing is safe. A controlled phase-diagram analysis answers the first two: conditional trust wins only in a specific regime--high agent heterogeneity, sparse per-skill evidence, and correlated skills--and the coupling strength beta that buys this data efficiency is dual-use, because the same cross-skill borrowing is also a laundering channel. On a public benchmark of 14 genuinely heterogeneous AppWorld agents, real pools land inside the beneficial regime--a small but genuine gain, with the per-skill best agent genuinely changing across skills. We then show that an attacker with cheap evidence in one skill and none in a target skill hijacks the conditional router, driving routing regret from 0 to 0.94 on a pool our zero-cost Conditional Information Value Test (CIVT) rates GREEN--while the ungated trust verdict it contaminates reads -0.06 instead of the honest +0.19. A zero-evidence gate bounds the attack but does not eliminate it; we characterize the residual cost under an explicit budget. We do not claim Sybil-resistance--we quantify the trade-off.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces skill-conditional trust (R(i | k)) as a superior alternative to global reputation scores for heterogeneous agent swarms and identifies a security vulnerability where cross-skill evidence borrowing can be exploited by attackers.</p>
<p><strong>Core Idea:</strong> Instead of a single trust score, agents should be evaluated based on specific skills, but the mechanism used to infer trust in low-evidence skills from high-evidence skills creates a 'laundering channel' for malicious actors.</p>
<p><strong>Technique:</strong> The authors use a phase-diagram analysis to define the regime where conditional trust is beneficial and develop the Conditional Information Value Test (CIVT) to detect potential hijacking.</p>
<p><strong>Pipeline:</strong> Heterogeneous agent pool → Skill-specific task routing → Cross-skill evidence borrowing (coupling) → Conditional trust score R(i | k) → Task assignment</p>
<p><strong>Methodology:</strong> The study employs a controlled phase-diagram analysis to study the trade-offs between data efficiency and security, followed by empirical testing on a public benchmark of 14 AppWorld agents.</p>
<p><strong>Results:</strong> Conditional trust provides a small but genuine gain in real-world pools; however, an attacker with cheap evidence in one skill can hijack the router, increasing routing regret from 0 to 0.94 while maintaining a deceptive 'honest' trust verdict.</p>
<p><strong>Limitations:</strong> The proposed methods do not claim full Sybil-resistance and only bound the residual cost of attacks under an explicit budget.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14200" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14211">Closing the Reflection Gap: A Free Calibration Bonus for Agentic RL</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yinglun Zhu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14211" target="_blank" rel="noopener noreferrer">2606.14211</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yinglun Zhu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">LLMs are increasingly deployed as agents that interact with external environments and observe feedback such as execution results, error messages, and tool outputs. A well-functioning agent should be able to leverage this feedback to accurately assess its own performance. Yet we find a persistent reflection gap: LLM agents tend to mis-assess their own outputs after observing concrete environment feedback -- even for questions they correctly answered -- and standard RL barely helps due to a credit-assignment mismatch. To close this gap, we propose RefGRPO, a simple yet effective fix that augments standard RL algorithms with two key ingredients: a free calibration bonus computed by contrasting the agent's own reflection with the actual outcome (requiring no additional reward model, LLM judge, or external annotation), and a dynamic schedule on its coefficient. Compared to standard RL baselines, our method simultaneously improves reflection calibration (e.g., reduces underconfidence rate $44.4\% \to 7.7\%$) and task accuracy (e.g., $75.1\% \to 76.5\%$) on text-to-SQL across five benchmarks. The resulting calibrated reflection turns the agent into its own verifier grounded in environment feedback, which further enables (i) better self-improvement that uses reflections as pseudo-rewards without outcome supervision, and (ii) more effective test-time selective prediction by committing only to rollouts flagged as correct.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces RefGRPO, a method to close the 'reflection gap' where LLM agents mis-assess their own performance despite receiving concrete environment feedback.</p>
<p><strong>Core Idea:</strong> By incorporating a free calibration bonus that contrasts an agent's self-reflection with actual outcomes, the model learns to align its internal confidence with external reality.</p>
<p><strong>Technique:</strong> The authors augment standard RL algorithms with a calibration bonus and a dynamic coefficient schedule, requiring no external reward models or human annotations.</p>
<p><strong>Pipeline:</strong> Agent action → Environment feedback → Agent reflection → Calibration bonus calculation → RefGRPO update → Calibrated agent</p>
<p><strong>Methodology:</strong> The methodology uses a contrastive approach to penalize discrepancies between the agent's self-assessment and the ground truth, optimizing for both accuracy and reflection calibration.</p>
<p><strong>Results:</strong> Reduced underconfidence rate from 44.4% to 7.7% and improved text-to-SQL accuracy from 75.1% to 76.5% across five benchmarks.</p>
<p><strong>Limitations:</strong> The paper focuses on environment-based feedback and does not explicitly detail performance in scenarios where feedback is ambiguous or missing.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14211" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14239">SkillAudit: Ground-Truth-Free Skill Evolution via Paired Trajectory Auditing</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Haowen Gao, Haoran Chen, Can Wang, Shasha Guo, Liang Pang, Zhaoyang Liu, Huawei Shen, Xueqi Cheng
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14239" target="_blank" rel="noopener noreferrer">2606.14239</a></p>
<p class="paper-detail"><strong>Authors:</strong> Haowen Gao, Haoran Chen, Can Wang, Shasha Guo, Liang Pang, Zhaoyang Liu, Huawei Shen, Xueqi Cheng</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Agent skills are structured procedural packages that guide frozen LLM agents in specialized workflows. Skills rarely remain sufficient after deployment: edge cases, API changes, and deployment constraints become visible only through use, making skill evolution a practical necessity. Existing methods depend on privileged feedback such as held-out validation scores, hidden test outcomes, or environment rewards -- signals often unavailable when a practitioner has only a task description and workspace data. We introduce SkillAudit, a framework for evolving agent skills without ground-truth feedback. The key idea is paired trajectory auditing: at each iteration, the same task is executed with and without the candidate skill, isolating how the skill changes agent behavior without external labels. To turn behavioral differences into edit guidance, SkillAudit uses Process-Aligned Contrastive Evaluation (PACE), a cluster of evaluators that maps trajectory divergences to diagnostic signals linked to specific passages in the skill document. A structural verifier, compiled once from the task specification and then fixed, checks task constraints and rolls back harmful updates. SkillAudit routes edits through two pipelines: Refine removes noisy or irrelevant guidance from broadly useful skills, while Repair replaces passages that conflict with the task. Across 89 containerized tasks spanning 8 professional domains, SkillAudit achieves 73.9% average task reward, outperforming an agent without skills (40.9%) and the static expert skill (56.7%). These gains are obtained without accessing hidden tests, reference solutions, or external scoring functions during evolution.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces SkillAudit, a framework for evolving agent skills without requiring ground-truth feedback, hidden test outcomes, or environment rewards.</p>
<p><strong>Core Idea:</strong> The framework uses paired trajectory auditing to isolate the specific impact of a skill by comparing agent behavior with and without the skill on the same task.</p>
<p><strong>Technique:</strong> It employs Process-Aligned Contrastive Evaluation (PACE) to map trajectory divergences to diagnostic signals and a structural verifier to ensure task constraint compliance.</p>
<p><strong>Pipeline:</strong> Task description and workspace data → Paired trajectory execution (with/without skill) → PACE diagnostic mapping → Refine/Repair edit pipelines → Updated skill document</p>
<p><strong>Methodology:</strong> SkillAudit identifies behavioral differences between paired trajectories, routes edits through refinement or repair pipelines based on diagnostic signals, and uses a fixed structural verifier to roll back harmful updates.</p>
<p><strong>Results:</strong> Achieved a 73.9% average task reward across 89 containerized tasks, significantly outperforming agents without skills (40.9%) and static expert skills (56.7%).</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of running paired trajectories for every iteration or the scalability of the PACE cluster across highly complex, multi-step reasoning tasks.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14239" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14249">HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Tingyang Chen, Shuo Lu, Kang Zhao, Weicheng Meng, Hanlin Teng, Tianhao Li, Chao Li, Xule Liu, Jian Liang, Zhizhong Zhang, Yuan Xie, Heng Qu, Kun Shao, Jian Luan
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14249" target="_blank" rel="noopener noreferrer">2606.14249</a></p>
<p class="paper-detail"><strong>Authors:</strong> Tingyang Chen, Shuo Lu, Kang Zhao, Weicheng Meng, Hanlin Teng, Tianhao Li, Chao Li, Xule Liu, Jian Liang, Zhizhong Zhang, Yuan Xie, Heng Qu, Kun Shao, Jian Luan</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">AI agent performance depends critically on the runtime harness, comprising the prompts, tools, memory, and control flow that mediate how a model observes, reasons, and acts. Yet today's harnesses remain largely hand-crafted and static: each new model or task still demands bespoke scaffolding, and the rich traces produced during execution are rarely distilled back into systematic improvement. We introduce HarnessX, a foundry for composable, adaptive, and evolvable agent harnesses. HarnessX assembles typed harness primitives via a substitution algebra, adapts them through AEGIS, a trace-driven multi-agent evolution engine grounded in an operational mirror between symbolic adaptation and reinforcement learning, and closes the harness-model loop by turning trajectories into both harness updates and model training signal. Across five benchmarks (ALFWorld, GAIA, WebShop, tau^3-Bench, and SWE-bench Verified), HarnessX yields an average gain of +14.5% (up to +44.0%), with gains largest where baselines are lowest. These results suggest that agent progress need not come from model scaling alone: composing and evolving runtime interfaces from execution feedback is an actionable and complementary lever. The complete codebase will be open-sourced in a future release.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces HarnessX, a foundry for creating composable, adaptive, and evolvable agent harnesses that move beyond static, hand-crafted scaffolding.</p>
<p><strong>Core Idea:</strong> Agent performance can be significantly improved by treating the runtime harness (prompts, tools, memory, and control flow) as a dynamic, evolvable component that learns from execution traces.</p>
<p><strong>Technique:</strong> HarnessX utilizes a substitution algebra for composing typed primitives and the AEGIS engine, which uses an operational mirror between symbolic adaptation and reinforcement learning to evolve harnesses.</p>
<p><strong>Pipeline:</strong> Execution traces → AEGIS multi-agent evolution engine → Updated harness primitives and model training signals</p>
<p><strong>Methodology:</strong> The authors developed a framework to assemble harness components via substitution algebra and evaluated it across five diverse benchmarks including ALFWorld, GAIA, and SWE-bench Verified.</p>
<p><strong>Results:</strong> HarnessX achieved an average performance gain of +14.5% across benchmarks, with individual gains reaching up to +44.0%, particularly in low-baseline scenarios.</p>
<p><strong>Limitations:</strong> The full codebase is not yet available (future release), and the extent of scalability for extremely complex, multi-step reasoning tasks remains to be fully explored.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14249" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14314">Communication Policy Evolution for Proactive LLM Agents</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Xinbei Ma, Jiyang Qiu, Yao Yao, Zheng Wu, Yijie Lu, Xiangmou Qu, Jiaxin Yin, Xingyu Lou, Jun Wang, Weiwen Liu, Weinan Zhang, Zhuosheng Zhang, Hai Zhao
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14314" target="_blank" rel="noopener noreferrer">2606.14314</a></p>
<p class="paper-detail"><strong>Authors:</strong> Xinbei Ma, Jiyang Qiu, Yao Yao, Zheng Wu, Yijie Lu, Xiangmou Qu, Jiaxin Yin, Xingyu Lou, Jun Wang, Weiwen Liu, Weinan Zhang, Zhuosheng Zhang, Hai Zhao</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">LLM agents have rapidly evolved into autonomous systems, yet a persistent information gap remains between users and agents: communication is costly, while users' identical preferences further limit information exchange. To investigate how agents should communicate across modalities, this paper formalizes Communication Policy, establishes textual and UI-based policies, and then evaluates communication policies across diverse environments, personas, and model combinations. Building information asymmetry for proactive agents, we set up two complementary settings, User-Agent and Planner-Executor. Experimental results reveal complementary strengths between interaction channels: text-based interaction often facilitates task performance, while structured UI improves agents' response quality and persona compliance. Motivated by that, a hybrid method combines these advantages. We further propose Communication Policy Evolution (CPE), a self-evolution framework for refining communication policies through rollout and prompt-level evolving. Without model modification, CPE achieves the best task success across multiple settings using prompt refinement alone. Our findings identify communication behavior as a critical yet underexplored design dimension for LLM agents.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper formalizes 'Communication Policy' for proactive LLM agents and introduces a self-evolution framework (CPE) to optimize how agents exchange information across different modalities.</p>
<p><strong>Core Idea:</strong> To bridge the information gap between users and autonomous agents, communication should be treated as a dynamic policy that balances task performance, response quality, and persona compliance through hybrid text and UI channels.</p>
<p><strong>Technique:</strong> The authors propose Communication Policy Evolution (CPE), a framework that refines communication strategies through rollout and prompt-level evolution without requiring model fine-tuning.</p>
<p><strong>Pipeline:</strong> User/Planner requirements → Communication Policy selection (Text/UI/Hybrid) → Agent execution → Performance evaluation → CPE feedback loop → Refined Communication Policy</p>
<p><strong>Methodology:</strong> The study establishes textual and UI-based policies and evaluates them across diverse environments, personas, and model combinations using User-Agent and Planner-Executor settings.</p>
<p><strong>Results:</strong> Text-based interaction improves task performance, while structured UI enhances response quality and persona compliance; the hybrid method and CPE framework achieve the highest task success rates.</p>
<p><strong>Limitations:</strong> The study focuses on prompt-level evolution without model modification, leaving the potential for architectural changes in communication modules as an open question.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14314" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14470">GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Pavan C Shekar, Abhishek H S, Aswanth Krishnan
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14470" target="_blank" rel="noopener noreferrer">2606.14470</a></p>
<p class="paper-detail"><strong>Authors:</strong> Pavan C Shekar, Abhishek H S, Aswanth Krishnan</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language model (LLM) reasoning is ephemeral: chains of thought vanish with the context window, pruned search branches leave no record, and memory buffers cannot be diffed, merged, or audited. Every other complex software process (code, infrastructure, data, experiments) is version-controlled; reasoning is not. We introduce GitOfThoughts, which stores an agent's reasoning tree as a git repository: every scored thought is a commit, scores are notes, outcomes are tags, and retrieval is "git log" over the agent's own history. This makes reasoning replayable, auditable, and mergeable across agents at near-zero engineering cost.   We then ask the harder question: does memory, in any substrate, actually improve accuracy? Across five substrates (none, markdown, vector, graph, git), two benchmarks, two model scales, and pre-registered replications, the answer for novel problems is no. No memory format reliably helps, and a promising early result collapsed under its own pre-registered replication. Memory pays only above what we call the copyability threshold: when the retrieved case is a near-duplicate of the current problem (similarity &gt;~ 0.8), accuracy jumps sharply; below it, nothing. The gain is answer retrieval, not method transfer: a 4.5x larger model doubles the near-duplicate payoff yet still cannot extract a transferable method from a worked example. The only general lever we find is test-time sampling. The case for git-as-substrate is therefore auditability, provenance, and mergeability at accuracy parity. We document a retracted result and a refuted hypothesis to model the evaluation standard we hold ourselves to.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces GitOfThoughts, a framework that treats LLM reasoning as a version-controlled repository, and provides a rigorous empirical analysis of memory substrates for LLM accuracy.</p>
<p><strong>Core Idea:</strong> Reasoning processes should be treated like software code, allowing for replayability, auditing, and merging by storing every thought as a commit in a git-like structure.</p>
<p><strong>Technique:</strong> The authors map reasoning trees to git repositories where thoughts are commits, scores are notes, and outcomes are tags, while evaluating five different memory substrates (none, markdown, vector, graph, and git).</p>
<p><strong>Pipeline:</strong> Agent reasoning steps → Git commit storage (thoughts, scores, tags) → Retrieval via 'git log' and history analysis → Auditable/mergeable reasoning paths.</p>
<p><strong>Methodology:</strong> The researchers conducted pre-registered replications across five memory substrates, two benchmarks, and two model scales to test if specific memory formats improve accuracy on novel problems.</p>
<p><strong>Results:</strong> Memory formats do not improve accuracy on novel problems; gains only occur above a 'copyability threshold' (similarity &gt; 0.8) where the model performs answer retrieval rather than method transfer.</p>
<p><strong>Limitations:</strong> Current memory substrates fail to facilitate transferable method learning from worked examples, and the primary benefit of GitOfThoughts is auditability and provenance rather than raw accuracy gains.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14470" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14502">From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Yongheng Zhang, Ziang Liu, Jiaxuan Zhu, Shuai Wang, Xiangqi Chen, Haojing Huang, Jiayi Kuang, Siyu Chen, Ao Shen, Hao Wu, Qiufeng Wang, Qian-Wen Zhang, Junnan Dong, Wenhao Jiang, Ying Shen, Hai-Tao Zheng, Yinghui Li, Di Yin, Xing Sun, Philip S. Yu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14502" target="_blank" rel="noopener noreferrer">2606.14502</a></p>
<p class="paper-detail"><strong>Authors:</strong> Yongheng Zhang, Ziang Liu, Jiaxuan Zhu, Shuai Wang, Xiangqi Chen, Haojing Huang, Jiayi Kuang, Siyu Chen, Ao Shen, Hao Wu, Qiufeng Wang, Qian-Wen Zhang, Junnan Dong, Wenhao Jiang, Ying Shen, Hai-Tao Zheng, Yinghui Li, Di Yin, Xing Sun, Philip S. Yu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large Language Models (LLMs) are undergoing a fundamental transformation from conversational generators into integrated AI systems capable of reasoning, action, memory, and self-improvement. We conceptualize this transition as a shift from Chatbot to Digital Colleague: from conversational answers to persistent work. We organize this transition along two tightly coupled dimensions. First, at the cognitive core level, LLMs are advancing from Chatbot-era "fast thinking" systems driven by next-token prediction toward Thinking LLMs that leverage inference-time computation, Chain-of-Thought reasoning, reflection, process supervision, and reinforcement learning to support more deliberate and reliable cognition. Second, at the tool-augmented task execution level, LLMs are progressing from tool-calling Agents that invoke external resources in an ad hoc manner toward OpenClaw-style workstation systems (OpenClaw) equipped with persistent Workspaces, skills, verification loops, and governance. The "Workspace + Skill" paradigm makes episodic tool use colleague-like via state persistence, reusable procedures, task closure, and experience reuse. We examine data construction shifts from instruction-response pairs to State-Action-Observation trajectories and evaluation from static benchmarks to sandboxed, auditable, self-evolving AI ecosystems.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper conceptualizes the paradigm shift of LLMs from conversational chatbots to 'Digital Colleagues' by defining a transition toward persistent autonomous systems capable of reasoning, memory, and self-improvement.</p>
<p><strong>Core Idea:</strong> The transition is defined by two dimensions: moving from 'fast thinking' next-token prediction to deliberate inference-time reasoning, and from ad-hoc tool-calling to persistent 'Workspace + Skill' systems.</p>
<p><strong>Technique:</strong> The authors propose the 'Workspace + Skill' paradigm, which utilizes persistent workspaces, reusable procedures, verification loops, and state-action-observation trajectories to enable colleague-like behavior.</p>
<p><strong>Pipeline:</strong> User Task/Goal → Thinking LLM (Reasoning &amp; Reflection) → Workspace + Skill Execution (State Persistence &amp; Tool Use) → Verification Loop → Task Closure &amp; Experience Reuse</p>
<p><strong>Methodology:</strong> The research analyzes the shift in data construction from instruction-response pairs to state-action-observation trajectories and evaluates the transition from static benchmarks to sandboxed, auditable ecosystems.</p>
<p><strong>Results:</strong> The framework establishes a roadmap for moving beyond episodic tool use toward persistent work, emphasizing the importance of state persistence, process supervision, and self-evolving AI ecosystems.</p>
<p><strong>Limitations:</strong> The paper focuses on conceptualizing the paradigm shift and architectural requirements, leaving specific implementation details of the 'OpenClaw' workstation and long-term governance challenges as areas for further exploration.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14502" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14571">StreamMemBench: Streaming Evaluation of Agent Memory for Future-Oriented Assistance</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Guanming Liu, Yuqi Ren, Hansu Gu, Peng Zhang, Weihang Wang, Jiahao Liu, Ning Gu, Tun Lu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14571" target="_blank" rel="noopener noreferrer">2606.14571</a></p>
<p class="paper-detail"><strong>Authors:</strong> Guanming Liu, Yuqi Ren, Hansu Gu, Peng Zhang, Weihang Wang, Jiahao Liu, Ning Gu, Tun Lu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">A central role of personal-agent memory is to turn stored information and prior interactions into future-oriented assistance. In daily use, useful cues come from what the agent observes and how the user interacts with the agent, and the agent must carry them forward from the current request to similar future tasks. Existing memory benchmarks usually test dialogue recall or task improvement in isolation, leaving the trajectory from streaming observations to later assistance largely untested. We introduce StreamMemBench, a streaming benchmark that constructs a two-step task sequence around each evidence anchor from EgoLife egocentric streams. The initial task tests evidence use, while the follow-up task tests whether feedback and interaction experience are reused. Four metrics diagnose evidence recall, initial evidence use, feedback incorporation, and follow-up reuse. Experiments with eight memory systems across two backbones show that current systems often fail to use observed evidence or turn feedback into reliable follow-up behavior, even when evidence is stored or feedback is incorporated locally. StreamMemBench is publicly available at https://github.com/landian60/StreamMemBench.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces StreamMemBench, a new streaming benchmark designed to evaluate how personal agents carry forward observations and interaction feedback into future-oriented assistance.</p>
<p><strong>Core Idea:</strong> Existing benchmarks test memory in isolation, whereas real-world agents must bridge the gap between streaming observations and subsequent tasks by reusing evidence and feedback.</p>
<p><strong>Technique:</strong> The authors construct two-step task sequences around evidence anchors from EgoLife egocentric streams to test both immediate evidence use and long-term feedback incorporation.</p>
<p><strong>Pipeline:</strong> EgoLife egocentric streams → Evidence anchor identification → Two-step task sequence generation (initial task + follow-up task) → Multi-metric evaluation (recall, use, feedback, reuse)</p>
<p><strong>Methodology:</strong> The researchers evaluated eight memory systems across two backbones using four specific metrics to diagnose the trajectory from observation to future assistance.</p>
<p><strong>Results:</strong> Current systems frequently fail to utilize observed evidence or convert feedback into reliable follow-up behaviors, even when the information is successfully stored or incorporated locally.</p>
<p><strong>Limitations:</strong> The study focuses on specific egocentric streams and may not capture the full diversity of all possible real-world personal agent interaction scenarios.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14571" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    <a class="paper-action-btn gh-btn" href="https://github.com/landian60/StreamMemBench" target="_blank" rel="noopener noreferrer" title="View code on GitHub" aria-label="View code on GitHub"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg><span>Code</span></a>
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14672">Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Shikun Liu, Mufei Li, Dongqi Fu, Haoyu Wang, Yinglong Xia, Hong Li, Hong Yan, Pan Li
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14672" target="_blank" rel="noopener noreferrer">2606.14672</a></p>
<p class="paper-detail"><strong>Authors:</strong> Shikun Liu, Mufei Li, Dongqi Fu, Haoyu Wang, Yinglong Xia, Hong Li, Hong Yan, Pan Li</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Large language models increasingly serve as execution engines for agentic systems, yet they still consume context through a sequential text interface. This creates a mismatch with modern structured agent workflows, in which independent branches explore subtasks, retrieve evidence, or generate candidate solutions before a final synthesis step. Existing systems typically merge these branches by concatenating their textual outputs, which discards the parallel structure and incurs redundant prefill computation. In this work, we introduce Parallel-Synthesis, a plug-and-play framework that enables a synthesizer to directly consume the KV caches produced by parallel worker agents. Parallel-Synthesis combines a cache mapper that calibrates independently generated branch caches with a fine-tuned synthesizer adapter that enables generation from this non-sequential cache interface. We train Parallel-Synthesis using data that exposes the synthesizer to parallel cache contexts, teaches aggregation across cached branches, and distills reasoning behavior from standard text-concatenation-based synthesis. Across nine downstream datasets spanning math, science QA, code generation, GAIA, and multi-agent database diagnosis, Parallel-Synthesis matches or outperforms text-based synthesis on seven datasets and remains close on the other two. It also reduces time-to-first-token by 2.5x-11x, suggesting that direct cache-based synthesis is a promising interface for more native and efficient synthesis over parallel agent branches.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces Parallel-Synthesis, a plug-and-play framework that allows LLMs to synthesize information directly from the KV caches of parallel agent branches rather than through sequential text concatenation.</p>
<p><strong>Core Idea:</strong> By bypassing the need to re-process textual outputs from parallel workers, the system preserves the parallel structure of agent workflows and eliminates redundant prefill computations.</p>
<p><strong>Technique:</strong> The framework utilizes a cache mapper to calibrate independent branch caches and a fine-tuned synthesizer adapter to generate outputs from this non-sequential cache interface.</p>
<p><strong>Pipeline:</strong> Parallel worker KV caches → Cache Mapper (calibration) → Synthesizer Adapter → Final synthesized output</p>
<p><strong>Methodology:</strong> The authors trained the system using data that exposes the synthesizer to parallel cache contexts, teaching it to aggregate across branches while distilling reasoning behavior from standard text-concatenation methods.</p>
<p><strong>Results:</strong> Parallel-Synthesis matched or outperformed text-based synthesis on 7 out of 9 datasets and achieved a 2.5x-11x reduction in time-to-first-token.</p>
<p><strong>Limitations:</strong> The study does not fully explore the scalability of the cache mapper across extremely large numbers of parallel branches or the impact of varying cache lengths on synthesis quality.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14672" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="llm">LLM</h4>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13862">SuperThoughts: Reasoning Tokens in Superposition</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Zheyang Xiong, Shivam Garg, Max Yu, Vaishnavi Shrivastava, Haoyu Zhao, Anastasios Kyrillidis, Dimitris Papailiopoulos
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13862" target="_blank" rel="noopener noreferrer">2606.13862</a></p>
<p class="paper-detail"><strong>Authors:</strong> Zheyang Xiong, Shivam Garg, Max Yu, Vaishnavi Shrivastava, Haoyu Zhao, Anastasios Kyrillidis, Dimitris Papailiopoulos</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">Long Chain-of-Thought (CoT) reasoning improves LLM problem-solving but is computationally expensive due to sequential token generation. While recent works explore reasoning in continuous latent spaces to bypass discrete token generation, they often struggle with training stability and fail to scale to complex, long-horizon tasks due to lack of supervision signal. We propose SuperThoughts, which compresses pairs of consecutive CoT tokens into single latent representations and decodes two tokens per step via a lightweight Multi-Token Prediction (MTP) module. This preserves discrete token supervision at training time while doubling throughput at inference time. We finetune Qwen2.5-Math-1.5B-Instruct, Qwen2.5-Math-7B-Instruct, Qwen2.5-Math-14B-Instruct, and evaluate on MATH500, AMC, OlympiadBench, and GPQA-Diamond. With a confidence-based adaptive mechanism that falls back to standard decoding when uncertain, SuperThoughts achieves $\sim$20--30\% CoT length reduction while maintaining accuracy with minimal degradation (1-2 points accuracy drop on most tasks).</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces SuperThoughts, a method to accelerate Long Chain-of-Thought (CoT) reasoning by compressing consecutive tokens into latent representations to improve inference throughput.</p>
<p><strong>Core Idea:</strong> The core idea is to bypass the sequential bottleneck of discrete token generation by representing pairs of CoT tokens in a continuous latent space while maintaining discrete supervision during training.</p>
<p><strong>Technique:</strong> The authors use a lightweight Multi-Token Prediction (MTP) module to decode two tokens per step and a confidence-based adaptive mechanism to fall back to standard decoding when uncertainty is high.</p>
<p><strong>Pipeline:</strong> Input prompt → Latent representation of consecutive CoT tokens → Multi-Token Prediction (MTP) decoding → Confidence-based adaptive selection → Final reasoning output</p>
<p><strong>Methodology:</strong> The researchers fine-tuned various Qwen2.5-Math models by training the model to predict pairs of tokens simultaneously, using a confidence threshold to decide between compressed and standard decoding.</p>
<p><strong>Results:</strong> SuperThoughts achieves a 20-30% reduction in CoT length while maintaining high accuracy, with only a minimal 1-2 point accuracy drop across MATH500, AMC, OlympiadBench, and GPQA-Diamond.</p>
<p><strong>Limitations:</strong> The method may face challenges in extremely complex, long-horizon tasks where the confidence-based fallback might trigger frequently, potentially negating throughput gains.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13862" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-nlp" title="Computation and Language (cs.CL)">Computation and Language (cs.CL)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13683">UP-NRPA: User Portrait based Nested Rollout Policy Adaptation for Planning with Large Language Models in Goal-oriented Dialogue Systems</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Hui Wang, Fafa Zhang, Meng Liu, Xiangyu Chen, Chaoxu Mu
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13683" target="_blank" rel="noopener noreferrer">2606.13683</a></p>
<p class="paper-detail"><strong>Authors:</strong> Hui Wang, Fafa Zhang, Meng Liu, Xiangyu Chen, Chaoxu Mu</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">To address the challenge that current dialogue policy planning methods struggle to dynamically adapt to diverse user characteristics, this paper proposes a User Portrait based Nested Rollout Policy Adaptation (UP-NRPA) online framework with Large Language Models. In contrast to conventional approaches dependent on model training and require offline reinforcement learning policy models for user groups, UP-NRPA enables dynamic customization of dialogue strategies through an adaptive mechanism. This is achieved by leveraging real-time user feedback alongside personality, preferences, and objectives mapped from the current user portrait, thereby adapting to user characteristics without offline reinforcement learning. In collaborative and non-collaborative dialogue benchmarks, UP-NRPA demonstrated considerable benefits, achieving an impressive 100% success rate in multiple dialogue tasks. Particularly in negotiation tasks, the sale-to-list ratio (SL) increased by 56.41%. This demonstrates that UP-NRPA can adapt to diverse user needs without requiring a training mechanism, enabling the dialogue system to adapt to user characteristics.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces UP-NRPA, an online framework that enables dialogue systems to dynamically adapt to diverse user characteristics without requiring offline reinforcement learning or pre-trained group-specific models.</p>
<p><strong>Core Idea:</strong> The core idea is to achieve real-time policy adaptation by mapping user portraits (personality, preferences, and objectives) and live feedback into a nested rollout mechanism for LLM-based planning.</p>
<p><strong>Technique:</strong> The technique utilizes a User Portrait-based Nested Rollout Policy Adaptation (UP-NRPA) that leverages LLMs to customize dialogue strategies on-the-fly based on dynamic user profiles.</p>
<p><strong>Pipeline:</strong> User characteristics and real-time feedback → User Portrait mapping and Nested Rollout Policy Adaptation → Customized dialogue strategy and planning</p>
<p><strong>Methodology:</strong> The methodology involves extracting user traits to form a portrait, then using a nested rollout process to adapt the LLM's planning policy based on that portrait and ongoing interaction feedback.</p>
<p><strong>Results:</strong> Achieved a 100% success rate in multiple dialogue tasks and a 56.41% increase in the sale-to-list (SL) ratio in negotiation tasks.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the computational overhead of real-time nested rollouts or the scalability of the portrait mapping across extremely complex, multi-turn long-term goals.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13683" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="rl">RL</h4>

<div class="paper-item" data-date="2026-06-15" data-relevance="4">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 4 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot"></span></span><span class="rel-score">4/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13682">A Deep Reinforcement Learning (DRL)-Based Transformer Method for Solving the Open Shop Scheduling Problem</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Faezeh Ardali, Mwembezi A. Nyelele, Gerald M. Knapp
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13682" target="_blank" rel="noopener noreferrer">2606.13682</a></p>
<p class="paper-detail"><strong>Authors:</strong> Faezeh Ardali, Mwembezi A. Nyelele, Gerald M. Knapp</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">The open shop scheduling problem (OSSP) arises in many industrial and service settings but remains computationally challenging as the number of jobs and machines increases. While exact methods quickly become intractable, classical dispatching rules and metaheuristics may require substantial tuning to maintain solution quality at large scales. This study develops a Transformer-based scheduling policy for OSSP using an encoder-decoder architecture with multi-head attention. The model is trained on Taillard benchmark instances (4x4, 5x5, 7x7, and 10x10) using only the processing-time matrix as input and produces feasible schedules with makespans typically within 15-30% of best-known values. To evaluate scalability, the trained policy is applied without retraining to randomly generated instances from 40x40 to 100x100 and compared against classical dispatching heuristics, including SPT, LPT, MWKR, and EST. Across these large instances, the Transformer achieved average gaps of 12.89-15.12% relative to a standard lower bound. Compared with EST, the Transformer remained competitive, typically within a modest margin, while substantially outperforming SPT and LPT. These results indicate that a Transformer policy trained on small OSSP instances can generalize to substantially larger problems and provide a feature-light, learning-based alternative to classical dispatching rules.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces a Transformer-based Deep Reinforcement Learning policy for the Open Shop Scheduling Problem (OSSP) that generalizes from small-scale benchmarks to large-scale industrial instances.</p>
<p><strong>Core Idea:</strong> A Transformer architecture can learn to produce high-quality, feasible schedules by capturing complex dependencies in processing-time matrices without requiring manual heuristic tuning.</p>
<p><strong>Technique:</strong> The study utilizes an encoder-decoder Transformer architecture with multi-head attention trained via Deep Reinforcement Learning.</p>
<p><strong>Pipeline:</strong> Processing-time matrix → Transformer encoder-decoder architecture → Feasible schedules</p>
<p><strong>Methodology:</strong> The model was trained on Taillard benchmark instances (4x4 to 10x10) and evaluated on randomly generated large-scale instances (40x40 to 100x100) against classical dispatching rules.</p>
<p><strong>Results:</strong> The Transformer achieved average gaps of 12.89-15.12% relative to a standard lower bound on large instances, substantially outperforming SPT and LPT heuristics.</p>
<p><strong>Limitations:</strong> The model produces schedules within 15-30% of best-known values on small instances, suggesting a potential gap in reaching optimal solutions compared to exact methods.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13682" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h4 id="robotics">Robotics</h4>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ai" title="Artificial Intelligence (cs.AI)">Artificial Intelligence (cs.AI)</span><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span><span class="cat-tag cat-ro" title="Robotics (cs.RO)">Robotics (cs.RO)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.14418">Causal Object-Centric Models for Planning with Monte Carlo Tree Search</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Rodion Vakhitov, Leonid Ugadiarov, Alexey Skrynnik, Aleksandr Panov
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.14418" target="_blank" rel="noopener noreferrer">2606.14418</a></p>
<p class="paper-detail"><strong>Authors:</strong> Rodion Vakhitov, Leonid Ugadiarov, Alexey Skrynnik, Aleksandr Panov</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">We introduce COMET (Causal Object-centric Model for Efficient Tree search), a model-based reinforcement learning algorithm that performs Monte Carlo Tree Search in a slot-structured latent space. COMET pairs a frozen unsupervised object-centric encoder with a transformer-based world model, in which actions are bound to objects through a novel action-slot fusion mechanism that is used in slot transition prediction. Policy and value heads use object-causal attention, modulating token interactions by learned per-slot relevance scores so that decision-making concentrates on task-relevant entities. COMET adds an explicit object-level inductive bias to MuZero-style latent planning. Across eight visually and dynamically diverse tasks from the Object-Centric Visual RL benchmark, ManiSkill, Robosuite, and VizDoom, COMET achieves a higher mean normalized score during the early stages of training compared to object-centric and monolithic baselines.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces COMET, a model-based reinforcement learning algorithm that integrates object-level inductive biases into MuZero-style latent planning. It achieves superior early-stage training performance by performing Monte Carlo Tree Search in a slot-structured latent space.</p>
<p><strong>Core Idea:</strong> The core idea is to move from monolithic latent representations to object-centric ones where actions are explicitly bound to specific objects and attention is modulated by per-slot relevance.</p>
<p><strong>Technique:</strong> The technique employs a frozen unsupervised object-centric encoder paired with a transformer-based world model featuring an action-slot fusion mechanism and object-causal attention.</p>
<p><strong>Pipeline:</strong> Visual input → Unsupervised object-centric encoder → Slot-structured latent space → Transformer-based world model with action-slot fusion → Monte Carlo Tree Search with object-causal attention → Policy and value heads</p>
<p><strong>Methodology:</strong> The authors evaluate COMET across eight diverse tasks in the Object-Centric Visual RL benchmark, ManiSkill, Robosuite, and VizDoom, comparing it against monolithic and standard object-centric baselines.</p>
<p><strong>Results:</strong> COMET achieves a higher mean normalized score during the early stages of training compared to both object-centric and monolithic baselines across multiple benchmarks.</p>
<p><strong>Limitations:</strong> The paper does not explicitly detail the scalability of the action-slot fusion mechanism to environments with a very high number of dynamic objects or the computational overhead of the per-slot relevance scores.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.14418" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<div class="paper-item" data-date="2026-06-15" data-relevance="5">
  <div class="paper-body">
    <div class="paper-meta">
      <span class="relevance-pill"><span class="rel-dots" aria-label="Relevance: 5 out of 5"><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span><span class="rel-dot filled"></span></span><span class="rel-score">5/5</span></span>
      <span class="cat-tags"><span class="cat-tag cat-ml" title="Machine Learning (cs.LG)">Machine Learning (cs.LG)</span></span>
      <span class="paper-date">15 Jun 2026</span>
    </div>
    <a class="paper-title" href="https://arxiv.org/abs/2606.13795">Diffusion Policy Optimization without Drifting Apart</a>
    <p class="paper-authors">
      <svg class="author-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="12" height="12"><path d="M10.561 8.073a6.005 6.005 0 0 1 3.432 5.142.75.75 0 1 1-1.498.07 4.5 4.5 0 0 0-2.97-3.93l-.04-.012a.75.75 0 0 1 .076-1.27zm-4.5-.31a4.5 4.5 0 0 0-2.97 3.93.75.75 0 0 1-1.498-.07A6.005 6.005 0 0 1 5.025 6.44l-.004.001a.75.75 0 0 1 .04 1.322zM8 7a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm0 1.5a3.5 3.5 0 1 1 0-7 3.5 3.5 0 0 1 0 7z" /></svg>
      Haozhe Jiang, Haiwen Feng, Pieter Abbeel, Jiantao Jiao, Angjoo Kanazawa, Nika Haghtalab
    </p>
<details class="abstract">
<summary>Abstract</summary>
<p class="paper-detail"><strong>ArXiv ID:</strong> <a href="https://arxiv.org/abs/2606.13795" target="_blank" rel="noopener noreferrer">2606.13795</a></p>
<p class="paper-detail"><strong>Authors:</strong> Haozhe Jiang, Haiwen Feng, Pieter Abbeel, Jiantao Jiao, Angjoo Kanazawa, Nika Haghtalab</p>
<p class="paper-detail abstract-body"><strong>Abstract:</strong></p>
<p class="abstract-text">RL post-training has become increasingly pivotal for improving diffusion policies, but existing diffusion policy-gradient methods are often unstable and cannot achieve reliable policy improvement. We identify the cause as the double-drift phenomenon: optimizing a variational surrogate can let the ELBO separate from the true log-likelihood, which then makes the resulting proxy policy gradient misaligned with the true policy gradient of expected return. We propose \textbf{DiPOD}, a diffusion policy optimization framework that maintains tight-bound behavior throughout training by interleaving self-distillation with policy-improving gradient updates. This leads to a simple and practical algorithm: augmenting each diffusion policy-gradient update with an on-policy ELBO regularizer. Across diffusion language model post-training and continuous-control diffusion policies, DiPOD substantially stabilizes training and reaches higher rewards than previous methods.</p>
</details>
<details class="insights">
<summary>Insights</summary>
<p><strong>Contribution:</strong> The paper introduces DiPOD, a framework that stabilizes diffusion policy optimization by addressing the 'double-drift' phenomenon where surrogate optimization causes the proxy policy gradient to misalign with the true policy gradient.</p>
<p><strong>Core Idea:</strong> The authors propose interleaving self-distillation with policy-improving gradient updates to maintain a tight bound between the variational surrogate and the true log-likelihood.</p>
<p><strong>Technique:</strong> The main technique is the inclusion of an on-policy ELBO regularizer to augment each diffusion policy-gradient update, preventing the policy from drifting away from the data distribution.</p>
<p><strong>Pipeline:</strong> Diffusion Policy → Policy-Gradient Update + On-policy ELBO Regularization → Stabilized Policy Improvement</p>
<p><strong>Methodology:</strong> The methodology involves identifying the mathematical cause of instability in diffusion RL and implementing a dual-objective update that balances reward maximization with distribution preservation.</p>
<p><strong>Results:</strong> DiPOD substantially stabilizes training and achieves higher rewards than previous methods in both diffusion language model post-training and continuous-control diffusion policies.</p>
<p><strong>Limitations:</strong> The paper focuses on the double-drift phenomenon; further exploration into the scalability of the ELBO regularizer in extremely high-dimensional action spaces remains an open area.</p>
</details>
  </div>
  <div class="paper-actions">
    <a class="paper-action-btn pdf-btn" href="https://arxiv.org/pdf/2606.13795" target="_blank" rel="noopener noreferrer" title="Download PDF" aria-label="Download PDF"><svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M3.75 1.5a.25.25 0 0 0-.25.25v11.5c0 .138.112.25.25.25h8.5a.25.25 0 0 0 .25-.25V6H9.75A1.75 1.75 0 0 1 8 4.25V1.5H3.75zm5.75.56v2.19c0 .138.112.25.25.25h2.19L9.5 2.06zM2 1.75C2 .784 2.784 0 3.75 0h5.086c.464 0 .909.184 1.237.513l3.414 3.414c.329.328.513.773.513 1.237V13.25A1.75 1.75 0 0 1 12.25 15h-8.5A1.75 1.75 0 0 1 2 13.25V1.75z" /><path d="M4.5 8.75a.75.75 0 0 1 .75-.75h1a2.25 2.25 0 0 1 0 4.5h-.25V13.5a.75.75 0 0 1-1.5 0v-4.75zm1.5.75v1.5h.25a.75.75 0 0 0 0-1.5H6z" /><path d="M7.5 8.75a.75.75 0 0 1 .75-.75h1.25a2.25 2.25 0 0 1 0 4.5H8.25a.75.75 0 0 1-.75-.75v-3zm1.5.75v2h.5a.75.75 0 0 0 0-1.5H9V9.5z" /><path d="M11.25 8a.75.75 0 0 1 .75.75v.75h.5a.75.75 0 0 1 0 1.5H12v1.5a.75.75 0 0 1-1.5 0v-3.75A.75.75 0 0 1 11.25 8z" /></svg><span>PDF</span></a>
    
  </div>
</div>

<h2 id="github-trending">
  <svg viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="20" height="20" style="vertical-align:middle;margin-right:6px"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z" /></svg>
  GitHub Trending
</h2>

<p class="section-desc">Trending repositories on GitHub filtered and scored for relevance to your interests.</p>

<h3 id="agentic-ai-1">Agentic AI</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/OpenHands/OpenHands" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">OpenHands</span><span class="gh-sep">/</span><strong class="gh-repo">OpenHands</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">OpenHands is an open-source platform for AI-driven software engineering that enables agents to interact with development environments. It is highly relevant as it implements complex multi-agent workflows and autonomous task execution using large language models.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">Agentic AI</span><span class="gh-tag">LLM</span><span class="gh-tag">Multi-Agent Systems</span><span class="gh-tag">Software Engineering</span><span class="gh-tag">Autonomous Agents</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/OpenHands/OpenHands" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/andrewyng/aisuite" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">andrewyng</span><span class="gh-sep">/</span><strong class="gh-repo">aisuite</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a unified interface to interact with multiple Generative AI providers, simplifying the integration of various LLMs into applications. It is highly relevant for building Agentic AI systems and multi-agent workflows by abstracting the complexity of different model APIs.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">LLM</span><span class="gh-tag">Agentic AI</span><span class="gh-tag">Generative AI</span><span class="gh-tag">Python</span><span class="gh-tag">Multi-model</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/andrewyng/aisuite" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/Ar9av/obsidian-wiki" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">Ar9av</span><span class="gh-sep">/</span><strong class="gh-repo">obsidian-wiki</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Agentic AI</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This framework enables AI agents to construct and manage a structured &#x27;digital brain&#x27; using an Obsidian wiki based on Karpathy&#x27;s LLM Wiki pattern. It is highly relevant for research into long-term memory, autonomous knowledge management, and agentic workflows.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">AI agents</span><span class="gh-tag">LLM</span><span class="gh-tag">RAG</span><span class="gh-tag">knowledge management</span><span class="gh-tag">autonomous agents</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/Ar9av/obsidian-wiki" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="computer-vision">Computer Vision</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/AUTOMATIC1111/stable-diffusion-webui" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">AUTOMATIC1111</span><span class="gh-sep">/</span><strong class="gh-repo">stable-diffusion-webui</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Computer Vision</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This is the most popular web interface for Stable Diffusion, a leading latent diffusion model for image generation. It is highly relevant for research into generative models, multimodal learning, and the practical application of diffusion techniques in computer vision.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">diffusion</span><span class="gh-tag">generative models</span><span class="gh-tag">computer vision</span><span class="gh-tag">image generation</span><span class="gh-tag">stable diffusion</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/AUTOMATIC1111/stable-diffusion-webui" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="computing-systems">Computing Systems</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/LMCache/LMCache" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">LMCache</span><span class="gh-sep">/</span><strong class="gh-repo">LMCache</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Computing Systems</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">LMCache is a high-performance KV cache management layer designed to optimize Large Language Model inference speeds. It is highly relevant for users interested in LLM infrastructure, efficient computing systems, and scaling agentic AI applications.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">LLM</span><span class="gh-tag">KV Cache</span><span class="gh-tag">Inference Optimization</span><span class="gh-tag">Computing Systems</span><span class="gh-tag">Transformers</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/LMCache/LMCache" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="general">General</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/microsoft/AI-For-Beginners" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">microsoft</span><span class="gh-sep">/</span><strong class="gh-repo">AI-For-Beginners</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">General</span>
      <span class="gh-relevance" title="Relevance 3/5">★★★<span class="gh-relevance-empty">★★</span> <span class="gh-rel-num">3/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a comprehensive 12-week curriculum covering the fundamentals of AI and machine learning. While it is a foundational educational resource rather than a specialized research project, it covers the core concepts necessary to understand the user&#x27;s broader interests in LLMs and Agentic AI.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">education</span><span class="gh-tag">machine learning</span><span class="gh-tag">fundamentals</span><span class="gh-tag">deep learning</span><span class="gh-tag">jupyter notebooks</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/microsoft/AI-For-Beginners" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="llm-1">LLM</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/lyogavin/airllm" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">lyogavin</span><span class="gh-sep">/</span><strong class="gh-repo">airllm</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">LLM</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">This repository provides a method for running a 70B parameter Large Language Model on a single 4GB GPU. It is highly relevant for users interested in efficient inference, model optimization, and making large-scale foundation models accessible on consumer hardware.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">LLM</span><span class="gh-tag">inference</span><span class="gh-tag">model optimization</span><span class="gh-tag">quantization</span><span class="gh-tag">large language models</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/lyogavin/airllm" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="mlops">MLOps</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/skypilot-org/skypilot" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">skypilot-org</span><span class="gh-sep">/</span><strong class="gh-repo">skypilot</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">MLOps</span>
      <span class="gh-relevance" title="Relevance 4/5">★★★★<span class="gh-relevance-empty">★</span> <span class="gh-rel-num">4/5</span></span>
    </div>
  </div>
  <p class="gh-summary">Skypilot provides a unified interface to run and scale AI workloads across diverse infrastructures including Kubernetes, Slurm, and multiple cloud providers. It is highly relevant for managing the large-scale compute required for training foundation models and deploying agentic AI systems.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">MLOps</span><span class="gh-tag">Infrastructure</span><span class="gh-tag">Distributed Training</span><span class="gh-tag">Cloud Computing</span><span class="gh-tag">Scalability</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/skypilot-org/skypilot" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>

<h3 id="robotics-1">Robotics</h3>

<div class="gh-trending-item">
  <div class="gh-trending-header">
    <a class="gh-repo-link" href="https://github.com/NVIDIA/cosmos" target="_blank" rel="noopener noreferrer">
      <svg class="gh-repo-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" width="16" height="16"><path d="M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8V1.5Z" /></svg>
      <span class="gh-owner">NVIDIA</span><span class="gh-sep">/</span><strong class="gh-repo">cosmos</strong>
    </a>
    <div class="gh-trending-badges">
      <span class="gh-topic-pill">Robotics</span>
      <span class="gh-relevance" title="Relevance 5/5">★★★★★<span class="gh-relevance-empty"></span> <span class="gh-rel-num">5/5</span></span>
    </div>
  </div>
  <p class="gh-summary">NVIDIA Cosmos provides a comprehensive platform for world models and datasets specifically designed for Physical AI. It is highly relevant as it addresses core interests in Embodied AI, robotics, and foundation models for physical environments.</p>
  <div class="gh-trending-footer">
    <div class="gh-tags"><span class="gh-tag">World Models</span><span class="gh-tag">Embodied AI</span><span class="gh-tag">Robotics</span><span class="gh-tag">Foundation Models</span><span class="gh-tag">Physical AI</span></div>
    <div class="gh-trending-meta">
      <span class="gh-pushed">Updated: 2026-06-15</span>
      <a class="gh-visit-btn" href="https://github.com/NVIDIA/cosmos" target="_blank" rel="noopener noreferrer">
        View on GitHub&nbsp;&#8594;
      </a>
    </div>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's research focuses on the transition from isolated chatbots to persistent, autonomous agents, with a heavy emphasis on robust orchestration, safety protocols, and verifiable memory systems.]]></summary></entry><entry><title type="html">Daily Digest 2026-06-14</title><link href="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-14/" rel="alternate" type="text/html" title="Daily Digest 2026-06-14" /><published>2026-06-14T00:00:00+07:00</published><updated>2026-06-14T00:00:00+07:00</updated><id>https://hiimmuc.github.io/Personal-AI-Digest/digest/daily</id><content type="html" xml:base="https://hiimmuc.github.io/Personal-AI-Digest/digest/2026-06-14/"><![CDATA[<div class="digest-theme">
  <svg class="digest-theme-icon" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true"><path d="M8 1.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13zM0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8z" /><path d="M6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75zM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2z" /></svg>
  <span>Today's digest highlights a shift toward the systemic implications of AI, focusing on the economic, regulatory, and structural frameworks required to manage widespread deployment. The discourse is moving beyond model capabilities toward the long-term consequences of automation on labor, governance, and infrastructure.</span>
</div>

<h2 id="tech-news">Tech News</h2>

<h3 id="ai-safety">AI Safety</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-14</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u5g1hz/anthropic_ceo_floats_tax_on_ai_firms_to_fund/" target="_blank" rel="noopener noreferrer">Anthropic CEO Floats Tax on AI Firms to Fund Universal Income</a>
  <p class="news-summary">Anthropic CEO Dario Amodei has proposed that governments implement taxes on AI companies to fund a universal basic income (UBI). This proposal aims to mitigate the economic disruption and job displacement caused by the &#x27;AI exponential&#x27; and the resulting reduction in labor demand.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Policy</span><span class="news-tag">Universal Basic Income</span><span class="news-tag">Economic Impact</span><span class="news-tag">Anthropic</span><span class="news-tag">AI Regulation</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u5g1hz/anthropic_ceo_floats_tax_on_ai_firms_to_fund/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-15</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u64nok/the_us_just_made_frontier_ai_a_controlled_export/" target="_blank" rel="noopener noreferrer">the US just made frontier ai a controlled export, like nvidia chips</a>
  <p class="news-summary">The US government has placed Anthropic&#x27;s most powerful models, Fable 5 and Mythos 5, under export controls similar to high-end Nvidia chips. This move follows a reported jailbreak of Mythos 5&#x27;s cybersecurity capabilities, leading to a policy where frontier AI is treated as a controlled commodity. The decision establishes a precedent for a two-tier AI world where non-US nationals may be restricted from accessing top-tier frontier models.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Governance</span><span class="news-tag">Export Controls</span><span class="news-tag">Anthropic</span><span class="news-tag">Frontier Models</span><span class="news-tag">Cybersecurity</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u64nok/the_us_just_made_frontier_ai_a_controlled_export/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-14</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u5rkfu/would_super_intelligent_ai_that_can_access_the/" target="_blank" rel="noopener noreferrer">Would super intelligent AI that can access the Internet be able to overcome any biases it’s creator put into it?</a>
  <p class="news-summary">A community discussion explores whether a super-intelligent AI with internet access could transcend the inherent biases instilled by its human creators. The post questions if such an entity would remain a product of its training data or develop independent reasoning capabilities. It highlights concerns regarding the potential for AI to manipulate human behavior as it evolves.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Safety</span><span class="news-tag">Superintelligence</span><span class="news-tag">Bias Mitigation</span><span class="news-tag">Alignment</span><span class="news-tag">AGI</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u5rkfu/would_super_intelligent_ai_that_can_access_the/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-14</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u5w45f/how_should_people_share_agentsecurity_tests/" target="_blank" rel="noopener noreferrer">How should people share agent-security tests without making it vendor spam?</a>
  <p class="news-summary">A community discussion on Reddit addresses the challenge of sharing agent-security research without it being overshadowed by marketing spam or sensationalism. The author proposes a standardized format for sharing prompt injection tests, emphasizing reproducible examples, clear limitations, and technical depth over &#x27;solved&#x27; claims.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Agentic AI</span><span class="news-tag">AI Safety</span><span class="news-tag">Prompt Injection</span><span class="news-tag">Security Research</span><span class="news-tag">LLM</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u5w45f/how_should_people_share_agentsecurity_tests/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-14</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u5cxn6/what_would_actually_make_you_trust_an_ai_not_it/" target="_blank" rel="noopener noreferrer">What would actually make you trust an AI? Not &quot;it sounds right,&quot; but trust it the way you trust a person or an institution?</a>
  <p class="news-summary">A community discussion explores the fundamental requirements for establishing human-level trust in AI systems, moving beyond mere accuracy. The conversation highlights key hurdles such as the lack of persistent identity, accountability for past actions, and the &#x27;hallucination&#x27; problem where models provide confident but incorrect information.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Ethics</span><span class="news-tag">Trustworthiness</span><span class="news-tag">LLM</span><span class="news-tag">AI Safety</span><span class="news-tag">Human-AI Interaction</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u5cxn6/what_would_actually_make_you_trust_an_ai_not_it/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-14</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u58qwi/can_an_ai_agent_complete_a_task_and_still_fail/" target="_blank" rel="noopener noreferrer">Can an AI agent complete a task and still fail?</a>
  <p class="news-summary">Researchers are introducing the concept of the &#x27;Verifier Tax&#x27; to distinguish between &#x27;safe success&#x27; and &#x27;unsafe success&#x27; in AI agents. The study proposes a two-tier verification architecture—combining deterministic checks with LLM-based verifiers—to ensure agents complete tasks without violating safety policies or skipping critical steps. The findings suggest that while verification improves safety, it can also decrease overall task completion rates as complexity increases.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Safety</span><span class="news-tag">Agentic AI</span><span class="news-tag">LLM</span><span class="news-tag">Verification Architecture</span><span class="news-tag">Tool-use</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u58qwi/can_an_ai_agent_complete_a_task_and_still_fail/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="agentic-ai">Agentic AI</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-14</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u5qjy7/am_i_going_to_spend_the_rest_of_my_career/" target="_blank" rel="noopener noreferrer">Am I going to spend the rest of my career reviewing AI generated code?</a>
  <p class="news-summary">A software engineer expresses concern over the rapid shift toward AI-generated code and the potential erosion of the &#x27;problem-solving&#x27; aspect of engineering. The post highlights a growing trend where developers rely on LLMs for planning and execution, leading to a future where human roles may be reduced primarily to reviewing AI-generated pull requests.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Software Engineering</span><span class="news-tag">Generative AI</span><span class="news-tag">Developer Experience</span><span class="news-tag">AI Agents</span><span class="news-tag">Workforce Impact</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u5qjy7/am_i_going_to_spend_the_rest_of_my_career/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-15</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u66kb0/my_ai_tools_kept_forgetting_everything_so_i_gave/" target="_blank" rel="noopener noreferrer">My AI tools kept forgetting everything, so I gave them a shared brain (local + open source)</a>
  <p class="news-summary">A developer released &#x27;Centralaizer,&#x27; an open-source local memory hub designed to provide a shared context across different AI tools like Claude Desktop, Cursor, and VS Code Copilot. The tool uses a combination of vector search, full-text search, and a knowledge graph to allow agents to share facts and decisions while maintaining privacy through PII scrubbing and local hosting.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">Open Source</span><span class="news-tag">Context Management</span><span class="news-tag">Agentic AI</span><span class="news-tag">Local LLM</span><span class="news-tag">MCP</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u66kb0/my_ai_tools_kept_forgetting_everything_so_i_gave/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-14</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u5idv8/my_client_didnt_want_to_add_faqs_manually_so_i/" target="_blank" rel="noopener noreferrer">My client didn&#x27;t want to add FAQs manually, so I built a system that crawls their website and generates the knowledge base automatically</a>
  <p class="news-summary">A developer built an automated pipeline that transforms hotel websites and PDFs into structured FAQ knowledge bases. The system crawls sitemaps, filters out noise like &#x27;careers&#x27; or &#x27;login&#x27; pages, and uses an AI agent to generate structured Q&amp;A pairs from the cleaned text before embedding them into a vector database.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">RAG</span><span class="news-tag">Web Scraping</span><span class="news-tag">Agentic AI</span><span class="news-tag">Knowledge Management</span><span class="news-tag">LLM Pipelines</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u5idv8/my_client_didnt_want_to_add_faqs_manually_so_i/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="computing-systems">Computing Systems</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-14</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u5edg8/our_ai_bills_are_subsidised_and_i_dont_think_many/" target="_blank" rel="noopener noreferrer">Our AI bills are subsidised, and I don&#x27;t think many people have priced in what happens next</a>
  <p class="news-summary">The post highlights the unsustainable nature of current AI pricing, noting that major providers like OpenAI and Anthropic are reportedly selling compute at a loss. It warns that businesses relying on these subsidized rates may face significant financial shocks when investors demand a return on investment. The author urges developers to consider cost-modeling for price hikes and to explore fallbacks like local models.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Economics</span><span class="news-tag">Compute Costs</span><span class="news-tag">LLM Pricing</span><span class="news-tag">Business Strategy</span><span class="news-tag">Infrastructure</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u5edg8/our_ai_bills_are_subsidised_and_i_dont_think_many/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="llm">LLM</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-15</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u64099/most_of_this_ai_marketing_drama_is_just_prompting/" target="_blank" rel="noopener noreferrer">Most of this &quot;AI marketing&quot; drama is just prompting with better packaging. And it&#x27;s a shame.</a>
  <p class="news-summary">The post critiques the &#x27;AI marketing&#x27; boom, arguing that many paid tools are merely wrappers for basic prompting techniques. The author suggests that users can achieve similar results for free by creating their own Product Requirements Documents (PRDs) and governance files to provide context to standard LLMs.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">LLM</span><span class="news-tag">Prompt Engineering</span><span class="news-tag">AI Marketing</span><span class="news-tag">Productivity</span><span class="news-tag">Open Source</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u64099/most_of_this_ai_marketing_drama_is_just_prompting/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>

<h3 id="mlops">MLOps</h3>

<div class="news-item">
  <div class="news-meta">
    <span class="news-source news-source--reddit">Reddit r/ArtificialIntelligence</span>
    <span class="news-date">2026-06-14</span>
  </div>
  <a class="news-title" href="https://www.reddit.com/r/artificial/comments/1u5oi77/were_building_an_ai_factory/" target="_blank" rel="noopener noreferrer">We’re building an AI factory</a>
  <p class="news-summary">A new initiative called &#x27;Since AI&#x27; is launching an &#x27;AI factory&#x27; designed to move beyond traditional hackathon networking. The platform aims to provide serious builders with real-world industry problems, dedicated compute, and a 72-hour sprint to develop functional software. The goal is to identify and support the strongest projects for long-term continuation.</p>
  <div class="news-footer">
    <div class="news-tags"><span class="news-tag">AI Development</span><span class="news-tag">Hackathons</span><span class="news-tag">Compute Resources</span><span class="news-tag">Software Engineering</span><span class="news-tag">Industry Solutions</span></div>
    <a class="news-read-btn" href="https://www.reddit.com/r/artificial/comments/1u5oi77/were_building_an_ai_factory/" target="_blank" rel="noopener noreferrer">Read&nbsp;more&nbsp;&#8594;</a>
  </div>
</div>]]></content><author><name>hiimmuc</name></author><summary type="html"><![CDATA[Today's digest highlights a shift toward the systemic implications of AI, focusing on the economic, regulatory, and structural frameworks required to manage widespread deployment. The discourse is moving beyond model capabilities toward the long-term consequences of automation on labor, governance, and infrastructure.]]></summary></entry></feed>