Tuesday, June 23, 2026

Week 24 — 2026-06-08 – 2026-06-14

Sort:
133 papers · 70 news · 28 repos · Week 24

This week’s developments highlight a sophisticated shift toward optimizing the operational efficiency and cognitive architecture of autonomous agents, moving beyond simple prompting toward "context engineering" and tree-search cognition layers. Research is increasingly focused on the mechanics of reliability, specifically addressing deployment-time memorization, knowledge conflicts, and the auditing of parametric tool knowledge to ensure more predictable agent behavior. In the broader tech landscape, the competitive tension between open-source and proprietary models intensified as DeepSeek V4 Pro challenged industry leaders, while community discourse reflected a growing anxiety over the erosion of traditional software engineering roles. Simultaneously, the high engagement with repositories like llama.cpp and various agent-skill frameworks underscores a massive grassroots effort to democratize and refine these high-level capabilities for practical, local deployment.


Research Highlights

By Personal Interest

Top papers per interest topic, ranked by relevance.

3D Scene Graph

#1 Computer Vision and Pattern Recognition (cs.CV) 10 Jun 2026
SG2Loc: Sequential Visual Localization on 3D Scene Graphs

The paper introduces SG2Loc, a lightweight sequential visual localization method that replaces large image databases or point clouds with compact 3D scene graphs.

Embodied AI

#2 Computation and Language (cs.CL)Artificial Intelligence (cs.AI)Computer Vision and Pattern Recognition (cs.CV) 9 Jun 2026
Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use

The paper introduces PhysTool-Bench, the first comprehensive benchmark designed to evaluate Multimodal Large Language Models (MLLMs) on their ability to recognize and plan the use of physical tools in real-world scenarios.

#3 Robotics (cs.RO)Artificial Intelligence (cs.AI) 9 Jun 2026
Test-time Adversarial Takeover: A Real-time Hijacking Interface against Robotic Diffusion Policies

The paper introduces Test-time Adversarial Takeover (TAKO), a novel attack that enables real-time remote hijacking of frozen robotic diffusion policies by injecting universal visual patches. It demonstrates that these policies can be turned into remotely piloted instruments across various tasks and hardware configurations.

Human-Computer Interaction

#1 Image and Video Processing (eess.IV)Computer Vision and Pattern Recognition (cs.CV) 5 Jun 2026
Beyond Universality: The GCC-FER Dataset and Culture-Aware Adaptation for Dynamic Facial Expression Recognition
#2 Signal Processing (eess.SP) 8 Jun 2026
Adaptive Derivative Estimation via Stein's Unbiased Risk

The paper introduces SURDE, a causal and computationally efficient derivative estimator that adaptively balances the noise-amplification and smoothing-bias tradeoff using Stein's Unbiased Risk Estimator (SURE).

#3 Computer Vision and Pattern Recognition (cs.CV) 9 Jun 2026
GUI-AC: Enhancing Continual Learning in GUI Agents

The paper introduces GUI-AC, a method designed to enhance the continual learning of GUI agents by addressing the instability and exploration collapse inherent in Reinforcement Fine-Tuning (RFT) under non-stationary data distributions.

Multi-Agent Systems

#6 Artificial Intelligence (cs.AI)cs.GR 10 Jun 2026
A Lightweight Multi-Agent Framework for Automated Concrete Barrier Design

The paper introduces a lightweight multi-agent framework that automates the design of reinforced concrete highway barriers while ensuring strict adherence to AASHTO-LRFD safety standards. It demonstrates that specialized agentic orchestration can outperform much larger flagship models in domain-specific engineering tasks.

#7 Artificial Intelligence (cs.AI) 10 Jun 2026
Embodied-BenchClaw: An Autonomous Multi-Agent System for Embodied Spatial Intelligence Benchmark Construction

The paper introduces Embodied-BenchClaw, an autonomous multi-agent system designed to automate the construction, maintenance, and updating of embodied spatial intelligence benchmarks.

#8 Machine Learning (cs.LG)Artificial Intelligence (cs.AI)Computation and Language (cs.CL) 9 Jun 2026
FlowBank: Query-Adaptive Agentic Workflows Optimization through Precompute-and-Reuse

The paper introduces FlowBank, a framework that optimizes LLM-based multi-agent systems by building a compact bank of reusable, complementary workflows to balance offline computation and inference costs.

#9 astro-ph.COphysics.comp-ph 9 Jun 2026
DarkAgents

The paper introduces DarkAgents, a multi-agent system designed to automate complex theoretical astroparticle physics research by combining LLM reasoning with deterministic human-written code.

#10 Multiagent Systems (cs.MA)Computer Science and Game Theory (cs.GT)Machine Learning (cs.LG) 9 Jun 2026
Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

The paper introduces Φ-Actor-Critic (Φ-AC), a framework that steers multi-agent reinforcement learning toward Pareto-efficient correlated equilibria in general-sum games.

Vision-Language Models

#1 Computer Vision and Pattern Recognition (cs.CV)Artificial Intelligence (cs.AI)Computation and Language (cs.CL) 5 Jun 2026
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism
#2 Computer Vision and Pattern Recognition (cs.CV)Artificial Intelligence (cs.AI)Computation and Language (cs.CL)Machine Learning (cs.LG) 5 Jun 2026
TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment
#4 Computer Vision and Pattern Recognition (cs.CV)Artificial Intelligence (cs.AI)Computation and Language (cs.CL)Machine Learning (cs.LG) 5 Jun 2026
Textual Supervision Enhances Geospatial Representations in Vision-Language Models
#5 Computer Vision and Pattern Recognition (cs.CV) 5 Jun 2026
TraRA: Trajectory-level Recognition Aggregation for Video Text Spotting in Urban Surveillance
#6 Computer Vision and Pattern Recognition (cs.CV)Artificial Intelligence (cs.AI) 5 Jun 2026
GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection
#8 Computer Vision and Pattern Recognition (cs.CV)Artificial Intelligence (cs.AI) 5 Jun 2026
SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models
#10 Computer Vision and Pattern Recognition (cs.CV) 5 Jun 2026
Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

Vision-Language Navigation

#1 Robotics (cs.RO)Artificial Intelligence (cs.AI)Computer Vision and Pattern Recognition (cs.CV) 5 Jun 2026
Beyond Waypoints: A Trajectory-Centric Waypointing Paradigm for Vision-Language Navigation
#2 Robotics (cs.RO)Artificial Intelligence (cs.AI)Computer Vision and Pattern Recognition (cs.CV) 5 Jun 2026
Think Like a Pilot: Fine-Grained Long-Horizon UAV Navigation

By Research Area

Top papers per ArXiv subject category, ranked by relevance.

AI Safety

#1 Artificial Intelligence (cs.AI)Human-Computer Interaction (cs.HC) 12 Jun 2026
Strategic Decision Support for AI Agents

The paper introduces a strategic decision support framework for AI agents that minimizes support usage while bounding the probability of 'missed-support' errors. It provides a unified lens for modeling information gathering, human-AI collaboration, and tool use as strategic optimization problems.

#2 Artificial Intelligence (cs.AI) 12 Jun 2026
"Did you lie?" Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms

The paper introduces 13 belief-verified model organisms and a prompted-lying testbed to rigorously evaluate lie detectors across various model scales. It demonstrates that current activation- and logprob-based detectors fail on models with verified internal beliefs, highlighting a significant gap in current lie-detection capabilities.

#3 Artificial Intelligence (cs.AI) 12 Jun 2026
Definitional alignment before capability alignment: a Design-Science framework for adjudicating claims about AGI

The paper introduces DAF-AGI, a Design Science framework that treats the lack of a stable definition for AGI as a governance problem rather than just a technical one. It proposes 'definitional sovereignty' as a necessary component of algorithmic sovereignty to ensure public accountability in technological categorization.

#4 Artificial Intelligence (cs.AI) 12 Jun 2026
Prefill Awareness in Large Language Models

The paper identifies and quantifies 'prefill awareness,' the ability of frontier LLMs to detect and react to tampered or inserted assistant-side context in their input history. It demonstrates that this capability can compromise the validity of safety evaluations and AI control protocols that rely on prefilling model outputs.

#5 Artificial Intelligence (cs.AI)econ.GNq-fin.EC 12 Jun 2026
(Human) Attention Is (Still) All You Need: Human oversight makes AI-assisted social science reliable

The paper introduces Human-in-the-Loop Economic Research (HLER), a decision architecture that significantly improves the reliability of AI-assisted social science research by structuring cognitive labor between humans and LLMs.

#6 Artificial Intelligence (cs.AI)Computation and Language (cs.CL)Machine Learning (cs.LG) 12 Jun 2026
Zero-source LLM Hallucination Detection with Human-like Criteria Probing

The paper introduces HCPD, a zero-source hallucination detection framework that emulates human-like multi-faceted reasoning to evaluate LLM outputs without internal model access or external references.

#7 Artificial Intelligence (cs.AI)cs.CY 12 Jun 2026
Under What Conditions Can a Machine Become Genuinely Creative?

The paper establishes a formal requirement framework for genuine machine creativity based on Designics, moving beyond output novelty to structural transformation of incomplete situations. It argues that proactive AI ethics and human-AI co-living are intrinsic components of the creative process rather than external constraints.

Agentic AI

#1 Artificial Intelligence (cs.AI)Multiagent Systems (cs.MA) 11 Jun 2026
Deployment-Time Memorization in Foundation-Model Agents

The paper introduces 'deployment-time memorization' as a formal framework to evaluate how agent memory design choices jointly impact personalization utility, extraction risk, and deletion fidelity. It establishes a privacy-utility frontier and introduces the Forgetting Residue Score (FRS) to quantify information persistence after deletion.

#2 Artificial Intelligence (cs.AI)Machine Learning (cs.LG)Software Engineering (cs.SE) 11 Jun 2026
Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents

The paper demonstrates that selective context pruning combined with automated summarization improves the reliability and efficiency of long-horizon tool-using LLM agents in enterprise workflows. It provides a quantitative framework for balancing context window management with task completion accuracy.

#3 Artificial Intelligence (cs.AI) 11 Jun 2026
Regimes: An Auditable, Held-Out-Gated Improvement Loop Demonstrated on LongMemEval with ActiveGraph

The paper introduces Regimes, an auditable, held-out-gated improvement loop built on the ActiveGraph runtime, which treats agent self-improvement as a first-class, event-sourced workflow. It provides a durable substrate for diagnosing failures, proposing repairs, and validating improvements through a structured, auditable pipeline.

#4 Artificial Intelligence (cs.AI)Information Retrieval (cs.IR)Machine Learning (cs.LG) 12 Jun 2026
ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

The paper introduces ToolSense, an open-source diagnostic framework designed to audit whether LLMs truly understand tool semantics or merely rely on retrieval shortcuts. It provides three new benchmarks (RRB, MCQ, and QA) to identify knowledge-retrieval dissociation in parametric tool retrieval models.

#5 Artificial Intelligence (cs.AI) 12 Jun 2026
Arbor: Tree Search as a Cognition Layer for Autonomous Agents

The paper introduces Arbor, a multi-agent framework that utilizes structured tree search as a cognition layer to solve complex optimization problems in large, stateful action spaces.

#6 Artificial Intelligence (cs.AI) 12 Jun 2026
Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

The paper introduces Evoflux, an inference-time evolutionary search method that significantly improves the execution feasibility of tool workflows for compact language models.

#7 Artificial Intelligence (cs.AI)Machine Learning (cs.LG) 12 Jun 2026
Benchmarking AI Agents for Addressing Scientific Challenges Across Scales

The paper introduces SciAgentArena, a systematic benchmark designed to evaluate the capabilities of AI agents in real-world scientific research scenarios across multiple domains.

#8 Artificial Intelligence (cs.AI) 12 Jun 2026
The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements

The paper identifies a critical 'containment gap' in popular agentic AI frameworks, demonstrating that they lack native architectural safety guarantees for public-facing deployments. It proposes and validates two lightweight containment mechanisms to mitigate memory-poisoning and policy-bypass vulnerabilities.

#9 Artificial Intelligence (cs.AI) 12 Jun 2026
Teach-and-Repeat: Accurately Extracting Operational Knowledge from Mobile Screen Demonstrations to Empower GUI Agents

The paper introduces Teach VLM and the Teach-and-Repeat paradigm to extract structured operational knowledge from mobile screen demonstrations to guide GUI agents. It also provides a systematic data flywheel for scalable data acquisition and a new Chinese Mobile Screen Teach Benchmark.

#10 Artificial Intelligence (cs.AI) 12 Jun 2026
Fantastic Scientific Agents and How to Build Them: AgentBuild for Rietveld Refinement

The paper introduces AgentBuild, a framework that treats scientific agent construction as a version-controlled workflow where agents are built from a scientist-authored contract rather than manual prompt engineering.

Computer Vision

#1 Artificial Intelligence (cs.AI) 12 Jun 2026
PRISMR: Overcoming Parse Collapse in Multimodal Listwise Ranking via Parameterized Representation Internalization

The paper identifies 'parse collapse' in LMM-based listwise ranking and proposes PRISMR, a framework that replaces in-context list processing with parameterized representation internalization.

#2 Artificial Intelligence (cs.AI)Computer Vision and Pattern Recognition (cs.CV)Machine Learning (cs.LG)Image and Video Processing (eess.IV) 12 Jun 2026
OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models

The authors introduce OpenMedQ, a medical vision-language model pretrained on a massive, fully-open dataset of 3.35M samples across multiple medical modalities. It achieves state-of-the-art performance on PathVQA and superior zero-shot classification results compared to existing large-scale models.

#3 Artificial Intelligence (cs.AI)Computer Vision and Pattern Recognition (cs.CV) 12 Jun 2026
Augmentation techniques for video surveillance in the visible and thermal spectral range

The paper investigates the suitability and robustness of various data augmentation techniques to improve multispectral CNN-based object detection using visible and thermal infrared imagery. It specifically addresses the challenge of training models with limited thermal datasets by leveraging visible spectral data.

#4 Artificial Intelligence (cs.AI) 12 Jun 2026
Hallucination in Medical Imaging AI: A Cross-Modality Analytical Framework for Taxonomy, Detection, and Mitigation under Regulatory Constraints

The paper provides a cross-modality analytical framework for medical imaging AI hallucinations, unifying taxonomies and mapping mitigation strategies to FDA regulatory lifecycle requirements.

#5 Artificial Intelligence (cs.AI) 12 Jun 2026
EPIG: Emotion-Based Prompting for Personalised Image Generation

The paper introduces EPIG, a lightweight, training-free method that enhances the emotional expressiveness of text-to-image diffusion models by enriching prompts with psychologically informed affective attributes.

Computing Systems

#1 Artificial Intelligence (cs.AI)cs.AR 12 Jun 2026
Reducing the Complexity of Deep Learning Models for EEG Analysis on Wearable Devices

The paper investigates the feasibility of deploying deep neural networks for EEG-based seizure detection on resource-constrained wearable devices by analyzing the trade-offs between model complexity and accuracy.

#2 Artificial Intelligence (cs.AI) 12 Jun 2026
Structured Testbench Generation for LLM-Driven HDL Design and Verification-Oriented Data Curation

The paper introduces STG, a Structured Testbench Generation framework that addresses the bottlenecks of stochastic, high-cost, and low-coverage testbench generation in LLM-driven RTL workflows.

#3 Artificial Intelligence (cs.AI) 12 Jun 2026
Otters++: A Time-to-first-spike Based Energy Efficient Optical Spiking Transformer

The paper introduces Otters++, a hardware-aware spiking Transformer that leverages natural signal decay in optoelectronic devices to eliminate the computational overhead of temporal decay terms in TTFS coding.

General

#1 Artificial Intelligence (cs.AI) 11 Jun 2026
Exploratory Responsiveness and Adaptive Rigidity under AI-Assisted Optimization

The paper develops a formal theory of exploratory adaptation to explain how AI-assisted optimization can either cause systemic rigidity or enhance adaptive mobility depending on a system's baseline responsiveness.

#2 Artificial Intelligence (cs.AI) 11 Jun 2026
Predictive Assistance and the Temporal Dynamics of Exploratory Compression

The paper introduces a geometric dynamical framework to model how predictive AI assistance reshapes the geometry of exploratory cognition by stabilizing trajectories before internal search occurs.

#3 Artificial Intelligence (cs.AI)Machine Learning (cs.LG) 11 Jun 2026
Minimalist Genetic Programming

The paper introduces Minimalist Genetic Programming (MGP), a new approach to program induction that replaces evolutionary search with a syntactic derivation process inspired by the Minimalist Program in linguistics.

#4 Artificial Intelligence (cs.AI)cs.CYMachine Learning (cs.LG) 12 Jun 2026
From AGI to ASI

The paper provides a formal framework for characterizing Artificial Superintelligence (ASI) and identifies four distinct pathways for the transition from human-level AGI to ASI. It also highlights the potential for a continuous series of transformative societal changes rather than a single discrete leap.

#5 Artificial Intelligence (cs.AI) 12 Jun 2026
The Theory of Mind Utility: Formal Specification of a Mentalizing Mechanism

The paper introduces Theory of Mind Utility (ToM-U), a formal computational specification of mentalizing that defines what mentalizing computes without committing to specific neural or algorithmic implementations.

#6 Artificial Intelligence (cs.AI) 12 Jun 2026
Topical Phase Transitions in Artificial Intelligence Research: Large-Scale Evidence and an Early-Warning Signature for Emerging Topics

The paper provides a large-scale characterization of AI research evolution as 'topical phase transitions' and introduces an early-warning signature to predict emerging research trends.

#7 Artificial Intelligence (cs.AI) 12 Jun 2026
A Mathematical Forum Platform for Collaborative Problem Solving and Dataset Generation for AI Reasoning

The paper presents a unified mathematical forum platform that integrates an image-to-LaTeX conversion pipeline directly into the posting interface to eliminate friction in sharing mathematical content. It also positions the platform as a source for community-validated datasets to train and benchmark AI reasoning systems.

#8 Artificial Intelligence (cs.AI) 12 Jun 2026
APCyc: Property-Informed Design of Cyclic Peptides via Automated Cyclization

The paper introduces APCyc, a target-aware framework for the de novo design of cyclic peptides that explicitly models cyclization constraints and multi-property optimization.

#9 Artificial Intelligence (cs.AI) 12 Jun 2026
MOSAIC: Modality-Specific Adaptation for Incremental Continual Learning in Parkinson's Disease Gait Assessment

The paper introduces MOSAIC, a compact continual learning framework designed to handle modality-incremental learning for Parkinson's disease gait assessment where new sensors are added over time without access to historical data.

#10 Artificial Intelligence (cs.AI) 12 Jun 2026
A Minimal Model of Bounded Trade-Off Screening in Multi-Attribute Choice

The paper introduces a bounded trade-off reasoning framework that explains why humans reject multi-attribute options with poor performance on critical attributes, moving beyond classical compensatory utility models.

LLM

#1 Artificial Intelligence (cs.AI)Computation and Language (cs.CL) 11 Jun 2026
From Context-Aware to Conflict-Aware: Generalizing Contrastive Decoding for Knowledge Conflict in LLMs

The paper introduces a conflict-aware paradigm for contrastive decoding that dynamically balances parametric priors and external context, addressing the failure of existing methods to handle erroneous context. It also introduces TriState-Bench for multi-state conflict evaluation and the Adaptive Regime Routing (ARR) method.

#2 Artificial Intelligence (cs.AI) 11 Jun 2026
Instruction Finetuning DeepSeek-R1-8B Model Using LoRA and NEFTune

The paper demonstrates that combining LoRA and NEFTune with the DeepSeek-R1-8B model significantly improves performance on financial named-entity recognition (NER) tasks.

#3 Artificial Intelligence (cs.AI) 12 Jun 2026
MARS: Margin-Adversarial Risk-controlled Stopping for Parallel LLM Test-time Scaling

The paper introduces MARS, a margin-adversarial risk-controlled stopping rule that significantly reduces computational overhead in parallel test-time scaling for LLMs without sacrificing accuracy.

#4 Artificial Intelligence (cs.AI) 12 Jun 2026
SciR: A Controllable Benchmark for Scientific Reasoning in LLMs

The paper introduces SciR, the first multi-paradigm scientific reasoning benchmark that allows for independent, parametric control over information extraction difficulty and logical inference complexity.

#5 Artificial Intelligence (cs.AI) 12 Jun 2026
Rethinking RAG in Long Videos: What to Retrieve and How to Use It?

The paper introduces V-RAGBench, a benchmark for decoupled evaluation of retrieval and generation in long videos, and CARVE, a method that dynamically selects the optimal modality and granularity for each retrieved chunk.

#6 Artificial Intelligence (cs.AI)Computation and Language (cs.CL) 11 Jun 2026
RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning

The paper introduces RealMath-Eval, a benchmark of 224 real-world high school math responses, and identifies a significant 'Evaluation Gap' where LLM judges perform poorly on human reasoning compared to synthetic text.

#7 Artificial Intelligence (cs.AI) 11 Jun 2026
Self-Distillation Policy Optimization via Visual Feedback: Bridging Code and Visual Artifacts

The paper introduces Visual-SDPO, a self-distillation framework that enables code-generating LLMs to correct visual artifacts (like overlapping elements or clipped text) by leveraging rendered visual feedback as privileged context.

#8 Artificial Intelligence (cs.AI)Computation and Language (cs.CL)cs.CYMachine Learning (cs.LG) 12 Jun 2026
Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior

The paper demonstrates that self-reports in LLMs predict behavior more reliably when using task-specific intention frameworks (Theory of Planned Behavior) rather than broad personality traits (Big 5). It also identifies that coherence between self-reports and behavior is highly dependent on conversational context and prompt priming.

#9 Artificial Intelligence (cs.AI)Machine Learning (cs.LG) 12 Jun 2026
MLUBench: A Benchmark for Lifelong Unlearning Evaluation in MLLMs

The paper introduces MLUBench, a large-scale benchmark for lifelong unlearning in Multimodal Large Language Models (MLLMs), and proposes LUMoE to address cumulative degradation and multimodal alignment issues.

#10 Artificial Intelligence (cs.AI) 12 Jun 2026
The Hidden Power of Scaling Factor in LoRA Optimization

The paper identifies the scaling factor $\alpha$ as the primary driver of LoRA optimization and introduces LoRA-$\alpha$, a framework that restores $\alpha$ to a principled regime to improve performance and simplify hyperparameter tuning.

MLOps

#1 Artificial Intelligence (cs.AI) 12 Jun 2026
Brick: Spatial Capability Routing for the Mixture-of-Models (MoM) Paradigm

The paper introduces Brick, a multimodal router for Mixture-of-Models (MoM) paradigms that optimizes the trade-off between query accuracy and inference cost by accounting for within-domain difficulty variance.

#2 Artificial Intelligence (cs.AI) 12 Jun 2026
Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System

The paper introduces a deployment-centered evaluation framework that predicts query-level user rejection risk in clinical LLM systems using pre-response context. It demonstrates that incorporating deployment-specific metadata significantly improves the ability to anticipate user dissatisfaction compared to using query content alone.

Multimodal Learning

#1 Artificial Intelligence (cs.AI)Computation and Language (cs.CL)Computer Vision and Pattern Recognition (cs.CV)cs.SD 11 Jun 2026
From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs

The study provides the first coherent mapping of internal information flow in Audio-Visual Large Language Models (AVLLMs), revealing how audio and visual signals are routed and integrated to shape final predictions.

NLP

#1 Artificial Intelligence (cs.AI)Computation and Language (cs.CL)Machine Learning (cs.LG) 11 Jun 2026
Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

The paper demonstrates that supervised fine-tuning (SFT) with synthetic rationale data can significantly degrade performance in clinical disease prediction tasks compared to label-only fine-tuning. It identifies a structural conflict between narrative plausibility and discriminative optimization as the root cause of this degradation.

#2 Artificial Intelligence (cs.AI) 12 Jun 2026
Multi-Field Hybrid Retrieval-Augmented Generation for Maritime Accident Root Cause Analysis

The paper introduces a multi-field hybrid retrieval-augmented generation (RAG) framework specifically designed to automate maritime accident root cause analysis (RCA) using a structured knowledge base of historical tribunal reports.

#3 Artificial Intelligence (cs.AI) 12 Jun 2026
Constructing Evaluation Datasets for Procedural Reasoning: Balancing Naturalness, Grounding, and Multi-Hop Coverage

The paper introduces a grounding validation framework and evaluates three distinct question generation strategies to balance naturalness, grounding, and multi-hop coverage in procedural reasoning datasets.

#4 Artificial Intelligence (cs.AI) 12 Jun 2026
AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction

The authors introduce AAbAAC, a manually annotated corpus of 115 PubMed abstracts specifically designed for information extraction in the domain of autoimmunity.

RL

#1 Artificial Intelligence (cs.AI) 11 Jun 2026
Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning

The paper introduces DiRL, a framework that distinguishes between reasoning-based and memorization-based exploration in LLM reinforcement learning to prevent models from optimizing for shortcuts.

#2 Artificial Intelligence (cs.AI) 12 Jun 2026
From Verdict to Process: Agentic Reinforcement Learning for Multi-Stage Fact Verification

The paper introduces ProFact, an agentic reinforcement learning framework that optimizes the entire multi-stage trajectory of fact verification rather than individual components in isolation.

#3 Artificial Intelligence (cs.AI) 11 Jun 2026
Belief-Space Control for Personalized Cancer Treatment via Active Inference

The paper introduces a belief-space planning framework for personalized cancer treatment that unifies goal-directed therapy with information-gathering actions under measurement constraints.

Robotics

#1 Artificial Intelligence (cs.AI)Computation and Language (cs.CL) 12 Jun 2026
PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation

The paper introduces PersonaDrive, a framework for generating human-style, style-diverse non-ego traffic agents in closed-loop driving simulations using a retrieval-augmented Vision-Language-Action (VLA) model.

#2 Artificial Intelligence (cs.AI) 12 Jun 2026
A Tutorial on World Models and Physical AI

The paper provides a unified framework for world modeling in physical AI, categorizing and distinguishing between explicit and implicit world models. It synthesizes diverse approaches into a coherent structure based on how predictive dynamics are represented and exploited.

#3 Artificial Intelligence (cs.AI) 11 Jun 2026
Mobility Anomaly Generation using LLM-Driven Behavior with Kinematic Constraints

The paper introduces an end-to-end generative framework to synthesize large-scale, annotated human trajectory anomalies, addressing the scarcity of ground-truth data caused by the rarity of real-world anomalous events.

Top News

Hacker News Mon, 08 Ju
DeepSeek V4 Pro beats GPT-5.5 Pro on precision

DeepSeek V4 Pro has demonstrated superior precision compared to GPT-5.5 Pro in benchmark tests, marking a significant advancement in large language model performance. This development highlights ongoing competition in AI research to improve accuracy and efficiency in natural language processing tasks.

Hacker News Mon, 08 Ju
Algorithmic Monocultures in Hiring

The article discusses risks of over-reliance on algorithmic hiring systems, highlighting potential biases and lack of diversity caused by 'monocultures' of AI tools in recruitment. It emphasizes ethical and safety concerns in deploying such systems.

Hacker News Sun, 07 Ju
How's Linear so fast? A technical breakdown

The article provides a technical analysis of why Linear, an AI company, achieves high performance, likely focusing on optimizations in their machine learning infrastructure or model architecture.

Hacker News Sun, 07 Ju
Show HN: Lathe – Use LLMs to learn a new domain, not skip past it

Lathe is a project that leverages large language models (LLMs) to deeply engage with and learn new domains rather than superficially generating content. It emphasizes iterative exploration and understanding of domain-specific knowledge through LLMs.

Hacker News Sun, 07 Ju
Proliferate (YC S25) is hiring to building open source Codex

Proliferate (YC S25) is hiring a founding engineer to build an open-source version of Codex, a code-generation model. The role focuses on advancing open-source AI tools for developers.

Hacker News Sun, 07 Ju
LLMs are eroding my software engineering career and I don't know what to do

A software engineer discusses concerns about how large language models (LLMs) are disrupting traditional software engineering roles, sparking debates about career adaptation and the future of the profession in the AI era.

Reddit r/MachineLearning 2026-06-08
For those using Google Colab, what features did you wish it had? [D]

A UC Berkeley student and researchers seek community input on improving Google Colab, focusing on environment management and kernel persistence issues. The post invites ML professionals and enthusiasts to share desired features and pain points with the platform.

Reddit r/MachineLearning 2026-06-07
Research collection of Arxiv whitepapers [R]

A user compiled 1,700 Arxiv whitepapers post-ChatGPT into 90 categories and created 6,000 'Inquiring Lines' for cross-cutting research analysis, accessible via a curated online vault. The resource includes synthesized notes, wikilinks, and prompts for exploring related research.

Reddit r/ArtificialIntelligence 2026-06-07
Has anyone else noticed this LLM language bias?

A user developed an app called Biblians to explore how LLMs handle religious texts, discovering that models exhibit denominational bias depending on the language used (e.g., Protestant-leaning in English vs. Catholic-leaning in Spanish/French/Portuguese). The post invites testing to identify linguistic biases in AI outputs.

Reddit r/ArtificialIntelligence 2026-06-07
K-pop Fans Are Calling Out Creepy Deepfakes of Idols

K-pop fans are criticizing the creation and spread of unsettling deepfake videos of idols, highlighting concerns about AI-generated content's ethical implications and potential harm. The discussion underscores growing awareness of AI's misuse in media and fan communities.

Trending Repos

Top repositories this week, sorted by stars.

This repository compiles the system prompts and internal tool configurations for popular AI coding assistants and autonomous agents like Devin, Cursor, and Windsurf. It is highly relevant for understanding the engineering behind agentic workflows and prompt engineering for complex software development tasks.

system promptsagentic AILLMprompt engineering
#2 ggml-org/llama.cpp LLM ★ 115.4k

This repository provides C/C++ implementations for efficient large language model inference, enabling deployment of LLMs on resource-constrained systems. It directly addresses core interests in large language models and computing systems, with broad applicability to AI research and practical deployment.

llmc++inferencetransformer
#3 msitarzewski/agency-agents Agentic AI ★ 111.9k

This repository provides a framework for deploying specialized AI agents with distinct personalities and workflows to perform complex tasks. It is highly relevant as it directly implements multi-agent systems and agentic workflows, which are core interests of the user.

multi-agent systemsagentic AILLMautomation
#4 opencv/opencv Computer Vision ★ 88.1k

The OpenCV library provides essential tools for computer vision tasks such as image processing, object detection, and machine learning. It is foundational for research and applications in computer vision, robotics, and AI systems requiring visual perception.

computer visionimage processingmachine learningrobotics
#5 karpathy/autoresearch Agentic AI ★ 86.3k

This repository features AI agents capable of autonomously conducting research and managing nanochat training on single-GPU systems. It is highly relevant as it demonstrates practical Agentic AI workflows and automated MLOps for large language model development.

Agentic AILLMMLOpsautonomous agents
#6 MemPalace/mempalace Computing Systems ★ 54.6k

MemPalace is an open-source AI memory system designed for efficient data handling and retrieval, which is critical for advancing AI applications requiring robust memory management. Its focus on benchmarking makes it relevant to computing systems research and practical implementation in AI/ML workflows.

memory-systemaicomputing-systemsopen-source
#7 addyosmani/agent-skills Agentic AI ★ 52.2k

This repository provides production-grade engineering skills and tools designed to empower AI coding agents. It is highly relevant for research into Agentic AI as it addresses the practical infrastructure needed for autonomous agents to interact with and manipulate software environments.

Agentic AILLMsCoding AgentsSoftware Engineering
#8 BerriAI/litellm MLOps ★ 50.0k

LiteLLM provides a unified interface and proxy server to call over 100 different LLM APIs using the OpenAI format. It is highly relevant for building Agentic AI and Multi-Agent Systems as it simplifies model switching, cost tracking, and load balancing across various providers.

LLMMLOpsAI GatewayMulti-Agent Systems
#9 microsoft/VibeVoice Speech ★ 48.7k

This repository focuses on open-source voice AI technologies, directly aligning with the user's interest in speech. It likely contributes to speech synthesis, recognition, or related domains, offering tools for researchers in voice-driven AI applications.

speechaiopen-sourcevoice
#10 aaif-goose/goose Agentic AI ★ 47.6k

This repository provides an extensible AI agent framework that interacts with any LLM to perform tasks beyond code suggestions, such as installation, execution, and testing. It directly advances research in agentic systems by enabling LLMs to perform complex, interactive operations.

Agentic AILLMComputing Systems