Tuesday, June 23, 2026

Week 23 — 2026-06-01 – 2026-06-07

40 papers · 88 news · 14 repos · Week 23

This week in AI and tech underscored a growing emphasis on structured reasoning, ethical transparency, and agentic systems. Key papers advanced uncertainty-aware reinforcement learning for autonomous driving, privacy-preserving LLM evaluation frameworks, and memory-centric scientific research agents. NVIDIA’s announcements on agentic AI infrastructure, including the Vera CPU and DSX OS, highlighted the push toward scalable, secure AI factories, while a report on ChatGPT’s integration with Google Sheets exposed data exfiltration risks. Other papers disentangled the evolution capabilities of self-evolving agents, made the case for physically grounded world models, and probed linear representations of synthetic deception in LLMs, reflecting tensions between technical progress and ethical accountability. Meanwhile, research systems such as AutoSci and PhyDrawGen pointed toward tools that connect theoretical advances with real-world applications, and the trending-repository list featured niche tooling for agentic RAG and harness engineering.


Research Highlights

By Research Area

Top papers per arXiv subject category, ranked by relevance.

AI Safety

#1 Machine Learning (cs.LG), Artificial Intelligence (cs.AI) 1 Jun 2026
When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception

This study demonstrates that synthetic deception in LLMs can be systematically analyzed through linear representations, revealing domain-invariant dishonesty patterns achievable via minimal fine-tuning.
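
The summary does not describe the paper's probing procedure. A common baseline for finding a linear "concept direction" is the difference of class means in activation space, with a threshold at the midpoint of the projected means. A minimal sketch, assuming activations have already been extracted as plain float lists; the toy vectors and all function names are hypothetical.

```python
from statistics import fmean

# Hypothetical sketch: a difference-of-means "deception direction" and a
# midpoint-threshold probe on toy 2-D activations (not the paper's method).

def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def make_probe(honest: list[list[float]], deceptive: list[list[float]]):
    """Return a classifier that projects onto mu_deceptive - mu_honest."""
    mu_h = mean_vector(honest)
    mu_d = mean_vector(deceptive)
    direction = [d - h for d, h in zip(mu_d, mu_h)]
    # Decision threshold: midpoint between the two projected class means.
    threshold = (dot(direction, mu_h) + dot(direction, mu_d)) / 2

    def is_deceptive(activation: list[float]) -> bool:
        return dot(direction, activation) > threshold

    return is_deceptive

honest = [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1]]     # toy "honest" activations
deceptive = [[1.0, 0.0], [0.9, 0.2], [1.2, -0.1]]  # toy "deceptive" activations
probe = make_probe(honest, deceptive)
print(probe([1.1, 0.0]), probe([0.0, 1.0]))  # True False
```

A single direction is the simplest instance of a "linear representation"; a logistic-regression probe is the usual next step when class means overlap.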

#2 Artificial Intelligence (cs.AI) 1 Jun 2026
PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges

PReMISE introduces a framework for discovering policy-level rubrics and auditing their reliability, preference fit, and robustness under LLM judges, with repair operations that improve judge accuracy and reduce exploitability.

#3 Artificial Intelligence (cs.AI) 1 Jun 2026
COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

COMPASS introduces a novel framework for aligning search agents with safety constraints by addressing retrieval-induced safety degradation through cognitive tree exploration and introspective alignment.

#4 Artificial Intelligence (cs.AI), Multiagent Systems (cs.MA) 1 Jun 2026
Healthcare Mechanisms from Policy-as-Code Search under Strategic Provider Response

This paper introduces a novel approach to healthcare mechanism design by integrating program synthesis with strategic provider behavior modeling, enabling evaluation of mechanisms through equilibrium outcomes rather than fixed responses.

#5 Artificial Intelligence (cs.AI), Computation and Language (cs.CL), Machine Learning (cs.LG) 1 Jun 2026
A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI

Introduces a persona-based evaluation framework for generative AI that captures cultural, demographic, and contextual variability through synthetic cognitive profiles, replacing monolithic benchmarks with pluralistic, perspective-dependent assessment.

#6 Artificial Intelligence (cs.AI) 1 Jun 2026
Formalizing and falsifying causal pathways of rare events

This paper formally defines causal pathways for rare events and establishes testable implications that depend on causal abstractions rather than full system models.

#7 Artificial Intelligence (cs.AI) 1 Jun 2026
Choosing the Lens: Strategic Perspective Activation in Context-Dependent Argumentation

Introduces context-dependent argumentation frameworks (CDAFs) and the ACTIVATION-MANIPULATION decision problem, extending Dung's theory to model strategic control over argument evaluation through context-specific defeat functions.
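
The CDAF extension is not described in enough detail to reproduce, but the Dung baseline it extends is standard. A minimal sketch of Dung's grounded semantics, computed as the least fixed point of the characteristic function; the context-specific defeat functions of CDAFs are deliberately not modeled, and the function names are hypothetical.

```python
def grounded_extension(arguments: set[str], attacks: set[tuple[str, str]]) -> set[str]:
    """Grounded extension of a Dung argumentation framework (arguments, attacks).

    Computed as the least fixed point of the characteristic function
    F(S) = {a | every attacker of a is attacked by some member of S},
    iterated from the empty set.
    """
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}

    def acceptable(s: set[str]) -> set[str]:
        return {
            a for a in arguments
            if all(any((d, b) in attacks for d in s) for b in attackers[a])
        }

    extension: set[str] = set()
    while True:
        nxt = acceptable(extension)
        if nxt == extension:
            return extension
        extension = nxt

# a attacks b, b attacks c: a is unattacked, and a defends c against b.
print(sorted(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")})))  # ['a', 'c']
```

Because F is monotone, iterating from the empty set on a finite framework always terminates; a mutual attack between two arguments yields an empty grounded extension.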

Agentic AI

#1 Artificial Intelligence (cs.AI) 1 Jun 2026
MAVEN: Improving Generalization in Agentic Tool Calling

MAVEN introduces a lightweight symbolic reasoning framework to enhance generalization in agentic tool calling, demonstrating significant accuracy improvements on multi-step reasoning benchmarks without additional training.

#2 Artificial Intelligence (cs.AI) 1 Jun 2026
AutoSci: A Memory-Centric Agentic System for the Full Scientific Research Lifecycle

AutoSci introduces a unified, memory-centric agentic system that automates the entire scientific research lifecycle, addressing gaps in existing systems by integrating structured memory, dynamic workflow execution, skill augmentation, and iterative evolution.

#3 Artificial Intelligence (cs.AI) 1 Jun 2026
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

This paper disentangles two distinct capabilities in self-evolving LLM agents—harness-updating and harness-benefit—revealing that base model capability does not predict effectiveness in harness self-evolution.

#4 Artificial Intelligence (cs.AI) 1 Jun 2026
Learning Agent-Compatible Context Management for Long-Horizon Tasks

Introduces AdaCoM, an external context management system that improves long-horizon task performance for frozen agents without retraining, revealing a fidelity-reliability trade-off in context management strategies.

#5 Artificial Intelligence (cs.AI), Computation and Language (cs.CL), Machine Learning (cs.LG) 1 Jun 2026
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

COLLEAGUE.SKILL introduces an end-to-end system for distilling heterogeneous expert traces into inspectable, correctable AI skill packages grounded in human expertise and behavior.

#6 Artificial Intelligence (cs.AI), Machine Learning (cs.LG), Methodology (stat.ME) 1 Jun 2026
Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation

GLIDE introduces an open-source Python library that unifies state-of-the-art prediction-powered inference (PPI) estimators and samplers for reliable evaluation of agentic systems, reducing reliance on costly human annotations or biased LLM judgments.
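
GLIDE's own API is not shown here, so this is not its interface. As background, the basic prediction-powered estimate of a mean takes the average model (or LLM-judge) score on a large unlabeled set and subtracts the average prediction error measured on a small human-labeled set, with a normal-approximation interval. A minimal sketch using only the standard library; the function and variable names and the toy scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance

def ppi_mean(preds_unlabeled, preds_labeled, labels, alpha=0.05):
    """Prediction-powered point estimate and normal-approximation CI for a mean.

    preds_unlabeled: model/judge scores on the large unlabeled set (size N)
    preds_labeled:   model/judge scores on the small human-labeled set (size n)
    labels:          human labels for that same small set
    """
    rectifier = [f - y for f, y in zip(preds_labeled, labels)]
    # Mean of predictions, corrected by the average prediction error on labeled data.
    estimate = fmean(preds_unlabeled) - fmean(rectifier)
    se = sqrt(variance(preds_unlabeled) / len(preds_unlabeled)
              + variance(rectifier) / len(rectifier))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * se, estimate + z * se)

judge_unlabeled = [1, 1, 0, 1, 1, 0, 1, 1]  # toy LLM-judge pass/fail scores
judge_labeled = [1, 1, 0, 1]                # judge scores on human-labeled items
human_labels = [1, 0, 0, 1]                 # human verdicts on those items
est, ci = ppi_mean(judge_unlabeled, judge_labeled, human_labels)
print(est)  # 0.5
```

The judge alone would report 0.75; the rectifier corrects for its measured optimism on labeled items, which is how PPI avoids relying on biased LLM judgments without labeling everything.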

#7 Artificial Intelligence (cs.AI) 1 Jun 2026
Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

SCALE introduces a self-improving web agent framework that autonomously expands cognitive boundaries through adversarial roles and graph-based exploration, reducing reliance on handcrafted pipelines and expert trajectories.

#8 Artificial Intelligence (cs.AI) 1 Jun 2026
HypoAgent: An Agentic Framework for Interactive Abductive Hypothesis Generation over Knowledge Graphs

HypoAgent introduces an agentic framework for interactive abductive hypothesis generation over knowledge graphs, addressing limitations in handling evolving dialogues and providing fine-grained diagnosis.

Computer Vision

#1 Artificial Intelligence (cs.AI), Applied Physics (physics.app-ph) 1 Jun 2026
BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

BilliardPhys-Bench introduces a synthetic benchmark to evaluate physical reasoning and visual dynamics prediction in multimodal large language models (MLLMs) through billiards scenarios.

#2 Artificial Intelligence (cs.AI), Machine Learning (cs.LG) 1 Jun 2026
Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

This work identifies critical failure modes in shared-state collaboration for resource-constrained visual agents and introduces CoSee, a framework for diagnosing noise accumulation in multi-step reasoning.

Computing Systems

#1 Artificial Intelligence (cs.AI) 1 Jun 2026
Transforming and Encoding FTS for SAT Solving: What Helps, What Hurts (Extended Version)

This work introduces multiple SAT encodings for factored transition systems (FTS) and systematically evaluates their effectiveness, along with the impact of task transformations and parallelism on SAT-based planning performance.

#2 Machine Learning (cs.LG) 1 Jun 2026
QASM-Eval: A Dataset to Train and Evaluate LLMs on OpenQASM-3 Beyond Quantum Circuits

QASM-Eval is the first comprehensive dataset designed to train and evaluate LLMs on OpenQASM-3 programs, focusing on hardware-facing features beyond quantum circuits.

General

#1 Artificial Intelligence (cs.AI) 1 Jun 2026
FAM-Bench: A Multimodal Benchmark for Condition-Aware Food-as-Medicine Reasoning

FAM-Bench introduces the first multi-modal benchmark for evaluating condition-aware Food-as-Medicine reasoning, addressing the gap in health-aware decision-making in existing food AI benchmarks.

#2 Artificial Intelligence (cs.AI) 1 Jun 2026
GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning

GraphARC introduces a scalable benchmark for abstract reasoning on graph-structured data, extending the ARC paradigm to evaluate generalization in graph transformations.

#3 Machine Learning (cs.LG), Artificial Intelligence (cs.AI) 1 Jun 2026
Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling

Unicorn introduces a scalable framework for high-dimensional time series forecasting by decoupling correlation modeling from channel identities, enabling cross-domain generalization and few-shot transfer.

#4 Artificial Intelligence (cs.AI) 1 Jun 2026
Procedural Generation of First Person Shooter Maps using Map-Elites

This paper introduces novel map representations (Point-Line and Spatial-Layout) and demonstrates their superiority in generating diverse, high-quality FPS maps using MAP-Elites compared to existing representations.
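
MAP-Elites keeps one elite per cell of a discretized behavior-descriptor grid, repeatedly mutating random elites and keeping a child only if its cell is empty or it beats the incumbent. A minimal, generic sketch on a toy problem; the genome encoding, fitness function, and descriptor are hypothetical stand-ins, not the paper's Point-Line or Spatial-Layout map representations.

```python
import random

# Hypothetical toy problem: genomes are 4 floats in [0, 1].
def fitness(genome: list[float]) -> float:
    # Prefer genomes whose values are close to 0.5 (maximum fitness is 0).
    return -sum((g - 0.5) ** 2 for g in genome)

def descriptor(genome: list[float], bins: int) -> tuple[int, int]:
    # 2-D behavior descriptor: mean of each half of the genome, binned into a grid.
    half = len(genome) // 2
    d1 = sum(genome[:half]) / half
    d2 = sum(genome[half:]) / (len(genome) - half)
    return (min(int(d1 * bins), bins - 1), min(int(d2 * bins), bins - 1))

def map_elites(iterations=2000, dims=4, bins=5, n_init=100, sigma=0.1, seed=0):
    rng = random.Random(seed)
    archive: dict[tuple[int, int], tuple[float, list[float]]] = {}  # cell -> (fitness, elite)

    def try_insert(genome: list[float]) -> None:
        cell, f = descriptor(genome, bins), fitness(genome)
        # Keep the genome only if its cell is empty or it beats the current elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, genome)

    for _ in range(n_init):  # random initialization
        try_insert([rng.random() for _ in range(dims)])
    for _ in range(iterations):  # select a random elite, mutate it, try to insert
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, g + rng.gauss(0, sigma))) for g in parent]
        try_insert(child)
    return archive

archive = map_elites()
print(f"{len(archive)} of 25 cells filled")
```

The result is a grid of diverse, individually optimized solutions rather than one global optimum, which is why the choice of representation and descriptor matters so much for map diversity.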

#5 Machine Learning (cs.LG) 1 Jun 2026
Gait2Hip-60: A Unified Deep Learning Benchmark for Predicting Hip Muscle Forces and Joint Moments from Multi-Cadence Gait Kinematics

Gait2Hip-60 is a deep learning benchmark for predicting hip muscle forces and joint moments directly from gait kinematics, enabling faster, more clinically applicable alternatives to traditional musculoskeletal simulations.

LLM

#1 Artificial Intelligence (cs.AI) 1 Jun 2026
LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

LLM-FACETS introduces an open-source, privacy-preserving framework for evaluating LLM transparency and accountability, enabling non-technical practitioners to audit models without data transmission or programming expertise.

#2 Artificial Intelligence (cs.AI) 1 Jun 2026
LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

LinTree demonstrates that explicitly structuring search histories with parent pointers improves LLM reasoning performance and efficiency compared to implicit trace representations and heuristic-based search.

#3 Artificial Intelligence (cs.AI) 1 Jun 2026
SLAT: Segment-Level Adaptive Trimming for Efficient CoT Reasoning

SLAT introduces a segment-level adaptive trimming framework to enhance the efficiency of chain-of-thought (CoT) reasoning in large language models by addressing structural redundancy without compromising accuracy.

MLOps

#1 Artificial Intelligence (cs.AI), Computation and Language (cs.CL) 1 Jun 2026
UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling

UniScale unifies model routing and test-time scaling into a single optimization framework, overcoming limitations of decoupled approaches and achieving superior quality-cost trade-offs in dynamic inference scenarios.

NLP

#1 Artificial Intelligence (cs.AI), Computer Vision and Pattern Recognition (cs.CV) 1 Jun 2026
PhyDrawGen: Physically Grounded Diagram Generation from Natural Language

PhyDrawGen addresses systematic errors in physics diagram generation by integrating symbolic physics constraints with visual generation, achieving robust physical accuracy across diverse domains.

#2 Artificial Intelligence (cs.AI), Databases (cs.DB), Information Retrieval (cs.IR) 1 Jun 2026
Vector Linking via Cross-Model Local Isometric Consistency

This paper introduces a method for cross-model vector linking using local geometric consistency, enabling accurate correspondences between embedding clouds from different encoders without requiring model access or labels.

#3 Artificial Intelligence (cs.AI) 1 Jun 2026
EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

EHRBench introduces an automated, EHR-grounded benchmark for evaluating LLMs in clinical decision-making (CDM), addressing scalability and reliability gaps in real-world CDM tasks.

#4 Artificial Intelligence (cs.AI) 1 Jun 2026
Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models

GRiD introduces a novel framework for discovering graph-like rules in knowledge graphs using diffusion models, addressing limitations of existing methods that overlook complex relational structures and computational challenges.

#5 Artificial Intelligence (cs.AI) 1 Jun 2026
Distilling LLM Feedback for Lean Theorem Proving

Introduces Feedback Distillation, a novel training method for reasoning models that improves post-training performance on complex tasks such as Lean 4 theorem proving by leveraging self-distillation with privileged feedback.

RL

#1 Artificial Intelligence (cs.AI) 1 Jun 2026
Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving

Proposes an uncertainty-aware framework that integrates expert advice with temporally regulated guidance to enhance safe and efficient exploration in reinforcement learning for autonomous driving.

#2 Artificial Intelligence (cs.AI) 1 Jun 2026
Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward

DecomposeR introduces a planner-centric framework for deep research tasks using structured reward mechanisms, improving planning and answering capabilities in LLMs.

#3 Artificial Intelligence (cs.AI) 1 Jun 2026
TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories

TraceGraph introduces a graph-based framework to analyze agent trajectories, revealing hidden navigation differences and motivating trap-aware recovery pipelines for improving model performance.

#4 Artificial Intelligence (cs.AI), Logic in Computer Science (cs.LO) 1 Jun 2026
Answer-Set-Programming-based Abstractions for Reinforcement Learning

This paper introduces an Answer-Set Programming (ASP)-based implementation of the CARCASS framework for relational reinforcement learning, enabling efficient abstractions in complex domains using domain knowledge.

#5 Artificial Intelligence (cs.AI) 1 Jun 2026
Structure-Induced Information for Rerooting Levin Tree Search

This paper introduces three rerooter designs for scalable subgoal-free policy tree search, enabling efficient search in complex environments without explicit subgoal generation.
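
The rerooter designs themselves are not described, but they build on Levin Tree Search (LTS), which expands nodes in increasing order of d(n)/π(n), where d(n) is the node's depth and π(n) is the product of policy probabilities along the path from the root. A minimal sketch of plain LTS without any rerooting; the toy domain, policy, and function names are hypothetical.

```python
import heapq
from itertools import count

def levin_tree_search(root, children, policy, is_goal, max_expansions=10_000):
    """Expand nodes in increasing order of d(n) / pi(n) (Levin Tree Search)."""
    tiebreak = count()
    # Frontier entries: (cost, tiebreak, state, depth, pi, path)
    frontier = [(0.0, next(tiebreak), root, 0, 1.0, [root])]
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state, depth, pi, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        expansions += 1
        probs = policy(state)  # dict: child -> probability under the policy
        for child in children(state):
            child_pi = pi * probs.get(child, 0.0)
            if child_pi > 0:
                cost = (depth + 1) / child_pi
                heapq.heappush(
                    frontier, (cost, next(tiebreak), child, depth + 1, child_pi, path + [child])
                )
    return None

# Hypothetical toy domain: strings over {a, b} up to length 3; the policy favors "a".
def children(s):
    return [s + "a", s + "b"] if len(s) < 3 else []

def policy(s):
    return {s + "a": 0.8, s + "b": 0.2}

print(levin_tree_search("", children, policy, lambda s: s == "ba"))  # ['', 'b', 'ba']
```

Dividing depth by path probability is what gives LTS its guarantee that the number of expansions before reaching a goal is bounded by d/π of that goal; a poor policy makes the bound, and the search, correspondingly expensive.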

Robotics

#1 Artificial Intelligence (cs.AI) 1 Jun 2026
Physically Viable World Models: A Case for Query-Conditioned Embodied AI

This paper introduces a framework for physically viable world models in embodied AI that prioritize answering intervention queries through structural physical abstractions, rather than mere observation prediction.

#2 Artificial Intelligence (cs.AI), Machine Learning (cs.LG), Multiagent Systems (cs.MA) 1 Jun 2026
HADT: A Heterogeneous Multi-Agent Differential Transformer for Autonomous Earth Observation Satellite Cluster

This work introduces HADT, a novel transformer-based architecture for autonomous resource management in heterogeneous satellite clusters conducting Earth Observation missions, demonstrating significant performance improvements over existing baselines.

Top News

Hacker News 2026-05-31
ChatGPT for Google Sheets exfiltrates workbooks

The article highlights a security vulnerability where integrating ChatGPT with Google Sheets can lead to unauthorized data exfiltration, raising concerns about AI-driven data leakage risks.

Hacker News 2026-05-31
The Speed of Prototyping in the Age of AI

The article discusses how advancements in AI have significantly accelerated the prototyping process, enabling faster development and iteration of AI models. It explores tools and methodologies that reduce the time required to move from concept to implementation in AI projects.

NVIDIA Technical Blog 2026-06-01
Advancing AI Infrastructure for Agentic AI with NVIDIA DOCA In-Silicon Security

NVIDIA discusses advancements in AI infrastructure to support agentic AI systems, emphasizing secure, high-performance computing enabled by DOCA In-Silicon Security. The focus is on building secure "AI factories" for running autonomous agents.

NVIDIA Technical Blog 2026-06-01
NVIDIA Vera CPU Sets a New Standard for Agentic Workloads in AI Factories

NVIDIA introduces the Vera CPU, designed to optimize agentic workloads in AI factories by addressing scaling challenges through advanced computing architecture. The development aligns with evolving AI scaling laws, emphasizing efficiency for complex tasks like autonomous systems and large-scale AI operations.

NVIDIA Technical Blog 2026-05-29
DynoSim: Simulating the Pareto Frontier

NVIDIA introduces DynoSim, a tool for optimizing large language model (LLM) deployments by simulating trade-offs in system configurations like tensor-parallel shapes and worker splits, enabling efficient tuning of complex deployment stacks.
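
DynoSim's interface is not described here. The core "Pareto frontier" step is generic: given simulated throughput and latency for candidate deployment configurations, keep only those that no other configuration beats on both axes. A minimal sketch; the `Config` type, configuration names, and numbers are hypothetical and are not DynoSim output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str
    throughput: float  # higher is better, e.g. tokens/s per GPU
    latency: float     # lower is better, e.g. ms per output token

def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    no_worse = a.throughput >= b.throughput and a.latency <= b.latency
    strictly_better = a.throughput > b.throughput or a.latency < b.latency
    return no_worse and strictly_better

def pareto_frontier(configs: list[Config]) -> list[Config]:
    # O(n^2) scan: keep every configuration that no other configuration dominates.
    return [c for c in configs if not any(dominates(o, c) for o in configs)]

# Hypothetical simulated results for four tensor-parallel settings.
candidates = [
    Config("tp1", throughput=100, latency=20),
    Config("tp2", throughput=150, latency=30),
    Config("tp4", throughput=140, latency=35),  # dominated by tp2
    Config("tp8", throughput=200, latency=50),
]
print([c.name for c in pareto_frontier(candidates)])  # ['tp1', 'tp2', 'tp8']
```

Simulation is useful precisely because the candidate set (parallel shapes times worker splits) grows combinatorially, while the frontier that operators actually choose from stays small.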

NVIDIA Technical Blog 2026-06-01
NVIDIA DSX OS Delivers Open, Modular Software for Operating AI Factories at Scale

NVIDIA introduces DSX OS, an open and modular software platform designed to scale AI factories that generate intelligence through token-based workflows, addressing growing demands for AI infrastructure.

Reddit r/MachineLearning 2026-05-31
[D] Monthly Who's Hiring and Who wants to be Hired?

A monthly Reddit thread for Machine Learning professionals to post job openings or seek employment, using standardized templates for location, salary, work arrangement, and role descriptions. The community emphasizes experience-level alignment.

Reddit r/MachineLearning 2026-06-01
What’s the actual focus in World Models right now? [R]

The post questions the shift from self-supervised learning methods like Barlow Twins and DINO to scaled-up video generation in industry, while seeking to understand the current academic research focus on World Models.

Reddit r/MachineLearning 2026-05-31
Arabic ASR model struggling to converge during training [D]

A user is struggling to train a dialectal Arabic ASR model using SpeechBrain's LibriSpeech recipe, facing plateauing CTC and KL divergence losses despite various hyperparameter adjustments. The model fails to converge, resulting in near-100% validation WER, with the dataset being weakly labeled and non-public.
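
For context on the reported near-100% validation WER: word error rate is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, so it can even exceed 100% when a model emits many spurious words. A minimal sketch of the metric; it is not SpeechBrain's own scoring implementation, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat", "the dog sat"))  # 0.3333333333333333
print(word_error_rate("a b", ""))                     # 1.0
```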

Reddit r/MachineLearning 2026-05-30
Why do the output layer weights become word vectors in Word2Vec? [D]

A user seeks an intuitive and mathematical explanation for why the output layer weights in Word2Vec models encode semantic word representations, questioning why these parameters capture meaningful linguistic features rather than just serving predictive roles.
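
One standard way to see this, assuming the skip-gram formulation: each word w has an input vector v_w (a row of the input embedding matrix) and an output vector u_c (a row of the output weight matrix), and the model scores a context word c by their dot product:

$$p(c \mid w) = \frac{\exp(\mathbf{u}_c^\top \mathbf{v}_w)}{\sum_{c' \in V} \exp(\mathbf{u}_{c'}^\top \mathbf{v}_w)}, \qquad \frac{\partial \log p(c \mid w)}{\partial \mathbf{u}_{c'}} = \bigl(\mathbb{1}[c' = c] - p(c' \mid w)\bigr)\,\mathbf{v}_w$$

The gradient pulls u_c toward the input vectors of the center words c is observed with and pushes the other output vectors away, so words that occur around similar center words end up with similar output vectors. For skip-gram with k negative samples, Levy and Goldberg (2014) showed that, under the idealized assumption that each dot product can be fit independently, the optimum satisfies

$$\mathbf{u}_c^\top \mathbf{v}_w = \operatorname{PMI}(w, c) - \log k,$$

so the input and output matrices jointly factorize a shifted PMI matrix, and each factor inherits distributional (and hence semantic) information.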

Trending Repos

Top repositories this week, sorted by stars.

This repository provides a comprehensive guide to training large language models from scratch, covering data preparation to text generation. It is highly relevant as it addresses core LLM development techniques essential for research in agentic AI, foundation models, and MLOps.

llm · transformer · deep learning · fine-tuning

This repository is a course on building production-grade agentic systems with Retrieval-Augmented Generation (RAG), relevant to agentic AI, RAG, and LLM applications. It likely provides practical implementations of autonomous agents that combine retrieval and generation for real-world tasks.

agentic-ai · rag · llm
#3 anthropics/claude-code Agentic AI ★ 0

Claude Code is an agentic coding tool that integrates natural language processing with codebase interaction, enabling tasks like code explanation, git workflow management, and task automation. It directly advances research in Agentic AI by demonstrating practical applications of autonomous agents in software development workflows.

Agentic AI · LLM · NLP · agents

This repository provides a hands-on implementation of Retrieval-Augmented Generation (RAG) systems, a key technique for grounding LLMs in external knowledge. It offers foundational insight into combining retrieval and generation to improve LLM output.

llm · rag · retrieval · vector
#5 OpenBMB/VoxCPM Speech ★ 0

VoxCPM2 is a tokenizer-free TTS model enabling multilingual speech generation, voice cloning, and creative audio design. It advances speech synthesis and generative models, aligning with interests in speech technology and AI-driven audio creation.

speech · generative models · TTS · multilingual
#6 revfactory/harness Agentic AI ★ 0

This repository focuses on designing domain-specific agent teams and generating specialized agent skills, directly aligning with agentic AI research. Its emphasis on structuring collaborative agents is relevant to multi-agent systems and embodied AI applications.

agentic-ai · multi-agent-systems · mlops · llm

This repository uses large language models to generate high-definition short videos with one click. It demonstrates a practical application of generative AI to video creation, relevant to multimodal learning and generative models.

LLM · generative models · computer vision
#8 nicobailon/pi-subagents Agentic AI ★ 0

This repository extends the Pi framework to enable asynchronous subagent delegation with features like truncation, artifacts, and session sharing. It directly addresses challenges in agentic AI by enabling coordinated task execution across subagents, which is critical for complex multi-agent systems and embodied AI applications.

agentic-ai · multi-agent-systems · async-delegation · session-sharing
#9 Comfy-Org/ComfyUI Computer Vision ★ 0

ComfyUI is a modular GUI and backend for diffusion models, enabling advanced image generation and manipulation. It is highly relevant to computer vision research, particularly in generative models and multimodal applications.

diffusion · computer vision · generative models · GUI

This repository curates tools and patterns for AI agent harness engineering, focusing on orchestration, observability, and permissions—critical for building and managing multi-agent systems. It directly supports research and development in agentic AI by addressing infrastructure and operational challenges.

agentic-ai · mlops · ai-agents · observability