HCDE Capstone · Winter 2026

Retrieval Breakdown Under Vague Recall Across Fragmented Tools

Research Report & Spring Project Proposal

Asim Sayed

Executive Summary

We conducted semi-structured contextual inquiries with six graduate researchers (physics, computer science, engineering, design, machine learning), a competitive analysis of 28 tools across 9 categories, and a secondary research synthesis to investigate retrieval breakdown during active knowledge work. All six participants reported the same failure pattern: vague recall of relevant material triggers tool-hopping across a toolset averaging roughly 16 tools per participant. This disrupts cognitive flow and leads to idea abandonment (4/6 participants) or redundant recreation of existing work (3/6).

Two findings proved most consequential:

  1. Synthesis work happens in transient artifacts — slides, scratch documents, meeting notes — that are disconnected from source materials, breaking provenance (F5).
  2. The primary cost of retrieval is the mode switch from synthesis to search, not the search itself; the context switch degrades relational thinking, and that degradation persists after the search concludes (F6; Tankelevitch et al., 2025).

The competitive analysis confirmed that every tool analyzed requires this departure. Tools are either query-dependent (requiring precise keyword recall) or structure-dependent (requiring maintained organization). None keeps project context available within the user's active synthesis surface.

Contributions

(1) Identification of the retrieval-synthesis mode switch as the primary mechanism of flow disruption, distinct from search quality; (2) evidence that the mode switch is a structural property of the current tool landscape; (3) a design direction — workspace consolidation with persistent project context — with three design principles and a measurable outcome for Spring Quarter prototyping.

Introduction

About the Researcher

My work sits at the intersection of AI, accessibility, and product design. In this project, I study how systems hold context, how structure shapes the dialogue between human intention and machine intelligence, and where that structure fails.

Project Overview

Knowledge workers routinely manage 10–20 tools across their workflow — reference managers, cloud drives, note apps, slide decks, coding environments, messaging platforms. These tools serve distinct, legitimate purposes, and the fragmentation itself is not pathological. But when a user mid-task vaguely remembers a relevant source and must leave their current context to locate it across this fragmented landscape, the resulting retrieval breakdown is costly: flow disruption, idea abandonment, and redundant recreation of existing work.

Existing tools address retrieval through either precise search (which requires exact keyword recall) or stable organization (which requires consistent maintenance). Neither accommodates the vague, associative nature of human memory (Belkin, 1980; Mandler, 1980). We therefore investigated what happens when retrieval breaks down during active synthesis work — what triggers it, what makes it costly, and what the structure of the disruption reveals about possible interventions.

The key findings are that the mode switch from synthesis to search — not search quality — is the primary mechanism of disruption, and that this mode switch is a structural property of the current tool landscape.

Definition of Terms

Retrieval Breakdown
The failure to locate a vaguely remembered item across fragmented tools, resulting in abandoned search or redundant recreation.
Mode Switch
The cognitive shift from synthesis to search — the primary cost of retrieval disruption, not the search itself.
Vague Recall
Partial memory of content without the precise terms or location needed to retrieve it.
Recognition-Based Retrieval
Locating information by seeing and identifying it (visual/relational cues) rather than formulating a keyword query.
Source Traceability
A preserved link from a synthesized idea back to the raw material it came from.
Knowledge Work
Cognitively demanding, multi-source work — writing, analysis, design, coding — where working memory is fully occupied.
Tool Fragmentation
The distribution of a user's materials across many unconnected platforms, making retrieval a compound task of remembering both what and where.

Background & Problem Space

Target Audience

Knowledge workers who consume and produce high volumes of information across multiple platforms: they synthesize across sources, manage concurrent projects, use 10+ tools daily, and context-switch between consumption, analysis, and creation.

Theoretical Context

Three established frameworks informed our research design and analysis:

Anomalous State of Knowledge (ASK)

Belkin, 1980

Information needs arise from recognized gaps in understanding that the user cannot precisely articulate. Systems requiring precise keyword queries will systematically fail users with vague recall.

Recognition vs. Recall

Mandler, 1980

Recognition — identifying a previously encountered item — is substantially easier than recall — generating the item without cues. Systems presenting visual/relational cues should outperform those requiring precise search terms.

Cognitive Load & Context Switching

Tankelevitch et al., 2025

When retrieval interrupts in-flow work, users manage dual-task demand: holding primary task state while executing search. The cost is in the context switch, not the search duration.

Problem Framing

The most acute breakdown in personal knowledge management occurs not during capture or organization but during retrieval under vague recall — specifically when that retrieval interrupts active synthesis work. Users recall ideas vaguely, retrieval demands precision across fragmented tools, and after two to three failed attempts they abandon the effort.

Problem Statement

The mode switch from synthesis to search during active project work, triggered by vague recall across fragmented tools, disrupts cognitive flow and leads to derailment, abandonment, or redundant work.

Research Questions

RQ1

What breaks when users attempt retrieval during active knowledge work?

RQ2

How do current tools handle this, and where do they fall short?

Research Methods

Semi-structured Interviews

6 participants · 30–60 min each

Semi-structured contextual inquiries with six graduate researchers across physics, computer science, engineering, design, and machine learning. Each session combined open-ended questions about retrieval experiences with live workflow walkthroughs to surface both reported and observed breakdown patterns.

Interviews generated the primary evidence base. Themes from participant accounts (e.g., mode-switch cost, organizational decay) directed what to look for in the competitive analysis and which constructs to ground in secondary research.


Workflow Walkthrough

Contextual inquiry · embedded in interviews

During each interview, participants shared their screens and walked through real projects, showing how they organize files, search for past work, and handle retrieval mid-task. This revealed tool-hopping patterns and organizational structures that self-report alone would miss.

Walkthroughs validated interview claims with observable behavior. The tool inventory table and fragmentation data (avg. ~16 tools per participant) came directly from these sessions and informed the scope of the competitive analysis.


Competitive Analysis

28 tools across 9 categories

Systematic analysis of 28 tools spanning PKM, AI project containers, coding/workflow tools, visual canvas tools, research/discovery, meeting capture, agent memory frameworks, personal AI assistants, and AI-native computer use agents. Each tool was evaluated against retrieval-relevant capabilities identified from interview findings.

Interview findings defined the evaluation criteria: does this tool reduce the mode switch? Support vague recall? Maintain project context? The analysis confirmed that no existing tool structurally addresses the synthesis-retrieval separation participants described.


Secondary Research

Literature synthesis

Synthesis of academic literature and industry reports across information retrieval, agentic AI, memory systems, and cognitive load theory. Sources included recent work on hybrid search, agentic RAG, the Zettelkasten memory model, and the cognitive costs of context switching in knowledge work.

Secondary research grounded interview findings in established theory (Belkin's ASK model, Mandler's recognition-recall distinction, Tankelevitch et al. on cognitive load) and situated the competitive landscape within broader technical trends. Evidence mapping triangulated all three methods to validate each design requirement.


Participants

ID | Level | Field | Primary Focus | Tools
P1 | PhD, 1st Year | Physics | Electronic transport measurements | 13
P2 | MS, 2nd Year | HCDE | Capstone research, design artifacts | 14
P3 | MS, 2nd Year | Computer Science | Literature reviews, coding projects | 16
P4 | PhD, 4th Year | Civil Engineering | Geospatial data, statistical analysis | 18
P5 | PhD, 4th Year | ML / Geoscience | ML detection models, satellite data | 15
P6 | PhD, 4th Year | Civil Engineering | Geospatial data, statistical analysis | 22

Recruited across career stages (1st–4th year) and disciplines for variance in workflow maturity. All managed high-volume, multi-format inputs across 3+ platforms daily.
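As a quick check on the fragmentation figure, the per-participant tool counts in the Participants table can be summarized in a few lines. The counts are copied from the table and nothing else is assumed; the mean works out to about 16.3 tools and the median to 15.5.

```python
from statistics import mean, median

# Tool counts per participant, copied from the Participants table.
tool_counts = {"P1": 13, "P2": 14, "P3": 16, "P4": 18, "P5": 15, "P6": 22}

counts = list(tool_counts.values())
total = sum(counts)      # total tools reported across six participants
avg = mean(counts)       # arithmetic mean per participant
mid = median(counts)     # midpoint of the sorted counts (average of the middle two)
print(f"total={total}, mean={avg:.1f}, median={mid}, range={min(counts)}-{max(counts)}")
# prints: total=98, mean=16.3, median=15.5, range=13-22
```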

Tool Inventory Across Participants

The following maps tool categories observed during workflow walkthroughs, illustrating the fragmentation that makes retrieval a compound memory task (remembering both what and where).

CategoryP1P2P3P4P5P6
Note-takingNotion, hand notesiPad Notes, Sticky Notes, NotepadNotionNotion
File storageGoogle DriveGoogle DriveGoogle Drive (×3)OneDrive, DriveDrive, localOneDrive, Drive, local
WritingOverleaf, DocsGoogle DocsDocs, WordWordOverleafWord, Overleaf
ReferencesZoteroBookmarksBookmarksScholarScholarZotero, Scholar
Data / analysisJupyterExcel, SheetsJupyter, VS Code, QGISJupyter, PyTorchJupyter, VS Code, QGIS
PresentationSlidesFigma/FigJamPowerPointSlidesPowerPoint, Slides
AI toolsChatGPT, Gemini, ClaudeChatGPTChatGPT

Every participant spans 4–7 tool categories. No two share the same configuration — fragmentation is both universal and idiosyncratic.

Limitations

Sample homogeneity. All UW graduate students, predominantly STEM. Spring evaluation should test generalizability.

Single-coder analysis. One researcher conducted and analyzed all interviews; findings were triangulated across participants, the competitive analysis, and secondary research.

Transcription. AI-assisted via Granola; some quotes may be minor paraphrases.

Research Findings

Seven findings emerged, grouped by role in the argument.

# | Finding | Prevalence | Role
F5 | Synthesis happens in transient artifacts | 5/6 | Core
F6 | Search mode disrupts cognitive flow | 6/6 | Core
F1 | Retrieval breakdown follows consistent pattern | 6/6 | Mechanism
F4 | Current tools create false dichotomies | 6/6 | Mechanism
F2 | Organizational intent decays over time | 3/6 | Supporting
F3 | Knowledge work is inherently multi-tool | 6/6 | Supporting
F7 | Reading-writing gap compounds retrieval | 2/6 | Supporting

F5 and F6 form the core pair: synthesis and retrieval are interleaved activities forced into separate modes by current tools. F1 and F4 describe the mechanism by which this separation causes breakdown. F2, F3, and F7 provide structural context.

Retrieval breakdown pattern diagram
The retrieval breakdown sequence: vague recall triggers tool-hopping, failed searches, and abandonment.

The Core Pair

F5

Synthesis happens in transient artifacts

The most consequential thinking occurs not in knowledge management tools but in Google Slides, meeting notes, scratch documents, and draft paragraphs. These transient artifacts become the working source of truth — but are disconnected from the source materials they draw on.

"I wouldn't say PowerPoint is the source of truth — the code is — but PowerPoint filters the most important figures/plots for people who don't want technical details."

— P4
F6

Search mode disrupts cognitive flow

When users switch from synthesis to search, the context switch disrupts the mental state required for deep work. The cost is not time spent searching — it is the loss of the cognitive frame.

"My focus deviates from the deeper thinking needed to synthesize material and shifts to getting new information from search. Then I have to track what I left off on."

— P3

"Maybe two screens. Right now I switch tabs and lose context. Gemini is a good example — it opens a pop-up so I can search without fully leaving my current task."

— P2

The Mechanism

F1

Retrieval breakdown follows a consistent pattern

All six participants reported the same sequence: vague recall → tool-hopping → flow disruption → abandonment or redundant work, multiple times per work session.

"I remember reading something last week... I know it's somewhere... after a few tries I just gave up and moved on."

— P1
F4

Current tools create false dichotomies

Tools force a choice between precise search (requiring exact keywords) and stable organization (requiring consistent maintenance). Neither accommodates vague, associative recall.

"It requires you to remember precisely the keywords that are relevant and specific ones. Like, if you remember something too broad, then it's going to bring up 30 things."

— P1

Supporting Context

F2

Organizational intent decays over time

Tags drift, files accumulate in miscellaneous folders, users forget their own organizational logic. Decay is structural, not a discipline failure.

"What starts out working — as you gain more and more information — I usually don't adapt my system appropriately."

— P1
F3

Knowledge work is inherently multi-tool

Every participant used 10+ tools (average ~16), reflecting genuine workflow heterogeneity. Retrieval becomes a compound memory task: remembering both what and where.

"The cost is time. I have to remember where I wrote things. The other week I used Microsoft Notepad and completely lost that note sheet."

— P3
F7

The reading-writing gap

For researchers, 4–12 month gaps between literature review and writing compound retrieval difficulty and create attribution problems.

"That gap can be 4–5 months, 10 months, even a year. The larger the gap, the harder it is to remember."

— P4

Boundary Cases

Capture failures (P5): Primary frustration was failing to document intermediate process details — the information was never recorded, making retrieval impossible.

Infrastructure (P3): Fragmented storage across multiple Google accounts — an organizational problem retrieval alone cannot solve.

Adoption resistance (P3, P5): Both participants resisted adopting new tools even when their current systems were failing. Relatedly, P2 abandoned both Obsidian and Walling.

Competitive Analysis

We analyzed 28 tools across the design space, spanning 9 categories from personal knowledge management to AI-native computer use agents.

Structural Finding

Key Finding

All tools analyzed are either structure-dependent (user must remember where things are filed) or query-dependent (user must formulate a precise search). No tool reduces the mode switch between synthesis and retrieval.

By Category

PKM Tools with AI

Recall AI, Mem.ai, Notion AI, Obsidian

Reduce organizational burden but operate globally, not within project boundaries. Structure still decays.

AI Project Containers

NotebookLM, Claude Projects, ChatGPT Projects

Offer project-scoped context but operate reactively — users must stop, formulate a query, and interpret results. Mode switch remains.

AI Coding & Workflow

Claude Code, Cursor, Windsurf, GitHub Copilot

Demonstrate persistent project context in coding workflows. Their context engineering patterns inform the approach of Palace, the prototype proposed for Spring Quarter, but these tools are domain-specific.

Visual Canvas Tools

Heptabase, Scrintal, Kosmik

Support recognition through spatial organization, validating recognition-over-recall, but require full workflow migration.

AI Research & Discovery

Perplexity AI, Elicit

Powerful for finding new information but do not connect results to the user's existing project context or synthesis work.

Meeting & Capture

Granola AI, Notion AI Meetings, Otter AI

Capture transient artifacts (meeting notes, discussions) but create yet another silo disconnected from project sources.

Agent Memory Frameworks

Letta, Mem0, Zep/Graphiti, LangMem

Provide technical infrastructure for persistent memory in AI systems — potential building blocks for Palace's context layer.

Personal AI Assistants

Limitless, Supermemory, MyMind

Capture everything passively but lack project scoping — global memory without boundaries creates noise at scale.

AI-Native Computer Use

OpenClaw, Claude Cowork

Operate across the entire desktop, representing a paradigm shift toward ambient AI — but lack project-scoped retrieval and source traceability.

The Gap

Capability | Present in | Absent from
Project scoping | Claude Projects, NotebookLM | Recall AI, Mem.ai, Obsidian
Recognition-based retrieval | Heptabase, Scrintal | All AI containers
Source traceability | Heptabase | NotebookLM, Recall AI
Works within existing workflow | Recall AI, Mem.ai | Heptabase, Scrintal, Notion
Reduces mode switch | None | All 28 tools analyzed
Competitive analysis capability matrix
Capability matrix: 28 tools across 9 categories. Only Palace covers all key dimensions.

From Retrieval to Workspace Structure

Convergence

The findings build a cumulative case: retrieval fails because tools demand precision vague recall cannot provide (F1, F4); the failure is most damaging during synthesis because the mode switch destroys cognitive state (F5, F6); tool fragmentation is normal, not pathological (F3); organizational unification decays under its own maintenance burden (F2); and no existing tool structurally reduces the mode switch.

Improving search — faster, more semantic, AI-powered — addresses a symptom. The cause is a workspace structured so that synthesis and retrieval happen in separate places, forcing a context switch between them.

The Argument Chain

1

Observation

Users vaguely recall relevant material but cannot formulate precise queries to locate it (F1, F4). This is predicted by Belkin's ASK model — information needs arise from gaps the user cannot articulate.

2

Structural cause

The fragmented tool landscape (roughly 16 tools per participant) forces retrieval into a separate mode from synthesis. Every tool boundary is a potential context switch (F3). Organization cannot compensate because it decays faster than it is maintained (F2).

3

Core cost

The damage is not failed search but the mode switch itself. Shifting from synthesis to search degrades relational thinking, and that degradation persists after the search concludes (F5, F6; Tankelevitch et al., 2025).

4

Competitive confirmation

All 28 tools analyzed require this mode switch. Capabilities exist in isolation — project scoping, recognition, traceability, ambient operation — but no tool combines them.

5

Design implication

The intervention is structural, not functional. Reduce the number of surfaces where work happens and maintain project context across them — so retrieval does not require leaving the working context.

Toward Workspace Structure as Intervention

If the mode switch is the core problem, then the intervention is structural rather than functional. Our findings suggest three properties such a workspace would need:

1

Fewer Working Surfaces

Each tool boundary is a potential mode switch. A workspace that consolidates active work into fewer surfaces reduces the number of boundaries where cognitive continuity can break.

2

Persistent Project Context

Reading and writing need access to the same body of project knowledge — sources, ideas, connections. If this context is available within synthesis surfaces, the user does not need to leave their work to retrieve it.

3

Recognition-Based Access

When project memory is present alongside active work, it can be surfaced through visual and relational cues. This relies on recognition, which human memory handles well, rather than demanding precise recall.

Research evolution diagram
Research evolution: from initial assumptions through methods to validated findings.

Reframing for Spring

The question shifts from "how to improve retrieval" to: what workspace structure reduces the frequency and cost of the mode switch?

Design Direction

Design Principles

Three principles address the structural causes of retrieval breakdown. Their shared outcome is reduced context switching — retrieval that does not require leaving the active working surface, reducing the dual-task demand that degrades both search and synthesis performance (F5, F6).

1

Support Recognition Over Recall

Visual and relational cues over precision search. When project context is surfaced through spatial and associative presentation rather than keyword queries, retrieval aligns with how human memory actually works: recognition is substantially easier than recall (Mandler, 1980), so vague recall, which precise-query systems systematically fail (Belkin, 1980), need not trigger breakdown.

F1, F4, F6
2

Preserve Persistent Project Context

Bounded project memory that does not degrade with time or require maintenance. Tool fragmentation (roughly 16 tools per participant; F3) and organizational decay (F2) are not discipline failures; they are structural consequences of how work is spread across tools. A system-maintained project context layer removes the maintenance burden that causes organizational structures to erode.

F2, F3, F7
3

Maintain Visible Source Traceability

Ideas connected to originating sources; chain preserved from raw material → idea → synthesis as thinking evolves. Synthesis happens in transient artifacts disconnected from their sources (F5); traceability must be system-maintained, not user-maintained, to survive the reading-writing gaps that compound retrieval difficulty (F7).

F5, F7

From Findings to Principles

Principle | Findings | Evidence
Recognition Over Recall | F1, F4, F6 | Universal precision-search failure; Belkin's ASK model
Persistent Project Context | F2, F3, F7 | Organizational decay; 19x growth in project-scoped AI tools (OpenAI, 2025)
Source Traceability | F5, F7 | Synthesis disconnected from sources; citation loss over time

Outcome: Reduce Context Switching (F5, F6). All three principles converge on reducing the mode switch from synthesis to search. Participants consistently requested spatial, in-context retrieval (side panels, split views, overlays), and the competitive analysis found that none of the 28 tools analyzed achieves this. Reduced context switching is the measurable outcome by which we evaluate whether the three principles are working.

Design requirements hierarchy diagram
Three design principles converge on the outcome of reduced context switching.

Design Requirements

Persistent Project Context

Bounded memory holding materials, ideas, and relationships without user-maintained organization.

Recognition-Based Retrieval

Visual and relational cues for vague recall, surfaced within the active working context.

Source Traceability

Preserved provenance: raw material → idea → synthesis artifact.
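The provenance requirement (raw material → idea → synthesis artifact) can be read as a small directed graph: every idea and artifact records what it was derived from, and tracing back means walking those links until only raw sources remain. The sketch below is a hypothetical illustration, not Palace's implementation; the `Node` and `ProvenanceGraph` names, the three `kind` values, and the example items are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One item in a hypothetical provenance chain."""
    node_id: str
    kind: str                                              # "source", "idea", or "artifact"
    label: str
    derived_from: list[str] = field(default_factory=list)  # ids of parent nodes


class ProvenanceGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def add(self, node: Node) -> None:
        # Reject links to unknown parents so every chain stays traceable.
        missing = [p for p in node.derived_from if p not in self.nodes]
        if missing:
            raise KeyError(f"unknown parents: {missing}")
        self.nodes[node.node_id] = node

    def trace_sources(self, node_id: str) -> list[str]:
        """Walk parent links back from node_id and return the raw sources reached."""
        seen: set[str] = set()
        sources: list[str] = []
        stack = [node_id]
        while stack:
            current = self.nodes[stack.pop()]
            if current.node_id in seen:
                continue
            seen.add(current.node_id)
            if current.kind == "source":
                sources.append(current.node_id)
            stack.extend(current.derived_from)
        return sorted(sources)


# Hypothetical example: two sources feed one idea, which feeds one slide.
g = ProvenanceGraph()
g.add(Node("paper-a", "source", "Transport measurement paper"))
g.add(Node("notes-b", "source", "Lab notebook entry"))
g.add(Node("idea-1", "idea", "Anomaly in resistance curve", ["paper-a", "notes-b"]))
g.add(Node("slide-3", "artifact", "Group meeting slide", ["idea-1"]))
print(g.trace_sources("slide-3"))  # prints: ['notes-b', 'paper-a']
```

Rejecting dangling parents at insertion time is one way to keep traceability system-maintained rather than user-maintained. Because artifacts may derive from other artifacts, chains also survive iteration: a revised slide derived from `slide-3` would still trace back to both sources.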

Outcome: Reduce Context Switching

The measurable goal: retrieval without leaving the synthesis surface. Evaluated through time-to-retrieve, tool switches, failed attempts, abandonment rate, and self-reported flow disruption.

Proposed Measures

Measure | Type | Description
Time-to-retrieve | Primary | Time to locate vaguely remembered material
Tool switches | Primary | Applications opened during retrieval
Failed attempts | Primary | Unsuccessful searches before success or abandonment
Abandonment rate | Primary | Frequency of giving up
Flow interruption | Secondary | Self-reported cognitive disruption (Likert)
Cognitive load | Secondary | NASA-TLX
Reuse vs. re-creation | Secondary | Whether past work is reused or redundantly recreated
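Most of these measures can be computed from a per-episode log of each retrieval attempt. The sketch below assumes a hypothetical logging schema (`RetrievalEpisode`, one record per attempt), counts every application opened as a tool switch, and scores NASA-TLX as the unweighted Raw TLX (the mean of the six subscale ratings). The report does not specify these choices, and the episode values are illustrative, not study data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RetrievalEpisode:
    """One logged retrieval attempt (hypothetical schema)."""
    start_s: float          # seconds into the session when the search began
    end_s: float            # when the item was found or the participant gave up
    apps_opened: list[str]  # applications opened during the episode, in order
    failed_searches: int    # unsuccessful searches before the outcome
    outcome: str            # "found", "abandoned", or "recreated"


def summarize(episodes: list[RetrievalEpisode]) -> dict[str, float]:
    if not episodes:
        raise ValueError("no episodes to summarize")
    n = len(episodes)
    found = [e for e in episodes if e.outcome == "found"]
    return {
        # Time-to-retrieve is only defined for successful episodes.
        "mean_time_to_retrieve_s": mean(e.end_s - e.start_s for e in found) if found else float("nan"),
        "mean_tool_switches": mean(len(e.apps_opened) for e in episodes),
        "mean_failed_attempts": mean(e.failed_searches for e in episodes),
        "abandonment_rate": sum(e.outcome == "abandoned" for e in episodes) / n,
        "recreation_rate": sum(e.outcome == "recreated" for e in episodes) / n,
    }


TLX_SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings: dict[str, float]) -> float:
    """Raw (unweighted) NASA-TLX: mean of the six 0-100 subscale ratings."""
    missing = set(TLX_SUBSCALES) - set(ratings)
    if missing:
        raise ValueError(f"missing subscales: {sorted(missing)}")
    return mean(ratings[k] for k in TLX_SUBSCALES)


# Illustrative episodes, not study data.
episodes = [
    RetrievalEpisode(0, 95, ["Drive", "Notion", "Slides"], 2, "found"),
    RetrievalEpisode(300, 480, ["Drive", "Gmail", "Zotero", "Docs"], 4, "abandoned"),
    RetrievalEpisode(900, 960, ["Notion"], 0, "found"),
    RetrievalEpisode(1500, 1620, ["Drive", "Slides"], 3, "recreated"),
]
print(summarize(episodes))
```

Keeping abandoned and recreated episodes out of time-to-retrieve avoids mixing "gave up quickly" with "found quickly"; those outcomes are reported separately as rates.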

Spring Quarter Proposal

Design Question

What is the minimal workspace structure that reduces retrieval breakdown during active knowledge work, and what properties make it effective?

Prototype: Palace

Palace is a research-through-design prototype that restructures the fragmented knowledge work tool stack into fewer surfaces with persistent, shared project context. The goal is to reduce the retrieval-synthesis mode switch identified in our winter findings and to evaluate which workspace properties — project scoping, persistent context, recognition-based surfacing, source traceability — contribute most to that reduction.

1

Project as Container

Bounded context without global scope creep

2

Persistent Context Across Surfaces

Sources, ideas, and relationships available within reading and writing environments

3

Recognition-Based Surfacing

Visual and relational cues alongside active work

4

Source Traceability

System-maintained provenance from source → idea → synthesis
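To make "project as container" and "recognition-based surfacing" concrete, here is a toy ranking step for an ambient panel: only items in the active project are considered, and those sharing the most vocabulary with the text being written are offered as cues. This is a hypothetical sketch, not Palace's design. The `Item` type, the word-overlap (Jaccard) score standing in for semantic similarity, and the example items are assumptions; a real system would more likely use embeddings.

```python
import re
from dataclasses import dataclass


@dataclass
class Item:
    project: str
    title: str
    text: str


def terms(text: str) -> set[str]:
    # Lowercase words of 4+ letters: a crude stand-in for semantic features.
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def surface(draft: str, items: list[Item], project: str, k: int = 3) -> list[str]:
    """Return up to k titles from `project` whose text overlaps most with the draft."""
    draft_terms = terms(draft)
    scored = []
    for item in items:
        if item.project != project:  # project as container: ignore other projects
            continue
        item_terms = terms(item.text)
        union = draft_terms | item_terms
        score = len(draft_terms & item_terms) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, item.title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by title
    return [title for _, title in scored[:k]]


# Hypothetical project library.
items = [
    Item("thesis", "Hall bar measurement notes", "hall resistance measured at low temperature"),
    Item("thesis", "Group meeting slides", "plots of resistance versus magnetic field"),
    Item("thesis", "Reading list", "papers on graphene growth"),
    Item("side-project", "Resistance in climbing gear", "resistance testing of ropes"),
]
draft = "The resistance drops sharply at low temperature"
print(surface(draft, items, project="thesis"))
# prints: ['Hall bar measurement notes', 'Group meeting slides']
```

Filtering by project before scoring is what keeps the cue list bounded; ranking the whole library instead would reproduce the global-memory noise noted for personal AI assistants.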

Wireframes

Early wireframes explore three interaction modes: browsing project materials with persistent context, semantic search with provenance tracing, and ambient context surfacing alongside active work.

Palace Dashboard wireframe
Browse view — project materials with relational cues, supporting recognition over recall.
Palace Search View wireframe
Search view — semantic search with source traceability, showing provenance chains.
Palace Ambient Panel wireframe
Ambient panel — project context surfaced alongside active work, reducing the mode switch.

Methods

Research-through-design: Palace as a vehicle for inquiry into workspace structure's effect on retrieval behavior.

Baseline

Current fragmented workflow (Drive, Notes, Notion, Google, Slides)

Intervention

Palace — restructured workspace with persistent project context

Timeline

Weeks 1–2

Prototype Design & Development

Low-fidelity wireframes, interaction flows

Weeks 3–4

Prototype Iteration

Mid-fidelity interactive prototype

Weeks 5–6

Usability Testing

Wizard of Oz sessions with 4–6 participants

Weeks 7–8

Analysis & Iteration

Synthesized findings, design recommendations

Weeks 9–10

Final Deliverable

High-fidelity prototype, process book, final video

[Timeline graphic, Spring 2026, Weeks 1–10: Design & Dev (lo-fi wireframes), Prototype (mid-fi interactive), Testing (4–6 participants), Analysis (design recs), Deliverables (process book + video) · Research-through-design · Within-subjects usability testing]
Spring Quarter timeline: 10-week plan from design through deliverables.

Evaluation

Within-subjects comparative usability testing:

  1. Baseline: retrieval tasks using current workflow
  2. Intervention: comparable tasks using Palace
  3. Measures: time-to-retrieve, tool switches, failed attempts, abandonment, self-reported disruption
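A within-subjects comparison usually counterbalances condition order so that learning effects do not all favor the second condition, then compares each participant with themselves. The report does not specify an ordering scheme, so the AB/BA alternation below is an assumption, as are the participant IDs and tool-switch counts, which are illustrative only. With 4–6 participants, the sketch reports per-participant differences and their mean rather than a significance test.

```python
from statistics import mean

CONDITIONS = ("baseline", "palace")


def assign_orders(participant_ids: list[str]) -> dict[str, tuple[str, str]]:
    """Alternate AB/BA so each condition order is used (near-)equally often."""
    orders = [CONDITIONS, CONDITIONS[::-1]]
    return {pid: orders[i % 2] for i, pid in enumerate(participant_ids)}


def paired_differences(baseline: dict[str, float], palace: dict[str, float]) -> dict[str, float]:
    """Per-participant change (Palace minus baseline); negative means fewer or faster."""
    if baseline.keys() != palace.keys():
        raise ValueError("every participant needs a value in both conditions")
    return {pid: palace[pid] - baseline[pid] for pid in baseline}


# Illustrative tool-switch counts per participant, not study data.
ids = ["S1", "S2", "S3", "S4"]
orders = assign_orders(ids)
baseline_switches = {"S1": 5, "S2": 7, "S3": 4, "S4": 6}
palace_switches = {"S1": 2, "S2": 3, "S3": 2, "S4": 4}
diffs = paired_differences(baseline_switches, palace_switches)
print(orders)
print(diffs, "mean change:", mean(diffs.values()))
```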

Open Design Questions

1

What constitutes "minimal"?

Three surfaces is a hypothesis — the actual structure, and whether count or connectivity matters more, is what the prototype explores.

2

How does persistent context avoid becoming organizational burden?

The system must hold context without requiring user maintenance.

3

How does traceability survive evolving synthesis?

Provenance chains must accommodate iteration, not just initial capture.

4

How to evaluate structural effects in 10 weeks?

Simulated tasks with real materials; focus on observable mode-switch reduction.

Constraints

Appendix

References

Anderson, R. (2026, January 6). Keywords are not dead, but discovery is no longer just search. The Scholarly Kitchen. https://scholarlykitchen.sspnet.org/2026/01/06/keywords-are-not-dead-but-discovery-is-no-longer-just-search/

Bates, M. J. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5), 407–424. https://doi.org/10.1108/eb024320

Belkin, N. J. (1980). Anomalous states of knowledge as a basis for information retrieval. Canadian Journal of Information Science, 5, 133–143.

Dammu, P., & Roosta, T. (2026). Information seeking in the age of agentic AI [Tutorial]. CHIIR 2026. [PDF notes]

Google Cloud. (n.d.). About hybrid search. Vertex AI Documentation. https://cloud.google.com/vertex-ai/docs/vector-search/about-hybrid-search

Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review, 87(3), 252–271. https://doi.org/10.1037/0033-295X.87.3.252

Microsoft. (n.d.). Outperforming vector search with hybrid retrieval and reranking. Azure AI Search Documentation. https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/azure-ai-search-outperforming-vector-search-with-hybrid-retrieval-and-reranking/3929167

OpenAI. (2025). The state of enterprise AI: 2025 report. https://cdn.openai.com/pdf/7ef17d82-96bf-4dd1-9df2-228f7f377a29/the-state-of-enterprise-ai_2025-report.pdf

Singh, A., et al. (2025). Agentic retrieval-augmented generation: A survey. arXiv preprint arXiv:2501.09136. https://arxiv.org/abs/2501.09136

Tankelevitch, L., et al. (2025). The future of AI in knowledge work: Tools for thought. Proceedings of CHI 2025. Microsoft Research. https://www.microsoft.com/en-us/research/blog/the-future-of-ai-in-knowledge-work-tools-for-thought-at-chi-2025/

Xu, W., et al. (2025). A-MEM: Agentic memory for LLM agents. arXiv preprint arXiv:2502.12110. https://arxiv.org/abs/2502.12110

Zhang, Y., et al. (2025). Memory in the age of AI agents: A survey. arXiv preprint arXiv:2512.13564. https://arxiv.org/abs/2512.13564