Distributed and Self-organizing Systems

Master's Thesis

Context‑Aware Caching for Long‑Form User-Assistant Dialogues in Distributed Systems

Completion

2025/10

Research Area

Web Engineering

Students

Artur Vashchenkov

Advisers

haas

gaedke

Description

Large Language Models (LLMs) have fundamentally transformed web-based conversational applications, enabling sophisticated, context-sensitive interactions widely employed in domains such as customer support, product FAQs, and technical assistance services. These web-based systems frequently handle repetitive or semantically similar queries, presenting clear opportunities for caching to optimize performance, reduce response latency, and cut operational expenses in distributed web architectures. Several caching strategies addressing these needs have already emerged, ranging from vendor-specific implementations, such as OpenAI's proprietary caching API, to vendor-agnostic middleware solutions such as GPT-Cache and MeanCache.
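For orientation, the sketch below illustrates the embedding-based lookup that vendor-agnostic semantic caching middleware typically performs: a query is embedded, compared against stored entries by cosine similarity, and a sufficiently similar match is served from the cache instead of calling the LLM. The embed stand-in, the SemanticCache class, and the similarity threshold are illustrative assumptions, not the API of GPT-Cache, MeanCache, or any other specific library.

```python
# Minimal sketch of embedding-based semantic caching (illustrative only).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; a real system would call a sentence-embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

class SemanticCache:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold  # minimum cosine similarity for a cache hit (assumed value)
        self.entries: list[tuple[np.ndarray, str]] = []  # (query embedding, cached response)

    def lookup(self, query: str) -> str | None:
        q = embed(query)
        for vec, response in self.entries:
            # Vectors are unit-length, so the dot product is the cosine similarity.
            if float(np.dot(q, vec)) >= self.threshold:
                return response  # cache hit: skip the LLM call
        return None

    def store(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))
```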

Although existing caching strategies provide certain performance improvements, they have notable limitations in managing complex, multi-turn conversational contexts within distributed web systems. Vendor-specific solutions (e.g. from OpenAI), while highly performant, inherently lack flexibility and cross-platform integration capabilities, leading to lock-in within a closed ecosystem. Meanwhile, vendor-agnostic middleware solutions such as GPT-Cache or MeanCache are primarily optimized for isolated query-response patterns or short-context conversations, which leads to degraded semantic accuracy as conversations grow in length and complexity. This thesis therefore aims to address these shortcomings by conceptualizing a caching mechanism suitable for long-context conversations that can be deployed as a component in a distributed system.

The thesis encompasses a thorough problem analysis of current state-of-the-art caching strategies, comparing them systematically against specific requirements aimed at solving the identified problems. Based on these findings, a new caching approach must be conceptualized and its feasibility demonstrated through the design and implementation of a prototype tailored specifically for integration into distributed system architectures. In particular, the architecture must not assume that it runs on the same host system as the inference engine. To validate this prototype, it must be tested using (possibly synthetic) conversational datasets that reflect realistic, multi-turn user-assistant interactions. The solution's effectiveness will be assessed by comparing its performance against both the defined requirements and current state-of-the-art methods.
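Purely as an illustration of the problem setting, and not as the design developed in this thesis, the following sketch builds on the SemanticCache above and shows one conceivable way to make the lookup context-aware: the current query is embedded together with a window of recent conversation turns, so the cache can operate as a standalone component that only sees message history, independent of the inference engine's host. The ContextAwareCache name, the turn window, and the key construction are assumptions for illustration.

```python
# Illustrative sketch only: folding recent conversation turns into the cache key.
# Reuses the SemanticCache sketch shown earlier in this description.
from dataclasses import dataclass, field

@dataclass
class ContextAwareCache:
    base: SemanticCache = field(default_factory=SemanticCache)
    window: int = 4  # number of recent turns folded into the cache key (assumed)

    def _key(self, history: list[str], query: str) -> str:
        # Concatenate the most recent turns with the new query; a refined design
        # could weight turns by recency or summarize older context instead.
        return "\n".join(history[-self.window:] + [query])

    def lookup(self, history: list[str], query: str) -> str | None:
        return self.base.lookup(self._key(history, query))

    def store(self, history: list[str], query: str, response: str) -> None:
        self.base.store(self._key(history, query), response)
```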

