Add strategy to summarize AI message history #630

Open · wants to merge 58 commits into base: main

Conversation

@MarceloRGonc (Contributor) commented May 16, 2025

Fixes OPS-1787.

Additional notes:

We only summarize after an error.

We summarize the chat in the following situations (a sketch of this flow follows the list):

  • In selectRelevantTools, when there is a token-related error, we request a summary and retry the request.
  • In the MCP controller:
    • In the event of a general error, we request a summary in the background and append 'Please try again' to the error message.
    • If an error occurs while streaming to the user and the error is token-related, we request a summary and retry the request.
    • When streaming finishes with finish reason 'length', we request a summary and append to the message: 'The message was truncated because the maximum tokens for the context window was reached. Please try again.'
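
A minimal sketch of that controller-level flow, where streamResponse, summarizeChatHistory, isTokenLimitError, and appendToStream are hypothetical stand-ins for the PR's actual helpers:

import type { CoreMessage } from 'ai';

// Hypothetical stand-ins for the PR's actual helpers.
declare function streamResponse(messages: CoreMessage[]): Promise<{ finishReason: string }>;
declare function summarizeChatHistory(messages: CoreMessage[]): Promise<CoreMessage[]>;
declare function isTokenLimitError(error: unknown): boolean;
declare function appendToStream(text: string): void;

async function streamWithSummarizeRetry(messages: CoreMessage[]): Promise<void> {
  try {
    const { finishReason } = await streamResponse(messages);
    if (finishReason === 'length') {
      // The context window was exhausted mid-response: summarize, then ask the user to retry.
      await summarizeChatHistory(messages);
      appendToStream('The message was truncated because the maximum tokens for the context window was reached. Please try again.');
    }
  } catch (error) {
    if (isTokenLimitError(error)) {
      // Token-related streaming error: summarize and retry the request once.
      const summarized = await summarizeChatHistory(messages);
      await streamResponse(summarized);
    } else {
      // General error: summarize in the background and surface the error.
      void summarizeChatHistory(messages);
      const message = error instanceof Error ? error.message : String(error);
      throw new Error(`${message} Please try again.`);
    }
  }
}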

Summarization strategy

  • We ask the LLM to generate the summary.
  • If the last message is from the user, we exclude it from the summary and re-append it to the end afterwards.
  • If the request to the LLM fails, we fall back to truncating the chat history based on user interactions; I added an environment variable to control this (a sketch of the fallback follows).
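
A minimal sketch of the truncation fallback, assuming an "interaction" starts at a user message and that keepInteractions comes from the new environment variable (both names are illustrative):

import type { CoreMessage } from 'ai';

// Fallback when the summary request fails: keep only the last N user
// interactions, i.e. everything from the Nth-from-last user message onward.
function truncateByUserInteractions(
  messages: CoreMessage[],
  keepInteractions: number,
): CoreMessage[] {
  const userIndexes = messages
    .map((message, index) => (message.role === 'user' ? index : -1))
    .filter((index) => index >= 0);
  if (userIndexes.length <= keepInteractions) {
    return messages;
  }
  const start = userIndexes[userIndexes.length - keepInteractions];
  return messages.slice(start);
}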

@MarceloRGonc changed the base branch from main to mg/OPS-1787-2 May 19, 2025 10:18
@MarceloRGonc changed the title from "Summarize chat history" to "Add strategy to summarize AI message history" May 19, 2025
@MarceloRGonc marked this pull request as ready for review May 19, 2025 10:22
@MarceloRGonc requested a review from Copilot May 19, 2025 10:22
Base automatically changed from mg/OPS-1787-2 to main May 19, 2025 10:22
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR adds a flexible summarization strategy for AI message history by introducing new system properties to cap history by tokens or message count and wiring summarization logic into the chat flow.

  • Defines new system props (MAX_LLM_CALLS_WITHOUT_INTERACTION, MAX_TOKENS_IN_LLM_HISTORY, MAX_MESSAGES_IN_LLM_HISTORY) with defaults.
  • Implements summarizeMessages, integrates it into the controller, and persists both full and summarized histories.
  • Adds unit tests for summarizeMessages.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

File summary:

  • packages/server/shared/src/lib/system/system.ts: Added default values for the new LLM history props
  • packages/server/shared/src/lib/system/system-prop.ts: Expanded the AppSystemProp enum with the new caps
  • packages/server/api/test/unit/ai/ai-message-history-summarizer.test.ts: New tests covering summarization behavior
  • packages/server/api/src/app/ai/chat/ai-message-history-summarizer.ts: Implemented the history summarizer utilities
  • packages/server/api/src/app/ai/chat/ai-mcp-chat.controller.ts: Integrated the summarizer into the chat controller and streaming
  • packages/server/api/src/app/ai/chat/ai-chat.service.ts: Added storage and append/delete for the summarized history
  • packages/openops/src/lib/ai/providers/openai.ts: Added the compatibility: 'strict' flag
Comments suppressed due to low confidence (1)

packages/server/api/src/app/ai/chat/ai-chat.service.ts:122

  • Newly added functions (getSummarizedChatHistory, appendMessagesToSummarizedChatHistory, deleteSummarizedChatHistory) lack unit tests; consider adding tests to validate summarized-history storage and retrieval behavior.
export const getSummarizedChatHistory = async (
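
A sketch of what such tests could look like, assuming the functions are keyed by chatId, store CoreMessage arrays, and that deletion leaves an empty history; the real signatures are not visible in this excerpt:

import { CoreMessage } from 'ai';
import {
  appendMessagesToSummarizedChatHistory,
  deleteSummarizedChatHistory,
  getSummarizedChatHistory,
} from '../../../src/app/ai/chat/ai-chat.service';

describe('summarized chat history storage', () => {
  const chatId = 'test-chat-id';

  afterEach(async () => {
    await deleteSummarizedChatHistory(chatId);
  });

  it('appends and retrieves summarized messages', async () => {
    const messages: CoreMessage[] = [
      { role: 'user', content: 'hello' },
      { role: 'assistant', content: 'summary of the prior conversation' },
    ];
    await appendMessagesToSummarizedChatHistory(chatId, messages);
    await expect(getSummarizedChatHistory(chatId)).resolves.toEqual(messages);
  });

  it('returns an empty history after deletion', async () => {
    await appendMessagesToSummarizedChatHistory(chatId, [{ role: 'user', content: 'hello' }]);
    await deleteSummarizedChatHistory(chatId);
    await expect(getSummarizedChatHistory(chatId)).resolves.toEqual([]);
  });
});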

@MarceloRGonc changed the base branch from main to mg/simplify-recursion May 21, 2025 14:08
Base automatically changed from mg/simplify-recursion to main May 21, 2025 17:57
Base automatically changed from mg/OPS-1787-3 to main June 2, 2025 07:35
@@ -132,38 +158,50 @@ export const aiMCPChatController: FastifyPluginAsyncTypebox = async (app) => {
isTablesLoaded,
});

const streamMessagesParams: StreamMessagesParams = {
@rita-gorokhod (Contributor) commented Jun 10, 2025

Please extract everything from this point onward, including the helper function streamMessages; it's very difficult to review, especially with the error handling. There shouldn't be this much logic in a controller.
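
One possible shape for that extraction, where buildStreamMessagesParams and handleChatStream are illustrative names rather than the PR's API; the controller only builds the params and delegates:

import type { FastifyPluginAsyncTypebox } from '@fastify/type-provider-typebox';

type StreamMessagesParams = unknown; // stand-in for the PR's real params type

// Illustrative extraction targets; these names are assumptions.
declare function buildStreamMessagesParams(request: unknown): StreamMessagesParams;
declare function handleChatStream(params: StreamMessagesParams, reply: unknown): Promise<void>;

export const aiMCPChatController: FastifyPluginAsyncTypebox = async (app) => {
  app.post('/', async (request, reply) => {
    // The controller stays thin: streaming, retries, and error handling all
    // live in the extracted module.
    const streamMessagesParams = buildStreamMessagesParams(request);
    await handleChatStream(streamMessagesParams, reply);
  });
};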

@@ -88,6 +88,9 @@ const systemPropDefaultValues: Partial<Record<SystemProp, string>> = {
[AppSystemProp.SUPERSET_MCP_SERVER_PATH]: '/root/.mcp/superset',
[AppSystemProp.DOCS_MCP_SERVER_PATH]: '/root/.mcp/docs.openops.com',
[AppSystemProp.LOAD_EXPERIMENTAL_MCP_TOOLS]: 'false',
[AppSystemProp.MAX_LLM_CALLS_WITHOUT_INTERACTION]: '10',
[AppSystemProp.MAX_TOKENS_FOR_HISTORY_SUMMARY]: '2000',
Contributor

How does it work with our huge prompt and small models?

Contributor Author

I don't think there are any models with a context window smaller than ~4000 tokens, so I'm defining half of that as the default. The prompt is not included in the summary.

chatId,
[],
async (existingMessages) => {
let reAdd = false;
Contributor

I think removing and re-adding is hard to read.

async function summarizeChatHistory(messages) {
  const lastMessage = messages[messages.length - 1];
  const isLastMessageFromUser = lastMessage && lastMessage.role === 'user' && messages.length > 1;
  const messagesForSummary = isLastMessageFromUser ? messages.slice(0, -1) : messages;

  const summarizedHistory = await requestToGenerateSummary(languageModel, messagesForSummary, aiConfig);
  return isLastMessageFromUser ? [...summarizedHistory, lastMessage] : summarizedHistory;
}

wdyt?

model: languageModel,
system: systemPrompt,
...aiConfig.modelSettings,
maxTokens: getHistoryMaxTokens(aiConfig),
Contributor

How does maxTokens work in this case?

Contributor Author

WDYM? maxTokens is the maximum number of tokens to generate, so it will be the value from the env var (2000 by default) or the value defined in the model settings.
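
A sketch of that resolution, assuming an explicit model-settings value takes precedence over the env-var default (the PR's actual precedence isn't shown in this excerpt):

type AiConfig = { modelSettings?: { maxTokens?: number } }; // simplified for the sketch

const DEFAULT_HISTORY_SUMMARY_MAX_TOKENS = 2000; // mirrors MAX_TOKENS_FOR_HISTORY_SUMMARY

function getHistoryMaxTokens(aiConfig: AiConfig): number {
  // A cap on the tokens generated for the summary, not on the history length.
  return aiConfig.modelSettings?.maxTokens ?? DEFAULT_HISTORY_SUMMARY_MAX_TOKENS;
}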

Contributor

Yes, but does it mean that the history will be at most 2000 tokens? What if it needs more?

languageModel: LanguageModel,
messages: CoreMessage[],
aiConfig: AiConfig,
attemptIndex = 0,
Contributor

I think the recursion here with attemptIndex is confusing. Why not call an inner function instead? This method is supposed to be called only twice, no? Once for the full input, and once for the last two interactions; if those fail, we don't try anymore.

Contributor Author

I think these retries (2) are enough. However, I'm not sure whether you want it to be configurable. Can I always assume 2 retries?

@rita-gorokhod (Contributor) commented Jun 12, 2025

Even if it's configurable, you can do it without recursion.
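
A sketch of the non-recursive version, trying the full history first and then the last two interactions, with the helper names treated as given:

import type { CoreMessage, LanguageModel } from 'ai';

type AiConfig = unknown; // stand-in for the PR's real type

// Treated as given: the PR's summary request and the truncation fallback.
declare function requestToGenerateSummary(
  languageModel: LanguageModel,
  messages: CoreMessage[],
  aiConfig: AiConfig,
): Promise<CoreMessage[]>;
declare function truncateByUserInteractions(
  messages: CoreMessage[],
  keepInteractions: number,
): CoreMessage[];

async function summarizeMessages(
  languageModel: LanguageModel,
  messages: CoreMessage[],
  aiConfig: AiConfig,
): Promise<CoreMessage[]> {
  // First candidate: the full history; second: the last two interactions.
  const candidates = [messages, truncateByUserInteractions(messages, 2)];
  for (const candidate of candidates) {
    try {
      return await requestToGenerateSummary(languageModel, candidate, aiConfig);
    } catch {
      // Fall through to the next, smaller candidate.
    }
  }
  // Both summary attempts failed: fall back to plain truncation.
  return truncateByUserInteractions(messages, 2);
}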

const errorMessage = error instanceof Error ? error.message : String(error);

attemptIndex = attemptIndex + 1;
if (!shouldTryToSummarize(errorMessage, attemptIndex)) {
@rita-gorokhod (Contributor) commented Jun 10, 2025

I don't think this should be here. This gives you the tool selection; it should already receive summarized data if summarization was needed. All the error handling and summarization should be done at the caller level.

Contributor Author

We are calling the LLM to select the tools, and we are using the entire history. It may be necessary to summarize here, since we only summarize on error.

@rita-gorokhod (Contributor) commented Jun 12, 2025

Sure, but then you can do it in the caller:
call tools with everything --> failed on max tokens? --> summarize --> call tools with the summarized history.

Otherwise this logic is spread across multiple places.
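
A sketch of that caller-level flow, with selectRelevantTools, isTokenLimitError, and summarizeChatHistory as stand-ins for the PR's actual helpers:

import type { CoreMessage } from 'ai';

type Tool = unknown; // stand-in for the PR's tool type

declare function selectRelevantTools(messages: CoreMessage[]): Promise<Tool[]>;
declare function isTokenLimitError(error: unknown): boolean;
declare function summarizeChatHistory(messages: CoreMessage[]): Promise<CoreMessage[]>;

async function selectToolsWithSummarizeFallback(messages: CoreMessage[]): Promise<Tool[]> {
  try {
    // Call tools with everything first.
    return await selectRelevantTools(messages);
  } catch (error) {
    if (!isTokenLimitError(error)) {
      throw error;
    }
    // Failed on max tokens: summarize, then call tools with the summary.
    const summarized = await summarizeChatHistory(messages);
    return selectRelevantTools(summarized);
  }
}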

@rita-gorokhod (Contributor) left a comment

See comments; this needs some refactoring to simplify the code. It's very difficult to read.
