Skip to content

Summarisation improvements implementation#684

Open
Niharika0306 wants to merge 2 commits intoIBM:mainfrom
Niharika0306:summarisation_changes_rebased
Open

Summarisation improvements implementation#684
Niharika0306 wants to merge 2 commits intoIBM:mainfrom
Niharika0306:summarisation_changes_rebased

Conversation

@Niharika0306
Copy link
Copy Markdown
Contributor

Proposal plan for this PR : #621


# Get actual token count from the input text
input_tokens = await asyncio.to_thread(
lambda: len(tokenize_with_llm(content_text, llm_endpoint))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope there are no limitation on the content size that this API can handle, please validate once

Copy link
Copy Markdown
Member

@manalilatkar manalilatkar Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Dharaneesh's comment:
Hope there are no limitation on the content size that this API can handle, please validate once

Comment thread spyre-rag/src/summarize/summ_utils.py Outdated
Comment on lines +51 to +58
minimum_output_tokens = int(settings.summarize.minimum_summary_words / settings.common.llm.token_to_word_ratio_en)

# Hard limit: input + prompt + minimum_output must fit in context
max_allowed_input_tokens = (
settings.common.llm.granite_3_3_8b_instruct_context_length -
settings.summarize.summarization_prompt_token_count -
minimum_output_tokens
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these can be defined in global level based on current config. No need to calculate it for every request.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Also, add the calculation equation in comments so there is no confusion later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Will do.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed. It is defined at module level now since they are derived values.

Comment thread spyre-rag/src/summarize/summ_utils.py Outdated
level_config = getattr(settings.summarize.summarization_levels, summary_level)

# Calculate ideal output tokens for this level
ideal_output_tokens = int(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this one also not depends on the input tokens which means we can define it global level.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed as part of next comment.

Comment thread spyre-rag/src/summarize/summ_utils.py Outdated
(1 + settings.summarize.summarization_coefficient * level_config.multiplier)
)

if available_output_tokens < ideal_output_tokens:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain with a comment on which scenario this can happen?
But wondering why are we comparing with a generic output tokens with output tokens based on current input?
I feel we need to calculate the ideal token also based on input and compare, if it available is still less than the ideal count, than we can print this message IMO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the calculation was complex and wrong before. Its corrected now.

Comment thread spyre-rag/src/summarize/summ_utils.py Outdated

# Calculate ideal target based on input tokens and level multiplier
base_target_tokens = int(input_tokens * settings.summarize.summarization_coefficient)
ideal_target_tokens = int(base_target_tokens * level_config.multiplier)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this should be used and the warning can be printed here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread spyre-rag/src/summarize/summ_utils.py Outdated
max_possible_words = int(available_output_tokens * settings.common.llm.token_to_word_ratio_en)
max_words = min(max_words, max_possible_words)

# Add small buffer to max_tokens
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment on why this buffer is needed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment thread spyre-rag/src/summarize/summ_utils.py Outdated
return target_word_count, min_words, max_words, max_tokens


def compute_target_and_max_tokens(input_tokens: int, input_word_count: int, summary_length: Optional[int]):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input_word_count seems unused

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Comment thread spyre-rag/src/summarize/summ_utils.py Outdated
return target_word_count, min_words, max_words, max_tokens


def compute_target_and_max_tokens(input_tokens: int, input_word_count: int, summary_length: Optional[int]):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use the new token calculation approach in this method as well?
Just wanted to see the default method also uses token based limit calculation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.


# Get actual token count from the input text
input_tokens = await asyncio.to_thread(
lambda: len(tokenize_with_llm(content_text, llm_endpoint))
Copy link
Copy Markdown
Member

@manalilatkar manalilatkar Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Dharaneesh's comment:
Hope there are no limitation on the content size that this API can handle, please validate once

Comment thread spyre-rag/src/summarize/app.py Outdated
f"automatic length: {target_words} words"
)

messages = build_messages(content_text, target_words, min_words, max_words, has_length_spec)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a new variable called has_length_spec, this line can just be:
messages = build_messages(content_text, target_words, min_words, max_words, (summary_length is not None or summary_level is not None) )

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment thread spyre-rag/src/summarize/app.py Outdated
"|-------|------|----------|-------------|\n"
"| `text` | string | Yes | Plain text content to summarize |\n"
"| `length` | integer | No | Desired summary length in words |\n"
"| `summary_level` | string | No | Abstraction level: 'brief', 'standard' (default), or 'detailed' |\n"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we agree to change the name of the parameter from summary_level -> level
Also in line no 275, 276, 281, 297, 300, 303.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, forgot to reflect the changes here. Will do.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Comment thread spyre-rag/src/summarize/app.py Outdated
Comment thread spyre-rag/src/summarize/summ_utils.py Outdated
Comment on lines +51 to +58
minimum_output_tokens = int(settings.summarize.minimum_summary_words / settings.common.llm.token_to_word_ratio_en)

# Hard limit: input + prompt + minimum_output must fit in context
max_allowed_input_tokens = (
settings.common.llm.granite_3_3_8b_instruct_context_length -
settings.summarize.summarization_prompt_token_count -
minimum_output_tokens
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Also, add the calculation equation in comments so there is no confusion later.

Comment thread spyre-rag/src/summarize/summ_utils.py Outdated
Signed-off-by: Niharika Gurram <niharika.gurram1@ibm.com>
400, "INVALID_PARAMETER",
"Cannot specify both 'summary_level' and 'length'. Please use only one."
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Niharika0306, this block of code from line no 159 to 194 becomes much simpler and cleaner ( only 3 lines ) if you create functions validate_input_word_count and compute_target_and_max_tokens that accept both length and level. Just a suggestion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, refactored a bit. Now there's only one instance of functions in case of both length and level.

Signed-off-by: Niharika Gurram <niharika.gurram1@ibm.com>
@Niharika0306 Niharika0306 force-pushed the summarisation_changes_rebased branch from 995253a to 8df02fc Compare April 30, 2026 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants