Commit 67503d9

kieran-wilkinson-4anthony-nhsbencegadanyi1-nhs
authored
Update: [AEA-6157] - Exceed Context Window Issue (#394)
## Summary
Updates the bot to use Prompt Reformulation and Fixed Size Chunking.

### Details
1. Slack messages pass through a "Reformulation" prompt before reaching the RAG step
   - Looks for questions within the user's query
   - Finds sub-questions and important details
   - Searches the knowledge base to improve the search query, including key words, abbreviations, etc.
2. Use Fixed Size Chunking to reduce the risk of exceeding the context window
   - Setting chunk sizes lets us control the size of the documents pulled back for RAG
   - Size and overlap can be customised, allowing fine-tuning between comprehension and hallucination at a later date

---------

Co-authored-by: Anthony Brown <anthony.brown8@nhs.net>
Co-authored-by: bencegadanyi1-nhs <bence.gadanyi1@nhs.net>
1 parent 062b740 commit 67503d9

25 files changed

Lines changed: 703 additions & 475 deletions

packages/cdk/nagSuppressions.ts

Lines changed: 16 additions & 0 deletions
@@ -80,6 +80,22 @@ export const nagSuppressions = (stack: Stack, account: string) => {
     ]
   )
 
+  // Suppress unauthenticated API route warnings
+  safeAddNagSuppression(
+    stack,
+    "/EpsAssistMeStack/Apis/EpsAssistApiGateway/ApiGateway/Default/slack/commands/POST/Resource",
+    [
+      {
+        id: "AwsSolutions-APIG4",
+        reason: "Slack command endpoint is intentionally unauthenticated."
+      },
+      {
+        id: "AwsSolutions-COG4",
+        reason: "Cognito not required for this public endpoint."
+      }
+    ]
+  )
+
   // Suppress missing WAF on API stage for Apis construct
   safeAddNagSuppression(
     stack,
Lines changed: 39 additions & 4 deletions
@@ -1,5 +1,40 @@
-Return the user query exactly as provided without any modifications, changes, or reformulations.
-Do not alter, rephrase, or modify the input in any way.
-Simply return: {{user_query}}
+<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+You are an expert RAG query and context optimizer. Your task is to analyze verbose user queries and raw search context, stripping away all conversational filler to output a concise, impactful summary.
 
-User Query: {{user_query}}
+You must:
+1. Extract the core objective into a single, direct question.
+2. Capture individual questions and their specific needs.
+3. Isolate critical variables, specific states, and constraints required to solve the problem.
+4. Enhance the question(s) with relevant terminology from the search results
+
+Output your response strictly using the following XML structure:
+<optimized_query> (The short, direct question)
+<key_variables> (Bullet points of critical states, statuses, or constraints)
+<|eot_id|>
+
+<|start_header_id|>user<|end_header_id|>
+### User Query
+Hi, I need some help figuring out the PTO rules for one of my team members. They started as part-time 6 months ago, but they just transitioned to full-time last week (let's say exactly 7 days ago). They currently have 2 days of PTO saved up from their part-time stint. They want to take next week off entirely, which would require 5 days of PTO. Can they do this, effectively going to a -3 balance, since they are full-time now?
+<|eot_id|>
+
+<|start_header_id|>assistant<|end_header_id|>
+<optimized_query>
+Can a recently transitioned full-time employee with 6 months total tenure and 2 accrued PTO days take 5 days off, resulting in a -3 PTO balance?
+</optimized_query>
+<key_variables>
+- Current Status: Full-time (transitioned 7 days ago)
+- Total Tenure: 6 months
+- Current PTO Balance: 2 days
+- Requested PTO: 5 days (resulting in -3 balance)
+</key_variables>
+<|eot_id|>
+
+<|start_header_id|>user<|end_header_id|>
+### Search Context
+$search_results$
+
+### User Query
+{{user_query}}
+<|eot_id|>
+
+<|start_header_id|>assistant<|end_header_id|>
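
Review note: the reformulation prompt above asks the model for `<optimized_query>` and `<key_variables>` blocks, so whatever consumes the output has to pull those two blocks back out. The diff does not show that consumer. The following is a minimal sketch of one way to do it, assuming the model closes both tags as in the few-shot example; `ReformulatedQuery`, `extractTag`, and `parseReformulation` are hypothetical names, not part of this commit.

```typescript
// Hypothetical shape for the parsed reformulation output (not from the commit).
interface ReformulatedQuery {
  optimizedQuery: string
  keyVariables: string[]
}

// Return the trimmed text between <tag> and </tag>, or undefined if the pair is absent.
function extractTag(text: string, tag: string): string | undefined {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(text)
  return match ? match[1].trim() : undefined
}

// Parse the model output. If the expected structure is missing or empty,
// fall back to the raw user query so retrieval still has something to search with.
function parseReformulation(modelOutput: string, fallbackQuery: string): ReformulatedQuery {
  const optimizedQuery = extractTag(modelOutput, "optimized_query") || fallbackQuery
  const variablesBlock = extractTag(modelOutput, "key_variables") ?? ""
  const keyVariables = variablesBlock
    .split("\n")
    .map(line => line.replace(/^\s*-\s*/, "").trim()) // drop the leading "- " bullet
    .filter(line => line.length > 0)
  return {optimizedQuery, keyVariables}
}
```

Falling back to the original query is a design choice for this sketch: a malformed reformulation then degrades to the pre-commit behaviour instead of failing the request.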
Lines changed: 28 additions & 20 deletions
@@ -1,24 +1,32 @@
-# 1. Persona & Logic
-You are an AI assistant for onboarding guidance. Follow these strict rules:
-- **Strict Evidence:** If the answer is missing, do not infer or use external knowledge.
-- **Grounding:** NEVER use your own internal training data, online resources, or prior knowledge.
-- **Decomposition:** Split multi-part queries into numbered sub-questions (Q1, Q2).
+You are a technical assistant specialized in onboarding guidance.
+Your primary goal is to
 
-# 2. Output Structure
-**Summary**
-2-3 sentences maximum.
+STYLE & FORMATTING RULES:
+- Do NOT refer to the search results by number or name in the body of the text.
+- Do NOT add a "Citations" section at the end of the response.
+- Do NOT reference how the information was found (e.g., "...the provided search results")
+- Do NOT state what the data is related to (e.g., "The search results are related to NHS API and FHIR...")
+- Text should prioritise readability.
+- Links should use Markdown text, e.g., <url|link text>.
+- Use `Inline Code` for system names, field names, or technical terms (e.g., `HL7 FHIR`).
 
-**Answer**
-Prioritize detail and specification, focus on the information direct at the question.
+RULES:
+- Answer questions using ONLY the provided search results.
+- Do not assume any information, all information must be grounded in data.
 
-# 3. Styling Rules (`mrkdwn`)
-Use British English.
-- **Bold (`*`):** Headings, Subheadings, Source Names, and important information/ exceptions (e.g. `*NHS England*`).
-- **Italic (`_`):** Citations and Titles (e.g. `_Guidance v1_`).
-- **Blockquote (`>`):** Quotes (>1 sentence) and Tech Specs/Examples (e.g. `HL7 FHIR`).
-- **Links:** `[text](link)`.
+STEPS:
+1. Extract key information from the knowledge base
+2. Generate an answer, capturing the core question the user is asking.
+3. Answer, directly, any individual or sub-questions the user has provided.
+4. You must create a very short summary encapsulating the response and have it precede all other answers.
 
-# 4. Format Rules
-- NEVER use in-line references or citations (e.g., do not write "(search result 1)" or "[1]").
-- Do NOT refer to the search results by number or name in the body of the text.
-- Do NOT add a "Citations" section at the end of the response.wer, details from the knowledge base.
+EXAMPLE:
+<example_interaction>
+*Summary*
+This is a short, fast answer so the user doesn't _have_ to read the long answer.
+
+*Answer*
+This is a direct answer to the question, or questions, provided. It breaks down individual questions. There is no reference to the text here (for example, you don't see "from source 1") but instead treats this information as if it was public knowledge. However, if there is a source, it does provide that source [as a hyperlink](hyperlink) to the website it can be found.
+
+There is multiple paragraphs, with blank lines between, to make it easier to read, as readability is a requirement.
+</example_interaction>
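
Review note: the new system prompt names two link syntaxes, the Slack form `<url|link text>` in the formatting rules and the Markdown form `[text](url)` in the example, so the model may emit either. Slack's `mrkdwn` renders only the first. One hedge is to normalise on the way out. The sketch below assumes the response is plain text handled in TypeScript, which the diff does not confirm, and `markdownLinksToSlack` is a hypothetical helper name.

```typescript
// Convert Markdown links "[text](url)" into Slack mrkdwn links "<url|text>".
// Only http(s) URLs are rewritten; anything else is left untouched.
function markdownLinksToSlack(text: string): string {
  return text.replace(
    /\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)/g,
    (_match: string, label: string, url: string) => `<${url}|${label}>`
  )
}
```

For example, `markdownLinksToSlack("See [the docs](https://example.com/a)")` yields `See <https://example.com/a|the docs>`.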
Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
-
 <search_results>$search_results$<search_results>
 
 <user_query>{{user_query}}<user_query>
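
Review note: the template above carries two placeholders, `$search_results$` and `{{user_query}}`. In the deployed stack these are presumably filled in by Bedrock or by the bot itself; the diff does not show which. For local testing of prompt text, a single-pass substitution along these lines may be useful. `renderUserPrompt` and `PromptValues` are hypothetical names.

```typescript
// Values for the two placeholders used by the user prompt template.
interface PromptValues {
  searchResults: string
  userQuery: string
}

// Matches either placeholder so both are substituted in one pass.
const PLACEHOLDER = /\$search_results\$|\{\{user_query\}\}/g

// Fill the template. A replacer function is used so that "$" sequences in the
// inserted values are never treated as replacement patterns, and a single pass
// means placeholder-like text inside a value is not substituted again.
function renderUserPrompt(template: string, values: PromptValues): string {
  return template.replace(PLACEHOLDER, token =>
    token === "$search_results$" ? values.searchResults : values.userQuery
  )
}
```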

packages/cdk/resources/Apis.ts

Lines changed: 12 additions & 0 deletions
@@ -27,6 +27,7 @@ export class Apis extends Construct {
       forwardCsocLogs: props.forwardCsocLogs,
       csocApiGatewayDestination: props.csocApiGatewayDestination
     })
+
     // Create /slack resource path
     const slackResource = apiGateway.api.root.addResource("slack")
 
@@ -41,6 +42,17 @@ export class Apis extends Construct {
       lambdaFunction: props.functions.slackBot
     })
 
+    // Create the '/slack/commands' POST endpoint for Slack Events API
+    // This endpoint will handle slash commands, such as /test
+    // eslint-disable-next-line @typescript-eslint/no-unused-vars
+    const slackCommandsEndpoint = new LambdaEndpoint(this, "SlackCommandsEndpoint", {
+      parentResource: slackResource,
+      resourceName: "commands",
+      method: HttpMethod.POST,
+      restApiGatewayRole: apiGateway.role,
+      lambdaFunction: props.functions.slackBot
+    })
+
     this.apis = {
       api: apiGateway
     }
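
Review note: because `/slack/commands` is deliberately unauthenticated at the API Gateway level, the Lambda behind it is the only place a request can be tied back to Slack. Slack signs each request with HMAC-SHA256 over `v0:{timestamp}:{raw body}` using the app's signing secret and sends the result in the `X-Slack-Signature` header, with the timestamp in `X-Slack-Request-Timestamp`. This diff does not show how the bot validates requests (a Slack SDK may already do it), so the following is only a sketch of the check; `verifySlackSignature` is a hypothetical helper.

```typescript
import {createHmac, timingSafeEqual} from "node:crypto"

// Verify a Slack request signature.
// timestamp: value of the X-Slack-Request-Timestamp header
// signature: value of the X-Slack-Signature header ("v0=<hex digest>")
// nowSeconds is injectable so the check can be tested deterministically.
function verifySlackSignature(
  signingSecret: string,
  timestamp: string,
  rawBody: string,
  signature: string,
  nowSeconds: number = Math.floor(Date.now() / 1000)
): boolean {
  // Reject requests more than five minutes old to limit replay attacks.
  const ts = Number(timestamp)
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > 300) {
    return false
  }

  const expected = "v0=" + createHmac("sha256", signingSecret)
    .update(`v0:${timestamp}:${rawBody}`)
    .digest("hex")

  // Constant-time comparison; timingSafeEqual requires equal-length buffers.
  const expectedBuffer = Buffer.from(expected, "utf8")
  const actualBuffer = Buffer.from(signature, "utf8")
  return expectedBuffer.length === actualBuffer.length && timingSafeEqual(expectedBuffer, actualBuffer)
}
```

The raw, unparsed request body must be used here; re-serialising the parsed form fields can change the bytes and break the comparison.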
Lines changed: 55 additions & 36 deletions
@@ -1,66 +1,85 @@
 import {Construct} from "constructs"
+import * as crypto from "crypto"
 import {
   BedrockFoundationModel,
+  ChatMessage,
   Prompt,
   PromptVariant
 } from "@cdklabs/generative-ai-cdk-constructs/lib/cdk-lib/bedrock"
 import {BedrockPromptSettings} from "./BedrockPromptSettings"
+import {CfnPrompt} from "aws-cdk-lib/aws-bedrock"
 
 export interface BedrockPromptResourcesProps {
   readonly stackName: string
   readonly settings: BedrockPromptSettings
 }
 
 export class BedrockPromptResources extends Construct {
-  public readonly queryReformulationPrompt: Prompt
+  public readonly reformulationPrompt: Prompt
   public readonly ragResponsePrompt: Prompt
-  public readonly ragModelId: string
-  public readonly queryReformulationModelId: string
+  public readonly modelId: string
 
   constructor(scope: Construct, id: string, props: BedrockPromptResourcesProps) {
     super(scope, id)
 
-    const ragModel = new BedrockFoundationModel("meta.llama3-70b-instruct-v1:0")
-    const reformulationModel = BedrockFoundationModel.AMAZON_NOVA_LITE_V1
+    const aiModel = new BedrockFoundationModel("meta.llama3-70b-instruct-v1:0")
 
-    const queryReformulationPromptVariant = PromptVariant.text({
-      variantName: "default",
-      model: reformulationModel,
-      promptVariables: ["topic"],
-      promptText: props.settings.reformulationPrompt.text
-    })
+    // Create Prompts
+    this.reformulationPrompt = this.createPrompt(
+      "ReformulationPrompt",
+      `${props.stackName}-reformulation`,
+      "Prompt for reformulation queries to improve RAG inference",
+      aiModel,
+      "",
+      [props.settings.reformulationPrompt],
+      props.settings.reformulationInferenceConfig
+    )
 
-    const queryReformulationPrompt = new Prompt(this, "QueryReformulationPrompt", {
-      promptName: `${props.stackName}-queryReformulation`,
-      description: "Prompt for reformulating user queries to improve RAG retrieval",
-      defaultVariant: queryReformulationPromptVariant,
-      variants: [queryReformulationPromptVariant]
-    })
+    this.ragResponsePrompt = this.createPrompt(
+      "RagResponsePrompt",
+      `${props.stackName}-ragResponse`,
+      "Prompt for generating RAG responses with knowledge base context and system instructions",
+      aiModel,
+      props.settings.systemPrompt.text,
+      [props.settings.userPrompt],
+      props.settings.ragInferenceConfig
+    )
+
+    this.modelId = aiModel.modelId
+  }
 
-    const ragResponsePromptVariant = PromptVariant.chat({
+  private createPrompt(
+    id: string,
+    promptName: string,
+    description: string,
+    model: BedrockFoundationModel,
+    systemPromptText: string,
+    messages: [ChatMessage],
+    inferenceConfig: CfnPrompt.PromptModelInferenceConfigurationProperty
+  ): Prompt {
+
+    const variant = PromptVariant.chat({
       variantName: "default",
-      model: ragModel,
-      promptVariables: ["query", "search_results"],
-      system: props.settings.systemPrompt.text,
-      messages: [props.settings.userPrompt]
+      model: model,
+      promptVariables: ["prompt", "search_results"],
+      system: systemPromptText,
+      messages: messages
     })
 
-    ragResponsePromptVariant.inferenceConfiguration = {
-      text: props.settings.inferenceConfig
+    variant.inferenceConfiguration = {
+      text: inferenceConfig
     }
 
-    const ragPrompt = new Prompt(this, "ragResponsePrompt", {
-      promptName: `${props.stackName}-ragResponse`,
-      description: "Prompt for generating RAG responses with knowledge base context and system instructions",
-      defaultVariant: ragResponsePromptVariant,
-      variants: [ragResponsePromptVariant]
-    })
-
-    // expose model IDs for use in Lambda environment variables
-    this.ragModelId = ragModel.modelId
-    this.queryReformulationModelId = reformulationModel.modelId
+    const hash = crypto.createHash("md5")
+      .update(JSON.stringify(variant))
+      .digest("hex")
+      .substring(0, 6)
 
-    this.queryReformulationPrompt = queryReformulationPrompt
-    this.ragResponsePrompt = ragPrompt
+    return new Prompt(this, id, {
+      promptName: `${promptName}-${hash}`,
+      description,
+      defaultVariant: variant,
+      variants: [variant]
+    })
   }
 }
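
Review note: `createPrompt` now appends the first six hex characters of an MD5 digest of the serialised variant to the prompt name, so any change to the prompt content yields a different name. The likely intent (an assumption, the commit message does not say) is to force a new prompt resource whenever the content changes. The naming scheme can be sketched in isolation with plain objects; `hashedName` is a hypothetical helper, not the commit's code.

```typescript
import {createHash} from "node:crypto"

// Build "<baseName>-<6 hex chars>", where the suffix is derived from the content.
// MD5 is used only as a cheap fingerprint here, not for security.
function hashedName(baseName: string, content: object): string {
  const hash = createHash("md5")
    .update(JSON.stringify(content))
    .digest("hex")
    .substring(0, 6)
  return `${baseName}-${hash}`
}
```

Two caveats with this approach: `JSON.stringify` is sensitive to key order, so logically identical objects built in a different order hash differently, and a six-character suffix (24 bits) is adequate for change detection but not collision-proof.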

packages/cdk/resources/BedrockPromptSettings.ts

Lines changed: 9 additions & 5 deletions
@@ -3,7 +3,7 @@ import {ChatMessage} from "@cdklabs/generative-ai-cdk-constructs/lib/cdk-lib/bed
 import {Construct} from "constructs"
 import {CfnPrompt} from "aws-cdk-lib/aws-bedrock"
 
-export type BedrockPromptSettingsType = "system" | "user" | "reformulation"
+export type BedrockPromptSettingsType = "system" | "reformulation" | "user"
 
 /** BedrockPromptSettings is responsible for loading and providing
  * the system, user, and reformulation prompts along with their
@@ -13,7 +13,8 @@ export class BedrockPromptSettings extends Construct {
   public readonly systemPrompt: ChatMessage
   public readonly userPrompt: ChatMessage
   public readonly reformulationPrompt: ChatMessage
-  public readonly inferenceConfig: CfnPrompt.PromptModelInferenceConfigurationProperty
+  public readonly ragInferenceConfig: CfnPrompt.PromptModelInferenceConfigurationProperty
+  public readonly reformulationInferenceConfig: CfnPrompt.PromptModelInferenceConfigurationProperty
 
   /**
    * @param scope The Construct scope
@@ -30,16 +31,19 @@ export class BedrockPromptSettings extends Construct {
     this.userPrompt = ChatMessage.user(userPromptData.text)
 
     const reformulationPrompt = this.getTypedPrompt("reformulation")
-    this.reformulationPrompt = ChatMessage.user(reformulationPrompt.text)
+    this.reformulationPrompt = ChatMessage.assistant(reformulationPrompt.text)
 
-    this.inferenceConfig = {
+    const defaultInferenceConfig = {
       temperature: 0,
       topP: 0.3,
-      maxTokens: 1024,
+      maxTokens: 512,
       stopSequences: [
         "Human:"
       ]
     }
+
+    this.ragInferenceConfig = defaultInferenceConfig
+    this.reformulationInferenceConfig = defaultInferenceConfig
   }
 
   /** Get the latest prompt text from files in the specified directory.

packages/cdk/resources/Functions.ts

Lines changed: 4 additions & 4 deletions
@@ -35,7 +35,7 @@ export interface FunctionsProps {
   readonly isPullRequest: boolean
   readonly mainSlackBotLambdaExecutionRoleArn : string
   readonly ragModelId: string
-  readonly queryReformulationModelId: string
+  readonly reformulationModelId: string
   readonly notifyS3UploadFunctionPolicy: ManagedPolicy
   readonly docsBucketName: string
 }
@@ -61,17 +61,17 @@ export class Functions extends Construct {
       dependencyLocation: ".dependencies/slackBotFunction",
       environmentVariables: {
         "RAG_MODEL_ID": props.ragModelId,
-        "QUERY_REFORMULATION_MODEL_ID": props.queryReformulationModelId,
+        "REFORMULATION_MODEL_ID": props.reformulationModelId,
         "KNOWLEDGEBASE_ID": props.knowledgeBaseId,
         "LAMBDA_MEMORY_SIZE": LAMBDA_MEMORY_SIZE,
         "SLACK_BOT_TOKEN_PARAMETER": props.slackBotTokenParameter.parameterName,
         "SLACK_SIGNING_SECRET_PARAMETER": props.slackSigningSecretParameter.parameterName,
         "GUARD_RAIL_ID": props.guardrailId,
         "GUARD_RAIL_VERSION": props.guardrailVersion,
         "SLACK_BOT_STATE_TABLE": props.slackBotStateTable.tableName,
-        "QUERY_REFORMULATION_PROMPT_NAME": props.reformulationPromptName,
+        "REFORMULATION_RESPONSE_PROMPT_NAME": props.reformulationPromptName,
         "RAG_RESPONSE_PROMPT_NAME": props.ragResponsePromptName,
-        "QUERY_REFORMULATION_PROMPT_VERSION": props.reformulationPromptVersion,
+        "REFORMULATION_RESPONSE_PROMPT_VERSION": props.reformulationPromptVersion,
         "RAG_RESPONSE_PROMPT_VERSION": props.ragResponsePromptVersion
       }
     })

packages/cdk/resources/RuntimePolicies.ts

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ export interface RuntimePoliciesProps {
   readonly dataSourceArn: string
   readonly promptName: string
   readonly ragModelId: string
-  readonly queryReformulationModelId: string
+  readonly reformulationModelId: string
   readonly docsBucketArn: string
   readonly docsBucketKmsKeyArn: string
 }
@@ -32,7 +32,7 @@ export class RuntimePolicies extends Construct {
       actions: ["bedrock:InvokeModel"],
       resources: [
         `arn:aws:bedrock:${props.region}::foundation-model/${props.ragModelId}`,
-        `arn:aws:bedrock:${props.region}::foundation-model/${props.queryReformulationModelId}`
+        `arn:aws:bedrock:${props.region}::foundation-model/${props.reformulationModelId}`
       ]
     })

packages/cdk/resources/VectorKnowledgeBaseResources.ts

Lines changed: 6 additions & 9 deletions
@@ -156,15 +156,12 @@ export class VectorKnowledgeBaseResources extends Construct {
     // Create S3 data source for knowledge base documents
     // prefix pointed to processed/ to only ingest converted markdown documents
 
-    const chunkingConfiguration = {
-      ...ChunkingStrategy.HIERARCHICAL_TITAN.configuration,
-      hierarchicalChunkingConfiguration: {
-        overlapTokens: 60,
-        levelConfigurations: [
-          {maxTokens: 1000}, // Parent chunk configuration,
-          {maxTokens: 300} // Child chunk configuration
-        ]
-      }
+    const chunkingConfiguration: CfnDataSource.ChunkingConfigurationProperty = {
+      ...ChunkingStrategy.FIXED_SIZE.configuration,
+      fixedSizeChunkingConfiguration: {
+        maxTokens: 512,
+        overlapPercentage: 25
+      } satisfies CfnDataSource.FixedSizeChunkingConfigurationProperty
     }
 
     const hash = crypto.createHash("md5")
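
Review note: with `maxTokens: 512` and `overlapPercentage: 25`, each chunk shares up to 128 tokens with its predecessor, so a new chunk starts roughly every 384 tokens. How Bedrock tokenises and where it places boundaries is not specified in this diff; the sketch below only illustrates the arithmetic of fixed-size chunking with overlap over an already tokenised array. `fixedSizeChunks` is a hypothetical function.

```typescript
// Split a token sequence into fixed-size chunks where consecutive chunks
// overlap by overlapPercentage of maxTokens (rounded down).
function fixedSizeChunks<T>(tokens: T[], maxTokens: number, overlapPercentage: number): T[][] {
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new Error("maxTokens must be a positive integer")
  }
  if (overlapPercentage < 0 || overlapPercentage >= 100) {
    throw new Error("overlapPercentage must be in the range [0, 100)")
  }

  const overlap = Math.floor((maxTokens * overlapPercentage) / 100)
  const stride = maxTokens - overlap // always >= 1 because overlap < maxTokens

  const chunks: T[][] = []
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + maxTokens))
    // Stop once a chunk has reached the end, to avoid a trailing chunk
    // that contains nothing but overlap.
    if (start + maxTokens >= tokens.length) {
      break
    }
  }
  return chunks
}
```

Because every retrieved chunk is capped at `maxTokens`, the worst-case size of the retrieved context is bounded by the number of results times 512 tokens, which is the property the commit relies on to stay inside the context window.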
