Context engineering: the future of AI integration

Dan Batty
19 hours ago
5 min read

As AI creeps further and further into our everyday lives, it is vital for us to be able to harness it effectively, efficiently and consistently. Enter context engineering - the (maybe) magic bullet to streamlining AI integration in your business.

What is context engineering?

Context engineering is the idea of designing an information system around an AI model such as designing effective prompts, defining information sources, external tools, managing context windows and much more.

If we think of an LLM as the CPU of a machine, then the context window would be the RAM - a limited working memory used to complete tasks. In the same way a computer slows as RAM usage increases, so too does an LLMs ability to complete tasks when its context window is full. This is where context engineering comes into play by massaging the context into a more useful format.

When working with LLMs there is a fine line between too much context and too little. I’m sure we’ve all experienced our AI assistants sometimes going off on wild tangents or seeming to have no idea what we’re talking about. Studies from Anthropic have shown why this happens and it's all related to the context we give. Too much context leads to token inefficiency, slower responses and hallucinations. Too little, and responses become vague and irrelevant. This is the challenge of context engineering.

Slide titled Calibrating the system prompt shows a color scale from Too specific to Too vague with three prompt examples below.

Methodologies

So now we know what context engineering is, how can we harness it? As mentioned, context engineering is a huge subject. If I were to delve too deeply, this blog would soon turn into an essay. However, what I do think is useful is to touch on the more important aspects to act as a foundation.

Content retrieval and generation

This is the first core concept of context engineering which sources relevant contextual information via prompt engineering and external retrieval.

Prompt engineering

Prompt engineering is the practice of writing effective prompts for LLMs in order to produce more accurate and relevant outputs. Think of it as articulating an idea for a more concise and deeper understanding of the topic. By refining the instructions given to a model, we can more effectively steer it into generating relevant information.

Examples of prompt engineering include:

Zero-shot prompt - giving instructions to the AI with no prior examples or guidance (this is the majority of day-to-day users)
Few-shot prompt - gives a couple examples to demonstrate how an answer might look
Chain of thought (CoT) - encourages models to reason through problems step by step, breaking them into smaller components
Meta prompting - asking a model to generate its own prompt to help show its reasoning before setting it off on a task
Self-consistency - asks the model to create a number of responses and choose the most coherent

RAG (Retrieval Augmented Generation)

RAG is a technique which allows useful data to be stored externally which an LLM will then access as and when it needs, allowing for more organised retrieval. The idea is to help redirect an LLM to more relevant and reliable data sources, allowing for greater control over outputs. It works by breaking knowledge sources, such as company records, into meaningful pieces ranked by relevance. These are then worked into the original prompt and fed into the LLM to generate its final text response. This massively narrows search times for LLMs and allows for more focussed generations, while also increasing the efficiency of token consumption.

Flowchart of retrieval-augmented generation: prompt/query searches knowledge sources, adds context, and LLM generates response.

Context processing

This is the next stage of the context engineering process which transforms any acquired information through methods of refinement and data integration.

Self-refinement

Self-refinement is the process of LLMs using cyclical feedback in order to learn from and improve their outputs. This can be likened to human revision techniques where information can be consolidated and refined. The process can be split into three distinct steps: generate, refine and feedback.

Flow diagram: Input to model M, then Refine, Feedback, and back to M in a loop on a white background.

While self-refinement can be an effective tool, it heavily relies on the type of LLM that is used. Complex models can benefit massively from self-refinement, while simple models struggle to use it to much effect. Generally though, the quality of outputs improves with the number of iterations.

Context Management

Finally, there is context management which organises and contextualises information via memory management and compression techniques.

Compaction

A recent study found that as the context window expands with RAG, LLMs become confused and generations become less useful - often way before they reach their assigned token limits (Databricks). This occurs as the models get too hung up on irrelevant information stored in the context leading to confusion. To combat this, we can use compaction - a technique which aims to reduce the size of a conversation history, while retaining important contextual information (part of the management stage). This is a similar concept to compressing images to reduce their size. Importantly, compaction can only be applied to agents that manage their own conversation history; for example local agents such as Claude Code. There are a couple of variations of compaction strategies; following the example of Claude Code, this is done by passing the message history back to the model to summarise and compress it into the most important details.

Line chart of model average answer correctness by context length, with multiple colored lines and a right-side legend of models.

Validation

Another point of contention for many LLMs is conflicting information. As context windows expand, the chance of conflicting instructions and documentation increases. A recent Microsoft research study has shown that this massively reduces the performance of LLMs, often leading to unreliable responses. Therefore, we must turn to methods of validation in order to reduce the occurrence of conflicting information. For constant sources of information such as RAG, this could be done with the use of AI as a quick first check over documents in order to ensure they are aligned. However, there should always be some level of human validation. As we know AIs at this stage aren’t 100% reliable, and while we can reduce the chance of error, the best method at this moment is to manually validate.

Why should we use context engineering?

The main talking point of context engineering is often the cost reduction of prompts. This is important when looking to implement AI usage as a business, with some reports suggesting an average of 90% reduction in costs when implementing context engineering measures.

Despite this, there are also equally important advantages of context engineering. For one, data minimisation; the idea of using the minimum amount of data needed to complete a task. This is important from a business perspective when working with clients to reduce the amount of data leaving your environment. Additionally, better efficiency means faster responses and higher output quality - more structured context means better reasoning. Finally, as green practitioners we must consider the environmental impact of context engineering and token efficiency. According to the International Energy Agency (IEA), AI centres account for 1.5% of all global energy consumption with this expected to rise drastically in the coming years. Therefore, we should increase efficiency for not only our personal benefits, but also to do our part for the environment.

Conclusion

While this blog only covers a small percentage of the research into context engineering, hopefully it acts as a useful starting point for those looking to increase their AI efficiency. I would highly recommend looking into the links provided as they acted as the foundation for a lot of this blog.

If you haven’t quite scratched your AI itch, consider checking out our guide on Building AI-Powered Oracle APEX Apps with Ollama and AWS Bedrock.