Speeding up chatbots with tools
OVERVIEW
Large Language Models (LLMs) can now call external tools through plug-ins. But responses often get noticeably slower when tools are involved, and that hurts the user experience.
In this post, we’ll show a simple trick that speeds up LLM responses. It will help you build more practical and efficient LLM-based solutions.
GOALS
- Speed up bot answers.
EXAMPLE USE CASE
Let’s say we’re using OpenAI’s ChatGPT via LangChain on our holiday rental website, where visitors can chat to ask about our current holiday offers or about general geography. We leave the geography questions entirely to the LLM, but showing our own deals is something we have to implement ourselves. LangChain makes this possible through tools.
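Conceptually, a tool is just a typed Python function plus a registry the framework can dispatch into when the model asks for it. The sketch below shows that underlying idea in plain Python — it is not LangChain’s actual API, and the function body and offer text are made up for illustration:

```python
def get_holiday_offers(place: str, month: str) -> str:
    # Hypothetical stand-in for a real lookup against our offer database.
    return f"Current offers in {place} for {month}: seaside villa, city loft"

# The registry of tools the model is allowed to call, keyed by name.
TOOLS = {"get_holiday_offers": get_holiday_offers}

def run_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call the model requested, by name."""
    return TOOLS[name](**kwargs)
```

In LangChain itself you would register the function as a tool and let the framework handle the dispatch, but the moving parts are the same: a name, a typed signature, and a callable.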
Let’s say our chatbot function already looks like this:
def get_chatbot_answer(user_message: str) -> str

and the chatbot has a tool function that looks like this:
def get_holiday_offers(place: str, month: str) -> str

While tools are very easy to use and a natural fit here, the time the LLM spends processing tool responses can be huge. Huge enough to discourage some users from using our chatbot.
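The latency comes from the tool-calling loop itself: the model is called once to decide which tool to run, and then a second time to rewrite the tool’s raw output into a final answer. A rough sketch of that loop, with a hypothetical `call_llm` (canned responses, hard-coded arguments) standing in for the real, slow model calls:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (each one slow)."""
    if "Tool result:" in prompt:
        return "We have great offers for you: " + prompt.split("Tool result:")[1].strip()
    return "get_holiday_offers"

def get_holiday_offers(place: str, month: str) -> str:
    # Stand-in for the real tool; arguments below are hard-coded for the sketch.
    return f"villa in {place}, available in {month}"

def answer_with_tool_loop(user_message: str) -> str:
    # First model call: decide which tool to use.
    tool_name = call_llm(user_message)
    if tool_name == "get_holiday_offers":
        result = get_holiday_offers("Rome", "June")
        # Second model call: rephrase the raw tool output. This is the
        # expensive step the rest of the post tries to avoid.
        return call_llm(f"Tool result: {result}")
    return call_llm(user_message)
```

Two full model round-trips for what is essentially a database lookup — that second pass is where the waiting happens.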
SOLUTION
We can split our chatbot in two: one that does exactly what the original did, and another that decides whether we need the first one at all, or whether the user just wants to see holiday offers. In the latter case we can call the tool ourselves and skip the slow LLM pass entirely.
So first we create the routing chatbot, which returns three outputs (structured output works well for that). It will look like this:

def chatbot_preprocessing(user_message: str) -> tuple[bool, str, str]

It returns a boolean telling us whether the user only wants to see holiday offers, plus the place and the month (we only care about those when the boolean is true). Now we can change the flow of our chat a bit:
def get_chatbot_answer_with_preprocessing(user_message: str) -> str:
    only_show_offers, place, month = chatbot_preprocessing(user_message)
    if only_show_offers:
        return get_holiday_offers(place, month)
    else:
        return get_chatbot_answer(user_message)

This way, when the user only wants offers (probably more than half the time), the chatbot answers very quickly, because it skips having the LLM process the tool’s result.
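To see the routing end to end, here is a self-contained sketch with hypothetical stand-ins for the two LLM-backed pieces: in a real app `chatbot_preprocessing` would be a cheap structured-output model call and `get_chatbot_answer` the full tool-enabled chatbot; here a crude keyword check and canned strings replace them so the flow can actually run:

```python
def chatbot_preprocessing(user_message: str) -> tuple[bool, str, str]:
    """Stand-in for the small structured-output LLM call.

    Returns (only_show_offers, place, month). A keyword check replaces
    the model here, just to exercise the routing below.
    """
    if "offer" in user_message.lower():
        # A real model would extract place and month from the message.
        return True, "Rome", "June"
    return False, "", ""

def get_holiday_offers(place: str, month: str) -> str:
    # Stand-in for the real tool.
    return f"Offers in {place} for {month}: seaside villa, city loft"

def get_chatbot_answer(user_message: str) -> str:
    # Stand-in for the original, slower tool-enabled chatbot.
    return "LLM answer to: " + user_message

def get_chatbot_answer_with_preprocessing(user_message: str) -> str:
    only_show_offers, place, month = chatbot_preprocessing(user_message)
    if only_show_offers:
        # Fast path: call the tool directly, skipping the second LLM pass.
        return get_holiday_offers(place, month)
    return get_chatbot_answer(user_message)
```

The fast path costs one cheap model call plus a function call; only genuinely open-ended questions pay for the full chatbot.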
CONCLUSION
It’s very possible to speed up chat conversations not only with faster hardware, but also with simple tricks like this one. This particular trick helps a lot with chatbots whose main job is to let website users quickly browse the available products and services.