LLM Routing — The Heart of Any Practical AI Chatbot Application | by Dr. Aliaksei Mikhailiuk | Jan, 2025

How to build reliable, scalable, and robust AI applications, explained in five minutes.

Towards Data Science
Image generated by the Author with Imagen 3.

Bigger and bigger models, with more capacity and larger context windows, seem to beat everything and everyone. Having built the second most used chatbot to date, Snapchat's My AI, I have seen first-hand that practical products do not need the best models, but the most relevant ones.

While one model might beat another in the arena on advanced Ph.D.-level math questions, the losing model could still be more suitable for your specific request because its response is shorter and to the point.

Choosing the right model for your application, rather than the best-quality one, also makes sense from a cost and latency perspective: if the majority of your requests are simple chit-chat, you would waste resources by using the largest models to reply to messages like "Hi! How are you?".
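The idea can be sketched as a simple router that sends short small talk to a cheap model and everything else to a larger one. This is a toy heuristic with hypothetical tier names ("small", "large"); a production router would typically use a trained classifier instead.

```python
import re

# Words that signal simple small talk (illustrative, not exhaustive).
GREETINGS = {"hi", "hello", "hey", "thanks", "bye"}


def route(message: str) -> str:
    """Pick a model tier for an incoming chat message.

    Very short messages containing a greeting word go to a cheap,
    fast "small" model; everything else goes to the "large" one.
    """
    # Normalize: lowercase and replace punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", message.lower())
    words = text.split()
    if len(words) <= 4 and set(words) & GREETINGS:
        return "small"
    return "large"


print(route("Hi! How are you?"))  # routed to the cheap tier
print(route("Derive the gradient of softmax cross-entropy."))  # large tier
```

In practice the routing signal would come from a lightweight classifier or an embedding similarity check rather than keyword matching, but the control flow stays the same.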

Besides, what if you want to serve millions of requests? Server capacity is limited, and at peak hours you might face increased latency for the most in-demand models. To avoid making your users wait, you can reply with a less in-demand model that still gives an acceptable level of quality.
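The peak-hour behavior above amounts to a latency-aware fallback. Here is a minimal sketch, assuming hypothetical model names and an externally measured latency map; real systems would read this from live monitoring.

```python
# Maximum acceptable response latency in seconds (illustrative budget).
LATENCY_BUDGET_S = 2.0


def pick_model(preferred: str, fallback: str,
               observed_latency_s: dict[str, float]) -> str:
    """Return the fallback model when the preferred one is overloaded.

    If the preferred model's observed latency exceeds the budget,
    serve the request with a less in-demand model instead.
    """
    if observed_latency_s.get(preferred, 0.0) > LATENCY_BUDGET_S:
        return fallback
    return preferred


# Hypothetical measurements during a traffic spike.
latencies = {"model-large": 4.5, "model-small": 0.4}
print(pick_model("model-large", "model-small", latencies))  # falls back
```

A fuller implementation would also consider queue depth and retry behavior, but the core trade-off (slightly lower quality for acceptable latency) is captured by this single comparison.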
