LLM Routing — The Heart of Any Practical AI Chatbot Application | by Dr. Aliaksei Mikhailiuk | Jan, 2025

How to build reliable, scalable, and robust AI applications, explained in five minutes.

Towards Data Science
Image generated by the Author with Imagen 3.

Bigger and bigger models, with more capacity and larger context windows, seem to beat everything and everyone. Having built the second most used chatbot to date, Snapchat's My AI, I have seen first-hand that practical products do not need the best models, but the most relevant ones.

While one model might beat another in the arena on advanced Ph.D.-level math questions, the losing model could still be more suitable for your specific request because its response is shorter and to the point.

Choosing the right model for your application, rather than the best-quality one, also makes sense from a cost and latency perspective: if the majority of your requests are simple chit-chat, you would waste resources by using the largest models to reply to messages like "Hi! How are you?".
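The idea can be sketched as a simple router that sends short small talk to a cheap model and everything else to a larger one. This is a toy heuristic with hypothetical tier names ("small", "large"); a production router would typically use a trained classifier instead.

```python
import re

# Words that signal simple small talk (illustrative, not exhaustive).
GREETINGS = {"hi", "hello", "hey", "thanks", "bye"}


def route(message: str) -> str:
    """Pick a model tier for an incoming chat message.

    Very short messages containing a greeting word go to a cheap,
    fast "small" model; everything else goes to the "large" one.
    """
    # Normalize: lowercase and replace punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", message.lower())
    words = text.split()
    if len(words) <= 4 and set(words) & GREETINGS:
        return "small"
    return "large"


print(route("Hi! How are you?"))  # routed to the cheap tier
print(route("Derive the gradient of softmax cross-entropy."))  # large tier
```

In practice the routing signal would come from a lightweight classifier or an embedding similarity check rather than keyword matching, but the control flow stays the same.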

Besides, what if you want to serve millions of requests? Server capacity is limited, and at peak hours you might face increased latency for the most in-demand models. To avoid making your users wait, you can reply with a less in-demand model that still gives an acceptable level of quality.
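The peak-hour behavior above amounts to a latency-aware fallback. Here is a minimal sketch, assuming hypothetical model names and an externally measured latency map; real systems would read this from live monitoring.

```python
# Maximum acceptable response latency in seconds (illustrative budget).
LATENCY_BUDGET_S = 2.0


def pick_model(preferred: str, fallback: str,
               observed_latency_s: dict[str, float]) -> str:
    """Return the fallback model when the preferred one is overloaded.

    If the preferred model's observed latency exceeds the budget,
    serve the request with a less in-demand model instead.
    """
    if observed_latency_s.get(preferred, 0.0) > LATENCY_BUDGET_S:
        return fallback
    return preferred


# Hypothetical measurements during a traffic spike.
latencies = {"model-large": 4.5, "model-small": 0.4}
print(pick_model("model-large", "model-small", latencies))  # falls back
```

A fuller implementation would also consider queue depth and retry behavior, but the core trade-off (slightly lower quality for acceptable latency) is captured by this single comparison.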
