"Which LLM is the best for this use case?" is a question you have probably asked yourself many times if you work with Large Language Models (LLMs). Unfortunately, there is no universal answer to it.
The short answer is: It depends on your use case, your budget, and your personal preferences.
That doesn't help you much, though.
Let me give you a step-by-step approach to figure out which LLM to pick:
If you are new to the game, start with the industry standard: OpenAI's GPT-4o mini for simple tasks and GPT-4o for more complex ones. Use Promptmetheus to craft and optimize your prompts until you get satisfying and reproducible completions. If you cannot get there with OpenAI models, try other providers such as Anthropic, Cohere, or Google (Gemini).
Once you have a working prompt, optimize it for performance, speed, reliability, and/or cost (depending on your requirements) by comparing different LLMs and different model parameter settings (temperature, frequency penalty, etc.). As a rule of thumb, go with the cheapest model that is fast enough and does the job.
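The rule of thumb above can be written as a small selection step. The sketch below is illustrative only: it assumes you have already run your prompt on a set of test inputs and recorded, for each model/parameter combination, a pass rate, a latency figure, and an estimated cost. The `EvalResult` type, the `pick_cheapest` helper, the model names, and all numbers are hypothetical placeholders, not real benchmarks or prices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EvalResult:
    """Measured results for one model/parameter configuration (hypothetical)."""
    model: str
    temperature: float
    frequency_penalty: float
    pass_rate: float          # fraction of test inputs with an acceptable completion
    p95_latency_s: float      # 95th-percentile response time in seconds
    cost_per_1k_calls: float  # estimated cost for 1,000 calls (placeholder units)


def pick_cheapest(
    results: List[EvalResult], min_pass_rate: float, max_latency_s: float
) -> Optional[EvalResult]:
    """Return the cheapest configuration that meets both thresholds, or None."""
    eligible = [
        r for r in results
        if r.pass_rate >= min_pass_rate and r.p95_latency_s <= max_latency_s
    ]
    if not eligible:
        return None
    # Among configurations that are good enough, cost decides;
    # ties are broken in favor of the higher pass rate.
    return min(eligible, key=lambda r: (r.cost_per_1k_calls, -r.pass_rate))


# Placeholder results; replace them with numbers from your own test runs.
results = [
    EvalResult("model-small", 0.2, 0.0, 0.78, 1.1, 0.40),
    EvalResult("model-small", 0.0, 0.5, 0.91, 1.0, 0.40),
    EvalResult("model-large", 0.2, 0.0, 0.97, 3.8, 6.00),
    EvalResult("model-medium", 0.0, 0.0, 0.93, 2.0, 1.50),
]

best = pick_cheapest(results, min_pass_rate=0.9, max_latency_s=2.5)
if best is not None:
    print(best.model, best.temperature, best.frequency_penalty)
```

With the placeholder data, this picks `model-small` with `frequency_penalty=0.5`: `model-large` passes more tests but misses the latency budget, and `model-medium` qualifies but costs more. The thresholds are yours to set; the point of fixing them before comparing is that cost then decides only among configurations that are already good enough.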
If you are more experienced with prompt engineering and AI development, start with the model that has historically worked best for use cases similar to the one at hand (keep in mind that the technology is evolving fast, so your initial pick will likely change every few months). Before you start refining your prompt in detail, though, run an early version of it against a few different LLMs to see which one is most promising, and use that model to refine the prompt. Once you achieve great results, revisit your model choice and optimize for performance, speed, and reliability.
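The step of trying an early prompt version with a few LLMs can be sketched as a small comparison harness. This is a minimal sketch under stated assumptions: each model is represented as a plain function from prompt to completion, and "works" is judged by a check function you supply. In a real setup each function would wrap a provider's SDK call; here `always_positive` and `keyword_model` are hypothetical offline stubs, and the sentiment prompt and test cases are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A "model" is any function that maps a prompt string to a completion string.
# Real code would wrap a provider SDK call here; the stubs below are hypothetical.
Model = Callable[[str], str]


def rank_models(
    models: Dict[str, Model],
    prompt_template: str,
    cases: List[dict],
    is_acceptable: Callable[[str, dict], bool],
) -> List[Tuple[str, float]]:
    """Run one draft prompt on every model and return (name, pass_rate), best first."""
    scores = []
    for name, model in models.items():
        # Count the test cases whose completion passes the supplied check.
        passed = sum(
            is_acceptable(model(prompt_template.format(**case["inputs"])), case)
            for case in cases
        )
        scores.append((name, passed / len(cases)))
    # Highest pass rate first; sorted() is stable, so ties keep insertion order.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for a weak model: ignores the input entirely.
def always_positive(prompt: str) -> str:
    return "positive"


# Hypothetical stand-in for a stronger model: a crude keyword rule.
def keyword_model(prompt: str) -> str:
    return "negative" if "bad" in prompt.lower() else "positive"


# Made-up draft prompt and test cases for a sentiment-classification task.
draft_prompt = "Classify the sentiment of this review as positive or negative: {review}"
cases = [
    {"inputs": {"review": "Great product, would buy again."}, "expected": "positive"},
    {"inputs": {"review": "Bad battery life."}, "expected": "negative"},
    {"inputs": {"review": "Works fine for daily use."}, "expected": "positive"},
]

ranking = rank_models(
    {"always-positive-stub": always_positive, "keyword-stub": keyword_model},
    draft_prompt,
    cases,
    lambda completion, case: completion.strip().lower() == case["expected"],
)
print(ranking)
```

With these stubs, `keyword-stub` ranks first (3 of 3 cases) and `always-positive-stub` second (2 of 3). At this early stage a handful of representative cases may be enough to separate a promising model from a weak one; a larger test set fits better in the later optimization pass.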