Adventures with DSPy
LLM development is reaching its maturity stage
I published this post originally on Reasoning Engine
In the past few weeks I’ve become obsessed with DSPy. When I first saw the video DSPy Explained by Connor Shorten, everything immediately clicked.
This is the LLM framework I was waiting for, this is the idea I was trying to get to but didn’t know exactly how, and oh how coherent all the pieces are! It’s a great conceptual abstraction: I noticed myself not needing the docs all the time, even though the library is not properly typed right now, because things just *make sense* together.
If you haven’t dug into it yet, DSPy’s main selling point is that you define your pipeline declaratively, without worrying about the prompts to the LLM, and then it can optimize the prompts for you automatically. Of course, why should humans be doing the prompt engineering when we have the best machines for the job: LLMs and trial-and-error.
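To give you a taste, here’s roughly what a minimal DSPy program looks like: you declare *what* goes in and out via a signature, compose modules PyTorch-style, and never write the prompt itself (a sketch in the spirit of the docs’ examples; the specific task and field names are my own):

```python
import dspy

# A signature declares the task's inputs and outputs;
# DSPy generates the actual prompt from it.
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

# Modules compose signatures into a pipeline, much like PyTorch's nn.Module.
class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        return self.generate_answer(question=question)
```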
And that’s kinda obvious, right? Of course I thought of that before, of course a lot of people thought of that before: having another LLM improve your prompts. In fact we already have plenty of tools for that, like PromptPerfect. Yet things still felt iffy, kinda unreliable, relying just on feeling, or on a bit of uncanny abstraction. But DSPy looked at PyTorch for inspiration, and that, it turns out, was the right abstraction.
Many devs have ventured at the problem, and different angles were attempted: I personally took a TDD approach, which I still think is the right one for sanity checks, and I’ve also tried a Reactive Streams FP-style approach; some invented this weird “chains” concept which nobody really likes, and are now attempting a graphs angle.
But it turns out that coming from the machine learning angle with a PyTorch-like abstraction, as DSPy did, was the actual right approach, the one that makes me and every dev I talk to go “oooh, that is genius! It makes a lot of sense”, with that sense of relief that finally someone took the right jab at the problem and you’ll be able to build something reliable.
Of course, DSPy doesn’t seem to be the best fit for every problem right now. It makes the most sense for problems with well-defined discrete outputs, like traditional supervised learning, for example when you want to use an LLM as a classifier. However, it is clearly a step in the right direction, and you can see a near future where more “feeling-based” evaluators will also be very effective and ubiquitous, for example, if you are writing a children’s stories chatbot and want to reliably evaluate how boring or exciting a story is.
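To make the classifier case concrete: a discrete label set means the evaluation metric can be a simple exact match, which is exactly what an optimizer needs to score each attempted prompt (hypothetical task and labels, just to illustrate):

```python
import dspy

# Discrete outputs: the label set is known upfront.
class ClassifySentiment(dspy.Signature):
    """Classify the sentiment of a sentence."""
    sentence = dspy.InputField()
    sentiment = dspy.OutputField(desc="one of: positive, negative, neutral")

classify = dspy.Predict(ClassifySentiment)

# Scoring an optimization attempt is then trivially objective:
def sentiment_match(example, prediction, trace=None):
    return example.sentiment == prediction.sentiment.lower().strip()
```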
The main value DSPy brings to the industry, then, is not (only) that it optimizes prompts for you, allowing you to switch models whenever you like, but that it actually turns the problem on its head: it forces you to change your mindset, thinking first about what it means for an output to be good or not, with examples (like TDD promotes!), before thinking about the prompt. That frees you to think at a more abstract level about the structure, and lets the machine do machine things: finding the best fit for the problem at hand.
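Concretely, the inverted workflow looks like this: you write down the examples and the metric first, and only then let an optimizer compile the program against them (a sketch reusing the SimpleQA module from above, with one of DSPy’s built-in optimizers):

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# The "tests" come first: examples encoding what a good output means.
trainset = [
    dspy.Example(question="What is the capital of France?",
                 answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?",
                 answer="Shakespeare").with_inputs("question"),
]

def exact_match(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

# The machine then does the machine things: searching for the
# prompts and demonstrations that pass the tests.
optimizer = BootstrapFewShot(metric=exact_match)
compiled_qa = optimizer.compile(SimpleQA(), trainset=trainset)
```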
Being so bullish on DSPy, I really want to help it grow and contribute to the community, because I think it has a lot of future ahead of it. And I mean that conceptually, even if something else with a different name eventually supersedes DSPy.
It turns out I already have a startup in the LLM space, LangWatch, so as I played with DSPy for LangWatch experiments, I naturally noticed a missing piece that could help me a lot. Thus, the DSPy Visualizer was born.
DSPy Visualizer
DSPy Visualizer is open-source, and it allows you to log your DSPy training sessions, track performance and costs, compare runs, and debug them in detail.
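Hooking it up is meant to be just a couple of lines before you kick off the compilation. The quickstart guide linked at the end has the exact setup, but roughly the shape of the integration is something like this:

```python
import langwatch

# Authenticate and attach the DSPy training session to a LangWatch experiment.
langwatch.login()
langwatch.dspy.init(experiment="my-dspy-experiment", optimizer=optimizer)

# Then compile as usual; steps, costs, and LLM calls get logged as it runs.
compiled_qa = optimizer.compile(SimpleQA(), trainset=trainset)
```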
It’s still early days for DSPy, or any kind of automated LLM optimizer, so it really helps a lot to be able to understand what the optimizer is trying to do, where it is going, and especially how much it is costing, as too many examples and too-complex pipelines can make your OpenAI credits go down the drain in a heartbeat. Even though LLMs are getting cheaper, we are pushing the automation limits here.
Iteration is the name of the game with DSPy, although a much more comfortable, reliable kind of iteration than “classical” prompt engineering, of course. We found that looking at the examples being used, the LLM calls being made, and what is happening in all those optimization attempts that makes the score go up or down gave us lots of ideas of what to try next, or showed us what was wrong so we could stop the iteration early and fix it.
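Even without the visualizer, you can already peek at individual calls with DSPy’s built-in history inspection; the visualizer gives you that same visibility, but across the whole run at once:

```python
# Configure the LM, run the compiled program, then dump the last exchange.
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

compiled_qa(question="What is the capital of France?")
lm.inspect_history(n=1)  # prints the actual prompt and completion sent
```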
I’ve published an official blog post about the DSPy Visualizer on our LangWatch blog; if you want to try it out, check out the quickstart guide.
If you’re in for more DSPy content, follow me on Twitter and subscribe to this Substack!