Better Markdown Output with OpenAI

I've been on a bit of a tear experimenting with new tools and the OpenAI proxy. One thing I really wanted that was missing from my original solution is nicely structured and highlighted output. I went down a rabbit hole and added a bunch of complexity before coming up with a solution that was actually simpler than what I started with!

Markdown is my language of choice for pretty much all documentation. It adds just enough structure and doesn't get in the way when writing it. But if I wanted nice, consistent syntax highlighting, I knew I'd need nice, consistent output. Even after setting a temperature of 0.0, my headings would be random:

1
2
3
#### HEADING
## **HEADING**
### HEADING:

And so on. Random headings lead to random formatting, which aside from being inconsistent is just really irksome. One of the unexpected benefits I found to running a tool like extract_wisdom is a consistent way of structuring knowledge; each author's individual writing style was replaced by just the facts, ma'am. And the journalist softly weeps.

Anyways, back to feeding the machine.

So, how to get more structured output? I'd been meaning to check out LangChain, and their JsonOutputParser sounded perfect. And it worked great! I moved all the output format logic out of my markdown prompts and into a Pydantic class like so:

1
2
3
4
5
6
7
class CThinkOutput(BaseModel):
    title: str = Field(description="Title derived from the topic of the submission")
    rating: str = Field(
        description="Overall of the arguments posed (Strong|Adequate|Weak)"
    )
    summary: str = Field(description="Brief summary of the article")
  ...

This would return JSON that looked something like this:

1
2
3
4
5
{
  "title": "A blog post about adventures in structured output",
  "rating": "Adequate",
  "summary": "The author somehow spends 500 words discovering the obvious solution"
}

I then refactored this out of api.py into output.py and modified my client to insert each of the items into the appropriate section of output:

1
2
3
4
5
6
7
8
9
def print_response(response: requests.Response):
    resp = response.json().get("response")
    output = f"""
# {resp["title"]}

Rating: {resp.get("rating")}

{resp.get("summary")}
...

So, now whenever I wanted to add a new API, I just needed to:

  1. Add the markdown prompt.
  2. Define all of the fields I wanted to extract in a new class in output.py.
  3. Import the output class to api.py.
  4. Create an output template in client/<name>/main.py that defined my desired output format.
  5. Create a new tool class in api.py.
  6. Import the tool class to app.py.
  7. Add the API route.
  8. Debug several times because I forgot imports, or missed output fields.

Having done this for one of my existing tools, I immediately became 30% less excited about adding in new tools. Even after updating my hellish bash script to do it for me, it just felt. Messy. One of the things that really clicked with me about Fabric is that the Markdown is more or less all you need.

Luckily it was my wife's birthday, so I was prevented from migrating any more of my tools to this structure and instead spent the afternoon enjoying the city and the evening sipping champagne.

In Vino Veritas

I woke up with the solution. The problem with the existing formatting instructions is that I was doing too much at once:

1
1. Summarize the articles content, with particular attention to the predilection towards navel gazing, in a section called SUMMARY. Be sure to somehow maintain a consistent name for this section between many different iterations...

I would fail to consistently respond to this prompt. Here's the fix:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
## Output Template

Structure the output like this:

---
# {Title}

Rating: {Rating}

{Summary}
...
---

Here's what to put in each section:

- Title: Title derived from the topic of the submission
- Rating: Overall of the arguments posed (Strong|Adequate|Weak). If there are less than 50 then collect all of them.
- Summary: Brief summary of the article.
...

It worked flawlessly! Separating structure and content guidelines made it far easier to be clear about both. AND it's all back in a nice, simple Markdown file. Now my update process looks like this:

  1. Add the markdown prompt.
  2. Create a new tool class in api.py.
  3. Import the tool class to app.py.
  4. Add the API route.
  5. Copy an existing client and update the path.

So much simpler.

Colours

Adding the colours was actually surprisingly easy since Python has already solved this problem in several different ways. We can just use pygments, one of those lovely Python solution to a problem that somehow only need 10 lines of code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
from pygments import highlight
from pygments.formatters import Terminal256Formatter
from pygments.lexers import get_lexer_by_name

...
def print_output(output: str) -> None:
    print(
        highlight(
            output,
            lexer=get_lexer_by_name("markdown"),
            formatter=Terminal256Formatter(style="github-dark"),
        )
    )

Presto! Beautiful, colourful Markdown input with minimal fuss.

<<