AI-Powered Blog Post Generator: Turning Pixels into Prose with Ollama and Claude

Discover how to automate your AI art workflow with a Python script that processes images, generates content using multimodal LLMs, and posts directly to your Ghost blog.

Hey there, fellow AI enthusiasts and digital artists! Remember when I told you about my nifty little script to automate posting Stable Diffusion images to my BitsofJeremy blog? Well, hold onto your neural networks, because I've just given it a major upgrade. We're talking multimodal LLMs, local and remote AI agents, and enough automation to make a lazy programmer proud. Let's dive into this pixel-perfect, prose-producing powerhouse!

The Problem: Manual Labor in the Digital Age

Picture this: You've just created a masterpiece with Stable Diffusion. It's art, it's beautiful, it's... sitting on your hard drive, waiting to be shared with the world. The old process went something like this:

Open image editor
Convert PNG to JPG
Log into Ghost
Create a new post
Upload the image
Rack your brain for a witty title
Write a blog post (don't forget that generation data!)
Hit publish and pray to the internet gods

Sounds tedious, right? Well, no more! We're bringing in the big guns: Python, Ollama, Claude, and a dash of API magic.

The Solution: AI-Powered Automation

Our new and improved script does all of this for us, and then some. Here's the basic workflow:

Monitor a directory for new Stable Diffusion images
Process the image (resize, watermark, convert to JPG)
Use a multimodal LLM to generate a story and title
Upload everything to Ghost
Archive the original files
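
The archiving step is the simplest of the five, so here is a minimal sketch of it. It assumes the infotext file shares the PNG's base name (for example `1234-a cat.png` and `1234-a cat.txt`); the `archive_originals` name and the archive directory layout are made up for illustration and may not match the real script.

```python
import shutil
from pathlib import Path


def archive_originals(png_path, archive_dir):
    """Move a processed PNG and its infotext .txt (if present) into archive_dir."""
    png_path = Path(png_path)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    # Assumption: the infotext file has the same base name as the image.
    for candidate in (png_path, png_path.with_suffix(".txt")):
        if candidate.exists():
            target = archive_dir / candidate.name
            shutil.move(str(candidate), str(target))
            moved.append(target)
    return moved
```

Moving the files out of the input directory also keeps the next run from posting the same image twice.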

Let's break it down, shall we?

Setting Up the Playground

First things first, we need to tell Automatic1111 where to save our masterpieces. In the settings, set the "Directory for saving images using the Save button" to your desired input directory. Also, make sure to set the image filename pattern to [seed]-[prompt_spaces] and enable "Create a text file with infotext next to every generated image". This gives us all the juicy details we need for our AI to work its magic.
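
To make those settings concrete, here is a rough sketch of how a script could pick the seed, prompt, and infotext back up. It assumes the text file sits next to the PNG with the same base name; `read_generation_info` and `split_seed_and_prompt` are hypothetical helper names, not functions from the actual project.

```python
import os


def read_generation_info(png_path):
    """Return the infotext saved next to a PNG, or "" if there is none."""
    # Assumption: "1234-a cat.png" is paired with "1234-a cat.txt".
    txt_path = os.path.splitext(png_path)[0] + ".txt"
    if not os.path.exists(txt_path):
        return ""
    with open(txt_path, "r", encoding="utf-8") as info_file:
        return info_file.read().strip()


def split_seed_and_prompt(filename):
    """Split a "[seed]-[prompt_spaces]" filename into (seed, prompt)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Only split on the first hyphen, since the prompt itself may contain more.
    seed, _, prompt = stem.partition("-")
    return seed, prompt
```

The string returned by `read_generation_info` is the kind of thing that would be handed to the LLM as generation data later on.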

Image Processing: From PNG to JPG (with a Twist)

We're using the trusty Pillow library to handle our image processing. Here's a snippet to whet your appetite:

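import os

from PIL import Image

# INPUT_DIR, OUTPUT_DIR and WATERMARK_PATH are configured elsewhere in the script.
for filename in os.listdir(INPUT_DIR):
    if filename.endswith(".png"):
        # Use the first 16 characters of the filename as a working title.
        post_title = os.path.splitext(filename)[0][:16]
        base_filename = post_title
        jpg_filename = f"{base_filename}.jpeg"
        jpg_path = os.path.join(OUTPUT_DIR, jpg_filename)

        original_image = Image.open(os.path.join(INPUT_DIR, filename)).convert("RGBA")
        # Convert to RGBA so the watermark's alpha channel can act as the paste mask.
        watermark = Image.open(WATERMARK_PATH).convert("RGBA").resize((120, 120))
        # Paste the watermark into the bottom-right corner of a transparent layer.
        watermark_layer = Image.new("RGBA", original_image.size, (0, 0, 0, 0))
        watermark_layer.paste(watermark, (original_image.width - 120, original_image.height - 120), mask=watermark)
        watermarked_image = Image.alpha_composite(original_image, watermark_layer)
        # JPEG has no alpha channel, so flatten to RGB before saving.
        watermarked_image.convert("RGB").save(jpg_path, "JPEG")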

This little beauty not only converts our PNG to JPG but also slaps on a watermark. Because nothing says "this is my art" like a good old watermark, right?
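
The workflow also mentions resizing, which the snippet above doesn't show. Here is one way the target size could be worked out. It is a sketch only: the `fit_within` name and the 1024-pixel limit are assumptions, not values taken from the real script.

```python
def fit_within(width, height, max_side=1024):
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale small images
    scale = max_side / longest
    # Round to whole pixels and keep each side at least 1 pixel wide.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

In the loop above, something like `original_image.resize(fit_within(*original_image.size))` before the watermark is pasted would apply it. Pillow's `Image.thumbnail` is a built-in alternative that shrinks the image in place while keeping the aspect ratio.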

Enter the Multimodal LLMs: Ollama and Claude

Now, here's where things get really interesting. We're using not one, but two AI powerhouses to generate our content: Ollama (running locally) and Claude (via the Anthropic API).

Ollama: Your Local AI Wordsmith

Ollama is like having a tiny AI writer living in your computer. We're using the LLaVA model (`llava` in Ollama), which can "see" images and generate text based on them. Here's how we set it up:

Install Ollama from ollama.ai
Pull the Llava model: ollama pull llava
Use the Ollama Python client to generate content:

from ollama import generate

def agent_ollama(_image, _gen_info, _model):
    with open(_image, "rb") as image_file:
        image_data = image_file.read()
        
        article_prompt = f"""
            Craft an engaging short story inspired by this image.
            Create a narrative that captures the scene, characters, or emotions depicted.
            Adopt a tone that is witty and fun.
            Always keep your output to a maximum of 500 words.

            You may use the following data to help inspire your writing,
            as it pertains to how the image was generated with AI, but do not rely on it, use your creativity:

            {_gen_info}
        """
        article_response = generate(
            model=_model,
            prompt=article_prompt,
            images=[image_data],
            stream=False
        )
        article_story = article_response['response']

        # Similar process for generating title...

    return {
        "title": title.replace('"', '').replace("`", "").strip(),
        "article": article_story
    }
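
The title step is skipped in the snippet above, so here is a hedged sketch of what it might look like. The prompt wording, the `generate_title` and `clean_title` names, and the 80-character cap are all assumptions for illustration. `generate_fn` is expected to behave like `ollama.generate` and is passed in so the cleanup logic can be tried without a running Ollama server.

```python
def clean_title(raw_title, max_length=80):
    """Strip quotes, backticks and extra whitespace from an LLM-generated title."""
    title = raw_title.replace('"', "").replace("`", "")
    # Models sometimes answer with several lines; keep the first non-empty one.
    lines = [line.strip() for line in title.splitlines() if line.strip()]
    title = lines[0] if lines else ""
    return title[:max_length].strip()


def generate_title(generate_fn, model, story):
    """Ask the model for a title based on the story it just wrote."""
    # Hypothetical prompt; the original script's wording is not shown in this post.
    title_prompt = (
        "Write one short, catchy title for the story below. "
        "Reply with the title only.\n\n" + story
    )
    response = generate_fn(model=model, prompt=title_prompt, stream=False)
    return clean_title(response["response"])
```

With the real client this would be called as `generate_title(generate, _model, article_story)`, using the `generate` function imported from `ollama`.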

Claude: The Cloud-Based Conversationalist

For those times when you want a bit more oomph, we've got Claude waiting in the wings. This cloud-based AI can handle more complex tasks and potentially generate even more creative content. Here's a taste of how we're using it:

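import base64
import mimetypes
import os

from anthropic import Anthropic

def agent_claude(_image, _gen_info):
    client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

    # Assumption: image_media_type is not defined in this excerpt, so it is
    # guessed from the file extension here (e.g. "image/png" or "image/jpeg").
    image_media_type = mimetypes.guess_type(_image)[0]

    with open(_image, "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode("utf-8")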

    story_prompt = f"""
    Craft an engaging short story inspired by this image.
    Create a narrative that captures the scene, characters, or emotions depicted.
    Adopt a tone that is witty and fun.
    ALWAYS keep your output to a maximum of 500 words.
    ALWAYS output in HTML.

    You may use the following data to help inspire your writing,
    as it pertains to how the image was generated with Stable Diffusion, but do NOT rely on it solely, 
    use your creativity:

    {_gen_info}
    """

    story_message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": image_media_type,
                            "data": image_data,
                        },
                    },
                    {
                        "type": "text",
                        "text": story_prompt
                    }
                ],
            }
        ],
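    )
    # The reply is a list of content blocks; the story is the text of the first one.
    story = story_message.content[0].text

    # Similar process for generating title...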

    return {
        "title": title,
        "article": story
    }

Putting It All Together

With our image processed and our AI-generated content in hand, it's time to post to Ghost. We're using the Ghost Admin API to upload our image and create the post. Here's a simplified version of what's happening:

import requests

# get_jwt() and API_URL are defined elsewhere in the script.
def add_post(post_data):
    jwt_token = get_jwt()
    post_json = {
        "posts": [{
            "title": post_data['title'],
            "tags": post_data['tags'],
            "html": post_data["html"],
            "feature_image": post_data["feature_image"],
            "status": "published",
            "visibility": "members",
            "published_at": post_data['published_at']
        }]
    }
    url = f'{API_URL}/posts/?source=html'
    headers = {
        'Authorization': f'Ghost {jwt_token}',
        "Accept-Version": "v3.0"
    }
    response = requests.post(url, json=post_json, headers=headers)
    # Handle response...

And voilà! Our AI-generated masterpiece is now live on the blog.

The Grand Finale

With this new setup, my workflow has gone from a manual chore to an automated dream. I save an image in Automatic1111, and the script takes care of the rest. It processes the image, generates a story and title using either Ollama or Claude (depending on my mood and the phase of the moon), and posts it all to my blog.

The best part? I can set this up with a cron job to run daily, ensuring a steady stream of AI-generated content without lifting a finger. It's like having a team of robot artists and writers working tirelessly while I sip my coffee and ponder the existential implications of AI-generated art.

If you want to dive deeper into the code and perhaps adapt it for your own nefarious AI art purposes, check out the full project on GitHub: sd_image_processing_and_upload

Got questions? Want to share your own AI art automation stories? Hit me up on BlueSky or Warpcast. Let's geek out about AI, art, and the beautiful automations that bring them together!

Have fun.