Querying Multiple Images

Ready to scale up your image processing game with Griptape? Let’s move from a solo act to a full orchestra, handling multiple images with the grace of a conductor wielding a baton. Here’s how you can set up a robust, scalable workflow that processes multiple images simultaneously, ensuring each one gets the star treatment it deserves.

We could ask the agent to handle multiple images at once, and it can do a relatively good job of it if prompted correctly - but it may not be consistent.

For example, if we prompted with:

For each image in images, can you create an SEO safe description, 
define key words, create an alt-description, a caption, and an example html 
element? Save it as a YAML file in image_descriptions/filename.yaml

The agent would use the FileManagerTool to get a list of files, and then start a series of actions to describe them. It may decide to do them in parallel (Yay!), or it may do them one at a time (Boo!). For some cases, this is fine - but if we want to turn this into a consistent workflow that is reliable and can always operate in parallel, the best practice is to use a Griptape Workflow.

If you haven't explored Griptape Workflows or Pipelines previously, we highly recommend you check out the other TradeSchool courses that cover them, such as Compare Movies - Workflows.

What's it going to do?

That's cool and all, but what exactly will the workflow do?

Our workflow will follow these steps:

Get a list of the images in the directory we pass it.
For each image:
  1. Describe the image in detail using the ImageQueryTool.
  2. Based on the description, generate an SEO-friendly description, keywords, an alt-description, a caption, and an example HTML image element.
  3. Save the information to disk in YAML format in image_descriptions/filename.yml
Tell us when it’s finished.
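
The steps above form a small dependency graph. As a rough, plain-Python model (it uses the standard library's `graphlib` rather than Griptape, and the task ids and the `build_graph` and `execution_waves` helpers are illustrative assumptions), the sketch below groups tasks into "waves" whose members do not depend on one another. That independence is what lets the per-image tasks run in parallel.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_graph(images):
    """Map each task id to the set of task ids it depends on."""
    graph = {"START": set(), "END": set()}
    for image in images:
        describe_id = image          # step 1: describe the image
        seo_id = f"seo_{image}"      # steps 2 and 3: build SEO data and save it
        graph[describe_id] = {"START"}
        graph[seo_id] = {describe_id}
        graph["END"].add(seo_id)
    return graph


def execution_waves(graph):
    """Group tasks into waves; tasks within a wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(build_graph(["beach.png", "toy_car.png"]))
for number, wave in enumerate(waves, start=1):
    print(number, wave)
# 1 ['START']
# 2 ['beach.png', 'toy_car.png']
# 3 ['seo_beach.png', 'seo_toy_car.png']
# 4 ['END']
```

This only models the shape of the graph; how Griptape actually schedules the tasks is up to the Workflow itself.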

Import Necessary Classes and Modules

Update your import statements to include all the necessary items.

app.py

# ...
import os
# ...
from griptape.structures import Agent, Workflow
from griptape.tasks import TextSummaryTask, ToolTask, ToolkitTask 
# ...

Here's a quick description of what each of these is for.

Library	Description
`os`	The Python `os` library - we will use this to get a list of files in the images directory.
`Workflow`	Defines a dependency graph of tasks.
`TextSummaryTask`	A quick and simple text task that doesn't include Chain of Thought.
`ToolTask`	The task used to get information about an image. Also doesn't use Chain of Thought. Note: We could also use the `ImageQueryTask` specifically to get information about the image, but in my testing the `ToolTask` executed in the same amount of time, and is easier to set up.
`ToolkitTask`	The task that will use FileManagerTool to save the files to disk. This uses Chain of Thought to take the output from the Image Query and generate the proper formatting, then uses the tool to save to disk.

Remove Agent Flow

We’re going to be creating a workflow, so we don't really need to use the Agent code anymore. We could delete it, but we might find it useful to reference while adjusting to our Workflow. To keep it around, we could comment it out - this would keep it in our code, but it wouldn't have any influence.

Another option, however, is to put the agent code in a conditional statement. We can tell the app: if flow == "WORKFLOW", run the workflow code; otherwise, run the agent code. It would look something like this:

example_flow.py
flow = "WORKFLOW"
if flow == "WORKFLOW":
  print("This would be where the WORKFLOW code goes.")
else:
  print("This is where the AGENT code would go.")

If you were to run this script right now, the output would be:

This would be where the WORKFLOW code goes.

Conversely, if we set it to "AGENT":

example_flow.py
flow = "AGENT"
if flow == "WORKFLOW":
  print("This would be where the WORKFLOW code goes.")
else:
  print("This is where the AGENT code would go.")

We would get:

This is where the AGENT code would go.

This is a great way to control the flow of execution in our program and make sure we can still run via the agent if we need to.
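
If you would rather not edit the script each time you switch, one possible variation is to read the flow from an environment variable. This is only a sketch: the variable name `IMAGE_FLOW` and the `resolve_flow` helper are hypothetical and are not part of the course code.

```python
import os

VALID_FLOWS = {"WORKFLOW", "AGENT"}


def resolve_flow(environ=os.environ, default="WORKFLOW"):
    """Return the flow named in IMAGE_FLOW (a hypothetical variable), or the default."""
    flow = environ.get("IMAGE_FLOW", default).strip().upper()
    if flow not in VALID_FLOWS:
        raise ValueError(f"Unknown flow {flow!r}; expected one of {sorted(VALID_FLOWS)}")
    return flow


flow = resolve_flow()
if flow == "WORKFLOW":
    print("This would be where the WORKFLOW code goes.")
else:
    print("This is where the AGENT code would go.")
```

Raising an error on an unknown value keeps a typo from silently falling through to the agent branch.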

Modify the code as follows:

app.py

# ...
# Configure the ImageQueryTool
image_query_tool = ImageQueryTool(prompt_driver=driver, off_prompt=False)

flow = "AGENT"
if flow == "WORKFLOW":
  # Create a workflow (placeholder so the empty branch is valid Python)
  pass
else:
  # Create the Agent
  agent = Agent(tools=[image_query_tool, FileManagerTool(off_prompt=False)], stream=True)


  # Modify the Agent's response to have some color.
  def formatted_response(response: str) -> None:
      print(f"[dark_cyan]{response}", end="", flush=True)


  # Begin Chatting
  Chat(
      agent,
      intro_text="\nWelcome to Griptape Chat!\n",
      prompt_prefix="\nYou: ",
      processing_text="\nThinking...",
      response_prefix="\nAgent: ",
      handle_output=formatted_response,  # Uses the formatted_response function
  ).start()

Set Up the Workflow

I lost the orchestra analogy here, but imagine setting up dominoes: each task in a Workflow is a domino. Before I show you how to line them up, you should know that Workflows must always have a start task and an end task, so we'll create those first and then place the other tasks between them.

Insert this code inside the workflow section of your conditional statement.

app.py

# ...

flow = "WORKFLOW"
if flow == "WORKFLOW":
  # Create a Workflow
  workflow = Workflow()

  # Create the Start and End tasks.
  startTask = TextSummaryTask("We are going to start a new workflow.", id="START")
  endTask = TextSummaryTask(
      "We have completed the workflow. Summarize what we did {{ parent_outputs }}",
      id="END",
  )

  # Add the tasks to the workflow
  workflow.add_tasks(startTask, endTask)

  # Run the workflow
  workflow.run()
else:
  # ...

Give it a try, and you'll see the START and END tasks running.

Workflow start and end tasks

Getting the Images

Now, let’s get the data ready for the show. Yes, I’m back to the orchestra example again. Keep up. Using the os module, we’ll gather all images from a directory like a talent scout. This step is crucial as it feeds the workflow with the actual data (images) it needs to process.

Add the following code after the creation of the start/end tasks, and before you run the workflow:

app.py
# ...

flow = "WORKFLOW"
if flow == "WORKFLOW":
  # ...
  # Add the tasks to the workflow
  workflow.add_tasks(startTask, endTask)

  # For each image in the directory
  image_dir = "./images"
  for image in os.listdir(image_dir):
      image_path = os.path.join(image_dir, image)
      filename = os.path.splitext(image)[0]

      # Create a temporary summary task
      image_summary_task = TextSummaryTask(
          f"What image is this: {image_path}", id=f"summary_{image}"
      )

      # Insert it into the workflow
      workflow.insert_tasks(startTask, [image_summary_task], endTask)

  # Run the workflow
  workflow.run()
else:
  # ...

Notice that we've added a fake task - the image_summary_task that's just another TextSummaryTask. This is just to demonstrate that the task is inserted and working as expected.
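
One thing to keep in mind is that `os.listdir` returns every entry in the directory, including subdirectories and hidden files such as `.DS_Store`, and each of those would get its own task. A minimal sketch of a filter is shown below; the `list_images` helper and the set of extensions are assumptions for illustration, not part of the course code.

```python
import os

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def list_images(image_dir):
    """Return (image_path, filename) pairs for image files in image_dir, sorted by name."""
    results = []
    for entry in sorted(os.listdir(image_dir)):
        image_path = os.path.join(image_dir, entry)
        filename, extension = os.path.splitext(entry)
        # Skip directories and anything without a known image extension.
        if os.path.isfile(image_path) and extension.lower() in IMAGE_EXTENSIONS:
            results.append((image_path, filename))
    return results
```

With this in place, the loop could be written as `for image_path, filename in list_images(image_dir):`, which also provides the same `image_path` and `filename` values the loop already computes.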

Tip

If this section is confusing, please go review the other Workflow course mentioned earlier - Compare Movies - Workflows. The concepts are well covered there.

Go ahead and run the script - you should see a number of tasks being created based on the files in the images directory.

Tasks being created

Image Processing Tasks

Now we'll swap out this fake task for a real one.

For each VIP (Very Important Picture), create a task that details their best angles. This task uses the ImageQueryTool to describe the image in detail; the SEO-friendly description, keywords, alt-description, caption, and HTML element are generated from that description in a later task.

app.py

# ...

flow = "WORKFLOW"
if flow == "WORKFLOW":
  # ...
  # For each image in the directory
  image_dir = "./images"
  for image in os.listdir(image_dir):
      image_path = os.path.join(image_dir, image)
      filename = os.path.splitext(image)[0]

      # Create an Image Summary Task
      image_summary_task = ToolTask(
          "Describe this image in detail: {{image_path}}",
          context={"image_path": image_path},
          tool=image_query_tool,
          id=f"{image}",
      )

      # Insert it into the workflow
      workflow.insert_tasks(startTask, [image_summary_task], endTask)

  # ...
else:
  # ...

If you execute the script now, you'll see that it provides descriptions for each of the images.

The results of querying an image

Define SEO Output Task

After generating the image description, a ToolkitTask is used to format the SEO data and save it to disk.

We use a ToolkitTask here instead of a ToolTask because the request we have requires a little bit of intelligence from the Agent. ToolkitTasks use Chain of Thought, whereas ToolTasks just use a tool directly.

All in all, they’re just a tad more capable.

So, this task takes the output from the ToolTask and uses the FileManagerTool to save the information in YAML format in a designated directory.

Create the Image SEO Task right after the image summary task, and insert it into the workflow.

app.py

# ...

flow = "WORKFLOW"
if flow == "WORKFLOW":
  # ...
  for image in os.listdir(image_dir):

      # ...

      # Create an Image Summary Task
      image_summary_task = ToolTask(
        # ...
      )

      # Create an Image SEO Task
      image_seo_task = ToolkitTask(
          "Based on this image description, create the following:\n"
          + "SEO description, Caption, Alt-text, 5 keywords, an HTML snippet to "
          + "display the image. Save this to image_descriptions/{{ filename }}.yml\n"
          + "in YAML format.\n\n{{ parent_outputs }}",
          tools=[FileManagerTool(off_prompt=False)],
          context={"filename": filename},
          id=f"seo_{image}",
      )

      # Insert both tasks into the workflow
      workflow.insert_tasks(startTask, [image_summary_task], endTask)
      workflow.insert_tasks(image_summary_task, [image_seo_task], endTask)

  # Run the workflow
  workflow.run()
else: 
  # ...

If you try it out, you will see some YAML files in the image_descriptions folder. Here are a few examples:

my_favorite_ball.yml

my_favorite_ball.png:
  SEO_description: 'This image features a colorful beach ball on an urban sidewalk with city buildings in the background. The photo has a vintage feel with its warm, slightly faded tones and vignette bordering.'
  Caption: 'A colorful beach ball on an urban sidewalk with a vintage feel.'
  Alt_text: 'Colorful beach ball on urban sidewalk with city buildings in the background.'
  Keywords: ['beach ball', 'urban sidewalk', 'city buildings', 'vintage feel', 'vignette bordering']
  HTML_snippet: '<img src="my_favorite_ball.png" alt="Colorful beach ball on urban sidewalk with city buildings in the background." />'

beach.yml

beach.png:
  SEO Description: 'A nostalgic beach scene with a vintage look featuring a striped beach umbrella, a sandcastle, and people enjoying various activities.'
  Caption: 'Vintage beach scene with a sandcastle under a striped umbrella.'
  Alt-text: 'Vintage beach scene with people enjoying various activities.'
  Keywords: ['vintage', 'beach', 'umbrella', 'sandcastle', 'activities']
  HTML Snippet: '<img src="beach.png" alt="Vintage beach scene with people enjoying various activities."/>'

toy_car.yml

toy_car.png:
  SEO_description: 'Vintage-looking toy car with a nostalgic feel on a wooden surface'
  Caption: 'Vintage toy car on a wooden table'
  Alt_text: 'Vintage toy car'
  Keywords: ['Vintage toy car', 'Classic car', 'Wooden surface', 'Antique look', 'Aged photo']
  HTML_snippet: '<img src="toy_car.png" alt="Vintage toy car">'
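
Notice that the key names are not identical across these files (for example `SEO_description` versus `SEO Description`, and `Alt_text` versus `Alt-text`). If you need to read the files before that inconsistency is addressed, one option is to normalize the keys after parsing. The sketch below works on plain dictionaries and assumes the YAML has already been loaded; `normalize_key` and `normalize_record` are hypothetical helpers, not part of the course code.

```python
import re


def normalize_key(key):
    """Lowercase a key and collapse spaces, hyphens, and underscores into one underscore."""
    return re.sub(r"[\s_-]+", "_", key.strip()).lower()


def normalize_record(record):
    """Return a copy of a parsed description with normalized keys."""
    return {normalize_key(key): value for key, value in record.items()}


# Values copied from the beach.yml and toy_car.yml examples above.
beach = {
    "Caption": "Vintage beach scene with a sandcastle under a striped umbrella.",
    "Alt-text": "Vintage beach scene with people enjoying various activities.",
}
toy_car = {
    "Caption": "Vintage toy car on a wooden table",
    "Alt_text": "Vintage toy car",
}

print(sorted(normalize_record(beach)))    # ['alt_text', 'caption']
print(sorted(normalize_record(toy_car)))  # ['alt_text', 'caption']
```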

Code Review

app.py
from dotenv import load_dotenv
import os

# Griptape Items
from griptape.drivers import OpenAiChatPromptDriver
from griptape.structures import Agent, Workflow
from griptape.tasks import TextSummaryTask, ToolTask, ToolkitTask
from griptape.utils import Chat
from griptape.tools import ImageQueryTool, FileManagerTool


from rich import print as print  # Modifies print to use the Rich library

load_dotenv()  # Load your environment

# Create an Image Query Driver
driver = OpenAiChatPromptDriver(model="gpt-4o")


# Configure the ImageQueryTool
image_query_tool = ImageQueryTool(prompt_driver=driver, off_prompt=False)

flow = "WORKFLOW"
if flow == "WORKFLOW":
    # Create a Workflow
    workflow = Workflow()

    # Create the Start and End tasks.
    startTask = TextSummaryTask("We are going to start a new workflow.", id="START")
    endTask = TextSummaryTask(
        "We have completed the workflow. Summarize what we did {{ parent_outputs }}",
        id="END",
    )

    # Add the tasks to the workflow
    workflow.add_tasks(startTask, endTask)

    # For each image in the directory
    image_dir = "./images"
    for image in os.listdir(image_dir):
        image_path = os.path.join(image_dir, image)
        filename = os.path.splitext(image)[0]

        # Create an Image Summary Task
        image_summary_task = ToolTask(
            "Describe this image in detail: {{ image_path }}",
            context={"image_path": image_path},
            tool=image_query_tool,
            id=f"{image}",
        )

        # Create an Image SEO Task
        image_seo_task = ToolkitTask(
            "Based on this image description, create the following:\n"
            + "SEO description, Caption, Alt-text, 5 keywords, an HTML snippet to "
            + "display the image. Save this to image_descriptions/{{ filename }}.yml\n"
            + "in YAML format.\n\n{{ parent_outputs }}",
            tools=[FileManagerTool(off_prompt=False)],
            context={"filename": filename},
            id=f"seo_{image}",
        )

        # Insert both tasks into the workflow
        workflow.insert_tasks(startTask, [image_summary_task], endTask)
        workflow.insert_tasks(image_summary_task, [image_seo_task], endTask)

    # Run the workflow
    workflow.run()
else:
    # Create the Agent
    agent = Agent(tools=[image_query_tool, FileManagerTool(off_prompt=False)], stream=True)

    # Modify the Agent's response to have some color.
    def formatted_response(response: str) -> None:
        print(f"[dark_cyan]{response}", end="", flush=True)

    # Begin Chatting
    Chat(
        agent,
        intro_text="\nWelcome to Griptape Chat!\n",
        prompt_prefix="\nYou: ",
        processing_text="\nThinking...",
        response_prefix="\nAgent: ",
        handle_output=formatted_response,  # Uses the formatted_response function
    ).start()

Next Steps

This workflow not only automates the process of generating and saving SEO-friendly image descriptions but also ensures that the tasks are performed in a consistent, reliable manner across multiple images.

Take a breather before we move on to the next step. We're so close: only one section is left, where we add a Template file to improve the consistency of the output. When you're ready, continue to Part 7.