Querying Multiple Images
Ready to scale up your image processing game with Griptape? Let’s move from a solo act to a full orchestra, handling multiple images with the grace of a conductor wielding a baton. Here’s how you can set up a robust, scalable workflow that processes multiple images simultaneously, ensuring each one gets the star treatment it deserves.
We could ask the agent to handle multiple images at once, and it can do a relatively good job of it if prompted correctly - but it may not be consistent.
For example, if we prompted with:
For each image in images, can you create an SEO safe description,
define key words, create an alt-description, a caption, and an example html
element? Save it as a YAML file in image_descriptions/filename.yaml
The agent would use the FileManagerTool to get a list of files, and then start a series of actions to describe them. It may decide to do them in parallel (Yay!), or it may do them one at a time (Boo!). For some cases, this is fine - but if we want to turn this into a consistent workflow that is reliable and can always operate in parallel, the best practice is to use a Griptape Workflow.
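To make the one-at-a-time versus in-parallel difference concrete, here is a minimal standard-library sketch. It is not Griptape code: `describe` is a hypothetical stand-in for a slow model call, and `describe_all` is just an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def describe(image: str) -> str:
    # Hypothetical stand-in for a slow call that describes one image.
    return f"description of {image}"


def describe_all(images: list[str], max_workers: int = 4) -> dict[str, str]:
    """Run describe() for every image concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return dict(zip(images, pool.map(describe, images)))


if __name__ == "__main__":
    print(describe_all(["beach.png", "toy_car.png"]))
```

The point is that the parallelism here is decided by the code, not by how the agent happens to interpret the prompt, which is the same guarantee we want from a Workflow.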
If you haven't explored Griptape Workflows or Pipelines previously, we highly recommend you check out the other TradeSchool courses that cover them, such as Compare Movies - Workflows.
What's it going to do?
That's cool and all, but what exactly will the workflow do?
Our workflow will follow these steps:
- Get a list of the images in the directory we pass it.
- For each image:
  - Clearly describe the image using the ImageQueryTool.
  - Based on the description, generate an SEO-friendly description, key words, an alt-description, a caption, and an example HTML image structure.
  - Save the information to disk in YAML format in image_descriptions/filename.yml
- Tell us when it's finished.
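The steps above form a small dependency graph. As a rough sketch of that shape, here is a standard-library model using `graphlib` (Python 3.9+). It is not Griptape's Workflow class, and the task names and the `build_graph` and `batches` helpers are hypothetical. Each batch it produces is a group of tasks that could run at the same time.

```python
from graphlib import TopologicalSorter


def build_graph(images):
    """Map each task name to the set of tasks it depends on."""
    graph = {"START": set(), "END": set()}
    for image in images:
        describe = f"describe:{image}"
        seo = f"seo:{image}"
        graph[describe] = {"START"}   # describing waits for START
        graph[seo] = {describe}       # SEO output waits for the description
        graph["END"].add(seo)         # END waits for every SEO task
    return graph


def batches(graph):
    """Yield groups of tasks that are ready together, in dependency order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        yield ready
        sorter.done(*ready)


plan = list(batches(build_graph(["beach.png", "toy_car.png"])))

if __name__ == "__main__":
    for step, tasks in enumerate(plan, start=1):
        print(step, tasks)
```

Both describe tasks land in the same batch, and so do both SEO tasks, which is why this structure can process images side by side.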
Import Necessary Classes and Modules
Update your import statements to include all the necessary items.
# ...
import os
# ...
from griptape.structures import Agent, Workflow
from griptape.tasks import TextSummaryTask, ToolTask, ToolkitTask
# ...
Here's a quick description of what each of these is for.
| Library | Description |
|---|---|
| os | The Python os library - we will use this to get a list of files in the images directory. |
| Workflow | Defines a dependency graph of tasks. |
| TextSummaryTask | A quick and simple text task that doesn't include Chain of Thought. |
| ToolTask | The task used to get information about an image. Also doesn't use Chain of Thought. Note: We could also use the ImageQueryTask specifically to get information about the image, but in my testing the ToolTask executed in the same amount of time, and is easier to set up. |
| ToolkitTask | The task that will use FileManagerTool to save the files to disk. This uses Chain of Thought to take the output from the Image Query and generate the proper formatting, then uses the tool to save to disk. |
Remove Agent Flow
We’re going to be creating a workflow, so we don't really need to use the Agent code anymore. We could delete it, but we might find it useful to reference while adjusting to our Workflow. To keep it around, we could comment it out - this would keep it in our code, but it wouldn't have any influence.
Another option, however, is to put the agent code in a conditional statement.
We can tell the app: if flow == "WORKFLOW", run the workflow code; otherwise, run the agent code. It would look something like this:
[Listing: example_flow.py]
If you were to run this script right now, the output would be:
Conversely, if we set it to "AGENT":
[Listing: example_flow.py]
We would get:
This is a great way to control the flow of execution in our program and make sure we can still run via the agent if we need to.
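As a minimal sketch of this kind of switch: the `run_workflow` and `run_agent` functions and the strings they return are placeholders assumed for illustration, not the contents of the course's example_flow.py.

```python
def run_workflow() -> str:
    # Placeholder: the real script would build and run the Workflow here.
    return "Running the workflow"


def run_agent() -> str:
    # Placeholder: the real script would create the Agent and start Chat here.
    return "Running the agent"


def run(flow: str) -> str:
    """Pick a code path based on the flow setting."""
    if flow == "WORKFLOW":
        return run_workflow()
    # Any other value falls back to the agent path.
    return run_agent()


if __name__ == "__main__":
    flow = "WORKFLOW"  # change to "AGENT" to take the other branch
    print(run(flow))
```

Because anything other than "WORKFLOW" falls through to the agent branch, a typo in the flow value will quietly run the agent, so it is worth double-checking the string when switching.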
Modify the code as follows:
# ...
# Configure the ImageQueryTool
image_query_tool = ImageQueryTool(prompt_driver=driver, off_prompt=False)
flow = "AGENT"
if flow == "WORKFLOW":
    # Create a workflow (filled in over the next steps)
    pass
else:
    # Create the Agent
    agent = Agent(tools=[image_query_tool, FileManagerTool(off_prompt=False)], stream=True)
    # Modify the Agent's response to have some color.
    def formatted_response(response: str) -> None:
        print(f"[dark_cyan]{response}", end="", flush=True)
    # Begin Chatting
    Chat(
        agent,
        intro_text="\nWelcome to Griptape Chat!\n",
        prompt_prefix="\nYou: ",
        processing_text="\nThinking...",
        response_prefix="\nAgent: ",
        handle_output=formatted_response,  # Uses the formatted_response function
    ).start()
Set Up the Workflow
I lost the orchestra analogy here, but imagine setting up dominoes: each task in a Workflow is a domino. Before I show you how to line them up, you should know that Workflows must always have a start and an end task, so that's what we're going to create first.
Insert this code inside the workflow section of your conditional statement.
# ...
flow = "WORKFLOW"
if flow == "WORKFLOW":
    # Create a Workflow
    workflow = Workflow()
    # Create the Start and End tasks.
    startTask = TextSummaryTask("We are going to start a new workflow.", id="START")
    endTask = TextSummaryTask(
        "We have completed the workflow. Summarize what we did {{ parent_outputs }}",
        id="END",
    )
    # Add the tasks to the workflow
    workflow.add_tasks(startTask, endTask)
    # Run the workflow
    workflow.run()
else:
    # ...
Give it a try, and you'll see the START and END tasks running.

Getting the Images
Now, let’s get the data ready for the show. Yes, I’m back to the orchestra example again. Keep up. Using the os module, we’ll gather all images from a directory like a talent scout. This step is crucial as it feeds the workflow with the actual data (images) it needs to process.
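Here is a minimal sketch of just the gathering step, using only the os module. The `list_images` helper and the `IMAGE_EXTENSIONS` set are assumptions added for illustration; the course code loops over os.listdir directly, so treat the filtering as an optional extra that keeps stray files (like .DS_Store) out of the workflow.

```python
import os

# Assumed set of extensions; adjust to whatever your images directory holds.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def list_images(image_dir: str) -> list[tuple[str, str]]:
    """Return (image_path, filename_without_extension) pairs, sorted by name."""
    results = []
    for entry in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(entry)
        path = os.path.join(image_dir, entry)
        # Skip sub-directories and anything that isn't an image file.
        if os.path.isfile(path) and ext.lower() in IMAGE_EXTENSIONS:
            results.append((path, stem))
    return results


if __name__ == "__main__":
    if os.path.isdir("./images"):
        for image_path, filename in list_images("./images"):
            print(image_path, "->", filename)
```

The path is what we hand to the image query, and the extension-free filename is what we later use to name the YAML file.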
Add the following code after the creation of the start/end tasks, and before you run the workflow:
Notice that we've added a fake task - the image_summary_task that's just another TextSummaryTask. This is just to demonstrate that the task is inserted and working as expected.
Tip
If this section is confusing, please go review the other Workflow course mentioned earlier - Compare Movies - Workflows. The concepts are well covered there.
Go ahead and run the script - you should see a number of tasks being created based on the files in the images directory.

Image Processing Tasks
Now we'll swap out this fake task for a real one.
For each VIP (Very Important Picture), create a task that details their best angles. This task uses the ImageQueryTool to describe each image in detail. The SEO-friendly description, keywords, alt-description, caption, and HTML element are generated from that description in the next step.
# ...
flow = "WORKFLOW"
if flow == "WORKFLOW":
    # ...
    # For each image in the directory
    image_dir = "./images"
    for image in os.listdir(image_dir):
        image_path = os.path.join(image_dir, image)
        filename = os.path.splitext(image)[0]
        # Create an Image Summary Task
        image_summary_task = ToolTask(
            "Describe this image in detail: {{image_path}}",
            context={"image_path": image_path},
            tool=image_query_tool,
            id=f"{image}",
        )
        # Insert it into the workflow
        workflow.insert_tasks(startTask, [image_summary_task], endTask)
    # ...
else:
    # ...
If you execute the script now, you'll see that it provides descriptions for each of the images.

Define SEO Output Task
After generating the image description, a ToolkitTask is used to format the SEO data and save it to disk.
We use a ToolkitTask here instead of a ToolTask because the request we have requires a little bit of intelligence from the Agent. ToolkitTasks use Chain of Thought, whereas ToolTasks just use a tool directly.
All in all, they’re just a tad more capable.
So, this task takes the output from the ToolTask and uses the FileManagerTool to save the information in YAML format in a designated directory.
Create the Image SEO Task right after the image summary task, and insert it into the workflow.
# ...
flow = "WORKFLOW"
if flow == "WORKFLOW":
    # ...
    for image in os.listdir(image_dir):
        # ...
        # Create an Image Summary Task
        image_summary_task = ToolTask(
            # ...
        )
        # Create an Image SEO Task
        image_seo_task = ToolkitTask(
            "Based on this image description, create the following:\n"
            + "SEO description, Caption, Alt-text, 5 keywords, an HTML snippet to "
            + "display the image. Save this to image_descriptions/{{ filename }}.yml\n"
            + "in YAML format.\n\n{{ parent_outputs }}",
            tools=[FileManagerTool(off_prompt=False)],
            context={"filename": filename},
            id=f"seo_{image}",
        )
        # Insert it into the workflow
        workflow.insert_tasks(startTask, [image_summary_task], endTask)
        workflow.insert_tasks(image_summary_task, [image_seo_task], endTask)
    # Run the workflow
    workflow.run()
else:
    # ...
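If the two insert calls feel abstract, here is a toy model of the rewiring we're relying on: slot a new task between a parent and a child. This is a plain-dict sketch, not Griptape's implementation, and `insert_between` is a hypothetical helper. The task ids mirror the ones used above.

```python
def insert_between(children: dict[str, set[str]], parent: str, new_task: str, child: str) -> None:
    """Rewire parent -> child so that it reads parent -> new_task -> child."""
    children.setdefault(parent, set()).discard(child)  # drop the direct edge if present
    children[parent].add(new_task)
    children.setdefault(new_task, set()).add(child)
    children.setdefault(child, set())


# Start with the two bookend tasks wired directly together.
graph = {"START": {"END"}, "END": set()}

for image in ["beach.png", "toy_car.png"]:
    summary_id, seo_id = image, f"seo_{image}"
    # START -> summary -> END
    insert_between(graph, "START", summary_id, "END")
    # summary -> seo -> END
    insert_between(graph, summary_id, seo_id, "END")

if __name__ == "__main__":
    for task, kids in graph.items():
        print(task, "->", sorted(kids))
```

In this model each image ends up with its own START -> summary -> seo -> END chain, and the chains share nothing but the bookends, which is what lets them run side by side.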
If you try it out, you will see some YAML files in the image_descriptions folder. Here are a few examples based on these images:
my_favorite_ball.png:
SEO_description: 'This image features a colorful beach ball on an urban sidewalk with city buildings in the background. The photo has a vintage feel with its warm, slightly faded tones and vignette bordering.'
Caption: 'A colorful beach ball on an urban sidewalk with a vintage feel.'
Alt_text: 'Colorful beach ball on urban sidewalk with city buildings in the background.'
Keywords: ['beach ball', 'urban sidewalk', 'city buildings', 'vintage feel', 'vignette bordering']
HTML_snippet: '<img src="my_favorite_ball.png" alt="Colorful beach ball on urban sidewalk with city buildings in the background." />'
beach.png:
SEO Description: 'A nostalgic beach scene with a vintage look featuring a striped beach umbrella, a sandcastle, and people enjoying various activities.'
Caption: 'Vintage beach scene with a sandcastle under a striped umbrella.'
Alt-text: 'Vintage beach scene with people enjoying various activities.'
Keywords: ['vintage', 'beach', 'umbrella', 'sandcastle', 'activities']
HTML Snippet: '<img src="beach.png" alt="Vintage beach scene with people enjoying various activities."/>'
toy_car.png:
SEO_description: 'Vintage-looking toy car with a nostalgic feel on a wooden surface'
Caption: 'Vintage toy car on a wooden table'
Alt_text: 'Vintage toy car'
Keywords: ['Vintage toy car', 'Classic car', 'Wooden surface', 'Antique look', 'Aged photo']
HTML_snippet: '<img src="toy_car.png" alt="Vintage toy car">'
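Notice that the keys aren't identical from file to file (SEO_description vs. SEO Description, Alt_text vs. Alt-text). If you need to consume these files before tightening up the output, one option is to normalize the keys after loading. A minimal sketch, assuming the YAML has already been parsed into a dict; `normalize_key` and `normalize_record` are hypothetical helper names:

```python
import re


def normalize_key(key: str) -> str:
    """Lowercase a key and turn runs of spaces or hyphens into one underscore."""
    return re.sub(r"[\s\-]+", "_", key.strip()).lower()


def normalize_record(record: dict) -> dict:
    """Return a copy of the record with every top-level key normalized."""
    return {normalize_key(key): value for key, value in record.items()}


if __name__ == "__main__":
    beach = {
        "Caption": "Vintage beach scene with a sandcastle under a striped umbrella.",
        "Alt-text": "Vintage beach scene with people enjoying various activities.",
        "Keywords": ["vintage", "beach", "umbrella", "sandcastle", "activities"],
    }
    print(sorted(normalize_record(beach)))
```

This only papers over the naming differences; it doesn't make the model's output itself any more consistent.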
Code Review
Next Steps
This workflow not only automates the process of generating and saving SEO-friendly image descriptions but also ensures that the tasks are performed in a consistent, reliable manner across multiple images.
Take a breather before we move on to the next step. We're so close: only one section is left, where we add a Template file to improve the consistency of the output. When you're ready, continue to Part 7.