Creating Images
Overview
Image Generation via Griptape is handled via a few components.
- Image Generation Driver - Determines the model to be used. For example, OpenAI DALL·E 3 or Leonardo.AI.
- Image Generation Task or Image Generation Tool. The Task or Tool is what will be provided to the Griptape Structure. Pipelines and Workflows can use
PromptImageGenerationTask
directly. You can provide anPromptImageGenerationTool
to an Agent, to a ToolTask, or to a ToolkitTask.
For example, to create an image with OpenAI DALL·E 3 as a task you could do something like:
# Create an Image Generation Driver
driver = OpenAiImageGenerationDriver(
model="dall-e-3", api_type="open_ai", image_size="1024x1024"
)
# Create an Image Generation Task
task = PromptImageGenerationTask(
"Create a drawing of a pineapple",
image_generation_driver=driver,
output_dir="images"
)
Once you generate the task, you will add it to the pipeline or workflow.
You can also use the PromptImageGenerationTool
tool and assign it to an Agent. It takes many of the same arguments. If you had previously created the driver
as specified above, you would do something like:
agent = Agent(
tools=[PromptImageGenerationTool(
image_generation_driver=driver,
output_dir="images",
off_prompt=False,
)]
)
In this course, because we're focusing on image generation as part of a Pipeline, we'll generate images using a Task.
The Image Task
To get started, we'll begin by replacing the Fake Image Generation task with a real one, using OpenAI DALL·E 3. We'll start with the basics, and adjust settings in a future step. For now, we just want to get things working.
Imports
To use the Driver and Task we'll need to add them to our imports
section in app.py
. You'll modify griptape.tasks
to include PromptImageGenerationTask
, and add imports for the Driver and Engine.
# ...
# Griptape
from griptape.structures import Pipeline
from griptape.tasks import PromptTask, PromptImageGenerationTask
from griptape.drivers import OpenAiImageGenerationDriver
# ...
Create the Driver
Now we'll create our image generation driver. We'll dive into detail about some of the settings on the driver later, but first we'll get everything hooked up and working. Remember, the Driver controls what Image Generation Model we'll be using.
In app.py
, create the driver before you create the pipeline.
# ...
load_dotenv() # Load your environment
# Create the driver
image_driver = OpenAiImageGenerationDriver(
model="dall-e-3", api_type="open_ai", image_size="1024x1024"
)
# Create the pipeline object
# ...
Replace the ImageTask
Next, we'll replace our fake image generation task with a real image generation task. Find the section of the code where we're creating the image task with generate_image_task
and replace it with PromptImageGenerationTask
.
# ...
generate_image_task = PromptImageGenerationTask(
"{{ parent_output }}",
image_generation_driver=image_driver,
output_dir="images",
id="Generate Image Task",
)
# ...
Notice we're giving it the image_generation_driver
we defined earlier as image_driver
. We're also specifying an output_dir
of images
. This will ensure the image is generated in that directory.
Tip
With the PromptImageGenerationTask
, if you want to save the file to disk you must specify either the output file name (output_file
) or the directory you want the images to appear in (output_dir
). If you don't, the image generated will only exist in the ImageArtifact
.
I recommend saving the file using output_dir
, as we'll be able to retrieve the name of the image artifact in the next task.
Test
Give your application a test run. In the results you will see a PromptImageGenerationTask
getting run, and then information on the image size and where it's written out. Here's a section of the resulting log. I've highlighted the log information of the file being written and the Output.
Info
Notice the image name is not in the Output
text. It's part of the INFO, but not in Output - therefore not getting passed back from the Image Generation Task. I'll demonstrate how to get it later in this section.
[12/16/23 17:58:27] INFO PromptTask Create Prompt Task
Output: "Generate an image of a cow, styled and framed as if it were taken with a 1970s Polaroid camera."
INFO PromptImageGenerationTask Generate Image Task
Input: "Generate an image of a cow, styled and framed as if it were taken with a 1970s Polaroid camera."
[12/16/23 17:58:41] INFO Saving [Image, dimensions: 1024x1024, type: image/png, size: 3147861 bytes] to
/Users/jason/Documents/courses/griptape-image-pipeline/images/image_artifact_231216175841_iuy3.png
INFO PromptImageGenerationTask Generate Image Task
Output: Image, dimensions: 1024x1024, type: image/png, size: 3147861 bytes
As you can see, the image has been written to the images
directory. Let's take a look at it!
Beautiful!
The Display Image Task
At the moment if you look at the output from the Display Image task
, we're using {{ parent_output }}
to get the data from the PromptImageGenerationTask
. However, looking at the logs you can see the information does not contain the information we want - the path to the actual file. I've highlighted the Output
from the PromptImageGenerationTask
, and the input
for the Display Image Task
. Notice the image name isn't there.
[12/17/23 05:16:32] INFO Saving [Image, dimensions: 1024x1024, type: image/png, size: 3147861 bytes] to
/Users/jason/Documents/courses/griptape-image-pipeline/images/image_artifact_231217051632_wazr.png
INFO PromptImageGenerationTask Generate Image Task
Output: Image, dimensions: 1024x1024, type: image/png, size: 3147861 bytes
INFO PromptTask Display Image Task
Input:
Pretend to display the image to the user.
Image, dimensions: 1024x1024, type: image/png, size: 3147861 bytes.
[12/17/23 05:16:35] INFO PromptTask Display Image Task
Output: [Displaying Image]
Image Details:
- Dimensions: 1024x1024
- Type: image/png
- Size: 3147861 bytes
So how do we get the name of the image?
Fortunately, there is more information you can get from the parent
task in a pipeline.
Pipeline Context
Take a look at the Context section of the Griptape Pipeline documentation. It points out that Pipelines have access to a few context variables:
parent_output
: output from the parent. We've been using this one.parent
: parent task.child
: child task.
We know the parent_output
doesn't include the information we want - but what about the parent
? What attributes are available in the task itself?
In order to discover that, we need to know what kind of Artifact
the task outputs. This will tell us the attributes we can get. For example, the BaseArtifact
contains attributes like id
, name
, value
, and type
. This means any type of artifact based on the BaseArtifact
will have those attributes, and you can get them by using something like {{ parent.output.value }}
, {{ parent.output.id}}
, {{parent.output.name}}
, etc.
Tip
Artifacts are used for passing different types of data between Griptape components.
However, if we do this with our code you'll see that the value
being passed back from the ImageGenerationTask are the raw bytes of the image. It's way too big for the LLM, and isn't what we want anyway.
To demonstrate (and so you don't need to do it yourself), I'm going to replace the {{ parent_output }}
in the display_image_task
with {{ parent.output.value }}
and show the results.
# ...
display_image_task = PromptTask(
"""
Pretend to display the image to the user.
{{ parent.output.value }}
""",
id="Display Image Task",
)
# ...
The result is a giant wall of text like:
02\x03\x02\x02\xfb\xfd\xfc\xfc\xfb\x01\x01\x02\xff\x04\x06\x02\x01\x00\x01\xff\x00\xff\xff\xfb\xfb\x03\x06\x05\xff\xfc\x01\xfd\x02\xfd\n\x06\x08\xef\xf2\xf3\x0b\x0b\x07\xff\xff
\x02\xfb\xfb\xfc\x05\x06\x03\xfa\xfc\xff\xf8\xf9\xfb\x06\x07\x06\x02\x00\x00\xfb\xfb\xfc\x03\x04\x04\xfe\xff\xfe\x02\x03\x03\xff\xfb\xfc\xff\x00\x00\x03\x05\x02\x02\x01\x00\xfa
\xf7\xfd\xff\x01\x02\xfb\xfe\xfe\x07\x08\x04\xfa\xfc\xfe\x01\xff\x02\x03\x01\xfc\xfe\xfa\xfb\x00\x05\x06\xfe\x00\x01\x02\x00\x02\xff\xfd\xff\xfa\xfb\xfc\x01\x04\x02\xee\xf3\xf6
\x01\x04\x04\xfc\xff\x01\xf7\xf9\xf9\x01\x01\x03\x05\x03\x03\t\n\x06\x01\xff\xfe\x03\x01\x02\x01\xfe\xff\x04\x02\x00\x0b\x06\x04\x00\x01\xff\xff\xfe\xfe\x04\x04\x04\xf9\xfa\xfd
\xfd\xfe\xff\x02\x03\x00\xff\x01\x02\xf8\xfb\xfa\x02\xff\x03\xfe\xff\xfe\xff\x00\x00\xf8\xfd\xff\x04\x02\x01\x03\xfc\x00\x01\x01\x00\xf6\xfb\xfb\x0c\x0c\t\x01\x03\xfe\xfb\xfa\x
ff\xfb\xfb\xfc\x01\x00\x01\xf5\xf9\xfb\t\x0e\x0b\xef\xf8\xf8\xf0\xf0\xf9\xf5\xf8\xfa\xfe\x01\x05\xf8\xfe\xfc\xfd\xfe\x00\xf8\xf8\xfc\xf9\xf9\xfc\x07\n\x07\xf6\xf7\xfc\xfd\x01\x
ff\xf7\xfa\xfc\x01\x02\x01\x02\x04\x07\xf9\xfb\xfb\xff\xff\xfe\x01\x04\x03\xfe\x00\xfe\xff\xfe\x00\x05\x04\x05\x01\xff\xff\xfa\xf8\xfc\t\t\x03\x02\xff\x00\xf8\xfa\xfc\x02\x01\x
03\x06\x05\x01\x01\x00\x00\xfb\xfb\xff\x05\x02\xff\r\x07\x01\x08\x06\x00\x04\xfb\xff\n\x08\x03\x01\xfe\xfe\xfd\x00\x01\x01\x01\xff\x04\x03\x02\xfd\xfc\xff\x00\x05\x03\x00\x01\x
01\xf8\xf8\xff\xf4\xfa\xff\x06\x02\x00\x05\x03\x02\x03\x01\xfd\xfa\xfb\xfc\x02\x03\x04\xf9\xf8\xfd\x07\x0b\t\xf9\xfc\xfa\xfa\xfd\x01\xff\xff\x00\x06\x08\x06\xfa\xf9\xf9\x01\x01
\x01\xf9\xfb\xfb\x08\x06\x06\xff\xfd\xfc\r\x10\t\xfa\xf8\xfa\x04\x00\x00\xfb\xf8\xfc\x04\x07\x06\x02\x03\xff\x05\x02\x03\r\n\x07\xf4\xf2\xf8\x05\x04\x01\x01\x02\x00\xf8\xf9\xfc
So obviously the .value
isn't what we want in this case.
But what other attributes are available?
To discover this, we need to navigate our code and view what Artifact the ImageGenerationTask
outputs.
PromptImageGenerationTask Artifact
In Visual Studio Code, you can learn more about components of your code by Navigating to it.
To do this, hover over PromptImageGenerationTask
use the Keyboard shortcut Cmd on Mac, or Ctrl on Windows and click on it. Additionally, you can just click on PromptImageGenerationTask
with the Right Mouse Button and choose Go to Definition
This will open the class definition PromptImageGenerationTask
in your editor (prompt_image_generation_task.py
).
If you scroll down in the code until you find the section where the run
method is defined, you'll see the output artifact:
# ...
def run(self) -> ImageArtifact:
image_artifact = self.image_generation_driver.generate_image(
prompts=[self.input.to_text()], rulesets=self.all_rulesets, negative_rulesets=self.negative_rulesets
)
# ...
As you can see, it is an ImageArtifact
.
Navigate to this definition (again, using Cmd click, Ctrl click, or RMB --> Go to Definition
).
At the top of the class you'll see the attributes being defined:
# ...
class ImageArtifact(BlobArtifact):
"""ImageArtifact is a type of BlobArtifact that represents an image.
Attributes:
value: Raw bytes representing the image.
name: Artifact name, generated using creation time and a random string.
mime_type: The mime type of the image, like image/png or image/jpeg.
width: The width of the image in pixels.
height: The height of the image in pixels.
model: Optionally specify the model used to generate the image.
prompt: Optionally specify the prompt used to generate the image.
"""
# ...
Remember previously I tried replacing {{ parent_output }}
with {{ parent.output.value}}
. As you can see from the definition, the value
is the "Raw bites representing the image".
But what other attributes do you see available that might be useful?
How about that name
attribute? It says it's the name of the artifact, generated using creation time and a random string. Let's swap that in and give it a try.
parent.output.name
Navigate back to your app.py
in Visual Studio Code.
Inside the display_image_task
section, replace {{ parent_output }}
with {{ parent.output.name }}
.
# ...
display_image_task = PromptTask(
"""
Pretend to display the image to the user.
{{ parent.output.name }}
""",
id="Display Image Task",
)
# Add tasks to pipeline
# ...
Note
It's important to note that it's not {{ parent_output_name }}
. We're grabbing an attribute of parent
task. We need to use .
, not _
.
Test
Run the code and in the logs take a look at the Input
to the Display Image Task
.
[12/17/23 06:03:36] INFO Saving [Image, dimensions: 1024x1024, type: image/png, size: 3147861 bytes] to
/Users/jason/Documents/courses/griptape-image-pipeline/images/image_artifact_231217060336_570j.png
INFO PromptImageGenerationTask Generate Image Task
Output: Image, dimensions: 1024x1024, type: image/png, size: 3147861 bytes
INFO PromptTask Display Image Task
Input:
Pretend to display the image to the user.
image_artifact_231217060336_570j.png
[12/17/23 06:03:39] INFO PromptTask Display Image Task
Output: [Displaying Image: image_artifact_231217060336_570j.png]
See how it now gives us the name of the image? This is exactly what we need, except it's not the full path. Remember in the PromptImageGenerationTask
we're specifying the output_dir
. We should be sure to include this as well.
Let's add some more context to the task, providing the output_dir
. We'll also create a variable earlier in our code for the output_dir
so we only need to define it once.
Define output_dir
In app.py
, right after load_dotenv()
, let's create a section for any variables we might want. At the moment it's just output_dir
, but it's always good to define these in a single place.
# ...
load_dotenv() # Load your environment
# Variables
output_dir = "images"
# Create the driver
# ...
Replace in generate_image_task
Now go down to the generate_image_task
and use the output_dir
variable in the PromptImageGenerationTask
.
# ...
generate_image_task = PromptImageGenerationTask(
"{{ parent_output }}",
image_generation_driver=image_driver,
output_dir=output_dir,
id="Generate Image Task",
)
# ...
Create context in display_image_task
Finally, back to the display_image_task
, we'll create a context
, provide the output_dir
, and use it in our prompt.
# ...
display_image_task = PromptTask(
"""
Pretend to display the image to the user.
{{output_dir}}/{{ parent.output.name }}
""",
context={"output_dir": output_dir},
id="Display Image Task",
)
# ...
Test
Run the code again and let's check the Input to the Display Image Task
:
INFO PromptTask Display Image Task
Input:
Pretend to display the image to the user.
./images/image_artifact_231217061650_zrmz.png
Perfect! That's exactly what we need to be able to give to the real display image step - the path of the file that's been generated.
Code Review
You can now generate an image using the prompt. Excellent work! Let's review the current state of our application:
Next Step
Our next task is to replace our fake Display Image with a real one. There are a few ways we can achieve this within a Griptape Pipeline.
- Tasks - Use a
CodeExecutionTask
to execute a Python function that will display the image. - Tools - Create a Tool that displays an image, and then use
ToolTask
orToolkitTask
to integrate it into the Pipeline. - Plain ol' Python - Because we're running a Pipeline in Python, we can just ignore adding anything to our structure and just display the output of the Image Generation task as we want.
All three of these methods are valid, but in the context of this course, we're going to learn how to use the CodeExecutionTask
to run a simple Python function within the Pipeline. This will be a great example of how to include generic Python code within Pipelines and Workflows that does not hit an LLM.
Let's dive in and learn how to use the CodeExecutionTask
to display an image.