-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/Luma Ai Video Generation Driver #1199
base: dev
Are you sure you want to change the base?
Feature/Luma Ai Video Generation Driver #1199
Conversation
…://github.com/griptape-ai/griptape into feature/dream_machine_video_generation_driver
Codecov ReportAttention: Patch coverage is 📢 Thoughts on this report? Let us know! |
value: The video binary data. | ||
mime_type: The video MIME type. | ||
resolution: The resolution of the video (e.g., 1920x1080). | ||
duration: Duration of the video in seconds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which of these do you feel confident you can have in the initial version (for example, it sounded like a target duration may not be something easy to get)? Are there other params that the providers expose that we should be taking into consideration? For example, I would anticipate framerate to be a popular parameter so that we can have everyone's favorite NTSC 59.94!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to somehow include most of the things Jason mentioned, I don't believe that there is a way to bring audio with these videos, so probably no subtitles for initial version.
|
||
@property | ||
def mime_type(self) -> str: | ||
return "video/mp4" # Or make this flexible based on the video format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds like our loaders support .ogg and .webm, can we point to the same loc for both or is there a reason to keep them separate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should already have the same mime-type umbrella problem for image/audio, if its not in the framework already, check with collin on the preferred approach.
Also, should this be an actual attribute instead of a @property
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@property
is correct, just needs to be updated to something like this.
duration: Duration of the video in seconds. | ||
""" | ||
|
||
aspect_ratio: tuple[int, int] = field(default=(16, 9), kw_only=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
open question: do providers enumerate possible aspect ratios, or let you do weirdo things
def to_text(self) -> str: | ||
raise NotImplementedError("VideoArtifact cannot be converted to text.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other not-so-texty artifacts generate a string that describes the parameters of the artifact, is that what we want here, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was more of a testing method, while getting it to actually work.
"DreamMachineVideoGenerationDriver", | ||
"BaseVideoGenerationDriver", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am I the only one at this damned company that likes my lists alphabetized? Not for this PR, but dang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with james, but don't reorder this list in this PR. (Generally mixing refactoring or formatting with features makes it difficult for reviewers to see what is actually changing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BaseVideoGenerationDriver
should be placed before DreamMachineVideoGenerationDriver
for circular dependency (and alphabetical) reasons.
response = self.client.generations.create(prompt=prompt, **self.params) | ||
generation = response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of assigning to an alias, why not assign directly to generation
?
if not generation.id: | ||
raise Exception("Generation ID not found in the response") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this in the while loop? Is this something that should be caught either before or after the dreaming begins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, for testing purposes, pyright was giving me a hard time for not checking if the id exists.
if not video_url: | ||
raise Exception("Video URL not found in the generation response") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no URL shows up, is that indicative of something bad, like it "completed" but was somehow a failure? Can we convey what the situation is to the user?
video_url = generation.assets.video | ||
if not video_url: | ||
raise Exception("Video URL not found in the generation response") | ||
video_binary = self._download_video(video_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
downloading feels like it could wrong in a lot of other ways (retries, timeouts, etc.)
else: | ||
raise Exception(f"Video generation failed with status: {status}") | ||
|
||
def _download_video(self, video_url: str) -> bytes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
videos are big and take a long time to come down. Do we...
- ...already have this functionality somewhere else in the framework?
- ...need to take into consideration fails on memory, disk, retries, timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id be fine with putting this functionality directly on VideoArtifact
as a "lazy load" type feature. the alternative is creating a persistence driver for everything, which we dont have, but we have the ArtifactFileOutputMixin
available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that the Driver should probably not be the one to do this. I don't know about putting it in VideoArtifact
either -- feels like too much responsibility.
In-fact, we probably shouldn't make any assumptions that the user wants us to download it. Maybe they're fine receiving a URL that they watch in their browser. Maybe this should fall onto a Loader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be the same idea as a lazy load. VideoArtifact
contains a reference to the video somewhere.
def get_aspect_ratio(self) -> tuple[int, int]: | ||
return self.aspect_ratio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont need this, aspect_ratio
can be directly accessed
def to_text(self) -> str: | ||
raise NotImplementedError("VideoArtifact cannot be converted to text.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use an approach similar to AudioArtifact for this
client: LumaAI = field( | ||
default=Factory( | ||
lambda self: import_optional_dependency("lumaai").LumaAI(auth_token=self.api_key), takes_self=True | ||
), | ||
kw_only=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the lazy_property
approach for this
), | ||
kw_only=True, | ||
) | ||
params: dict[str, Any] = field(default={}, kw_only=True, metadata={"serializable": True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use factory=dict
here, this will return a single mutable object which is no good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is important
while status in ["dreaming", "queued"]: | ||
time.sleep(5) | ||
if not generation.id: | ||
raise Exception("Generation ID not found in the response") | ||
|
||
generation = self.client.generations.get(generation.id) | ||
status = generation.state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to use the tenacity
library for this here. i know this is the same approach as the cloud driver but we should try to clean this pattern up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually not sure about this. Retrying (tenacity) feels different than polling (what we're doing here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tenacity has built in conditions for if_condition
or if_not_condition
. its not just for retrying on errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't doubt it can be used for non-exception things, but all of the examples are for exception retrying which makes me think that's its primary purpose.
Do you have an example of how tenacity might be used for polling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like
from tenacity import retry, retry_if_result, stop_after_attempt, wait_fixed
@retry(
retry=retry_if_result(lambda result: result is None),
stop=stop_after_attempt(3),
wait=wait_fixed(5),
)
def call_api(url_to_poll: str) -> Optional[str]:
response = requests.get(url_to_poll)
if response.status_code != 200:
return None
return response.text
def save_video_artifact(self, artifact: VideoArtifact) -> None: | ||
if self.output_file: | ||
outfile = self.output_file | ||
elif self.output_dir: | ||
outfile = os.path.join(self.output_dir, artifact.name + ".mp4") | ||
else: | ||
raise ValueError("No output_file or output_dir specified.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this method? _write_to_file
should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, it would call the to_text method inside of the base artifact class which the Video Artifact is unable to be converted to text in the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to override to_bytes
on the video artifact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, that makes sense! Thank you!
|
||
|
||
@define | ||
class PromptVideoGenerationTask(BaseVideoGenerationTask): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think these need to be two different classes. the input
can be different types instead
"description": "Generates a video from text prompts.", | ||
"schema": Schema( | ||
{ | ||
Literal("prompt", description=BaseVideoGenerationTool.PROMPT_DESCRIPTION): str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be inlined
|
||
|
||
@define | ||
class PromptVideoGenerationTool(BaseVideoGenerationTool): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesnt need to be two classes, just inherit from BaseTool
and make VideoGenerationTool
), | ||
}, | ||
) | ||
def generate_video(self, params: dict[str, dict[str, str]]) -> VideoArtifact | ErrorArtifact: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just typed as params: dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't quite get a full review in yet, but overall the approach seems fine. You are following existing patterns (e.g. image generation) which I like.
I'll try to review a little more later today
|
||
@property | ||
def mime_type(self) -> str: | ||
return "video/mp4" # Or make this flexible based on the video format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should already have the same mime-type umbrella problem for image/audio, if its not in the framework already, check with collin on the preferred approach.
Also, should this be an actual attribute instead of a @property
?
"DreamMachineVideoGenerationDriver", | ||
"BaseVideoGenerationDriver", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with james, but don't reorder this list in this PR. (Generally mixing refactoring or formatting with features makes it difficult for reviewers to see what is actually changing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start! Have not looked at the entire thing but left some initial feedback.
|
||
@property | ||
def mime_type(self) -> str: | ||
return "video/mp4" # Or make this flexible based on the video format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a format
field similar to AudioArtifact
and use that here.
|
||
@property | ||
def mime_type(self) -> str: | ||
return "video/mp4" # Or make this flexible based on the video format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@property
is correct, just needs to be updated to something like this.
"DreamMachineVideoGenerationDriver", | ||
"BaseVideoGenerationDriver", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BaseVideoGenerationDriver
should be placed before DreamMachineVideoGenerationDriver
for circular dependency (and alphabetical) reasons.
|
||
return result | ||
else: | ||
raise Exception("Failed to run text to video generation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also caught my eye, but we do it in BasePromptDriver
which is probably where this came from. RuntimeError
would probably be a better candidate here.
generation = response | ||
status = generation.state | ||
while status in ["dreaming", "queued"]: | ||
time.sleep(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sleep time should be configurable as something like poll_interval
raise Exception("Generation ID not found in the response") | ||
|
||
generation = self.client.generations.get(generation.id) | ||
status = generation.state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can get rid of status
-- just use generation.state
while status in ["dreaming", "queued"]: | ||
time.sleep(5) | ||
if not generation.id: | ||
raise Exception("Generation ID not found in the response") | ||
|
||
generation = self.client.generations.get(generation.id) | ||
status = generation.state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually not sure about this. Retrying (tenacity) feels different than polling (what we're doing here).
|
||
generation = self.client.generations.get(generation.id) | ||
status = generation.state | ||
if status == "completed": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the exa library provide any types for this so we can do something like status == exa.COMPLETED
if status == "completed": | ||
video_url = generation.assets.video | ||
if not video_url: | ||
raise Exception("Video URL not found in the generation response") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flip conditional as mentioned above. Also raise a ValueError
else: | ||
raise Exception(f"Video generation failed with status: {status}") | ||
|
||
def _download_video(self, video_url: str) -> bytes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that the Driver should probably not be the one to do this. I don't know about putting it in VideoArtifact
either -- feels like too much responsibility.
In-fact, we probably shouldn't make any assumptions that the user wants us to download it. Maybe they're fine receiving a URL that they watch in their browser. Maybe this should fall onto a Loader?
Describe your changes
Added Luma AI as a driver, tool, task, to generate videos.
Issue ticket number and link
Add Luma Ai as a driver