
Refactor Conversation Memory class and drivers #1084

Merged · 4 commits into dev · Aug 27, 2024

Conversation

@vachillo (Member) commented Aug 19, 2024

Describe your changes

  • Update `LocalConversationMemoryDriver` with an optional `persist_file`, consistent with `LocalVectorStoreDriver`
  • Update the return value of `load` to a tuple instead of a memory instance (see the signature below)
  • Update `BaseConversationMemory.to_prompt_stack` to take a prompt driver
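
For reference, the new `load` contract (this exact signature appears in the diff later in the thread; the previous version returned a memory instance):

```python
def load(self) -> tuple[list[Run], dict[str, Any]]: ...
```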

Issue ticket number and link

@vachillo force-pushed the conv_mem branch 7 times, most recently from 50d8e36 to 5cd4f87 on August 21, 2024 17:07
@vachillo marked this pull request as ready for review on August 21, 2024 17:07

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅


```python
# (field name on the first line inferred from the Factory default; the diff excerpt begins mid-declaration)
conversation_memory_driver: BaseConversationMemoryDriver = field(
    default=Factory(lambda: Defaults.drivers_config.conversation_memory_driver), kw_only=True
)
prompt_driver: BasePromptDriver = field(
    default=Factory(lambda: Defaults.drivers_config.prompt_driver), kw_only=True
)
runs: list[Run] = field(factory=list, kw_only=True, metadata={"serializable": True})
metadata: dict = field(factory=dict, kw_only=True, metadata={"serializable": True})
```
Member

Should be renamed to meta for consistency with other places.

```python
[self.add_run(r) for r in memory.runs]
if self.autoload:
    runs, metadata = self.conversation_memory_driver.load()
    runs.extend(self.runs)
```
Member

If we're going to merge Driver Runs in (I like this idea), I think we should add them after user-defined runs.

Member Author

Went back and forth. My thought was that if you passed new Runs here, they would probably be the most recent, and the data that gets loaded would be "historical".

Member Author

Revisiting this: I think we have the same intention, but my implementation is wrong.

Comment on lines 23 to 24
```python
if self.persist_file is not None and not self.lazy_load:
    self._load_file()
```
Member

How would one use lazy_load? Should this be present in all Drivers?

Member Author

Mostly for "backwards compatibility" with the previous implementation: it wouldn't try to create the file unless accessed. We could just make one behavior the default and stick with it.

Contributor

I vote for picking one way.

I kind of lean towards the lazy way; otherwise it'll just create an empty file, right? Though I could see the predictable behavior being nice, I suppose... if the only goal is to load it back into the same driver, then the driver can check whether the file exists. Yeah, I vote for the lazy way. That said, I don't feel too strongly about it.

Member Author

Don't really care either way. I'll remove the param, default to the existing lazy behavior, and see how that looks.

Member

Maybe I'm misunderstanding the functionality, but can we just check if persist_file is set and only save then?

Member Author

It's about the timing of when the file is actually created: `LocalVectorStoreDriver` creates the file on post_init, whereas the original implementation of `LocalConversationMemoryDriver` waited until `.store()` to create the file if needed.

Member

Got it, so when would we ever want lazy_load=False? Seems easiest to just check if persist_file is set during store and that's the only time we create a file.
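
A minimal sketch of the lazy behavior being converged on here (a hypothetical, simplified driver, not the PR's exact code): nothing touches disk until `store()` runs, and only when `persist_file` is set.

```python
import json
from pathlib import Path
from typing import Optional


class LocalConversationMemoryDriver:  # simplified stand-in for the real driver
    def __init__(self, persist_file: Optional[str] = None) -> None:
        self.persist_file = persist_file

    def store(self, data: dict) -> None:
        # Created lazily: only on the first store(), and only if persist_file
        # was provided. Nothing is written at construction time.
        if self.persist_file is not None:
            Path(self.persist_file).write_text(json.dumps(data))
```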

Comment on lines 32 to 38
```python
if not overwrite:
    loaded_str = Path(self.persist_file).read_text()
    if loaded_str:
        loaded_data = json.loads(loaded_str)
        loaded_data["runs"] += data["runs"]
        loaded_data["metadata"] = dict_merge(loaded_data["metadata"], data["metadata"])
        data = loaded_data
```
Member

This may be a slightly cleaner method of appending to the JSON file.
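
The linked suggestion itself isn't preserved in this thread, but one cleaner shape (an assumption, not the PR's final code) folds the read-merge into a helper; note the shallow `update` here stands in for the deep `dict_merge` above:

```python
import json
from pathlib import Path


def merge_into_file(path: Path, data: dict) -> dict:
    # Merge new runs/metadata into whatever is already persisted on disk.
    text = path.read_text() if path.exists() else ""
    existing = json.loads(text) if text else {}
    existing.setdefault("runs", []).extend(data.get("runs", []))
    existing.setdefault("metadata", {}).update(data.get("metadata", {}))  # shallow merge
    return existing
```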

Comment on lines 89 to 108
```python
thread_metadata = response.json().get("metadata", {})
thread_metadata = dict_merge(thread_metadata, metadata)
response = requests.patch(
    self._get_url(f"/threads/{self.thread_id}"),
    json={"metadata": thread_metadata},
    headers=self.headers,
)
response.raise_for_status()
```
Member

Can this be DRY'd up with the overwrite logic?

Member Author (@vachillo, Aug 21, 2024)

On the cloud side? Never mind, I'll see how I can update it.

Member

Yeah, just meant that we should try to have only a single `requests.patch` for both conditional flows (see the sketch below).
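
A sketch of that suggestion, reusing names from the snippet above (`dict_merge`, `self._get_url`, and `self.headers` come from the driver context; the `overwrite` branch shape is assumed):

```python
# Both branches only decide what thread_metadata should be;
# exactly one PATCH request is issued afterwards.
if overwrite:
    thread_metadata = metadata
else:
    thread_metadata = dict_merge(response.json().get("metadata", {}), metadata)

response = requests.patch(
    self._get_url(f"/threads/{self.thread_id}"),
    json={"metadata": thread_metadata},
    headers=self.headers,
)
response.raise_for_status()
```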

Comment on lines 58 to 63
```python
loaded_str = self.client.hget(self.index, self.conversation_id)
if loaded_str is not None:
    loaded_data = json.loads(loaded_str)
    loaded_data["runs"] += data["runs"]
    loaded_data["metadata"] = dict_merge(loaded_data["metadata"], data["metadata"])
    data = loaded_data
```
Member

Can we use an LPUSH to append to the list?

Member Author

Does that work for pushing to nested lists?

Member

I'm honestly not sure, but it does seem like something we should at least try.
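
For context: Redis list commands such as LPUSH/RPUSH operate on dedicated list keys, not on lists nested inside a JSON string stored in a hash field, so the `hget`-based layout above can't use them directly. A hedged sketch (hypothetical key names) of a layout that would support appends:

```python
import json

import redis

client = redis.Redis()

# Current layout, as in the snippet above: the whole conversation is one JSON
# blob in a hash field, so appending a run means read-modify-write.
client.hset("memory", "conversation-1", json.dumps({"runs": [], "metadata": {}}))

# Alternative layout: one Redis list per conversation, each run its own JSON
# element, so appending becomes a single RPUSH.
client.rpush("memory:conversation-1:runs", json.dumps({"input": "hi", "output": "hello"}))
runs = [json.loads(r) for r in client.lrange("memory:conversation-1:runs", 0, -1)]
```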

Contributor (@dylanholmes) left a comment

Nice work!

Comments are mainly topics for discussion. Some may be out of scope for this PR.


```diff
 class BaseConversationMemoryDriver(SerializableMixin, ABC):
     @abstractmethod
-    def store(self, memory: BaseConversationMemory) -> None: ...
+    def store(self, runs: list[Run], metadata: dict, *, overwrite: bool = False) -> None: ...
```
Contributor

Alternative idea: how about adding a `clear()` (or `reset()`) method instead of `overwrite`?

(Or am I misunderstanding the behavior?)
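
A sketch of what that alternative could look like (hypothetical signatures; `SerializableMixin` and `Run` as in the diff above):

```python
from abc import ABC, abstractmethod


class BaseConversationMemoryDriver(SerializableMixin, ABC):
    @abstractmethod
    def store(self, runs: list[Run], metadata: dict) -> None: ...  # always appends

    @abstractmethod
    def clear(self) -> None: ...  # explicit reset instead of overwrite=True
```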

Member (@collindutter) left a comment

Nice, just a couple of minor questions and bits of feedback.

Comment on lines 19 to 23
```python
def _get_dict(self, runs: list[Run], metadata: Optional[dict] = None) -> dict:
    data: dict = {"runs": [run.to_dict() for run in runs]}
    if metadata is not None:
        data["metadata"] = metadata
    return data
```
Member

`_get_dict` doesn't tell me much about why this method exists. Maybe rename to something like `_to_params`?

Comment on lines 60 to 66
```python
def dict_merge_opt(dct: Optional[dict], merge_dct: Optional[dict], *, add_keys: bool = True) -> Optional[dict]:
    if dct is None:
        return merge_dct
    if merge_dct is None:
        return dct

    return dict_merge(dct, merge_dct, add_keys=add_keys)
```
Member

Why does this method need to exist? Can its functionality be merged into dict_merge?

Member Author

Every instance of its usage would need an updated type hint; tried to be unobtrusive.

Member Author

Deleted.

Comment on lines 62 to 72
```python
for run in runs:
    run_dict = dict_merge_opt(
        {
            "input": run.input.to_json(),
            "output": run.output.to_json(),
            "metadata": {"run_id": run.id},
        },
        run.meta,
    )
```

Member

Use a list comprehension instead.
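
That is, something like the following (`run_dicts` is an assumed name; the diff doesn't show where the results are collected):

```python
run_dicts = [
    dict_merge_opt(
        {
            "input": run.input.to_json(),
            "output": run.output.to_json(),
            "metadata": {"run_id": run.id},
        },
        run.meta,
    )
    for run in runs
]
```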

```python
input: BaseArtifact = field(kw_only=True, metadata={"serializable": True})
output: BaseArtifact = field(kw_only=True, metadata={"serializable": True})
id: str = field(default=Factory(lambda: uuid.uuid4().hex), metadata={"serializable": True})
meta: Optional[dict] = field(default=None, metadata={"serializable": True})
```
Member

I think this should default to `meta: dict[str, Any]` instead of `Optional`. More consistent with other `meta` implementations, and then we don't need `dict_merge_opt`.
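
The suggested default, in the same attrs style as the surrounding fields (a sketch):

```python
meta: dict[str, Any] = field(factory=dict, metadata={"serializable": True})
```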

Member Author

Must have been looking at `BaseVectorStoreDriver.Entry`.

Member

Sorry to be annoying, can you remove this change from the PR?

Member

And this.

Member (@collindutter) left a comment

Approved outside of the last two requested changes.

MIGRATION.md Outdated
Comment on lines 61 to 87
### LocalConversationMemoryDriver `file_path` renamed to `persist_file`

`LocalConversationMemoryDriver.file_path` has been renamed to `persist_file` and is now `Optional[str]`. If `persist_file` is not passed as a parameter, nothing will be persisted and no errors will be raised. `LocalConversationMemoryDriver` is now the default driver in the global `Defaults` object.

#### 0.30.X
```python
local_driver_with_file = LocalConversationMemoryDriver(
    file_path="my_file.json"
)

local_driver = LocalConversationMemoryDriver()

assert local_driver_with_file.file_path == "my_file.json"
assert local_driver.file_path == "griptape_memory.json"
```

#### 0.31.X
```python
local_driver_with_file = LocalConversationMemoryDriver(
    persist_file="my_file.json"
)

local_driver = LocalConversationMemoryDriver()

assert local_driver_with_file.persist_file == "my_file.json"
assert local_driver.persist_file is None
```
Member

Let's move this higher up since the other examples use it.

MIGRATION.md Outdated

### Changes to BaseConversationMemoryDriver

`BaseConversationMemoryDriver` has updated parameter names and different method signatures for `.store` and `.load`.
Member

Should also call out that `driver` was renamed to `conversation_memory_driver`.

Contributor (@dylanholmes) left a comment

Nice work!


```python
        key = self.index
        memory_json = self.client.hget(key, self.conversation_id)

    def load(self) -> tuple[list[Run], dict[str, Any]]:
```
Contributor

So much cleaner!

@vachillo requested a review from collindutter on August 27, 2024 15:25
@vachillo merged commit ef61c53 into dev on Aug 27, 2024
13 checks passed
@vachillo deleted the conv_mem branch on August 27, 2024 15:49