
[REVIEW] Implement Feature Request from #1077 on Left Padding #1126

Closed · wants to merge 36 commits

Changes from 1 commit (of 36)
a52a90c
Update docstrings for issue #1077
Sep 14, 2021
ed61663
Merge branch 'main' into 1077-implement
Sep 16, 2021
ff1e396
Implementation of left padding for issue #1077
Sep 16, 2021
d9e457d
Update #1077 implementation
Sep 16, 2021
e25c6e8
Implement #1077 update with docstring and type hinting.
Sep 16, 2021
295d4e2
Merge branch 'main' into 1077-implement
lesnikow Sep 16, 2021
5166d57
Merge branch 'main' into 1077-implement
lesnikow Sep 17, 2021
e55336c
Merge branch 'main' into 1077-implement
lesnikow Sep 20, 2021
af8aa57
Merge branch 'main' of github.com:NVIDIA/NVTabular into 1077-implement
Sep 23, 2021
299d356
Update tensorflow module docstring for docs syntax
Sep 23, 2021
1285783
Expose pad_left to user
Sep 24, 2021
364bcf1
Skip test_distributed_multigpu()
Sep 24, 2021
071b8bf
Add unit test for torch dataloader and padding argument
Sep 24, 2021
3cce162
Update torch test for padding argument
Sep 24, 2021
cebb715
Update unit test for padding argument
Sep 25, 2021
5acd76a
Update dataloader torch to pass new tests
Sep 25, 2021
1684289
Clean up loader/torch module
Sep 25, 2021
a319501
Clean up test_torch_dataloader module
Sep 25, 2021
0be389e
Update tests
Sep 27, 2021
d93f9c5
Add tests for the TensorFlow runtime dataloader
Sep 28, 2021
0c0ce69
Implement pad_left in _build_sparse_tensor TF
Sep 28, 2021
941d2f3
Update torch loader documentation
Sep 28, 2021
7944b2a
Merge branch 'main' of 1077-implement
Sep 28, 2021
76c0024
Cleanup _build_sparese_tensor for TF dataloader
Sep 28, 2021
46847cb
Add docstring to _build_sparse_tensor() for tf
Sep 28, 2021
c7ae873
Update docstring
Sep 28, 2021
d86cec3
Refactor torch dataloader pad_left and _build_spar
Sep 28, 2021
d90e1df
Update pytest decorator
Sep 28, 2021
b21c57d
Cleanup torch loader
Sep 28, 2021
2150ede
Implement pad_left with TF ops
Sep 29, 2021
a51aa44
Implement pad_left with TF ops cleanup
Sep 29, 2021
01749f9
Merge branch 'main' into 1077-implement
lesnikow Sep 29, 2021
b305afa
Update tensorflow dataloader implementation
Sep 30, 2021
2febf1a
Merge branch '1077-implement' of https://github.com/NVIDIA/NVTabular …
Sep 30, 2021
587ef0c
Update pad_left TF unit tests
Sep 30, 2021
dd9927e
Update pad_left code for TF sparse tensors
Sep 30, 2021
Refactor torch dataloader pad_left and _build_sparse_tensor() method.
The motivation is improved readability and maintainability.
Adam Lesnikowski committed Sep 28, 2021
commit d86cec336d5a4ace559b037fabbfa469b1c84ae0
34 changes: 15 additions & 19 deletions nvtabular/loader/torch.py
@@ -180,6 +180,19 @@ def _get_sparse_tensor(self, values, indices, num_rows, seq_limit):
         sparse_tensor = sparse_tensor.to_dense()
         return sparse_tensor
+
+    def _build_sparse_tensor_helper_process_column(self, col: torch.Tensor) -> torch.Tensor:
+        """Process column by increasing blocks for use in left padding."""
+        col = col.tolist()
+        prev, curr = 0, 0
+        while curr < len(col):
Contributor: this is the while loop I am talking about @lesnikow

Contributor Author: This is good to know, thank you. I have not implemented anything optimized beyond this yet, and I would like to hear your feedback on the TensorFlow side first. This torch implementation also operates on torch.sparse tensors, and torch.nn.functional.pad and torch.flip are not implemented for torch.sparse tensors. Would you have any guidance on how to proceed in this torch.sparse case?

+            if col[curr] >= col[curr - 1]:
+                col[prev:curr] = col[prev:curr][::-1]
+                prev = curr
+            if curr == (len(col) - 1):
+                col[prev : curr + 1] = col[prev : curr + 1][::-1]
+            curr += 1
+        return torch.Tensor(col)
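The helper above relies on the fact that, after the `(seq_limit - 1) - indices` flip, each row's column indices form a decreasing block; reversing every block restores ascending order per row in O(n). A minimal pure-Python sketch of the same logic (torch dropped so the example is self-contained; the function name is mine):

```python
def process_column(col):
    """Reverse each decreasing block so every row's indices end up ascending."""
    col = list(col)
    prev, curr = 0, 0
    while curr < len(col):
        # A non-decreasing step marks the boundary between two rows' blocks.
        if col[curr] >= col[curr - 1]:
            col[prev:curr] = col[prev:curr][::-1]
            prev = curr
        # Flush the final block when the end of the column is reached.
        if curr == (len(col) - 1):
            col[prev:curr + 1] = col[prev:curr + 1][::-1]
        curr += 1
    return col

# Two rows with 3 and 4 items, seq_limit = 4, after the (seq_limit-1)-idx flip:
print(process_column([2, 1, 0, 3, 2, 1, 0]))  # [0, 1, 2, 0, 1, 2, 3]
```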

     def _build_sparse_tensor(
         self,
         values,
@@ -207,27 +220,10 @@ def _build_sparse_tensor(
         indices = self._get_indices(offsets, diff_offsets)
         if self.pad_left:
Contributor: Again, this iteration logic is not the best methodology for covering this: https://stackoverflow.com/questions/48686945/reshaping-a-tensor-with-padding-in-pytorch. Something like this would be more efficient; torch.nn.functional has a padding function you could use to your advantage.

Contributor Author: Likewise, this is good to know. I will look into this approach for the Torch implementation here.

Contributor Author: I have the same questions for your second comment as I wrote in reply to your first comment above. Would you have any guidance on this here?

Contributor Author: I read over this Stack Overflow question along with all the replies. These approaches will not directly work for this torch implementation: the main obstacle is that methods like torch.nn.functional.pad are currently not supported on torch.sparse matrices. Given that these are torch.sparse matrices, do you see any further optimization beyond the O(n) linear-time Python code I provided, such as a vectorized approach? I looked through the available methods of the torch.sparse class, and nothing seemed immediately relevant.
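The vectorized approach the reviewer points at works once the per-row data is dense. A sketch of left padding with torch.nn.functional.pad (variable names and values are mine; this operates on dense 1-D tensors, not the torch.sparse tensors discussed above, which is exactly the limitation being debated):

```python
import torch
import torch.nn.functional as F

rows = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
seq_limit = 4

# For a 1-D tensor, F.pad takes (left, right) pad amounts; pad on the left only,
# with the default constant value 0, then stack into a dense batch.
padded = torch.stack([F.pad(r, (seq_limit - r.numel(), 0)) for r in rows])
print(padded.tolist())  # [[0, 1, 2, 3], [0, 0, 4, 5]]
```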

Contributor: I still see the while loop here on the torch side. Is that accurate?

Contributor Author: This has not been updated yet, since I would like to hear your feedback on the TensorFlow side first, and since this torch implementation deals with torch.sparse tensors instead of tf.RaggedTensors, the latter of which are easier to implement this for.
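On the TensorFlow side, tf.RaggedTensor does make this easier: to_tensor() pads on the right, so left padding falls out of a reverse / densify / reverse trick. A hedged sketch of that idea (values and shapes are illustrative, not taken from the PR):

```python
import tensorflow as tf

rt = tf.ragged.constant([[1, 2, 3], [4, 5]])
seq_limit = 4

# to_tensor() pads each row on the right, so reverse the ragged rows,
# densify to a fixed width, then reverse back to get left padding.
left_padded = tf.reverse(
    tf.reverse(rt, axis=[1]).to_tensor(default_value=0, shape=[None, seq_limit]),
    axis=[1],
)
print(left_padded.numpy().tolist())  # [[0, 1, 2, 3], [0, 0, 4, 5]]
```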

Contributor Author (lesnikow, Sep 30, 2021): Do you mean the for loop? I see no while loop in this code block. Edit: I see this loop now.

Contributor: Please see line 187.

             indices[:, 1] = (seq_limit - 1) - indices[:, 1]

             # We make sure that the elements of our sparse matrix indices are in the correct
             # non-reversed order. We do this by flipping increasing blocks in the second column
-            # of indices. We find it convenient and more efficient to modify the transpose
-            # of indices and transpose indices back before returning the indices matrix.
-            def _process_row(row: torch.Tensor) -> torch.Tensor:
-                """Process row by blocks for use in left padding."""
-                row = row.tolist()
-                prev, curr = 0, 0
-                while curr < len(row):
-                    if row[curr] >= row[curr - 1]:
-                        row[prev:curr] = row[prev:curr][::-1]
-                        prev = curr
-                    if curr == (len(row) - 1):
-                        row[prev : curr + 1] = row[prev : curr + 1][::-1]
-                    curr += 1
-                return torch.Tensor(row)
-
-            indices = indices.T
-            indices[1] = _process_row(indices[1])
-            indices = indices.T
+            # of indices.
+            indices[:, 1] = self._build_sparse_tensor_helper_process_column(indices[:, 1])
         return self._get_sparse_tensor(values, indices, num_rows, seq_limit)