[REVIEW] Implement Feature Request from #1077 on Left Padding #1126
Changes from 1 commit
Refactor the torch dataloader `pad_left` handling and the `_build_sparse_tensor()` method. The motivation is improved readability and maintainability.
```diff
@@ -180,6 +180,19 @@ def _get_sparse_tensor(self, values, indices, num_rows, seq_limit):
         sparse_tensor = sparse_tensor.to_dense()
         return sparse_tensor

+    def _build_sparse_tensor_helper_process_column(self, col: torch.Tensor) -> torch.Tensor:
+        """Process column by increasing blocks for use in left padding."""
+        col = col.tolist()
+        prev, curr = 0, 0
+        while curr < len(col):
+            if col[curr] >= col[curr - 1]:
+                col[prev:curr] = col[prev:curr][::-1]
+                prev = curr
+            if curr == (len(col) - 1):
+                col[prev : curr + 1] = col[prev : curr + 1][::-1]
+            curr += 1
+        return torch.Tensor(col)
+
     def _build_sparse_tensor(
         self,
         values,
```
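To see what this helper computes, here is a small worked example with invented values (not from the PR). After the mirror step `(seq_limit - 1) - indices[:, 1]` in the next hunk, the position column is decreasing within each row's block, and the helper reverses each such block back into increasing order:

```python
import torch

# Hypothetical example: seq_limit = 5 and two rows holding 3 and 2 values,
# so the original position column is [0, 1, 2, 0, 1].
col = torch.Tensor([0, 1, 2, 0, 1])
mirrored = (5 - 1) - col  # tensor([4., 3., 2., 4., 3.]): per-row order reversed
# Fed `mirrored`, the helper reverses each decreasing block and returns
# tensor([2., 3., 4., 3., 4.]): left-padded positions in increasing order.
```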
```diff
@@ -207,27 +220,10 @@ def _build_sparse_tensor(
         indices = self._get_indices(offsets, diff_offsets)
         if self.pad_left:
             indices[:, 1] = (seq_limit - 1) - indices[:, 1]

             # We make sure that the elements of our sparse matrix indices are in the correct
             # non-reversed order. We do this by flipping increasing blocks in the second column
-            # of indices. We find it convenient and more efficient to modify the transpose
-            # of indices and transpose indices back before returning the indices matrix.
-            def _process_row(row: torch.Tensor) -> torch.Tensor:
-                """Process row by blocks for use in left padding."""
-                row = row.tolist()
-                prev, curr = 0, 0
-                while curr < len(row):
-                    if row[curr] >= row[curr - 1]:
-                        row[prev:curr] = row[prev:curr][::-1]
-                        prev = curr
-                    if curr == (len(row) - 1):
-                        row[prev : curr + 1] = row[prev : curr + 1][::-1]
-                    curr += 1
-                return torch.Tensor(row)
-
-            indices = indices.T
-            indices[1] = _process_row(indices[1])
-            indices = indices.T
+            # of indices.
+            indices[:, 1] = self._build_sparse_tensor_helper_process_column(indices[:, 1])
         return self._get_sparse_tensor(values, indices, num_rows, seq_limit)
```

Review discussion on this hunk:

Reviewer: Again, this iteration logic is not the best methodology for covering this: https://stackoverflow.com/questions/48686945/reshaping-a-tensor-with-padding-in-pytorch. Something like this would be more efficient; `torch.nn.functional` has a padding function you could use to your advantage.

Author (@lesnikow): Likewise, this is good to know. I will look into how to apply this outlined approach to the Torch implementation here.

Author (@lesnikow): I have the same questions for your second comment as I wrote in reply to your first comment above. Would you have any guidance on this here?

Author (@lesnikow): I read over this Stack Overflow question along with all the replies. These approaches will not directly work for this torch implementation. The main obstacle is that methods like `torch.nn.functional.pad` and `torch.flip` are not implemented for `torch.sparse` tensors.

Reviewer: I still see the while loop here on the torch side. Is that accurate?

Author (@lesnikow): This has not been updated yet, since I would like to hear your feedback first on the tensorflow side.

Author (@lesnikow): Do you mean the …?

Reviewer: Please see line 187.
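For reference, the padding function the reviewer points to is `torch.nn.functional.pad`. A minimal sketch of left padding dense variable-length sequences with it, using invented example values, might look like this:

```python
import torch
import torch.nn.functional as F

# Hypothetical example: left-pad two variable-length dense sequences to a
# fixed length. For a 1-D tensor, F.pad takes a (pad_left, pad_right) tuple.
seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
seq_limit = 5
padded = torch.stack([F.pad(s, (seq_limit - len(s), 0)) for s in seqs])
# tensor([[0, 0, 1, 2, 3],
#         [0, 0, 0, 4, 5]])
```

As the author notes, this works on dense tensors; it does not apply directly to `torch.sparse` tensors, which is the sticking point discussed below.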
Reviewer: This is the while loop I am talking about, @lesnikow.

Author (@lesnikow): This is good to know, thank you. I have not implemented anything optimized over this yet; I would like to hear your feedback first on the tensorflow side. This torch implementation also operates on `torch.sparse` tensors, where `torch.nn.functional.pad` and `torch.flip` are not implemented. Would you have any guidance on how to proceed for this `torch.sparse` case?
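One possible direction for that question, sketched here as an assumption rather than the project's actual fix: since `_build_sparse_tensor` builds the sparse tensor from an explicit `(row, position)` indices matrix, left padding can be done arithmetically on those indices, needing no `pad` or `flip` kernels on sparse tensors and no block-reversal pass. The function name and `row_lengths` argument below are hypothetical:

```python
import torch

def left_pad_indices(indices: torch.Tensor, row_lengths: torch.Tensor,
                     seq_limit: int) -> torch.Tensor:
    """Shift COO (row, position) pairs right so each row ends at seq_limit - 1.

    Hypothetical sketch: assumes positions within each row run 0..len(row)-1
    and row_lengths[r] is the number of stored values in row r.
    """
    out = indices.clone()
    # Per-entry left offset: rows with fewer values are shifted further right.
    # Within-row order stays increasing, so no reversal step is needed.
    out[:, 1] += seq_limit - row_lengths[indices[:, 0]]
    return out

indices = torch.tensor([[0, 0], [0, 1], [0, 2], [1, 0], [1, 1]])
row_lengths = torch.tensor([3, 2])
print(left_pad_indices(indices, row_lengths, seq_limit=5))
# tensor([[0, 2], [0, 3], [0, 4], [1, 3], [1, 4]])
```

This keeps all the work on the integer indices that `_get_sparse_tensor` already consumes, sidestepping the missing sparse kernels entirely.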