Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[main-] handle position args as +sheet:subsheet:col:row #2425

Open
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

midichef
Copy link
Contributor

Two bug fixes:
3ea655b - fixes setting column position in sheet
Currently the starting column cannot be set, only the row.
repro command: vd sample_data/sample.tsv sample_data/StatusPR.csv +2:2
This occurs because the column position changes too early, before the sheet is loaded.

1aa2d98 - fixes setting column position in subsheet
A similar problem, changing position before the subsheet is loaded.
repro: vd ~vdsrc/sample_data/employees.sqlite +:emp:5:2

And then some functionality changes:
d16743d - change arg parsing, from row:col to col:row
Strings of the form a:b are parsed as row:column. This commit changes it to column:row.
Previously, I discussed some pros and cons of this change.
This is technically a breaking change, but as I noted in that comment, the form +a:b has not worked since 2.9 so it's not likely that anyone is currently using that syntax. However, if they are using +:a:b with the extra colon, this will indeed be a breaking change for them.
repro: vd ~vdsrc/sample_data/sample.tsv +:1:4
(note the : before the 1. It's needed (on develop) to demonstrate the column-positioning bug, until the 2 commits above are applied.)

0a8f6ee - handling args of form +a:b vs +:a:b.
Comments in the code suggest that '+col:row' should apply to only the last sheet on the list, vs. adding a colon, '+:col:row', which should apply to all sheets:

'Return (startsheets:list, startrow:str, startcol:str) from *arg* like "+sheet:subsheet:col:row". Empty sheetstr in startsheets means the starting pos applies to all sheets.'

(The suggestion comes from the phrase Empty sheetstr in startsheets. An empty sheetstr happens only when arg.split(':') has more than 2 items and pos[:-2] is '', which happens when arg follows the pattern +:a:b:
startsheets = pos[:-2]

Currently the behavior is the reverse: +col:row applies to all sheets.
repro: vd sample_data/sample.tsv sample_data/StatusPR.csv +:2:2
and look at the cursor position on each sheet.
This commit changes +:a:b to apply to all sheets, and +a:b to apply only to the last sheet.

da52ac3 - improve error for invalid sheets
Attempting to index a sheet that does not exist, as with:
repro: vd sample_data/sample.tsv +10:0:0
gives a traceback with IndexError: list index out of range.
This commit handles the error and shows a status message of no sheet "10".

a0d4913 - handling subsheet names that are integers
For consistency with sheets/rows/cols, this commit allows indexing subsheets with integers, like:
numeric-subsheet-names.vds.txt
repro: vd -f vds numeric-subsheet-names.vds.txt +0:3:0:0
This is also a change that breaks current behavior, for a subsheet that happens to have a name that is an integer. Such a subsheet would become unable to opened by using its name, as with sheets 5, 6, 7, and 8 in numeric-subsheet-names.vds.txt; +0:5:0:0 currently works, but after this commit only +0:0:0:0 would work.

Thoughts? I'm not so worried about breaking changes, because I don't think the vd + feature is heavily used.

Just to write out all the cases, here's what the args would mean after all these commits:

    +2         = row 2
    +1:        = last sheet; col 1
    +:1:       = all sheets; col 1
    +:2        = last sheet; row 2
    +1:2       = last sheet; col 1, row 2
    +:1:2      = all sheets; col 1, row 2
    +0:1:2     = sheet 0; col 1, row 2
    +:a:1:2    = last sheet, subsheet a; col 1, row 2
    +0:a:1:2   = sheet 0, subsheet a; col 1, row 2
    +0:a:b:1:2 = sheet 0, subsheet a from sheet 0, subsheet b from sheet a; col 1, row 2
    +0:0:1:2   = sheet 0, first subsheet of sheet 0, col 1, row 2
    +1::       = 2nd sheet, col 0, row 0
    +-2::      = 2nd sheet from last, col 0, row 0

@midichef
Copy link
Contributor Author

I fixed the type hint that was breaking in Python 3.8, by adding ' quotes to make it a string:
https://github.com/midichef/visidata/blob/bcf812f0052d54f25a6c1c5d7734f9cec2e86a06/visidata/main.py#L89
I don't understand type hints enough to know what the string does, I'm just doing it because it worked in sort.py.

def sortkey(sheet, r, ordering:'list[tuple[Column, bool]]'=[]):

visidata/main.py Outdated
sheets = sources # apply row/col to all sheets
sheets = sources # apply col/row to all sheets
for vs in sheets:
vd.sync(vs.ensureLoaded())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the problem with this sync, is that opening a large file with a start position specified, will wait until the entire file has loaded, before it shows any interface at all. (Prior to this change, it loads the file immediately and reports that it can't find the column).

Neither behavior is what the user wants, but the current behavior is at least immediately responsive, whereas adding a sync() here will appear to delay startup indefinitely, with no feedback.

My proposed behavior, would be to wait a second or two after loading on the sheet has started (or until loading has finished, if that's sooner), before trying to move the cursor. This will fix the issue in most cases, since columns are usually discovered very early in the loading process. If the column hasn't been discovered yet, it will fail as it does currently. Alternatively, we could start a thread that keeps trying to move the cursor to the given starting position until it either succeeds, or loading has finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, responsiveness is critical. I'll think about how to implement it. I do like the idea of a thread that keeps trying. Because then the delay before the cursor move could be shorter than a second, and unobtrusive.

@saulpw
Copy link
Owner

saulpw commented Jul 4, 2024

(This is a large combined PR, so I'm posting my comments piecemeal.)

I agree with your comments in #2358, that it should be +col:row. It's unfortunately a breaking change, but as you said, no one has complained, so probably (hopefully) no one is using it in scripts. I'd rather suffer the break now and get it right for the future, than to let it stay weirdly broken out of fear.

I'm less enthusiastic about +row, but I accept the argument that it's like how vim and tail work, and since we're borrowing this CLI syntax from them, that we should be consistent.

@saulpw
Copy link
Owner

saulpw commented Jul 4, 2024

Comments in the code suggest that '+col:row' should apply to only the last sheet on the list, vs. adding a colon, '+:col:row', which should apply to all sheets:

It should apply to the most recently specified sheet on the CLI, and not the last sheet. So you could specify a different position for each sheet:

vd foo.csv +name: bar.csv +address:

vs

vd foo.csv +2 bar.csv +3 +:name:

such that the row movement would be different for the two sheets, but they should both end up on the name column.

I would prefer to specify "all sheets" more explicitly, like +*:name:, but of course * is problematic (as the shell may/will try to expand it). Suggestions for alternatives welcome, though in the absence of anything compelling I guess the empty string still works.

@saulpw
Copy link
Owner

saulpw commented Jul 4, 2024

BTW, @midichef, thank you so much for calling out all the different cases. We should lift that and put it directly into the docs, both the manpage and the online .md.

@midichef
Copy link
Contributor Author

I'm less enthusiastic about +row, but I accept the argument that it's like how vim and tail work, and since we're borrowing this CLI syntax from them, that we should be consistent.

I thought about this syntax some more. Here are a few arguments against +row:

  1. It's unneeded. We can already specify a row number by inserting 0: in front of the row: +0:2.
  2. It will be unintuitive for users unfamiliar with vim and tail. That's probably quite a few users. In fact, it would be worse thatn unintuitive, but actively confusing, if they think in terms of "the number after the plus", which changes, from being a row in +row, to being a column in +col:row.
  3. The visidata syntax of +a:b:c:d is so different from vim that they're not analogous. For example, vim only allows + for line number, it does not support +col:row. And vim uses + in ways that visidata does not, like +/pattern (to search for a line with pattern) and +command (to execute a command after reading the first file).

And changing the behavior would be easy. It is a one-line change. But if we get rid of +row, what should +1 mean now? +col? +col would be surprising for vim/tail users, and it's a breaking change from current behavior in 3.0.2. Should we require column and row to be specified: +col:row?

@saulpw
Copy link
Owner

saulpw commented Jul 17, 2024

Here's the design and explanation that I think is coherent:

The core design for this option is +sheet:col:row. (This ordering is consistent with VisiData internals; the colorizer function signature, for instance, is func(sheet, col, row, value)). This moves the starting cursor on the given sheet to the given col and row, without changing the active sheet. Either names or numbers can be given. Numbers are 0-based; negative numbers mean "from the end" as in Python.

Values are interpreted right-to-left, with an empty sheet field meaning "all" and a missing field meaning "current". So vd bar.csv foo.csv +col:row would mean move to col/row in foo.csv (the "current" or most recently given sheet), whereas vd bar.csv foo.csv +:col:row would mean move to col/row on both sheets.

This also means that +10 would move to the 11th row on the last sheet, and +:10 would behave identically. +::10 would mean to move to the 11th row on all sheets. We can keep this +row as a lightly-documented usage, or as a nicety, a pleasant surprise compatibility with vim, but almost entirely promote the more straightforward +s:c:r syntax.

I believe the above is entirely consistent with your given examples, so it feels like we're converging on the design.

Open question: can we use +s:: (which would be a no-op) to mean "switch the active sheet to s"? It might be useful to switch to be able to switch to a newly opened subsheet from the CLI.

Does this leave anything unspecified or unclear?

@midichef
Copy link
Contributor Author

Does this leave anything unspecified or unclear?

It's clear, thanks!

Copy link
Owner

@saulpw saulpw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're still waiting on changes for the last comment about vd.sync().

@midichef
Copy link
Contributor Author

I've almost got a final patch ready to remove unnecessary vd.sync() and simply retry loading till it succeeds.

BTW, @midichef, thank you so much for calling out all the different cases. We should lift that and put it directly into the docs, both the manpage and the online .md.

@anjakefala How do I update the manual pages in man/: vd.1, vd.inc, vd.txt, visidata.1? I'm not sure how to generate the .1 files.

@anjakefala
Copy link
Collaborator

@midichef Just update vd.inc! The others are updated when I run dev/mkman.sh when I'm shipping a release.

@midichef
Copy link
Contributor Author

Before this PR, for the purposes of the +col:row arg, row number was 0-based, but column number was 1-based. +1:1 was the 1st column but the 2nd row.

As this PR currently stands, changing to +row:col, both rows and columns would be 0-based. +1:1 would be the 2nd column and the 2nd row.

What's the right base to use for numbering rows and columns?

@midichef
Copy link
Contributor Author

Open question: can we use +s:: (which would be a no-op) to mean "switch the active sheet to s"?

Sure, I'll put that in too.

@saulpw
Copy link
Owner

saulpw commented Oct 26, 2024

What's the right base to use for numbering rows and columns?

Let's use 0-based numbers for both, consistent with Python.

@midichef
Copy link
Contributor Author

Okay, I've incorporated all the feedback.

This set of commits will retry the move changes every 0.1 seconds. It's hardcoded. Should I make this a configurable option? Also, it will retry failed moves forever. Should there be a time limit? Or a number of failures limit? The repeated errors are annoying and currently make it impossible to leave the sheet that is giving errors:
(to see the error messages repeat: vd sample_data/sample.tsv sample_data/test.usv +50:10)

I could also use some guidance on the troff/groff formatting. Existing Sy tags seem to be ended by doing No, as in Sy 1 No. But they left a space after the No. To close the tags without adding a space, I used Ns. It was used near the start of vd.inc and seemed to do what I want. I copied Ns without understanding, as I could not find any documentation online that explained how to use any of Sy No or Ns.

Using int indices internally is better than using sheet names,
because sheet names were ambiguous when sheets had the same name,
for example, the name "sample" in:
vd sample.tsv b.tsv sample.json +1:2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants