Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Record print method #3

Closed
wants to merge 1 commit into from
Closed

Update Record print method #3

wants to merge 1 commit into from

Conversation

lazappi
Copy link
Collaborator

@lazappi lazappi commented Sep 30, 2024

There was a big in the print method for the Record class where things were mixed up if there was a NULL value. This fixes that and uses {cli} to make the print output prettier.

@rcannood
Copy link
Collaborator

rcannood commented Oct 1, 2024

Thanks for solving issues with NULL values!

Regarding making the output prettier: I'm all for this. However, the current print message I implemented to align the way that the Python lamin package currently prints information:

import lamindb as ln

artifact = ln.Artifact.get("KBW89Mf7")
artifact

# Artifact(uid='KBW89Mf7IGcekja2hADu', version='2024-07-01', is_latest=True, description='Myeloid compartment', key='cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad', suffix='.h5ad', type='dataset', size=691757462, hash='SZ5tB0T4YKfiUuUkAL09ZA', n_observations=51552, _hash_type='md5-n', _accessor='AnnData', visibility=1, _key_is_virtual=False, created_by_id=1, storage_id=2, transform_id=22, run_id=27, updated_at='2024-07-12 12:40:48 UTC')

Imo this print statement could be improved upon a lot, and what you implemented is probably a lot better.

Could you show me an example of what it looks like? Should we propose these changes for the upstream lamin python package as well?

@falexwolf
Copy link
Member

Imo this print statement could be improved upon a lot, and what you implemented is probably a lot better.

I agree that the lamindb print output for any given record is quite verbose and evidently the artifact.describe() is even more verbose.

The problem is if one omits certain fields, the user might not know how to get them.

In some cases, we're now using reduced references that look similar

Transform('JjRF4mAC')
Run('DWqJ1af7')

instead of

Transform(uid='JjRF4mACd9m00000', ...)
Run(uid='DWqJ1af7W8q4tFz5pxQd', ...)

Can you show an example for what you did and paste it here? I think it's fair to assume that the typical R user will be a bit less technical but purely wants simple and intuitive UX without needing to know all private fields etc.

Hence, I think simplifying what Python does might be a good idea. Or we simplify in Python, too, and only print all info upon .describe() or similar. 🤔

WDYT @Zethson @sunnyosun?

@lazappi
Copy link
Collaborator Author

lazappi commented Oct 2, 2024

Sorry, I should have posted what it looks like. Here is a screenshot.

image

The Artifact in the top left will be whatever the record class name is. I figured the UID/Hash and timestamps would always exist but the code is written to not show them if not. All the other stuff will be whatever non-private fields are in the record (in whatever order they appear).

I agree it makes sense to have the Python/R output roughly aligned, I should have checked that first. It's also easy enough to add a describe (or whatever) method to the R object or something like a details = TRUE argument to the print method.

@falexwolf
Copy link
Member

falexwolf commented Oct 2, 2024

Thanks for sharing! 😄

And yes, uid and hash always exist.

I like your suggestion of having a details = FALSE argument for print that hides a bit of information that seems irrelevant for most users. Could you by default hide

  • attributes that are prefixed with _ (these are private)
  • foreign key ids? (I fear this one might be trickier, I'm not sure whether you have that info easily available)

Can there be a deterministic order of these fields similar to what we have in Python? It's defined here: https://docs.lamin.ai/lamindb.artifact

Here is how the implicitly called .__repr__() currently looks in Python:

>>> artifact
Artifact(uid='nL3Bslwl4kIZittG0001', version='1.0', is_latest=True, description='my RNA-seq', suffix='.parquet', type='dataset', size=4091, hash='GDTjY7XSC6E2k9d_ZpLCnw', _hash_type='md5', _accessor='DataFrame', visibility=1, _key_is_virtual=True, created_at=2024-10-01 18:24:56 UTC)

And this is how it looks when you call .describe(). It adds a few more lines through querying the relational metadata (and type annotations if print_types=True).

>>> artifact.describe(print_types=True)
Artifact(uid='nL3Bslwl4kIZittG0001', version='1.0', is_latest=True, description='my RNA-seq', suffix='.parquet', type='dataset', size=4091, hash='GDTjY7XSC6E2k9d_ZpLCnw', _hash_type='md5', _accessor='DataFrame', visibility=1, _key_is_virtual=True, created_at=2024-10-01 18:24:56 UTC)
  Provenance
    .storage: Storage = '/home/runner/work/lamin-docs/lamin-docs/docs/lamin-intro'
    .transform: Transform = 'Introduction'
    .run: Run = 2024-10-01 18:24:55 UTC
    .created_by: User = 'anonymous'
  Labels
    .cell_lines: bionty.CellLine = 'HEK293'
    .ulabels: ULabel = 'Candidate marker study'
  Features
    'study': cat[ULabel] = 'Candidate marker study'
    'temperature': float = 21.6

What I don't like about it is that within .__repr__(), there is a lot of unnecessary stuff for the typical user. So, we might make it a bit leaner.

In R a good solution seems possible with the details arg you proposed.

What I'd like to draw your attention to is the wealth of information you'd get when you hit .describe(). Hence, you might consider putting the basic info in a single line rather than spreading it out vertically.

But all of this can be iterated on.

@lazappi
Copy link
Collaborator Author

lazappi commented Oct 2, 2024

I had a go at updating things with @falexwolf's suggestions.

The short output now looks like this:

image

I tried to keep only the things I thought would be useful for users but maybe I missed something important.

This is what the detailed output looks like:

image

The provenance information is stored in the object so we could show that for artifacts with a bit more work. The labels and features etc. would require more querying/linking of things which I don't think we have set up yet.

Let me know if you have any more comments.

@falexwolf
Copy link
Member

falexwolf commented Oct 2, 2024

From my side, this is great for now, @lazappi! Thank you!

@falexwolf
Copy link
Member

@Zethson @sunnyosun - guys, do you think we should have the simple fields vertically displayed or all in one line as in Python?

image

@Zethson
Copy link
Member

Zethson commented Oct 4, 2024

If we can reduce the clutter to ensure that the simple fields in a single line are not that wide, we can go for the Python version. Your recent suggestion of using shorter UIDs helps.

Else, I think vertical is better because everyone hates horizontal scrolling. But as usual, consistency is key. Let's keep Python and R a consistent UX (at least as close as possible) and then go from there?

@lazappi lazappi mentioned this pull request Oct 9, 2024
@rcannood
Copy link
Collaborator

I really like this PR, but I think it would be best to create a github issue in the Python repository and come back to this when the print function for Python has been improved.

Would that make sense? Shall we close this PR when the separate github issue has been created?

@lazappi
Copy link
Collaborator Author

lazappi commented Oct 10, 2024

For now we have implemented something more similar to Python in #22.

@rcannood
Copy link
Collaborator

Alright, let's close this for now. @lazappi Could you create issue at laminlabs/lamindb with your proposal for what the print would look like?

@rcannood rcannood closed this Oct 11, 2024
@rcannood rcannood deleted the print-record branch October 18, 2024 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants