Update Record print method #3

lazappi · 2024-09-30T12:20:08Z

There was a big in the print method for the Record class where things were mixed up if there was a NULL value. This fixes that and uses {cli} to make the print output prettier.

rcannood · 2024-10-01T14:21:40Z

Thanks for solving issues with NULL values!

Regarding making the output prettier: I'm all for this. However, the current print message I implemented to align the way that the Python lamin package currently prints information:

import lamindb as ln

artifact = ln.Artifact.get("KBW89Mf7")
artifact

# Artifact(uid='KBW89Mf7IGcekja2hADu', version='2024-07-01', is_latest=True, description='Myeloid compartment', key='cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad', suffix='.h5ad', type='dataset', size=691757462, hash='SZ5tB0T4YKfiUuUkAL09ZA', n_observations=51552, _hash_type='md5-n', _accessor='AnnData', visibility=1, _key_is_virtual=False, created_by_id=1, storage_id=2, transform_id=22, run_id=27, updated_at='2024-07-12 12:40:48 UTC')

Imo this print statement could be improved upon a lot, and what you implemented is probably a lot better.

Could you show me an example of what it looks like? Should we propose these changes for the upstream lamin python package as well?

falexwolf · 2024-10-01T19:59:36Z

Imo this print statement could be improved upon a lot, and what you implemented is probably a lot better.

I agree that the lamindb print output for any given record is quite verbose and evidently the artifact.describe() is even more verbose.

The problem is if one omits certain fields, the user might not know how to get them.

In some cases, we're now using reduced references that look similar

Transform('JjRF4mAC')
Run('DWqJ1af7')

instead of

Transform(uid='JjRF4mACd9m00000', ...)
Run(uid='DWqJ1af7W8q4tFz5pxQd', ...)

Can you show an example for what you did and paste it here? I think it's fair to assume that the typical R user will be a bit less technical but purely wants simple and intuitive UX without needing to know all private fields etc.

Hence, I think simplifying what Python does might be a good idea. Or we simplify in Python, too, and only print all info upon .describe() or similar. 🤔

WDYT @Zethson @sunnyosun?

lazappi · 2024-10-02T06:27:10Z

Sorry, I should have posted what it looks like. Here is a screenshot.

The Artifact in the top left will be whatever the record class name is. I figured the UID/Hash and timestamps would always exist but the code is written to not show them if not. All the other stuff will be whatever non-private fields are in the record (in whatever order they appear).

I agree it makes sense to have the Python/R output roughly aligned, I should have checked that first. It's also easy enough to add a describe (or whatever) method to the R object or something like a details = TRUE argument to the print method.

falexwolf · 2024-10-02T08:09:58Z

Thanks for sharing! 😄

And yes, uid and hash always exist.

I like your suggestion of having a details = FALSE argument for print that hides a bit of information that seems irrelevant for most users. Could you by default hide

attributes that are prefixed with _ (these are private)
foreign key ids? (I fear this one might be trickier, I'm not sure whether you have that info easily available)

Can there be a deterministic order of these fields similar to what we have in Python? It's defined here: https://docs.lamin.ai/lamindb.artifact

Here is how the implicitly called .__repr__() currently looks in Python:

>>> artifact
Artifact(uid='nL3Bslwl4kIZittG0001', version='1.0', is_latest=True, description='my RNA-seq', suffix='.parquet', type='dataset', size=4091, hash='GDTjY7XSC6E2k9d_ZpLCnw', _hash_type='md5', _accessor='DataFrame', visibility=1, _key_is_virtual=True, created_at=2024-10-01 18:24:56 UTC)

And this is how it looks when you call .describe(). It adds a few more lines through querying the relational metadata (and type annotations if print_types=True).

>>> artifact.describe(print_types=True)
Artifact(uid='nL3Bslwl4kIZittG0001', version='1.0', is_latest=True, description='my RNA-seq', suffix='.parquet', type='dataset', size=4091, hash='GDTjY7XSC6E2k9d_ZpLCnw', _hash_type='md5', _accessor='DataFrame', visibility=1, _key_is_virtual=True, created_at=2024-10-01 18:24:56 UTC)
  Provenance
    .storage: Storage = '/home/runner/work/lamin-docs/lamin-docs/docs/lamin-intro'
    .transform: Transform = 'Introduction'
    .run: Run = 2024-10-01 18:24:55 UTC
    .created_by: User = 'anonymous'
  Labels
    .cell_lines: bionty.CellLine = 'HEK293'
    .ulabels: ULabel = 'Candidate marker study'
  Features
    'study': cat[ULabel] = 'Candidate marker study'
    'temperature': float = 21.6

What I don't like about it is that within .__repr__(), there is a lot of unnecessary stuff for the typical user. So, we might make it a bit leaner.

In R a good solution seems possible with the details arg you proposed.

What I'd like to draw your attention to is the wealth of information you'd get when you hit .describe(). Hence, you might consider putting the basic info in a single line rather than spreading it out vertically.

But all of this can be iterated on.

lazappi · 2024-10-02T15:01:25Z

I had a go at updating things with @falexwolf's suggestions.

The short output now looks like this:

I tried to keep only the things I thought would be useful for users but maybe I missed something important.

This is what the detailed output looks like:

The provenance information is stored in the object so we could show that for artifacts with a bit more work. The labels and features etc. would require more querying/linking of things which I don't think we have set up yet.

Let me know if you have any more comments.

falexwolf · 2024-10-02T15:06:25Z

From my side, this is great for now, @lazappi! Thank you!

falexwolf · 2024-10-04T09:02:37Z

@Zethson @sunnyosun - guys, do you think we should have the simple fields vertically displayed or all in one line as in Python?

Zethson · 2024-10-04T09:14:03Z

If we can reduce the clutter to ensure that the simple fields in a single line are not that wide, we can go for the Python version. Your recent suggestion of using shorter UIDs helps.

Else, I think vertical is better because everyone hates horizontal scrolling. But as usual, consistency is key. Let's keep Python and R a consistent UX (at least as close as possible) and then go from there?

rcannood · 2024-10-10T13:39:30Z

I really like this PR, but I think it would be best to create a github issue in the Python repository and come back to this when the print function for Python has been improved.

Would that make sense? Shall we close this PR when the separate github issue has been created?

lazappi · 2024-10-10T15:10:45Z

For now we have implemented something more similar to Python in #22.

rcannood · 2024-10-11T07:33:44Z

Alright, let's close this for now. @lazappi Could you create issue at laminlabs/lamindb with your proposal for what the print would look like?

Update Record print method

3f61fe3

lazappi requested a review from rcannood September 30, 2024 12:20

lazappi mentioned this pull request Oct 9, 2024

Add Record print methods #22

Merged

rcannood closed this Oct 11, 2024

rcannood deleted the print-record branch October 18, 2024 08:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update Record print method #3

Update Record print method #3

lazappi commented Sep 30, 2024

rcannood commented Oct 1, 2024

falexwolf commented Oct 1, 2024

lazappi commented Oct 2, 2024

falexwolf commented Oct 2, 2024 •

edited

Loading

lazappi commented Oct 2, 2024

falexwolf commented Oct 2, 2024 •

edited

Loading

falexwolf commented Oct 4, 2024

Zethson commented Oct 4, 2024

rcannood commented Oct 10, 2024

lazappi commented Oct 10, 2024

rcannood commented Oct 11, 2024

Update Record print method #3

Update Record print method #3

Conversation

lazappi commented Sep 30, 2024

rcannood commented Oct 1, 2024

falexwolf commented Oct 1, 2024

lazappi commented Oct 2, 2024

falexwolf commented Oct 2, 2024 • edited Loading

lazappi commented Oct 2, 2024

falexwolf commented Oct 2, 2024 • edited Loading

falexwolf commented Oct 4, 2024

Zethson commented Oct 4, 2024

rcannood commented Oct 10, 2024

lazappi commented Oct 10, 2024

rcannood commented Oct 11, 2024

falexwolf commented Oct 2, 2024 •

edited

Loading

falexwolf commented Oct 2, 2024 •

edited

Loading