From network drives to decoupled data: what's standing in our way, and where can it bring the most benefit? #5

epijim · 2024-02-01T20:28:28Z

Maintainer

Proposal

Traditionally, statistical programmers worked with data on network drives (e.g. folders permissioned via ACLs). In the data science space, this was largely replaced by APIs: GraphQL, ODBC for SQL databases, or REST for many object stores.

In our company, real-world data (RWD) moved entirely to databases and S3 about eight years ago. This has helped build a culture in which data users across the company work from source rather than making local copies, even when they work outside the core Scientific Computing Environment (SCE) platform (e.g. in Posit Connect, on HPCs, or in Spotfire). It has also enabled new capabilities, such as running joins and pulls across TBs of data in seconds.

We have migrated clinical trial data to Parquet, but we still store it behind mounted, filesystem-like interfaces. Can we migrate clinical trial data access to APIs, and should we expect the same benefits? Where should we focus? What experiences have people had across companies? What opportunities come from moving from folder hierarchies to tags (and nested tags)?
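To make the folders-versus-tags question concrete, here is a minimal Python sketch under stated assumptions: the dataset URIs, the tag names (`ta/...`, `study/...`, `standard/...`) and the `Dataset`/`find` helpers are all hypothetical, not an existing catalog API. Nested tags are written as slash-separated paths, and a query for a parent tag also matches its children.

```python
from dataclasses import dataclass

# Hypothetical sketch: datasets addressed by nested tags instead of one folder path.
# Tags are slash-separated, e.g. "standard/sdtm/ae" is nested under "standard/sdtm".


@dataclass(frozen=True)
class Dataset:
    uri: str          # hypothetical object-store location
    tags: frozenset   # nested tags such as "ta/oncology"


def expand(tag: str) -> set:
    """Expand a nested tag into itself plus all of its ancestors."""
    parts = tag.split("/")
    return {"/".join(parts[: i + 1]) for i in range(len(parts))}


def find(catalog, *wanted):
    """Return datasets that carry every wanted tag, at any nesting level."""
    matches = []
    for ds in catalog:
        # Union of all expanded tags, so a parent tag matches its children.
        available = set().union(*(expand(t) for t in ds.tags))
        if all(w in available for w in wanted):
            matches.append(ds)
    return matches


# Hypothetical catalog: the same axes a folder path would hard-code in one order.
CATALOG = [
    Dataset("s3://example-bucket/ae-123.parquet",
            frozenset({"ta/oncology", "study/STUDY123", "standard/sdtm/ae"})),
    Dataset("s3://example-bucket/ae-456.parquet",
            frozenset({"ta/immunology", "study/STUDY456", "standard/sdtm/ae"})),
    Dataset("s3://example-bucket/adsl-123.parquet",
            frozenset({"ta/oncology", "study/STUDY123", "standard/adam/adsl"})),
]

if __name__ == "__main__":
    # All SDTM AE datasets across studies: one query, no folder walking.
    print([d.uri for d in find(CATALOG, "standard/sdtm/ae")])
    # Oncology datasets in any SDTM domain.
    print([d.uri for d in find(CATALOG, "ta/oncology", "standard/sdtm")])
```

A folder path such as `.../oncology/STUDY123/sdtm/ae.parquet` fixes one ordering of these axes, so "all SDTM AE datasets across studies" means walking every study folder; with tags, each axis can be queried on its own.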

Expected impact

In the 2023 round tables, the SCE discussion flagged the need to support network-drive-based workflows as one of the biggest blockers SCE leads face when trying to modernise our platforms. There would be value in diving deeper into this topic.

Prior discussions/work

Several companies have experience with this. Even those that still offer a "network-drive-like" end-user experience may have learnings to share, e.g. using Amazon FSx for Lustre to expose S3 data as mounted drives.

Would you be willing to potentially facilitate this discussion?

Maybe/No/ask me again later

epijim · 2024-08-05T10:45:09Z

Maintainer Author

Added to the agenda.
