When you are working only with local datastores, it's a bit clunky to have to define the connection to the Prefect API and the names of the remote and local Prefect blocks.
Until you can define flow code in Julia scripts, there's no upside to the Prefect integration, since you are writing your flow code in Python and calling a Julia process.
Example local Julia exploratory use case:
using DataFrames, UnicodePlots, PrefectInterfaces
ENV["PREFECT_API_URL"] = "http://127.0.0.1:4204/api" # dev environment
# need to define both of these to use the `read(Dataset)` function
ENV["PREFECT_DATA_BLOCK_LOCAL"] = "local-file-system/datastore"
ENV["PREFECT_DATA_BLOCK_REMOTE"] = "s3-bucket/datastore"
dsz = Dataset(dataset_name = "my_cool_data_extract", datastore_type = "local")
dfz = read(dsz)
# 404×4 DataFrame
# ..etc
If the 'Dataset' module (name already taken) could stand alone from PrefectInterfaces, you could bring it in as an extension when needed. In standalone mode, you would define the filesystem block directly instead of calling the API URL to get it:
using Dataset.local-datastore
dstore = Dataset.local-datastore()
dstore.basepath = "$(homedir())/toodata/templisher/dev"
And that's all you need to find datasets on your local system. You are working in Julia outside of any Prefect orchestration.
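As a rough sketch of what that standalone mode could look like, the snippet below resolves a dataset path from nothing but a base path. `LocalDatastore`, `DatasetRef`, `dataset_path`, and the `<basepath>/<dataset_name>.<data_type>` layout are all hypothetical names and assumptions made for illustration; none of them are part of PrefectInterfaces.

```julia
# Hypothetical standalone datastore: it holds only a base path and
# never contacts the Prefect API.
Base.@kwdef struct LocalDatastore
    basepath::String
end

# Hypothetical metadata-only dataset reference.
Base.@kwdef struct DatasetRef
    dataset_name::String
    data_type::String = "csv"
end

# Assumed layout: <basepath>/<dataset_name>.<data_type>
dataset_path(store::LocalDatastore, ds::DatasetRef) =
    joinpath(store.basepath, ds.dataset_name * "." * ds.data_type)

dstore = LocalDatastore(basepath = joinpath(homedir(), "toodata", "templisher", "dev"))
ds = DatasetRef(dataset_name = "my_cool_data_extract")
println(dataset_path(dstore, ds))
```

No `ENV["PREFECT_API_URL"]` or block names are needed here, which is the point of the standalone case.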
A key part of this is that read(::Dataset) is currently defined as a read_path function attached to a Prefect block, which has a very object-oriented structure.
The way I'm using Dataset, it's just a metadata reference, mostly carrying filepath locations and local/remote labels.
I do not want to define a 'dataset' with a block; the only Prefect block reference needed is the base path to the datastore.
Remove that read_path/write_path functionality. read(::Dataset) should take the data type reader as an argument, and a dataset carries a data type such as "csv". So the default would be CSV.read, but it can be overridden with a keyword argument or dispatched on the Dataset type somehow.
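One possible shape for this, sketched with no dependencies: the default reader is picked by dispatch on the data type, and a `reader` keyword overrides it. `DatasetMeta`, `read_dataset`, `default_reader`, and `read_csv_rows` are hypothetical names for illustration; `read_csv_rows` is only a stand-in for `CSV.read`, and a real implementation would more likely extend `Base.read` for the package's own `Dataset` type.

```julia
# Hypothetical metadata-only reference; not the PrefectInterfaces Dataset type.
struct DatasetMeta
    path::String
    data_type::Symbol
end

# Dependency-free stand-in for CSV.read: returns one Vector{String} per line.
read_csv_rows(path) = [String.(split(line, ',')) for line in eachline(path)]

# Default reader chosen by dispatch on the data type.
default_reader(::Val{:csv}) = read_csv_rows
default_reader(::Val{T}) where {T} = error("no default reader for data type $T")

# The reader is a keyword argument; any function of a path can override it.
read_dataset(ds::DatasetMeta; reader = default_reader(Val(ds.data_type))) =
    reader(ds.path)

# Usage: write a tiny file, read it with the default and with an override.
path = tempname() * ".csv"
write(path, "a,b\n1,2\n")
ds = DatasetMeta(path, :csv)
rows = read_dataset(ds)                               # default csv reader
raw = read_dataset(ds; reader = p -> read(p, String)) # raw text override
```

With CSV.jl and DataFrames loaded, the override could be `reader = p -> CSV.read(p, DataFrame)`, which keeps the dataset itself free of any read_path/write_path method.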
Again, this was borrowed from the way Prefect file blocks include a read_path/write_path object method, which creates too much linkage between Prefect's internal object-oriented structure and the structure of my data application.