When I load a CSV into a dask dataframe first and then convert it to a dask_geopandas dataframe using .from_dask_dataframe, ._meta_nonempty does not exist, which causes downstream problems in analysis (e.g. with spatial_shuffle). My hackish workaround below takes the head, uses from_geopandas to get the meta, and then replaces the meta in the original. It would be nice to make this just work directly! I'm not sure whether it reproduces for other people.
# Imports (aliases assumed from how they are used below)
import dask.dataframe as dd
import dask_geopandas as dg
import geopandas as gpd
import shapely.wkt

# Load a csv file
df = dd.read_csv(
    fname,
    dtype={'longitude': float, 'latitude': float, 'geometry': 'object'},
).repartition(npartitions=njobs)  # njobs is the number of workers I have

# Translate to geometry using shapely
df['geometry'] = df.geometry.map(shapely.wkt.loads, meta=('geometry', 'object'))

# Create a tmp dataframe using a GeoDataFrame and from_geopandas
tmp = dg.from_geopandas(
    gpd.GeoDataFrame(df.head(), geometry='geometry', crs='EPSG:4326'),
    npartitions=1,
)

# Now create the dask_geopandas df
df = dg.from_dask_dataframe(df)

# Need to set metadata here, otherwise spatial_shuffle won't run.
df._meta = tmp.compute()
df = df.spatial_shuffle()