I used the code below, but when I check the processed '201105.shp', only the first 2 million rows were processed correctly; the original values in the remaining rows were changed to 0.
Where is the problem in this code? I would greatly appreciate any help.
import geopandas as gpd
import time
import dask_geopandas

def process_row():
    outwen = r'201105.shp'
    bianjie = r'2023xian.shp'
    jiabianjie = r'E:\201105out'
    start_time3 = time.time()
    # Read input and clipped boundary shapefiles
    target_gdf = gpd.read_file(outwen)
    join_gdf = gpd.read_file(bianjie)
    # Switch to dask approach
    target_gdfnew = dask_geopandas.from_geopandas(target_gdf, npartitions=4)
    # Reproject the boundary participating in the join to match the CRS of the target geometry
    join_gdf = join_gdf.to_crs(target_gdf.crs)
    # Switch to dask approach
    join_gdfnew = dask_geopandas.from_geopandas(join_gdf, npartitions=4)
    # Use spatial join to find intersecting parts
    joined = gpd.sjoin(target_gdfnew, join_gdfnew, how='inner', predicate='intersects')
    # Add attributes from 'bianjie' to 'outwen'
    joined = joined.drop(columns='index_right')  # Remove redundant index column
    result = target_gdfnew.merge(joined, how='left', on=target_gdfnew.columns.to_list())
    # Save the result to the output boundary
    result.to_file(jiabianjie, encoding='utf-8-sig')  # Ensure the correct encoding is used
    end_time3 = time.time()
    execution_time3 = end_time3 - start_time3
    print(f"'{jiabianjie}' has added boundaries. Start time: {start_time3:.2f}, End time: {end_time3:.2f}, Execution time: {execution_time3:.2f} seconds")

process_row()
print('Finish')
1jiangxd changed the title from "Can someone answer why the number and x columns of '1. shp' in the output of this code also become 0?" to "Can someone answer why the number and x columns of '201105. shp' in the output of this code also become 0?" on Jan 13, 2024
@1jiangxd apologies for the slow reply, but looking at your code, the following lines
# Add attributes from 'bianjie' to 'outwen'
joined = joined.drop(columns='index_right') # Remove redundant index column
result = target_gdfnew.merge(joined, how='left', on=target_gdfnew.columns.to_list())
are typically not needed. The result of the spatial join, joined, already has the columns of the original target_gdf, so the additional merge does nothing except bring back the rows of target_gdf that had no match in the spatial join. To achieve the same result, you can do a left join directly (by specifying how='left' in the sjoin call).
Also, I assume that the gpd.sjoin in your code above should be dask_geopandas.sjoin?
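As a rough illustration of the left-join suggestion, here is a minimal sketch on toy data. The point and polygon layers, the `county` column, and the CRS are all hypothetical stand-ins for '201105.shp' and '2023xian.shp'. It uses plain geopandas rather than dask_geopandas, since I am not certain every dask_geopandas release accepts how='left' in its sjoin; please check your installed version before relying on that.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy layers standing in for 201105.shp and 2023xian.shp
target_gdf = gpd.GeoDataFrame(
    {"number": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5.0, 5.0)],
    crs="EPSG:4326",
)
join_gdf = gpd.GeoDataFrame(
    {"county": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Make sure both layers share a CRS before joining
join_gdf = join_gdf.to_crs(target_gdf.crs)

# A left join keeps every row of target_gdf; rows without a match
# get missing values in the columns coming from join_gdf
joined = gpd.sjoin(target_gdf, join_gdf, how="left", predicate="intersects")
joined = joined.drop(columns="index_right")  # drop the helper index column

print(joined[["number", "county"]])
```

With this approach no second merge on all columns is needed: the original attribute columns (here `number`) pass through the join unchanged, and only the joined columns are empty for unmatched rows.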
Can someone answer why the number and x columns of '201105. shp' in the output of this code also become 0?
(Two shp files have been uploaded to my GitHub repository)
https://github.com/1jiangxd/daskgeopandasproblems