Can someone answer why the number and x columns of '201105. shp' in the output of this code also become 0? #261

1jiangxd · 2024-01-12T17:16:52Z

Can someone explain why the number and x columns of '201105.shp' become 0 in the output of this code?

(Two shp files have been uploaded to my GitHub repository)
https://github.com/1jiangxd/daskgeopandasproblems

The code I used is below. When I check the processed '201105.shp', only the first 2 million rows were processed correctly; in the remaining rows, the original values were changed to 0.
Where does the problem lie in this code? I would greatly appreciate any help.

import geopandas as gpd
import time

import dask_geopandas

def process_row():
    outwen = r'201105.shp'
    bianjie = r'2023xian.shp'
    jiabianjie = r'E:\201105out'
    
    start_time3 = time.time()
    
    # Read input and clipped boundary shapefiles
    target_gdf = gpd.read_file(outwen)
    join_gdf = gpd.read_file(bianjie)
    
    # Switch to dask approach
    target_gdfnew = dask_geopandas.from_geopandas(target_gdf, npartitions=4)
       
    # Reproject the boundary participating in the join to match the CRS of the target geometry
    join_gdf = join_gdf.to_crs(target_gdf.crs)
    
    # Switch to dask approach
    join_gdfnew = dask_geopandas.from_geopandas(join_gdf, npartitions=4)
    
    # Use spatial join to find intersecting parts
    joined = gpd.sjoin(target_gdfnew, join_gdfnew, how='inner', predicate='intersects')
    
    # Add attributes from 'bianjie' to 'outwen'
    joined = joined.drop(columns='index_right')  # Remove redundant index column
    result = target_gdfnew.merge(joined, how='left', on=target_gdfnew.columns.to_list())
    
    # Save the result to the output boundary
    result.to_file(jiabianjie, encoding='utf-8-sig')  # Ensure the correct encoding is used
    
    end_time3 = time.time()
    execution_time3 = end_time3 - start_time3
    
    print(f"'{jiabianjie}' has added boundaries. Start time: {start_time3:.2f}, End time: {end_time3:.2f}, Execution time: {execution_time3:.2f} seconds")

process_row()

print('Finish')


jorisvandenbossche · 2024-05-06T14:33:08Z

@1jiangxd apologies for the slow reply, but looking at your code, the following lines

    # Add attributes from 'bianjie' to 'outwen'
    joined = joined.drop(columns='index_right')  # Remove redundant index column
    result = target_gdfnew.merge(joined, how='left', on=target_gdfnew.columns.to_list())

are typically not needed. The result of the spatial join, joined, already has the columns of the original target_gdf, so this additional merge does nothing except bring back the original rows of target_gdf that had no match in the spatial join. To achieve the same result, you can do a left join directly (by specifying how='left' in the sjoin call).

Also, I assume that the gpd.sjoin in your code above should be dask_geopandas.sjoin?
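
To illustrate the left-join suggestion, here is a minimal sketch on small synthetic data. The points and boxes are hypothetical stand-ins for '201105.shp' and '2023xian.shp', and the `county` column and the `left_sjoin` helper are made up for the example. It uses plain `geopandas.sjoin`; whether `dask_geopandas.sjoin` accepts `how='left'` may depend on the installed version, so that part is not shown here.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-in for the target layer ('201105.shp' in the question).
target_gdf = gpd.GeoDataFrame(
    {"number": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5.0, 5.0)],
    crs="EPSG:4326",
)

# Hypothetical stand-in for the boundary layer ('2023xian.shp' in the question).
join_gdf = gpd.GeoDataFrame(
    {"county": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)


def left_sjoin(target, boundaries):
    """Attach boundary attributes to every target row, matched or not."""
    # Match the CRS of the target layer before joining.
    boundaries = boundaries.to_crs(target.crs)
    # how='left' keeps unmatched target rows, so no follow-up merge is needed.
    joined = gpd.sjoin(target, boundaries, how="left", predicate="intersects")
    # Drop the helper column that records the index of the matched boundary.
    return joined.drop(columns="index_right")


result = left_sjoin(target_gdf, join_gdf)
print(result)
```

The third point lies outside both boxes, so it stays in the result with a missing `county` value, while its `number` value is left unchanged.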

1jiangxd changed the title from "Can someone answer why the number and x columns of '1. shp' in the output of this code also become 0?" to "Can someone answer why the number and x columns of '201105. shp' in the output of this code also become 0?" on Jan 13, 2024
