Segmentation Fault on Weakly Connected Component #200

Open
2 tasks done
prdx opened this issue Jan 20, 2025 · 2 comments · May be fixed by #202
Labels
bug (Something isn't working), reproduced

Comments

@prdx

prdx commented Jan 20, 2025

What happens?

Hi everyone, thanks for the work! Very promising!

I am trying the extension to compute weakly connected components. I am using DuckDB 1.1.3, and it works on my laptops (tested on an Intel Mac and an M4 Mac). On my server machine, however, I get a segfault, and unfortunately no further information is printed.

I am attaching the stack trace here:

Stacktrace: https://gist.github.com/prdx/ce6d850697e00e7efd957b53e2c3217e
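
Since the crash prints nothing on its own, one option is Python's standard `faulthandler` module, which dumps the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. The lines below are a minimal sketch meant to sit at the very top of the reproduction script. They only show which Python call was active at the time of the crash, not the native frames inside the extension, so they complement the stack trace linked above rather than replace it.

```python
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
# On a fatal signal, the Python traceback of every thread is written
# to stderr before the process dies.
faulthandler.enable(all_threads=True)

# ... the rest of the reproduction script would follow here ...
```

The same effect is available without editing the script by running it with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER` environment variable.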

To Reproduce

I am using this Python script:

import duckdb
import pandas as pd
import time

# Read the TSV file
print("Loading the data")
input_file = "links.tsv"  # Replace with the actual file path
data = pd.read_csv(input_file, sep="\t")

# Ensure the TSV has the correct columns
if "a" not in data.columns or "b" not in data.columns:
    raise ValueError("The input TSV must contain 'a' and 'b' columns")

# Connect to an in-memory DuckDB instance
db = duckdb.connect(":memory:")
print(duckdb.__version__)


# Load the DataFrame into DuckDB as a table
db.register("data_table", data)

print("Installing extension")
db.execute("INSTALL duckpgq FROM community")
db.execute("LOAD duckpgq")

# Create vertex table in DuckDb
db.execute(
    "CREATE TABLE devices AS SELECT a AS device FROM data_table UNION SELECT b AS device FROM data_table"
)

# Create an edges table in DuckDB
db.execute("CREATE TABLE edges AS SELECT * FROM data_table")

# Create a property graph from the edges
query_create_graph = """
CREATE PROPERTY GRAPH graph
  VERTEX TABLES (
    devices
  )
  EDGE TABLES (
    edges
      SOURCE KEY (a) REFERENCES devices(device)
      DESTINATION KEY (b) REFERENCES devices(device)
    LABEL connects
  );
"""
db.execute(query_create_graph)

print("Computing WCC")
start_time = time.time()
# Compute the weakly connected components
query_wcc = """
FROM weakly_connected_component(graph, devices, connects);
"""
wcc_result = db.execute(query_wcc).fetchdf()

# Print or save the results
# print(wcc_result)
output_file = "weakly_connected_components.csv"
wcc_result.to_csv(output_file, index=False)
print(f"Weakly connected components saved to {output_file}")

# Drop the property graph
db.execute("DROP PROPERTY GRAPH graph")

# Close the connection
db.close()

# End the timer and print the runtime (measured from the start of the WCC query)
end_time = time.time()
runtime = end_time - start_time
print(f"WCC runtime (including export and cleanup): {runtime:.2f} seconds")
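
For cross-checking the result on a machine where the query does complete, or on a small sample of the edge list, a pure-Python reference can help. The following is a minimal union-find sketch, not the extension's implementation. The function names `weakly_connected_components` and `group_components` are hypothetical helpers introduced here for illustration. Edge direction is ignored, which is what makes the components "weak".

```python
from collections import defaultdict


def weakly_connected_components(edges):
    """Map every vertex in `edges` to a representative of its component."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        # Walk up to the root of v's tree.
        root = v
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited vertex at the root.
        while parent[v] != root:
            parent[v], v = root, parent[v]
        return root

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two trees, ignoring direction

    return {v: find(v) for v in list(parent)}


def group_components(labels):
    """Turn a vertex -> representative mapping into sorted vertex groups."""
    groups = defaultdict(set)
    for vertex, root in labels.items():
        groups[root].add(vertex)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample edges (a, b); (6, 6) is a self-loop.
sample = [(1, 2), (3, 2), (4, 5), (6, 6)]
print(group_components(weakly_connected_components(sample)))
```

Because the representative chosen for each component is arbitrary, it is the grouping of vertices that should be compared against the extension's output, not the component labels themselves.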

OS:

NixOS 24.05 (Uakari)

DuckDB Version:

1.1.3

DuckDB Client:

Python

Full Name:

Anak Bagus

Affiliation:

Arista Networks

How did you load the extension?

Community extension version

Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
@Dtenwolde
Contributor

Hi @prdx, thank you for the bug report. Unfortunately, I can't reproduce the bug with the information given. Could you reproduce this segmentation fault using (anonymized) data? Otherwise, would it be possible to mail the file (or anonymized data) to me ([email protected])? I would like to solve this bug :)

Kind regards,
Daniel
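
One possible way to produce anonymized data is to replace every identifier with an opaque integer while keeping the edge structure intact. The sketch below assumes a tab-separated file with the `a` and `b` columns used in the script above. `anonymize_edges` is a hypothetical helper, not part of DuckDB or DuckPGQ. Two caveats apply: the graph shape (for example, vertex degrees) remains visible after remapping, and a crash that depends on the original value type or content may not reproduce on the remapped file.

```python
import csv
import io


def anonymize_edges(src, dst, delimiter="\t"):
    """Copy an edge list from `src` to `dst`, replacing each distinct
    identifier with an integer assigned in order of first appearance.
    Returns the number of distinct vertices seen."""
    reader = csv.DictReader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["a", "b"])

    ids = {}  # original identifier -> anonymous integer
    for row in reader:
        a = ids.setdefault(row["a"], len(ids))
        b = ids.setdefault(row["b"], len(ids))
        writer.writerow([a, b])
    return len(ids)


# Small in-memory demonstration with made-up identifiers.
source = io.StringIO("a\tb\nx\ty\nz\ty\nx\tx\n")
target = io.StringIO()
vertex_count = anonymize_edges(source, target)
print(vertex_count)
print(target.getvalue())
```

For a real file, `src` and `dst` would be handles from `open(path, newline="")`, and the mapping in `ids` should be discarded rather than shared.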

@Dtenwolde added the "needs more info" (Needs more information to reproduce the problem) and "bug" (Something isn't working) labels on Jan 21, 2025
@prdx
Author

prdx commented Jan 22, 2025

Hi @Dtenwolde, I sent the dataset I am using to your email. Thanks!

@Dtenwolde added the "reproduced" label and removed the "needs triage" and "needs more info" (Needs more information to reproduce the problem) labels on Jan 24, 2025
@Dtenwolde linked a pull request on Jan 24, 2025 that will close this issue