Training of MAGIK #202

Open
po60nani opened this issue Jan 12, 2024 · 4 comments

@po60nani

Hi,

Thank you for your valuable network. I am currently trying to train MAGIK on my dataset, which has a structure similar to the ones in your tutorials. I encountered two issues during this process:

  1. Extra Columns in EdgeExtractor Output:
    The EdgeExtractor function returns a DataFrame with extra columns containing NaN values. I noticed this discrepancy between my data and the provided tutorial data, and I'm not sure why these extra columns are present. As a workaround, I manually remove these extra columns before returning the data in the function.

Input_df:

input_df

Output_df:

output_df

  2. ValueError in SelfDuplicateEdgeAugmentation:
    When attempting to train with my dataset, I encountered the following error:
    File "mtrand.pyx", line 909, in numpy.random.mtrand.RandomState.choice
    ValueError: a must be greater than 0 unless no samples are taken.
    
    I traced this error back to the SelfDuplicateEdgeAugmentation function, specifically in the inner function where offset = maxnofedges - nofedges results in an offset of 0. I'm unsure how to handle this situation and would appreciate guidance on resolving this issue.

Any assistance or clarification on these matters would be greatly appreciated.

Best regards,

@JesusPinedaC
Collaborator

JesusPinedaC commented Jan 23, 2024

Hi @po60nani,

Thanks for your interest in MAGIK!

  1. This error occurs because EdgeExtractor expects frames to start from zero. You can resolve it by shifting the frame column: nodesdf["frame"] -= nodesdf["frame"].min(). We will clarify this in the documentation.

  2. The problem is indeed related to what you mentioned. However, it is related not to offset = 0 but rather to nofedges = 0.

In cases where offset is 0 and nofedges is greater than or equal to 0, np.random.choice returns an empty array, which does not affect the function's behavior. In such cases, duplicated_edges is simply assigned the value of edges.

The case you describe, in turn, arises when nofedges = 0 and offset > 0 (idx = np.random.choice(0, whatever > 0, replace=True) reproduces the error), indicating that some graphs in your batch do not have any edges.
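The two cases can be checked directly with NumPy. This is a minimal sketch that only exercises np.random.choice; the names nofedges and offset mirror the ones mentioned above, and the helper sample_indices is hypothetical, not MAGIK's actual augmentation code.

```python
import numpy as np


def sample_indices(nofedges, offset):
    """Hypothetical stand-in for the index sampling step discussed above."""
    return np.random.choice(nofedges, offset, replace=True)


# offset == 0: no samples are requested, so an empty array comes back
# even when there are no edges to choose from.
empty = sample_indices(nofedges=0, offset=0)

# nofedges == 0 and offset > 0: there is nothing to sample from,
# so NumPy raises the ValueError reported in this issue.
try:
    sample_indices(nofedges=0, offset=3)
    raised = False
except ValueError:
    raised = True

print(empty.size, raised)
```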

To better help you resolve this issue, we need to confirm some details:

  • Can you confirm if the centroids are normalized to a range of roughly 0 to 1?
  • The radius used to generate edges must be large enough to connect time-subsequent frames.
  • Please note that EdgeExtractor was not designed to be used as a standalone function. Instead, it was intended to be used within GraphExtractor, which ensures the proper handling of the dataframe until the final graphs are generated. If you haven't already, I recommend using GraphExtractor as we do in the tutorials.

I suggest solving problem 1 first and then checking whether problem 2 still persists!
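One way to judge whether the radius is large enough is to measure, for each node, the distance to its nearest node in the following frame. The sketch below uses only NumPy and pandas and assumes the column names from the tutorials (centroid-0, centroid-1, frame); the helper nearest_next_frame_distances is hypothetical and not part of DeepTrack.

```python
import numpy as np
import pandas as pd


def nearest_next_frame_distances(nodesdf):
    """Distance from each node to its nearest node in the following frame.

    Hypothetical helper; nodes whose following frame is empty are skipped.
    """
    coords = ["centroid-0", "centroid-1"]
    by_frame = {f: g[coords].to_numpy() for f, g in nodesdf.groupby("frame")}
    distances = []
    for frame, points in by_frame.items():
        nxt = by_frame.get(frame + 1)
        if nxt is None:
            continue
        # Pairwise distances between this frame and the next one.
        diff = points[:, None, :] - nxt[None, :, :]
        distances.append(np.linalg.norm(diff, axis=-1).min(axis=1))
    return np.concatenate(distances) if distances else np.array([])


# Toy data (made up): one particle moving 0.1 per frame along the first axis.
toy = pd.DataFrame(
    {
        "centroid-0": [0.1, 0.2, 0.3],
        "centroid-1": [0.5, 0.5, 0.5],
        "frame": [0, 1, 2],
    }
)
d = nearest_next_frame_distances(toy)
print(d.max())
```

If a large share of these distances exceeds the chosen radius, many nodes may end up without edges to the next frame, which is worth ruling out before looking elsewhere.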

@po60nani
Author

Thank you for providing additional insights into the issue. I appreciate your effort in investigating the problem. As suggested, I will focus on solving problem 1 and then reevaluate if problem 2 persists.

I have implemented the suggested solution by adjusting the frames using nodesdf["frame"] -= nodesdf["frame"].min(). However, the problem persists.
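A quick way to confirm that the shift took effect is to inspect the frame column before passing the dataframe on. The sketch below is an illustration using pandas only; the toy values are made up, and the column name frame follows the tutorials.

```python
import pandas as pd

# Toy dataframe whose frames start at 7 instead of 0 (made-up values).
nodesdf = pd.DataFrame({"frame": [7, 7, 8, 9, 9]})

# Shift so that the first frame is 0, as suggested above.
nodesdf["frame"] -= nodesdf["frame"].min()

# Inspect the result: the smallest frame and the distinct frames present.
first_frame = nodesdf["frame"].min()
frames = sorted(nodesdf["frame"].unique())
print(first_frame, frames)
```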

image

@JesusPinedaC
Collaborator

JesusPinedaC commented Jan 23, 2024

Could you please try with the following toy example?

import numpy as np
import pandas as pd

import deeptrack as dt

# like in your case
frame_shift = 0 # right case: 0

# randomly generated centroids
centroids = np.random.rand(80, 2)
frames = np.arange(0, 80) + frame_shift

nodesdf = pd.DataFrame()
nodesdf[["centroid-0", "centroid-1"]] = centroids
nodesdf["frame"] = frames
nodesdf["label"] = 0 # single particle
nodesdf["solution"] = 0
nodesdf["set"] = 0


# display the first 20 rows of the dataframe
nodesdf.head(20)

# Search radius for the graph edges
radius = 0.7

# time window to associate nodes (in frames)
nofframes=3

# compute edges
edges = dt.models.gnns.graphs.EdgeExtractor(
    nodesdf, 
    parenthood=np.ones((1, 2)) * -1, 
    radius=radius, 
    nofframes=nofframes
    )

Here, frame_shift = 7 reproduces the issue:

nodesdf
nodesdf_frame_7

edges
edges_frame_7

In contrast, frame_shift = 0 produces the correct output:

nodesdf
nodesdf_frame_0

edges
edges_frame_0

@po60nani
Author

I have thoroughly examined the provided toy example, and it accurately reproduces the expected results you shared. However, when applying the code to my dataset, I encountered an error. To facilitate the troubleshooting process, I have uploaded both the CSV file (df_PSFs.csv) and the code for your review.

Code:

import logging

import numpy as np
import pandas as pd

import deeptrack as dt
from deeptrack.models.gnns.generators import GraphGenerator

logging.disable(logging.WARNING)

if __name__ == "__main__":

    path_csv = r'./df_PSFs.csv'
    nodesdf = pd.read_csv(path_csv)

    print(nodesdf.head(20))

    # normalize centroids between 0 and 1
    nodesdf.loc[:, nodesdf.columns.str.contains("centroid")] = (
            nodesdf.loc[:, nodesdf.columns.str.contains("centroid")]
            / np.array([1000.0, 1000.0])
    )

    nodesdf.loc[:, 'solution'] = 0.0
    nodesdf.loc[:, 'set'] = 0.0

    nodesdf["frame"] -= nodesdf["frame"].min()

    # display the first 20 rows of the dataframe
    nodesdf.head(20)

    # Search radius for the graph edges
    radius = 0.2

    # Time window to associate nodes (in frames)
    nofframes=3

    # Compute edges
    edges = dt.models.gnns.graphs.EdgeExtractor(
        nodesdf, 
        parenthood=np.ones((1, 2)) * -1, 
        radius=radius, 
        nofframes=nofframes
    )

    a = 1

Additional Information:
df_PSFs.csv

  • The output of nodesdf is:

image

  • The output of edges is:

image

  • My pandas version is:
    pandas 1.5.3

Upon execution, I expect the code to run successfully without encountering any errors. The provided toy example validates this expectation, but the issue arises with my dataset.
