Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running 3D Segmentation #111

Open
NickDiNapoli opened this issue Feb 6, 2024 · 3 comments
Open

Running 3D Segmentation #111

NickDiNapoli opened this issue Feb 6, 2024 · 3 comments

Comments

@NickDiNapoli
Copy link

I am trying to obtain a 3D segmentation result from what I believe is the way to do so. I am not using PRIOR_SEGMENTATION in this run.

  1. Specify z column name in the MOLECULES_CSV via '-z' parameter
  2. Set force_2d = false in the config.toml file

The README documentation explains:

segmentation_polygons.json: polygons used for visualization in GeoJSON format. In the case of 3D segmentation, it is an array of GeoJSON polygons per z-plane, as well as "joint" polygons. Shown only if --save-polygons=geojson is set.

The output GeoJSON only contains one set of 2D geometries instead of the expected output above. Can you please advise what aspect I am missing to obtain a 3D output?

Run parameters:
~/.julia/bin/baysor run
'transcripts.csv'
-c 'example_config_3d.toml'
-x 'global_x' -y 'global_y' -z 'global_z' -g 'gene' -m 20 -s 5
--save-polygons=geojson
-p
-o 'test'

Environment is:
NAME="Amazon Linux AMI"
VERSION="2018.03"

@FalkoHof
Copy link

I am running into a similar issue, which I highlighted in a bit more detail in this other issue here.

@NickDiNapoli
Copy link
Author

@FalkoHof Thanks so much for linking your issue. It seems like your run got to the point where 3D segmentation was at least recognized. Out of curiosity, what did you do differently to at least get to that point, even if your run ultimately errored out because of the too many z values issue?

@FalkoHof
Copy link

FalkoHof commented Mar 5, 2024

The 3D segmentation was working for me irrespectively of using floats or ints on the z-axis. However, the json was only outputing 2D data. What kind of data are you using as input? I used Xenium data and followed this workflow (but deviated a bit from it due to changes in baysor and the xenium output). The modifications I made to the workflow were mostly to the main method of the filter_transcripts.py. script.

See below:

def main():
    # Parse input arguments.
    args = parse_args()

    data_frame = pd.read_csv(args.transcript)

    # Filter transcripts. Ignore negative controls
    filtered_frame = data_frame[(data_frame["qv"] >= args.min_qv) &
                                (data_frame["x_location"] >= args.min_x) &
                                (data_frame["x_location"] <= args.max_x) &
                                (data_frame["y_location"] >= args.min_y) &
                                (data_frame["y_location"] <= args.max_y) &
                                (~data_frame["feature_name"].str.startswith("NegControlProbe_")) &
                                (~data_frame["feature_name"].str.startswith("antisense_")) &
                                (~data_frame["feature_name"].str.startswith("NegControlCodeword_")) &
                                (~data_frame["feature_name"].str.startswith("BLANK_"))]
    # Xenium takes a lot of z-stacks and outputs a continous value for the z-location.Baysor wants a discrete z-stack value though.
    # As stacks are taken in 0.75 um intervals, we just round it to the next multiple of 0.75.
    filtered_frame["z_stack"] = np.rint(filtered_frame.loc[:,"z_location"] / 0.75).astype(np.int32)
    # Add also an additional column where we just round the z-value to the next int
    filtered_frame["z_location_int"] = np.rint(filtered_frame.loc[:,"z_location"]).astype(np.int32)

    # Baysor requires each cell id to be an integer. Newer Xenium ranger versions produce a letter based cell id though.
    # so we map the letter based cell ids to an integer range. Transcripts not assigned to a cell are labled "UNASSIGNED" in newer Xenium output and will be mapped to 0
    cell_ids = np.unique(filtered_frame.loc[:,"cell_id"])
    cell_ids_ints = np.arange(0,len(cell_ids))
    df_cell_ids = pd.DataFrame({"cell_id" : cell_ids, "cell_id_int" : cell_ids_ints}).set_index("cell_id")
    # here we just join the new integer cell ids into the filtered data frame
    filtered_frame = filtered_frame.join(df_cell_ids, how="left", on="cell_id")

    # Change cell_id of cell-free transcripts from "UNASSIGNED" to 0
    neg_cell_row = filtered_frame["cell_id"] == "UNASSIGNED"
    # add an additional column to inducate if the transcript overlaps the cell
    filtered_frame["overlaps_cell"] = 1
    filtered_frame.loc[neg_cell_row,"overlaps_cell"] = 0
    # also add another column if the transcript overlaps with the nucleus. As one cell contains only one nucleus in the Xenium segmentation we can use the cell int id also as nucleus id
    # We do this so we can use also just the nuclei segementations as input for baysor if we like (see https://github.com/kharchenkolab/Baysor?tab=readme-ov-file#using-a-prior-segmentation)
    filtered_frame["nucleus_id_int"] = filtered_frame.loc[:,"cell_id_int"]
    filtered_frame.loc[data_frame["overlaps_nucleus"] == 0, "nucleus_id_int"] = 0

    # Output filtered transcripts to CSV
    filtered_frame.to_csv('_'.join(["X"+str(args.min_x)+"-"+str(args.max_x), "Y"+str(args.min_y)+"-"+str(args.max_y), "filtered_transcripts.csv"]),
                          index=False,
                          encoding = 'utf-8')

@NickDiNapoli What is the error baysor is throwing in your case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants