No objects to concatenate in clustering.fit #54

Open
mrbarbitoff opened this issue Apr 2, 2024 · 3 comments

@mrbarbitoff

Hi!

After updating to the latest release of clusTCR, clustering.fit fails with a ValueError when I try to fit the clustering to my data (complete traceback below). The same calls worked with the previous version. I initialize the clustering object with clustering = Clustering(n_cpus=24, chain='A'); the same error occurs if I don't specify the chain, for both TRA and TRB input data. I'd be grateful for your help with this issue.

ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 output = clustering.fit(tra_data, include_vgene = True, 
      2                         cdr3_col="aaSeqCDR3", 
      3                         v_gene_col="vGene")

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/clustcr/clustering/tools.py:96, in timeit.<locals>.timed(*args, **kwargs)
     94 def timed(*args, **kwargs):
     95     start = time.time()
---> 96     result = myfunc(*args, **kwargs)
     97     end = time.time()
     98     print(f'Total time to run ClusTCR: {(end-start):.3f}s')

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/clustcr/clustering/clustering.py:429, in Clustering.fit(self, data, include_vgene, cdr3_col, v_gene_col, alpha)
    425 """
    426 Function that calls the indicated clustering method and returns clusters in a ClusteringResult
    427 """
    428 if include_vgene:
--> 429     return self._vgene_clustering(data, cdr3_col, v_gene_col)
    430 else:
    431     try:

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/clustcr/clustering/clustering.py:346, in Clustering._vgene_clustering(self, data, cdr3_col, v_gene_col)
    343 super_clusters = self._faiss(subset["junction_aa"])
    344 # Second clustering step
    345 clusters = ClusteringResult(
--> 346     MCL_multiprocessing_from_preclusters(
    347         super_clusters, self.mcl_params, self.n_cpus
    348         ), chain=self.chain
    349                             ).clusters_df
    350 clusters.cluster += c # adjust cluster identifiers to ensure they stay unique
    351 subset = subset.merge(clusters, left_on="junction_aa", right_on="junction_aa")

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/clustcr/clustering/methods.py:139, in MCL_multiprocessing_from_preclusters(preclust, mcl_hyper, n_cpus)
    137     if c != 0:
    138         nodelist[c]['cluster'] += nodelist[c - 1]['cluster'].max() + 1
--> 139 return pd.concat(nodelist, ignore_index=True)

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/pandas/core/reshape/concat.py:382, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    379 elif copy and using_copy_on_write():
    380     copy = False
--> 382 op = _Concatenator(
    383     objs,
    384     axis=axis,
    385     ignore_index=ignore_index,
    386     join=join,
    387     keys=keys,
    388     levels=levels,
    389     names=names,
    390     verify_integrity=verify_integrity,
    391     copy=copy,
    392     sort=sort,
    393 )
    395 return op.get_result()

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/pandas/core/reshape/concat.py:445, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    442 self.verify_integrity = verify_integrity
    443 self.copy = copy
--> 445 objs, keys = self._clean_keys_and_objs(objs, keys)
    447 # figure out what our result ndim is going to be
    448 ndims = self._get_ndims(objs)

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/pandas/core/reshape/concat.py:507, in _Concatenator._clean_keys_and_objs(self, objs, keys)
    504     objs_list = list(objs)
    506 if len(objs_list) == 0:
--> 507     raise ValueError("No objects to concatenate")
    509 if keys is None:
    510     objs_list = list(com.not_none(*objs_list))

ValueError: No objects to concatenate

Yury

@svalkiers
Owner

Hi Yury,

Sorry for the inconvenience. I believe this error indicates that your clustering result is empty (i.e. no clusters were detected), so pd.concat receives an empty list and has nothing to concatenate. I will update the script to return None in that case instead of raising.
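As a rough illustration of that failure mode (not the actual clusTCR patch), the sketch below reproduces the pandas error with an empty list and shows one way to guard against it. The helper name concat_or_none and the example sequences are made up for this example.

```python
import pandas as pd


def concat_or_none(frames):
    """Concatenate DataFrames, or return None when the list is empty.

    Hypothetical helper for illustration; not part of clusTCR.
    """
    frames = [f for f in frames if f is not None]
    if not frames:
        # pd.concat([]) would raise ValueError here, so bail out early.
        return None
    return pd.concat(frames, ignore_index=True)


# An empty list reproduces the error shown in the traceback.
try:
    pd.concat([], ignore_index=True)
except ValueError as err:
    message = str(err)

# The guarded version returns None instead of raising.
empty_result = concat_or_none([])

# With at least one frame, it behaves like a plain pd.concat.
combined = concat_or_none([
    pd.DataFrame({"junction_aa": ["CAVRDSNYQLIW"], "cluster": [0]}),
    pd.DataFrame({"junction_aa": ["CAASGGSYIPTF"], "cluster": [1]}),
])
```

A caller would then need to check for None before using the result, for example before accessing clusters_df.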

Another solution would be to loosen the stringency by clustering on the CDR3 amino acid sequence only, without the V gene.

Best,
Sebastiaan

@svalkiers
Owner

The issue should be fixed in the latest build (clustcr-1.0.3+3.g5fa6b46).
Let me know if you encounter any further problems.

Cheers,
Sebastiaan

@mrbarbitoff
Author

Hi @svalkiers

Thank you for your reply! It seems that the lack of clustering results was because I accidentally installed the GPU version instead of the regular one during an update. Sorry about that.
The issue is resolved.
