Open
Labels: enhancement (New feature or request)
Description of feature
Hi @guillaumehu,
While writing a tutorial that uses `get_esm_embedding`, I get the following error:
HTTPError Traceback (most recent call last)
Cell In[55], line 1
----> 1 get_esm_embedding(adata, gene_key="gene_ensembl", gene_emb_key="esm_embeddings", null_value = "control", esm_model_name="esm2_t36_3B_UR50D")
File /ictstr01/home/icb/dominik.klein/git_repos/cell_flow_perturbation/src/cellflow/preprocessing/_gene_emb.py:359, in get_esm_embedding(adata, gene_key, null_value, gene_emb_key, copy, esm_model_name, toks_per_batch, trunc_len, truncation, use_cuda, cache_dir)
357 genes_todo.extend(adata.obs[col].unique().tolist())
358 unique_genes = list(set(genes_todo) - {null_value, None})
--> 359 results, metadata = protein_features_from_genes(
360 genes=unique_genes,
361 esm_model_name=esm_model_name,
362 toks_per_batch=toks_per_batch,
363 trunc_len=trunc_len,
364 truncation=truncation,
365 use_cuda=use_cuda,
366 cache_dir=cache_dir,
367 )
368 adata.uns[gene_emb_key] = results
369 adata.uns[gene_emb_key + "_metadata"] = metadata
File /ictstr01/home/icb/dominik.klein/git_repos/cell_flow_perturbation/src/cellflow/preprocessing/_gene_emb.py:274, in protein_features_from_genes(genes, esm_model_name, toks_per_batch, trunc_len, truncation, use_cuda, cache_dir)
269 if os.getenv("HF_HOME") is None and cache_dir is None:
270 logger.warning(
271 "HF_HOME environment variable is not set and `cache_dir` is None. \
272 Cache will be stored in the current directory."
273 )
--> 274 metadata = prot_sequence_from_ensembl(genes)
275 to_emb = metadata[metadata.protein_sequence.notnull()]
276 use_cuda = use_cuda and torch.cuda.is_available()
File /ictstr01/home/icb/dominik.klein/git_repos/cell_flow_perturbation/src/cellflow/preprocessing/_gene_emb.py:119, in prot_sequence_from_ensembl(ensembl_gene_id)
117 df = pd.DataFrame(columns=columns)
118 for gene_id in ensembl_gene_id:
--> 119 gene_info = GeneInfo(gene_id)
120 results[gene_id] = gene_info.protein_sequence
121 data = [
122 [
123 gene_id,
(...)
129 ]
130 ]
File <string>:4, in __init__(self, gene_id)
File /ictstr01/home/icb/dominik.klein/git_repos/cell_flow_perturbation/src/cellflow/preprocessing/_gene_emb.py:83, in GeneInfo.__post_init__(self)
81 self.transcript_id: str | None = None
82 self.display_name: str | None = None
---> 83 self.canonical_transcript_info = fetch_canonical_transcript_info(self.gene_id)
84 if self.canonical_transcript_info:
85 self.transcript_id = self.canonical_transcript_info["transcript_id"]
File /ictstr01/home/icb/dominik.klein/git_repos/cell_flow_perturbation/src/cellflow/preprocessing/_gene_emb.py:43, in fetch_canonical_transcript_info(ensembl_gene_id)
41 response = requests.get(server + ext, headers=headers)
42 if not response.ok:
---> 43 response.raise_for_status()
45 gene_data = response.json()
46 transcripts = gene_data.get("Transcript", [])
File ~/mambaforge/envs/cellflow/lib/python3.12/site-packages/requests/models.py:1024, in Response.raise_for_status(self)
1019 http_error_msg = (
1020 f"{self.status_code} Server Error: {reason} for url: {self.url}"
1021 )
1023 if http_error_msg:
-> 1024 raise HTTPError(http_error_msg, response=self)
HTTPError: 400 Client Error: Bad Request for url: https://rest.ensembl.org/lookup/id/nan?expand=1
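Could the error report which gene ID the request failed for? In this case the URL shows that the ID was `nan`, which is easy to miss in the traceback. Naming the gene in the exception would help users a lot.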
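To illustrate the request, here is a minimal sketch of how the lookup loop could attach the gene ID to any failure. It is not the current cellflow code: `GeneLookupError`, `is_missing`, and `lookup_genes` are hypothetical names, and `fetch` stands in for whatever function performs the Ensembl request (for example `fetch_canonical_transcript_info`).

```python
import math
from typing import Any, Callable, Iterable


class GeneLookupError(RuntimeError):
    """Hypothetical error type that records which gene ID failed."""

    def __init__(self, gene_id: Any, reason: str) -> None:
        super().__init__(f"Lookup failed for gene ID {gene_id!r}: {reason}")
        self.gene_id = gene_id


def is_missing(gene_id: Any) -> bool:
    """Return True for None, float NaN, and empty or 'nan'/'none' strings."""
    if gene_id is None:
        return True
    if isinstance(gene_id, float) and math.isnan(gene_id):
        return True
    return isinstance(gene_id, str) and gene_id.strip().lower() in {"", "nan", "none"}


def lookup_genes(gene_ids: Iterable[Any], fetch: Callable[[Any], dict]) -> dict:
    """Call `fetch` for each gene ID and name the offending ID on failure.

    `fetch` is an assumption: any callable that takes a gene ID and either
    returns a dict or raises (e.g. requests.HTTPError from raise_for_status).
    """
    results = {}
    for gene_id in gene_ids:
        # Fail early on missing values instead of sending ".../lookup/id/nan".
        if is_missing(gene_id):
            raise GeneLookupError(gene_id, "missing or NaN identifier")
        try:
            results[gene_id] = fetch(gene_id)
        except Exception as err:
            # Chain the original exception so the HTTP details stay visible.
            raise GeneLookupError(gene_id, str(err)) from err
    return results
```

With something along these lines, the failure above would surface as a `GeneLookupError` naming `nan` as the gene ID rather than a bare `HTTPError`, and the original exception would still be available through `__cause__`.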