question about gene2idx order #240

Open
hongruhu opened this issue Aug 15, 2024 · 0 comments

To extract gene embeddings, I am using three models, model_types = ['whole_human', 'human_brain', 'human_blood'], which means I have three vocab.json files.

I briefly checked the files themselves by looking at their last few lines:
(1) for whole_human:

  "RP11-390P2.2": 42033,
  "SGSM3-AS1": 31087,
  "RP11-812I20.2": 42036,
  "AC022154.7": 828,
  "RBMXP1": 42041
}

(2) for human_brain:

  "SETP7": 53181,
  "RP11-175P13.3": 53185,
  "HSPA8P14": 53196,
  "BTG1P1": 53065,
  "OR7E47P": 53197
}

(3) for human_blood:

  "PXT1": 20738,
  "BAG3": 3005,
  "PYCR3": 20744,
  "TAB3": 32782,
  "PYDC1": 20745
}
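For reference, a tail like the ones above can be printed with only the standard library. This is a minimal sketch that assumes a hypothetical layout with one directory per model type; it relies on the fact that json.load returns a regular dict, and Python dicts (3.7+) keep keys in the order they appear in the file.

```python
import json
from pathlib import Path


def tail_entries(vocab_path, n=5):
    """Return the last n (gene, index) pairs of a vocab.json, in file order.

    json.load returns a plain dict, and Python dicts (3.7+) keep keys in
    insertion order, i.e. the order in which they appear in the file.
    """
    with open(Path(vocab_path), encoding="utf-8") as f:
        token2idx = json.load(f)
    items = list(token2idx.items())
    # max(..., 0) avoids the items[-0:] pitfall, which would return everything
    return items[max(len(items) - n, 0):]


# Usage (hypothetical layout: root_dir/<model_type>/vocab.json):
# root_dir = Path("path/to/models")
# for model_type in ["whole_human", "human_brain", "human_blood"]:
#     print(model_type, tail_entries(root_dir / model_type / "vocab.json"))
```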

When I checked gene2idx using

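from scgpt.tokenizer.gene_tokenizer import GeneVocab

# model_dir: pathlib.Path of the current model's checkpoint directory
vocab_file = model_dir / "vocab.json"
vocab = GeneVocab.from_file(vocab_file)
gene2idx = vocab.get_stoi()  # {gene: index}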
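One way to tell whether two models really share a mapping, independently of how their dicts happen to print, is to compare them by content. The sketch below is a hypothetical, standard-library-only helper that assumes each model has its own directory containing a vocab.json; the same compare_vocabs check could be applied to the dicts returned by get_stoi(), since those are also {gene: index} mappings.

```python
import json
from pathlib import Path


def load_token2idx(model_dir):
    """Read <model_dir>/vocab.json into a {gene: index} dict."""
    with open(Path(model_dir) / "vocab.json", encoding="utf-8") as f:
        return json.load(f)


def compare_vocabs(a, b):
    """Summarize how two {gene: index} mappings differ, ignoring key order."""
    shared = a.keys() & b.keys()
    return {
        "identical": a == b,  # dict equality does not depend on key order
        "only_in_a": len(a.keys() - b.keys()),
        "only_in_b": len(b.keys() - a.keys()),
        "shared": len(shared),
        "same_index_on_shared": sum(a[g] == b[g] for g in shared),
    }


# Usage (hypothetical root_dir; the directory is rebuilt for every model
# type, so each vocabulary is read from its own folder):
# root_dir = Path("path/to/models")
# vocabs = {m: load_token2idx(root_dir / m) for m in ["whole_human", "human_brain"]}
# print(compare_vocabs(vocabs["whole_human"], vocabs["human_brain"]))
```

If "identical" comes back True for two models whose files clearly differ, that points at which file was actually read rather than at the vocabulary itself.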

I assumed each would follow the order of its own JSON file, but (1) and (2) return identical gene2idx dicts, both matching the gene order of (1) whole_human's file:

... 'RP11-390P2.2': 42033, 'SGSM3-AS1': 31087, 'RP11-812I20.2': 42036, 'AC022154.7': 828, 'RBMXP1': 42041}

while (3) is consistent with its own json file:

... 'PXT1': 20738, 'BAG3': 3005, 'PYCR3': 20744, 'TAB3': 32782, 'PYDC1': 20745}

I am confused about why (2) does not match its own JSON file but is identical to (1). Did I miss something? I just want to make sure that the gene indices of each model's extracted gene embeddings are in the correct order.
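For the embeddings themselves, what matters is which gene each row belongs to, not the order in which a dict prints its keys. A minimal sketch, assuming (hypothetically) that row i of the extracted embedding matrix corresponds to vocabulary index i, builds the row-to-gene list from the index values alone:

```python
def genes_by_row(gene2idx):
    """Return a list whose i-th entry is the gene with vocabulary index i.

    Assumes the embedding matrix is indexed by token id, so row i of the
    extracted gene embeddings belongs to rows[i]. Unused indices stay None.
    """
    if not gene2idx:
        return []
    rows = [None] * (max(gene2idx.values()) + 1)
    for gene, idx in gene2idx.items():
        if rows[idx] is not None:
            raise ValueError(f"index {idx} maps to both {rows[idx]!r} and {gene!r}")
        rows[idx] = gene
    return rows
```

With a hypothetical embedding array emb of shape (len(rows), d), rows[i] names emb[i]. Two dicts containing the same (gene, index) pairs produce the same list regardless of their printed order.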

Thanks!
