Possibility on speeding the calculations up? #19

AquifersBSIM · 2024-12-05T03:45:46Z

Hi all, is there any way to speed up the calculations? It's awesome that mordred is still maintained!

Why speed calculations up?
I have about 90,000 molecules to calculate chemical descriptors for, and it takes between 4 and 5 hours.

The code

import os
import time

from mordred import Calculator, descriptors


def calculate_3D_function(mols, input_filename):
    calc_3D = Calculator(descriptors, ignore_3D=False)  # Initialise 3D descriptors

    # Calculate descriptors
    print(f"Calculating 3D descriptors for {input_filename}...")

    # Start timer
    start_time = time.time()

    df = calc_3D.pandas(mols)
    print(df.head())  # Display the top rows
    
    # Create output filename based on input filename
    output_filename = f"{os.path.splitext(input_filename)[0]}_descriptors.csv"
    df.to_csv(output_filename, index=False)  # Save to CSV
    print(f"Descriptors saved to {output_filename}")

    # Calculate elapsed time
    elapsed_time = time.time() - start_time
    print(f"Processing completed in {elapsed_time:.2f} seconds.\n")

JacksonBurns · 2024-12-06T21:31:03Z

There's no easy way to make it faster. It's highly parallelized, so if you have access to a machine with more CPU cores it will speed up, but that doesn't really count.
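For anyone who wants to try the "more CPU cores" route explicitly, here is a minimal sketch. It assumes that `Calculator.pandas` accepts an `nproc` keyword for the number of worker processes, which is my understanding of the mordred API but worth checking against your installed version. The helper names `pick_nproc` and `run_descriptors` are hypothetical, not part of mordred.

```python
import os


def pick_nproc(reserve=1):
    """Return a worker count that leaves `reserve` cores free (at least 1)."""
    total = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, total - reserve)


def run_descriptors(calc, mols, nproc=None):
    """Run calc.pandas with an explicit worker count.

    Assumes `calc` is a mordred Calculator whose pandas() method
    accepts an `nproc` keyword (verify this for your version).
    """
    if nproc is None:
        nproc = pick_nproc()
    return calc.pandas(mols, nproc=nproc)
```

With the function from the original post, this would replace the `calc_3D.pandas(mols)` call with something like `run_descriptors(calc_3D, mols)`.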

We could do some profiling to find which descriptors are the slowest to calculate and then try and speed those up, if you are interested!
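As a rough idea of what such profiling could look like, the sketch below wraps any call in the standard-library `cProfile` and returns a text report sorted by cumulative time. The helper `profile_call` and the stand-in workload `slow_square_sum` are hypothetical names used only for illustration; in practice the profiled call would be the descriptor calculation on a small sample of molecules, ideally in a single process so the time is not hidden inside worker processes.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the report in a string instead of printing to stdout
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_square_sum(n):
    """Stand-in workload; replace with the real descriptor calculation."""
    return sum(i * i for i in range(n))


result, report = profile_call(slow_square_sum, 10000)
print(report)
```

Sorting by cumulative time puts the outermost expensive calls at the top of the report, which is usually the quickest way to spot where a run spends its time.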

AquifersBSIM · 2024-12-11T11:25:26Z

Hi @JacksonBurns! Thank you so much for the reply and suggestions. Would it be too much trouble for you to profile which descriptors are the slowest to calculate?

JacksonBurns · 2024-12-11T16:19:50Z

Sure - I put together this small demo (mordred_profile.json), which you can download, rename with a .ipynb extension, and then open as a Jupyter notebook. About half of the execution time is actually spent entering and exiting context managers (this is terrible). This can mostly be attributed to this method:

    @classmethod
    def from_query(cls, mol, require_3D, explicit_hydrogens, kekulizes, id, config):
        if not isinstance(mol, Chem.Mol):
            raise TypeError("{!r} is not rdkit.Chem.Mol instance".format(mol))

        n_frags = len(Chem.GetMolFrags(mol))

        if mol.HasProp("_Name"):
            name = mol.GetProp("_Name")
        else:
            name = Chem.MolToSmiles(Chem.RemoveHs(mol, updateExplicitCount=True))

        mols, coords = {}, {}

        for eh, ke in ((eh, ke) for eh in explicit_hydrogens for ke in kekulizes):
            m = Chem.AddHs(mol) if eh else Chem.RemoveHs(mol, updateExplicitCount=True)

            if ke:
                Chem.Kekulize(m)

            if require_3D:
                try:
                    conf = m.GetConformer(id)
                    if conf.Is3D():
                        coords[eh, ke] = conformer_to_numpy(conf)
                except ValueError:
                    pass

            m.RemoveAllConformers()
            mols[eh, ke] = m

        return cls(mols, coords, n_frags, name, config)

inside mordred/_base/context.py, which spends a lot of time operating on the input molecules. Perhaps you can find a way to reduce the time spent in this method? I think the name generation and all of the RDKit Chem operations are probably expensive.
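One possible experiment, based only on the method shown above: `from_query` falls back to `Chem.MolToSmiles(Chem.RemoveHs(...))` only when the molecule has no `_Name` property, so assigning a name up front should skip that branch. The sketch below is a hypothetical helper (`ensure_names` is not part of mordred or RDKit) that relies only on the `HasProp` and `SetProp` methods of `rdkit.Chem.Mol`. Whether it gives a measurable speedup is an assumption that would need to be confirmed by profiling.

```python
def ensure_names(mols, prefix="mol"):
    """Give every molecule a `_Name` property if it lacks one.

    Works on objects exposing HasProp/SetProp, such as rdkit.Chem.Mol.
    Returns the number of molecules that were newly named.
    """
    named = 0
    for index, mol in enumerate(mols):
        if not mol.HasProp("_Name"):
            # Hypothetical naming scheme: prefix plus position in the list
            mol.SetProp("_Name", f"{prefix}_{index}")
            named += 1
    return named
```

This would be called on the list of molecules (a list rather than a generator, since it is iterated again afterwards) before passing it to the calculator. Note that the assigned names replace the SMILES-based names that would otherwise be generated.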

AquifersBSIM · 2024-12-16T06:30:19Z

Hi @JacksonBurns, thank you for the demo! I will have a look at this and hopefully come back with good news! I tried using more CPU cores, and it sped things up quite well (roughly 10x), but you're right, it doesn't really count.
