Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new interface for samples and fba, fva solutions and workaround for #96 #98

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

hariszaf
Copy link
Collaborator

Aim of this PR is:

  1. to provide an easier way to get flux values from fba and fva and flux distributions from samples, based on reaction ids.
  2. to fix issues Redundant methods in MetabolicNetwork #95 and Error using fva after sampling #96

With respect to aim (1) instead of searching for the index of a reaction, the user gets a dataframe with the reaction ids as indices.

>>> samples.iloc[:, :5]
                 0          1             2             3          4 
PFK       7.477384   7.477381  7.477382e+00  7.477382e+00   7.477385
PFL       0.000001   0.000001  1.437957e-07  3.160731e-08   0.000002
PGI       4.860857   4.860857  4.860859e+00  4.860859e+00   4.860858
PGK     -16.023526 -16.023526 -1.602353e+01 -1.602353e+01 -16.023526

and

>>> samples.loc["O2t"]
0       21.799498
1       21.799493
2       21.799497
3       21.799496
4       21.799497

A relevant interface is available for fba and fva solutions too.

There is also a reactions_map to link a reaction's id to its complete name (quite useful in case of modelseed models).

>>> model.reactions_map
                                         reaction_name
PFK                                Phosphofructokinase
PFL                             Pyruvate formate lyase
PGI                      Glucose-6-phosphate isomerase

Regarding aim (2), an example of how to use the medium was added in the README file and the setter was fixed (#95), while using deepcopy for getting the bounds addressed #96

Copy link
Contributor

@vfisikop vfisikop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! These are really cool stuff.

I have some comments mostly on the efficiency of the proposed changes.

@@ -218,6 +218,27 @@ max_biomass_objective = fva_output[3]

The output of FVA method is tuple that contains `numpy` arrays. The vectors `min_fluxes` and `max_fluxes` contains the minimum and the maximum values of each flux. The vector `max_biomass_flux_vector` is the optimal flux vector according to the biomass objective function and `max_biomass_objective` is the value of that optimal solution.

```python
fva_output_df = model.fva_to_df()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Do we still need the old method model.fva() that changes the internal state of model? I suppose that one function is enough. Then the new one could be called fva() it is more clear I think. And the old one should be made private. Actually it is even simpler if there exist only one function fva() with the new interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments hold for fva.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit tricky as the original fva returns a set of 4 things that we use in other parts of the PolytopeSampler class. Yet, I changes those parts with an _fva and _fba and we now have fva() and fba() to return the dataframes. 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but as the name suggests _fva and _fba should be private, right?

Also, I still cannot see the need of two fva (and fba) functions. I think it complicates the interface. Is it possible to keep the old functions and then the user (if needed) compute a dataframe on demand (it is one line of code).

self._lb,
self._ub,
self._S,
self._biomass_function,
self._parameters["opt_percentage"],
)
self._min_fluxes = min_fluxes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need them to be stored in the model?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was about the sample_from_fva_output function that used to get as arguments the min_fluxes and the max_fluxes.

At first, I thought of keeping those in the model to provide them in the function but then I thought it would be simpler if the fva is performed with the sample_from_fva_output function.

Yet, maybe it's practical to keep them. No strong arguments on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was lost a bit. Are those class variables used even after that changes of this PR? As far as understand sample_from_fva_output function is not using them.

For example:

```python
initial_medium = model.medium
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified avoiding to copy the entire dictionary (see #95 (comment))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for my reply on the issue.
if you have any ideas for alternatives, let me know!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see so there is still an issue in setting the model correctly. Should we address it in a different PR?

@@ -184,8 +187,10 @@ def generate_steady_states(
self._T = np.dot(self._T, Tr)
self._T_shift = np.add(self._T_shift, Tr_shift)

return steady_states

steady_states_df = pd.DataFrame(steady_states, index = self._metabolic_network.reactions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to make that transformation by default? As far as I can understand that could be memory intensive (especially for large matrices). I think it is better to add a method that takes steady_states and create a df or simply write that one line of code in the doc or README.

@@ -350,6 +351,17 @@ def sample_from_fva_output(

return steady_states

@staticmethod
def samples_as_df(model, samples):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

samples or steady states?


return lb, ub, S, metabolites, reactions, \
biomass_index, biomass_function, medium, inter_medium, exchanges, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be in two lines instead of three?


return lb, ub, S, metabolites, reactions, \
biomass_index, biomass_function, medium, inter_medium, exchanges, \
reactions_map, metabolites_map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should also update the documentation of the function with medium, ..., metabolites_map

model.set_slow_mode()

# Check if script is running in GitHub action
if os.getenv('CI', 'false').lower() == 'true':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool!

@@ -689,7 +689,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, notebooks cannot be reviewed in github... See https://github.com/orgs/community/discussions/12959

self._N_shift = []
self._T = []
self._T_shift = []
self._metabolic_network = copy.deepcopy(metabol_net)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making a deep copy just shifting the problem (see #96 (comment))

Shouldn't the sampler only transform A and b and leave the model as is?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this has to do with the fast_remove_redundant_facets.
I suggest we keep that as is for this PR so we have something working fine for now, and we open a new PR for the fast_remove_redundant_facets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's go with the workaround for now, but please keep the issue open until there is a proper fix. Thanks!

Copy link
Contributor

@vfisikop vfisikop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi again, sorry for late reply.

Let me comment to each change in this PR separately,

  • Regarding dataframes, is it simpler just to let is to the user to create a dataframe if needed, it is just a call pd.DataFrame(data, index) at the end of the day. This way we avoid complicating the interface.
  • Regarding Redundant methods in MetabolicNetwork #95 and medium I propose to handle it in a separate PR to solves in fully (it seems that now it is a work in progress)
  • Regarding Error using fva after sampling #96 I agree to use the deepcopy workaround (but please keep the issue open)

@@ -218,6 +218,27 @@ max_biomass_objective = fva_output[3]

The output of FVA method is tuple that contains `numpy` arrays. The vectors `min_fluxes` and `max_fluxes` contains the minimum and the maximum values of each flux. The vector `max_biomass_flux_vector` is the optimal flux vector according to the biomass objective function and `max_biomass_objective` is the value of that optimal solution.

```python
fva_output_df = model.fva_to_df()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but as the name suggests _fva and _fba should be private, right?

Also, I still cannot see the need of two fva (and fba) functions. I think it complicates the interface. Is it possible to keep the old functions and then the user (if needed) compute a dataframe on demand (it is one line of code).

For example:

```python
initial_medium = model.medium
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see so there is still an issue in setting the model correctly. Should we address it in a different PR?

self._N_shift = []
self._T = []
self._T_shift = []
self._metabolic_network = copy.deepcopy(metabol_net)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's go with the workaround for now, but please keep the issue open until there is a proper fix. Thanks!

self._lb,
self._ub,
self._S,
self._biomass_function,
self._parameters["opt_percentage"],
)
self._min_fluxes = min_fluxes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was lost a bit. Are those class variables used even after that changes of this PR? As far as understand sample_from_fva_output function is not using them.

@vfisikop vfisikop requested a review from TolisChal May 17, 2024 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants