PR to record the prevalence of disease/condition, still births, neonatal death, maternal mortality #1455

RachelMurray-Watson · 2024-08-15T09:58:06Z

The point prevalence is recorded for a number of modules and conditions within modules: Alri, BladderCancer, BreastCancer, CardioMetabolicDisorders (chronic_ischemic_hd, chronic_kidney_disease, chronic_lower_back_pain, diabetes, hypertension), COPD, Depression, Diarrhoea, Epilepsy, Hiv, Labor (Intrapartum stillbirth), Malaria, Measles, NewbornOutcomes, OesophagealCancer, OtherAdultCancer, PostnatalSupervisor, PregnancySupervisor (Antenatal stillbirth), ProstateCancer, RTI, Schisto, TB, and Demography (maternal_deaths, newborn_deaths).

Additional questions:
- Okay to calculate the prevalence of diarrhoea? It is not really a disease in its own right, more of a symptom.
- Some modules (e.g. RTI) may be more accurately described by incidence rather than prevalence. Is that useful/okay? Or should it be skipped?

Other notes:
- COPD is defined as ch_lung_function > 3.
- Have not included events in the CardioMetabolic module (ever_heart_attack and ever_stroke), as that would be a cumulative incidence among living people who have had such events.
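
As a rough illustration of what "point prevalence" means in the notes above, the sketch below computes the share of living people who meet a condition's definition at a single time point, using COPD (`ch_lung_function > 3`) as the example. The toy DataFrame and the helper name `point_prevalence` are made up for illustration; only `is_alive` and `ch_lung_function` come from this thread.

```python
import pandas as pd


def point_prevalence(df: pd.DataFrame, has_condition: pd.Series) -> float:
    """Share of living people who currently meet the condition definition."""
    alive = df["is_alive"]
    n_alive = int(alive.sum())
    if n_alive == 0:
        return 0.0  # avoid dividing by zero in an empty population
    return float((alive & has_condition).sum()) / n_alive


# Toy population (hypothetical values, not model output).
df = pd.DataFrame(
    {
        "is_alive": [True, True, True, True, False],
        "ch_lung_function": [0, 2, 4, 5, 6],
    }
)

# COPD as defined in the notes above: ch_lung_function > 3.
copd_prevalence = point_prevalence(df, df["ch_lung_function"] > 3)
print(copd_prevalence)  # 2 of the 4 living people -> 0.5
```

The denominator is everyone alive, not over age/sex groups, which matches how the prevalence is described in this PR.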

tdm32 · 2024-09-04T08:51:16Z

src/tlo/methods/demography.py

+                                                   (df['cause_of_death'] == 'TB')) &
+                                                  (df['date_of_death'] >= (self.sim.date - DateOffset(months=1)))
+                                                  ])
+                direct_deaths_non_hiv = len(df.loc[


These are non-HIV deaths?

Sorry! Yes well spotted, have changed that now

tdm32 · 2024-09-04T08:56:41Z

src/tlo/methods/healthburden.py

        self._years_written_to_log += [year]
+    def write_to_log_prevalence_monthly(self):


Monthly prevalence logs seem reasonable, but bear in mind that some conditions, e.g. malaria can develop and resolve within 1 week. In these cases, we could think about using the clinical counter - which counts episodes of disease. It may not make too much difference but perhaps running a daily then monthly logger and checking if the prevalences vary considerably would be useful.

I have made this change below

tdm32 · 2024-09-04T08:57:53Z

src/tlo/methods/healthburden.py

+        # Check that the format of the internal storage is as expected.
+        self.check_multi_index()
+
+        log_df_line_by_line(


Somewhere it would be useful to store a definition of the way prevalence is reported for each disease module, e.g. the ALRI logger will record prevalence of pneumonia/other ALRI, the malaria logger should record only clinical/severe cases, COPD is stage 3 and above, etc.

Okay, I have added that!
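
One possible shape for such a record is a simple lookup from module name to a plain-language definition. This is only a sketch: the name `PREVALENCE_DEFINITIONS` and the helper are hypothetical, and the entries paraphrase what is said in this thread rather than what the PR actually stores.

```python
# Hypothetical registry of how each module reports prevalence.
# Entries paraphrase the review discussion; they are not copied from the code.
PREVALENCE_DEFINITIONS = {
    "Alri": "Pneumonia or other ALRI",
    "Malaria": "Clinical or severe cases only",
    "COPD": "ch_lung_function > 3",
    "TB": "Active cases only",
    "Schisto": "Infection status is Low-infection or High-infection",
}


def describe_prevalence(module_name: str) -> str:
    """Return the recorded definition, failing loudly if none exists."""
    try:
        return PREVALENCE_DEFINITIONS[module_name]
    except KeyError:
        raise KeyError(
            f"No prevalence definition recorded for {module_name!r}"
        ) from None


print(describe_prevalence("Malaria"))
```

Failing loudly for an unknown module makes it harder to add a new `report_prevalence` without documenting what it measures.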

tdm32 · 2024-09-04T08:59:00Z

src/tlo/methods/healthburden.py

+        # Create a DataFrame with one row and assign the population size
+        prevalence_from_each_disease_module = pd.DataFrame({'population': [population_size]})
+        for disease_module_name in self.module.recognised_modules_names:
+            if disease_module_name in ['NewbornOutcomes', 'PostnatalSupervisor', 'Mockitis', 'DiseaseThatCausesA',


Are these placeholders left in?

Mockitis is now removed from the list, as it is being included in the test as a dummy check of prevalences. The others, however, do not return any prevalences/conditions, so to avoid a log with lots of columns of zeroes, I have set it up so that they are skipped over here.

tdm32 · 2024-09-04T09:01:11Z

src/tlo/methods/malaria.py

@@ -755,6 +755,16 @@ def report_daly_values(self):

        return health_values.loc[df.is_alive]  # returns the series

+
+    def report_prevalence(self):


I think this should record clinical and severe cases only. Usually we are interested in the prevalence of symptomatic malaria. Parasite prevalence, which would be ma_is_infected=True is an alternative measure.

Okay great, good to know

Thanks for changing this

Just to note also, if reporting point prevalence each month, this could potentially miss some cases of malaria which occur and then resolve within the month. One other option could be to use the property ma_date_symptoms to find all malaria cases who have had onset of symptoms within the last time period. Whichever way you prefer is ok.

Oh interesting! I suppose most of the other modules are also point prevalence, so maybe for consistency with that, we keep it this way. But if having more of a period prevalence is more useful to you/in general, happy to change it!
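
To make the difference between the two options concrete, here is a minimal sketch comparing point prevalence on the logging date with a period measure based on `ma_date_symptoms`. The toy data and the `currently_symptomatic` column are assumptions made for illustration; the real module may identify current cases differently.

```python
import pandas as pd

now = pd.Timestamp("2010-02-01")
window_start = now - pd.DateOffset(months=1)

df = pd.DataFrame(
    {
        "is_alive": [True, True, True, True],
        # Hypothetical flag: symptomatic on the logging date itself.
        "currently_symptomatic": [True, False, False, False],
        # Date of symptom onset (NaT = never symptomatic).
        "ma_date_symptoms": pd.to_datetime(
            ["2010-01-30", "2010-01-10", "2009-11-01", None]
        ),
    }
)

alive = df["is_alive"]

# Point prevalence: symptomatic on the logging date.
point = (alive & df["currently_symptomatic"]).sum() / alive.sum()

# Period measure: symptom onset at any time in the last month, which also
# counts episodes that have already resolved. Episodes that began before
# the window and are still ongoing are not counted by this simple version.
onset_in_window = df["ma_date_symptoms"].between(window_start, now)
period = (alive & onset_in_window).sum() / alive.sum()

print(point, period)
```

In this toy population the person whose symptoms started on 10 January and resolved before the logging date is missed by the point measure but picked up by the period measure.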

tdm32 · 2024-09-04T09:56:29Z

src/tlo/methods/rti.py

@@ -2314,6 +2314,15 @@ def report_daly_values(self):
        disability_series_for_alive_persons = df.loc[df.is_alive, "rt_disability"]
        return disability_series_for_alive_persons

+    def report_prevalence(self):


I think this property rt_road_traffic_inc relates to having been involved in a RTI. The property rt_inj_severity can be none, ie. no injuries arising. For this, perhaps we should log rt_inj_severity != 'none'?

Okay, I think I originally had

    df = self.sim.population.props
    total_prev = len(
        df[(df['is_alive']) & (df['rt_inj_severity'] != 'none')]
    ) / len(df[df['is_alive']])

    return total_prev

but I think Margherita implied that this would include a consideration of recovery time. Do you think it's still okay to use?

Could possibly select cases using property rt_date_inj to make sure injury occurred in most recent time period!?

Oh good idea! I've made that change now
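
A minimal sketch of the suggestion above, assuming `rt_date_inj` holds the date of the most recent injury and that the window matches the logging frequency. The function name, the default one-month window and the toy data are illustrative only, not the code in the PR.

```python
import pandas as pd


def rti_recent_injury_share(df, now, window=pd.DateOffset(months=1)):
    """Share of living people injured (severity != 'none') within the window."""
    alive = df["is_alive"]
    injured = df["rt_inj_severity"] != "none"
    # Comparisons against NaT are False, so never-injured people drop out.
    recent = df["rt_date_inj"] >= (now - window)
    return float((alive & injured & recent).sum()) / int(alive.sum())


df = pd.DataFrame(
    {
        "is_alive": [True, True, True, True, False],
        "rt_inj_severity": ["none", "mild", "severe", "mild", "severe"],
        "rt_date_inj": pd.to_datetime(
            [None, "2010-01-20", "2009-06-01", "2010-01-05", "2010-01-25"]
        ),
    }
)

share = rti_recent_injury_share(df, now=pd.Timestamp("2010-02-01"))
print(share)  # 2 of the 4 living people were injured in the last month
```

Restricting to recent injuries makes this closer to an incidence-style measure, which fits the question raised in the PR description about RTI.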

tdm32 · 2024-09-04T09:57:52Z

src/tlo/methods/schisto.py

+    def report_prevalence(self):
+        # This returns dataframe that reports on the prevalence of schisto for all individuals
+        df = self.sim.population.props
+        is_infected = (df[self.cols_of_infection_status] == 'Non-infected').any()


I think we want to log infection status is either ['Low-infection', 'High-infection']

Ah sorry I see now it should have been !=, but I have changed to your way for clarity, thank you!
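
For reference, a sketch of the check the reviewer describes: a person counts as infected if any of the infection-status columns is 'Low-infection' or 'High-infection'. The column names below are placeholders standing in for `self.cols_of_infection_status`, and the data are made up.

```python
import pandas as pd

# Placeholder column names standing in for self.cols_of_infection_status.
cols_of_infection_status = ["species_a_infection_status", "species_b_infection_status"]
infected_states = ["Low-infection", "High-infection"]

df = pd.DataFrame(
    {
        "is_alive": [True, True, True, False],
        "species_a_infection_status": [
            "Non-infected", "Low-infection", "Non-infected", "High-infection",
        ],
        "species_b_infection_status": [
            "Non-infected", "Non-infected", "High-infection", "Non-infected",
        ],
    }
)

# axis=1 gives one True/False per person (infected with any species);
# the default axis=0 would instead give one value per column.
is_infected = df[cols_of_infection_status].isin(infected_states).any(axis=1)

schisto_prevalence = (df["is_alive"] & is_infected).sum() / df["is_alive"].sum()
print(schisto_prevalence)  # 2 of the 3 living people
```

Listing the infected states explicitly, rather than negating 'Non-infected', keeps the definition stable if further status categories are added later.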

tdm32 · 2024-09-04T10:00:18Z

src/tlo/methods/tb.py

@@ -1009,6 +1009,16 @@ def report_daly_values(self):

        return health_values.loc[df.is_alive]

+    def report_prevalence(self):


I think here we should log only active cases, this way we can compare with WHO reports / GBD etc. Also the way that we assign latent cases is not identical to other models, we don't have infections -> latent -> active so we would under-estimate the latent infections. Best to stick to symptomatic active cases only.

Okay, that's great to know, thank you!

Thanks for making the change

tdm32 · 2024-09-04T10:07:28Z

tests/test_record_prevalence_healthburden_class.py

+    assert (df.dtypes == orig.dtypes).all()
+
+
+def find_closest_recording(prevalence, target_date, log_value, column_name, multiply_by_pop):


I'm not sure of the usefulness of finding the closest reported prevalence value. Would a more useful test perhaps be one of the following: check that all registered modules log prevalence every month, and the inverse (no prevalence reported if a module is not registered); set the incidence to 0 for one disease and check that the logger reports nothing above 0; or assert that prevalence values for 2010 fall within a reasonable range, e.g. test for extreme or unlikely values?

We had a long discussion about the test yesterday on our call Part II. We decided to include a dummy disease with which to compare prevalences (this will be in the new test file), and to see whether the prevalence it reports matches what has been reported in its own logging file.

I suppose by doing it the way that I was doing it, I was trying to see if the calculations themselves were working, as well as the general mechanics of logging. But do you think such a test is unnecessary? And that by showing e.g. with a dummy module and/or what you have suggested above, it would suffice?
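
One of the checks suggested above, asserting that logged values fall in a plausible range, could look roughly like this. The helper name, the per-disease upper bounds and the example numbers are all invented for illustration; they are not taken from the test file.

```python
def find_implausible_prevalences(prevalences, upper_bounds=None):
    """Return the entries that fall outside [0, upper bound] (default bound 1)."""
    upper_bounds = upper_bounds or {}
    problems = {}
    for name, value in prevalences.items():
        upper = upper_bounds.get(name, 1.0)
        # NaN fails both comparisons, so it is flagged as well.
        if not (0.0 <= value <= upper):
            problems[name] = value
    return problems


# Hypothetical logged values for one date (not model output).
logged_2010 = {"Malaria": 0.12, "TB": 0.004, "Mockitis": 1.3}
problems = find_implausible_prevalences(logged_2010, upper_bounds={"TB": 0.05})
print(problems)
```

In a test, `assert not problems` would then fail with the offending modules visible in the output.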

tdm32 · 2024-09-04T10:10:02Z

src/tlo/methods/healthburden.py

-        sim.schedule_event(Healthburden_WriteToLog(self), last_day_of_the_year)
+        sim.schedule_event(Get_Current_DALYS(self), sim.date + DateOffset(months=1))
+        if self.parameters['test']:
+            sim.schedule_event(Get_Current_Prevalence(self), sim.date + DateOffset(months=1))


could we make a choice here for the frequency of logging - if we are interested in very rapidly developing/resolving conditions we could set to daily logger, for broader analyses we could set to annual!?

I have included a parameter now that allows us to set the time of logging as either daily, monthly, or yearly
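
As a sketch of how a string-valued frequency parameter might be turned into a scheduling interval: the accepted values ('day', 'month', 'year') and the names below are assumptions, since the thread does not show the final parameter values.

```python
import pandas as pd

# Assumed parameter values; the PR may use different strings.
FREQUENCY_TO_OFFSET = {
    "day": pd.DateOffset(days=1),
    "month": pd.DateOffset(months=1),
    "year": pd.DateOffset(years=1),
}


def offset_for(frequency: str) -> pd.DateOffset:
    """Translate the logging-frequency parameter into a DateOffset."""
    try:
        return FREQUENCY_TO_OFFSET[frequency]
    except KeyError:
        raise ValueError(
            f"logging frequency must be one of {sorted(FREQUENCY_TO_OFFSET)}, "
            f"got {frequency!r}"
        ) from None


start = pd.Timestamp("2010-01-01")
next_monthly_log = start + offset_for("month")
print(next_monthly_log)
```

Validating the value up front means a typo in the parameter fails at initialisation rather than silently falling back to some default frequency.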

tbhallett

just adding here the comment made in-person: I like the way this is being done overall.... and we should add this new method to the Module base class so that it's formally part of the definition of a disease module (in the same way that report_dalys is.)

RachelMurray-Watson · 2024-09-04T14:49:03Z

so that it's formally part of the definition of a disease module

Grand! Have that done

tbhallett · 2024-09-04T21:28:55Z

src/tlo/methods/healthburden.py

@@ -58,7 +59,8 @@ def __init__(self, name=None, resourcefilepath=None):
        'Age_Limit_For_YLL': Parameter(
            Types.REAL, 'The age up to which deaths are recorded as having induced a lost of life years'),
        'gbd_causes_of_disability': Parameter(
-            Types.LIST, 'List of the strings of causes of disability defined in the GBD data')
+            Types.LIST, 'List of the strings of causes of disability defined in the GBD data'),
+        'logging_frequency_prevalence': Parameter(Types.BOOL, 'Set to the frequency at which we want to make calculations of the prevalence logger')


Types.BOOL looks wrong, as it accepts a string?

Sorry, yes, that was a hangover from an earlier iteration. Corrected now.

tbhallett · 2024-09-04T21:30:05Z

src/tlo/methods/healthburden.py

@@ -99,6 +101,7 @@ def initialise_simulation(self, sim):
        self.years_life_lost_stacked_time = pd.DataFrame(index=self.multi_index_for_age_and_wealth_and_time)
        self.years_life_lost_stacked_age_and_time = pd.DataFrame(index=self.multi_index_for_age_and_wealth_and_time)
        self.years_lived_with_disability = pd.DataFrame(index=self.multi_index_for_age_and_wealth_and_time)
+        self.prevalence_of_diseases = pd.DataFrame(index=year_index)


if we want it to take different frequencies (e.g. month, year etc.) then we'd need a different index.

tbhallett · 2024-09-04T21:31:37Z

src/tlo/methods/healthburden.py

+            ).groupby(level=1).sum() \
+                .assign(year=date_of_death.year) \
+                .set_index(['year'], append=True)['person_years'] \
+                .pipe(_format_for_multi_index)


these changes (and those above) look like they're just formatting changes. So we'll roll these back before merging.

tbhallett · 2024-09-04T21:32:48Z

src/tlo/methods/healthburden.py

+            sim.schedule_event(Get_Current_Prevalence(self), sim.date + DateOffset(days=0))
+            sim.schedule_event(Healthburden_WriteToLog_Prevalences(self), sim.date + DateOffset(days=0))


if these two events happen at the same frequency and we want to guarantee the order they happen in, I think they should be ONE event.
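
A framework-free sketch of the "one event" idea: a single `apply` call computes the prevalence and then writes it, so the order is fixed by construction. The class name and its constructor arguments are hypothetical; in the model this would be a regular simulation event rather than a plain class.

```python
class LogPrevalenceEvent:
    """Hypothetical combined event: compute prevalence, then log it."""

    def __init__(self, compute_prevalence, write_to_log):
        self.compute_prevalence = compute_prevalence
        self.write_to_log = write_to_log

    def apply(self, date):
        # Step 1: compute. Step 2: log. Both happen in the same event,
        # so no scheduler ordering between two events is relied upon.
        prevalence = self.compute_prevalence()
        self.write_to_log({"date": date, **prevalence})
        return prevalence


log_lines = []
event = LogPrevalenceEvent(
    compute_prevalence=lambda: {"Mockitis": 0.25},
    write_to_log=log_lines.append,
)
event.apply("2010-02-01")
print(log_lines)
```

With one event there is also only one frequency to configure, which avoids the compute and log steps drifting apart.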

tbhallett · 2024-09-04T21:34:39Z

src/tlo/methods/healthburden.py

+        # 5) Log the prevalence of each disease
+        log_df_line_by_line(
+            key='prevalence_of_diseases',
+            description='Prevalence of each disease.',
+            df=self.prevalence_of_diseases,
+            force_cols=self.recognised_modules_names,
+        )


is this unintentionally left-in? The function is defined below (embedded in write-to-log-prevalence)

tbhallett · 2024-09-04T21:50:46Z

src/tlo/methods/healthburden.py

+                               disease_module_name)
+
+                # Add the prevalence data as a new column to the DataFrame
+                prevalence_from_each_disease_module[column_name] = prevalence_from_disease_module.iloc[:, 0]


In most cases, the module is returning a single number; sometimes a set of numbers. Could this be defined by a dict instead, for simplicity?

I think to use the log_df_line_by_line, it needs to be a dataframe, but I may have misinterpreted

EDIT: are all now dictionaries
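
A small sketch of collecting prevalence as a dict, covering both cases mentioned above (a single number, or a set of numbers per module). The function name and the `Module_condition` key format are assumptions, not the PR's actual naming.

```python
def collect_prevalence(reports):
    """Flatten per-module reports into one dict suitable for a single log line.

    `reports` maps a module name to either one number or a dict of numbers.
    """
    row = {}
    for module_name, reported in reports.items():
        if isinstance(reported, dict):
            for condition, value in reported.items():
                row[f"{module_name}_{condition}"] = float(value)
        else:
            row[module_name] = float(reported)
    return row


# Hypothetical values for illustration.
row = collect_prevalence(
    {
        "Malaria": 0.1,
        "CardioMetabolicDisorders": {"diabetes": 0.05, "hypertension": 0.2},
    }
)
print(row)
```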

tbhallett · 2024-09-04T21:51:33Z

tests/test_record_prevalence_healthburden_class.py

+end_date = Date(2012, 1, 1)
+
+popsize = 1000
+seed = 42


seed can be a 'magical' kwarg to a function beginning with test, and pytest will populate it for you (same as tmpdir).

tbhallett · 2024-09-04T21:53:41Z

tests/test_record_prevalence_healthburden_class.py

+             prevalence_mockitis_log["TotalInf"][j])
+
+         if target_date <= max_date_in_prevalence:
+             find_closest_recording(prevalence, target_date, regular_log_value, 'Mockitis', True)


Do we need this extra step of finding the closest recording? For this dummy we can set the logging frequency to be the same, so that we can do a straightforward comparison, can't we?

I think mockitis logs every 6 months, so rather than setting a more detailed frequency for logging, I have kept the "closest match" date here.

Edited: new dummydisease in mockitis file. Set the logging frequency to be the same, so no need for this function

tbhallett · 2024-09-04T21:56:23Z

src/tlo/methods/healthburden.py

+        population_size = len(self.sim.population.props[self.sim.population.props['is_alive']])
+
+        # Create a DataFrame with one row and assign the population size
+        prevalence_from_each_disease_module = pd.DataFrame({'population': [population_size]})


I'm wondering why this is dataframe

I initiated it as a dataframe that I could then populate. I think for the line-by-line logging function it needs to be a dataframe (like the DALYs), but I may have misinterpreted.

EDIT: Have now changed so that the prevalence is now collected as dictionaries

tbhallett · 2024-09-04T21:57:01Z

src/tlo/methods/healthburden.py

+    """
+
+    def __init__(self, module):
+            super().__init__(module, frequency=DateOffset(months=1))


I think it's very murky if we allow this frequency to be different to the logging frequency. That's why I think we should combine these two events into one.

RachelMurray-Watson · 2024-09-11T13:36:15Z

just adding here the comment made in-person: I like the way this is being done overall.... and we should add this new method to the Module base class so that it's formally part of the definition of a disease module (in the same way that report_dalys is.)

(Based on conversation yesterday) - changed so that it is no longer in base class (as not all modules are disease modules), but there is an assertion checking to see that if something uses the healthburden module, it must have the report_prevalence function
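
The kind of assertion described here could be sketched as follows. The function and the two toy classes are illustrative stand-ins; the actual check in the healthburden module may be written differently.

```python
class DummyDisease:
    """Toy disease module that implements the required method."""

    def report_prevalence(self):
        return {"DummyDisease": 0.1}


class NotADiseaseModule:
    """Toy module without report_prevalence."""


def check_modules_report_prevalence(modules):
    """Assert that every module passed in has a callable report_prevalence."""
    missing = [
        type(m).__name__
        for m in modules
        if not callable(getattr(m, "report_prevalence", None))
    ]
    assert not missing, f"Modules missing report_prevalence: {missing}"


check_modules_report_prevalence([DummyDisease()])  # passes silently
```

Keeping the check outside the base class means non-disease modules are unaffected, while anything that registers with the health burden module is still held to the interface.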



RachelMurray-Watson requested review from joehcollins, tdm32, tbhallett and marghe-molaro August 21, 2024 20:46

RachelMurray-Watson marked this pull request as ready for review August 22, 2024 09:19

tdm32 reviewed Sep 4, 2024


tbhallett reviewed Sep 4, 2024


tbhallett requested changes Sep 4, 2024


RachelMurray-Watson self-assigned this Sep 5, 2024

RachelMurray-Watson added 19 commits September 30, 2024 09:25

added function to calculate monthly prevalence of diarrhoea across al…

e7bae22

…l individuals. Based solely on "gi_has_diarrhoea", not dehydration, pathogen, etc.

added function to calculate monthly prevalence of different ncds. Cal…

12fa604

…culated as they are for the "proportion_of_something_in_a_groupby_ready_for_logging", but across everyone and not over age/sex

added function to calculate monthly prevalence of different ncds. Cal…

08f2523

…culated as they are for the "proportion_of_something_in_a_groupby_ready_for_logging", but across everyone and not over age/sex

Fixed to allow for dataframes to pass through

02fe58e

Fixed from None to False

808c4d7

Fixed from 0 to '0'

96d5096

Fixed calculation of depression in last month

454d0d3

Removed checks

08e41a4

Added debugging

927a649

Removed print checks

00d7b4a

NOT WORKING but tried to simplify the collection process as was being…

2b5c2a3

… returned as all 0s.

Now logging properly

03453e1

Included code for if HIV module isn't loaded

84aa7e8

Added/removed check

1cc0bee

Added function to record number of antenatal stillbirthds

cae5d81

renamed newborn to neonatal

20541a2

Accessed all possible data from logs, but it's not matching

694b132

not needed

508eadf

removed prints

afd8d22

RachelMurray-Watson added 29 commits October 23, 2024 16:08

Merge branch 'refs/heads/master' into rmw/long_term_projections

00e9fe8

updated so that it looks at the entire code of whatever disease of in…

a496369

…terest, as previously was only looking at module classes and was therefore missing things like polling events

actual logging for demography detail

a17c1a2

changed to latest short run

f38debb

removed unneccessary LE graph

03d18e6

More plots to investiagte proprtion of life groups lived

9d086d3

Sample LE estimates with one year age groups and 0.5 year at beginnin…

bfbadc4

…g. Unsure it did much

Can't get half-year age groups to work

c0fc49c

Updated date

4fedd88

Scripts to try and investigate why RTI DALYs etc goes up, but there i…

2314e40

…s a big decrease in demand on the healthcare system

revert to original

b2c3de4

New notebook for investigating the differences in LE between our meth…

17c2296

…od and WPP

New notebook for investigating the differences in LE between our meth…

0e23fc5

…od and WPP

New notebook for investigating the differences in LE between our meth…

00809db

…od and WPP

Merge branch 'refs/heads/master' into rmw/long_term_projections

c100567

To investigate RTI

85d8c7f

Formattin

71d365b

Add in scenarios for baseline (business as usual), renamed perfect wo…

60ccfb1

…rld and added in the HTM scale up. Working on lifestyle examinations

Added in two lifestyle scenarios: cancer and CMD. Hard to paramaterise

28f0503

Fixed number_of_draws

cf3f81c

isort

50% increase/decrease in probabilities instead of doubling?

3bec301

Set up to run

1fb6089

Added MDA_event to try and ensure that MDA events are scheduled

29051d3

Added MDA_event to try and ensure that MDA events are scheduled

48b06e7

Merge branch 'refs/heads/master' into rmw/long_term_projections

a0223c9

# Conflicts: # src/scripts/get_properties/properties_graph.py

Merge branch 'refs/heads/master' into rmw/log_prevalence_all_disease

be55fd1

Merge remote-tracking branch 'origin/rmw/log_prevalence_all_disease' …

fe1448e

…into rmw/log_prevalence_all_disease

Somehow prevalence logging for HIV was removed

d12bfe8

