
cost function mentioned in main_OSEM.py #71

Closed

gschramm opened this issue Jul 21, 2024 · 9 comments

@gschramm

NB: It should be `sum(prompts * log(acq_model.forward(self.x)) - self.x * sensitivity)` across all subsets.

The cost function in main_OSEM.py is `sum(prompts * log(acq_model.forward(self.x)) - self.x * sensitivity)`.

Shouldn't that be
`sum(prompts * log(acq_model.forward(self.x)) - acq_model.forward(self.x))` (plus prior)?
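For concreteness, a minimal numpy sketch (toy stand-ins for the acquisition model, additive term, and data; not PETRIC code) comparing the two candidates for an affine forward model:

```python
# Toy numpy model standing in for acq_model in main_OSEM.py (hypothetical shapes).
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(20, 10))   # linear part of the forward model
b = rng.uniform(0.1, 0.5, size=20)         # additive term (scatter/randoms)
x = rng.uniform(0.5, 2.0, size=10)         # toy image
prompts = rng.poisson(A @ x + b)           # toy measured data

ybar = A @ x + b                           # affine forward projection
sensitivity = A.T @ np.ones(20)            # back-projection of ones

obj_code = np.sum(prompts * np.log(ybar)) - np.sum(x * sensitivity)
obj_logL = np.sum(prompts * np.log(ybar)) - np.sum(ybar)

# The two differ by sum(b), which does not depend on the image x:
print(np.isclose(obj_code - obj_logL, np.sum(b)))  # True
```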

@KrisThielemans
Member

OSEM does NOT implement a MAP recon.

NB: OSEM does not use `data.prior` and thus does not converge to the MAP reference used in PETRIC.

This example is provided to show yet another alternative way to implement a familiar algorithm, as well as to give the opportunity to compare OSEM metrics with MAP metrics.

Apologies if this isn't clear in the documentation; suggestions welcome (or maybe you didn't read it :smile:)

@gschramm
Author

Indeed. But the cost function mentioned there also does not match the Poisson logL as far as I can see, which could confuse people if they want to evaluate the cost (as the sum of data fidelity + prior) themselves.

@KrisThielemans
Member

> But the cost function mentioned there also does not match the Poisson logL as far as I can see, which could confuse people if they want to evaluate the cost (as the sum of data fidelity + prior) themselves.

Are you talking about the difference between
`sum(prompts * log(acq_model.forward(self.x)) - self.x * sensitivity)` and `sum(prompts * log(m * acq_model.forward(self.x)) - self.x * sensitivity)` (where the latter is the one mentioned on the wiki)? That is indeed different. However, the difference is an image-independent constant, so it should not affect any optimisation (and doesn't affect our metrics). One of the reasons I implemented main_OSEM.py that way is that it is faster (one multiplication and one division fewer per gradient).
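A quick numeric sanity check of that claim (plain numpy, hypothetical toy sizes; `q` stands in for `acq_model.forward(self.x)` and `m` for the multiplicative factors):

```python
# Check that including the multiplicative factor m only shifts the
# objective by a constant that does not depend on the image.
import numpy as np

rng = np.random.default_rng(1)
q = rng.uniform(1.0, 5.0, size=100)   # stands in for acq_model.forward(self.x)
m = rng.uniform(0.5, 2.0, size=100)   # multiplicative factors (norm/attenuation)
prompts = rng.poisson(m * q)          # toy measured data

with_m    = np.sum(prompts * np.log(m * q))
without_m = np.sum(prompts * np.log(q))

# The gap is sum(prompts * log(m)): independent of the image,
# so optimisation (and the ranking of iterates) is unaffected.
print(np.isclose(with_m - without_m, np.sum(prompts * np.log(m))))  # True
```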

Aside from thresholding, the former is an implementation of [6] in UCL/STIR#1472 (comment), while the latter is one of [5]. Unfortunately, I'm planning to switch to [6] in STIR soon, which will indeed confuse people.

@gschramm
Author

Indeed. After writing it down, I see that the difference is only the sum of the known additive contamination, so indeed image-independent.

main_OSEM.py is much appreciated and very helpful :)

@KrisThielemans
Member

> I see that the difference is only the sum of the known additive contamination, so indeed image-independent.

The multiplicative term $m$, really: the difference is `sum(prompts * log(m))`. The additive term cannot be taken out of the log.

By the way, this "trick" is our reason to write the acquisition model as $D(m)(Ax + a)$.
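A tiny numpy illustration of that asymmetry (toy vectors, not the SIRF/STIR API): the multiplicative factor splits out of the log, the additive term does not.

```python
import numpy as np

rng = np.random.default_rng(2)
q = rng.uniform(1.0, 5.0, size=50)   # stands in for (Ax)_i, image-dependent
m = rng.uniform(0.5, 2.0, size=50)   # multiplicative factors
a = rng.uniform(0.1, 0.5, size=50)   # additive term

# m factors out of the log as an image-independent constant ...
print(np.allclose(np.log(m * q), np.log(m) + np.log(q)))  # True
# ... but there is no analogous split for the additive term:
print(np.allclose(np.log(q + a), np.log(q) + np.log(a)))  # False
```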

@gschramm
Author

Hm, now I am confused.

If we have an affine forward model:

$$ \bar{y} = Ax + b $$

and emission data $y$, the Poisson logL up to constant terms is:

$$ \log L = \sum_i y_i \log{\bar{y}_i} - \sum_i \bar{y}_i $$

If I understand your comment in the code correctly, you calculate

$$ \log L = \sum_i y_i \log{\bar{y}_i} - \sum_j x_j s_j = \sum_i y_i \log{\bar{y}_i} - \sum_j x_j \sum_i a_{ij} $$

which after swapping the last two sums is

$$ \log L = \sum_i y_i \log{\bar{y}_i} - \sum_i \sum_j a_{ij} x_j $$

which is

$$ \log L = \sum_i y_i \log{\bar{y}_i} - \sum_i (\bar{y}_i - b_i) $$

isn't it?
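A numeric check of the sum-swapping step above (plain numpy, toy sizes; `A`, `x`, `b` are hypothetical stand-ins): $\sum_j x_j s_j$ with $s_j = \sum_i a_{ij}$ equals $\sum_i (Ax)_i = \sum_i (\bar{y}_i - b_i)$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.uniform(0.1, 1.0, size=(30, 12))  # toy system matrix
x = rng.uniform(0.5, 2.0, size=12)        # toy image
b = rng.uniform(0.1, 0.5, size=30)        # additive contamination

s = A.T @ np.ones(30)   # sensitivity: back-projection of ones
ybar = A @ x + b        # affine forward model

print(np.isclose(x @ s, np.sum(ybar - b)))  # True
```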

@KrisThielemans
Member

KrisThielemans commented Jul 22, 2024

Nope... First of all, our wiki notation is different from yours. We use

$$ \hat{y} = D(m)(Ax + a) $$

so your $A$ is not ours... (and your $\bar{y}$ is our $\hat{y}$)

Sticking to our notation, main_OSEM.py suggests computing
$O = \sum_i y_i \log{[A(x+a)]_i} - \sum_i \sum_j a_{ij} x_j$

which is (unless I'm making a typo!), with $b = m \cdot a$:

$$ O = \sum_i y_i (\log{\hat{y}_i} - \log{m_i}) - \sum_i (\hat{y}_i - b_i) = \log L + \text{consts} $$
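A numpy sketch verifying this identity (toy sizes; it assumes the sensitivity in main_OSEM.py is the back-projection of ones through the full linear part $D(m)A$, i.e. $s_j = \sum_i m_i a_{ij}$):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.uniform(0.1, 1.0, size=(30, 12))  # toy system matrix
x = rng.uniform(0.5, 2.0, size=12)        # toy image
m = rng.uniform(0.5, 2.0, size=30)        # multiplicative factors
a = rng.uniform(0.1, 0.5, size=30)        # additive term

yhat = m * (A @ x + a)   # \hat{y} = D(m)(Ax + a)
b = m * a
y = rng.poisson(yhat)    # toy measured data

s = A.T @ m              # sensitivity: (D(m)A)^T 1
O    = np.sum(y * np.log(A @ x + a)) - x @ s
logL = np.sum(y * np.log(yhat)) - np.sum(yhat)

# O and logL differ only by image-independent constants:
print(np.isclose(O - logL, -np.sum(y * np.log(m)) + np.sum(b)))  # True
```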

@gschramm
Author

  1. In the first definition of $O$, the operator $A$ is applied to $(x + a)$ (image + additive contamination in data space) which I guess is not correct? I think it should be $[Ax + a]_i$
  2. I find it slightly confusing to use $a$ for the contaminations, if we use $a_{ij}$ for the matrix elements of $A$.

@KrisThielemans
Member

> 1. In the first definition of $O$, the operator $A$ is applied to $(x + a)$ (image + additive contamination in data space) which I guess is not correct? I think it should be $[Ax + a]_i$

Indeed. Sorry:

$O = \sum_i y_i \log{[Ax + a]_i} - \sum_i \sum_j a_{ij} x_j$

> 2. I find it slightly confusing to use $a$ for the contaminations, if we use $a_{ij}$ for the matrix elements of $A$.

Apologies. We called it $a$ for "additive". I have not found any agreement on what to call this. (SIRF uses "background" for $b$.)
