Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package name #2

Closed
yebai opened this issue Jun 13, 2021 · 28 comments
Closed

Package name #2

yebai opened this issue Jun 13, 2021 · 28 comments

Comments

@yebai
Copy link
Member

yebai commented Jun 13, 2021

InferenceDiagnostics sounds rather generic IMO. Does it make sense to narrow down a bit, e.g. MCMCDiagnostics or MCMCTools?

@yebai
Copy link
Member Author

yebai commented Jun 13, 2021

Just found the following package with the name MCMCDiagnostics. Maybe we can join forces, merge the code and move the new library under the ArviZ umbrella?

https://github.com/tpapp/MCMCDiagnostics.jl

@devmotion
Copy link
Member

The name was suggested and motivated in TuringLang/MCMCChains.jl#266 (the discussion also mentioned MCMCDiagnostics).

@devmotion
Copy link
Member

BTW this package (ie the code in MCMCChains) contains strictly more general implementations of the statistics in MCMCDiagnostics (eg three different algorithms for estimating the ESS).

@yebai
Copy link
Member Author

yebai commented Jun 13, 2021

I'd prefer merging all the code (i.e. diagnostics code in MCMCChains, and MCMCDiagnostics) into a new MCMCDiagnostics library, then co-maintain the library under ArviZ. Such a plan would benefit all DynamicHMC, Turing and ArviZ users.

@devmotion
Copy link
Member

The intention of this package is to contain all the diagnostic code of MCMCChains. I don't think there's anything in MCMCDiagnostics that is missing and could be merged.

@yebai
Copy link
Member Author

yebai commented Jun 13, 2021

I see - let's still make an effort at merging these packages to reduce redundancy in the Julia MCMC ecosystem. If there is no code to merge, we can simply ask the current MCMCDiagnostics author whether it's ok to transfer the package to ArviZ for (hopefully better) development.

@sethaxen
Copy link
Member

The motivation for InferenceDiagnostics is that we may want to include more general diagnostics than those for MCMC, e.g. for VI. One could create a VIDiagnostics, but IIUC there's enough overlapping code that it would make sense to have a single library. That being said, we could always start with MCMCDiagnostics and rename later if we do expand. Currently all diagnostics in the ArviZ package are MCMC-specific.

RE merging with MCMCDiagnostics, I agree that the diagnostics in this package are a superset of those in MCMCDiagnostics and that we should at least discuss a merge to the author.

@devmotion
Copy link
Member

How should we proceed? It would be nice to be able to finish TuringLang/MCMCChains.jl#310 and to clean MCMCChains, so it would be good to start the registration process.

Is the plan to discuss a merge (or rather replacement) of MCMCDiagnostics? The package seems to be very lightweight compared with InferenceDiagnostics, I wonder if this might be a problem.

We can also rename the package at a later time point (technically register a new package with a different name: https://github.com/JuliaRegistries/General#how-do-i-rename-an-existing-registered-package) or e.g. only use it as an umbrella package for smaller subpackages such as VIDiagnostics etc.

@yebai
Copy link
Member Author

yebai commented Jun 17, 2021

I kind of like a more specific name than InferenceDiagnostics. In fact, the terminology inference can refer to very different concepts in machine learning and statistics. For example, in deep learning, inference means prediction.

Re VI related diagnostics, it's a bit unclear what diagnostics one can offer in general. The literature on such diagnostics is less mature than those for MCMC in my view. Thus I'm slightly worried to include such diagnostics together with MCMCDiagnostics, since they are a bit experimental in nature.

@ParadaCarleton
Copy link
Member

ParadaCarleton commented Jun 20, 2021

Giving my own two cents -- InferenceDiagnostics as a name makes me immediately think of things like WAIC or Bayesian R^2 -- goodness of fit diagnostics that tell me how good my inferences are, rather than diagnostics for MCMC convergence.

@ParadaCarleton
Copy link
Member

I kind of like a more specific name than InferenceDiagnostics. In fact, the terminology inference can refer to very different concepts in machine learning and statistics. For example, in deep learning, inference means prediction.

Re VI related diagnostics, it's a bit unclear what diagnostics one can offer in general. The literature on such diagnostics is less mature than those for MCMC in my view. Thus I'm slightly worried to include such diagnostics together with MCMCDiagnostics, since they are a bit experimental in nature.

I think that makes some sense, but we could just add a warning for using those diagnostics if we feel like they're on shaky ground.

@devmotion
Copy link
Member

Giving my own two cents -- InferenceDiagnostics as a name makes me immediately think of things like WAIC or Bayesian R^2 -- goodness of fit diagnostics that tell me how good my inferences are, rather than diagnostics for MCMC convergence.

I imagined these belong here as well. Not all them have to be defined in this package necessarily (some belong to or might even exist in StatsBase, and some are eg implemented in StatsModelComparisons) but I think it would be useful to at least reexport them and to make sure they use a consistent API.

@sethaxen
Copy link
Member

Giving my own two cents -- InferenceDiagnostics as a name makes me immediately think of things like WAIC or Bayesian R^2 -- goodness of fit diagnostics that tell me how good my inferences are, rather than diagnostics for MCMC convergence.

I imagined these belong here as well. Not all them have to be defined in this package necessarily (some belong to or might even exist in StatsBase, and some are eg implemented in StatsModelComparisons) but I think it would be useful to at least reexport them and to make sure they use a consistent API.

ArviZ lists WAIC and R^2 as statistics, not diagnostics, but I agree, these should probably be in this package or a more general package like StatsBase.

To summarize our discussions, all of the diagnostics we are immediately planning to add are applicable to samples generated with MCMC, and that is currently out main focus. I am okay with renaming this package as MCMCDiagnostics and accepting the reduced scope. I think then our next step is to propose a merge to the author of MCMCDiagnostics.jl.

@sethaxen
Copy link
Member

ArviZ lists WAIC and R^2 as statistics, not diagnostics, but I agree, these should probably be in this package or a more general package like StatsBase.

@ColCarroll explained to me on Slack that the distinction between statistics and diagnostics in ArviZ is:

a stat tells you something about the posterior, a diagnostic tells you something about the quality of inference.

I think this distinction is useful in that it keeps us from implementing every tool for posterior analysis here. By that distinction, we would probably not want WAIC and R^2 in this package and instead might consider instead a package for generic utilities for posterior analysis to hold them. The scope of this package would be diagnostics of the quality of the samples generated by some MCMC method.

@ParadaCarleton
Copy link
Member

ParadaCarleton commented Jun 24, 2021

My suggestion for a name is PosteriorDiagnostics or SamplingDiagnostics, which gives us wiggle room if we want to add diagnostics for variational inference later while avoiding confusion about the term "Inference." Also worth noting we might want to add importance sampling diagnostics -- Pareto ξ values from a PSIS fit are known to be good importance sampling diagnostics, so that's at least one thing we could add right now if we wanted. Pareto ξ values are also good diagnostics for VI+IS.

I would suggest a separate package for posterior goodness-of-fit measures, with a name like BayesFit.

@devmotion
Copy link
Member

I am okay with renaming this package as MCMCDiagnostics and accepting the reduced scope. I think then our next step is to propose a merge to the author of MCMCDiagnostics.jl.

Since the author wants to keep the existing MCMCDiagnostics.jl lightweight and separate, we have to reconsider if and how the package should be renamed.

@ParadaCarleton
Copy link
Member

Are there any objections to SamplingDiagnostics? I think this is more clear than InferenceDiagnostics.

@yebai
Copy link
Member Author

yebai commented Jun 30, 2021

Sampling is quite generic as well IMO. If we are searching for a MCMCDiagnostics alternative, maybe MCMCDiagnosticTools could work?

@ParadaCarleton
Copy link
Member

ParadaCarleton commented Jul 1, 2021

Sampling is quite generic as well IMO. If we are searching for a MCMCDiagnostics alternative, maybe MCMCDiagnosticTools could work?

MCMCDiagnostics is preferable to MCMCDiagnosticTools, which is quite the mouthful. Sampling is supposed to be generic, so we can add diagnostics for VI later if we want. How about PosteriorDiagnostics, BayesDiagnostics, or MCDiagnostics (for Monte Carlo diagnostics)?

@devmotion
Copy link
Member

I think MCMCDiagnosticTools is a good alternative to MCMCDiagnostics (which is not available since a package of the same name exists and its author does not want to merge additional diagnostics and dependencies @ParadaCarleton). Users and downstream developers can always introduce an alias if they think the name is too long, so the additional 5 characters shouldn't matter. I also agree that Sampling is quite generic, this was already a sentiment in the discussion in MCMCChains IIRC.

What do you think about MCMCDiagnosticTools @sethaxen?

@sethaxen
Copy link
Member

sethaxen commented Jul 1, 2021

What do you think about MCMCDiagnosticTools @sethaxen?

What I don't like: If one tries to tab-complete the package name, they'll get both package names and need to remember which is our package. What I do like: it's descriptive of what the current scope of the package is. I agree SamplingDiagnostics and InferenceDiagnostics are too generic for the current scope. So I'm happy with the name change to MCMCDiagnosticTools.jl.

@devmotion
Copy link
Member

Maybe AdvancedMCMCDiagnostics would be an alternative (following the AdvancedMH/AdvancedHMC etc theme in the Turing ecosystem 😄)?

@sethaxen
Copy link
Member

sethaxen commented Jul 1, 2021

Sadly I can only dislike that once.

@ParadaCarleton
Copy link
Member

ParadaCarleton commented Jul 2, 2021

How about MCChecks? This solves the tab-complete, too specific, and "Too long" problems at once:

  1. It refers to all Monte Carlo-based algorithms. This would include all the algorithms I know of for checking VI approximations, which are based on importance sampling. If this isn't desirable, we can go with MCMCChecks instead.
  2. It's way shorter -- 8 characters. This isn't just important for typing out, but also because it makes it easier for users to remember it. Not even joking, I had to double-check what the name proposed above was just now. Not to mention, users will probably get annoyed having to Google "MCMCDiagnosticTools" and arrive on our issues page with a pitchfork in hand.
  3. It won't be confused with MCMCDiagnostics -- the names MCMCDiagnostics and MCMCDiagnosticTools are very similar, and if this package becomes popular, we'll probably end up with a bunch of confused users putting issue requests for their package on our Github. (Especially since a Google search for MCMCDiagnostics will probably lead to whichever of the two packages is more popular. In fact, a search for "MCMCDiagnosticTools Julia" already leads to MCMCDiagnostics.jl, and it might take quite a while for MCMCDiagnosticTools to replace MCMCDiagnostics as the first result...)

@devmotion
Copy link
Member

IMO MCChecks is too generic (and I think MonteCarloChecks/MonteCarloDiagnostics would be clearer if we want to widen the aim of the package). If there are no other suggestions, I'll rename the package and update all links to MCMCDiagnosticTools tomorrow so we can start the registration process.

@ParadaCarleton
Copy link
Member

If MCChecks is considered too generic, I would have no objections to any of MCMCChecks, MonteCarloChecks, or MonteCarloDiagnostics.

@ParadaCarleton
Copy link
Member

(Or, alternatively, AdvancedChecks/AdvancedDiagnostics. Even InferenceDiagnostics is better -- I just don't like MCMCDiagnosticTools, since it's going to be confused with MCMCDiagnostics. That being said, this really isn't important enough to keep holding up the package.)

@devmotion
Copy link
Member

Since MCMCDiagnosticTools seemed to be the name most of us agreed on I renamed the package accordingly and started the registration process: JuliaRegistries/General#40583

We can always refactor or rename later if it turns out that it was a suboptimal decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants