Errors increasing with polynomial order and calculation of conditional mean #365
variable = chaospy.variable(12)
e_y_xi = chaospy.E_cond(approx_function, variable[i], joint_distribution)
sxi = chaospy.Var(e_y_xi, joint_distribution) / chaospy.Var(approx_function, joint_distribution)
Hope this helps.
Dr Feinberg, Thank you for replying.
Hello Dr Feinberg, with variable = chaospy.variable(12) in the above code, while debugging I found that when variable[i] is q2 the main effect is a function of q4, so the conditional mean function doesn't freeze the variables correctly. Could you please hint where I am possibly going wrong?
I am unsure. Could you post the following? From your example, do you mind posting the output from:
Here is the information you asked for: as shown above, variable[2] is q2, and the conditional mean is a function of q4.
Ah, the variable names are not sorted for more than 10 variables. That makes sense. I've made a new release of Chaospy now, version 4.3.7, which has a fix for conditional expectations. Let me know if that solves your problem.
Are the Sobol indices from the Sens_m function also dependent on the sorting of variables in approx_function? Are they returned in the same order as the variables in the joint distribution?
In principle the sorting could matter, as Sens_m uses E_cond under the hood, but from skimming the code I don't think it does.
Thanks, I have upgraded to 4.3.7.
Chaospy is inherently positional. If you rearrange the order of something, that is fine, you just need to be consistent about it. The following two methods should be equivalent for getting s0 and s1:

dist1 = cp.Normal(0, 1)
dist2 = cp.Uniform(-1, 1)
q0, q1 = cp.variable(2)

# METHOD 1
joint = cp.J(dist1, dist2)
expansion = cp.generate_expansion(4, joint)
samples = joint.sample(1000)
evals = [foo(x[0], x[1]) for x in samples.T]
approx = cp.fit_regression(expansion, samples, evals)
s0 = cp.Var(cp.E_cond(approx, q0, joint), joint) / cp.Var(approx, joint)
s1 = cp.Var(cp.E_cond(approx, q1, joint), joint) / cp.Var(approx, joint)

# METHOD 2
joint = cp.J(dist2, dist1)
expansion = cp.generate_expansion(4, joint)
samples = joint.sample(1000)
evals = [foo(x[1], x[0]) for x in samples.T]
approx = cp.fit_regression(expansion, samples, evals)
s0 = cp.Var(cp.E_cond(approx, q1, joint), joint) / cp.Var(approx, joint)
s1 = cp.Var(cp.E_cond(approx, q0, joint), joint) / cp.Var(approx, joint)

In other words, there is a bit of bookkeeping needed to ensure that the reordering stays consistent. But if you manage to not make any mistakes, you should have the order invariance you expect, yes.
In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x). Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression. The explanatory (independent) variables resulting from the polynomial expansion of the "baseline" variables are known as higher-degree terms. Such variables are also used in classification settings.[1]

The simple linear model is

y = \beta_0 + \beta_1 x + \varepsilon.

In many settings, such a linear relationship may not hold. For example, if we are modeling the yield of a chemical synthesis in terms of the temperature at which the synthesis takes place, we may find that the yield improves by increasing amounts for each unit increase in temperature. In this case, we might propose a quadratic model of the form

y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon.

In general, we can model the expected value of y as an nth-degree polynomial, yielding the general polynomial regression model

y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \beta_n x^n + \varepsilon.
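The point that the model is linear in the unknown parameters can be seen by writing the fit as ordinary least squares on a Vandermonde design matrix. A minimal numpy sketch with hypothetical data drawn from a known quadratic:

```python
import numpy

# Hypothetical data from y = 1 + 2x + 3x^2 plus small noise.
rng = numpy.random.default_rng(0)
x = numpy.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.01, x.size)

# Design matrix with columns [1, x, x^2]: the model is y = X @ beta,
# linear in beta even though it is nonlinear in x.
X = numpy.vander(x, 3, increasing=True)
beta, *_ = numpy.linalg.lstsq(X, y, rcond=None)
```

The recovered beta should be close to (1, 2, 3), confirming that the nonlinear-in-x fit is solved by ordinary linear least squares.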
Hi Jonathan,

I used the below paper as a reference (section 4.4.3 in the paper) and calculated the Sobol indices using the PCE coefficients. UQLab uses a similar methodology for calculating Sobol indices. In chaospy I exported the PCE coefficients using the .todict() function and calculated first and total order Sobol indices. The image below shows the Sobol first and total order indices calculated using the built-in chaospy functions chaospy.Sens_m and chaospy.Sens_t respectively. As the image shows, the Sobol indices do not sum to one, and the total order indices look way off, given that there is no two-factor interaction of the input variables on the response. The first and total order indices are supposed to be the same, given that the model has a linear relationship between the input variables and the output response. I am not able to figure out what is causing this when we use chaospy.Sens_m and chaospy.Sens_t to calculate the Sobol indices. These functions also take rather long to run, and the results they give seem inaccurate. In my model, I have 11 input variables. Using chaospy 4.3.10 & numpoly 1.2.5. Any help on this is greatly appreciated. Thank you so much in advance.

Thanks, Seshagiri
It is hard for me to answer without looking at code. Do you have a complete minimal snippet that I can have a look at? |
Hi Jonathan,

Thank you so much for the reply. Here is the code I used to calculate the Sobol indices in the second plot of my previous post, for your reference:

import os
doefilepath = r'C:\Temp'
inputvariables = df_eval.columns[0:11]
V1 = cp.Uniform(6, 40)
Joint = cp.J(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11)
warnings.filterwarnings("ignore", category=RuntimeWarning)
order = 2
S1_chaos = pd.DataFrame(data=cp.Sens_m(approx_solver, Joint), index=inputvariables, columns=['First Order'])
SAll_chaos = pd.concat([S1_chaos, ST_chaos], axis=1)

When I use the approximate solver from the chaospy PCE and use SALib to calculate the Sobol indices, I get the right answers:

from SALib.sample import sobol as sobolsample
problem = {
doesamples_salib = sobolsample.sample(problem, 1024).T
Si = sobolanalyze.analyze(problem, evaluations_pred, print_to_console=False)
total_Si, first_Si, second_Si = Si.to_df()
sobolax = sobolsalib.plot.bar(title='Sobol Indices - SALib', width=0.75)

Thanks, Seshagiri
Like in the below paper reference, I tried to calculate the Sobol indices using the PCE coefficients (first picture in my first post), and there I am also getting wrong answers. That may have something to do with the isoprobabilistic transform of the input variable distributions. The calculation proposed in the paper is similar to the pictures below, for your reference; UQLab does it this way. Is there a way to calculate Sobol indices from the PCE coefficients in chaospy, like UQLab? This method seems much faster.

Thanks, Seshagiri
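The coefficient-based computation described in the paper can be sketched independently of any library: for an orthonormal basis, the variance of the surrogate is the sum of the squared coefficients, and each Sobol index is a partial sum selected by the multi-indices. A minimal sketch, assuming a hypothetical setup with two Uniform(-1, 1) inputs and the model y = x0 + x1 + x0*x1, for which the exact first-order indices are 3/7 and the total indices 4/7:

```python
import numpy

rng = numpy.random.default_rng(7)

# Normalized Legendre polynomials, orthonormal for Uniform(-1, 1).
legendre = [
    lambda t: numpy.ones_like(t),
    lambda t: numpy.sqrt(3.0) * t,
    lambda t: numpy.sqrt(5.0) * (3.0 * t**2 - 1.0) / 2.0,
]
# Multi-indices of a total-degree-2 expansion in two variables.
multi_indices = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]

# Hypothetical model standing in for the simulator.
x = rng.uniform(-1.0, 1.0, size=(2, 1000))
y = x[0] + x[1] + x[0] * x[1]

# Least-squares fit of the PCE coefficients.
design = numpy.column_stack(
    [legendre[a](x[0]) * legendre[b](x[1]) for a, b in multi_indices])
coeffs, *_ = numpy.linalg.lstsq(design, y, rcond=None)

# Sobol indices straight from the coefficients: squared coefficients are
# variance contributions; the constant term (index 0) is excluded.
variances = coeffs**2
total_variance = variances[1:].sum()
first, total = [], []
for i in range(2):
    uses_i = numpy.array([idx[i] > 0 for idx in multi_indices])
    only_i = numpy.array([idx[i] > 0 and sum(idx) == idx[i] for idx in multi_indices])
    first.append(variances[only_i].sum() / total_variance)
    total.append(variances[uses_i].sum() / total_variance)
```

No extra quadrature or sampling is needed once the coefficients are known, which is why this route is so much faster than recomputing conditional expectations.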
Could you post the input dataset?
The method you are showing is better, but I never got around to setting it up. So the method you are proposing could be done in chaospy manually by using
Hi Jonathan, Thanks for looking into it. Here is the data you are looking for. Thanks, Seshagiri
I just made a new release of chaospy, version 4.3.11. It contains a PCE-specific implementation of the Sobol indices that you can try out:

approx, coefficients = chaospy.fit_regression(expansion, nodes, evals, retall=True)
first_order_indices = chaospy.FirstOrderSobol(expansion, coefficients)

My time is somewhat limited these days, so I don't have the opportunity to really test the implementation. So consider it experimental in this first iteration. If you try it out, please let me know how it goes. This approach is clearly superior for PCE and should in time be made the default.
Yeah, you are right. The formula is incorrect for the numerator. I've made a fix and a dev release. Try it out with: pip install chaospy==4.3.12.dev0
Hi Jonathan, Thank you so much for the quick fix. I tested chaospy version 4.3.12.dev0. Thanks, Seshagiri
Yeah, there was a bug, but I had to stare at the code a bit to spot the error. I've made a patch now in 4.3.13. Let's hope all the errors are ironed out now. Thanks for suggesting the improved way to calculate Sobol indices, btw. It is an obvious improvement over how it was done before.
Dr. Feinberg,
My procedure is defined below:
I have 12 input variables X = Xi (i = 1, 2, ..., 12), each normally distributed.
I have generated random samples, run the vibration simulation in Abaqus on the generated samples, and obtained 250 evaluation points.
I have then generated a 2nd-order orthogonal expansion and calculated the coefficients using regression:
approx_function = chaospy.fit_regression(expansion, samples, evaluations)
where samples and evaluations are the data generated by the vibration simulations of the 250 samples.
Using this I have got an approx_function of shape (90,). I have then calculated the Sobol indices.
I am checking the error in the mean and variance by:
True mean (mean of the 250 evaluation points) - approx mean (mean using approx_function and the distribution)
True variance (variance of the 250 evaluation points) - approx variance (variance using approx_function and the distribution)
My doubts are as follows:
1) Why are the errors increasing when increasing the order of the polynomial? How do I determine the best order to use?
2) How do I calculate the conditional mean and Sobol indices (without using the Sens_m function)?
E(Y|Xi) = chaospy.E_cond(approx_function, freeze, joint_distribution), because freeze is accepted as a polynomial and not the RV distribution.
Sxi = Var(E(Y|Xi)) (calculated over the distribution of Xi) / Var(Y) (calculated over the joint distribution)