Commit

deploy: 30d0b73
StefanTodoran committed Oct 16, 2023
1 parent 086dac8 commit 2556737
Showing 13 changed files with 66 additions and 29 deletions.
40 changes: 25 additions & 15 deletions Chapter2-DataManipulation/2.6_resampling.html
@@ -951,12 +951,19 @@ <h3>1.2 Bootstrapping<a class="headerlink" href="#bootstrapping" title="Permalin
</div>
</div>
</div>
<p>We are now going to resample the data and calculate again the Pearson coefficient.</p>
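<p>We are now going to resample the data and calculate the Pearson coefficient again. We will first take a random subset of <code class="docutils literal notranslate"><span class="pre">nsubset</span></code> points from the data.</p>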
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">nsubset</span> <span class="o">=</span> <span class="mi">10</span>
</pre></div>
</div>
</div>
</div>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Next, let&#39;s take some subset of the correlated data</span>

<span class="n">subset</span> <span class="o">=</span> <span class="n">rng</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">correlated_data</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="n">subset</span> <span class="o">=</span> <span class="n">rng</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">correlated_data</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">nsubset</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>

<span class="c1"># Report the correlation coefficient of the subsets</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">&#39;The correlation coefficient is: &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">corrcoef</span><span class="p">(</span><span class="n">subset</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">subset</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">])[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">])</span> <span class="o">+</span> <span class="s1">&#39;.&#39;</span><span class="p">)</span>
@@ -969,16 +976,16 @@ <h3>1.2 Bootstrapping<a class="headerlink" href="#bootstrapping" title="Permalin
</div>
</div>
<div class="cell_output docutils container">
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>The correlation coefficient is: -0.7196904266013272.
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>The correlation coefficient is: -0.5069452099352522.
</pre></div>
</div>
<img alt="../_images/2.6_resampling_20_1.png" src="../_images/2.6_resampling_20_1.png" />
<img alt="../_images/2.6_resampling_21_1.png" src="../_images/2.6_resampling_21_1.png" />
</div>
</div>
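<p>To see how the subset size drives the spread of the estimate, here is a self-contained sketch. It assumes hypothetical data in place of the notebook's <code class="docutils literal notranslate"><span class="pre">correlated_data</span></code> (1000 draws from a bivariate normal with an assumed true correlation of -0.7); <code class="docutils literal notranslate"><span class="pre">subset_corr</span></code> and <code class="docutils literal notranslate"><span class="pre">spread</span></code> are illustrative names, not part of the notebook.</p>

```python
import numpy as np

# Hypothetical stand-in for the notebook's `correlated_data`: 1000 draws from a
# bivariate normal with an assumed true correlation of -0.7.
rng = np.random.default_rng(42)
true_rho = -0.7
cov = [[1.0, true_rho], [true_rho, 1.0]]
correlated_data = rng.multivariate_normal([0.0, 0.0], cov, size=1000)


def subset_corr(data, nsubset, rng):
    """Pearson r of `nsubset` rows drawn without replacement from `data`."""
    subset = rng.choice(data, size=nsubset, replace=False)
    return np.corrcoef(subset[:, 0], subset[:, 1])[0, 1]


# Repeat the subset draw many times for several subset sizes and record the
# spread (standard deviation) of the resulting correlation estimates.
spread = {}
for nsubset in (10, 100, 1000):
    estimates = [subset_corr(correlated_data, nsubset, rng) for _ in range(200)]
    spread[nsubset] = np.std(estimates)
    print(f"nsubset = {nsubset:4d}: mean r = {np.mean(estimates):+.3f}, "
          f"std of r = {spread[nsubset]:.3f}")
```

<p>When <code class="docutils literal notranslate"><span class="pre">nsubset</span></code> equals the size of the whole dataset, every draw contains the same rows, so the spread collapses to (numerically) zero.</p>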
<p>We will resample the data <code class="docutils literal notranslate"><span class="pre">number_runs</span></code> times.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">number_runs</span> <span class="o">=</span><span class="mi">100</span>
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">number_runs</span> <span class="o">=</span> <span class="mi">1000</span>
</pre></div>
</div>
</div>
@@ -1010,18 +1017,21 @@ <h3>1.2 Bootstrapping<a class="headerlink" href="#bootstrapping" title="Permalin
</div>
</div>
<div class="cell_output docutils container">
<img alt="../_images/2.6_resampling_23_0.png" src="../_images/2.6_resampling_23_0.png" />
<img alt="../_images/2.6_resampling_24_0.png" src="../_images/2.6_resampling_24_0.png" />
</div>
</div>
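<p>A minimal sketch of bootstrapping Pearson's r, assuming a hypothetical 10-point subset drawn from a bivariate normal with true correlation -0.7 (not the notebook's data). Each of the <code class="docutils literal notranslate"><span class="pre">number_runs</span></code> iterations draws rows with replacement and recomputes r; summarising with the median and a 2.5/97.5 percentile interval is one common choice, assumed here for illustration.</p>

```python
import numpy as np

# Hypothetical stand-in for the notebook's subset: 10 points from a bivariate
# normal with an assumed true correlation of -0.7.
rng = np.random.default_rng(0)
true_rho = -0.7
cov = [[1.0, true_rho], [true_rho, 1.0]]
subset = rng.multivariate_normal([0.0, 0.0], cov, size=10)

number_runs = 1000
boot_r = np.empty(number_runs)
for i in range(number_runs):
    # Draw row indices WITH replacement, keeping the sample size unchanged.
    idx = rng.integers(0, len(subset), size=len(subset))
    sample = subset[idx]
    boot_r[i] = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]

# nan-aware statistics guard against a (very unlikely) resample that repeats
# a single row, for which the correlation is undefined.
median_r = np.nanmedian(boot_r)
low, high = np.nanpercentile(boot_r, [2.5, 97.5])
subset_r = np.corrcoef(subset[:, 0], subset[:, 1])[0, 1]
print(f"subset r = {subset_r:.3f}, bootstrap median r = {median_r:.3f}, "
      f"95% percentile interval = [{low:.3f}, {high:.3f}]")
```

<p>Because the bootstrap only reuses the 10 observed points, its distribution is centred roughly on the subset's own r, not necessarily on the true correlation of the full data.</p>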
<p>We see that the median Pearson coefficient is not exactly the true Pearson coefficient. Why is that? How much data was selected in the subset? What happens if you increase the subset to its maximum size?</p>
</section>
<section id="monte-carlo-resampling">
<h3>1.3 Monte Carlo Resampling<a class="headerlink" href="#monte-carlo-resampling" title="Permalink to this headline">#</a></h3>
<ul class="simple">
<li><p><strong>Data Simulation</strong>: Monte Carlo resampling involves simulating new data based on a known or assumed statistical model. It generates synthetic data points according to the specified distribution and correlation structure. It does not draw from the original dataset.</p></li>
<li><p><strong>Estimation</strong>: Monte Carlo resampling is used for simulating a wide range of possible scenarios and assessing complex systems or models. It’s particularly useful when dealing with intricate, non-linear relationships, and it can estimate a variety of statistical properties, including those involving complex simulations and models.</p></li>
<li><p><strong>Correlated Data</strong>: Monte Carlo resampling is well-suited to handling correlated data as it explicitly incorporates the correlation structure specified in the simulation model. This makes it more adaptable to scenarios with intricate dependencies.</p></li>
<li><p><strong>Data Simulation</strong>: Monte Carlo resampling involves <em>simulating new data</em> based on a known or assumed statistical model. It generates <em>synthetic data points</em> according to the specified distribution and correlation structure. It does not draw from the original dataset.</p></li>
<li><p><strong>Estimation</strong>: Monte Carlo resampling is used for <em>simulating a wide range of possible scenarios</em> and assessing complex systems or models. It is particularly useful for intricate, non-linear relationships, and it can estimate statistical properties (such as means, variances, or confidence intervals) of model outputs that have no simple closed-form expression.</p></li>
<li><p><strong>Correlated Data</strong>: Monte Carlo resampling is <em>well-suited to handling correlated data</em> because it explicitly incorporates the correlation structure specified in the simulation model. This makes it more adaptable than bootstrapping to scenarios with intricate dependencies.</p></li>
<li><p><strong>Applications</strong>: Monte Carlo resampling is often employed for probabilistic risk assessment, uncertainty propagation in models, and evaluating complex systems’ performance. It is especially valuable when dealing with complex geospatial, environmental, or engineering models, where the relationships among variables are not easily captured by simple parametric methods.</p></li>
</ul>
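<p>Unlike the bootstrap, a Monte Carlo study never draws from observed data. A minimal sketch, assuming a bivariate normal model with a hypothetical correlation <code class="docutils literal notranslate"><span class="pre">rho</span> <span class="pre">=</span> <span class="pre">-0.7</span></code>: it simulates many synthetic 10-point datasets from that model and looks at the spread of Pearson's r that such small samples produce. All parameter values are illustrative assumptions.</p>

```python
import numpy as np

# Assumed model (hypothetical parameters): bivariate normal, zero means,
# unit variances, correlation rho.
rho = -0.7
n_points = 10      # size of each synthetic dataset
n_sims = 5000      # number of Monte Carlo replicates
cov = np.array([[1.0, rho], [rho, 1.0]])

rng = np.random.default_rng(1)
# Each replicate is a brand-new synthetic dataset drawn from the model, not a
# resample of observed data. Shape: (n_sims, n_points, 2).
sims = rng.multivariate_normal(np.zeros(2), cov, size=(n_sims, n_points))

# Pearson r of every replicate at once (centre each column, then normalise).
x = sims[..., 0] - sims[..., 0].mean(axis=1, keepdims=True)
y = sims[..., 1] - sims[..., 1].mean(axis=1, keepdims=True)
r = (x * y).sum(axis=1) / np.sqrt((x**2).sum(axis=1) * (y**2).sum(axis=1))

print(f"model rho = {rho}, Monte Carlo mean r = {r.mean():.3f}, "
      f"std of r = {r.std():.3f}, "
      f"95% range = {np.percentile(r, [2.5, 97.5]).round(3)}")
```

<p>Because the correlation is written directly into the covariance matrix, the simulated datasets carry exactly the dependence structure the model assumes.</p>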
<p>We will test Monte Carlo sampling by trying to estimate <span class="math notranslate nohighlight">\(\pi\)</span>.</p>
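<p>The ratio of the area of a circle of radius <span class="math notranslate nohighlight">\(r\)</span> to the area of its circumscribing square is <span class="math notranslate nohighlight">\(\pi r^2 / (2r)^2 = \pi/4\)</span>. How can we use this knowledge to estimate a value for <span class="math notranslate nohighlight">\(\pi\)</span>?</p>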
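<p>One possible approach, sketched under the assumption that uniform random points are acceptable for the exercise: sample points uniformly in the square [-1, 1] × [-1, 1], count the fraction that lands inside the unit circle, and multiply by 4. The function name <code class="docutils literal notranslate"><span class="pre">estimate_pi</span></code> is illustrative.</p>

```python
import numpy as np


def estimate_pi(n_samples, rng):
    """Monte Carlo estimate of pi from points in the square [-1, 1] x [-1, 1]."""
    # Uniform points in the square that circumscribes the unit circle.
    xy = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    inside = (xy ** 2).sum(axis=1) <= 1.0
    # Fraction inside approximates area(circle) / area(square) = pi / 4.
    return 4.0 * inside.mean()


rng = np.random.default_rng(2023)
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,d}: pi ~ {estimate_pi(n, rng):.5f}")
```

<p>The Monte Carlo error shrinks roughly as <span class="math notranslate nohighlight">\(1/\sqrt{n}\)</span>, so each additional correct digit costs about 100 times more samples.</p>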
</section>
</section>
<section id="level-2-resampling-for-robust-model-inference">
@@ -1165,7 +1175,7 @@ <h3>2.1 Data: Geodetic time series<a class="headerlink" href="#data-geodetic-tim
<img alt="../_images/2.6_resampling_28_1.png" src="../_images/2.6_resampling_28_1.png" />
<img alt="../_images/2.6_resampling_31_1.png" src="../_images/2.6_resampling_31_1.png" />
</div>
</div>
</section>
@@ -1236,7 +1246,7 @@ <h2>2. Linear Regression<a class="headerlink" href="#linear-regression" title="P
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Coefficient / Velocity eastward (mm/year): -6.438107857588658
</pre></div>
</div>
<img alt="../_images/2.6_resampling_32_1.png" src="../_images/2.6_resampling_32_1.png" />
<img alt="../_images/2.6_resampling_35_1.png" src="../_images/2.6_resampling_35_1.png" />
</div>
</div>
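<p>A minimal sketch of how a velocity can be read off a linear fit with <code class="docutils literal notranslate"><span class="pre">sklearn</span></code>, together with two standard error metrics. It assumes a hypothetical synthetic east-component series (daily samples over 10 years, an assumed velocity of -6.4 mm/year and 2 mm white noise) in place of the notebook's GPS data.</p>

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical synthetic east-component series (NOT the notebook's GPS data):
# daily samples over 10 years, assumed velocity -6.4 mm/year, 2 mm white noise.
rng = np.random.default_rng(7)
t = np.arange(0, 10, 1 / 365.25)          # time in years
true_velocity = -6.4                      # mm/year (assumed)
east = true_velocity * t + rng.normal(0.0, 2.0, t.size)  # position in mm

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features).
X = t.reshape(-1, 1)
model = LinearRegression().fit(X, east)
pred = model.predict(X)

velocity = model.coef_[0]                 # slope of the fit = velocity (mm/year)
mse = mean_squared_error(east, pred)      # mean squared residual (mm^2)
r2 = r2_score(east, pred)                 # fraction of variance explained
print(f"Velocity eastward (mm/year): {velocity:.3f}")
print(f"MSE: {mse:.3f} mm^2, R^2: {r2:.4f}")
```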
<p>To evaluate the errors of the model fit using the <code class="docutils literal notranslate"><span class="pre">sklearn</span></code> module, we will use the following function:</p>
@@ -1299,7 +1309,7 @@ <h2>3. Bootstrapping<a class="headerlink" href="#id1" title="Permalink to this h
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>mean of the velocity estimates -6.4381444.2 and the standard deviation 0.0061004.2
</pre></div>
</div>
<img alt="../_images/2.6_resampling_36_1.png" src="../_images/2.6_resampling_36_1.png" />
<img alt="../_images/2.6_resampling_39_1.png" src="../_images/2.6_resampling_39_1.png" />
</div>
</div>
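<p>A minimal sketch of bootstrapping the regression slope, assuming a hypothetical synthetic series (assumed velocity -6.4 mm/year, 2 mm noise) and using <code class="docutils literal notranslate"><span class="pre">np.polyfit</span></code> for each refit. The printed line above (<code class="docutils literal notranslate"><span class="pre">-6.4381444.2</span></code>) looks like a <code class="docutils literal notranslate"><span class="pre">%f</span></code> placeholder followed by a literal <code class="docutils literal notranslate"><span class="pre">4.2</span></code>, so this sketch uses explicit <code class="docutils literal notranslate"><span class="pre">:.4f</span></code> format specifiers.</p>

```python
import numpy as np

# Hypothetical synthetic series (assumed velocity -6.4 mm/year, 2 mm noise);
# it stands in for the notebook's GPS data.
rng = np.random.default_rng(11)
t = np.arange(0, 10, 1 / 365.25)
east = -6.4 * t + rng.normal(0.0, 2.0, t.size)

n_boot = 1000
slopes = np.empty(n_boot)
for i in range(n_boot):
    # Resample (time, position) pairs with replacement and refit the line.
    idx = rng.integers(0, t.size, size=t.size)
    slopes[i] = np.polyfit(t[idx], east[idx], deg=1)[0]  # [0] is the slope

print(f"mean of the velocity estimates {slopes.mean():.4f} "
      f"and the standard deviation {slopes.std():.4f}")
```

<p>The standard deviation of the bootstrap slopes serves as an estimate of the velocity uncertainty without assuming a particular error distribution.</p>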
</section>
@@ -1347,7 +1357,7 @@ <h2>4. Cross validation<a class="headerlink" href="#cross-validation" title="Per
<img alt="../_images/2.6_resampling_38_1.png" src="../_images/2.6_resampling_38_1.png" />
<img alt="../_images/2.6_resampling_41_1.png" src="../_images/2.6_resampling_41_1.png" />
</div>
</div>
<p>Now fit the data and evaluate the error.</p>
@@ -1391,7 +1401,7 @@ <h2>4. Cross validation<a class="headerlink" href="#cross-validation" title="Per
<img alt="../_images/2.6_resampling_40_2.png" src="../_images/2.6_resampling_40_2.png" />
<img alt="../_images/2.6_resampling_43_2.png" src="../_images/2.6_resampling_43_2.png" />
</div>
</div>
<p>We can also split the training and validation data chronologically. If the “state” of the data changes through time, this may bias the training. Let’s see.</p>
@@ -1435,7 +1445,7 @@ <h2>4. Cross validation<a class="headerlink" href="#cross-validation" title="Per
<img alt="../_images/2.6_resampling_42_2.png" src="../_images/2.6_resampling_42_2.png" />
<img alt="../_images/2.6_resampling_45_2.png" src="../_images/2.6_resampling_45_2.png" />
</div>
</div>
<p>This shows that the choice of <em>training</em> versus <em>validation</em> data matters for fitting a model that will generalize.</p>
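<p>A minimal sketch contrasting the two splitting strategies, assuming a hypothetical series whose velocity switches from -6.4 to -3.0 mm/year after year 7 (values chosen only to mimic a change of “state”). With <code class="docutils literal notranslate"><span class="pre">shuffle=False</span></code>, scikit-learn's <code class="docutils literal notranslate"><span class="pre">train_test_split</span></code> keeps the original order, so the first 70% of samples train the model and the last 30% validate it; <code class="docutils literal notranslate"><span class="pre">validation_mse</span></code> is an illustrative helper.</p>

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical series whose "state" changes: the velocity switches from
# -6.4 to -3.0 mm/year after year 7 (assumed values, for illustration only).
rng = np.random.default_rng(3)
t = np.arange(0, 10, 1 / 365.25)
east = np.where(t < 7, -6.4 * t, -6.4 * 7 - 3.0 * (t - 7))
east = east + rng.normal(0.0, 2.0, t.size)
X = t.reshape(-1, 1)


def validation_mse(shuffle):
    """Fit on 70% of the samples and return the MSE on the remaining 30%."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, east, test_size=0.3, shuffle=shuffle, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)
    return mean_squared_error(y_va, model.predict(X_va))


mse_random = validation_mse(shuffle=True)   # random selection
mse_chrono = validation_mse(shuffle=False)  # first 70% train, last 30% validate
print(f"random split MSE:        {mse_random:.2f} mm^2")
print(f"chronological split MSE: {mse_chrono:.2f} mm^2")
```

<p>With these assumed numbers the chronological split gives a much larger validation error, because the model never sees data from after the change during training.</p>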
Binary file removed _images/2.6_resampling_20_1.png
Binary file added _images/2.6_resampling_21_1.png
Binary file removed _images/2.6_resampling_23_0.png
Binary file added _images/2.6_resampling_24_0.png
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
53 changes: 40 additions & 13 deletions _sources/Chapter2-DataManipulation/2.6_resampling.ipynb

2 changes: 1 addition & 1 deletion searchindex.js
