Language correction in data sources
khliland committed Oct 14, 2024
1 parent 42a84b8 commit 3cc4a58
Showing 144 changed files with 5,850 additions and 30,743 deletions.
1,537 changes: 753 additions & 784 deletions D2Dbook/3_Data_sources/3_APIs/downloads/forecast.json

Large diffs are not rendered by default.

41 changes: 19 additions & 22 deletions D2Dbook/3_Data_sources/3_APIs/downloads/weather.json
Original file line number Diff line number Diff line change
Expand Up @@ -5,42 +5,39 @@
},
"weather": [
{
"id": 500,
"main": "Rain",
"description": "light rain",
"icon": "10n"
"id": 804,
"main": "Clouds",
"description": "overcast clouds",
"icon": "04d"
}
],
"base": "stations",
"main": {
"temp": 282.28,
"feels_like": 280.44,
"temp_min": 281.18,
"temp_max": 282.94,
"pressure": 1006,
"humidity": 96,
"sea_level": 1006,
"grnd_level": 989
"temp": 276.3,
"feels_like": 275.22,
"temp_min": 275.5,
"temp_max": 277.64,
"pressure": 1011,
"humidity": 100,
"sea_level": 1011,
"grnd_level": 994
},
"visibility": 10000,
"wind": {
"speed": 3.3,
"deg": 103,
"gust": 9.56
},
"rain": {
"1h": 0.87
"speed": 1.34,
"deg": 357,
"gust": 2.51
},
"clouds": {
"all": 100
"all": 95
},
"dt": 1728322014,
"dt": 1728888498,
"sys": {
"type": 2,
"id": 2006772,
"country": "NO",
"sunrise": 1728279396,
"sunset": 1728318716
"sunrise": 1728885203,
"sunset": 1728922287
},
"timezone": 7200,
"id": 3139081,
Expand Down
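Note on the weather.json hunk: the record stores temperatures in kelvin and times as Unix timestamps, with a separate `timezone` offset in seconds. A minimal sketch of reading such values follows. The dictionary is a hand-copied subset of the fields shown in the hunk, taking the record with the later `dt` as the updated one (an assumption, since this rendering drops the +/- markers). `kelvin_to_celsius` is a hypothetical helper, not part of the book's code.

```python
from datetime import datetime, timedelta, timezone

# Subset of the fields shown in the hunk (assumed to be the updated values).
weather = {
    "main": {"temp": 276.3, "feels_like": 275.22},
    "dt": 1728888498,
    "sys": {"sunrise": 1728885203, "sunset": 1728922287},
    "timezone": 7200,  # offset from UTC in seconds
}


def kelvin_to_celsius(kelvin):
    """Hypothetical helper: convert kelvin to degrees Celsius."""
    return round(kelvin - 273.15, 2)


# Build a fixed-offset time zone from the record's own offset.
local_tz = timezone(timedelta(seconds=weather["timezone"]))

temp_c = kelvin_to_celsius(weather["main"]["temp"])
observed = datetime.fromtimestamp(weather["dt"], tz=local_tz)
sunrise = datetime.fromtimestamp(weather["sys"]["sunrise"], tz=local_tz)
sunset = datetime.fromtimestamp(weather["sys"]["sunset"], tz=local_tz)
day_length = sunset - sunrise

print(f"Temperature: {temp_c} °C")
print(f"Observed at: {observed.isoformat()}")
print(f"Day length:  {day_length}")
```

Passing an explicit `tz` to `datetime.fromtimestamp` keeps the result independent of the machine's local time zone.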
990 changes: 930 additions & 60 deletions D2Dbook/4_Data_quality/1_Signal/3_Decomposition.ipynb

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,7 @@
},
"source": [
"## Outliers in models\n",
"- An outlier in the input data, __X__, will influence some model more than a central point.\n",
"- An outlier in the input data, __X__, will influence some models more than a central point would do.\n",
" - In regression (Ordinary Least Squares), these are called high leverage points. \n",
" OLS for $\\tilde{X} = [1 ~ X]$: \n",
" $\\tilde{X}\\beta = \\tilde{X} (\\tilde{X}'\\tilde{X})^{-1} \\tilde{X}' Y$ \n",
Expand Down
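Note on the leverage bullet: the matrix $\tilde{X} (\tilde{X}'\tilde{X})^{-1} \tilde{X}'$ in the hunk maps $Y$ to the fitted values, and its diagonal entries are the leverages. For a single predictor with an intercept, the diagonal has the closed form $h_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$. A minimal sketch under that one-predictor assumption follows; the function name `leverage` and the sample data are illustrative and not taken from the notebook.

```python
def leverage(x):
    """Diagonal of the hat matrix for simple linear regression with intercept."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1 / n + (xi - mean) ** 2 / sxx for xi in x]


# Five central points and one point far from the rest (illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverage(x)

for xi, hi in zip(x, h):
    print(f"x = {xi:5.1f}  leverage = {hi:.3f}")

# The leverages sum to the number of fitted parameters (intercept + slope).
print(f"Sum of leverages: {sum(h):.3f}")
```

The point at x = 20 gets by far the largest leverage, which is what "influences the model more than a central point" means in the OLS case.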
9 changes: 5 additions & 4 deletions D2Dbook/4_Data_quality/2_Outliers/2_Outlier_statistics.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
},
"source": [
"# Outlier statistics\n",
"- Determining if a sample or timepoint is a statistical outlier often a two-step process:\n",
"- Determining if a sample or timepoint is a statistical outlier is often a two-step process:\n",
" 1. Estimate a distribution assumed to be normal operating conditions.\n",
" 2. Check if new samples are significant outliers from this distribution.\n",
"- (Multivariate) [Statistical Process Control](https://en.wikipedia.org/wiki/Statistical_process_control), part of [$6\\sigma$](https://en.wikipedia.org/wiki/Six_Sigma) process improvement, has a wide range of methods for this."
Expand Down Expand Up @@ -75,7 +75,7 @@
"# Format the result as a percentage rounded to two decimal places.\n",
"import scipy.stats as stats\n",
"prob = stats.norm.cdf(-3)*2\n",
"print('Probability of being outside outside +/- 3 SD in a normal distribution: {:.2%}'.format(prob))\n",
"print('Probability of being outside +/- 3 SD in a normal distribution: {:.2%}'.format(prob))\n",
"# Rewrite this as the proportion of values, i.e., 1 in n, that are outside +/- 3 SD in a normal distribution.\n",
"print('Proportion of values that are outside +/- 3 SD in a normal distribution: 1 in {:.0f}'.format(1/prob))"
]
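Note on the +/- 3 SD cell: the same probability can be cross-checked without SciPy using `statistics.NormalDist` from the standard library. The sketch below also shows how fast the false-alarm rate drops when several consecutive exceedances are required, assuming independent samples. `p_run` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided probability of falling outside +/- 3 SD.
p_outside = 2 * std_normal.cdf(-3)
one_in_n = 1 / p_outside


def p_run(k, p=p_outside):
    """Probability of k consecutive exceedances, assuming independence."""
    return p ** k


print(f"Outside +/- 3 SD: {p_outside:.2%} (1 in {one_in_n:.0f})")
print(f"Three in a row:   1 in {1 / p_run(3):.3g}")
```

The independence assumption matters: autocorrelated series produce runs of exceedances far more often than this calculation suggests.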
Expand Down Expand Up @@ -285,9 +285,10 @@
"- As seen above, to be flagged in the base case will happen by accident 1 in 370 cases.\n",
"- The probability quickly shrinks if additional requirements are added, e.g., 3 cases in a row outside +/- 3 std.\n",
" - With multiple consecutive outliers, one can also use fewer standard deviations.\n",
"- If data are assumed to be iid (independent and identically distributed), another heuristic can be to check concecutive samples are too similar (and possibly non-centred).\n",
"- If data are assumed to be iid (independent and identically distributed), another heuristic can be to check if consecutive samples are too similar (and possibly non-centred).\n",
" - This can indicate a bias in the series, e.g., caused by a manufacturing step caught in an error condition.\n",
"- A shift in mean value can also be indicative of errors or unwanted changes; checkable using rolling means or similar with an appropriate window size."
"- A shift in mean value can also be indicative of errors or unwanted changes; checkable using rolling means or similar with an appropriate window size.\n",
" - A whole field of SPC uses Exponentially Weighted Moving Averages to detect outliers and shifts in distributions."
]
},
{
Expand Down
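Note on the EWMA bullet: a minimal sketch of an EWMA control chart, assuming the in-control mean `mu` and standard deviation `sigma` are already known from normal operating conditions. The recursion is $z_t = \lambda x_t + (1-\lambda) z_{t-1}$ with $z_0 = \mu$, and the limits are $\mu \pm L\sigma\sqrt{\tfrac{\lambda}{2-\lambda}\,(1-(1-\lambda)^{2t})}$. The function name, parameter defaults and data are illustrative and not taken from the notebook.

```python
import math


def ewma_flags(values, mu=0.0, sigma=1.0, lam=0.2, width=3.0):
    """Return indices where the EWMA statistic leaves its control limits."""
    z = mu  # start the recursion at the in-control mean
    flagged = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        # Time-dependent limit; approaches its asymptote as t grows.
        limit = width * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        if abs(z - mu) > limit:
            flagged.append(t - 1)  # report 0-based index
    return flagged


# Eight in-control values, then a shift in mean of about 2 SD (made-up data).
series = [0.2, -0.1, 0.3, -0.4, 0.1, -0.2, 0.0, 0.3,
          2.1, 1.8, 2.3, 1.9, 2.2, 2.0, 2.4, 1.7]

flagged = ewma_flags(series)
print("Flagged indices:", flagged)
print("Any single value beyond 3 SD:", any(abs(v) > 3 for v in series))
```

In this example no individual value exceeds 3 SD, so a plain +/- 3 SD rule stays silent, while the EWMA statistic crosses its limit a few samples after the shift begins.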
2,084 changes: 16 additions & 2,068 deletions D2Dbook/4_Data_quality/3_Preprocessing/2_Imputation.ipynb

Large diffs are not rendered by default.

26 changes: 4 additions & 22 deletions D2Dbook/4_Data_quality/3_Preprocessing/3_Time_formats.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-08-17T11:09:26.933980Z",
Expand All @@ -34,16 +34,7 @@
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Date and time is: 2024-08-17 13:09:26.938204\n",
"Timestamp is: 1723892966.938204\n"
]
}
],
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
Expand Down Expand Up @@ -107,7 +98,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-08-17T11:09:26.974670Z",
Expand All @@ -119,16 +110,7 @@
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"An arbitrary format: 17/08-2024 13:09:26\n",
"Reformatted to the standard: 2024-08-17 13:09:26\n"
]
}
],
"outputs": [],
"source": [
"now = datetime.now()\n",
"formatted = now.strftime(\"%d/%m-%Y %H:%M:%S\")\n",
Expand Down
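Note on the cleared outputs: the two cells print the current time, so their output changes on every run. A minimal sketch with a fixed `datetime` (taken from the removed output lines) shows the same round trip reproducibly: format into the arbitrary layout, parse it back with the matching `strptime` pattern, then emit the standard layout. The variable names are illustrative.

```python
from datetime import datetime

# Fixed moment taken from the removed cell output, so results are repeatable.
moment = datetime(2024, 8, 17, 13, 9, 26)

pattern = "%d/%m-%Y %H:%M:%S"
arbitrary = moment.strftime(pattern)

# Parsing requires the same pattern that produced the string.
parsed = datetime.strptime(arbitrary, pattern)
standard = parsed.strftime("%Y-%m-%d %H:%M:%S")

print("An arbitrary format:", arbitrary)
print("Reformatted to the standard:", standard)
print("ISO 8601:", parsed.isoformat())
```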
7,090 changes: 14 additions & 7,076 deletions D2Dbook/6_Deployment2/3_Dashboard_design/2_Colours_symbols.ipynb

Large diffs are not rendered by default.

Binary file not shown.
Binary file modified D2Dbook/_build/.doctrees/environment.pickle
Binary file not shown.

Large diffs are not rendered by default.

36 changes: 18 additions & 18 deletions D2Dbook/_build/html/3_Data_sources/2_Databases/3_Cassandra.html
Original file line number Diff line number Diff line change
Expand Up @@ -578,7 +578,7 @@ <h3>Keyspace<a class="headerlink" href="#keyspace" title="Link to this heading">
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x107828a40&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x105d6b0b0&gt;
</pre></div>
</div>
</div>
Expand All @@ -599,7 +599,7 @@ <h3>Create a table<a class="headerlink" href="#create-a-table" title="Link to th
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1064aa7e0&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1062e9b20&gt;
</pre></div>
</div>
</div>
Expand All @@ -617,7 +617,7 @@ <h3>Inserting and reading data<a class="headerlink" href="#inserting-and-reading
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1064a89e0&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x107e54ec0&gt;
</pre></div>
</div>
</div>
Expand Down Expand Up @@ -662,7 +662,7 @@ <h3>Case sensitivity<a class="headerlink" href="#case-sensitivity" title="Link t
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1100211f0&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1078e21b0&gt;
</pre></div>
</div>
</div>
Expand All @@ -674,7 +674,7 @@ <h3>Case sensitivity<a class="headerlink" href="#case-sensitivity" title="Link t
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x106128410&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1061f3c50&gt;
</pre></div>
</div>
</div>
Expand Down Expand Up @@ -707,7 +707,7 @@ <h3>Case sensitivity<a class="headerlink" href="#case-sensitivity" title="Link t
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x10668c620&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1062cf170&gt;
</pre></div>
</div>
</div>
Expand Down Expand Up @@ -803,7 +803,7 @@ <h2>Cassandra filtering<a class="headerlink" href="#cassandra-filtering" title="
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x110767b00&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x107e69370&gt;
</pre></div>
</div>
</div>
Expand All @@ -819,7 +819,7 @@ <h2>Cassandra filtering<a class="headerlink" href="#cassandra-filtering" title="
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x110785370&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x107e48e00&gt;
</pre></div>
</div>
</div>
Expand Down Expand Up @@ -876,7 +876,7 @@ <h3>Unique IDs<a class="headerlink" href="#unique-ids" title="Link to this headi
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x110778a40&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1071d5610&gt;
</pre></div>
</div>
</div>
Expand All @@ -890,7 +890,7 @@ <h3>Unique IDs<a class="headerlink" href="#unique-ids" title="Link to this headi
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x110758230&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x107e754c0&gt;
</pre></div>
</div>
</div>
Expand All @@ -909,12 +909,12 @@ <h3>Unique IDs<a class="headerlink" href="#unique-ids" title="Link to this headi
</div>
</div>
<div class="cell_output docutils container">
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Row(id=UUID(&#39;067b7ed0-84d2-11ef-b0c1-63fd87a2383f&#39;), company=&#39;Tesla&#39;, model=&#39;Model S&#39;, price=21000.0)
Datetime: 2024-10-07 17:31:45.853000
Row(id=UUID(&#39;067bccf0-84d2-11ef-b0c1-63fd87a2383f&#39;), company=&#39;Oldsmobile&#39;, model=&#39;Model 6C&#39;, price=135000.0)
Datetime: 2024-10-07 17:31:45.855000
Row(id=UUID(&#39;067b09a0-84d2-11ef-b0c1-63fd87a2383f&#39;), company=&#39;Tesla&#39;, model=&#39;Model S&#39;, price=20000.0)
Datetime: 2024-10-07 17:31:45.850000
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Row(id=UUID(&#39;2ee6b120-89f8-11ef-abe3-8360e2880d90&#39;), company=&#39;Tesla&#39;, model=&#39;Model S&#39;, price=21000.0)
Datetime: 2024-10-14 06:47:30.354000
Row(id=UUID(&#39;2ee72650-89f8-11ef-abe3-8360e2880d90&#39;), company=&#39;Oldsmobile&#39;, model=&#39;Model 6C&#39;, price=135000.0)
Datetime: 2024-10-14 06:47:30.357000
Row(id=UUID(&#39;2ee63bf0-89f8-11ef-abe3-8360e2880d90&#39;), company=&#39;Tesla&#39;, model=&#39;Model S&#39;, price=20000.0)
Datetime: 2024-10-14 06:47:30.351000
</pre></div>
</div>
</div>
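Note on the Unique IDs output: the `Datetime:` lines can be recovered from the version 1 UUIDs themselves, because a UUID1 embeds a timestamp counted in 100-nanosecond steps since 1582-10-15. A minimal standard-library sketch follows; `uuid1_to_datetime` is a hypothetical helper written for this note, not necessarily how the notebook computes the value, and the result is a naive UTC datetime.

```python
import uuid
from datetime import datetime, timedelta

# Start of the Gregorian calendar, the zero point of UUID1 timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15)


def uuid1_to_datetime(u):
    """Hypothetical helper: naive UTC datetime embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


# First row id from the updated output in the hunk.
row_id = uuid.UUID("2ee6b120-89f8-11ef-abe3-8360e2880d90")
created = uuid1_to_datetime(row_id)
print("Datetime:", created)
```

For this id the result agrees with the `Datetime:` line printed next to the same row in the hunk.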
Expand Down Expand Up @@ -969,7 +969,7 @@ <h2>Raw JSON<a class="headerlink" href="#raw-json" title="Link to this heading">
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1107849e0&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x107e49370&gt;
</pre></div>
</div>
</div>
Expand All @@ -983,7 +983,7 @@ <h3>Insert the forecast data into the table as text blob<a class="headerlink" hr
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x110766720&gt;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;cassandra.cluster.ResultSet at 0x1078e0b00&gt;
</pre></div>
</div>
</div>
Expand Down