Add more About-page info on frontend

Co-authored-by: Emil Karlsson <[email protected]>
pierrelefevre · Jan 8, 2024 · a6fa919
1 parent e9fbfb5
commit a6fa919
Showing 7 changed files with 170 additions and 25 deletions.
diff --git a/README.md b/README.md
@@ -98,18 +98,18 @@ We aimed for transparency in the model, which is why we included a page presenti
 
 ## Results
 ### Model evaluation pipeline
-To ensure we monitor for model drift and inference skew, we designed a model evaluation pipeline.
-The issue is - what data should we test with, since all of the sold listings have been used in some effect (train/test/validate)? 
-Since the idea is to run this pipeline once a day, we can simply use the listings that were sold in the last 24 hours, which the model has not been trained on yet.
+To ensure we monitor for model drift and inference skew, we designed a batch inference pipeline that runs every hour. This pipeline takes all listings sold since the model was trained that have not already been included in a batch inference run, and runs them through the model. The results are then stored in the database. Running it every hour means that, in practice, the batches are quite small (apart from the first one if the pipeline has been shut down for a while).
+By selecting only listings with an ingestion time after the model was trained, we ensure that we are not testing the model on data it has already seen.
 
+The code for the batch inference pipeline is in the `inference` folder. The code is written in Python and can be run on any computer with systemd and Python 3 installed.
 
 ### Comparison against Booli
 ![image](https://github.com/pierrelefevre/bostadspriser/assets/35996839/ff3872ce-c5ea-445a-8111-c04341d2099d)
 
 As a good sanity check, we wanted to compare to an established source of property price predictions in Sweden. Booli has a [free tool](https://www.booli.se/vardera) for predicting prices with most parameters overlapping ours. 
 We designed a test set of properties to be quite broad, yet we could not test summer houses, plots nor farms as these are not supported by Booli. 
 
-Most results were within 20% of the estimated price by Booli, however it is clear that the prices in Stockholm, Göteborg and Malmö are much more accurate than those outsite these larger cities.
+Most results were within 20% of Booli's estimated price; however, it is clear that our predictions for Stockholm, Göteborg and Malmö are much closer to Booli's than those outside these larger cities. This is likely because Booli (probably) has more data for these areas. If we are close to Booli's predictions wherever we have enough data to train a model, we have likely built a robust model.
 
 ## Conclusion
 ### Final words
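
The selection rule described in the README change above (only sold listings ingested after the model was trained, and not yet covered by an earlier batch) can be sketched as follows. This is a minimal illustration, not the code in the `inference` folder: the function name `select_unseen_listings` and the field names `createdAt` and `url` are assumptions made for the example.

```python
from datetime import datetime, timezone


def select_unseen_listings(listings, trained_at, already_predicted_urls):
    """Return listings ingested after `trained_at` that no earlier batch covered.

    `createdAt` and `url` are hypothetical field names used for illustration.
    """
    return [
        listing
        for listing in listings
        if listing["createdAt"] > trained_at
        and listing["url"] not in already_predicted_urls
    ]


# Hypothetical data: one listing predates training, one was already batched.
trained_at = datetime(2024, 1, 7, 14, 0, tzinfo=timezone.utc)
listings = [
    {"url": "a", "createdAt": datetime(2024, 1, 6, 9, 0, tzinfo=timezone.utc)},
    {"url": "b", "createdAt": datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc)},
    {"url": "c", "createdAt": datetime(2024, 1, 8, 10, 0, tzinfo=timezone.utc)},
]
batch = select_unseen_listings(listings, trained_at, already_predicted_urls={"c"})
```

Comparing `datetime` objects rather than timestamp strings avoids depending on every timestamp sharing one exact string format.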

diff --git a/README.pdf b/README.pdf
diff --git a/api/helpers.py b/api/helpers.py
@@ -53,12 +53,16 @@ def choose_model(params):
     new_params = params.copy()
 
     # 1. cpi is not in the "combine-cpi" models
-    ### However, after 2024-01-07:14:00:00, cpi should be present
-    if "combine-cpi" in model["name"] and "cpi" in params.keys() and model["metadata"]["trainedAt"] < "2024-01-07:14:00:00":
+    # However, after 2024-01-07:14:00:00, cpi should be present
+    if (
+        "combine-cpi" in model["name"]
+        and "cpi" in params.keys()
+        and model["metadata"]["trainedAt"] < "2024-01-07:14:00:00"
+    ):
         del new_params["cpi"]
 
     # 2. soldAt was parsed differently before
-    ### If model is trained earlier than 2024-01-07:14:00:00, remove "yearsSinceSold", otherwise remove "soldYear" and "soldMonth"
+    # If model is trained earlier than 2024-01-07:14:00:00, remove "yearsSinceSold", otherwise remove "soldYear" and "soldMonth"
     if model["metadata"]["trainedAt"] < "2024-01-07:14:00:00":
         del new_params["yearsSinceSold"]
     else:
@@ -75,7 +79,9 @@ def get_prediction_results():
     # Group per day
     predictions_per_day = {}
     for prediction in predictions:
-        date = prediction["listingCreatedAt"].replace(hour=0, minute=0, second=0, microsecond=0)
+        date = prediction["listingCreatedAt"].replace(
+            hour=0, minute=0, second=0, microsecond=0
+        )
         if date not in predictions_per_day:
             predictions_per_day[date] = []
 
@@ -103,25 +109,29 @@ def get_prediction_results():
         rmse_x.append(results["createdAt"].isoformat())
         rmse_y.append(results["rmse"])
 
-    predictions_x = []
-    predictions_y = []
-    labels_x = []
-    labels_y = []
+    prediction_diff_x = []
+    prediction_diff_y = []
+    prediction_diff_precent_y = []
 
     for prediction in predictions:
-        predictions_x.append(prediction["listingCreatedAt"].isoformat())
-        predictions_y.append(prediction["prediction"])
-        labels_x.append(prediction["listingCreatedAt"].isoformat())
-        labels_y.append(prediction["label"])
+        prediction_diff_x.append(prediction["listingCreatedAt"].isoformat())
+        prediction_diff_y.append(prediction["prediction"] - prediction["label"])
+        prediction_diff_precent_y.append(
+            (prediction["prediction"] - prediction["label"]) / prediction["label"]
+        )
 
     print(len(predictions))
 
     return {
         "rmse": {"x": rmse_x, "y": rmse_y},
-        "predictions": {"x": predictions_x, "y": predictions_y},
-        "labels": {"x": labels_x, "y": labels_y},
+        "predictions": {
+            "x": prediction_diff_x,
+            "y": prediction_diff_y,
+            "yPercent": prediction_diff_precent_y,
+        },
     }
 
+
 def get_live_listing_prediction(url: str):
     # Check if the listing is in the database, otherwise it is treated as a non-existent listing
     return db.get_live_listing_by_url(url)
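
As a rough illustration of the error series built in `get_prediction_results`, the sketch below computes the signed error and the relative error per prediction, plus an RMSE per day. It is a standalone example under stated assumptions: the helper names `prediction_errors` and `rmse_per_day` are hypothetical, the guard against a zero label is an addition that the diff does not contain, and grouping RMSE by day is assumed from the "Group per day" comment rather than confirmed by the source.

```python
import math
from collections import defaultdict
from datetime import datetime


def prediction_errors(predictions):
    """Build x / y / yPercent series like the API response in the diff."""
    xs, diffs, rel_diffs = [], [], []
    for p in predictions:
        diff = p["prediction"] - p["label"]
        xs.append(p["listingCreatedAt"].isoformat())
        diffs.append(diff)
        # Assumption: emit None instead of dividing by a zero label.
        rel_diffs.append(diff / p["label"] if p["label"] else None)
    return {"x": xs, "y": diffs, "yPercent": rel_diffs}


def rmse_per_day(predictions):
    """Group squared errors by calendar day and take the root of the mean."""
    squared = defaultdict(list)
    for p in predictions:
        day = p["listingCreatedAt"].replace(hour=0, minute=0, second=0, microsecond=0)
        squared[day].append((p["prediction"] - p["label"]) ** 2)
    return {day: math.sqrt(sum(v) / len(v)) for day, v in squared.items()}


# Hypothetical sample: two predictions on the same day.
sample = [
    {"listingCreatedAt": datetime(2024, 1, 8, 9, 30), "prediction": 110, "label": 100},
    {"listingCreatedAt": datetime(2024, 1, 8, 15, 0), "prediction": 90, "label": 100},
]
errors = prediction_errors(sample)
daily_rmse = rmse_per_day(sample)
```

Note that `yPercent` holds a fraction (0.1 means 10%), so a chart that labels it as a percentage would need to multiply by 100 first.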

diff --git a/frontend/src/api/api.js b/frontend/src/api/api.js
@@ -1,5 +1,5 @@
-const api_url = "https://bostadspriser-api.app.cloud.cbh.kth.se";
-// const api_url = "http://localhost:8080";
+export const api_url = "https://bostadspriser-api.app.cloud.cbh.kth.se";
+// export const api_url = "http://localhost:8080";
 
 export const getListings = async (page, pageSize) => {
   if (page === undefined) {
@@ -53,4 +53,11 @@ export const predictWithHemnetURL = async (url) => {
   }
 
   return response.json();
-}
+};
+
+export const getCronPredictions = async () => {
+  const response = await fetch(api_url + "/predictions");
+  const data = await response.json();
+
+  return data;
+};
diff --git a/frontend/src/components/Navbar.jsx b/frontend/src/components/Navbar.jsx
@@ -1,4 +1,5 @@
 import {
+  Alert,
   AppBar,
   Button,
   IconButton,
@@ -11,6 +12,7 @@ import { useEffect, useState } from "react";
 import Iconify from "./Iconify";
 import { Link } from "react-router-dom";
 import { useLocation } from "react-router-dom";
+import { api_url } from "../api/api";
 
 const Navbar = () => {
   const [currentTab, setCurrentTab] = useState(0);
@@ -60,6 +62,12 @@ const Navbar = () => {
           Bostadspriser
         </Typography>
 
+        {api_url.includes("localhost") && (
+          <Alert severity="error" sx={{ mr: 1 }}>
+            Using localhost API!
+          </Alert>
+        )}
+
         <Tabs value={currentTab}>
           <Tab
             label="Listings"

diff --git a/frontend/src/contexts/ResourceContext.jsx b/frontend/src/contexts/ResourceContext.jsx
@@ -1,5 +1,5 @@
 import { useState, createContext, useEffect } from "react";
-import { getListings, getModels } from "../api/api";
+import { getCronPredictions, getListings, getModels } from "../api/api";
 
 const initialState = {
   listings: [],
@@ -12,6 +12,7 @@ export const ResourceContext = createContext({
 export const ResourceContextProvider = ({ children }) => {
   const [listings, setListings] = useState([]);
   const [models, setModels] = useState([]);
+  const [cronPredictions, setCronPredictions] = useState([]);
   const [page, setPage] = useState(0);
   const n = 10;
 
@@ -28,6 +29,11 @@ export const ResourceContextProvider = ({ children }) => {
     setModels(data);
   };
 
+  const fetchCronPredictions = async () => {
+    let data = await getCronPredictions();
+    setCronPredictions(data);
+  };
+
   const nextPage = () => {
     setPage(page + 1);
     fetchListings();
@@ -36,6 +42,7 @@ export const ResourceContextProvider = ({ children }) => {
   useEffect(() => {
     fetchListings();
     fetchModels();
+    fetchCronPredictions();
     // eslint-disable-next-line react-hooks/exhaustive-deps
   }, []);
 
@@ -44,6 +51,7 @@ export const ResourceContextProvider = ({ children }) => {
       value={{
         listings,
         models,
+        cronPredictions,
         nextPage,
       }}
     >

diff --git a/frontend/src/pages/About.jsx b/frontend/src/pages/About.jsx
@@ -2,6 +2,7 @@ import {
   Accordion,
   AccordionDetails,
   AccordionSummary,
+  Box,
   Card,
   CardContent,
   CardHeader,
@@ -23,7 +24,7 @@ import { useEffect, useState } from "react";
 import Iconify from "../components/Iconify";
 
 const About = () => {
-  const { models } = useResource();
+  const { models, cronPredictions } = useResource();
   const [dataset, setDataset] = useState([]);
 
   useEffect(() => {
@@ -167,7 +168,7 @@ const About = () => {
           p: 3,
         }}
       >
-        <CardHeader title="Model performance" />
+        <CardHeader title="Training results" />
         <CardContent>
           <LineChart
             dataset={dataset}
@@ -177,12 +178,123 @@ const About = () => {
               { dataKey: "mse", label: "MSE" },
               { dataKey: "rmse", label: "RMSE" },
             ]}
-            width={600}
+            style={{ width: "100%" }}
             height={300}
             yAxis={[{ scaleType: "log" }]}
           />
         </CardContent>
       </Card>
+      {cronPredictions && (
+        <Card
+          sx={{
+            border: 2,
+            borderColor: "#018e51",
+            borderRadius: 2,
+            boxShadow: 5,
+            p: 3,
+          }}
+        >
+          <CardHeader title="Batch inference results" />
+          <CardContent>
+            <Stack spacing={2}>
+              {cronPredictions["predictions"] && (
+                <LineChart
+                  xAxis={[
+                    {
+                      scaleType: "time",
+                      data: cronPredictions["predictions"].x.map(
+                        (x) => new Date(x)
+                      ),
+                    },
+                  ]}
+                  series={[
+                    {
+                      label: "prediction error over time (relative)",
+                      data: cronPredictions["predictions"].yPercent,
+                    },
+                  ]}
+                  style={{ width: "100%" }}
+                  height={300}
+                />
+              )}
+
+              <Box sx={{ width: "100%", overflowX: "scroll" }}>
+                {cronPredictions["predictions"] && (
+                  <LineChart
+                    xAxis={[
+                      {
+                        scaleType: "time",
+                        data: cronPredictions["predictions"].x.map(
+                          (x) => new Date(x)
+                        ),
+                      },
+                    ]}
+                    series={[
+                      {
+                        label: "prediction error over time",
+                        data: cronPredictions["predictions"].y,
+                      },
+                    ]}
+                    width={5000}
+                    height={300}
+                  />
+                )}
+              </Box>
+
+              {cronPredictions["rmse"] && (
+                <LineChart
+                  xAxis={[
+                    {
+                      scaleType: "time",
+                      data: cronPredictions["rmse"].x.map((x) => new Date(x)),
+                    },
+                  ]}
+                  series={[{ label: "RMSE", data: cronPredictions["rmse"].y }]}
+                  style={{ width: "100%" }}
+                  height={300}
+                />
+              )}
+            </Stack>
+          </CardContent>
+        </Card>
+      )}
+
+      <Card
+        sx={{
+          border: 2,
+          borderColor: "#018e51",
+          borderRadius: 2,
+          boxShadow: 5,
+          p: 3,
+        }}
+      >
+        <CardHeader title="Comparing to Booli" />
+        <CardContent>
+          <Stack spacing={2}>
+            <Typography variant="body1">
+              As a good sanity check, we wanted to compare to an established
+              source of property price predictions in Sweden. Booli has a{" "}
+              <a href="https://www.booli.se/vardera">free tool</a> for
+              predicting prices with most parameters overlapping ours. We
+              designed a test set of properties to be quite broad, yet we could
+              not test summer houses, plots nor farms as these are not supported
+              by Booli.
+            </Typography>
+            <Typography variant="body1">
+              Most results were within 20% of the estimated price by Booli,
+              however it is clear that the prices in Stockholm, Göteborg and
+              Malmö are much more accurate than those outside these larger
+              cities.
+            </Typography>
+
+            <iframe
+              width="100%"
+              height="315px"
+              src="https://docs.google.com/spreadsheets/d/e/2PACX-1vSOHaDYHNmf6ToxAlEQie3N7AxtFhvhdsUhHoRFriU_Xpnd0flEGz9TYqDkh78r9AY_Qj-WtcCnWyJ9/pubhtml?gid=0&amp;single=true&amp;widget=true&amp;headers=false"
+            ></iframe>
+          </Stack>
+        </CardContent>
+      </Card>
       <Card
         sx={{
           border: 2,