Add more about info on frontend
Co-authored-by: Emil Karlsson <[email protected]>
pierrelefevre and saffronjam committed Jan 8, 2024
1 parent e9fbfb5 commit a6fa919
Showing 7 changed files with 170 additions and 25 deletions.
8 changes: 4 additions & 4 deletions README.md
Expand Up @@ -98,18 +98,18 @@ We aimed for transparency in the model, which is why we included a page presenti

## Results
### Model evaluation pipeline
To ensure we monitor for model drift and inference skew, we designed a model evaluation pipeline.
The issue is - what data should we test with, since all of the sold listings have been used in some effect (train/test/validate)?
Since the idea is to run this pipeline once a day, we can simply use the listings that were sold in the last 24 hours, which the model has not been trained on yet.
To monitor for model drift and inference skew, we designed a batch inference pipeline that runs every hour. The pipeline takes all listings sold since the model was trained that have not already been part of a batch inference, and runs them through the model. The results are then stored in the database. Running it every hour means that, in practice, the batches are quite small (apart from the first one if the pipeline has been "shut down" for a while).
By selecting only the listings with an injection time after the model was trained, we ensure that we are not testing the model on data it has already seen.

The code for the batch inference pipeline is in the `inference` folder. The code is written in Python and can be run on any computer with systemd and Python 3 installed.
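
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in the `inference` folder: the function name `select_batch`, the `url` and `createdAt` keys, and the in-memory set of already-predicted URLs are all assumptions made for the example.

```python
from datetime import datetime


def select_batch(listings, trained_at, already_predicted):
    """Return listings added after `trained_at` that have no stored prediction.

    `listings` is a list of dicts with hypothetical `url` and `createdAt` keys;
    `already_predicted` is a set of URLs handled by earlier batches.
    """
    return [
        listing
        for listing in listings
        if listing["createdAt"] > trained_at
        and listing["url"] not in already_predicted
    ]


# Made-up listings, for illustration only
listings = [
    {"url": "a", "createdAt": datetime(2024, 1, 6, 12, 0)},  # before training
    {"url": "b", "createdAt": datetime(2024, 1, 7, 15, 0)},  # already predicted
    {"url": "c", "createdAt": datetime(2024, 1, 8, 9, 0)},   # new and unseen
]
batch = select_batch(listings, datetime(2024, 1, 7, 14, 0), {"b"})
print([listing["url"] for listing in batch])  # prints ['c']
```

Filtering on both conditions is what keeps hourly batches small: each run only picks up what arrived since the previous one.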

### Comparison against Booli
![image](https://github.com/pierrelefevre/bostadspriser/assets/35996839/ff3872ce-c5ea-445a-8111-c04341d2099d)

As a good sanity check, we wanted to compare to an established source of property price predictions in Sweden. Booli has a [free tool](https://www.booli.se/vardera) for predicting prices with most parameters overlapping ours.
We designed a test set of properties to be quite broad, yet we could not test summer houses, plots nor farms as these are not supported by Booli.

Most results were within 20% of the estimated price by Booli, however it is clear that the prices in Stockholm, Göteborg and Malmö are much more accurate than those outsite these larger cities.
Most results were within 20% of Booli's estimated price. However, it is clear that the predictions for Stockholm, Göteborg and Malmö are much more accurate than those outside these larger cities. This is likely because Booli (probably) has more data for these areas. If our predictions are close to Booli's wherever there is enough data to train a model, we have likely built a robust model.
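
As a rough sketch of how a "within 20%" figure could be computed, the snippet below measures the relative difference between our estimate and a reference estimate for each property. The helper names and the price pairs are made up for illustration; they are not the values from our comparison.

```python
def relative_difference(ours, reference):
    """Absolute difference between two estimates, relative to the reference."""
    return abs(ours - reference) / reference


def share_within(pairs, tolerance=0.20):
    """Fraction of (ours, reference) pairs whose relative difference is within tolerance."""
    if not pairs:
        return 0.0
    hits = sum(
        1 for ours, reference in pairs
        if relative_difference(ours, reference) <= tolerance
    )
    return hits / len(pairs)


# Hypothetical (our estimate, reference estimate) pairs in SEK
pairs = [
    (3_000_000, 3_200_000),
    (1_500_000, 2_100_000),
    (4_800_000, 4_500_000),
    (900_000, 1_000_000),
]
print(share_within(pairs))  # prints 0.75
```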

## Conclusion
### Final words
Binary file removed README.pdf
Binary file not shown.
38 changes: 24 additions & 14 deletions api/helpers.py
Expand Up @@ -53,12 +53,16 @@ def choose_model(params):
new_params = params.copy()

# 1. cpi is not in the "combine-cpi" models
### However, after 2024-01-07:14:00:00, cpi should be present
if "combine-cpi" in model["name"] and "cpi" in params.keys() and model["metadata"]["trainedAt"] < "2024-01-07:14:00:00":
# However, after 2024-01-07:14:00:00, cpi should be present
if (
"combine-cpi" in model["name"]
and "cpi" in params.keys()
and model["metadata"]["trainedAt"] < "2024-01-07:14:00:00"
):
del new_params["cpi"]

# 2. soldAt was parsed differently before
### If model is trained earlier than 2024-01-07:14:00:00, remove "yearsSinceSold", otherwise remove "soldYear" and "soldMonth"
# If model is trained earlier than 2024-01-07:14:00:00, remove "yearsSinceSold", otherwise remove "soldYear" and "soldMonth"
if model["metadata"]["trainedAt"] < "2024-01-07:14:00:00":
del new_params["yearsSinceSold"]
else:
Expand All @@ -75,7 +79,9 @@ def get_prediction_results():
# Group per day
predictions_per_day = {}
for prediction in predictions:
date = prediction["listingCreatedAt"].replace(hour=0, minute=0, second=0, microsecond=0)
date = prediction["listingCreatedAt"].replace(
hour=0, minute=0, second=0, microsecond=0
)
if date not in predictions_per_day:
predictions_per_day[date] = []

Expand Down Expand Up @@ -103,25 +109,29 @@ def get_prediction_results():
rmse_x.append(results["createdAt"].isoformat())
rmse_y.append(results["rmse"])

predictions_x = []
predictions_y = []
labels_x = []
labels_y = []
prediction_diff_x = []
prediction_diff_y = []
prediction_diff_precent_y = []

for prediction in predictions:
predictions_x.append(prediction["listingCreatedAt"].isoformat())
predictions_y.append(prediction["prediction"])
labels_x.append(prediction["listingCreatedAt"].isoformat())
labels_y.append(prediction["label"])
prediction_diff_x.append(prediction["listingCreatedAt"].isoformat())
prediction_diff_y.append(prediction["prediction"] - prediction["label"])
prediction_diff_precent_y.append(
(prediction["prediction"] - prediction["label"]) / prediction["label"]
)

print(len(predictions))

return {
"rmse": {"x": rmse_x, "y": rmse_y},
"predictions": {"x": predictions_x, "y": predictions_y},
"labels": {"x": labels_x, "y": labels_y},
"predictions": {
"x": prediction_diff_x,
"y": prediction_diff_y,
"yPercent": prediction_diff_precent_y,
},
}


def get_live_listing_prediction(url: str):
# Check if the listing is in the database, otherwise it is treated as a non-existent listing
return db.get_live_listing_by_url(url)
13 changes: 10 additions & 3 deletions frontend/src/api/api.js
@@ -1,5 +1,5 @@
const api_url = "https://bostadspriser-api.app.cloud.cbh.kth.se";
// const api_url = "http://localhost:8080";
export const api_url = "https://bostadspriser-api.app.cloud.cbh.kth.se";
// export const api_url = "http://localhost:8080";

export const getListings = async (page, pageSize) => {
if (page === undefined) {
Expand Down Expand Up @@ -53,4 +53,11 @@ export const predictWithHemnetURL = async (url) => {
}

return response.json();
}
};

export const getCronPredictions = async () => {
const response = await fetch(api_url + "/predictions");
const data = await response.json();

return data;
};
8 changes: 8 additions & 0 deletions frontend/src/components/Navbar.jsx
@@ -1,4 +1,5 @@
import {
Alert,
AppBar,
Button,
IconButton,
Expand All @@ -11,6 +12,7 @@ import { useEffect, useState } from "react";
import Iconify from "./Iconify";
import { Link } from "react-router-dom";
import { useLocation } from "react-router-dom";
import { api_url } from "../api/api";

const Navbar = () => {
const [currentTab, setCurrentTab] = useState(0);
Expand Down Expand Up @@ -60,6 +62,12 @@ const Navbar = () => {
Bostadspriser
</Typography>

{api_url.includes("localhost") && (
<Alert severity="error" sx={{ mr: 1 }}>
Using localhost API!
</Alert>
)}

<Tabs value={currentTab}>
<Tab
label="Listings"
10 changes: 9 additions & 1 deletion frontend/src/contexts/ResourceContext.jsx
@@ -1,5 +1,5 @@
import { useState, createContext, useEffect } from "react";
import { getListings, getModels } from "../api/api";
import { getCronPredictions, getListings, getModels } from "../api/api";

const initialState = {
listings: [],
Expand All @@ -12,6 +12,7 @@ export const ResourceContext = createContext({
export const ResourceContextProvider = ({ children }) => {
const [listings, setListings] = useState([]);
const [models, setModels] = useState([]);
const [cronPredictions, setCronPredictions] = useState([]);
const [page, setPage] = useState(0);
const n = 10;

Expand All @@ -28,6 +29,11 @@ export const ResourceContextProvider = ({ children }) => {
setModels(data);
};

const fetchCronPredictions = async () => {
let data = await getCronPredictions();
setCronPredictions(data);
};

const nextPage = () => {
setPage(page + 1);
fetchListings();
Expand All @@ -36,6 +42,7 @@ export const ResourceContextProvider = ({ children }) => {
useEffect(() => {
fetchListings();
fetchModels();
fetchCronPredictions();
// eslint-disable-next-line react-hooks/exhaustive-deps
}, []);

Expand All @@ -44,6 +51,7 @@ export const ResourceContextProvider = ({ children }) => {
value={{
listings,
models,
cronPredictions,
nextPage,
}}
>
118 changes: 115 additions & 3 deletions frontend/src/pages/About.jsx
Expand Up @@ -2,6 +2,7 @@ import {
Accordion,
AccordionDetails,
AccordionSummary,
Box,
Card,
CardContent,
CardHeader,
Expand All @@ -23,7 +24,7 @@ import { useEffect, useState } from "react";
import Iconify from "../components/Iconify";

const About = () => {
const { models } = useResource();
const { models, cronPredictions } = useResource();
const [dataset, setDataset] = useState([]);

useEffect(() => {
Expand Down Expand Up @@ -167,7 +168,7 @@ const About = () => {
p: 3,
}}
>
<CardHeader title="Model performance" />
<CardHeader title="Training results" />
<CardContent>
<LineChart
dataset={dataset}
Expand All @@ -177,12 +178,123 @@ const About = () => {
{ dataKey: "mse", label: "MSE" },
{ dataKey: "rmse", label: "RMSE" },
]}
width={600}
style={{ width: "100%" }}
height={300}
yAxis={[{ scaleType: "log" }]}
/>
</CardContent>
</Card>
{cronPredictions && (
<Card
sx={{
border: 2,
borderColor: "#018e51",
borderRadius: 2,
boxShadow: 5,
p: 3,
}}
>
<CardHeader title="Batch inference results" />
<CardContent>
<Stack spacing={2}>
{cronPredictions["predictions"] && (
<LineChart
xAxis={[
{
scaleType: "time",
data: cronPredictions["predictions"].x.map(
(x) => new Date(x)
),
},
]}
series={[
{
label: "predictions over time (change in %)",
data: cronPredictions["predictions"].yPercent,
},
]}
style={{ width: "100%" }}
height={300}
/>
)}

<Box sx={{ width: "100%", overflowX: "scroll" }}>
{cronPredictions["predictions"] && (
<LineChart
xAxis={[
{
scaleType: "time",
data: cronPredictions["predictions"].x.map(
(x) => new Date(x)
),
},
]}
series={[
{
label: "predictions over time ",
data: cronPredictions["predictions"].y,
},
]}
width={5000}
height={300}
/>
)}
</Box>

{cronPredictions["rmse"] && (
<LineChart
xAxis={[
{
scaleType: "time",
data: cronPredictions["rmse"].x.map((x) => new Date(x)),
},
]}
series={[{ label: "RMSE", data: cronPredictions["rmse"].y }]}
style={{ width: "100%" }}
height={300}
/>
)}
</Stack>
</CardContent>
</Card>
)}

<Card
sx={{
border: 2,
borderColor: "#018e51",
borderRadius: 2,
boxShadow: 5,
p: 3,
}}
>
<CardHeader title="Comparing to Booli" />
<CardContent>
<Stack spacing={2}>
<Typography variant="body1">
As a good sanity check, we wanted to compare to an established
source of property price predictions in Sweden. Booli has a{" "}
<a href="https://www.booli.se/vardera">free tool</a> for
predicting prices with most parameters overlapping ours. We
designed a test set of properties to be quite broad, yet we could
not test summer houses, plots nor farms as these are not supported
by Booli.
</Typography>
<Typography variant="body1">
Most results were within 20% of the estimated price by Booli,
however it is clear that the prices in Stockholm, Göteborg and
Malmö are much more accurate than those outside these larger
cities.
</Typography>

<iframe
width="100%"
height="315px"
src="https://docs.google.com/spreadsheets/d/e/2PACX-1vSOHaDYHNmf6ToxAlEQie3N7AxtFhvhdsUhHoRFriU_Xpnd0flEGz9TYqDkh78r9AY_Qj-WtcCnWyJ9/pubhtml?gid=0&amp;single=true&amp;widget=true&amp;headers=false"
></iframe>
</Stack>
</CardContent>
</Card>
<Card
sx={{
border: 2,
