This project implements machine learning models to predict house prices using the Ames Housing dataset. The implementation includes a comprehensive data preprocessing pipeline, model training, evaluation, and hyperparameter tuning to improve prediction accuracy.
The dataset contains information about residential homes in Ames, Iowa, with 79 explanatory variables describing various aspects of the houses:
- train.csv: Training data with 1460 observations, including the target variable SalePrice
- test.csv: Test data with 1459 observations used for making predictions
- data_description.txt: Detailed description of all variables in the dataset
- testmodel.ipynb: Main notebook with the complete modeling pipeline
- housepriceprediction.ipynb: Additional exploratory notebook
- percobaan.ipynb: Notebook for experimental approaches
- submission.csv: Predictions file in the format required for submission
- requirements.txt: Python dependencies required for the project
- data/: Directory containing all dataset files
The preprocessing pipeline, implemented in the preprocess_house_data function, includes:
- Handling missing values with different strategies based on variable type and missing percentage
- Feature transformation (logarithmic, Yeo-Johnson) for skewed numerical variables
- Categorical encoding with ordinal mapping based on target relationship
- Feature engineering including temporal variable transformations
- Feature selection using Lasso regularization
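As a minimal sketch of one of the steps above, the snippet below applies a log transform to skewed numeric columns. The helper name and the skewness threshold are illustrative assumptions; the full logic (missing values, Yeo-Johnson, ordinal encoding, Lasso selection) lives in preprocess_house_data in the notebooks.

```python
import numpy as np

def log_transform_skewed(X, skew_threshold=0.75):
    """Apply log1p to columns whose sample skewness exceeds the threshold.
    Hypothetical helper illustrating the skew-correction step only."""
    X = X.astype(float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        m, s = col.mean(), col.std()
        # sample skewness: E[(x - mean)^3] / std^3
        skew = np.mean((col - m) ** 3) / (s ** 3) if s > 0 else 0.0
        if skew > skew_threshold:
            X[:, j] = np.log1p(col)
    return X
```

Log-transforming right-skewed variables such as SalePrice or lot area brings them closer to normality, which generally helps the linear models in the comparison.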
The project evaluates multiple regression models:
- Linear models: Linear Regression, Ridge, Lasso, ElasticNet
- Tree-based models: Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost
- Other models: SVR, KNN
Models are evaluated using:
- Cross-validation with 5 folds
- Metrics: RMSE, MSE, and R²
- Visualization of comparative performance
- Hyperparameter tuning using GridSearchCV
- Ensemble modeling with the best-performing models
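The 5-fold RMSE evaluation can be sketched as below. This is a dependency-free illustration, not the project's cross_val_evaluate; the fit_predict callback and the mean-predictor baseline are assumptions for the example.

```python
import numpy as np

def cross_val_rmse(fit_predict, X, y, k=5, seed=0):
    """k-fold cross-validated RMSE (sketch of the evaluation idea).
    fit_predict(X_tr, y_tr, X_te) must return predictions for X_te."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(X[tr], y[tr], X[te])
        scores.append(np.sqrt(np.mean((y[te] - pred) ** 2)))
    return float(np.mean(scores))

def mean_baseline(X_tr, y_tr, X_te):
    """Naive baseline: predict the training-set mean price for every house."""
    return np.full(len(X_te), y_tr.mean())
```

In the notebooks the same role is played by scikit-learn's cross_val_score with a negative-MSE scorer; a baseline like mean_baseline gives a floor that any real model should beat.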
Key functions:
- preprocess_house_data: Comprehensive data preprocessing pipeline
- evaluate_model: Model training and evaluation on a train/test split
- cross_val_evaluate: K-fold cross-validation evaluation
- ensemble_predict: Ensemble prediction function
The best models after optimization include XGBoost, LightGBM, and Gradient Boosting. The final solution uses an ensemble approach, averaging predictions from multiple tuned models to achieve robust results.
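The averaging step can be sketched as below. This assumes only that each fitted model exposes a predict method (as the scikit-learn-style estimators used here do); the optional weights argument is an illustrative extension.

```python
import numpy as np

def ensemble_predict(models, X, weights=None):
    """Average predictions from several fitted regressors.
    Sketch of the ensemble idea; the tuned models in the project are
    XGBoost, LightGBM, and Gradient Boosting."""
    preds = np.stack([m.predict(X) for m in models])
    # Unweighted mean by default; optional weights favor stronger models.
    return np.average(preds, axis=0, weights=weights)
```

Averaging reduces the variance of the individual models' errors, which is why the ensemble tends to be more robust than its best single member.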
- Install dependencies:
pip install -r requirements.txt
- Run the Jupyter notebooks:
jupyter notebook testmodel.ipynb
- Or, for exploratory data analysis:
jupyter notebook housepriceprediction.ipynb
This project is open source and available for educational and research purposes.