Predicting Car Prices Using Machine Learning

Youssef DIR & Nazifou AFOLABI


This project predicts the prices of luxury and standard vehicles listed on AutoScout24 using machine learning models. The goal is an efficient, reliable predictive model that estimates car prices with low error (measured by RMSE and R²), giving useful insight to buyers, sellers, and enthusiasts in the automotive market.

Scraping

We extract detailed information about vehicle models from AutoScout24 with the VehicleScraper file, which is executed from main.py.

Brands covered: Audi, Mercedes-Benz (including AMG GT), Ferrari, Fiat, Porsche, Toyota, Ford, Volkswagen, Bentley, Renault, Land Rover.

Data Points Extracted:

Price: The cost of each vehicle.

Power: Engine power, given in kW and in CH (chevaux, i.e. metric horsepower), e.g. "250 kW (340 CH)".

Evaluations: Number of evaluations or reviews for the vehicle.

Name: The name of the vehicle.

Brands: The manufacturer or brand of the vehicle.

Version: Specific version or model of the vehicle.

Mileage: The total distance the vehicle has traveled.

Fuel Type: Type of fuel used (e.g., petrol, diesel, electric).

Transmission Type: The transmission system (e.g., manual, automatic).

PART I: Data

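Data before cleaning

The raw scrape has 900 rows and 11 columns. Column names are in French: Modèle (brand), Prix (price), Transmission, Version, Kilométrage (mileage), Carburant (fuel; Essence = petrol, Electrique = electric), Puissance (power), Évaluations (evaluations), Vendeur (seller), Nom de la Voiture (car name) and Date.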

##     Modèle       Prix       Transmission                                            Version Kilométrage           Carburant        Puissance                  Évaluations                                            Vendeur                                  Nom de la Voiture     Date
## 0     audi   € 74 999  Boîte automatique       55 TFSI Quattro Tiptronic S-Line INDIVIDUAL!   34 000 km             Essence  250 kW (340 CH)                          122                    Kristof D'herde • BE-9300 Aalst  Audi Q855 TFSI Quattro Tiptronic S-Line INDIVI...  01/2021
## 1     audi   € 59 950  Boîte automatique  50 TDI quattro S-LINE/PANO/MATRIX/HUD/B&O/ACC/...   99 900 km              Diesel  210 kW (286 CH)                          106                         Hammad Khan • BE-2500 Lier  Audi Q850 TDI quattro S-LINE/PANO/MATRIX/HUD/B...  12/2019
## 2     audi   € 73 990  Boîte automatique  Audi Q8   50 TDI quattro 210(286) kW(ch) tiptr...   61 100 km              Diesel  210 kW (286 CH)  Évaluations non disponibles                     Tony Renna • BE-6041 Gosselies  Audi Q8Audi Q8   50 TDI quattro 210(286) kW(ch...  04/2021
## 3     audi  € 118 900  Boîte automatique       60 Hybr 49gr Sline BlackPack B&O Leather 23'       10 km  Electrique/Essence  340 kW (462 CH)                          109  Frederik Rik Maxime Jorn Hendrik • BE-8710 Wie...  Audi Q860 Hybr 49gr Sline BlackPack B&O Leathe...  01/2024
## 4     audi   € 64 990  Boîte automatique      50TDi QUATTRO 3X S LINE/PANO/360 CAM/TREKHAAK   62 000 km              Diesel  210 kW (286 CH)                          211                  Gauthier Terras • BE-8791 Waregem  Audi Q850TDi QUATTRO 3X S LINE/PANO/360 CAM/TR...  01/2019
## ..     ...        ...                ...                                                ...         ...                 ...              ...                          ...                                                ...                                                ...      ...
## 895   fiat   € 13 800     Boîte manuelle       NAVIGATIE*DIGITALE-AIRCO*CRUISE-CONTROLE*LED   46 546 km             Essence    70 kW (95 CH)                          179                  Philip Uyttendaele • BE-9340 Lede  Fiat TipoNAVIGATIE*DIGITALE-AIRCO*CRUISE-CONTR...  06/2018
## 896   fiat   € 11 200     Boîte manuelle                                        1.2i Lounge   21 285 km             Essence    51 kW (69 CH)                           40               Maurice Deconinck • BE-7711 Mouscron                                Fiat 5001.2i Lounge  06/2017
## 897   fiat   € 26 744  Boîte automatique                           1.5 HYBRID 130PK *CABRIO       15 km  Electrique/Essence   96 kW (131 CH)                           12               Gregory Mezières • BE-8800 Roeselare                  Fiat 500X1.5 HYBRID 130PK *CABRIO  06/2023
## 898   fiat   € 14 673     Boîte manuelle  1.0 hybrid 70pk *CRUISE CONTROL *APPLE/ANDROID...       15 km  Electrique/Essence    51 kW (69 CH)                           12               Gregory Mezières • BE-8800 Roeselare  Fiat 5001.0 hybrid 70pk *CRUISE CONTROL *APPLE...  06/2023
## 899   fiat   € 12 450     Boîte manuelle                                     Pop Star 1.4 T   91 440 km             Essence  100 kW (136 CH)                          143             Philippe Raeymaekers • BE-7033 Cuesmes                            Fiat 500XPop Star 1.4 T  01/2017
## 
## [900 rows x 11 columns]

Data Cleaning

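import pandas as pd

# Data manipulation (df is the raw scraped DataFrame shown above)
df = df.drop('Date', axis=1)
df['Évaluations'] = df['Évaluations'].replace('Évaluations non disponibles', 0).astype(float)
# Keep the horsepower (CH) figure from strings such as '250 kW (340 CH)'
df['Puissance_CH'] = df['Puissance'].str.extract(r'(\d+\.?\d*) CH', expand=False).astype(float)
df['Prix'] = df['Prix'].str.replace('€', '').str.replace(' ', '').str.replace(',', '.').astype(float)
# Map the '- km' placeholder to 0 before spaces are stripped (afterwards '- ' can no longer match)
df['Kilométrage'] = (df['Kilométrage'].str.replace('km', '').str.replace('- ', '0')
                     .str.replace(' ', '').str.replace(',', '.').astype(float))
# Group missing or rare categories under 'Autre' (other)
df['Carburant'] = df['Carburant'].replace(['- Carburant', 'CNG'], 'Autre')
df['Transmission'] = df['Transmission'].replace(['- Boîte', 'Boite non disponible'], 'Autre')

# Further manipulation and creation of dummy variables
new_df = df.drop(['Nom de la Voiture', 'Version', 'Vendeur'], axis=1)
df_1 = pd.get_dummies(new_df, columns=['Modèle', 'Carburant', 'Transmission'])
# Drop the 'Carburant_Autres' dummy and the raw power string (replaced by Puissance_CH)
df_new1 = df_1.drop(['Carburant_Autres', 'Puissance'], axis=1)
# Remove rows whose power could not be parsed
df_encoded = df_new1.dropna(subset=['Puissance_CH'])
# Drop the 'Autre' dummies so that 'Autre' acts as the reference category
df_encoded = df_encoded.drop(columns=['Carburant_Autre', 'Transmission_Autre'])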
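The chained `str.replace` calls above assume an ordinary space as the thousands separator and raise an error on any unexpected format. As an alternative sketch (not the authors' code), a small helper can keep only digits and separators and let `pd.to_numeric(errors="coerce")` turn anything unparseable into NaN. The helper name `to_number` is hypothetical; reading ',' as a decimal separator mirrors the original code.

```python
import pandas as pd


def to_number(s: pd.Series) -> pd.Series:
    """Convert strings like '€ 74 999', '34 000 km' or '- km' to numbers.

    Every character except digits, '.' and ',' is removed (this also drops
    non-breaking spaces), ',' is read as a decimal separator, and values that
    still cannot be parsed become NaN instead of raising an error."""
    cleaned = (
        s.astype(str)
         .str.replace(r"[^\d.,]", "", regex=True)
         .str.replace(",", ".", regex=False)
    )
    return pd.to_numeric(cleaned, errors="coerce")


raw = pd.DataFrame({
    "Prix": ["€ 74 999", "€ 118 900", "€ 13 800"],
    "Kilométrage": ["34 000 km", "10 km", "- km"],
})
raw["Prix"] = to_number(raw["Prix"])
# A missing mileage ('- km') is filled with 0, as the original cleaning code intends
raw["Kilométrage"] = to_number(raw["Kilométrage"]).fillna(0)
print(raw)
```

Coercing to NaN makes unexpected formats visible (e.g. via `raw.isna().sum()`) rather than stopping the script; whether a missing mileage should be 0 or stay NaN is a modelling choice.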

Data after cleaning (first 10 rows)

##        Prix  Kilométrage  Évaluations  Puissance_CH  Modèle_audi  Modèle_bentley  Modèle_ferrari  Modèle_fiat  Modèle_ford  Modèle_land-rover  Modèle_mercedes-benz  Modèle_porsche  Modèle_renault  Modèle_toyota  Modèle_volkswagen  Carburant_Diesel  Carburant_Electrique  Carburant_Electrique/Diesel  Carburant_Electrique/Essence  Carburant_Essence  Transmission_Boîte automatique  Transmission_Boîte manuelle
## 0   74999.0      34000.0        122.0         340.0         True           False           False        False        False              False                 False           False           False          False              False             False                 False                        False                         False               True                            True                        False
## 1   59950.0      99900.0        106.0         286.0         True           False           False        False        False              False                 False           False           False          False              False              True                 False                        False                         False              False                            True                        False
## 2   73990.0      61100.0          0.0         286.0         True           False           False        False        False              False                 False           False           False          False              False              True                 False                        False                         False              False                            True                        False
## 3  118900.0         10.0        109.0         462.0         True           False           False        False        False              False                 False           False           False          False              False             False                 False                        False                          True              False                            True                        False
## 4   64990.0      62000.0        211.0         286.0         True           False           False        False        False              False                 False           False           False          False              False              True                 False                        False                         False              False                            True                        False
## 5   64950.0      66500.0         87.0         286.0         True           False           False        False        False              False                 False           False           False          False              False              True                 False                        False                         False              False                            True                        False
## 6   91590.0         11.0         38.0         286.0         True           False           False        False        False              False                 False           False           False          False              False              True                 False                        False                         False              False                            True                        False
## 7   36755.0     222222.0          0.0         286.0         True           False           False        False        False              False                 False           False           False          False              False             False                 False                        False                         False              False                           False                        False
## 8   72450.0      52508.0         44.0         286.0         True           False           False        False        False              False                 False           False           False          False              False              True                 False                        False                         False              False                            True                        False
## 9   89999.0      45307.0         49.0         286.0         True           False           False        False        False              False                 False           False           False          False              False              True                 False                        False                         False              False                            True                        False
**Types of Variables:**

- Price, Power, Evaluations, Mileage: Float
  - Numeric values representing price (€), engine power (CH), number of evaluations, and distance traveled (km).

- Brands, Fuel Type, Transmission Type: Categorical
  - Discrete values for brand names, fuel options, and transmission modes; after cleaning they are one-hot encoded into True/False dummy columns (e.g. Modèle_audi, Carburant_Diesel).
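The categorical columns are turned into one True/False column per level by `pd.get_dummies`. The toy example below (three made-up rows, not taken from the dataset) shows the resulting layout. Dropping one level per variable, as the cleaning code does with 'Autre', leaves that level as the implicit reference category, which keeps the dummy columns from being perfectly collinear with the intercept in the Linear and Lasso models.

```python
import pandas as pd

# Three synthetic rows using the same French column names as the cleaned data
cars = pd.DataFrame({
    "Modèle": ["audi", "fiat", "audi"],
    "Carburant": ["Diesel", "Essence", "Autre"],
    "Prix": [59950.0, 11200.0, 36755.0],
})

# One indicator column per category level (levels are sorted alphabetically)
encoded = pd.get_dummies(cars, columns=["Modèle", "Carburant"])

# Drop the catch-all level so it becomes the reference category:
# a row with every Carburant_* column False is an 'Autre' row
encoded = encoded.drop(columns=["Carburant_Autre"])
print(encoded)
```

With pandas 2.x the indicator columns are booleans, matching the True/False values in the cleaned table; older versions return 0/1 integers.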

PART II: Statistics

Distribution of Power and Evaluations

Distribution of Fuel and Transmission Type

Distribution of Price and Mileage

Box Plots

Price by Fuel Type (Carburant)

Comparative Analysis of Vehicle Features by Model

Comparative Analysis by Specific Model
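The box plots above summarise each group by its quartiles, with points beyond 1.5 × IQR (Tukey's rule) drawn as outliers. A minimal pandas sketch of the numbers behind such a plot follows; `box_stats` and `count_outliers` are hypothetical helpers, and the toy prices are synthetic, not values from the dataset.

```python
import pandas as pd


def box_stats(df: pd.DataFrame, value: str, by: str) -> pd.DataFrame:
    """Quartiles and Tukey fences (1.5 × IQR beyond Q1/Q3) of `value` for each level of `by`."""
    q = df.groupby(by)[value].quantile([0.25, 0.5, 0.75]).unstack()
    q.columns = ["q1", "median", "q3"]
    iqr = q["q3"] - q["q1"]
    q["lower_fence"] = q["q1"] - 1.5 * iqr
    q["upper_fence"] = q["q3"] + 1.5 * iqr
    return q


def count_outliers(df: pd.DataFrame, value: str, by: str) -> pd.Series:
    """Number of rows per group lying outside that group's fences."""
    fences = box_stats(df, value, by)[["lower_fence", "upper_fence"]]
    merged = df[[by, value]].join(fences, on=by)
    is_out = (merged[value] < merged["lower_fence"]) | (merged[value] > merged["upper_fence"])
    return is_out.groupby(merged[by]).sum()


# Synthetic prices for two fuel types
toy = pd.DataFrame({
    "Carburant": ["Diesel"] * 5 + ["Essence"] * 4,
    "Prix": [10, 20, 30, 40, 1000, 5, 6, 7, 8],
})
stats = box_stats(toy, "Prix", "Carburant")
outliers = count_outliers(toy, "Prix", "Carburant")
print(stats)
print(outliers)
```

On the real data the same calls would be `box_stats(df, "Prix", "Carburant")`, assuming the cleaned `df` still holds the original `Carburant` column.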

Machine Learning

Model Comparison

| Model | RMSE | R² |
|---|---|---|
| Linear regression | 49113.33036 | 0.6846197 |
| Random forest | 301.18096 | 0.9999881 |
| Boosting | 67.38589 | 0.9999994 |
| KNN | 13.91941 | 1.0000000 |
| MLP | 15910.26324 | 0.9669029 |
| Lasso | 49034.38591 | 0.6856328 |
| SVM | 73757.81567 | 0.2887015 |
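The table reports two metrics. The report does not show how they were computed, so the following is only a NumPy sketch of their standard definitions, not the authors' code. RMSE is expressed in the units of the target (so in euros if the target is the raw price), while R² is the share of the target's variance explained: a model that always predicts the mean scores 0, and only zero residual error gives exactly 1.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Toy example with three prices (synthetic)
prices = [10_000, 20_000, 30_000]
predicted = [12_000, 18_000, 30_000]
print(rmse(prices, predicted))  # sqrt((2000² + 2000² + 0²) / 3) ≈ 1632.99
print(r2(prices, predicted))    # 1 - 8e6 / 2e8 ≈ 0.96
```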

KNN: the best model (lowest RMSE and highest R² in the table)
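An R² that rounds to 1.0000000 is unusual for price data, and one common cause is scoring a model on rows it saw during fitting. The report does not state which rows its scores come from, so this is a cautionary sketch on synthetic data (not the car dataset): a 1-nearest-neighbour regressor evaluated on its own training rows returns each row's own target, so its R² is exactly 1 by construction, while on held-out rows it is clearly lower. `knn_predict` is a hypothetical from-scratch helper, not a library function.

```python
import numpy as np


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def knn_predict(X_train, y_train, X_query, k=1):
    """k-nearest-neighbour regression: mean target of the k closest training rows (Euclidean)."""
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


# Synthetic features and noisy "prices" (NOT the AutoScout24 data)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 50_000 + 15_000 * X[:, 0] + 5_000 * rng.normal(size=300)

# Random 80/20 train/test split
idx = rng.permutation(len(X))
train, test = idx[:240], idx[240:]

r2_train = r2(y[train], knn_predict(X[train], y[train], X[train], k=1))
r2_test = r2(y[test], knn_predict(X[train], y[train], X[test], k=1))
print(f"R² on training rows: {r2_train:.4f}")
print(f"R² on held-out rows: {r2_test:.4f}")
```

Re-scoring the KNN, random forest and boosting models on a held-out split (or with cross-validation) would show whether their near-perfect scores carry over to unseen listings.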

RMSE by Model (chart)


R² by Model (chart)
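The report's charts appear to have been drawn with ggplot2 in R. As a Python sketch of the data behind them, the scores from the model comparison table can be ranked with pandas. Because RMSE ranges from about 14 to about 74 000, more than three orders of magnitude, a log scale keeps every bar readable; the ranking by RMSE is the same as the ranking by R².

```python
import numpy as np
import pandas as pd

# Scores copied from the model comparison table
scores = pd.DataFrame({
    "model": ["Linear", "Random forest", "Boosting", "KNN", "MLP", "Lasso", "SVM"],
    "rmse": [49113.33036, 301.18096, 67.38589, 13.91941, 15910.26324, 49034.38591, 73757.81567],
    "r2": [0.6846197, 0.9999881, 0.9999994, 1.0000000, 0.9669029, 0.6856328, 0.2887015],
})

# Best (lowest RMSE) first
ranked = scores.sort_values("rmse").reset_index(drop=True)
# Log-scaled RMSE, suitable for the bar heights of a chart
ranked["log10_rmse"] = np.log10(ranked["rmse"])
print(ranked)
```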