Pandas

import pandas as pd

# Path to the CSV file to load
input_file = "../somedir/somefile.csv"
data = pd.read_csv(input_file)
# Summary statistics for the numeric columns
data.describe()

The results show 8 numbers for each numeric column in your original dataset.

* count shows how many rows have non-missing values.
* mean is the average.
* std is the standard deviation, which measures how numerically spread out the values are.
* min is the smallest value.
* 25% is the 25th percentile: 25% of the values are lower than this number. In general, a percentile is the value that a given percent of the values are lower than.
* 50% means 50% of the values are lower than this number (the median).
* 75% means 75% of the values are lower than this number.
* max is the largest value.
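To make the rows of the describe() output concrete, here is a minimal sketch on a made-up table. The column name Price and its five values are invented for illustration and do not come from the CSV file above. One value is missing, so count comes out lower than the number of rows.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; one Price value is missing.
toy = pd.DataFrame({"Price": [1.0, 2.0, 3.0, 4.0, np.nan]})

summary = toy.describe()
print(summary)

# count ignores the missing value, so it reports 4 rows, not 5.
print(summary.loc["count", "Price"])

# The 50% row is the median of the non-missing values.
print(summary.loc["50%", "Price"])
```

Each statistic can be looked up by its row label with summary.loc, which is handy when only one number is needed.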

To display all the columns in the dataset:

data.columns

dropna (drop not available) drops missing values. With axis=0 it drops every row that contains at least one missing value:

data = data.dropna(axis=0)
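The effect of dropna can be seen on a small made-up table. This is only a sketch: the column names and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; the middle row has a missing Price.
toy = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Price": [100.0, np.nan, 300.0],
})

# axis=0 drops rows (not columns) that contain at least one missing value.
cleaned = toy.dropna(axis=0)

print(len(toy), len(cleaned))
# The surviving rows keep their original index labels.
print(cleaned.index.tolist())
```

dropna returns a new DataFrame and leaves the original unchanged, which is why the result is assigned back to a variable.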

Select the prediction target (we are going to predict Price). By convention, the prediction target is called y.

y = data.Price
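A single column can be pulled out with dot notation or with brackets. The sketch below uses a made-up two-row table (values invented for illustration) to show that both forms give the same Series. Bracket access is the more general form, since dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute.

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({"Rooms": [2, 3], "Price": [100.0, 200.0]})

y = toy.Price          # attribute (dot) access
y_alt = toy["Price"]   # bracket access, works for any column name

# Selecting one column gives a pandas Series.
print(type(y).__name__)
print(y.equals(y_alt))
```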

Choose the Features. By convention, this data is called X.

feature = data_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = data[feature]

# Verify the features
X.describe()

# To display the top 5 rows
X.head()
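The feature-selection steps above can be sketched end to end on a small made-up table. The values are invented for illustration, and only three of the feature names are reused so the example stays short.

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Bathroom": [1, 2, 2],
    "Landsize": [150.0, 200.0, 320.0],
    "Price": [100.0, 200.0, 300.0],
})

# Passing a list of column names returns a DataFrame with just those columns.
features = ["Rooms", "Bathroom", "Landsize"]
X = toy[features]
y = toy["Price"]

print(X.shape)
# head(n) limits the preview to the first n rows (the default is 5).
print(X.head(2))
```

Checking that X and y have the same number of rows, and that the target column is not among the features, is a cheap sanity check before fitting any model.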