SCM 200 Project 1
Supply Chain Management
The Lending Club
Initial Observations of the Data
The Lending Club data has 42,535 observations (rows) and about 100 columns, each describing a feature of an observation. The data can be used to assess whether a loan should be granted to a member of the club, given that member's features. A model can be built from the data to support this decision through predictive analytics.
The data contains both numeric and categorical variables. The numeric variables include loan_amnt, funded_amnt, funded_amnt_inv and annual_inc, which record the loan amounts and the annual income of each individual. Some data points are missing, which can be attributed to data entry. The missing values have been left blank rather than being filled in with alternative values, and it is hard to infer from the remaining data what they should be. There are no duplicate observations, and the date features are consistently formatted for ease of analysis. The units of each feature are well defined; for example, interest rates are expressed as percentages, and the other rate features are documented well enough to facilitate analysis. The categories used in the data are clearly labelled and give a clear overview of what they represent.
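The checks described above (size of the data, blank values, duplicate rows, and numeric versus categorical columns) can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the project: the sample rows are made up, the helper names is_number and profile are hypothetical, and the column int_rate with a trailing percent sign is an assumption about how the interest rate is stored.

```python
import csv
import io

# Made-up sample rows for illustration only; not taken from the real file.
# The third row repeats the first on purpose so the duplicate check has
# something to find.
SAMPLE_CSV = """loan_amnt,funded_amnt,annual_inc,int_rate,grade
5000,5000,24000,10.65%,B
2500,2500,,15.27%,C
5000,5000,24000,10.65%,B
12000,12000,75000,,A
"""


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile(rows):
    """Summarise shape, blanks, duplicates and column kinds for dict rows."""
    columns = list(rows[0].keys()) if rows else []

    # Count blank cells per column (missing values are left empty).
    missing = {c: sum(1 for r in rows if r[c].strip() == "") for c in columns}

    # Count rows that exactly repeat an earlier row.
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)

    # A column is "numeric" if every non-blank value parses as a number.
    kinds = {}
    for c in columns:
        values = [r[c] for r in rows if r[c].strip() != ""]
        numeric = bool(values) and all(is_number(v) for v in values)
        kinds[c] = "numeric" if numeric else "categorical"

    return {
        "n_rows": len(rows),
        "n_cols": len(columns),
        "missing": missing,
        "duplicates": duplicates,
        "kinds": kinds,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = profile(rows)
print(summary)
```

On the real file, io.StringIO(SAMPLE_CSV) would be replaced by an opened file handle. Note that a percentage stored as text such as 10.65% is classed as categorical by this check, so the percent sign would need to be stripped before the column can be treated as numeric.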