Chapter 3 Data transformation
We collected several datasets from Wikipedia. We divide them into four sections and transform each one separately. Later, we use the cleaned data to produce plots and answer the questions posed in the introduction.
3.1 iPhone Features Data
The dataset of Apple product features is very large and covers many aspects. We focus on a few iPhone features (release date, display, and rear camera) for all available models. First, we merge the tables for the different models, and then split the result into smaller tables, one per feature.
For the release date table, we added columns that store the dates in a proper date format. For the display table, we focus on screen size and resolution: we added columns for the screen size in inches (ScreenSizeIn) and for the two resolution dimensions (ResX and ResY).
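The merge-then-split step can be sketched in base R as follows. This is only an illustration under stated assumptions: the two input tables, the `Feature` key column, the `ModelA`/`ModelB`/`ModelC` columns, and all cell values are hypothetical placeholders, not the actual scraped data.

```r
# Hypothetical spec tables, one per group of models (placeholder values).
# Each row is a feature and each remaining column is a model.
older <- data.frame(
  Feature = c("Release date", "Display", "Rear Camera"),
  ModelA  = c("date A", "display A", "camera A"),
  stringsAsFactors = FALSE
)
newer <- data.frame(
  Feature = c("Release date", "Display", "Rear Camera"),
  ModelB  = c("date B", "display B", "camera B"),
  ModelC  = c("date C", "display C", "camera C"),
  stringsAsFactors = FALSE
)

# Join all tables on the shared feature column.
specs <- Reduce(
  function(x, y) merge(x, y, by = "Feature", all = TRUE),
  list(older, newer)
)

# Split the merged table into one smaller table per feature.
by_feature <- split(specs, specs$Feature)
camera <- by_feature[["Rear Camera"]]
```

Using `all = TRUE` keeps a feature row even when one of the input tables lacks it, at the cost of introducing `NA` cells that need handling later.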
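As a minimal sketch of the date conversion, assuming the raw `Released` and `Discontinued` columns hold strings such as "June 29, 2007" (the sample rows and the model labels below are hypothetical):

```r
# Hypothetical sample rows; the real table has 33 models.
release <- data.frame(
  Model        = c("Model A", "Model B"),
  Released     = c("June 29, 2007", "September 24, 2021"),
  Discontinued = c("July 11, 2008", NA),
  stringsAsFactors = FALSE
)

# %B matches full month names in the current locale, so switch to the
# C locale to make sure English month names are recognized.
invisible(Sys.setlocale("LC_TIME", "C"))

# Parse the text into Date objects; missing or unparseable values become NA.
release$ReleaseDate <- as.Date(release$Released, format = "%B %d, %Y")
release$DisconDate  <- as.Date(release$Discontinued, format = "%B %d, %Y")

# Keep the year as a character column, as in the summary output.
release$ReleaseYear <- format(release$ReleaseDate, "%Y")
```

Storing the parsed values in new columns keeps the original text available for checking rows that failed to parse.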
iPhone Release Date
## Model             Released          Discontinued      ReleaseDate         ReleaseYear       DisconDate
## Length:33         Length:33         Length:33         Min.   :2007-06-29  Length:33         Min.   :2008-07-11
## Class :character  Class :character  Class :character  1st Qu.:2014-09-19  Class :character  1st Qu.:2016-09-07
## Mode  :character  Mode  :character  Mode  :character  Median :2017-09-22  Mode  :character  Median :2019-09-10
##                                                       Mean   :2016-11-25                    Mean   :2018-08-25
##                                                       3rd Qu.:2020-04-24                    3rd Qu.:2021-09-14
##                                                       Max.   :2021-09-24                    Max.   :2021-12-12
iPhone Display table
## Model             PixelDensity(ppi)  AspectRatio       TypicalMaxbrightness( cd⁄m2)  Contrastratio(typical)
## Length:23         Length:23          Length:23         Length:23                     Length:23
## Class :character  Class :character   Class :character  Class :character              Class :character
## Mode  :character  Mode  :character   Mode  :character  Mode  :character              Mode  :character
##
##
##
## TrueToneDisplay   ProMotionDisplay  HDR10Content      DolbyVision       Taptic            TypicalMaxbrightness
## Length:23         Length:23         Length:23         Length:23         Length:23         Length:23
## Class :character  Class :character  Class :character  Class :character  Class :character  Class :character
## Mode  :character  Mode  :character  Mode  :character  Mode  :character  Mode  :character  Mode  :character
##
##
##
## ScreenSizeIn   ResX          ResY
## Min.   :4.000  Min.   :1136  Min.   : 640
## 1st Qu.:5.420  1st Qu.:1792  1st Qu.: 828
## Median :5.850  Median :2340  Median :1080
## Mean   :5.667  Mean   :2125  Mean   :1035
## 3rd Qu.:6.060  3rd Qu.:2532  3rd Qu.:1170
## Max.   :6.680  Max.   :2778  Max.   :1284
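The three numeric columns above could be derived from text columns along the following lines. This is a sketch only: the raw column names `ScreenSize` and `Resolution`, the string formats, and the sample rows are assumptions made for illustration.

```r
# Hypothetical raw strings; the real scraped text may be formatted differently.
display <- data.frame(
  Model      = c("Model A", "Model B"),
  ScreenSize = c("4 in (100 mm)", "6.06 in (154 mm)"),
  Resolution = c("1136 x 640", "2532 \u00d7 1170"),
  stringsAsFactors = FALSE
)

# The number that precedes "in" is the screen diagonal in inches.
display$ScreenSizeIn <- as.numeric(
  sub("^[[:space:]]*([0-9.]+)[[:space:]]*in.*$", "\\1", display$ScreenSize)
)

# Extract the digit runs from each resolution string. This works
# whether the separator is a plain "x" or the multiplication sign.
dims <- regmatches(display$Resolution, gregexpr("[0-9]+", display$Resolution))
display$ResX <- as.numeric(sapply(dims, `[`, 1))
display$ResY <- as.numeric(sapply(dims, `[`, 2))
```

Extracting digit runs rather than splitting on a fixed separator avoids depending on which multiplication character the source page uses.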
iPhone Rear Camera table
## Model             iPhone 6S         iPhone 6S Plus    iPhone SE(1st generation)  iPhone 7          iPhone 7 Plus
## Length:35         Length:35         Length:35         Length:35                  Length:35         Length:35
## Class :character  Class :character  Class :character  Class :character           Class :character  Class :character
## Mode  :character  Mode  :character  Mode  :character  Mode  :character           Mode  :character  Mode  :character
## iPhone 8          iPhone 8 Plus     iPhone X          iPhone XS         iPhone XS Max     iPhone XR
## Length:35         Length:35         Length:35         Length:35         Length:35         Length:35
## Class :character  Class :character  Class :character  Class :character  Class :character  Class :character
## Mode  :character  Mode  :character  Mode  :character  Mode  :character  Mode  :character  Mode  :character
## iPhone 11 Pro     iPhone 11 Pro Max  iPhone 12 Pro     iPhone 12 Pro Max  iPhone 11         iPhone SE(2nd generation)
## Length:35         Length:35          Length:35         Length:35          Length:35         Length:35
## Class :character  Class :character   Class :character  Class :character   Class :character  Class :character
## Mode  :character  Mode  :character   Mode  :character  Mode  :character   Mode  :character  Mode  :character
## iPhone 12 Mini    iPhone 12         iPhone 13 Mini    iPhone 13         iPhone 13 Pro     iPhone 13 Pro Max
## Length:35         Length:35         Length:35         Length:35         Length:35         Length:35
## Class :character  Class :character  Class :character  Class :character  Class :character  Class :character
## Mode  :character  Mode  :character  Mode  :character  Mode  :character  Mode  :character  Mode  :character
iPhone supported months table
## Model             Months supported to date
## Length:33         Length:33
## Class :character  Class :character
## Mode  :character  Mode  :character
3.2 Apple Finance Data
The finance table includes revenue, net income, total assets, and the number of employees. Some values are missing, so we replace the string “N/A” with NA in our data frame. We convert the year column to the Date format and convert the remaining columns from character to numeric. We also remove the spaces from the column names.
## Year          Revenue         NetIncome      TotalAssets     Employees
## Min.   :2000  Min.   :  5363  Min.   :  -25  Min.   :  6021  Min.   : 14800
## 1st Qu.:2005  1st Qu.: 13931  1st Qu.: 1328  1st Qu.: 11516  1st Qu.: 33725
## Median :2010  Median : 65225  Median :14013  Median : 75183  Median : 76550
## Mean   :2010  Mean   :111160  Mean   :23818  Mean   :142555  Mean   : 77388
## 3rd Qu.:2015  3rd Qu.:215639  3rd Qu.:45687  3rd Qu.:290345  3rd Qu.:117750
## Max.   :2020  Max.   :274515  Max.   :59531  Max.   :375319  Max.   :147000
##                                                              NA's   :5
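The cleaning steps for the finance table might look roughly like this in base R. The sample rows and their values are invented for illustration, and the thousands separators in the raw strings are an assumption about how the scraped text looks. Because the summary above shows Year as a plain number, this sketch keeps the year numeric; a comment shows one way to obtain a Date instead.

```r
# Hypothetical raw rows with invented values, read in as text.
finance <- data.frame(
  "Year"         = c("2004", "2005"),
  "Revenue"      = c("1,000", "2,000"),
  "Net Income"   = c("100", "200"),
  "Total Assets" = c("3,000", "4,000"),
  "Employees"    = c("N/A", "14,800"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Remove the spaces from the column names ("Net Income" -> "NetIncome").
names(finance) <- gsub(" ", "", names(finance), fixed = TRUE)

# Turn the literal string "N/A" into a real missing value.
finance[finance == "N/A"] <- NA

# Drop thousands separators (assumed) and convert to numeric.
num_cols <- setdiff(names(finance), "Year")
finance[num_cols] <- lapply(
  finance[num_cols],
  function(x) as.numeric(gsub(",", "", x, fixed = TRUE))
)

# Keep the year numeric; for a Date, one option would be
# as.Date(paste0(finance$Year, "-01-01")).
finance$Year <- as.integer(finance$Year)
```

Replacing "N/A" before the numeric conversion avoids the coercion warnings that `as.numeric()` would otherwise emit for those cells.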
3.3 Customer Satisfaction Data
Initially, we wanted to use the data from this page, but the data there is presented as an image rather than a table. We then found this table with the same information, extracted it from here, and modified the column names. We converted the Satisfaction Index column to numeric so that it is easier to analyze.
## Model             Manufacturer      Satisfaction
## Length:24         Length:24         Min.   :75.00
## Class :character  Class :character  1st Qu.:79.00
## Mode  :character  Mode  :character  Median :81.00
##                                     Mean   :80.83
##                                     3rd Qu.:82.00
##                                     Max.   :85.00