Missing Values Handling — missingValuesHandling • CornerstoneR

Function for the automatic handling of missing values.

missingValuesHandling(
  dataset = cs.in.dataset(),
  preds = cs.in.predictors(),
  resps = cs.in.responses(),
  groups = cs.in.groupvars(),
  auxs = cs.in.auxiliaries(),
  scriptvars = cs.in.scriptvars(),
  return.results = FALSE
)

Arguments

dataset: [data.frame]
Dataset with named columns. The names correspond to predictors and responses.
preds: [character]
Character vector of predictor variables.
resps: [character]
Character vector of response variables.
groups: [character]
Character vector of group variables.
auxs: [character]
Character vector of auxiliary variables.
scriptvars: [list]
Named list of script variables set via the Cornerstone "Script Variables" menu. For details see below.
return.results: [logical(1)]
If FALSE the function returns TRUE invisibly. If TRUE, it returns a list of results. Default is FALSE.

Value

Logical [TRUE], returned invisibly, with the results sent to Cornerstone; or, if return.results = TRUE, a list of the resulting data.table objects:

rowInds: Data table with one column per variable that contains missing values. Each column lists the row indices of the missing entries.
outDataset: Output data table with changes in missing entries.

Details

The following script variables are passed in the scriptvars list:

math.fun: [character(1)]
Method used to handle missing values in the data. Choose one of the predefined methods Omit Missing Values (omit), Last Observation Carried Forward (locf), Next Observation Carried Backward (nocb), Mean Values (mean), Median Values (median), Minimum Values (min), Maximum Values (max), Linear Interpolation (linpol), or Cubic Interpolation (cubicpol), or compose a method manually by selecting User Defined. If one or more group-by variables are passed, the method is applied separately within each group. Note that brushing is not possible if Omit Missing Values (omit) is selected, because the output dataset has fewer rows than the original one. If you select an interpolation method, you can choose an underlying time scale via auxiliaries.
Default is Omit Missing Values (omit).
input.values: [character(1)]
If User Defined is selected, one or more input values or formulas must be specified. This can be: a single value that replaces all NAs, e.g. "0"; a value for one or more specific columns containing NAs, e.g. "MPG = 0, Horsepower = 1"; one of the predefined methods per column, e.g. "MPG = omit, Horsepower = min" (the identifiers here are "omit", "locf", "nocb", "mean", "median", "min", "max"); or a mathematical expression that can be evaluated, e.g. log(4) or 3+5.
na.representation: [character(1)]
Additional representations of missing values in your data, apart from NA itself (shown as a black point in Cornerstone). Separate multiple representations with commas, e.g. "N/A, MISSING, .".
min.complete: [numeric(1)]
A value between 0 and 1 giving the minimal proportion of complete cases a variable must have to be kept, e.g. 0.2 keeps only columns where at least 20% of the entries are present. 0 keeps all columns (default). 1 (100%) keeps only columns without any missing values. To remove only those columns that consist solely of missing values, choose a number near 0 or, more accurately, 1 divided by the number of data rows.
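
To make na.representation and min.complete more concrete, the following base R sketch mimics the two preprocessing steps on a toy data frame. The helper names recode_na and keep_complete_columns are hypothetical and only illustrate the idea; they are not part of CornerstoneR and need not match how the package implements these options.

```r
# Hypothetical helpers for illustration; not part of CornerstoneR.

# Split a string such as "N/A, MISSING, ." into individual markers and
# set matching entries of every character column to NA.
recode_na <- function(df, na.representation) {
  reps <- trimws(strsplit(na.representation, ",", fixed = TRUE)[[1]])
  reps <- reps[nzchar(reps)]
  for (col in names(df)) {
    if (is.character(df[[col]])) {
      df[[col]][df[[col]] %in% reps] <- NA
    }
  }
  df
}

# Keep only columns whose share of non-missing entries is at least
# min.complete (0 keeps everything, 1 keeps only complete columns).
keep_complete_columns <- function(df, min.complete = 0) {
  share <- colMeans(!is.na(df))
  df[, share >= min.complete, drop = FALSE]
}

toy <- data.frame(
  a = c("1", "N/A", "3", "4"),
  b = c(NA, NA, NA, "x"),
  c = c(".", "MISSING", "N/A", "."),
  stringsAsFactors = FALSE
)
toy <- recode_na(toy, "N/A, MISSING, .")

# Column a is 75% complete, b is 25% complete, c is entirely missing.
kept <- keep_complete_columns(toy, min.complete = 0.5)
print(names(kept))

# 1 / number of rows drops only the columns that are missing throughout.
print(names(keep_complete_columns(toy, min.complete = 1 / nrow(toy))))
```

With a threshold of 0.5 only column a survives, while 1 / nrow(toy) keeps a and b and drops just the all-missing column c.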

Examples

data(carstats)
summary(carstats)
#>     Model               Origin         MPG        Cylinders  Displacement  
#>  Length:406         England:  1   Min.   : 9.00   3:  4     Min.   : 68.0  
#>  Class :character   France : 14   1st Qu.:17.50   4:207     1st Qu.:105.0  
#>  Mode  :character   Germany: 39   Median :23.00   5:  3     Median :151.0  
#>                     Italy  :  8   Mean   :23.51   6: 84     Mean   :194.8  
#>                     Japan  : 79   3rd Qu.:29.00   8:108     3rd Qu.:302.0  
#>                     Sweden : 11   Max.   :46.60             Max.   :455.0  
#>                     USA    :254   NA's   :8                                
#>    Horsepower         Weight      Acceleration     Model.Year   
#>  Min.   : 46.00   Min.   :1613   Min.   : 8.00   Min.   :70.00  
#>  1st Qu.: 75.75   1st Qu.:2226   1st Qu.:13.70   1st Qu.:73.00  
#>  Median : 95.00   Median :2822   Median :15.50   Median :76.00  
#>  Mean   :105.08   Mean   :2979   Mean   :15.52   Mean   :75.92  
#>  3rd Qu.:130.00   3rd Qu.:3618   3rd Qu.:17.18   3rd Qu.:79.00  
#>  Max.   :230.00   Max.   :5140   Max.   :24.80   Max.   :82.00  
#>  NA's   :6                                                      
# the carstats data set contains missing values in two columns
missingValuesHandling(
  carstats,
  preds = "Horsepower",
  resps = c("Model", "MPG", "Cylinders", "Displacement", "Weight",
            "Acceleration", "Model.Year"),
  groups = "Origin",
  auxs = character(),
  scriptvars = list(math.fun = "Mean Values (mean)", input.values = "",
                    na.representation = "", min.complete = 0.5),
  return.results = TRUE
)
#> $rowInds
#>      MPG Horsepower
#>    <int>      <int>
#> 1:    11         39
#> 2:    12        134
#> 3:    13        338
#> 4:    14        344
#> 5:    15        362
#> 6:    18        383
#> 7:    40         NA
#> 8:   368         NA
#> 
#> $outDataset
#>                          Model  Origin   MPG Cylinders Displacement Horsepower
#>                         <char>  <char> <num>    <char>        <num>      <num>
#>   1: chevrolet chevelle malibu     USA    18         8          307        130
#>   2:         buick skylark 320     USA    15         8          350        165
#>   3:        plymouth satellite     USA    18         8          318        150
#>   4:             amc rebel sst     USA    16         8          304        150
#>   5:               ford torino     USA    17         8          302        140
#>  ---                                                                          
#> 402:           ford mustang gl     USA    27         4          140         86
#> 403:                 vw pickup Germany    44         4           97         52
#> 404:             dodge rampage     USA    32         4          135         84
#> 405:               ford ranger     USA    28         4          120         79
#> 406:                chevy s-10     USA    31         4          119         82
#>      Weight Acceleration Model.Year
#>       <num>        <num>      <num>
#>   1:   3504         12.0         70
#>   2:   3693         11.5         70
#>   3:   3436         11.0         70
#>   4:   3433         12.0         70
#>   5:   3449         10.5         70
#>  ---                               
#> 402:   2790         15.6         82
#> 403:   2130         24.6         82
#> 404:   2295         11.6         82
#> 405:   2625         18.6         82
#> 406:   2720         19.4         82
#>
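
The call above requests mean imputation within each Origin group. As a rough illustration of what group-wise filling means, the base R sketch below applies a mean fill and a last-observation-carried-forward fill per group on a small made-up data frame. The functions fill_mean, fill_locf and fill_by_group are hypothetical names introduced here; this is a sketch of the general technique, not the CornerstoneR implementation.

```r
# Hypothetical helpers for illustration; not the CornerstoneR implementation.

# Replace NAs in x by the mean of the observed values.
fill_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Last observation carried forward; leading NAs stay NA.
fill_locf <- function(x) {
  idx <- cumsum(!is.na(x))      # position of the latest observed value so far
  observed <- x[!is.na(x)]
  out <- x
  out[idx > 0] <- observed[idx[idx > 0]]
  out
}

# Apply a fill function separately within each group.
fill_by_group <- function(x, group, fill) {
  ave(x, group, FUN = fill)
}

# Made-up toy data, loosely shaped like carstats.
toy <- data.frame(
  Origin = c("USA", "USA", "USA", "Japan", "Japan", "Japan"),
  MPG    = c(18, NA, 16, 30, 34, NA)
)
toy$MPG_mean <- fill_by_group(toy$MPG, toy$Origin, fill_mean)
toy$MPG_locf <- fill_by_group(toy$MPG, toy$Origin, fill_locf)
print(toy)
```

Because the fill runs per group, the missing USA value is replaced by the USA mean (17) rather than by the overall mean, and the missing Japan value by the Japan mean (32).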