[[_TOC_]]
The app screen is divided into a narrow navigation sidebar on the left and a wider workspace area on the right.
<img src="uploads/8218b8afd810e83cd5b93600e36c680d/data_input-v0.9.1.png" width="95%">
To help visualize the overall structure of the data, several numerical variables can be combined into a 2D projection. The numerical columns and the method to use can be selected in the 'Dimensionality reduction' section. Applying a dimensionality reduction method creates two new columns containing the new coordinates for all data points. When the same method is run multiple times, the coordinate columns are re-used (i.e. new columns are not created for each run). Upon successful completion of the dimensionality reduction, the new columns are automatically selected for plotting and the view switches back to the 'Explore' workspace.
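The column behaviour described above can be sketched as follows. This is a minimal illustration, not the app's actual implementation: it uses a plain numpy SVD-based PCA as a stand-in for whichever reduction method is selected, and the helper name `add_2d_projection` and the `pca.x`/`pca.y` column names are made up for the example.

```python
import numpy as np
import pandas as pd

def add_2d_projection(df, feature_cols, prefix="pca"):
    """Project the selected numerical columns onto their first two
    principal components, stored as '<prefix>.x' and '<prefix>.y'.
    Re-running overwrites the same two columns rather than adding new
    ones, mirroring the column re-use behaviour described above."""
    X = df[feature_cols].to_numpy(dtype=float)
    X = X - X.mean(axis=0)                 # centre the data
    # SVD of the centred matrix; the first two rows of Vt span the
    # plane of maximal variance
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ Vt[:2].T                  # shape (n_rows, 2)
    df[f"{prefix}.x"] = coords[:, 0]
    df[f"{prefix}.y"] = coords[:, 1]
    return df
```

After the call, the two coordinate columns are what the 'Explore' workspace would pick up for plotting.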
### Classification and feature selection
In this workspace, data points can be classified using the XGBoost implementation of gradient-boosted decision trees. Its inputs are a set of numerical features and a training set consisting of rows annotated with the classes to consider in the selected target annotation column. The IDE performs 5-fold cross-validation, using 2/3 of the annotated data for training and 1/3 for validation. It outputs a feature-importance plot and statistics on the classifier's performance. The plot shows the importance of the top 10 features and how these features cluster together. The classifier can then be applied to the whole data set, with the outcome written to an additional column named `xgboost.predictions`.
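The train/validate/predict flow above can be sketched as follows. This is an illustrative sketch only: scikit-learn's `GradientBoostingClassifier` stands in for the app's XGBoost model, the helper name `classify` is invented, and only the `xgboost.predictions` column name is taken from the text.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

def classify(df, feature_cols, target_col):
    """Train on the annotated rows, estimate performance, then label the
    whole table in a new 'xgboost.predictions' column (name as in the app).
    GradientBoostingClassifier is a stand-in for XGBoost here."""
    annotated = df[df[target_col].notna()]
    X, y = annotated[feature_cols], annotated[target_col]
    # hold out 1/3 of the annotated rows for validation, as described above
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=1 / 3, random_state=0
    )
    clf = GradientBoostingClassifier().fit(X_tr, y_tr)
    # 5-fold cross-validation on the training portion
    cv_acc = cross_val_score(clf, X_tr, y_tr, cv=5).mean()
    # apply the classifier to the whole data set, annotated or not
    df["xgboost.predictions"] = clf.predict(df[feature_cols])
    return df, cv_acc, clf.score(X_val, y_val)
```

Feature importances (the basis of the top-10 plot) would be available on the fitted model, e.g. via `clf.feature_importances_`.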