Random Forest with paired observations: how to maintain subject separation
When using classifiers like SVM, I keep observations from each subject together by using a custom cross-validation partition. Random forest uses bootstrap aggregation instead of cross-validation, so I need a way of telling it to keep each subject's observations together: i.e. a subject has to be either fully in or fully out of the bag, not some observations in and some out. How do I do this in MATLAB?
I can write code to generate the bootstrapped data that TreeBagger could use for each tree, analogous to a custom cvpartition, but there seems to be no way of passing this to TreeBagger.
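To make the "fully in or out of the bag" requirement concrete, here is a minimal sketch of a subject-level bootstrap for a single tree. The vector subjectID (one subject label per row of X) and the toy values are assumptions for illustration only; they are not part of the original data.

```matlab
% Subject-level (cluster) bootstrap: resample subjects, not individual rows.
% subjectID is a hypothetical N-by-1 vector giving the subject of each row of X.
rng(1);                                   % reproducible toy example
subjectID = [1 1 2 2 3 3 4 4 5 5]';       % toy data: 5 subjects, 2 rows each

[subjects, ~, subjIdx] = unique(subjectID);   % subjIdx maps each row to 1..numSubjects
numSubjects = numel(subjects);

% Draw subjects with replacement; a subject drawn k times contributes its rows k times.
drawn = randi(numSubjects, numSubjects, 1);
rowsBySubject = accumarray(subjIdx, (1:numel(subjectID))', [], @(r) {r});
inBagRows = vertcat(rowsBySubject{drawn});    % row indices used to grow this tree

% Out-of-bag rows are those of subjects that were never drawn.
oobMask = ~ismember(subjIdx, drawn);
```

By construction, no subject appears in both inBagRows and the out-of-bag set, which is the property an ordinary row-level bootstrap does not guarantee.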
(I do realise that one solution to keep subjects together is to use cross-validation on top of bagging, but that shouldn’t be necessary and greatly slows the whole process down, e.g. 10-fold CV would be expected to take ten times as long. I could also manually roll the whole random forest process, but then I don’t have a TreeBagger object that I can pass to other functions, etc.)
rf = TreeBagger(numTrees, X, Y, ...
    'Method', 'classification', ...
    'OOBPrediction', 'on', ...
    'NumPredictorsToSample', mtry, ...
    'MinLeafSize', 3)
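For comparison, the "manually roll the whole process" option mentioned above might look roughly like the following. This is only a sketch under stated assumptions: numeric class labels, a hypothetical subjectID vector, and toy data generated in place. It uses fitctree with per-split predictor sampling and does not produce a TreeBagger object, which is exactly the drawback described in the question.

```matlab
% Hand-rolled bagging with subject-level resampling (sketch, not a TreeBagger replacement).
% Assumes numeric class labels Y and a hypothetical subjectID vector; toy data below.
rng(1);
N = 60;
X = randn(N, 4);
Y = double(X(:,1) + 0.5*randn(N,1) > 0);
subjectID = repelem((1:20)', 3);          % 20 subjects, 3 rows each
numTrees = 50;
mtry = 2;

classes = unique(Y);
[~, ~, subjIdx] = unique(subjectID);
numSubjects = max(subjIdx);
votes = zeros(N, numel(classes));         % accumulated out-of-bag class scores

for t = 1:numTrees
    drawn = randi(numSubjects, numSubjects, 1);      % resample whole subjects
    inBag = cell2mat(arrayfun(@(s) find(subjIdx == s), drawn, ...
        'UniformOutput', false));
    oob = ~ismember(subjIdx, drawn);                 % rows of subjects never drawn
    if ~any(oob), continue; end
    tree = fitctree(X(inBag,:), Y(inBag), ...
        'NumVariablesToSample', mtry, 'MinLeafSize', 3);
    [~, score] = predict(tree, X(oob,:));
    [~, col] = ismember(tree.ClassNames, classes);   % align score columns to classes
    votes(oob, col) = votes(oob, col) + score;
end

% Subject-respecting out-of-bag error over rows that were OOB at least once.
hasVote = any(votes > 0, 2);
[~, best] = max(votes, [], 2);
oobError = mean(classes(best(hasVote)) ~= Y(hasVote));
```

Each tree is scored only on subjects it never saw, so oobError plays the role of the OOB estimate without a separate cross-validation loop.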