## Speed up Some Code

Hi all. I'm trying to speed up the code below. It's from a SMOTE function that was written for MATLAB and works really well. The only problem is that the loops are slow. I've looked at parfor, but that wouldn't work here, and I can't see how to vectorise it either. It may be that I just have to suck it up and wait, but just in case, the code is:

    % This is where the magic happens 😉
    % X : Observational matrix (rows are observations, columns are variables)
    % J : Synthesization vector. It has the same length as the number of
    %     observations (rows) in X. J determines how many times each
    %     observation is used as a base for synthesization.
    % k : Number of nearest neighbors to consider when synthesizing.
    function Xn = simpleSMOTE(X,J,k)
    tic
    HNSMdl = hnswSearcher(X); % To remove this, comment out this line and replace the HNSMdl below with X
    [idx, ~] = knnsearch(HNSMdl,X,'k',k+1); % Find nearest neighbors (add one to the number of neighbors to find, as observations are their own nearest neighbor)
    toc
    Xn = nan(sum(J),size(X,2)); % Pre-allocate memory for synthesized observations
    % Iterate through observations to synthesize new observations
    for ii=1:numel(J)
        P = randperm(k,J(ii))+1; % Randomize nearest neighbor pick (never pick first nearest neighbor as this is the observation itself)
        for jj=1:J(ii)
            x = X(idx(ii,1),:); % Observation
            xk = X(idx(ii,P(jj)),:); % Nearest neighbor
            Xn(sum(J(1:ii-1))+jj,:) = (xk-x)*rand+x; % Synthesize observation
        end
    end
    end

It's the 'for ii' loop that is slow, and there are around 750,000 observations of 13 variables.
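A possible direction, offered as an untested sketch rather than a drop-in replacement: two things in the loop look expensive. `sum(J(1:ii-1))` is recomputed on every inner iteration, so the cost of locating the output row grows with `ii`, and each synthetic row is written separately. The version below computes the row offsets once with `cumsum` and builds all rows with index arithmetic. The function name `simpleSMOTEvec` and the variables `base`, `offset`, `pos`, `ord`, `col` and `nb` are made up for illustration. It assumes every `J(ii) <= k` (which `randperm(k,J(ii))` in the original also requires) and a release with implicit expansion (R2016b or later).

```matlab
function Xn = simpleSMOTEvec(X, J, k)
% Hypothetical vectorized variant of simpleSMOTE (illustrative name).
% Assumes J(ii) <= k for every ii and non-negative integer counts in J.

J = J(:);                                  % force a column vector
n = numel(J);                              % number of observations
N = sum(J);                                % total number of synthetic rows

HNSMdl = hnswSearcher(X);                  % same searcher as the original
idx = knnsearch(HNSMdl, X, 'K', k+1);      % column 1 is the observation itself

base   = repelem((1:n)', J);               % base(r): observation that row r is built from
offset = cumsum(J) - J;                    % rows written before each observation
pos    = (1:N)' - offset(base);            % 1..J(ii) within each observation's block

[~, ord] = sort(rand(n, k), 2);            % each row: a random permutation of 1:k
col = ord(sub2ind([n, k], base, pos)) + 1; % chosen neighbor column, skipping column 1
nb  = idx(sub2ind([n, k+1], base, col));   % row of X holding the chosen neighbor

x  = X(idx(base, 1), :);                   % base observations, one per synthetic row
xk = X(nb, :);                             % matching nearest neighbors
Xn = x + (xk - x) .* rand(N, 1);           % one random weight per synthetic row
end
```

The neighbor pick here uses `sort(rand(n,k),2)` in place of a per-row `randperm`, so results with a fixed seed will differ from the original, although sampling is still without replacement within each observation. The `rand(n,k)` matrix costs extra memory, which should be modest at 750,000 rows for small `k`.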

Steve