Multi-thread parsing and loading thousands of csv files
I have a folder with 2500 csv files, each 15 MB. I currently have a script that reads each csv into a cell array container, shown at the bottom.
Unfortunately, opening the files serially, one by one, takes a very long time.
Ideally I would like to multi-thread the import, or open multiple csv files in parallel, and save them either into their own set of cell arrays per ‘thread’ (to be combined and sorted later) or into one big cell array, as it does currently.
%% IMPORT FILES
directory = '\headnodeuserdataGeorgeANSTOANSTO Day 2DataD14';
% fullfile inserts the file separator between the folder and the pattern
datafiles = dir(fullfile(directory, '*.csv'));
N = length(datafiles);
a = 0;
data = cell(1, N);   % one cell per csv file
f = waitbar(a, 'Importing Data…');
for i = 1:N
    % read_csv is my own reader function
    data{i} = read_csv(fullfile(datafiles(i).folder, datafiles(i).name));
    waitbar(i/N, f);
end
waitbar(1, f);
close(f);
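
For reference, a minimal sketch of what the "one big cell array" parallel version might look like, assuming the Parallel Computing Toolbox is available (without it, parfor simply runs the loop serially). Two things here are assumptions and not part of my script: `directory = pwd` is only a placeholder for the real data folder, and the built-in `readmatrix` stands in for my own `read_csv`.

```matlab
%% IMPORT FILES (parallel sketch)
directory = pwd;                                  % placeholder (assumption): use the real data folder
datafiles = dir(fullfile(directory, '*.csv'));
N = numel(datafiles);

% Build the full paths before the loop so each iteration only
% needs its own element of a plain cell array.
paths = arrayfun(@(d) fullfile(d.folder, d.name), datafiles, ...
                 'UniformOutput', false);

data = cell(1, N);                                % sliced output: iteration i writes only data{i}
parfor i = 1:N
    % readmatrix is a stand-in (assumption) for the custom read_csv
    data{i} = readmatrix(paths{i});
end
```

Because each iteration writes only to `data{i}`, the results come back in the same order as `datafiles`, so no separate combine-and-sort step should be needed. The waitbar is left out because parfor iterations complete out of order; `parallel.pool.DataQueue` with `afterEach` is one way to report progress from the workers. Whether this is actually faster probably depends on how quickly the network share can serve the files, since the workers all read from the same storage.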