Textscan doesn’t work on big files?
I’m currently using the latest MATLAB version on a Mac with 16 GB of RAM.
I tried to split a really big cube file (100 GB) into smaller cube files of only 210151 lines each, using this code:
%% Splitting
% open the result.cube file
fid = fopen(cube) ;
if fid == -1
    error('File could not be opened.') ;
end
m = 1 ;
while ~feof(fid)
    % skip the alpha and beta density
    fseek(fid,16596786,0) ;
    % read the spin density
    text = textscan(fid,'%s',210150,'Delimiter','\n','Whitespace','') ;
    % print the cube snapshot to the subdirectory
    name = string(step_nr(m)) + '.cube' ;
    full_path = fullfile(name1,name) ;
    fid_new = fopen(full_path,'w') ;
    fprintf(fid_new,'%s\n', text{1}{:}) ;
    fclose(fid_new) ;
    m = m + 1 ;
end
fclose(fid) ;
save("steps","step_nr")
My problem is: apparently, textscan is not suited for files of this size. I also tried line-by-line copying with fgetl, which on the other hand takes ages for a 100 GB file. Is there a more efficient way to split the file?
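One alternative I have been wondering about (a sketch, not tested on the real file): if every snapshot occupies a fixed number of bytes, the split could be done with raw fread/fwrite instead of parsing text at all. Here bytes_to_skip and bytes_to_copy are placeholder values that would have to be measured from the actual file:

```matlab
% Sketch: copy raw bytes instead of parsing text.
% Assumes each snapshot has a fixed byte length; bytes_to_skip and
% bytes_to_copy are hypothetical values measured from the real file.
fid = fopen(cube, 'r');
m = 1;
while ~feof(fid)
    fseek(fid, bytes_to_skip, 'cof');            % skip alpha and beta density
    chunk = fread(fid, bytes_to_copy, '*uint8'); % raw bytes, no text parsing
    if isempty(chunk)
        break
    end
    fid_new = fopen(fullfile(name1, string(step_nr(m)) + '.cube'), 'w');
    fwrite(fid_new, chunk);
    fclose(fid_new);
    m = m + 1;
end
fclose(fid);
```

Since this never converts the data to and from text, it should be limited mainly by disk speed, but it only works if the record sizes really are constant.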
I’ve read about fscanf and tried this:
tic ;
fid = fopen('result.cube') ;
fgetl(fid) ; fgetl(fid) ;
f = fscanf(fid, '%d %f %f %f', [4 4]) ;
s = fscanf(fid, '%d %f %f %f %f', [5 192]) ;
n = fscanf(fid, '%f %f %f %f %f %f', [6 209953]) ;
fid_new = fopen('new','w') ;
fprintf(fid_new, '%d %.6f %.6f %.6f\n', f) ;
fprintf(fid_new, '%d %.6f %.6f %.6f %.6f\n', s) ;
fprintf(fid_new, '%f %f %f %f %f %f\n', n) ;
fclose(fid) ;
t = toc
But my problem here is: `s` is not aligned in the individual file the way it is in the big file, and `n` is printed as plain decimals instead of scientific notation such as E-02. I also tried to copy it line by line, but that takes forever. Any suggestions on how to improve this? I want it to look like this:
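As a side note on the E-02 problem: %f always prints fixed-point, so I suspect switching to %E would restore the scientific notation. A minimal sketch (the width and precision 13.5 are just a guess, not taken from the actual file):

```matlab
% %E prints scientific notation; %f never does.
x = [1.23e-2 -4.56e-5 7.8];
fprintf('%13.5E%13.5E%13.5E\n', x);
% prints something like:  1.23000E-02 -4.56000E-05  7.80000E+00
```

The field width (13 here) would also control the column alignment, which might be the fix for the `s` misalignment too.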
data splitting MATLAB Answers — New Questions