Bug in readtable()? - if the first values in a CSV's column are missing, the whole column is misinterpreted

I am reading a big CSV file (500K lines) with readtable(). In some columns the first 250+ lines are empty (e.g. ",,,," in the CSV), while the rare non-missing values further down are either text strings or dates (in DD-MM-YYYY format). readtable() somehow interprets these columns as numeric and converts all the strings and dates into NaNs. As a result, I end up with columns that are 100% NaN instead of sparsely populated data (mostly empty strings and NaTs).
Furthermore, if I move the lines with data up, even by a few dozen positions, readtable() starts to read everything normally!
So it looks like readtable() checks only the first ~250 values to determine the type of a column, which, in my opinion, is a bug! (Although I understand that it was likely done to improve speed.)

Is there a way to fix it systematically? I have lots of such CSVs with thousands of columns in them, so checking and fixing them manually is not an option…
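
One possible workaround, offered as a sketch rather than a confirmed fix: import every column as text, so that no type is guessed from the leading rows, and then decide each column's type from all of its non-empty values. The function name `readtableFullScan` is hypothetical, and the date format `dd-MM-yyyy` is an assumption taken from the DD-MM-YYYY layout described above. Importing everything as text will likely be slower and use more memory on 500K-line files.

```matlab
function T = readtableFullScan(filename)
% readtableFullScan  Hypothetical helper: infer column types from ALL rows.
% Assumes dates use the dd-MM-yyyy layout; adjust InputFormat otherwise.

    opts = detectImportOptions(filename);
    opts = setvartype(opts, 'string');    % import every column as text first
    T = readtable(filename, opts);

    names = T.Properties.VariableNames;
    for k = 1:numel(names)
        col = T.(names{k});
        isBlank = ismissing(col) | col == "";   % empty fields, either form
        vals = col(~isBlank);
        if isempty(vals)
            continue                      % entirely empty column: keep as string
        end

        % Numeric if every non-blank entry parses as a number.
        % (A literal "NaN" entry would make the column fail this check.)
        nums = str2double(vals);
        if ~any(isnan(nums))
            out = NaN(height(T), 1);
            out(~isBlank) = nums;
            T.(names{k}) = out;
            continue
        end

        % Date if every non-blank entry matches the assumed format.
        try
            dates = datetime(vals, 'InputFormat', 'dd-MM-yyyy');
        catch
            dates = NaT(0, 1);            % nothing parsed: not a date column
        end
        if numel(dates) == numel(vals) && ~any(isnat(dates))
            out = NaT(height(T), 1);
            out(~isBlank) = dates;
            T.(names{k}) = out;
        end
        % Otherwise the column stays as a string array.
    end
end
```

With this approach, a call such as `T = readtableFullScan('myfile.csv')` (the file name is a placeholder) should leave columns like p190 and p191 as strings or datetimes rather than all-NaN doubles, since the leading empty rows no longer influence the type decision.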

UPD: a test file (truncated to 510 lines) is attached; the behaviour is still the same. The problem columns are the 2nd and the 3rd (p190, p191). The first non-empty value is on data line 270.