Preprocessing Excel data: from a scattered sheet into an organised table
I have Excel data that is fairly large (10000×63) and unorganised. I have attached a sample data file (sample.xlsx).
This data was converted from PDF to Excel, so it has a lot of issues. I will highlight some of them here.
Some key things to note about the data:
There are both strings and numbers, and I would like to keep both organised.
After every 200-250 rows the table header labels repeat; I would like to remove these repeated header rows.
Whenever the table header repeats, the columns shift left or right (sometimes by up to 5 or 6 columns).
There is one column, "Code", which has both strings (1,2) and numbers (2).
Some cells contain no data, or only "——".
There is some text above and below the table (legends etc.) which I would like to remove.
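To make the points above concrete, here is a rough sketch of the clean-up logic I am after. It is written in plain Python only for illustration, not as a finished solution. Everything specific in it is an assumption: the header labels in `HEADER` are made up, the rows are assumed to be already loaded as lists of cell values (for example via openpyxl's `iter_rows(values_only=True)`), each block is assumed to use the same column shift as the header row above it, and a legend line is assumed to fill fewer than two of the table's columns.

```python
# Hypothetical header labels; replace with the real ones from the sheet.
HEADER = ["ID", "Code", "Value"]
# Columns that mix strings and numbers and should always be stored as text.
TEXT_COLUMNS = {"Code"}
# Cell contents treated as "no data".
MISSING = {"", "——"}


def normalise(cell):
    """Return None for empty/placeholder cells, a float for numeric text,
    and the stripped string otherwise. Numbers pass through unchanged."""
    if cell is None:
        return None
    if isinstance(cell, (int, float)):
        return cell
    text = str(cell).strip()
    if text in MISSING:
        return None
    try:
        return float(text)
    except ValueError:
        return text


def as_text(value):
    """Store a value as text, writing whole-number floats without '.0'."""
    if value is None:
        return None
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def find_header(row):
    """Return the column offset where HEADER starts in this row, or None."""
    cells = ["" if c is None else str(c).strip() for c in row]
    for offset in range(len(cells) - len(HEADER) + 1):
        if cells[offset:offset + len(HEADER)] == HEADER:
            return offset
    return None


def clean(rows, min_filled=2):
    """Drop repeated headers and legend text, and undo the column shift."""
    offset = None
    table = []
    for row in rows:
        found = find_header(row)
        if found is not None:
            offset = found  # new block: remember its shift, drop the header row
            continue
        if offset is None:
            continue  # text above the first header (legend etc.)
        cells = list(row[offset:offset + len(HEADER)])
        cells += [None] * (len(HEADER) - len(cells))  # pad short rows
        cells = [normalise(c) for c in cells]
        if sum(c is not None for c in cells) < min_filled:
            continue  # nearly empty row: assumed to be legend/footer text
        for i, name in enumerate(HEADER):
            if name in TEXT_COLUMNS:
                cells[i] = as_text(cells[i])
        table.append(cells)
    return table
```

For example, calling `clean(rows)` on a small list of rows with a legend line on top, a header at column 0, and a second header shifted two columns to the right should return only the data rows, all aligned to the same three columns. The `min_filled` threshold is a guess and would need tuning on the real files.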
I would like the data to end up as one big, simple list like this (it took me 4 hours to make this). I would like to automate the process, as I have many Excel files of similar dimensions (10000×63).
Additionally, I would like to know if there is a better way to organise this data than with MATLAB?
Thank you well in advance for your help. I am still learning matlab so any help would mean a lot!