How to organize input dimensions for LSTM classification
Hi guys,
I’m trying to train an LSTM on sequential data to predict classes, and I’m a little confused about the format of the input data and labels.
For the sake of simplicity, I’ll use an example to mimic my situation.
Let’s say I’m trying to use temperature data to predict which of 3 cities (A, B, or C) a recording came from.
Within each city, I have temperature readings from 10 thermometers over 2 seconds at a sampling frequency of 100 Hz.
So for each observation, I have a 200 by 10 matrix (time point by thermometer).
temperature_matrix = randi(40, 200, 10); % pseudodata
We collected the temperature data 40 times throughout the day in each city, which gives us 120 observations (3 cities * 40). Each observation is a 200 by 10 matrix.
As for my input format, I now have a 120 by 1 cell array, where each cell contains a 200 by 10 matrix.
temperature_input = cell(120, 1); % one cell per observation
for ii = 1:numel(temperature_input)
    temperature_input{ii} = randi(40, 200, 10); % pseudodata, time point by thermometer
end
labels = [repmat("city A", 40, 1); repmat("city B", 40, 1); repmat("city C", 40, 1)]; % 120 by 1 string array
Per my understanding, if I were to have a time step of 10, I should make a sliding window with a size of 5 and move it down the time dimension with a step of 1. That is to say, each 200 by 10 temperature_matrix is sliced into 196 2D arrays (200 - 5 + 1), where each array is 5 by 10 (window size by thermometer).
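To make the slicing concrete, here is a minimal sketch of the sliding window I have in mind for a single observation. The names window_size, step, and windows are only illustrative, and this is just the windowing itself, not necessarily the layout an LSTM layer expects.

```matlab
% Sketch: slice one 200-by-10 observation into overlapping 5-by-10 windows.
temperature_matrix = randi(40, 200, 10);   % pseudodata, time point by thermometer
window_size = 5;                            % time points per window (assumed)
step = 1;                                   % shift between consecutive windows (assumed)

num_time_points = size(temperature_matrix, 1);
window_starts = 1:step:(num_time_points - window_size + 1);
num_windows = numel(window_starts);         % 200 - 5 + 1 = 196

windows = cell(num_windows, 1);
for k = 1:num_windows
    rows = window_starts(k):(window_starts(k) + window_size - 1);
    windows{k} = temperature_matrix(rows, :);   % 5-by-10 slice
end

disp(num_windows)        % number of windows
disp(size(windows{1}))   % size of one window
```

Doing this for all 120 observations is what produces the extra dimension I am asking about.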
My question is how this sliding window plays a part in the input format. The sliding window creates a fourth dimension in my example; the other three dimensions are observation, time, and thermometer. I think my overall structure is still a 120 by 1 cell array, but I don’t know how to organize the dimensions within each entry.
Also, out of curiosity, will it mess up the structure if I transpose the time point by thermometer matrices? I’m only asking because I’ve seen examples with the sequence along either the rows or the columns.
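To be explicit about the two orientations, here is a minimal sketch of the transpose applied to every cell. The name temperature_transposed is only illustrative, and which orientation a given training function expects is exactly the part I am unsure about, so this only shows the reshaping.

```matlab
% Sketch: flip every observation from time-by-thermometer to thermometer-by-time.
temperature_input = cell(120, 1);
for ii = 1:numel(temperature_input)
    temperature_input{ii} = randi(40, 200, 10);   % 200-by-10, time point by thermometer
end

% Non-conjugate transpose of each cell; the result is again a 120-by-1 cell array.
temperature_transposed = cellfun(@(x) x.', temperature_input, 'UniformOutput', false);

disp(size(temperature_input{1}))        % original orientation
disp(size(temperature_transposed{1}))   % transposed orientation
```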
Best,
FY