- go downstairs
- go upstairs
You might be wondering why we use an LSTM-CNN model instead of classical machine learning methods. Classical approaches to human activity recognition rely heavily on heuristic, manual feature extraction; what we want here is end-to-end learning, which removes that manual feature-engineering step.
First, we will import all the necessary libraries that we will need.
from pandas import read_csv, unique
import numpy as np
from scipy.interpolate import interp1d
from scipy.stats import mode
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from tensorflow import stack
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GlobalAveragePooling1D, BatchNormalization, MaxPool1D, Reshape, Activation
from tensorflow.keras.layers import Conv1D, LSTM
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
import matplotlib.pyplot as plt
We will use scikit-learn, TensorFlow, Keras, SciPy, and NumPy for model building and data preprocessing, pandas for data loading, and Matplotlib for data visualization.
Dataset loading and visualization
WISDM was recorded with the accelerometer of a mobile device carried at the individual's waist, and data collection was supervised to ensure quality. The file we will use is WISDM_ar_v1.1_raw.txt. Using pandas, the dataset can be loaded into a DataFrame as follows:
def convert_to_float(x):
    try:
        return np.float64(x)
    except ValueError:
        return np.nan

def read_data(filepath):
    df = read_csv(filepath, header=None, names=['user-id', 'activity', 'timestamp', 'X', 'Y', 'Z'])
    ## removing ';' from last column and converting it to float
    df['Z'].replace(regex=True, inplace=True, to_replace=r';', value=r'')
    df['Z'] = df['Z'].apply(convert_to_float)
    return df

df = read_data('Dataset/WISDM_ar_v1.1/WISDM_ar_v1.1_raw.txt')
df['activity'].value_counts().plot(kind='bar',
                                   title='Training examples by Activity Types')
plt.show()
df['user-id'].value_counts().plot(kind='bar',
                                  title='Training examples by user')
plt.show()
Now we visualize the accelerometer data collected on the three axes.
def axis_plot(ax, x, y, title):
    ax.plot(x, y, 'r')
    ax.set_title(title)
    ax.set_ylim([min(y) - np.std(y), max(y) + np.std(y)])

for activity in df['activity'].unique():
    limit = df[df['activity'] == activity][:180]
    fig, (ax0, ax1, ax2) = plt.subplots(nrows=3, sharex=True, figsize=(15, 10))
    axis_plot(ax0, limit['timestamp'], limit['X'], 'x-axis')
    axis_plot(ax1, limit['timestamp'], limit['Y'], 'y-axis')
    axis_plot(ax2, limit['timestamp'], limit['Z'], 'z-axis')
    fig.suptitle(activity)
    plt.show()
Data preprocessing is an important step that enables our model to make better use of the raw data. The preprocessing methods used here are:
- Label encoding
- Linear interpolation
- Data segmentation
- Normalization
- Time-series segmentation
- One-hot encoding
Label encoding
Since the model cannot accept non-numeric labels as input, we add the encoded label of the 'activity' column in a new column named 'activityEncode'. The labels are converted into the numeric codes shown below (these are the classes we want to predict):
- Downstairs (0)
- Jogging (1)
- Sitting (2)
- Standing (3)
- Upstairs (4)
- Walking (5)
label_encode = LabelEncoder()
df['activityEncode'] = label_encode.fit_transform(df['activity'].values.ravel())
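LabelEncoder assigns integer codes in sorted (alphabetical) order of the class names, which is what produces the mapping above. A minimal sketch of the equivalent mapping in plain Python:

```python
# LabelEncoder sorts the class names before assigning codes; the same
# mapping can be reproduced with sorted() over the six WISDM activities
activities = ['Walking', 'Jogging', 'Downstairs', 'Sitting', 'Standing', 'Upstairs']
mapping = {name: code for code, name in enumerate(sorted(set(activities)))}
print(mapping['Downstairs'], mapping['Walking'])  # 0 5
```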
Linear interpolation
Linear interpolation fills in NaN values that arise from data loss during acquisition. Although this dataset contains only one NaN value, we implement the step for demonstration.
interpolation_fn = interp1d(df['activityEncode'], df['Z'], kind='linear')
null_list = df[df['Z'].isnull()].index.tolist()
for i in null_list:
    y = df['activityEncode'][i]
    value = interpolation_fn(y)
    df['Z'] = df['Z'].fillna(value)
Data segmentation
Segmentation is performed by user id so that no user's data is split across the two sets: users with ids less than or equal to 27 go into the training set, and the rest into the test set.
df_test = df[df['user-id'] > 27]
df_train = df[df['user-id'] <= 27]
Normalization
Before training, the data features need to be normalized to the range 0 to 1. The method we use is:
df_train['X'] = (df_train['X']-df_train['X'].min())/(df_train['X'].max()-df_train['X'].min())
df_train['Y'] = (df_train['Y']-df_train['Y'].min())/(df_train['Y'].max()-df_train['Y'].min())
df_train['Z'] = (df_train['Z']-df_train['Z'].min())/(df_train['Z'].max()-df_train['Z'].min())
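The min-max formula above can be checked on a toy array (NumPy only; the values are made up for illustration):

```python
import numpy as np

# min-max scaling: (x - min) / (max - min) maps the column into [0, 1]
x = np.array([2.0, 4.0, 6.0, 10.0])
scaled = (x - x.min()) / (x.max() - x.min())
print(scaled.tolist())  # [0.0, 0.25, 0.5, 1.0]
```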
Time-series segmentation
Since we are dealing with time-series data, we create a function that takes the frame, the label column name, and the window parameters, and splits the features into x_train and the labels into y_train, grouping every 80 time steps into one sample.
def segments(df, time_steps, step, label_name):
    N_FEATURES = 3
    segments = []
    labels = []
    for i in range(0, len(df) - time_steps, step):
        xs = df['X'].values[i:i+time_steps]
        ys = df['Y'].values[i:i+time_steps]
        zs = df['Z'].values[i:i+time_steps]
        # label a window with its most frequent activity
        # (with SciPy >= 1.11, use mode(...).mode instead of [0][0])
        label = mode(df[label_name][i:i+time_steps])[0][0]
        segments.append([xs, ys, zs])
        labels.append(label)
    reshaped_segments = np.asarray(segments, dtype=np.float32).reshape(-1, time_steps, N_FEATURES)
    labels = np.asarray(labels)
    return reshaped_segments, labels
TIME_PERIOD = 80
STEP_DISTANCE = 40
LABEL = 'activityEncode'
x_train, y_train = segments(df_train, TIME_PERIOD, STEP_DISTANCE, LABEL)
In this way, the x_train and y_train shapes become:
print('x_train shape:', x_train.shape)
print('Training samples:', x_train.shape[0])
print('y_train shape:', y_train.shape)
x_train shape: (20334, 80, 3)
Training samples: 20334
y_train shape: (20334,)
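The number of windows follows directly from the loop `range(0, len(df) - time_steps, step)` in segments(); a quick sanity check of that arithmetic (the sequence length 1000 is a made-up example, the window parameters match those above):

```python
import math

# range(0, n - time_steps, step) yields ceil((n - time_steps) / step) windows
def n_windows(n, time_steps, step):
    return len(range(0, n - time_steps, step))

print(n_windows(1000, 80, 40))  # 23
```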
We also store some values for later use: the time period (time_period), the number of sensors (sensors), and the number of classes (num_classes).
time_period, sensors = x_train.shape[1], x_train.shape[2]
num_classes = label_encode.classes_.size
print(list(label_encode.classes_))
['Downstairs', 'Jogging', 'Sitting', 'Standing', 'Upstairs', 'Walking']
Finally, each sample needs to be flattened into a single vector as input to the Keras model:
input_shape = time_period * sensors
x_train = x_train.reshape(x_train.shape[0], input_shape)
print("Input Shape: ", input_shape)
print("Input Data Shape: ", x_train.shape)
Input Shape: 240
Input Data Shape: (20334, 240)
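The flattening step can be illustrated with a small dummy array (the 80 x 3 shape mirrors the windows above; the batch size 5 is made up):

```python
import numpy as np

# each (80, 3) window becomes one flat 240-vector, as in the reshape above
x = np.zeros((5, 80, 3), dtype=np.float32)
flat = x.reshape(x.shape[0], 80 * 3)
print(flat.shape)  # (5, 240)
```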
Finally, all data needs to be converted to float32:
x_train = x_train.astype('float32')
y_train = y_train.astype('float32')
One-hot encoding
This is the last data-preprocessing step: we one-hot encode the labels and store them in y_train_hot.
y_train_hot = to_categorical(y_train, num_classes)
print("y_train shape: ", y_train_hot.shape)
y_train shape: (20334, 6)
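Under the hood, one-hot encoding is just indexing an identity matrix with the integer labels; a NumPy-only sketch (the tiny label array is made up for illustration):

```python
import numpy as np

# equivalent of to_categorical(y, num_classes): row i is all zeros
# except for a 1 at position y[i]
y = np.array([0, 2, 1])
one_hot = np.eye(3, dtype='float32')[y]
print(one_hot.tolist())  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```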
The model is a sequential model of 8 layers. The first two layers are LSTMs with 32 units each, using the ReLU activation and returning full sequences, so their output keeps the (samples, timesteps, channels) shape that the following Conv1D layers expect. The convolutional layers then extract spatial features: the first Conv1D layer has 64 filters and the second has 192. Between them, a max-pooling layer performs downsampling. A global average pooling (GAP) layer then converts the multi-dimensional feature maps into a 1-D feature vector, reducing the number of model parameters since this layer has none. A batch normalization (BN) layer helps the model converge. The last layer is the output layer: a fully connected layer with 6 neurons and a softmax classifier giving the probability of each class.
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(input_shape, 1), activation='relu'))
model.add(LSTM(32, return_sequences=True, activation='relu'))
# the LSTMs return sequences, so their (timesteps, 32) output feeds Conv1D directly
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', strides=2))
model.add(MaxPool1D(pool_size=4, padding='same'))  # pool_size is an assumed value
model.add(Conv1D(filters=192, kernel_size=2, activation='relu', strides=1))
model.add(GlobalAveragePooling1D())
model.add(BatchNormalization(epsilon=1e-06))
model.add(Dense(num_classes, activation='softmax'))
Training and results
After training, the model reached 98.02% accuracy with a loss of 0.0058; the training F1 score was 0.96.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train,
                    y_train_hot,
                    batch_size=192,  # batch size and epochs are assumed values;
                    epochs=100)      # the original call was truncated here
Visualize training accuracy and loss change graphs.
plt.plot(history.history['accuracy'], 'r', label='Accuracy of training data')
plt.plot(history.history['loss'], 'r--', label='Loss of training data')
plt.title('Model Accuracy and Loss')
plt.ylabel('Accuracy and Loss')
plt.xlabel('Training Epoch')
plt.legend()
plt.show()
y_pred_train = model.predict(x_train)
max_y_pred_train = np.argmax(y_pred_train, axis=1)
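The reported training F1 score can be computed from the predicted class indices with scikit-learn. A self-contained sketch on made-up label arrays (on the real data you would pass y_train and max_y_pred_train instead):

```python
import numpy as np
from sklearn.metrics import f1_score

# toy ground-truth and prediction arrays, for illustration only
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

# the weighted average accounts for class imbalance, as in WISDM
print(round(f1_score(y_true, y_pred, average='weighted'), 2))  # 0.82
```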
Now test it on the test dataset; the same preprocessing must be applied to the test set before it is passed to the model.
df_test['X'] = (df_test['X']-df_test['X'].min())/(df_test['X'].max()-df_test['X'].min())
df_test['Y'] = (df_test['Y']-df_test['Y'].min())/(df_test['Y'].max()-df_test['Y'].min())
df_test['Z'] = (df_test['Z']-df_test['Z'].min())/(df_test['Z'].max()-df_test['Z'].min())
x_test, y_test = segments(df_test, TIME_PERIOD, STEP_DISTANCE, LABEL)
x_test = x_test.reshape(x_test.shape[0], input_shape)
x_test = x_test.astype('float32')
y_test = y_test.astype('float32')
y_test = to_categorical(y_test, num_classes)
After evaluating our test dataset, we got 89.14% accuracy and 0.4647 loss. The F1 test score was 0.89.
score = model.evaluate(x_test, y_test)
print("Loss:", score[0])
print("Accuracy:", score[1])
Plot the confusion matrix below to better understand the predictions on the test dataset.
predictions = model.predict(x_test)
predictions = np.argmax(predictions, axis=1)
y_test_pred = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_test_pred, predictions)
cm_disp = ConfusionMatrixDisplay(confusion_matrix=cm)
cm_disp.plot(xticks_rotation='vertical')
plt.show()
The LSTM-CNN model performs considerably better than the baseline machine learning models. The code for this article can be found on GitHub: https://github.com/Tanny1810/Human-Activity-Recognition-LSTM-CNN. You can try implementing it yourself and improving the F1 score by optimizing the model.

Note: the model comes from the paper "LSTM-CNN Architecture for Human Activity Recognition" by Kun Xia, Jianguang Huang, and Hanyu Wang, published in an IEEE journal: https://ieeexplore.ieee.org/abstract/document/9043535

Reviewing editor: Li Qian