This article walks through using raw data generated by mobile sensors to identify human activity. Human activity recognition (HAR) uses artificial intelligence (AI) to identify what a person is doing from the raw data produced by activity-recording devices such as smart watches. Wearable devices (smart watches, wristbands, specialized devices, etc.) generate signals whenever the wearer performs an action; the sensors that collect this information include accelerometers, gyroscopes, and magnetometers. HAR has a variety of applications, from assisting the sick and disabled to fields such as gaming that rely heavily on analyzing motor skills. HAR techniques fall broadly into two categories: those based on stationary sensors and those based on mobile sensors. This article uses mobile sensors.

In this article, I will use an LSTM (Long Short-Term Memory) network and a CNN (Convolutional Neural Network) to identify the following human activities:
  • Walking downstairs
  • Walking upstairs
  • Jogging
  • Sitting
  • Standing
  • Walking

Overview

You might wonder why we use an LSTM-CNN model instead of classical machine learning methods. Classical methods rely heavily on heuristic, hand-crafted feature extraction for HAR tasks; here we want end-to-end learning that removes that manual step. The model is a deep neural network combining LSTM and CNN layers, which learns to extract activity features and classify them using only its trained parameters. We use the WISDM dataset, which contains 1,098,209 samples. The trained model reaches an F1 score of 0.96 on the training set and 0.89 on the test set.

Importing libraries

First, we will import all the necessary libraries that we will need.


from pandas import read_csv, unique
import numpy as np
from scipy.interpolate import interp1d
from scipy.stats import mode
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from tensorflow import stack
from tensorflow.keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, GlobalAveragePooling1D, BatchNormalization, MaxPool1D, Reshape, Activation
from keras.layers import Conv1D, LSTM
from keras.callbacks import ModelCheckpoint, EarlyStopping
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

We will use scikit-learn, TensorFlow, Keras, SciPy, and NumPy for model building and data preprocessing, pandas for data loading, and matplotlib for data visualization.

Dataset loading and visualization

The WISDM data was recorded by an accelerometer in a mobile device carried at the individual's waist. Data collection was supervised to ensure data quality. The file we will use is WISDM_ar_v1.1_raw.txt. Using pandas, the dataset can be loaded into a DataFrame as follows:

def convert_to_float(x):
    # Return NaN for values that cannot be parsed as a number
    try:
        return np.float64(x)
    except (ValueError, TypeError):
        return np.nan

def read_data(filepath):
    df = read_csv(filepath, header=None,
                  names=['user-id', 'activity', 'timestamp', 'X', 'Y', 'Z'])
    # Remove the trailing ';' from the last column and convert it to float
    df['Z'] = df['Z'].replace(regex=True, to_replace=r';', value=r'')
    df['Z'] = df['Z'].apply(convert_to_float)
    return df

df = read_data('Dataset/WISDM_ar_v1.1/WISDM_ar_v1.1_raw.txt')
df
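To make the cleanup step concrete, here is a minimal sketch that uses only the standard library. The two sample records are made up for illustration and only mimic the layout implied by the loader above (comma-separated fields, with a ';' closing the last column); `parse_line` is a hypothetical helper and not part of the original code.

```python
import math

def parse_line(line):
    """Split one raw record into typed fields; an unparsable Z becomes NaN."""
    user, activity, timestamp, x, y, z = line.strip().split(',')
    z = z.rstrip(';')              # drop the ';' that closes each record
    try:
        z = float(z)
    except ValueError:
        z = math.nan               # same fallback idea as convert_to_float
    return int(user), activity, int(timestamp), float(x), float(y), z

# Illustrative records only (not copied from the dataset)
sample = [
    "33,Jogging,1000,-0.69,12.68,0.50;",
    "33,Jogging,2000,5.01,11.26,;",    # Z value missing
]
rows = [parse_line(line) for line in sample]
print(rows[0])
print(math.isnan(rows[1][5]))
```

The second record shows how a missing reading ends up as NaN, which is the kind of gap the interpolation step later has to fill.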


plt.figure(figsize=(15, 5))
plt.xlabel('Activity Type')
plt.ylabel('Training examples')
df['activity'].value_counts().plot(kind='bar', title='Training examples by Activity Types')
plt.show()

plt.figure(figsize=(15, 5))
plt.xlabel('User')
plt.ylabel('Training examples')
df['user-id'].value_counts().plot(kind='bar', title='Training examples by user')
plt.show()



Next, I plot the first 180 accelerometer samples of each activity on the three axes.


def axis_plot(ax, x, y, title):
    ax.plot(x, y, 'r')
    ax.set_title(title)
    ax.xaxis.set_visible(False)
    ax.set_ylim([min(y) - np.std(y), max(y) + np.std(y)])
    ax.set_xlim([min(x), max(x)])
    ax.grid(True)

for activity in df['activity'].unique():
    limit = df[df['activity'] == activity][:180]
    fig, (ax0, ax1, ax2) = plt.subplots(nrows=3, sharex=True, figsize=(15, 10))
    axis_plot(ax0, limit['timestamp'], limit['X'], 'x-axis')
    axis_plot(ax1, limit['timestamp'], limit['Y'], 'y-axis')
    axis_plot(ax2, limit['timestamp'], limit['Z'], 'z-axis')
    plt.subplots_adjust(hspace=0.2)
    fig.suptitle(activity)
    plt.subplots_adjust(top=0.9)
    plt.show()

Data preprocessing

Data preprocessing is a very important task that enables our model to make better use of our raw data. The data preprocessing methods that will be used here are:

  • Label encoding
  • Linear interpolation
  • Data splitting
  • Normalization
  • Time series segmentation
  • One-hot encoding

Label encoding

Since the model cannot work with non-numeric labels, we encode the 'activity' column and store the result in a new column named 'activityEncode'. The labels are converted to the numeric codes shown below (these are the labels we want to predict):

  • Downstairs [0]
  • Jogging [1]
  • Sitting [2]
  • Standing [3]
  • Upstairs [4]
  • Walking [5]

label_encode = LabelEncoder()
df['activityEncode'] = label_encode.fit_transform(df['activity'].values.ravel())
df
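The numeric codes listed above follow the sorted order of the class names. As a rough sketch of that mapping, the snippet below reproduces it with NumPy alone; the small `activities` array is invented for the example and this is not the encoder's actual implementation.

```python
import numpy as np

# Toy labels, invented for illustration
activities = np.array(['Walking', 'Jogging', 'Sitting', 'Walking', 'Downstairs'])

# classes holds the sorted unique labels; codes holds each label's index in it
classes, codes = np.unique(activities, return_inverse=True)
print(classes)
print(codes)

# Indexing classes with the codes recovers the original strings
decoded = classes[codes]
print(decoded)
```

Going back from codes to names is what you need later when reading a confusion matrix or classification report.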



Linear interpolation

Linear interpolation handles NaN values caused by data lost during acquisition by filling each missing value from the known values around it. There is only one NaN value in this dataset, but we implement the step anyway for demonstration.

# Interpolate Z over the row index, using only the rows where Z is known
known = df['Z'].notnull()
interpolation_fn = interp1d(df.index[known], df.loc[known, 'Z'],
                            kind='linear', fill_value='extrapolate')
null_list = df[df['Z'].isnull()].index.tolist()
for i in null_list:
    value = float(interpolation_fn(i))
    df.loc[i, 'Z'] = value
    print(value)
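As a small worked example of filling a gap by linear interpolation, the sketch below uses NumPy's `np.interp` on a toy array. The values are invented, and the sample position is assumed to be the interpolation axis.

```python
import numpy as np

# Toy signal with one missing reading
z = np.array([1.0, 2.0, np.nan, 4.0, 5.5])
idx = np.arange(len(z))
known = ~np.isnan(z)

# Estimate each missing point from its known neighbours
z_filled = z.copy()
z_filled[~known] = np.interp(idx[~known], idx[known], z[known])
print(z_filled)
```

The missing value sits halfway between 2.0 and 4.0, so it is filled with 3.0.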

Data splitting

The data is split by user ID so that no user's recordings appear in both the training and test sets. Users with IDs less than or equal to 27 go into the training set and the rest into the test set.


df_test = df[df['user-id'] > 27].copy()
df_train = df[df['user-id'] <= 27].copy()

Normalization

Before training, the features are scaled to the range 0 to 1 using min-max normalization:


df_train['X'] = (df_train['X'] - df_train['X'].min()) / (df_train['X'].max() - df_train['X'].min())
df_train['Y'] = (df_train['Y'] - df_train['Y'].min()) / (df_train['Y'].max() - df_train['Y'].min())
df_train['Z'] = (df_train['Z'] - df_train['Z'].min()) / (df_train['Z'].max() - df_train['Z'].min())
df_train
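The formula applied to each axis is x' = (x - min) / (max - min). The sketch below wraps it in two hypothetical helpers, `fit_min_max` and `apply_min_max` (neither is in the original code), to show one common variation: computing the minimum and maximum once and reusing them on other data. The article itself normalizes each split with its own statistics, so treat this as an alternative under that assumption.

```python
import numpy as np

def fit_min_max(values):
    """Return the (min, max) pair used for scaling."""
    return float(np.min(values)), float(np.max(values))

def apply_min_max(values, lo, hi):
    """Scale values with a previously computed (min, max) pair."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

train = np.array([-10.0, 0.0, 10.0])   # toy values
other = np.array([-5.0, 15.0])         # toy values from another split

lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
other_scaled = apply_min_max(other, lo, hi)
print(train_scaled)
print(other_scaled)
```

When statistics are reused this way, values outside the original range fall outside 0 to 1 (15.0 maps to 1.25 here), which is expected.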



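Time series segmentation

Since we are dealing with time series data, we need a function that splits it into fixed-length windows. The function takes the data frame, the window length, the step between consecutive windows, and the name of the label column. It returns the features and labels (x_train and y_train), with each window covering 80 time steps; with a step of 40, consecutive windows overlap by half. Each window is labeled with the most frequent activity it contains.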


def segments(df, time_steps, step, label_name):
    N_FEATURES = 3
    segments = []
    labels = []
    for i in range(0, len(df) - time_steps, step):
        xs = df['X'].values[i:i+time_steps]
        ys = df['Y'].values[i:i+time_steps]
        zs = df['Z'].values[i:i+time_steps]
        # Most frequent label in the window; atleast_1d copes with both the
        # array and the scalar results returned by different SciPy versions
        label = np.atleast_1d(mode(df[label_name].values[i:i+time_steps])[0])[0]
        segments.append([xs, ys, zs])
        labels.append(label)
    reshaped_segments = np.asarray(segments, dtype=np.float32).reshape(-1, time_steps, N_FEATURES)
    labels = np.asarray(labels)
    return reshaped_segments, labels

TIME_PERIOD = 80
STEP_DISTANCE = 40
LABEL = 'activityEncode'
x_train, y_train = segments(df_train, TIME_PERIOD, STEP_DISTANCE, LABEL)
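Two details of the windowing are easier to see on a toy signal: how many windows the loop produces, and how the values are laid out after the reshape. The sketch below uses a 10-sample signal with a window of 4 and a step of 2, all invented for illustration. Each window is collected as [xs, ys, zs], so it has shape (3, time_steps), and `reshape` reinterprets that buffer rather than transposing it. If rows of (x, y, z) per time step are wanted, a transpose gives that layout instead. Whether the difference matters for the flattened input used later is not something the article addresses.

```python
import numpy as np

time_steps, step = 4, 2
n = 10
x = np.arange(n)            # toy x-axis: 0..9
y = 100 + np.arange(n)      # toy y-axis: 100..109
z = 200 + np.arange(n)      # toy z-axis: 200..209

# Same loop bounds as in segments(): window start positions
starts = list(range(0, n - time_steps, step))
print(starts)

# First window, collected axis by axis: shape (3, time_steps)
window = np.array([x[0:time_steps], y[0:time_steps], z[0:time_steps]])

reshaped = window.reshape(time_steps, 3)   # reinterprets the buffer
transposed = window.T                      # one (x, y, z) row per time step
print(reshaped)
print(transposed)
```

In `reshaped`, the second row mixes the last x value with the first two y values, while in `transposed` every row holds the three axes of a single time step.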

In this way, the x_train and y_train shapes become:


print('x_train shape:', x_train.shape)
print('Training samples:', x_train.shape[0])
print('y_train shape:', y_train.shape)

x_train shape: (20334, 80, 3)
Training samples: 20334
y_train shape: (20334,)

We also store a few values for later use: the window length (time_period), the number of sensor axes (sensors), and the number of classes (num_classes).

time_period, sensors = x_train.shape[1], x_train.shape[2]
num_classes = label_encode.classes_.size
print(list(label_encode.classes_))

['Downstairs', 'Jogging', 'Sitting', 'Standing', 'Upstairs', 'Walking']

Next, each window is flattened into a single vector with reshape so that it can be passed to Keras:


input_shape = time_period * sensors
x_train = x_train.reshape(x_train.shape[0], input_shape)
print("Input Shape: ", input_shape)
print("Input Data Shape: ", x_train.shape)

Input Shape: 240
Input Data Shape: (20334, 240)
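The first layer of the model declares its input as (input_shape, 1), that is, 240 time steps with one feature each, while the flattened array has shape (samples, 240). The sketch below shows the flattening on a dummy array and how a trailing axis could be added explicitly. Whether that explicit step is needed depends on how the installed Keras version handles the missing axis, so it is shown here only as an assumption-laden illustration.

```python
import numpy as np

# Dummy batch of 5 windows, 80 time steps, 3 sensor axes
windows = np.zeros((5, 80, 3), dtype=np.float32)

# Flatten each window into one vector of 80 * 3 = 240 values
flat = windows.reshape(windows.shape[0], 80 * 3)
print(flat.shape)

# Add a trailing feature axis to match an input declared as (240, 1)
with_feature_axis = flat[..., np.newaxis]
print(with_feature_axis.shape)
```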

Then we convert all of the data to float32.


x_train = x_train.astype('float32')
y_train = y_train.astype('float32')

One-hot encoding

This is the last preprocessing step: we one-hot encode the labels and store the result in y_train_hot.


y_train_hot = to_categorical(y_train, num_classes)
print("y_train shape: ", y_train_hot.shape)

y_train shape: (20334, 6)
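For readers who have not met one-hot encoding before, the sketch below builds the same kind of matrix with NumPy. The four labels are invented, and indexing an identity matrix is just one convenient way to get a single 1 per row.

```python
import numpy as np

labels = np.array([0, 5, 2, 2])   # toy integer class labels
num_classes = 6

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot)

# argmax along the class axis recovers the integer labels
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

The argmax step is the same operation the evaluation code uses later to turn predicted probabilities back into class indices.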

Model



The model is a sequential model consisting of 8 layers. The first two layers are LSTMs with 32 units each, using ReLU as the activation function. They are followed by convolutional layers that extract spatial features. The LSTM output has 3 dimensions (samples, time steps, features), while the convolutional layer here is fed a 4-dimensional input (samples, 1, time steps, features), so a Reshape layer is placed where the two meet. The first convolutional layer has 64 filters; the second is described as having 128, though the code below sets filters=192. Between the two convolutional layers, a max pooling layer performs downsampling. A global average pooling (GAP) layer then converts the multi-dimensional feature maps into a 1-D feature vector, which keeps the total parameter count down because the layer has no parameters of its own. Next comes a batch normalization (BN) layer, which helps the model converge. The last layer is the output layer: a fully connected layer with 6 neurons followed by a softmax, giving the probability of each class.


model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(input_shape, 1), activation='relu'))
model.add(LSTM(32, return_sequences=True, activation='relu'))
model.add(Reshape((1, 240, 32)))
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', strides=2))
model.add(Reshape((120, 64)))
model.add(MaxPool1D(pool_size=4, padding='same'))
model.add(Conv1D(filters=192, kernel_size=2, activation='relu', strides=1))
model.add(Reshape((29, 192)))
model.add(GlobalAveragePooling1D())
model.add(BatchNormalization(epsilon=1e-06))
model.add(Dense(6))
model.add(Activation('softmax'))
model.summary()
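The sequence lengths 120 and 29 in the Reshape layers follow from the layer settings. The sketch below recomputes them with two small hypothetical helpers, assuming the convolutions use the default 'valid' padding and that the pooling stride defaults to the pool size.

```python
import math

def conv1d_valid_len(length, kernel_size, strides):
    """Output length of a 1-D convolution with 'valid' padding."""
    return (length - kernel_size) // strides + 1

def pool1d_same_len(length, strides):
    """Output length of 1-D pooling with 'same' padding."""
    return math.ceil(length / strides)

steps = 240                                        # 80 time steps * 3 axes
after_conv1 = conv1d_valid_len(steps, 2, 2)        # kernel_size=2, strides=2
after_pool = pool1d_same_len(after_conv1, 4)       # pool_size=4
after_conv2 = conv1d_valid_len(after_pool, 2, 1)   # kernel_size=2, strides=1
print(after_conv1, after_pool, after_conv2)
```

The results match the (120, 64) and (29, 192) shapes used in the model, with the pooling layer in between reducing 120 steps to 30.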

Training and results

After training, the model reached 98.02% accuracy with a loss of 0.0058 on the training data. The training F1 score is 0.96.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train, y_train_hot, batch_size=192, epochs=100)



Next, we plot how training accuracy and loss change over the epochs and print a classification report for the training data.


plt.figure(figsize=(6, 4))
plt.plot(history.history['accuracy'], 'r', label='Accuracy of training data')
plt.plot(history.history['loss'], 'r--', label='Loss of training data')
plt.title('Model Accuracy and Loss')
plt.ylabel('Accuracy and Loss')
plt.xlabel('Training Epoch')
plt.ylim(0)
plt.legend()
plt.show()

y_pred_train = model.predict(x_train)
max_y_pred_train = np.argmax(y_pred_train, axis=1)
print(classification_report(y_train, max_y_pred_train))



Now we evaluate the model on the test set. The test data must go through the same preprocessing steps before it is passed to the model.


df_test['X'] = (df_test['X'] - df_test['X'].min()) / (df_test['X'].max() - df_test['X'].min())
df_test['Y'] = (df_test['Y'] - df_test['Y'].min()) / (df_test['Y'].max() - df_test['Y'].min())
df_test['Z'] = (df_test['Z'] - df_test['Z'].min()) / (df_test['Z'].max() - df_test['Z'].min())

x_test, y_test = segments(df_test, TIME_PERIOD, STEP_DISTANCE, LABEL)
x_test = x_test.reshape(x_test.shape[0], input_shape)
x_test = x_test.astype('float32')
y_test = y_test.astype('float32')
y_test = to_categorical(y_test, num_classes)

On the test set, the model reached 89.14% accuracy with a loss of 0.4647. The test F1 score was 0.89.


score = model.evaluate(x_test, y_test)
print("Accuracy:", score[1])
print("Loss:", score[0])
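To show what the two reported numbers measure, here is a minimal NumPy sketch of categorical cross-entropy and accuracy on two invented predictions. It illustrates the definitions under simple assumptions (mean over samples, probabilities clipped away from zero) and is not the framework's own implementation.

```python
import math
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Share of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

# Toy one-hot labels and predicted probabilities
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y_prob = np.array([[0.8, 0.1, 0.1], [0.3, 0.4, 0.3]])

loss = categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(loss, acc)
```

Both toy predictions are correct, so accuracy is 1.0, yet the loss is well above zero because the second prediction is not confident. This is why loss and accuracy can move differently between training and test data.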



Plot the confusion matrix below to better understand the predictions on the test dataset.


predictions = model.predict(x_test)
predictions = np.argmax(predictions, axis=1)
# Despite its name, y_test_pred holds the true labels decoded from one-hot
y_test_pred = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_test_pred, predictions)
cm_disp = ConfusionMatrixDisplay(confusion_matrix=cm)
cm_disp.plot()
plt.show()
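When reading the plot, rows correspond to true classes and columns to predicted classes. The sketch below builds a matrix with that layout from toy labels using NumPy and derives per-class recall from it; `confusion` is a hypothetical helper written for this example.

```python
import numpy as np

def confusion(y_true, y_pred, num_classes):
    """Count matrix with true classes as rows and predictions as columns."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy labels, invented for illustration
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2])

cm = confusion(y_true, y_pred, 3)
print(cm)

# Recall per class: correct predictions divided by the row total
recall = cm.diagonal() / cm.sum(axis=1)
print(recall)
```

Off-diagonal entries show which activities get mistaken for which, for example one class-0 sample predicted as class 1 in the toy data.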

A classification report of the model evaluated on the test dataset is also available.

print(classification_report(y_test_pred, predictions))
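The report lists precision, recall, and F1 per class, plus macro and weighted averages. The sketch below derives those quantities from a small invented confusion matrix (rows are true classes, columns are predictions). The article does not say which average its reported F1 of 0.89 refers to, so both are shown.

```python
import numpy as np

# Toy confusion matrix: rows = true class, columns = predicted class
cm = np.array([[8, 2, 0],
               [1, 6, 3],
               [0, 1, 4]])

tp = cm.diagonal()
precision = tp / cm.sum(axis=0)    # correct / everything predicted as the class
recall = tp / cm.sum(axis=1)       # correct / everything truly in the class
f1 = 2 * precision * recall / (precision + recall)

support = cm.sum(axis=1)           # number of true samples per class
macro_f1 = f1.mean()               # every class counts equally
weighted_f1 = np.average(f1, weights=support)   # classes weighted by support
print(f1)
print(macro_f1, weighted_f1)
```

With imbalanced classes the two averages differ, which is worth keeping in mind for this dataset, where some activities have far more samples than others.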

Summary

The LSTM-CNN model performs well on this task without any hand-crafted features, reaching an F1 score of 0.89 on users it never saw during training. The code for this article can be found on GitHub:
https://github.com/Tanny1810/Human-Activity-Recognition-LSTM-CNN
You can try implementing it yourself and improving the F1 score by tuning the model.

The model is taken from the paper "LSTM-CNN Architecture for Human Activity Recognition" by Kun Xia, Jianguang Huang, and Hanyu Wang, published in an IEEE journal:
https://ieeexplore.ieee.org/abstract/document/9043535

Reviewing Editor: Li Qian
