Shariat1994/Anomaly-Detection-Ushant-AIS

Anomaly Detection in trajectory data-SH

This project focuses on anomaly detection in Ushant trajectories, using a Gaussian Mixture Model (GMM) as the classic method and an Autoencoder as the deep learning-based method. It uses the Factory Design pattern to create the classic and deep models.

About anomaly detection

Anomaly detection is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behavior.


1. Dataset

Download Ushant dataset from: https://figshare.com/articles/dataset/Ushant_AIS_dataset/8966273

The dataset corresponds to 6 months of AIS data from vessels steaming in the area of the Ushant traffic separation scheme (in Brittany, West of France). This area has one of the highest traffic densities in the world, with a clear separation scheme consisting of two navigation lanes. Different kinds of vessels are present in the area, from cargo ships and tankers with high velocity and straight routes to sailing boats or fishing vessels with low speed and varied sailing directions. As such, the area is highly monitored to avoid collision or grounding, and a better analysis and understanding of the different ship behaviors is of prime importance.

The whole trajectory data set consists of 18,603 trajectories, gathering more than 7 million GPS observations overall. Only trajectories with more than 30 points were kept. The time lag between two consecutive observations ranges from 5 seconds to 15 hours, with 95% of time lags below 3 minutes.

This directory contains data presented in the article “Scalable clustering of segmented trajectories within a continuous time framework: application to maritime traffic data” by Gloaguen et al.

The data directory contains 18,603 files, each of them being a trajectory. Each trajectory is described by five variables, where:

  • x is the longitude
  • y is the latitude
  • vx is the x-velocity
  • vy is the y-velocity
  • t is the time since the beginning of the trajectory
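
As a rough illustration of this layout, the sketch below builds one made-up trajectory in memory with the five columns in the order listed above, then applies the "more than 30 points" filter and computes the time lags between consecutive observations. The column order, the use of seconds for t, and the helper names `time_lags` and `keep_trajectory` are assumptions for this example; the on-disk file format is not described here, so no loader is shown.

```python
import numpy as np

# Assumed column order for this sketch: x, y, vx, vy, t.
COLUMNS = ("x", "y", "vx", "vy", "t")


def time_lags(trajectory: np.ndarray) -> np.ndarray:
    """Return the gaps between consecutive values of the t column."""
    t = trajectory[:, COLUMNS.index("t")]
    return np.diff(t)


def keep_trajectory(trajectory: np.ndarray, min_points: int = 30) -> bool:
    """Keep only trajectories with more than `min_points` observations."""
    return trajectory.shape[0] > min_points


# Made-up trajectory with 40 observations (values are illustrative only).
rng = np.random.default_rng(0)
n = 40
t = np.cumsum(rng.uniform(5.0, 180.0, size=n))  # assumed to be in seconds
t -= t[0]                                       # time since the start of the trajectory
trajectory = np.column_stack([
    rng.normal(-5.5, 0.1, n),  # x: longitude
    rng.normal(48.5, 0.1, n),  # y: latitude
    rng.normal(0.0, 1.0, n),   # vx: x-velocity
    rng.normal(0.0, 1.0, n),   # vy: y-velocity
    t,                         # t: time since the beginning
])

lags = time_lags(trajectory)
print(keep_trajectory(trajectory), lags.shape)
```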

2. Classic method

The project implements the GMM algorithm for anomaly detection in Ushant trajectories.

Gaussian Mixture Models (GMM) for Abnormal Detection

The Ushant Abnormal Detection system utilizes Gaussian Mixture Models (GMM) as one of the classic methods for detecting abnormalities in trajectory data. GMM is a probabilistic model that represents a probability distribution as a weighted sum of Gaussian component distributions.

How GMM Works

GMM assumes that the normal behavior of a system or dataset can be represented by a mixture of Gaussian distributions. These Gaussian components capture different patterns or clusters in the data. The model is trained using the available normal trajectory data.

During the training phase, the GMM learns the parameters of the Gaussian components, such as mean and covariance, to best fit the normal trajectory patterns. Once trained, the GMM can estimate the likelihood or probability of a new trajectory being generated by the learned mixture of Gaussian distributions.
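
A minimal sketch of this training and scoring step, assuming scikit-learn's `GaussianMixture` is the GMM implementation and that the model scores individual 5-feature rows. The number of components, the covariance type, and the toy data are all assumptions chosen for illustration, not values taken from this project.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Made-up "normal" data: two clusters of 5-feature rows (x, y, vx, vy, t).
normal = np.vstack([
    rng.normal(0.2, 0.05, size=(300, 5)),
    rng.normal(0.8, 0.05, size=(300, 5)),
])

# n_components=2 is an assumption that matches the toy data above.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(normal)  # learns the weights, means and covariances of the components

# score_samples returns the log-likelihood of each row under the fitted mixture.
typical = np.full((1, 5), 0.2)  # sits at the centre of one cluster
unusual = np.full((1, 5), 3.0)  # far from both clusters
log_lik_typical = gmm.score_samples(typical)[0]
log_lik_unusual = gmm.score_samples(unusual)[0]
print(log_lik_typical > log_lik_unusual)
```

A row far from every learned component receives a much lower log-likelihood than a row near a cluster centre, which is the signal that the thresholding step relies on.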

Anomaly Detection with GMM

To detect anomalies using GMM, a threshold is set based on the likelihood or probability of the trajectory data. Trajectories with a likelihood below the threshold are considered abnormal, indicating a deviation from the learned normal behavior.

In our Ushant Abnormal Detection system, the threshold for anomaly detection is determined using the following formula:

threshold = mean(data_loss) + std(data_loss)

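Here, data_loss is the loss computed for the trajectory data, described in this project as the difference between the actual trajectory data and the reconstructed trajectory obtained from the GMM model. Because the threshold is defined on a loss rather than on a likelihood, trajectories whose loss exceeds the threshold are the ones flagged as abnormal.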
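
The threshold formula can be worked through on a small made-up loss vector. This is only a sketch: the helper names `anomaly_threshold` and `flag_anomalies` are hypothetical, and the use of the population standard deviation (NumPy's default, `ddof=0`) is an assumption, since the formula above does not say which variant is used.

```python
import numpy as np


def anomaly_threshold(data_loss) -> float:
    """threshold = mean(data_loss) + std(data_loss), with population std (ddof=0)."""
    return float(np.mean(data_loss) + np.std(data_loss))


def flag_anomalies(data_loss, threshold: float) -> np.ndarray:
    """Mark every sample whose loss is strictly above the threshold."""
    return np.asarray(data_loss) > threshold


# Made-up losses: four ordinary samples and one clear outlier.
losses = np.array([1.0, 1.0, 1.0, 1.0, 6.0])
threshold = anomaly_threshold(losses)  # mean 2.0 + std 2.0 = 4.0
flags = flag_anomalies(losses, threshold)
print(threshold, flags.tolist())       # 4.0 [False, False, False, False, True]
```

With a threshold only one standard deviation above the mean, a noticeable share of ordinary samples can also be flagged on real data, so the multiplier on the standard deviation is a natural parameter to tune.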

Advantages of GMM for Abnormal Detection

  • GMM can capture complex patterns and clusters in trajectory data, making it suitable for detecting various types of abnormalities.
  • The probabilistic nature of GMM allows for flexible threshold setting, enabling the system to adapt to different datasets and abnormal patterns.
  • GMM is computationally efficient and scalable, making it suitable for processing large-scale trajectory datasets.

Usage of GMM in the Ushant Abnormal Detection System

In the Ushant Abnormal Detection system, the GMM-based method is implemented using a Factory Design pattern. This design pattern allows for the creation of the classic anomaly detection model, which includes GMM, as well as the deep learning-based model using an autoencoder.

The trajectory data is preprocessed with MinMaxScaler to scale the features, and then split into training and validation sets, with 20% of the data held out for validation.
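
The preprocessing described here can be sketched with scikit-learn as follows. The feature matrix is made up, and the use of `train_test_split` with `test_size=0.2` and a fixed `random_state` is an assumption about how the 0.2 split is performed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 5))  # made-up rows of (x, y, vx, vy, t)

# Scale first, then split, following the order described above.
scaler = MinMaxScaler()               # maps each feature to [0, 1] by default
scaled = scaler.fit_transform(features)

# Hold out 20% of the rows for validation.
train, val = train_test_split(scaled, test_size=0.2, random_state=0)
print(train.shape, val.shape)
```

A common alternative is to split first and fit the scaler on the training portion only, which keeps validation statistics out of the scaling step.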

3. Deep Learning-based method

The project uses an Autoencoder model for deep learning-based anomaly detection.

Autoencoder for Deep Learning-based Abnormal Detection

In addition to the Gaussian Mixture Models (GMM) method, the Ushant Abnormal Detection system also utilizes Autoencoders as a deep learning-based approach for detecting abnormalities in trajectory data.

What is an Autoencoder?

An Autoencoder is a type of neural network architecture that is commonly used for unsupervised learning tasks, such as dimensionality reduction and data reconstruction. It consists of an encoder network that compresses the input data into a lower-dimensional representation, and a decoder network that reconstructs the original input data from the compressed representation.

How Autoencoders Work for Abnormal Detection

In the context of abnormal detection, Autoencoders are trained on normal trajectory data to learn the underlying patterns and structure. The model aims to reconstruct the input data as accurately as possible. During training, the Autoencoder learns to encode the normal trajectory patterns into a compressed representation and decode it back to reconstruct the original trajectories.

When presented with abnormal trajectory data, the reconstructed output from the Autoencoder will have a higher reconstruction error compared to normal data. This reconstruction error serves as a measure of abnormality, as abnormalities tend to result in higher deviations between the input and the reconstructed output.
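
To make the reconstruction-error idea concrete without training a network, the sketch below swaps in a linear stand-in for the autoencoder: a projection onto the top two principal directions (computed with an SVD) acts as the encoder, and the back-projection acts as the decoder. This is not the project's Keras model; the data, the bottleneck size of 2, and the helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "normal" data: 5 features that vary along only 2 latent directions.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
normal = latent @ mixing + 0.01 * rng.normal(size=(500, 5))

# Fit the linear stand-in: the top-2 principal directions form the bottleneck.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]  # shape (2, 5)


def reconstruct(x: np.ndarray) -> np.ndarray:
    code = (x - mean) @ components.T  # encode to 2 values per row
    return code @ components + mean   # decode back to 5 features


def reconstruction_error(x: np.ndarray) -> np.ndarray:
    """Mean squared error per row between the input and its reconstruction."""
    return np.mean((x - reconstruct(x)) ** 2, axis=1)


# A made-up abnormal row that moves along a direction the model never encodes.
abnormal = (mean + 3.0 * vt[2]).reshape(1, -1)

normal_errors = reconstruction_error(normal)
abnormal_error = reconstruction_error(abnormal)[0]
print(abnormal_error > normal_errors.max())
```

The abnormal row cannot be expressed through the bottleneck, so its error is far larger than that of any normal row, which is the behavior the paragraph above describes.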

Anomaly Detection with Autoencoders

To detect abnormalities using Autoencoders, a threshold is set based on the reconstruction error. Trajectories with a reconstruction error above the threshold are considered abnormal, indicating a deviation from the learned normal behavior.

In the Ushant Abnormal Detection system, the threshold for anomaly detection is determined using the formula:

threshold = mean(data_loss) + std(data_loss)

The data_loss represents the difference between the original trajectory data and the reconstructed trajectory obtained from the Autoencoder model.

Advantages of Autoencoders for Abnormal Detection

  • Autoencoders can capture intricate patterns and non-linear relationships in trajectory data, making them effective for detecting complex abnormalities.
  • Deep learning models, such as Autoencoders, have the ability to learn hierarchical representations, enabling them to capture both global and local patterns in the data.
  • Autoencoders can handle high-dimensional data, making them suitable for trajectory data with multiple features.

Usage of Autoencoders in the Ushant Abnormal Detection System

The Autoencoder model in the Ushant Abnormal Detection system is implemented using the Keras library, which provides a high-level interface for building and training neural networks. The model architecture consists of an encoder network followed by a decoder network, with a bottleneck layer in between representing the compressed representation.
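
A minimal Keras sketch of such an encoder, bottleneck, and decoder stack is shown below. The layer sizes, activations, optimizer, number of epochs, and the random training data are assumptions for illustration and are not taken from this project's code.

```python
import numpy as np
from tensorflow import keras

n_features = 5  # x, y, vx, vy, t

# Layer sizes and activations are illustrative assumptions.
autoencoder = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(8, activation="relu"),              # encoder
    keras.layers.Dense(2, activation="relu"),              # bottleneck
    keras.layers.Dense(8, activation="relu"),              # decoder
    keras.layers.Dense(n_features, activation="sigmoid"),  # outputs in [0, 1]
])
autoencoder.compile(optimizer="adam", loss="mse")

# Made-up training data in [0, 1], as MinMax-scaled features would be.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(256, n_features)).astype("float32")

# The input is also the target: the network learns to reproduce its input.
autoencoder.fit(x_train, x_train, epochs=2, batch_size=32, verbose=0)

reconstructed = autoencoder.predict(x_train, verbose=0)
data_loss = np.mean((x_train - reconstructed) ** 2, axis=1)  # per-row error
print(reconstructed.shape, data_loss.shape)
```

The sigmoid output layer is chosen here only because the inputs are assumed to be scaled to [0, 1]; with unscaled inputs a linear output layer would be the usual choice.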


4. Generate synthetic abnormal trajectories

Synthetic abnormal trajectories are generated for two types of abnormal behavior: location-based anomalies and velocity-based anomalies. They are produced by doubling the maximum values of the location and velocity features.
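
One possible reading of this procedure is sketched below: the chosen columns of a copied trajectory are set to twice the per-feature maximum observed in a reference set. The exact rule used by the project is not spelled out, so this interpretation, the column indices, the assumption that the data is already scaled to [0, 1], and the helper name `make_abnormal` are all hypothetical.

```python
import numpy as np

LOCATION_COLS = [0, 1]  # x, y (assumed column order)
VELOCITY_COLS = [2, 3]  # vx, vy


def make_abnormal(trajectory: np.ndarray, reference: np.ndarray, cols) -> np.ndarray:
    """Copy `trajectory` and set `cols` to twice the maximum seen in `reference`."""
    abnormal = trajectory.copy()
    abnormal[:, cols] = 2.0 * reference[:, cols].max(axis=0)
    return abnormal


# Made-up normal data, assumed to be scaled to [0, 1].
rng = np.random.default_rng(0)
normal = rng.uniform(0.0, 1.0, size=(50, 5))

location_anomaly = make_abnormal(normal, normal, LOCATION_COLS)
velocity_anomaly = make_abnormal(normal, normal, VELOCITY_COLS)
print(location_anomaly[:, LOCATION_COLS].min() > normal[:, LOCATION_COLS].max())
```

Because the modified features lie well outside the range seen during training, both the GMM and the Autoencoder should assign these trajectories a high loss.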


Factory Design Pattern

The Factory Design pattern is used for the creation of classic and deep models, providing flexibility and extensibility.
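
The general shape of such a factory can be sketched as follows. All class names, the `name` method, and the "classic" and "deep" keys are hypothetical and do not reflect the project's actual interfaces; the concrete classes are placeholders with no model logic.

```python
from abc import ABC, abstractmethod


class AnomalyModel(ABC):
    """Common interface that every model produced by the factory implements."""

    @abstractmethod
    def name(self) -> str:
        ...


class ClassicModel(AnomalyModel):
    def name(self) -> str:
        return "gmm"


class DeepModel(AnomalyModel):
    def name(self) -> str:
        return "autoencoder"


class ModelFactory:
    """Maps a model kind to the class that builds it."""

    _registry = {"classic": ClassicModel, "deep": DeepModel}

    @classmethod
    def create(cls, kind: str) -> AnomalyModel:
        try:
            return cls._registry[kind]()
        except KeyError:
            raise ValueError(f"unknown model kind: {kind!r}") from None


model = ModelFactory.create("classic")
print(model.name())  # gmm
```

Calling code depends only on `AnomalyModel`, so a new detector can be added by registering one more class without touching the callers.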


Data Scaling

The data is scaled with MinMaxScaler(), which by default maps each feature to the range [0, 1].


Train-Test Split

The data is split into training and validation sets, with 20% of the data used for validation.

