Skip to content

📱 📧 Demonstration project to showcase how big data and deep learning can be used to filter a stream of messages as either spam or not-spam messages.

License

Notifications You must be signed in to change notification settings

FredrikBakken/spam-filter-demo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spam Filter Demo

This is a simple demonstration project to showcase some of the interesting technologies and projects you might get to work on as a developer at Telenor.

Introduction

The purpose of this demonstration project is to illustrate how deep learning (DL) can be used to filter incoming SMS messages as either spam- or ham-messages. It uses an open-source dataset from Kaggle - SMS Spam Collection Dataset.

Requirements

The following set of requirements has to be fulfilled in order to run the example project:

Project Overview

In this project we will take a closer look at the process of developing a data streaming service application for filtering SMS messages as either spam or ham messages. Apache Kafka will be used as the dedicated streaming platform and Keras is used as the deep learning API for developing the prediction model. A chat messaging client is also developed and deployed for interactive testing.

The project consists of the following five modules:

  1. DL Analytics: Design and develop a deep learning prediction model.
  2. Model Service: Build the deep learning prediction API.
  3. Kafka Producer: Publish incoming SMS-messages to a Kafka topic.
  4. Kafka Filter Stream: Develop the streaming application for filtering incoming SMS-messages.
  5. Messaging Client: Develop and publish the messaging chat application for sending example messages.

Architecture

  • Green boxes: Kafka applications developed in Golang.
  • Blue boxes: Kafka topics.
  • Orange box: External prediction service API.

Applications

The applications in this project requires that the cp-all-in-one docker containers are running. These can be launched by the command docker-compose up -d.

DL Analytics

We start by examining the deep learning module as it will function as a service for the streaming application to filter the incoming SMS-messages.

Launch the Application

Run the following command within the /analytics-directory to access the notebook:

>> jupyter notebook

A new page should now open in your browser on http://localhost:8888. Go to the /notebooks directory and open the Deep Learning - Spam Message Classification.ipynb notebook.

Exploratory Data Analysis

When performing data analysis it is important to inspect the type of data that one is working with, from our dataset we can find the following:

v1,v2,,,
ham,"Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...",,,
ham,Ok lar... Joking wif u oni...,,,
spam,Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's,,,
ham,U dun say so early hor... U c already then say...,,,
...

From this data we find that it is structured into five columns, where only the first two columns includes information that is interesting for us.

Further examination of the data distribution shows us the difference between the number of ham and spam messages in the dataset:

It is also possible to investigate the message length of the incoming SMS-messages in the dataset and compare the differences in spam vs. not-spam:

Further more - an illustration of the word distribution for ham and spam messages is also performed:

Define the Model Architecture

For building the deep learning (DL) model the Keras API is used. A deep neural network (DNN) is an artificial neural network (ANN) with multiple hidden layers between the input layer and the output layer. These layers are fully connected and includes a weighted parameter which is adjusted during the training phase of the model. Once a deep learning model is trained it's architecture and trained weights can used to make predictions on new messages. Our deep learning model architecture is illustrated below:

Model Service

The model service application is designed to make our prediction model production ready by loading it into a REST-API and make predictions on incoming SMS-messages.

Launch the Application

Run the following commands within the /apps/trained-model-api-directory to download the dependencies and launch the application:

>> python3 -m pip install -r requirements.txt
>> flask run

The application will now be running locally on your machine (http://localhost:5000) and can be used by sending a POST-request to one of the following endpoints:

Message Example

Sending POST-requests to this model API can easily be done by using curl or Postman.

Example 1: Singular SMS-Message

POST-Request:

{
    "message": "Hi man, I was wondering if we can meet tomorrow."
}

Returned:

{
    "Message": "Hi man, I was wondering if we can meet tomorrow.",
    "Spam": false,
    "Confidence": "0.00%"
}

Example 2: Bulk of SMS-Messages

POST-Request:

{
    "messages": [
        {
            "message": "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005"
        },
        {
            "message": "Hi man, I was wondering if we can meet tomorrow."
        }
    ]
}

Returned:

[
    {
        "Message": "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005",
        "Spam": true,
        "Confidence": "56.01%"
    },
    {
        "Message": "Hi man, I was wondering if we can meet tomorrow.",
        "Spam": false,
        "Confidence": "0.00%"
    }
]

Kafka Producer

The incoming messages into the application will first arrive at the Kafka Producer application, called sms-event-integration. This application works as a REST-API with an endpoint for serving incoming messages. These messages are then structured into objects and published to a Kafka topic named new-sms-json-v1.

The application also pulls new messages from the Firebase - Cloud Firestore and pushes these messages to the new-sms-json-v1 topic, which is used as a client-based example application.

Launch the Application

Run the following command within the /apps/sms-event-integration-directory to launch the application:

>> go run app.go

Kafka Filter Stream

The filter stream application is a streaming application that listens for new messages to arrive into the new-sms-json-v1 topic, structures the messages into accepted JSON-format for the model API, sends POST-requests to the model API, and then pushes the messages to one of the following Kafka topics based on the predicted result:

  • safe-sms-json-v1
  • spam-sms-json-v1

In the client demonstration case, the results are also pushed to update the corresponding messages in Firebase - Cloud Firestore.

Launch the Application

Run the following command within the /apps/sms-filter-stream-directory to launch the application:

>> go run app.go

Messaging Client

The messaging client is a simple Flutter web chatroom application where clients can send messages. These messages are stored into the Firebase - Cloud Firestore and continously streamed by the web application. Any updates to the messages, such as categorization as spam or not-spam, will be accessible to the users using the chatroom.

Launch the Application

To run this application on your own machine, it requires that you have created a new Firebase project and updated the firebaseConfig variable found within the /client/web/index.html file with your Firebase project config. After this is done, the client application can be launched by running the following commands within the /client-directory:

>> flutter run -d chrome

Important to note is that if you want to run the entire project locally, it also requires you to add the serviceAccountKey.json file into the /config directory of both the sms-event-integration- and the sms-filter-stream application.

Any Questions?

Do you want a demonstration of the project or have any questions/issues, please feel free to create a New Issue on the Issues page and I will get back to you as soon as possible.