- Daniel Jennings (23064976)
- Izzy Scott (23105336)
- Elijah Mullens (23335907)
- Joel Willoughby (23135002)
- Mitchell Otley (23475725)
Below is a description of our implementation of the project specification to design security solutions for the RapidoBank Filesystem.
This report is divided into sections matching those of the project specification:
The Yara Engine was designed to be as verbose as necessary (that is, to err towards flagging files), mainly because false positives are always easier to handle than false negatives. It runs every Yara rule over each file as required, and if a Yara rule returns a hit, it sends the required data to VirusTotal for a malware report and prints the information needed to decide whether the file is malicious.
As mentioned above, files are scanned as required by the project sheet. We run through an example below:
File name: .hiddensensitivefile.txt
Last Name: Willoughby
Phone Number: 0496496496
Bank Number: 1234567
Because this file is hidden on Linux (its name begins with a dot), it is detected by our is_hidden function:
# inside yaraengine.py -> is_hidden
import os
def is_hidden(file_path):
    return os.path.basename(file_path).startswith(".")
Once a hidden file is detected, our sensitive-information Yara rule is run over it:
#inside sensitiveinfo.yara -> personal_data
$first_name = {(46 | 66) 69 72 73 74 ?? (4e | 6e) 61 6d 65}
$last_name = {(4c | 6c) 61 73 74 ?? (4e | 6e) 61 6d 65}
The rule matches. However, because the file contains sensitive information, it is not sent for scanning.
# inside yaraengine.py -> run_scans
if is_hidden(file_path):
    # Not sending file to be scanned if it contains sensitive information
    scan_file(file_path, "SENSINFO_YARA")
Note: if the file is also caught by another Yara rule, it will in fact be sent for scanning. This stops attackers from disguising malicious files as sensitive data.
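The submission rule above can be sketched as a small predicate. This is a minimal illustration, not code from yaraengine.py. The function name should_submit and the rule name other_rule are hypothetical. The sketch assumes the engine collects the names of the matched rules into a set, and that personal_data is the rule in sensitiveinfo.yara.

```python
# Hypothetical sketch: should_submit is not a function from yaraengine.py.
SENSITIVE_RULES = frozenset({"personal_data"})  # rule name from sensitiveinfo.yara


def should_submit(matched_rules: set) -> bool:
    """Decide whether a file's Yara matches warrant a VirusTotal submission.

    A match on the sensitive-information rule alone keeps the file local;
    any additional match sends it for scanning, so a malicious file cannot
    hide behind a "Last Name" field.
    """
    return bool(set(matched_rules) - SENSITIVE_RULES)


print(should_submit({"personal_data"}))                # False
print(should_submit({"personal_data", "other_rule"}))  # True
```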
This section runs through each Yara rule.
- malURL.yara
A very simple Yara rule: it detects URLs, and that is its only functionality.
$url_pattern = /(http(s)?:\/\/)?[-a-zA-Z0-9@:%._+~#?&\/=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._+~#?&\/=]*)/
- malware.yara
It has two checks: does the file have high entropy, and does it contain Base64 content?
The reasoning is that encrypted data tends to be very random, which leads to high entropy.
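The entropy check can be illustrated with Shannon entropy, measured in bits per byte. Uniformly random data approaches 8, while text and repetitive data score noticeably lower. YARA exposes this measurement through its math module (math.entropy). The Python function below is an independent sketch of the same calculation; the name shannon_entropy is ours, and the threshold used in malware.yara is not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum p * log2(1/p) over each distinct byte value, where p = count / n.
    return sum((count / n) * math.log2(n / count) for count in Counter(data).values())


print(shannon_entropy(b"aaaaaaaa"))        # 0.0: a single repeated byte
print(shannon_entropy(bytes(range(256))))  # 8.0: every byte value equally likely
```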
Encrypted files are also typically Base64-encoded so that they can be transferred over the network without loss of data (Base64 represents arbitrary binary data as a string of ASCII characters).
- netresource.yara
This Yara rule is too large to describe in full; however, it checks for the following:
HTTP Requests
Network System Calls
Basic File Access Calls
Network DLLs
DNS Calls
The reasoning is that scripts quite often access one of these network resources, and, as per the project requirements, we MUST detect network activity.
We also looked at the Yara cuckoo module; however, it has since been deprecated, so we decided against supporting it.
- scripts.yara
This Yara rule looks for common scripting languages by their structure, for example:
$ps_function = /function\s+\w+\s*{/ // PowerShell function
This detects the function-declaration structure common to PowerShell files, which is found in almost any complex PowerShell script.
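To illustrate what the $ps_function string matches, the same pattern can be compiled with Python's re module. This is only an approximation, since YARA uses its own regex engine. The helper name looks_like_powershell is hypothetical.

```python
import re

# Same pattern as $ps_function in scripts.yara, compiled with Python's re.
PS_FUNCTION = re.compile(r"function\s+\w+\s*{")


def looks_like_powershell(text: str) -> bool:
    """True if text contains a `function Name {` declaration."""
    return PS_FUNCTION.search(text) is not None


print(looks_like_powershell("function Backup {\n    Copy-Item $src $dst\n}"))  # True
print(looks_like_powershell("function Get-Data {}"))  # False: \w does not match '-'
```

The second call shows a limitation of the pattern as written. \w does not match hyphens, so PowerShell's conventional Verb-Noun function names (such as Get-Data) are not matched.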
- sensitiveinfo.yara
This rule detects sensitive information commonly found in banking files, as explained above. For example:
#inside sensitiveinfo.yara -> personal_data
$first_name = {(46 | 66) 69 72 73 74 ?? (4e | 6e) 61 6d 65}
$last_name = {(4c | 6c) 61 73 74 ?? (4e | 6e) 61 6d 65}
These hex strings detect "first name" and "last name" within a file. The initial letter of each word may be upper- or lower-case (e.g. F or f), and the separator between the two words may be any single byte. The rule also detects Australian bank details, IBAN numbers and dates of birth.
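The hex strings can be read as byte-level regular expressions. The sketch below is a Python translation for illustration only; the names FIRST_NAME, LAST_NAME and contains_name_field are hypothetical. Each alternation such as (46 | 66) becomes a character class such as [Ff]. The ?? wildcard becomes a single `.` compiled with re.DOTALL, so that, like ??, it also matches a newline byte.

```python
import re

# (46 | 66) 69 72 73 74 ?? (4e | 6e) 61 6d 65  ->  [Ff] "irst" <any byte> [Nn] "ame"
FIRST_NAME = re.compile(rb"[Ff]irst.[Nn]ame", re.DOTALL)
# (4c | 6c) 61 73 74 ?? (4e | 6e) 61 6d 65     ->  [Ll] "ast" <any byte> [Nn] "ame"
LAST_NAME = re.compile(rb"[Ll]ast.[Nn]ame", re.DOTALL)


def contains_name_field(data: bytes) -> bool:
    return bool(FIRST_NAME.search(data) or LAST_NAME.search(data))


print(contains_name_field(b"Last Name: Willoughby"))  # True
print(contains_name_field(b"first_name=Joel"))        # True
print(contains_name_field(b"FirstName: Joel"))        # False: ?? needs exactly one byte
print(contains_name_field(b"FIRST NAME: Joel"))       # False: only the initials vary in case
```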
All potentially malicious files and URLs detected by the Yara Engine are sent for scanning via a request to the VirusTotal API. The engine saves the scan results to a .json file named after the MD5 hash of the file. This allows any overseer to investigate the scan results further if needed (beyond the handling performed by the MTD). Using the MD5 hash as the naming scheme ensures that each distinct scanned file is saved separately.
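Naming saved reports by MD5 can be sketched with hashlib and json from the standard library. The VirusTotal request itself is omitted: report stands in for whatever JSON the API returned. The function names and the out_dir parameter are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def report_filename(data: bytes) -> str:
    """<md5 hex digest>.json: identical file contents always map to the same name."""
    return hashlib.md5(data).hexdigest() + ".json"


def save_report(file_path: str, report: dict, out_dir: str = ".") -> Path:
    """Write a scan report into out_dir, named after the scanned file's MD5."""
    out_path = Path(out_dir) / report_filename(Path(file_path).read_bytes())
    out_path.write_text(json.dumps(report, indent=2))
    return out_path


print(report_filename(b"hello"))  # 5d41402abc4b2a76b9719d911017c592.json
```

In this sketch, rescanning an unchanged file overwrites its previous report, while a modified file produces a new hash and therefore a new report file.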
The RB-Crypt.py script is designed to be a versatile command-line tool to encrypt, decrypt, and hash the requested file. The CLI tool uses a set of flags to select the operation to perform. An example of how to use the tool is as follows:
Say we have a file testfile that we want to encrypt and hash simultaneously. The encryption algorithm we wish to use is Quagmire 3, while the hashing algorithm is SHA256. We don't have a key for the encryption; instead, we want one generated. The CLI input for this will be:
Security/cryptography/RB-Crypt.py -i testfile -e quagmire -h sha256 --key-gen
This will encrypt the file provided by the -i flag, then produce a hash of the encrypted file, stored in testfile.hash. The key used for the encryption will be stored in testfile.key.
Note that the above script path is relative to the RapidoBank filesystem root; adjust the relative or absolute path accordingly when running it.
A --help flag is available to describe how to use the CLI tool.
Additionally, the MTD interfaces with the cipher and hashing systems through the classes defined in cryptoclasses.py, which allow the creation of Cipher or Hash instances. As each implemented cipher can use the same key, each Cipher instance has an allocated key that can be used for symmetric encryption or decryption of files.
Three ciphers are available in the RB-Crypt.py script, selected by the text following the -e or -d flag. This text is case-insensitive; however, it must be one of the ciphers listed below:
- Vigenère: called with vigenere
- XOR: called with xor
- Quagmire 3: called with quagmire
The --key-gen flag is also available to randomly generate a cipher key of length 50 and store it in a file called <filename>.key. Alternatively, if --key-gen is used without any other arguments, it outputs the random key to stdout. The key may contain any character considered printable ASCII by Python's string module, which ensures that the generated key can be displayed to the user if necessary.
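A key generator matching this description can be sketched with the standard library. Using secrets rather than random is our assumption, since the report does not say which source of randomness RB-Crypt uses, and the name generate_key is hypothetical. Note also that string.printable includes whitespace characters such as \t and \n alongside letters, digits and punctuation.

```python
import secrets
import string

KEY_LENGTH = 50  # key length stated in the report


def generate_key(length: int = KEY_LENGTH) -> str:
    """Random key drawn from string.printable (letters, digits, punctuation, whitespace)."""
    return "".join(secrets.choice(string.printable) for _ in range(length))


key = generate_key()
print(len(key))  # 50
```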
The output of each cipher is encoded as ASCII and then Base64-encoded before being returned. Likewise, the decryption algorithms reverse this process before decrypting to plaintext. This design was chosen so that the ciphertext can be stored or transferred in environments restricted to ASCII data, as described in RFC 4648. This ensures no data is lost if the encrypted data needs to be transferred to a legacy system.
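The encoding step can be sketched with Python's base64 module, which implements the RFC 4648 encodings. The names wrap and unwrap are hypothetical. The sketch assumes each cipher's output is a str containing only ASCII characters; otherwise encode("ascii") raises UnicodeEncodeError.

```python
import base64


def wrap(cipher_output: str) -> str:
    """ASCII-encode the cipher output, then Base64-encode it for safe transport."""
    return base64.b64encode(cipher_output.encode("ascii")).decode("ascii")


def unwrap(stored: str) -> str:
    """Reverse of wrap(): Base64-decode, then ASCII-decode, ready for decryption."""
    return base64.b64decode(stored).decode("ascii")


print(wrap("Man"))                    # TWFu
print(unwrap(wrap("a\tb")) == "a\tb")  # True
```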
The implemented Vigenère cipher is a polyalphabetic cipher. Our implementation uses all printable ASCII characters, as opposed to a traditional Vigenère cipher, which uses the uppercase alphabet A-Z. This decision was made to increase ciphertext obscurity and to ensure the plaintext maintains its formatting when decrypted (e.g. a won't be changed to A).
Vigenère Cipher example with an alphabet of A-Z (source)
Example of our Vigenère Cipher Implementation
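A Vigenère shift over a printable-ASCII alphabet can be sketched as follows. Two details are assumptions, not taken from RB-Crypt.py. First, the alphabet is string.printable. Second, characters outside that alphabet pass through unchanged while still consuming a key position. The function names are hypothetical.

```python
import string

ALPHABET = string.printable  # assumption: 100 characters, including whitespace
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def _vigenere(text: str, key: str, direction: int) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch not in INDEX:
            out.append(ch)  # outside the alphabet: left unchanged
            continue
        shift = INDEX[key[i % len(key)]]  # key characters must be in ALPHABET
        out.append(ALPHABET[(INDEX[ch] + direction * shift) % len(ALPHABET)])
    return "".join(out)


def vigenere_encrypt(plaintext: str, key: str) -> str:
    return _vigenere(plaintext, key, +1)


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    return _vigenere(ciphertext, key, -1)


print(vigenere_encrypt("a", "1"))  # b: '1' sits at index 1, so 'a' shifts by one
```

Lower- and upper-case letters are distinct symbols in this alphabet, so decryption restores the original case.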
The XOR cipher works by XORing each plaintext character with a corresponding character of the key and using the result as the ciphertext character. As the key is 50 characters long, a modulo operation selects the key character. Plaintext character i is XORed with key character i mod 50, so plaintext characters 50 positions apart are XORed with the same key character.
XOR Cipher example (source)
Example of our XOR Cipher Implementation
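A repeating-key XOR reduces to a one-line transformation. Because XOR is its own inverse, the same function also decrypts. The name xor_cipher is hypothetical. If the plaintext and key are both ASCII (code points below 128), every output character is also below 128. The result can therefore pass through the ASCII/Base64 encoding step described earlier, although it may contain non-printable control characters.

```python
def xor_cipher(text: str, key: str) -> str:
    """XOR character i of text with key[i % len(key)]; applying it twice restores text."""
    return "".join(chr(ord(ch) ^ ord(key[i % len(key)])) for i, ch in enumerate(text))


secret = xor_cipher("hello", "k")
print(xor_cipher(secret, "k"))     # hello
print(repr(xor_cipher("A", "a")))  # ' ' (0x41 XOR 0x61 = 0x20, a space)
```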
The Quagmire 3 cipher is one of four variants of the Quagmire periodic ciphers. These are similar in principle to a Vigenère cipher, except that they use a keyed alphabet derived from the key. The initial keyed alphabet is formed from the first appearance of each character in the key, followed by the remaining unused printable ASCII characters.
Next, we use a keyword to derive multiple keyed alphabets from this initial alphabet. The number of keyed alphabets equals the number of letters in the keyword, so with the keyword CIPHER, as used in our implementation, 6 keyed alphabets are generated. The keyed alphabets differ in how far each is rotated: each is shifted so that its corresponding keyword letter aligns with the indicator (the character ‘A’ in our implementation).
As an example, take the alphabet A-Z and the key PASSWORD. The initial keyed alphabet would be:
PASWORDBCEFGHIJKLMNQTUVXYZ
We then generate the keyed alphabet table:
PASWORDBCEFGHIJKLMNQTUVXYZ (Initial Alphabet)
BCEFGHIJKLMNQTUVXYZPASWORD (Alphabet 1)
HIJKLMNQTUVXYZPASWORDBCEFG (Alphabet 2)
ZPASWORDBCEFGHIJKLMNQTUVXY (Alphabet 3)
GHIJKLMNQTUVXYZPASWORDBCEF (Alphabet 4)
CEFGHIJKLMNQTUVXYZPASWORDB (Alphabet 5)
ORDBCEFGHIJKLMNQTUVXYZPASW (Alphabet 6)
In Quagmire 1, the standard ordered alphabet A-Z is rotated to create the table. Quagmire 3 adds a layer of complexity by using the keyed alphabet not only as the initial alphabet but also as the alphabet that is rotated to create the table. To calculate the ciphertext, the plaintext is split into groups whose length equals that of the keyword (e.g. groups of 6 characters with the keyword CIPHER).
Each ciphertext character is calculated as follows:
- Get the index of the character in the initial alphabet
- In the assigned alphabet from the list of keyed alphabets (e.g. letter 1 in the group is assigned Alphabet 1), select the character in the same index position
This process is repeated for each character of the plaintext. Note that changing the key, the length or letters of the keyword, or the indicator will alter the output of the algorithm.
Example generation of a keyed alphabet (source)
Quagmire 3 Example (same source)
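The keyed-alphabet construction and the encryption steps above can be sketched as below. The sketch reproduces the PASSWORD/CIPHER example table when given the alphabet A-Z. The function names are hypothetical. The defaults follow the description (keyword CIPHER, indicator A). Two further choices are assumptions: string.printable as the full alphabet, and passing through characters outside it.

```python
import string


def keyed_alphabet(key: str, alphabet: str) -> str:
    """Key characters in order of first appearance, then the unused remainder."""
    used = dict.fromkeys(ch for ch in key if ch in alphabet)  # ordered, de-duplicated
    return "".join(used) + "".join(ch for ch in alphabet if ch not in used)


def cipher_alphabets(key, keyword="CIPHER", alphabet=string.printable, indicator="A"):
    """Initial keyed alphabet plus one rotation of it per keyword letter."""
    base = keyed_alphabet(key, alphabet)
    column = base.index(indicator)  # indicator's position in the initial alphabet
    rows = []
    for letter in keyword:
        # Rotate so that `letter` lands in the indicator's column.
        start = (base.index(letter) - column) % len(base)
        rows.append(base[start:] + base[:start])
    return base, rows


def quagmire3_encrypt(plaintext, key, keyword="CIPHER", alphabet=string.printable, indicator="A"):
    base, rows = cipher_alphabets(key, keyword, alphabet, indicator)
    # Character i uses keyed alphabet i mod len(keyword), at its index in the initial alphabet.
    return "".join(
        rows[i % len(rows)][base.index(ch)] if ch in base else ch
        for i, ch in enumerate(plaintext)
    )


def quagmire3_decrypt(ciphertext, key, keyword="CIPHER", alphabet=string.printable, indicator="A"):
    base, rows = cipher_alphabets(key, keyword, alphabet, indicator)
    return "".join(
        base[rows[i % len(rows)].index(ch)] if ch in base else ch
        for i, ch in enumerate(ciphertext)
    )


base, rows = cipher_alphabets("PASSWORD", alphabet=string.ascii_uppercase)
print(base)     # PASWORDBCEFGHIJKLMNQTUVXYZ
print(rows[0])  # BCEFGHIJKLMNQTUVXYZPASWORD (Alphabet 1)
print(quagmire3_encrypt("AAAAAA", "PASSWORD", alphabet=string.ascii_uppercase))  # CIPHER
```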
The hash function can be used together with, or separately from, the cipher functionality. When used together with a cipher flag, hashing takes place either before decryption or after encryption of the file. The hash is written to the output file, specified either by -o or by the second unflagged argument. This can be mapped to stdout with \&1; if no output is provided, the output file will be <filename>.hash.
Four hash algorithms have been implemented for RB-Crypt's -h flag, those being:
- MD5: called with md5
- xxHash: called with xxhash
- MurmurHash: called with murmur
- SHA256: called with sha256
To avoid repeating the same information for each algorithm, we outline our reasoning here:
Because these hashes are only used to detect file changes, we typically do not need a cryptographically secure hash.
With that in mind:
xxHash was chosen for its speed (it is one of the fastest hashing algorithms currently available) while also having a relatively low collision rate.
MurmurHash was chosen as it is also very fast: outpaced by xxHash, but faster than traditional cryptographically secure hashing algorithms.
SHA256 was included as a safeguard. It is heavier to compute than the two above, but unlike them it is cryptographically secure, which matters in case the MTD runs into potentially malicious files.
Speed comparisons for these algorithms can be found here:
https://github.com/Cyan4973/xxHash/blob/dev/README.md#benchmarks
As with the ciphers, these arguments are case-insensitive, and the "called with" value is the text that should follow -h.
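Case-insensitive selection of a hash algorithm can be sketched as a dictionary lookup on the lower-cased name. Only MD5 and SHA256 are wired up, because they are in Python's hashlib. xxHash and MurmurHash would need third-party packages (for example xxhash and mmh3 on PyPI), which are not assumed here. The names HASHERS and hash_bytes are hypothetical.

```python
import hashlib

# Standard-library algorithms only; xxhash/murmur would come from third-party packages.
HASHERS = {
    "md5": hashlib.md5,
    "sha256": hashlib.sha256,
}


def hash_bytes(data: bytes, algorithm: str) -> str:
    """Hex digest of data; the algorithm name is matched case-insensitively."""
    try:
        hasher = HASHERS[algorithm.lower()]
    except KeyError:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}") from None
    return hasher(data).hexdigest()


print(hash_bytes(b"hello", "MD5"))  # 5d41402abc4b2a76b9719d911017c592
```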
Overall, the format for calling RB-Crypt, as specified by the --help flag, is:
./RB-Crypt.py [options] [input file] [output file]
The MTD system comprises three primary running modes, documented in the included CLI and outlined below.
{normal,quarantiner,decryptor}
normal Start in Normal mode. Monitors directories for threats.
quarantiner Start in Quarantiner mode. Allows you to restore quarantined files, or delete them.
decryptor Start in Decryptor mode. Decrypts all encrypted files in the sensitive directories.
In normal mode, once started, the MTD system runs in a continuous loop that scans the monitored directories every 5 seconds until the operator stops the program. As a headless program, it is intended to be deployed as a background service that can be spawned on system startup.
To begin operation, at least one monitored directory and at least one sensitive directory must be passed as command line arguments.
usage: RapidoBank MTD System normal [-h] -m MONITORED [MONITORED ...] -s SENSITIVE [SENSITIVE ...] [-q [QUARANTINE]] [-y YARA_RULES [YARA_RULES ...]] [--whitelist [WHITELIST]]
[--malicious-threshold MALICIOUS_THRESHOLD]
options:
-h, --help show this help message and exit
-m MONITORED [MONITORED ...], --monitored MONITORED [MONITORED ...]
Monitored directories
-s SENSITIVE [SENSITIVE ...], --sensitive SENSITIVE [SENSITIVE ...]
Sensitive directories
-q [QUARANTINE], --quarantine [QUARANTINE]
Quarantine directory
-y YARA_RULES [YARA_RULES ...], --yara-rules YARA_RULES [YARA_RULES ...]
Path to additional YARA files directories
--whitelist [WHITELIST]
Path to directory where .whitelist file will be stored.
--malicious-threshold MALICIOUS_THRESHOLD
Threshold for the number of VirusTotal providers that must flag a file as malicious before quarantining action is taken.
Unencrypted files in the sensitive directories are encrypted on start. The system then begins scanning the monitored directories for files which violate the defined Yara rules.
Files in the monitored directories are presented to the Yara engine for scanning. Files which trigger an alert matching one or more of the specified Yara rules are hashed, and this hash is queried against VirusTotal's API to find any previous uploads of the same file.
If VirusTotal has never seen one of these files before, we upload our copy of the file for scanning. The results from either the hash lookup or the file upload are then checked against the malicious threshold. The check is whether the number of security vendors that flag the file as malicious meets or exceeds that threshold (optionally set with --malicious-threshold on program invocation; defaults to 5).
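The threshold comparison can be sketched against the JSON shape of a VirusTotal API v3 file report, where per-verdict vendor counts sit under data.attributes.last_analysis_stats. That the MTD reads this particular field is an assumption, and the function name is hypothetical. The default of 5 and the "meets or exceeds" comparison come from the description above.

```python
DEFAULT_MALICIOUS_THRESHOLD = 5  # default stated in the report


def should_quarantine(vt_report: dict, threshold: int = DEFAULT_MALICIOUS_THRESHOLD) -> bool:
    """True when at least `threshold` vendors flagged the file as malicious."""
    stats = vt_report["data"]["attributes"]["last_analysis_stats"]
    return stats.get("malicious", 0) >= threshold


report = {"data": {"attributes": {"last_analysis_stats": {"malicious": 5, "undetected": 60}}}}
print(should_quarantine(report))               # True: 5 meets the default threshold
print(should_quarantine(report, threshold=6))  # False
```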
Files meeting or exceeding this threshold are then moved to a quarantine directory that only the user running the MTD system can access (this is enforced by the system). Quarantined files have their permissions changed to be as restrictive as possible.
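Moving a file into quarantine and stripping its permissions can be sketched with os, shutil and stat, assuming POSIX permission semantics. The <hash>-<original name> naming mirrors the file name printed in the quarantiner example. The function name is hypothetical. The exact modes (0o700 on the directory, no permission bits on the file) are our reading of "as restrictive as possible", not confirmed details of the MTD.

```python
import os
import shutil
import stat
from pathlib import Path


def quarantine_file(file_path: str, quarantine_dir: str, file_hash: str) -> Path:
    """Move file_path into quarantine_dir and strip its permission bits."""
    qdir = Path(quarantine_dir)
    qdir.mkdir(parents=True, exist_ok=True)
    os.chmod(qdir, stat.S_IRWXU)  # 0o700: only the MTD's own user may enter or list it
    dest = qdir / f"{file_hash}-{Path(file_path).name}"
    shutil.move(file_path, dest)
    os.chmod(dest, 0)  # no read, write or execute bits for anyone
    return dest
```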
Files which trigger a Yara alert but do not reach the malicious threshold have their hashes added to an exempt list ([quarantine]/.whitelist), so further alerts won't trigger a query against VirusTotal's API.
Files known to the MTD as safe have their hashes persisted in a [quarantine]/.whitelist file. Should a file be added, modified or deleted within a monitored directory, its hash will not appear in this whitelist file.
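Persisting the whitelist can be sketched as a plain-text file with one hash per line. The file format and the function names are assumptions. The report only states that hashes are stored in [quarantine]/.whitelist, and that the directory can be changed with --whitelist.

```python
from pathlib import Path


def load_whitelist(whitelist_dir: str) -> set:
    """Return the set of hashes stored in <whitelist_dir>/.whitelist (empty if absent)."""
    path = Path(whitelist_dir) / ".whitelist"
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def add_to_whitelist(whitelist_dir: str, file_hash: str) -> None:
    """Append a hash (once) so later alerts for identical content skip VirusTotal."""
    if file_hash not in load_whitelist(whitelist_dir):
        with (Path(whitelist_dir) / ".whitelist").open("a") as fh:
            fh.write(file_hash + "\n")
```

A modified file has a different hash, so it no longer matches any entry in this set and is scanned again.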
On the next system scan, newly added or modified files will themselves be scanned. Regardless of the scan outcome, the cipher system used to encrypt the contents of the sensitive directories is changed.
As per the project brief, the MTD system periodically changes the security settings of the encrypted sensitive directories. The nature of these changes is random, and the encryption cipher is not guaranteed to change; however, the encryption key is guaranteed to change.
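The shuffle rule (the cipher may stay the same, but the key must change) can be sketched as follows. The cipher names are the -e/-d values listed for RB-Crypt. The function name and the use of secrets are assumptions.

```python
import secrets
import string

CIPHERS = ("vigenere", "xor", "quagmire")  # names accepted by RB-Crypt's -e/-d flags


def shuffle(current_key: str, key_length: int = 50) -> tuple:
    """Pick a random cipher (possibly the current one) and a key that differs from current_key."""
    cipher = secrets.choice(CIPHERS)
    while True:
        key = "".join(secrets.choice(string.printable) for _ in range(key_length))
        if key != current_key:  # the key is guaranteed to change
            return cipher, key
```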
In quarantiner mode, the system presents an interactive command line menu for viewing, restoring and deleting quarantined files.
usage: RapidoBank MTD System quarantiner [-h] [-q [QUARANTINE]] [--whitelist [WHITELIST]]
options:
-h, --help show this help message and exit
-q [QUARANTINE], --quarantine [QUARANTINE]
Quarantine directory
--whitelist [WHITELIST]
Path to directory where .whitelist file will be stored.
When a file is deemed dangerous by the Yara engine, it is placed in a segregated quarantine directory and has its permissions stripped. Because false positives are possible, we considered it prudent to implement a way of rectifying them.
Documented commands (type help <topic>):
========================================
delete help list quit restore
(quarantine)list
18c5140c131ad057b35ef4ba1fc8b7f9: /.../rapidobank/Monte's files/CEO's helpful scripts/networkpe.exe
(quarantine)restore 18c5140c131ad057b35ef4ba1fc8b7f9
18c5140c131ad057b35ef4ba1fc8b7f9-networkpe.exe
File restored
(quarantine)quit
When a file is restored, its hash is added to the system's [quarantine]/.whitelist file, preventing it from being falsely flagged as malicious again.
By design of our MTD system, the sensitive directories persist as encrypted on disk even in the event the system is not running.
To decrypt the sensitive directories, the system operator can run the MTD system in decryptor mode.
usage: RapidoBank MTD System decryptor [-h] -s SENSITIVE [SENSITIVE ...] [-q [QUARANTINE]] [--shuffle]
options:
-h, --help show this help message and exit
-s SENSITIVE [SENSITIVE ...], --sensitive SENSITIVE [SENSITIVE ...]
Sensitive directories
-q [QUARANTINE], --quarantine [QUARANTINE]
Quarantine directory
--shuffle Shuffle the cipher/key used for encryption. Will encrypt all files in sensitive directory.
When the operator passes the directories they wish to decrypt with the -s flag, along with (optionally) the system's quarantine directory with -q, the system decrypts the supplied directories.
Additionally, the operator may invoke a manual shuffle of the encryption system by supplying the --shuffle argument.
Our MTD system provides verbose logging out of the box. When an alert is raised, a description of the alert is logged. As our MTD is designed to run as a headless application, these logs would likely be captured by the systemd journal (viewable with journalctl) or the distribution's equivalent utility.
The quarantining functionality of the MTD is designed to dynamically improve the security of the system. Instead of allowing a possible threat to persist until a log file is reviewed, the system takes immediate precautionary action to isolate the potential danger from the rest of the system. Only after a human operator has reviewed the threat (and, preferably, the associated log information) can the offending file be restored (or deleted, in the event of a valid detection).