forked from datumbox/datumbox-framework
CODE IMPROVEMENTS
=================
- Improve serialization by setting the serialVersionUID in every serializable class?
- Create better Exceptions and Exception messages.
- Add multithreading support.
- Check out the sparse matrices/vectors in the Apache Commons Math3 library and use them in GaussianDPMM, MultinomialDPMM and MatrixLinearRegression.
- Add the ability to call Machine Learning algorithms from the command line, as in Mahout.
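
A minimal sketch of the serialVersionUID item, assuming plain java.io serialization: the class ModelParameters and its weights field are hypothetical and used only for illustration. Without an explicit value the JVM derives the identifier from the class structure, so even a compatible change such as adding a public method can make previously saved models fail to load with an InvalidClassException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

/** Hypothetical serializable class, used only to illustrate the serialVersionUID item. */
public class ModelParameters implements Serializable {
    // Explicit version identifier: it stays fixed across recompilations and compatible
    // edits instead of being recomputed from the class structure by the JVM.
    private static final long serialVersionUID = 1L;

    private final double[] weights;

    public ModelParameters(double[] weights) {
        this.weights = weights.clone();
    }

    public double[] getWeights() {
        return weights.clone();
    }

    /** Writes an object to an in-memory stream and reads it back. */
    public static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        ModelParameters copy = roundTrip(new ModelParameters(new double[] {0.5, -1.25}));
        System.out.println(Arrays.toString(copy.getWeights())); // prints [0.5, -1.25]
    }
}
```

Bumping the declared value by hand then becomes the explicit way to mark a genuinely incompatible format change.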

DOCUMENTATION
=============
- Improve the code documentation.
- Write a how-to blog post on building a text classification model.

NEW ALGORITHMS
==============
- Rewrite PCA to avoid using RealMatrix.
- Add regularization to the currently supported algorithms.
- Write a Mixture of Gaussians clustering method.
- Develop FunkSVD, and PLSI as a probabilistic version of SVD.
- Include an anomaly detection algorithm.
- Add the ability to search through the configuration space and find the best performing algorithmic configuration.
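
For the regularization item, one common form is an L2 (ridge) penalty. The sketch below is a hypothetical, stand-alone batch gradient descent for linear regression, not the framework's MatrixLinearRegression; the class name, learning rate and epoch count are assumptions. The penalty adds lambda * w_j to each weight's gradient, and the intercept is left unpenalized.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: linear regression with an L2 (ridge) penalty, trained by batch gradient descent.
 * Loss = (1 / 2n) * sum_i (w . x_i + b - y_i)^2 + (lambda / 2) * ||w||^2; the bias b is not penalized.
 */
public class RidgeGradientDescent {

    /** Returns {w_1, ..., w_d, b}. */
    public static double[] fit(double[][] x, double[] y, double lambda, double learningRate, int epochs) {
        int n = x.length;
        int d = x[0].length;
        double[] w = new double[d];
        double b = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradW = new double[d];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double prediction = b;
                for (int j = 0; j < d; j++) {
                    prediction += w[j] * x[i][j];
                }
                double error = prediction - y[i];
                for (int j = 0; j < d; j++) {
                    gradW[j] += error * x[i][j] / n;
                }
                gradB += error / n;
            }
            for (int j = 0; j < d; j++) {
                // The L2 penalty contributes lambda * w_j to the gradient (weight decay).
                w[j] -= learningRate * (gradW[j] + lambda * w[j]);
            }
            b -= learningRate * gradB;
        }
        double[] result = Arrays.copyOf(w, d + 1);
        result[d] = b;
        return result;
    }

    public static void main(String[] args) {
        double[][] x = {{0.0}, {1.0}, {2.0}, {3.0}};
        double[] y = {1.0, 3.0, 5.0, 7.0}; // y = 2x + 1
        System.out.println(Arrays.toString(fit(x, y, 0.0, 0.1, 5000)));
        System.out.println(Arrays.toString(fit(x, y, 1.0, 0.1, 5000)));
    }
}
```

On the toy data y = 2x + 1, lambda = 0 recovers a slope of about 2 and an intercept of about 1, while lambda = 1 shrinks the slope to about 1.11 (10/9) and lets the unpenalized intercept rise to about 2.33 (7/3).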

CHECK OUT HUGE COLLECTION LIBS, DBS AND STORAGE
===============================================
- Java StoredMap + BerkeleyDB:
http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/collections/StoredMap.html
http://www.oracle.com/technetwork/database/berkeleydb/overview/index-093405.html
- Vanilla-java - HugeCollections:
https://code.google.com/p/vanilla-java/wiki/HugeCollections
- Fastutil:
http://fastutil.di.unimi.it/#install
- Joafip:
http://joafip.sourceforge.net/javadoc/net/sf/joafip/java/util/PHashMap.html
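
The libraries listed above all aim to hold collections that do not fit comfortably on the Java heap. As a rough, hypothetical illustration of one underlying idea (file-backed, off-heap storage), and not the API of any of those libraries, a plain-JDK memory-mapped array of doubles could look like this; the class MappedDoubleArray is an assumption introduced only for this sketch.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

/**
 * Hypothetical sketch: a fixed-size double array stored in a memory-mapped file instead of
 * on the Java heap. The operating system pages the data in and out on demand.
 */
public class MappedDoubleArray implements AutoCloseable {
    private final RandomAccessFile file;
    private final DoubleBuffer data;
    private final int length;

    public MappedDoubleArray(Path path, int length) throws IOException {
        // A single mapping may not exceed Integer.MAX_VALUE bytes, so this sketch tops out at
        // about 268 million doubles; larger arrays would need several mappings.
        long bytes = (long) length * Double.BYTES;
        if (length < 0 || bytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("length out of range for a single mapping: " + length);
        }
        this.length = length;
        RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
        try {
            raf.setLength(bytes); // make sure the file is large enough to back the mapping
            this.data = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, bytes).asDoubleBuffer();
        } catch (IOException e) {
            raf.close();
            throw e;
        }
        this.file = raf;
    }

    public double get(int index) {
        return data.get(index); // absolute read, bounds-checked by the buffer
    }

    public void set(int index, double value) {
        data.put(index, value); // absolute write into the mapped region
    }

    public int length() {
        return length;
    }

    @Override
    public void close() throws IOException {
        file.close(); // also closes the underlying FileChannel
    }
}
```

Dedicated libraries add what this sketch lacks: chunked mappings beyond 2 GB, primitive-keyed maps, persistence guarantees and concurrency control.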