From 490afd04ed64c8ccf60b0672aaebf9ad04348627 Mon Sep 17 00:00:00 2001
From: Serguei Mokhov
Date: Thu, 10 Jun 2021 01:08:16 -0400
Subject: [PATCH] [README] make a quick edit pass over the READMEs

---
 README.md         | 131 ++++++++++++++++++++++------------------------
 samples/README.md |  23 ++------
 2 files changed, 69 insertions(+), 85 deletions(-)

diff --git a/README.md b/README.md
index 8d0a9db..e67c775 100644
--- a/README.md
+++ b/README.md
@@ -1,69 +1,50 @@
-## Motivation ##
+# Knowledge Graph-based Recommendation System framework
 
-A recommendation system is needed as long as there are users, but since users have few ratings on items, there will be problems such as data sparsity. This problem can be solved by adding the knowledge graph as side information, but the existing solution does not include the construction of the knowledge graph. By adding the construction of the knowledge graph can help us better manage the data.
+This code supplements [Yuhao Mao](https://github.com/myh1234567)'s master's thesis work, "A Framework Design For Integrating Knowledge Graphs into Recommendation Systems", and the resulting publication(s).
+The framework uses movies as an example and is generalizable to other media types.
 
-----
+- Yuhao Mao, "A Framework Design For Integrating Knowledge Graphs into Recommendation Systems", Master's thesis, Concordia University, 2021
+- Yuhao Mao, Serguei A. Mokhov, Sudhir P. Mudur:
+Application of Knowledge Graphs to Provide Side Information for Improved Recommendation Accuracy. CoRR [abs/2101.03054](https://arxiv.org/abs/2101.03054) (2021)
+- Sudhir Mudur, Serguei Mokhov, and Yuhao Mao. 2021. A Framework for Enhancing Deep Learning Based Recommender Systems with Knowledge Graphs. In IDEAS 2021: 25th International Database Engineering & Applications Symposium, July 14–16, 2021, Montreal, Canada. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/1122445.1122456
+
+## Background ##
+
+### Motivation ###
+
+A recommendation system is needed wherever there are users, but because each user rates only a few items, such systems face problems such as data sparsity. This problem can be alleviated by adding a knowledge graph as side information, but existing solutions do not include constructing the knowledge graph. Adding knowledge graph construction also helps us better manage the data.
 
-## What application need it:
+### What types of applications may need it
 
 - Movie RS
 - Book RS
 - News RS
 - User RS
 
-----
-
-## Datasets
-
-1. https://grouplens.org/datasets/movielens/
-
-----
+---
 
-## Evaluation
+### Datasets
 
-1. CTR (Click-Through-Rate)
+- https://grouplens.org/datasets/movielens/
 
-----
+#### Evaluation
 
-## References
+- CTR (Click-Through-Rate)
 
-1. Frame rate. https://en.wikipedia.org/wiki/Frame_rate. Accessed: 2019- 07-23.
-2. Software framework. https://en.wikipedia.org/wiki/Software_framework. Accessed: 2019-06-22.
-3. RDF OWL difference https://www.cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/rdfs-vs-owl/
-4. Owlready2 documentation https://pythonhosted.org/Owlready2/
-5. Py2neo documentation https://py2neo.org/2.0/
-6. Introduce to RS https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada
-7. auc&acc https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
-8. https://zhuanlan.zhihu.com/p/54325231
-9. youtube: https://www.youtube.com/watch?v=BP0IZ1uyUDE
-10. https://blog.csdn.net/dreamzuora/article/details/86543157
+#### Dataset field sources
 
-----
-
-## Future work
-1. Java API wrapper
-2. Support different machine learning backend
-3. Support more storage methods and more input formats
-4. More effective loss function
-5. Full-platform support
-6. Auto installation
+1. `kg_final.txt`: now_movie_id, relation, xxx
+2. `ratings_final.txt`: user_id, user_gender, user_age, user_job, new_movie_id, rating
 
+---
 
 ## Software Requirements
 
 1. python3
-2. Neo4j "https://neo4j.com/download/?ref=try-neo4j-lp"
-
----
-
-## Dataset explaination
+2. Neo4j: https://neo4j.com/download/?ref=try-neo4j-lp
 
-1. kg_final.txt: now_movie_id, relation, xxx
-2. ratings_final.txt: user_id, user_gender, user_age, user_job, new_movie_id, rating
+### Library Requirements
 
----
-
-## Library Requirements
 1. rdflib. Version: 4.2.2
 2. urllib.request. '3.7'
 3. networkx. '2.4'
@@ -83,8 +64,7 @@ A recommendation system is needed as long as there are users, but since users ha
 
 ### Installing on MacOS ###
 
-```
-#!bash
+```#!bash
 brew install python3
 pip3 install rdflib
 pip3 install urllib.request
@@ -102,13 +82,12 @@ pip3 install sklearn
 pip3 install linecache
 ```
 
-## Installing on EL7
+### Installing on EL7
 
 1. Clone the repo
 2. Install dependencies
 
-```
-#!bash
+```#!bash
 yum install python3 gcc python3-devel
 pip3 install requests
 pip3 install py2neo
@@ -117,33 +96,51 @@ pip3 install pandas
 ...
 ```
 
-## Samples:
+## Running
 
-1. [Framework usage examples](https://bitbucket.org/iss-v2-proj/video-recommender-system/src/master/samples/README.md)
----
+### Samples
 
-## Questions:
+- [Framework usage examples](samples/README.md)
 
-How to train a model:
-In src/recommendation_system/ folder. and run main.py
+### Questions / FAQ
 
-What IDE did we use to develop code?
-Recommend to use pycharm (any version). Or use text editing software such as vim.
+- How to train a model: go to the `src/recommendation_system/` folder and run `main.py`.
+- What IDE did we use to develop the code?
+We recommend PyCharm (any version), or any text editor such as vim or VS Code.
+- How to run from the command line: `python3 xxxx.py`
+- How to run from Google Colab? Upload all the files to Colab and click run.
 
-How to run from command line:
-python3 xxxx.py
+Tested macOS version: macOS Mojave 10.14.6
 
-How to run from Google Colab?
-upload all the file to colab, and click run.
+#### How to start
 
-Tested MacOS version: Mac Mojava 10.14.6
+1. `cd ../web_crawler` and run `python3 add_infos.py` to get the `kg_additional` file. It includes movie director, writer, and star information.
+2. Start Neo4j Desktop, then `cd ../knowledge_graph` and run `python3 main.py`. This creates all triples in Neo4j.
+3. `cd ../recommendation_system/data_process` and run `triples2txt.py`, `ratings2txt.py`, and `kg_final.py` to get `ratings_final.txt` and `kg_final.txt`.
+4. `cd ../recommendation_system` and run `main.py`.
 
-## How to start:
+----
 
-1. cd ../web_crawler python3 add_infors.py to get kg_additional file. It includes diretor information, writer information and stars information.
+## Future work / TODO
 
-2. Start Neo4j desktop then cd ../knowledge_graph run python3 main.py. Create all triples in Neo4j.
+1. Java API wrapper
+2. Support different machine learning backends
+3. Support more storage methods and more input formats
+4. More effective loss function
+5. Full-platform support
+6. Auto installation
+
+----
 
-3. cd ../recommendation system/data_process run triples2txt.py ratings2txt.py kg_final.py to get ratings_final.txt kg_final.txt
+## References
 
-4. cd ../recommendation system run main.py
+1. Frame rate. https://en.wikipedia.org/wiki/Frame_rate. Accessed: 2019-07-23.
+2. Software framework. https://en.wikipedia.org/wiki/Software_framework. Accessed: 2019-06-22.
+3. RDF OWL difference https://www.cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/rdfs-vs-owl/
+4. Owlready2 documentation https://pythonhosted.org/Owlready2/
+5. Py2neo documentation https://py2neo.org/2.0/
+6. Introduction to RS https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada
+7. auc&acc https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
+8. https://zhuanlan.zhihu.com/p/54325231
+9. YouTube: https://www.youtube.com/watch?v=BP0IZ1uyUDE
+10. https://blog.csdn.net/dreamzuora/article/details/86543157
diff --git a/samples/README.md b/samples/README.md
index 5d23756..c4824b4 100644
--- a/samples/README.md
+++ b/samples/README.md
@@ -1,6 +1,5 @@
 ## Framework usage examples
 
-<<<<<<< HEAD
 - ``crawler_example.py`` -- an example of how to crawl an IMDB data source based on movie name. The example uses add_python.py to fetch the following movie information: director names, writer names, and star names and saves it in a CSV format file.
 Use this to call:
 ```#!bash
@@ -25,30 +24,18 @@ Use this to call:
 python3 modify_node.py
 ```
 
-
 - ``neo4j_multilingual.py``-- an example of Neo4j format supports multiple languages.
 Use this to call:
 ```#!bash
 python3 neo4j_multilingual.py
 ```
 
-=======
-- ``crawler_example.py`` -- an example of how to crawl an IMDB data source based on movie name. The example uses ``add_python.py`` to fetch the following movie information: director names, writer names, and star names and saves it in a CSV format file.
-
-Use this to call: ``python3 crawler_example.py``
-
-- ``get_alltriples.py`` -- an example of how to get all triples from Neo4j format or RDF format. The example uses ``get_triples_neo4j.py`` and get_triples_rdf.py to fetch the triple informations.
-
-Use this to call: ``python3 get_alltriples.py``
-
-- ``kg_examples.py`` -- an example of how to add new triples to Neo4j format. The example uses ``add_triples_neo4j.py`` to add new triple informations.
-Use this to call: ``python3 kg_examples.py``
+- ``crawler_example.py`` -- an example of how to crawl an IMDB data source based on movie name. The example uses ``add_python.py`` to fetch the following movie information: director names, writer names, and star names, and saves it to a CSV file. Use this to call: ``python3 crawler_example.py``
 
-- ``modify_node.py`` -- an example of how to add relations or delete a node from Neo4j format. The example uses ``modifiy_information()`` and ``query_delet_node()`` function to modify nodes.
+- ``get_alltriples.py`` -- an example of how to get all triples from Neo4j format or RDF format. The example uses ``get_triples_neo4j.py`` and ``get_triples_rdf.py`` to fetch the triples. Use this to call: ``python3 get_alltriples.py``
 
-Use this to call: ``python3 modify_node.py``
+- ``kg_examples.py`` -- an example of how to add new triples to Neo4j format. The example uses ``add_triples_neo4j.py`` to add the new triples. Use this to call: ``python3 kg_examples.py``
 
-- ``neo4j_multilingual.py`` -- an example of Neo4j format supports multiple languages.
+- ``modify_node.py`` -- an example of how to add relations to, or delete a node from, Neo4j format. The example uses the ``modifiy_information()`` and ``query_delet_node()`` functions to modify nodes. Use this to call: ``python3 modify_node.py``
 
-Use this to call: ``python3 neo4j_multilingual.py``
->>>>>>> 49153a960748478d026582290841297130864eda
+- ``neo4j_multilingual.py`` -- an example of how the Neo4j format supports multiple languages. Use this to call: ``python3 neo4j_multilingual.py``
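
The README lists the column order of `ratings_final.txt` (user_id, user_gender, user_age, user_job, new_movie_id, rating), but it does not specify the delimiter or the value types. The Python sketch below loads the file under two assumptions that should be checked against a real file produced by the "How to start" steps. First, columns are separated by whitespace (tabs or spaces). Second, every column holds an integer code. `Rating`, `parse_rating_line`, and `read_ratings` are hypothetical helpers, not part of the framework.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class Rating(NamedTuple):
    """One row of ratings_final.txt, in the column order listed in the README."""
    user_id: int
    user_gender: int
    user_age: int
    user_job: int
    new_movie_id: int
    rating: int


def parse_rating_line(line: str) -> Rating:
    """Parse one line into a Rating.

    Assumption (not confirmed by the README): columns are whitespace-separated
    and every column holds an integer code.
    """
    fields = line.split()
    if len(fields) != len(Rating._fields):
        raise ValueError(
            f"expected {len(Rating._fields)} columns, got {len(fields)}: {line!r}"
        )
    return Rating(*(int(value) for value in fields))


def read_ratings(path: Path) -> Iterator[Rating]:
    """Yield one Rating per non-empty line of the file at `path`."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # tolerate blank lines, e.g. a trailing newline
                yield parse_rating_line(line)
```

For example, `list(read_ratings(Path("ratings_final.txt")))` returns one `Rating` per non-empty line. If a row has the wrong number of columns, the loader raises `ValueError` instead of skipping the row, so a mismatch between these assumptions and the real file format shows up immediately.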