Finished Assignment 1 #35

Open
wants to merge 11 commits into
base: gh-pages
662 changes: 662 additions & 0 deletions projects/3-mazes/Chappidi.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions projects/5-emojiworld/Chappidi.json
@@ -0,0 +1 @@
{"meta":{"description":"When making your sim, remember the KISS Principle: Keep It Simple, Sucka.\n\nAt the bottom of this sidebar, you can SAVE and SHARE your sim. Have fun! 😘","draw":4,"fps":12,"play":true},"states":[{"actions":[{"actions":[{"stateID":1,"type":"go_to_state","actions":[]}],"probability":0.001,"type":"if_random"}],"description":"Grow Wheat","icon":"","id":0,"name":"empty spot"},{"actions":[{"actions":[{"stateID":"2","type":"go_to_state","actions":[]}],"probability":0.002,"type":"if_random"}],"description":"Grow Tomato","icon":"🌾","id":1,"name":"wheat "},{"actions":[{"actions":[{"stateID":"3","type":"go_to_state","actions":[]}],"num":1,"sign":">=","stateID":"2","type":"if_neighbor"}],"description":"Turn Tomato and Wheat into Spaghetti","icon":"🍅","id":2,"name":"tomatoes"},{"actions":[{"actions":[{"stateID":"4","type":"go_to_state","actions":[]}],"probability":0.03,"type":"if_random"},{"actions":[{"stateID":"4","type":"go_to_state","actions":[]}],"num":1,"sign":">=","stateID":"4","type":"if_neighbor"}],"description":"Humans eat lots of spaghetti","icon":"🍝","id":3,"name":"[new thing]"},{"actions":[{"stateID":0,"type":"go_to_state","actions":[]}],"description":"Done Eating ","icon":"😋","id":4,"name":"Satisfied Human"}],"world":{"neighborhood":"moore","proportions":[{"stateID":0,"parts":100},{"stateID":1,"parts":0},{"stateID":2,"parts":0},{"stateID":3,"parts":0},{"stateID":4,"parts":0}],"size":{"height":33,"width":40},"update":"simultaneous"}}
352 changes: 352 additions & 0 deletions projects/6-perceptron/Chappidi.ipynb

Large diffs are not rendered by default.

101 changes: 101 additions & 0 deletions projects/Chappidi-CNN.ipynb
@@ -0,0 +1,101 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'os' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-1-2d7190b556ac>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mlabels_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m \u001b[0;31m# dictionary mapping label name to numeric id\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mlabels\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;31m# list of label ids\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0msorted\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlistdir\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mTEXT_DATA_DIR\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mpath\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mTEXT_DATA_DIR\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0misdir\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNameError\u001b[0m: name 'os' is not defined"
]
}
],
"source": [
"import os\n",
"\n",
"# TEXT_DATA_DIR must be set to a directory holding one sub-folder per label;\n",
"# it is not defined anywhere in this notebook.\n",
"texts = []  # list of text samples\n",
"labels_index = {}  # dictionary mapping label name to numeric id\n",
"labels = []  # list of label ids\n",
"for name in sorted(os.listdir(TEXT_DATA_DIR)):\n",
"    path = os.path.join(TEXT_DATA_DIR, name)\n",
"    if os.path.isdir(path):\n",
"        label_id = len(labels_index)\n",
"        labels_index[name] = label_id\n",
"        for fname in sorted(os.listdir(path)):\n",
"            if fname.isdigit():\n",
"                fpath = os.path.join(path, fname)\n",
"                # the context manager closes the file even if read() fails\n",
"                with open(fpath) as f:\n",
"                    texts.append(f.read())\n",
"                labels.append(label_id)\n",
"\n",
"print('Found %s texts.' % len(texts))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [Root]",
"language": "python",
"name": "Python [Root]"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
65 changes: 65 additions & 0 deletions projects/PAPER.txt
@@ -0,0 +1,65 @@
Sentiment Analysis

1. Background
The exponential growth of users on social media platforms inevitably generates vast amounts of public opinion, ranging from product reviews to concerns about political changes. Government bodies and companies can gain significant insight by analyzing the attitudes expressed in these comments. This automated mining of text to deduce the attitude of the writer is known as sentiment analysis or opinion mining. Research in natural language processing has led to comparative studies of the specific approaches that are applied in sentiment analysis.

In the online community, the reputation of a product often dictates its performance in the market. Companies can mine product reviews and forums to track the complaints of dissatisfied customers or to identify the trends that customers are satisfied with. Sentiment analysis is often used to track the sentiment of bloggers because they play a large role in shifting the opinions of their readers. Governments and political parties search forums to see shifts in public opinion towards a certain idea or political figure. The same idea can be used to detect threats directed towards the government by analyzing text for large amounts of negative polarity. Another application that is slowly being implemented is searching online discussion boards for people at risk of suicide or severe depression. The applications of sentiment analysis are numerous and continue to be implemented in various fields.

2. Branches
Sentiment analysis falls under the larger canopy of natural language processing, a branch of artificial intelligence. Natural language processing focuses on the interaction between humans and computers and branches out into a variety of fields such as parsing, speech recognition, and sentiment analysis. Sentiment analysis itself also branches into various subfields.

Each sub-branch of sentiment analysis has a specific task regarding sentiment. Subjectivity definition analyzes the text to see if it contains any opinions. Aspect-based sentiment summarization assigns star ratings or scores to a product based on the text it analyzes. Text summarization for opinions analyzes large segments of text and generates sentences to summarize them. Contrastive viewpoint summarization emphasizes contradicting opinions apparent in the text. Product feature extraction analyzes reviews and extracts the features of the product from the text. Opinion spam detection sifts through text to identify fake or bogus opinions. Lastly, sentiment prediction determines the polarity of the text. Sentiment prediction is often referred to simply as sentiment analysis, with no differentiation between the branches.

2.1 Approaches
Sentiment analysis, or more specifically “sentiment prediction”, groups the emotions of a text into three categories: positive, negative, or neutral. Classification can occur on three levels: document level, sentence level, or word/feature level. The level determines how the input is broken down. There are various approaches to breaking down text, but the three main families are lexicon-based, machine learning, and hybrid. This paper expands on the lexical and machine learning approaches individually.

3. Lexical
Lexical means relating to the words of a language, so the process relies on the individual words of the input. This approach can cover a broad range of words because it has access to a large network of words, but a lexicon contains only a finite number of words, which can also prove to be a disadvantage. The lexical approach is also a form of unsupervised learning because no labeled training data sets are involved.

The lexical approach is one broad format with two specific techniques: the dictionary-based approach and the corpus-based approach. In the basic lexical format, the input text is tokenized, meaning that it is split into individual words, or tokens. Each token is then matched against the words in the dictionary. If the matched word is tagged as positive, a score is incremented; if it is tagged as negative, the score is decremented. At the end, if the score is above a certain threshold, the overall polarity of the input is positive. If the majority of the words registered as neutral, then the input is registered as neutral overall.
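
The scoring loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word sets POSITIVE and NEGATIVE are a tiny made-up lexicon, the function names are hypothetical, and the neutral case is simplified to "the score does not clear the threshold in either direction" rather than a majority count of neutral words.

```python
import re

# Hypothetical toy lexicon; a real system would load a much larger word list.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_polarity(text, threshold=0):
    """Return 'positive', 'negative', or 'neutral' for the text."""
    score = 0
    for token in tokenize(text):
        if token in POSITIVE:
            score += 1  # positive word: increment the score
        elif token in NEGATIVE:
            score -= 1  # negative word: decrement the score
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"  # assumption: anything inside the threshold band is neutral


if __name__ == "__main__":
    print(lexicon_polarity("I love this great phone"))
    print(lexicon_polarity("Terrible battery, bad screen"))
```

Raising the threshold makes the sketch more conservative: more inputs fall into the neutral band before being called positive or negative.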

3.1 Dictionary Based
The dictionary-based approach only changes the word bank against which words are compared in the lexical approach. An expert picks a group of opinion words, known as seeds, which are already tagged with polarities. The program then uses WordNet to expand the collection by gathering synonyms and antonyms of the seeds; synonyms take the same polarity as the seed and antonyms take the opposite polarity. It iterates through the newly added words in the same way until it finds no new words. Afterwards, the same process of tokenizing, matching words against the new dictionary, and incrementing the score occurs.
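
The seed expansion can be sketched as a breadth-first walk over synonym and antonym links. To keep the example self-contained it does not call WordNet; TOY_THESAURUS is a hypothetical stand-in with invented entries, and expand_seeds is an illustrative name, not part of any library.

```python
from collections import deque

# Hypothetical stand-in for WordNet: word -> (synonyms, antonyms).
TOY_THESAURUS = {
    "good": (["fine", "great"], ["bad"]),
    "great": (["superb"], []),
    "bad": (["awful"], ["good"]),
    "awful": (["dreadful"], []),
}


def expand_seeds(seeds, thesaurus):
    """Grow a {word: polarity} lexicon from seed words (+1 or -1)."""
    lexicon = dict(seeds)
    queue = deque(seeds)  # words whose neighbours still need visiting
    while queue:
        word = queue.popleft()
        polarity = lexicon[word]
        synonyms, antonyms = thesaurus.get(word, ([], []))
        # Synonyms inherit the polarity, antonyms get the opposite one.
        candidates = [(s, polarity) for s in synonyms]
        candidates += [(a, -polarity) for a in antonyms]
        for related, related_polarity in candidates:
            if related not in lexicon:  # the first label assigned wins
                lexicon[related] = related_polarity
                queue.append(related)
    return lexicon


if __name__ == "__main__":
    print(expand_seeds({"good": 1}, TOY_THESAURUS))
```

The loop stops on its own once no unseen words remain, which corresponds to the "until it finds no new words" condition in the text.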

3.2 Corpus Based
The corpus-based approach differs from the dictionary-based approach because, instead of a group of pre-picked words, the word bank comes from a corpus with tagged sentiments. The corpus-based approach has an advantage because it is better able to pick up the polarities of slang and online jargon, which the dictionary-based approach cannot handle because its dictionary only expands from previously picked words.

It is important to realize that the dictionary-based and corpus-based approaches differ only in the word bank that the lexical approach uses; the rest of the lexical process is the same for both techniques.
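
One simple way to derive a word bank from a tagged corpus is to count how often each word appears in positive versus negative texts. The sketch below assumes that counting scheme, which is only one of several possibilities; TAGGED_CORPUS is invented data and build_corpus_lexicon is a hypothetical helper.

```python
from collections import Counter

# Hypothetical tagged corpus: (text, label) pairs, including slang.
TAGGED_CORPUS = [
    ("this phone is lit", "positive"),
    ("lit camera and great screen", "positive"),
    ("battery is trash", "negative"),
    ("trash speaker and bad screen", "negative"),
]


def build_corpus_lexicon(corpus, min_count=2):
    """Map each sufficiently frequent word to +1 or -1 by majority label."""
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in corpus:
        tokens = set(text.lower().split())  # count each word once per text
        target = pos_counts if label == "positive" else neg_counts
        target.update(tokens)

    lexicon = {}
    for word in set(pos_counts) | set(neg_counts):
        pos, neg = pos_counts[word], neg_counts[word]
        if pos + neg < min_count:
            continue  # too rare to trust
        if pos > neg:
            lexicon[word] = 1
        elif neg > pos:
            lexicon[word] = -1
        # ties (for example "is", "and") carry no polarity and are skipped
    return lexicon


if __name__ == "__main__":
    print(build_corpus_lexicon(TAGGED_CORPUS))
```

Slang such as "lit" and "trash" receives a polarity here because it occurs in the tagged texts, while a word that only appears once is dropped by the min_count filter.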

4. Machine Learning

The machine learning approach to sentiment analysis is often viewed as more efficient because it can be trained for specific purposes and domains. This approach requires two data sets: a training data set to learn from and a test data set to validate the performance of the algorithm. Unlike the lexicon approach, however, it has a hard time being applied to a new domain because it needs a new set of training data. In machine learning, the algorithm trains to find patterns that it can then apply to data it has never seen before. A variety of machine learning techniques are used for sentiment analysis, such as Bayesian networks, support vector machines, and neural networks.
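
The train-then-test workflow can be illustrated with a small multinomial naive Bayes classifier written in plain Python. Naive Bayes is chosen here only because it is short to write down; it is a simpler relative of the Bayesian networks mentioned above, not a technique the paper singles out. The TRAIN and TEST examples are invented, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (tokens, label). Returns (label_counts, word_counts, vocabulary)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training and test sets, kept separate as the text describes.
TRAIN = [
    ("great phone love it".split(), "positive"),
    ("excellent screen great battery".split(), "positive"),
    ("terrible phone hate it".split(), "negative"),
    ("poor battery terrible screen".split(), "negative"),
]
TEST = [
    ("love the great screen".split(), "positive"),
    ("hate the poor battery".split(), "negative"),
]

model = train_naive_bayes(TRAIN)
accuracy = sum(predict(model, tokens) == label for tokens, label in TEST) / len(TEST)

if __name__ == "__main__":
    print("test accuracy:", accuracy)
```

Because the model only knows the words in TRAIN, applying it to reviews from a different domain would require new training data, which is the limitation noted above.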

In sentiment analysis, especially for microblogs such as Twitter, the input data is pre-processed, meaning the data is cleaned up. One step, called stop word removal, removes small words such as articles, conjunctions, and prepositions (the, and, before, while, so, etc.). Another step, called stemming, takes the different forms of a word and condenses them to a common root. Words such as walker or walking would be condensed to the word walk.
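
Both pre-processing steps can be sketched as follows. The stop-word list is a small illustrative sample, and crude_stem is a deliberately naive suffix stripper assumed for this example; a real pipeline would normally use an established stemmer such as the Porter algorithm.

```python
import re

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "and", "before", "while", "so", "a", "is", "to"}

# Suffixes tried in order; the first match is stripped.
SUFFIXES = ("ing", "er", "ed", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three letters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The walker is walking so fast"))
```

The minimum-length check keeps short words such as "her" from being reduced to a single letter, at the cost of missing some legitimate stems.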

5. Conclusion

Sentiment analysis is a constantly growing field because it has proven to be a valuable technique to apply to user-generated content. New techniques are being explored to make sentiment analysis more efficient. This paper expanded only on the lexical and machine learning approaches, but there are many others to explore.

References

[1] Agarwal, Apoorv, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca Passonneau. "Sentiment Analysis Based on Dictionary Approach." (2016): 1-9. Columbia University. Web.

[2] Collomb, Anais, Crina Costea, Damien Joyeux, Omar Hasan, and Lionel Brunie. "A Study and Comparison of Sentiment Analysis Methods for Reputation Evaluation." (n.d.): 5-10. Web.

[3] Vohra, S.M., and J.B Teraiya. "A Comparative Study of Sentiment Analysis Techniques." Journal of Information, Knowledge, and Research in Computer Engineering 02.02 (2015): 313-17. Web.

[4] Thakkar, Harsh, and Dhiren Patel. "Approaches for Sentiment Analysis on Twitter: A State-of-Art Study." (n.d.): 1-6. Web.

[5] D'Andrea, Alessia, Fernando Ferri, and Patrizia Grifoni. "Approaches, Tools and Applications for Sentiment Analysis Implementation." Trans. Tiziana Guzzo. International Journal of Computer Applications 125.3 (2015): 26-33. Web.

[6] Bhonde, Reshma, Binita Bhagwat, Sayali Ingulkar, and Apeksha Pande. "Sentiment Analysis Based on Dictionary Approach." International Journal of Emerging Engineering Research and Technology 3.1 (2015): 51-54. Web.


13 changes: 13 additions & 0 deletions projects/class-projects/Chappidi.txt
@@ -0,0 +1,13 @@
Project Proposal

Group: Individual

Topic: Paper on Sentiment Analysis

Goal:

1. Discuss sentiment analysis from a broad perspective
2. Detail and compare systems, approaches, and techniques used in sentiment analysis
3. Discuss why people use it (surveillance, customer service, etc.) and how it fits into NLP

How: Reading a lot of papers and researching

Halfway: Completing items 1, 2, and 3 of the goal