-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathdesign.html
executable file
·143 lines (134 loc) · 9.82 KB
/
design.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
<!DOCTYPE html>
<html>
<head>
<title> CS 499 Group 7: Supreme Court </title>
<meta charset="UTF-8">
<!-- Put JS headers here -->
<!-- CSS Stylesheets -->
<link rel="stylesheet" href="main.css" type="text/css">
</head>
<body>
<div id="design-page" class="container">
<div id="design-page-header" class="header">
<h1>Project Design</h1>
</div>
<br><br><br>
<div class="content">
<div class="links">
<a href="index.html">Home</a>
<br><br>
<a href="introduction.html">Introduction</a>
<br><br>
<a href="requirements.html">Requirements</a>
<br><br>
<a href="updates.html">Updates</a>
<br><br>
<a href="schedule.html">Schedule</a>
<br><br>
<a href="design.html">Design</a>
<br><br>
<a href="testing.html">Testing</a>
<br><br>
<a href="useCases.html"> Use Cases</a>
<br><br>
<a href="designAndImplementation.html">Design Considerations/Implementation Issues</a>
<br><br>
<a href="enhancementsMaintenance.html">Future Enhancements/Maintenance</a>
<br><br>
<a href="conclusions.html">Conclusions</a>
<br><br>
<a href="installation.html">Installation</a>
<br><br>
<a href="references.html"> References></a>
<br><br>
</div>
<div class="text">
<h2>Overview</h2>
<p>The requirements for our project are as follows:</p>
<ul>
<li>Develop a web application written in html javascript that'll have search functionality via keywords. It only goes through articles we've determined are relevant to the Supreme Court.</li>
<li>Each article entry inside the database will have columns of data associated with different attributes such as: original link and source information; links to images; text analysis attributes including word count, readability score, sentiment score.</li>
<li>The above mentioned attributes will be exportable. This will be achieved by implementing a function to output a .csv file with those attributes.</li>
<li>We will also implement a way to routinely fetch news articles from various news sources. The fetching is done through RSS feeds and NEWSAPI. We've acquired a free API key from NEWSAPI and we intend to implement it's function of searching for news articles that mention a specific topic or keyword. The API has a distinct endpoint which will be especially helpful for us; it is the Everything endpoint, a url (https://newsapi.org/v2/everything) that indexes articles from over 5,000 sources from the past 3 months that have and haven't made breaking news.</li>
<li>The UI of the application is part of one of the three layers present in the web application. In the visual layer, the UI will present a searchbar with options to set start and end dates for the query. This is similar to the SCOOP project. Once articles are retrieved and presented to the user, there will be a representation with the attributes of the article, mentioned in the second bullet point of this list.</li>
<li>The other two layers have been introduced already; the data layer corresponds to the database, and the external layer contains the article fetching methods.</li>
<li>One of our ultimate goals will be to contrive a method to analyze images of articles to identify association with the Supreme Court, using relevant articles' images as a learning tool for future articles tested. This action occurs when determining whether or not to add an article into the database; essentially, we're adding a feature to the learning algorithm, where instead of just using text, images are included.</li>
</ul>
<h2>Environment</h2>
<p>Our cutomers expressed to us that our web application needs to be compatible in both Windows and on macOS. Our web application will be written in html javascript [version 1.8.2] with some PHP [version 7.2], which is certainly compatible in both operating systems; it'll be the webpage which the users will access. It's more a question of browser compatibility, but our cutomers didn't mention any browser requirements.</p>
<p>The NEWSAPI [version 0.1.1] functions will be implemented in Python [version 3], and it is simply a choice of preference because it's a language we're all comfortable writing in. However, the main reason we are using Python is becaues of a library called Newspaper3k [version 0.2.6], which includes functions to extract plaintext and metadata from news articles. This is how we will fill the attributes inside the database.</p>
<p>The database will be produced in the MySQL relational database management system. It is also compatible with Windows and macOS.</p>
<h2>Modules</h2>
<ol><li>There are four major portions to the project that could be considered modules:
<ul><li>Article Collector
<ul>
<li>Sends Requests for information NEWSAPI
<li>Sends GET Requests to News websites to get articles
<li>Sends requests to RSS feeds for latest feeds
<li>Uses Newpaper 3k library to extract text, images, and metadata from articles
<li>Sends article text to classifier to determine relevancy
</ul>
<li>Article Classifier
<ul>
<li>Receives text of given article
<li>Classifies articles as relevant or not (Simple Boolean decision)
</ul>
<li>Database and associated clients that read/write to it
<ul>
<li>Database set up with ER diagram below
<li>Also includes client which allows regularly running script to save articles and metadata to database
<li>Client also saves images
</ul>
<li>User Interface
<ul><li>Queries the database to display relevant results
<li>Allows user to set date constraints on searches
<li>Allows user to see associated images
<li>Allows user to download list of results as csv
</ul>
</ul>
</ol>
<p>These are the four high level components of the project. To give a brief summary, the Article Collector is responsible for collecting articles. The classifier determines if these are worthy of being saved into the database.
The database and associated client allows for articles deemed relevant to be saved to the database and organized in a orderly fashion. This client will also save article images in a directory on the server for easy retrieval. The user interface is then responsible for querying the database and displaying information to users in a presentable format.</p>
<h3>Collector/Classifier Script Overview</h3>
<img src="callResponseDiagram.jpg">
<p>The above diagram helps to describe the script that handles the collection, classification, and saving of articles. This script will run on a regular basis, and will have multiple sources of articles. First, the script requests articles from various sources, like NEWSAPI, RSS feeds, and Sections of news websites dedicated to the supreme court.
For the first two, no additional work is necessary to get links, texts, and images associated with articles. For the last source, the sections of websites, the script will search for certain html tags in the page that correspond to the article links being shown. Once this is accomplished, the text of the articles is then downloaded, and the text of articles will be sent to the classifier filter out irrelevant results. Once this filtering is done, the list of remaining articles will be saved to the database through a database client that will have functions for easily saving article data and associated metadata, like source, author, date, and date posted. This is a good overview of how the collection process will work.</p>
<br>
<h3>Database Overview</h3>
<img src="erDiagram.jpg">
<p>Above is an ER diagram of the database for article collection. The database is relatively simple, just saving articles and information about associated images. If time permits, this can easily be extended to support saving image and text analysis. Analysis tables can simply be added to reference the article id and image id for text and image analysis, respectively.</p>
<h3>User Interface</h3>
<p>A large portion of the user interface will be adapted from the previous group's user interface, so the design for this involves ensuring that we make our database compatible with the current user interface of the previous project.</p>
<h2>Screenshots</h2>
<p>These shots are of the current project, showing the user interfaces for searching the database and viewing individual articles:</p>
<img src="Capture1.jpeg" alt="Top of search page" width=100%>
<p>This is the top of the search page</p>
<hr>
<img src="Capture2.jpeg" alt="Result page" width=100%>
<p>The article result page</p>
<br>
<h2>Use Cases</h2>
<ol type=A>
<li><b>Browsing available articles</b>
<ol>
<li><p>User pulls up the website, and available articles sorted by date are already available</p>
<li><p>User can change 'Sort By' to sort by different heuristics</p>
<li><p>User can change page number to look at older articles</p>
</ol>
<li><b>Searching for a particular article</b>
<ol>
<li><p>User pulls up the website</p>
<li><p>User types a query into the search bar, and may input additional filters to search by</p>
<li><p>The database processed their query to search through the database for relevant articles</p>
<li><p>The user's results are returned and displayed to them</p>
</ol>
<li><b>Downloading search results</b>
<ol>
<li><p>User pulls up the website and searches for articles using process from B</p>
<li><p>User clicks the highly visible "Download Results" button</p>
<li><p>A comma separated list of the query results is created and made available to download to the user</p>
</div>
</div>
</div>
</body>
</html>