A web crawler that uses Elasticsearch, Kibana, the Scrapy framework, and the Splash JavaScript rendering service on top of a Docker-containerized application architecture, aiming to retrieve data from LESA tickets.
Since the current LESA doesn't provide any sort of REST API to retrieve data from tickets, I've started developing a web crawler that acquires the data through XPath queries. All retrieved data is stored in an Elasticsearch index, where it can be visualized through Kibana.
To run this app you will need to install:
- Docker (version 17.09.0+)
- docker-compose (version 1.16.1+)
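You can check the installed versions against these minimums with:
$ docker --version
$ docker-compose --version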
- Clone this project.
- Replace the SCREEN_NAME and TIME_ZONE values with your LESA screen name and time zone:
# lesa-crawler/crawler/lesaticket/custom_settings.py
SCREEN_NAME = 'screen.name'
TIME_ZONE = '<your time zone>' # An offset such as "+0000" (GMT).
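For illustration, a filled-in version might look like this (the screen name is hypothetical):
# lesa-crawler/crawler/lesaticket/custom_settings.py
SCREEN_NAME = 'joe.bloggs'
TIME_ZONE = '-0300' # e.g. Brasília time (GMT-3)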
- You can also change the start of the query's date range and its region:
# lesa-crawler/crawler/lesaticket/custom_settings.py
START_MONTH = ...
START_DAY = ...
START_YEAR = ...
REGION_ID = ...
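For illustration only, crawling tickets from January 1st, 2017 onward might look like the following. These values are hypothetical and assume months are numbered 1-12; REGION_ID is left out because it depends on your LESA region:
# lesa-crawler/crawler/lesaticket/custom_settings.py
START_MONTH = 1
START_DAY = 1
START_YEAR = 2017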
- Encode your screen.name:password pair with a base64 encoder (these are your JIRA credentials); see the example after the snippet:
# lesa-crawler/crawler/lesaticket/custom_settings.py
LIFERAY_ISSUES_AUTORIZATION_HEADER = {
'Authorization': 'Basic c2NyZWVuLm5hbWU6cGFzc3dvcmQ='
}
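A shell one-liner can produce that value; the credentials here are the placeholder pair from the snippet above:
$ echo -n 'screen.name:password' | base64
c2NyZWVuLm5hbWU6cGFzc3dvcmQ=
The -n flag matters: without it the trailing newline would be encoded as well.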
- Replace the SUPPORT_OFFICE value with yours:
# lesa-crawler/crawler/lesaticket/custom_settings.py
SUPPORT_OFFICE = '<your support office>'
- Encode your email:password pair using a base64 encoder (your LESA site credentials), just as above.
- Replace the Authorization value with your encoded credentials:
# lesa-crawler/crawler/lesaticket/settings.py
DEFAULT_REQUEST_HEADERS = {
'Accept': 'text/html,application/xhtml+xml, ...',
'Accept-Language': 'en',
'Authorization': 'Basic c2NyZWVuLm5hbWVAbGlmZXJheS5jb20=',
}
- Set the time zone in the scrapyd and splash Dockerfiles, as sketched below.
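As a minimal sketch, assuming both images honor the standard TZ environment variable (the file path and zone name here are assumptions), one line in each Dockerfile would do:
# lesa-crawler/scrapyd/Dockerfile (and likewise for splash)
ENV TZ=America/Sao_Paulo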
- Run the following command to build the containers and start up the application:
$ docker-compose --file <path to>/lesa-crawler/docker-compose.yml up --build
Or go to the lesa-crawler directory and just enter:
$ docker-compose up --build
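If you prefer to keep your terminal free, the same command accepts docker-compose's detached flag:
$ docker-compose up --build -d
$ docker-compose logs -f
The second command follows the containers' output while they run in the background.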
- When initialization finishes, you can access the following URLs:
- http://localhost:9200 (user: elastic, password: changeme)
- http://localhost:5601 (same as above)
- http://localhost:6800 (the Scrapyd web interface)
- http://localhost:8050 (the Splash web interface)
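To check that Elasticsearch is ready before going further, you can query its cluster health endpoint with the credentials above:
$ curl -u elastic:changeme http://localhost:9200/_cluster/health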
- If you don't want to wait for the application to start crawling, execute the following command:
$ curl http://localhost:6800/schedule.json -d project=default -d spider=ticket
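You can confirm the spider was scheduled, and later that documents actually reached Elasticsearch, with:
$ curl 'http://localhost:6800/listjobs.json?project=default'
$ curl -u elastic:changeme 'http://localhost:9200/_cat/indices?v'
The first call is scrapyd's job-listing endpoint; the second lists every index, the crawler's among them once data starts arriving.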
- Check out the dashboard sample by importing 'export.json' into Kibana (Management > Saved Objects > Import).
- Elasticsearch - An open-source full-text search and analytics engine.
- Kibana - An open-source visualization platform designed to work with Elasticsearch.
- Scrapyd - A service to run Scrapy spiders.
- Splash - Lightweight, scriptable browser as a service with an HTTP API.