Greenfinger Framework

A high-performance distributed web crawling framework based on SpringBoot framework. It provides rich APIs to customize business and easily embedded your system.

Compatibility

Jdk8 (or later)
Netty 4.x (or later)
SpringBoot Framework 2.2.x (or later)
Redis 3.x (or later)
MySQL 5.x (or later)
ElasticSearch 6.8.6

Install

<dependency>
    <groupId>com.github.paganini2008.atlantis</groupId>
    <artifactId>greenfinger-spring-boot-starter</artifactId>
    <version>1.0-RC1</version>
</dependency>

Features

Perfect compatibility with SpringBoot framework
Support universal and vertical Crawlers
Adopt depth-first crawling strategy
Support multiprocessing crawling, which is easy to scale horizontally
Support full index and incremental index
Support update indexes by task scheduling
Support mainstream web page parsing technologies (like htmlunit, selenium)
Support removing duplicate URLs
Support multiple conditional interrupt strategies or customizing conditional interrupt strategies
Support index query with multiple version

Core API

CrawlerLauncher

PathFilterFactory

CrawlerHandler

ResourceManager

PageExtractor

Condition

PathAcceptor

Quick Start

Step 1:

You need to get the latest code of GreenFinger project.
git clone https://github.com/paganini2008/greenfinger.git

Step 2:

cd greenfinger/greenfinger-console/run

You can see:

run
├── config
│   ├── application-dev.properties
│   └── application.properties
├── db
│   └── crawler.sql
├── images
├── lib
├── greenfinger-console-1.0.1.jar
└── logs

Step 3:

open application-dev.properties then to modify the Jdbc Configuration, Redis Configuration and Elasticsearch Configuration Other configuration has better keep default settings when you start the application on first time.

Step 4:

execute 'java -jar greenfinger-console-1.0.1.jar --server-port=21212' and start the Greenfinger Console application. Default http port is 21212. You can also start multiple applications of Greenfinger-Console to form a cluster and then work together.

Step 5:

Open your browser: http://localhost:21212/atlantis/greenfinger/catalog/

Name		Name	Last commit message	Last commit date
Latest commit History 59 Commits
greenfinger-console		greenfinger-console
greenfinger-spring-boot-starter		greenfinger-spring-boot-starter
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Greenfinger Framework

Compatibility

Install

Features

Core API

Quick Start

About

Releases

Packages

Languages

License

paganini2008/greenfinger

Folders and files

Latest commit

History

Repository files navigation

Greenfinger Framework

Compatibility

Install

Features

Core API

Quick Start

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages