scrapelect

scrapelect is a web scraping language inspired by CSS that turns a web page into structured JSON data. Select elements with CSS selectors, apply filters to extract and modify the data you want from a web page, and get the output in a structured, machine-readable, interoperable format.

installation

Install the Rust toolchain. Using cargo, run:

$ cargo install scrapelect

to install the scrapelect interpreter.

usage

Write a scrapelect program in a .scrp file. Documentation for the language can be found in the scrapelect book.

A quick example, title.scrp, retrieves the title of a Wikipedia article:

title: .mw-page-title-main {
  content: $element | text();
};

Run the program with the URL of the web page to scrape:

$ scrapelect title.scrp "https://en.wikipedia.org/wiki/Cat"

It will output:

{
  "title": {
    "content": "Cat"
  }
}
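
Because the output is plain JSON, it can be consumed from any language. The following is a minimal sketch in Python, assuming that the scrapelect interpreter is on your PATH and that it writes the JSON result to standard output (the example above suggests this, but it is an assumption here). The helper names run_scrapelect and extract_title are hypothetical and not part of scrapelect.

```python
import json
import subprocess


def run_scrapelect(program: str, url: str) -> str:
    """Run the scrapelect CLI as in the example above and return its stdout.

    Assumes `scrapelect <program> <url>` prints the JSON result to stdout.
    """
    result = subprocess.run(
        ["scrapelect", program, url],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def extract_title(raw_output: str) -> str:
    """Read the `title.content` field produced by title.scrp."""
    data = json.loads(raw_output)
    return data["title"]["content"]
```

Calling extract_title(run_scrapelect("title.scrp", "https://en.wikipedia.org/wiki/Cat")) would then return the string found under "title" and "content" in the output shown above.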

documentation

The scrapelect book contains documentation on language concepts and how to write a scrapelect program.
Additionally, documentation for scrapelect's built-in filters is located at docs.rs.
Developer-level documentation is also at docs.rs, but it is currently incomplete.

community

GitHub issues and discussions are great places to report bugs, request features, and get help using scrapelect.
Also, consider submitting a pull request to contribute to the code or documentation.
See the contributing chapter of the scrapelect book for more information on contributing to scrapelect.

license

scrapelect is available under the MIT or Apache 2.0 license, at your option. Copies of these licenses are included as LICENSE-MIT and LICENSE-APACHE in the root directory.

scrapelect: scrape + select, also -lect
