Instantiate scraper as class (#43)
* refactor: emitter & scrape()

* temp commit query broken

* sync start() working, added 'initialized' event

* fix type issue with transactions

* fix logger initialize issue

* style: remove it.only

* test: move logic inside it()

* refactor: query fn args

query function was creating a new db connection on every prepare(),
now that is not the case

* add active scraper lock

* style: fix lint & prettier complaints

* more strict typechecking

* add one off full package test

* ci: fix packaging test syntax

* website: fix npm audit security flaw

* docs: improve error message

* Duck typing scraper extends EventEmitter (#45)

* tests: passing

* tmp commit, check errors thrown in callbacks

* tmp commit, tests passing

* style: fix linter errors

* update package-lock.json, readme, remove scar code

* update integration test to use new api

* remove unnecessary test helpers

* cleanup scar code
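
The shape these commit notes describe (a scraper instantiated as a class, extending `EventEmitter`, emitting `'initialized'` after `start()`, and guarded by an active scraper lock) can be sketched roughly as below. `SketchScraper`, the per-folder lock granularity, and the `setImmediate` deferral are assumptions made for illustration; they are not taken from the repository's source.

```typescript
import { EventEmitter } from 'events'

// Hypothetical stand-in for the real scraper class; all names are illustrative.
class SketchScraper extends EventEmitter {
  // Assumed lock: output folders that currently have a running scraper.
  private static readonly activeFolders = new Set<string>()

  private readonly folder: string

  public constructor(folder: string) {
    super()
    this.folder = folder
  }

  // Synchronous start: takes the lock, then reports progress through events.
  public start(): this {
    if (SketchScraper.activeFolders.has(this.folder)) {
      throw new Error(`a scraper is already active for ${this.folder}`)
    }
    SketchScraper.activeFolders.add(this.folder)
    // Deferred so listeners attached right after start() still receive events.
    setImmediate(() => {
      this.emit('initialized')
      // Real download work would happen here.
      this.finish()
    })
    return this
  }

  // Releases the lock and signals completion.
  private finish(): void {
    SketchScraper.activeFolders.delete(this.folder)
    this.emit('done')
  }
}
```

Because `on()` returns the emitter, listeners can be chained before `start()`, for example `new SketchScraper('out').on('done', () => console.log('finished')).start()`.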
andykais authored Feb 7, 2020
1 parent c42bec2 commit 0fcdf02
Showing 46 changed files with 8,174 additions and 8,544 deletions.
16 changes: 15 additions & 1 deletion .eslintrc.json
@@ -11,7 +11,21 @@
 "no-unreachable": "error",
 "@typescript-eslint/no-unused-vars": ["error", { "varsIgnorePattern": "^_" }],
 "@typescript-eslint/explicit-member-accessibility": "error",
-"@typescript-eslint/member-ordering": "error",
+"@typescript-eslint/member-ordering": [
+"error",
+{
+"default": [
+"static-field",
+"instance-field",
+"abstract-field",
+
+"constructor",
+"static-method",
+"instance-method",
+"abstract-method"
+]
+}
+],
 "no-only-tests/no-only-tests": "error"
 }
 }
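
As a rough illustration of what the `member-ordering` setting above asks for, the class below lays out its members in the configured group order: static fields, instance fields, constructor, static methods, instance methods. `Counter` is a made-up example and does not appear in the repository.

```typescript
// Illustrative class only; the comments mark the groups named in the lint config.
class Counter {
  // static-field
  public static instances = 0

  // instance-field
  private count = 0
  private readonly step: number

  // constructor
  public constructor(step: number) {
    this.step = step
    Counter.instances += 1
  }

  // static-method
  public static resetInstances(): void {
    Counter.instances = 0
  }

  // instance-method
  public increment(): number {
    this.count += this.step
    return this.count
  }
}
```

Every member also carries an explicit `public` or `private` modifier, which is what the neighbouring `explicit-member-accessibility` rule requires.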
8 changes: 8 additions & 0 deletions .github/workflows/ci.yml
@@ -44,6 +44,14 @@ jobs:
 with:
 github-token: ${{ secrets.github_token }}

+##############################
+# Test Package Usage #
+##############################
+- run: npm ci --no-audit
+if: matrix.os == 'ubuntu-latest' && matrix.node-version == '8'
+- run: npm run build
+- run: ./testing/packaging/setup.sh
+
 ##############################
 # NPM Publish #
 ##############################
44 changes: 21 additions & 23 deletions README.md
@@ -76,17 +76,14 @@ const params = {
 // - emit events back to the scraper (like 'stop')
 // - query the scraped data

-const { scraper } = require('scrape-pages')
-// create an executable scraper and a querier
-const { start, query } = scrape(config, options, params)
-// begin scraping here
-const { on, emit } = await start()
-// listen to events
-on('image:compete', id => console.log('COMPLETED image', id))
-on('done', () => {
-const result = query({ scrapers: ['images'] })
-// result is [[{ filename: 'img1.jpg' }, { filename: 'img2.jpg' }, ...]]
-})
+const scraper = new ScraperProgram(config, options, params)
+scraper
+.on('image:complete', id => console.log('COMPLETED image', id))
+.on('done', () => {
+const result = scraper.query(['images'])
+// result is [[{ filename: 'img1.jpg' }, { filename: 'img2.jpg' }, ...]]
+})
+.start()
 ```

 For more real world examples, visit the [examples](examples) directory
@@ -103,11 +100,11 @@ given a different output folder in the `params` object, it will run completely f

 ### scrape

-| argument | type | required | type file | description |
-| -------- | ------------- | -------- | -------------------------------------------------------------- | ----------------------------- |
-| config | `ConfigInit` | Yes | [src/settings/config/types.ts](src/settings/config/types.ts) | _what_ is being downloaded |
-| options | `OptionsInit` | Yes | [src/settings/options/types.ts](src/settings/options/types.ts) | _how_ something is downloaded |
-| params | `ParamsInit` | Yes | [src/settings/params/types.ts](src/settings/params/types.ts) | _who_ is being downloaded |
+| argument | type | type file | description |
+| -------- | ------------- | -------------------------------------------------------------- | ----------------------------- |
+| config | `ConfigInit` | [src/settings/config/types.ts](src/settings/config/types.ts) | Pages that are being downloaded & parsed |
+| options | `OptionsInit` | [src/settings/options/types.ts](src/settings/options/types.ts) | Knobs to tweak download behavior
+| params | `ParamsInit` | [src/settings/params/types.ts](src/settings/params/types.ts) | Inputs values and output file locations

 ### scraper

@@ -117,13 +114,14 @@ The `scrape` function returns a promise which yields these utilities (`on`, `emi

 Listen for events from the scraper

-| event | callback arguments | description |
-| ---------------------- | ------------------ | ------------------------------------------ |
-| `'done'` | | when the scraper has completed |
-| `'error'` | Error | if the scraper encounters an error |
-| `'<scraper>:progress'` | download id | emits progress of download until completed |
-| `'<scraper>:queued'` | download id | when a download is queued |
-| `'<scraper>:complete'` | download id | when a download is completed |
+| event | callback arguments | description |
+| ---------------------- | ------------------ | ------------------------------------------ |
+| `'initialized'` | | after start(), `initialized` means scraper has begun scraping |
+| `'done'` | | when the scraper has completed |
+| `'error'` | Error | if the scraper encounters an error |
+| `'<scraper>:progress'` | download id | emits progress of download until completed |
+| `'<scraper>:queued'` | download id | when a download is queued |
+| `'<scraper>:complete'` | download id | when a download is completed |

 #### emit
