Scraper for LinkedIn full profile data.
Unlike other scrapers, it works in 2019 with LinkedIn's new website.
Install via the npm package manager:

```
npm i scrapedin
```
We need to update scrapedin at every LinkedIn change, so please check that you have the latest version.

- Latest release: v1.0.8 (16 Jul 2019)
```js
const scrapedin = require('scrapedin')

const profileScraper = await scrapedin({ email: 'user@example.com', password: 'pass' })
const profile = await profileScraper('https://www.linkedin.com/in/some-profile/')
```
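Note that `await` is only valid inside an `async` function in CommonJS modules, so a complete script looks roughly like this minimal sketch (the credentials and profile URL are placeholders):

```js
const scrapedin = require('scrapedin')

// Minimal sketch: log in once, then scrape a single profile.
// Replace the credentials and URL with your own values.
async function main () {
  const profileScraper = await scrapedin({
    email: 'user@example.com',
    password: 'pass'
  })

  const profile = await profileScraper('https://www.linkedin.com/in/some-profile/')
  console.log(profile.profile.name, '-', profile.profile.headline)
}

main().catch(console.error)
```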
`scrapedin(options)`
- `options` *Object*:
  - `email`: LinkedIn login e-mail (required)
  - `password`: LinkedIn login password (required)
  - `isHeadless`: run the browser in headless mode, without a visible window (default `false`)
  - `hasToLog`: print logs on stdout (default `false`)
  - `puppeteerArgs`: puppeteer launch options *Object*. It is very useful: you can also pass Chromium parameters in its `args` property, for example `{ args: ['--no-sandbox'] }` (default `undefined`)
- returns: *Promise* of the `profileScraper` function
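As a sketch of the options above (to be run inside an `async` function; the credentials are placeholders, and whether you need `--no-sandbox` depends on your environment):

```js
const scrapedin = require('scrapedin')

// Options sketch: enable logging and pass Chromium flags via puppeteerArgs.
const options = {
  email: 'user@example.com',                 // placeholder credentials
  password: 'pass',
  isHeadless: true,                          // run without a visible browser window
  hasToLog: true,                            // print scrapedin logs on stdout
  puppeteerArgs: { args: ['--no-sandbox'] }  // extra Chromium flags, if your environment needs them
}

const profileScraper = await scrapedin(options)
```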
`profileScraper(url, waitTimeMs = 500)`
- `url` *string*: a LinkedIn profile URL
- `waitTimeMs` *integer*: milliseconds to wait for the page to load before scraping
- returns: *Promise* of a `profile` *Object*
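For example, on a slow connection you might raise the wait from the default 500 ms (inside an `async` function; the URL is a placeholder):

```js
// Wait 3 seconds for the page to load before scraping.
const profile = await profileScraper('https://www.linkedin.com/in/some-profile/', 3000)
```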
`profile` *Object*:

```
{
  profile: { name, headline, location, summary, connections, followers },
  positions: [ { title, company, description, date1, date2, roles: [ { title, description, date1, date2 } ] } ],
  educations: [ { title, degree, date1, date2 } ],
  skills: [ { title, count } ],
  recommendations: [ { user, text } ],
  recommendationsCount: { received, given },
  recommendationsReceived: [ { user, text } ],
  recommendationsGiven: [ { user, text } ],
  accomplishments: [ { count, title, items } ],
  volunteerExperience: { title, experience, location, description, date1, date2 },
  peopleAlsoViewed: [ { user, text } ]
}
```
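Once the promise resolves, the documented fields can be read directly, for example:

```js
// Print a few of the documented fields from a scraped profile.
console.log(profile.profile.name, '-', profile.profile.headline)
profile.positions.forEach(p => console.log(`${p.title} at ${p.company}`))
profile.skills.forEach(s => console.log(`${s.title}: ${s.count}`))
```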
We have also built a crawler on top of scrapedin to automatically collect multiple profiles, so check it out: scrapedin-linkedin-crawler
Usually on the first run LinkedIn asks for a manual check. To solve it you should:

- set `isHeadless` to `false` in scrapedin so you can complete the manual check in the browser.
- set `waitTimeMs` to a large number (such as `10000`) so you have time to complete the manual check.

A first-run configuration is sketched below. After completing the manual check once, you can restore your previous `isHeadless` and `waitTimeMs` values and start scraping.

We still don't have a solution for this on remote servers without a GUI; if you have any ideas, please tell us!
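A minimal first-run sketch, to be run inside an `async` function (credentials and URL are placeholders):

```js
const scrapedin = require('scrapedin')

// First run: display the browser and wait long enough to solve the manual check.
const profileScraper = await scrapedin({
  email: 'user@example.com',  // placeholder credentials
  password: 'pass',
  isHeadless: false           // show the browser window for the manual check
})

const profile = await profileScraper(
  'https://www.linkedin.com/in/some-profile/',
  10000  // wait 10 seconds, leaving time to complete the check
)
```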
Feel free to contribute. Just open an issue to discuss something before creating a PR.