Commit

v0.5.272-alpha
RafaelEstevamReis committed Aug 22, 2020
1 parent 2bd7336 commit da07df3
Showing 4 changed files with 75 additions and 9 deletions.
66 changes: 66 additions & 0 deletions NugetPage.md
@@ -0,0 +1,66 @@
# SimpleSpider

A simple and modular web spider written in C# .Net Core

![.NET Core](https://github.com/RafaelEstevamReis/SimpleSpider/workflows/.NET%20Core/badge.svg)
[![NuGet](https://buildstats.info/nuget/Net.RafaelEstevam.Spider.Simple.Lib)](http://nuget.org/packages/Net.RafaelEstevam.Spider.Simple.Lib)

# Content
- [SimpleSpider](#simplespider)
- [Content](#content)
  - [Some advantages](#some-advantages)
  - [Getting started](#getting-started)
  - [Samples](#samples)

## Some advantages

* Very simple to use and operate, ideal for lots of small or personal projects
* Easy HTML filtering with [HObject](https://github.com/RafaelEstevamReis/SimpleSpider/blob/master/Simple.Test/Sample/QuotesToScrape_HObject.cs) (an HtmlNode wrapper used much like JObject)
* Internal conversion from HTML to XElement, no external tools needed
* Automatic JSON parsing to JObject
* Automatic JSON deserialization to `<T>`
* Modular parser engine (you can add your own parsers!)
  * JSON and XML already included
* Modular caching engine (you can add your own!)
  * Standalone cache engine included, no external software needed
* Modular downloader engine (you can add your own!)
  * WebClient (with cookies) or HttpClient download engine included

Easy **import with [NuGet](https://www.nuget.org/packages/Net.RafaelEstevam.Spider.Simple.Lib)**

## Getting started

1. Start a new console project and add the NuGet reference
2. `PM> Install-Package Net.RafaelEstevam.Spider.Simple.Lib`
3. Create a class for your spider (or leave it in `Program`)
4. Create a new instance of `SimpleSpider`
   1. Give it a name; the cache and log will be saved under that name
   2. Give it a domain (your spider will not stray from it)
5. Add a handler to the `FetchCompleted` event
6. Optionally give it a first page with `AddPage`. If omitted, it will use the home page of the domain
7. Call `Execute()`

``` C#
void run()
{
    var spider = new SimpleSpider("QuotesToScrape", new Uri("http://quotes.toscrape.com/"));
    // Set the completed event to implement your stuff
    spider.FetchCompleted += fetchCompleted_items;
    // execute
    spider.Execute();
}
void fetchCompleted_items(object Sender, FetchCompleteEventArgs args)
{
    // walk around ...
    // TIP: inspect args to see stuff
    var hObj = args.GetHObject();
    string[] quotes = hObj["span > .text"];
}
```

> TIP: Use the [Simple.Tests](https://github.com/RafaelEstevamReis/SimpleSpider/tree/master/Simple.Test/Sample) project to see examples and poke around
## Samples

See all samples at [Simple.Tests](https://github.com/RafaelEstevamReis/SimpleSpider/tree/master/Simple.Test/Sample)
3 changes: 1 addition & 2 deletions README.md
@@ -63,9 +63,8 @@ void fetchCompleted_items(object Sender, FetchCompleteEventArgs args)
    // walk around ...
    // TIP: inspect args to see stuff
    // Two good starts: XElement and HObject
    var XElement = args.GetXElement();
    var hObj = args.GetHObject();
    string[] quotes = hObj["span > .text"];
}
```

12 changes: 6 additions & 6 deletions Simple.Lib/Simple.Lib.csproj
@@ -12,16 +12,16 @@
 <PackageProjectUrl>https://github.com/RafaelEstevamReis/SimpleSpider</PackageProjectUrl>
 <RepositoryUrl>https://github.com/RafaelEstevamReis/SimpleSpider</RepositoryUrl>
 <RepositoryType>git</RepositoryType>
-<PackageTags>spider</PackageTags>
+<PackageTags>spider web crawling</PackageTags>
 <GeneratePackageOnBuild>false</GeneratePackageOnBuild>
-<Version>0.5.266</Version>
+<Version>0.5.272</Version>
 <PackageRequireLicenseAcceptance>true</PackageRequireLicenseAcceptance>
-<Description>A simple to use web spider with an example rich GitHub repository
+<Description>A simple to use spider for web crawling with an example rich GitHub repository
 (work in progress)</Description>
 <PackageReleaseNotes>Work in progress. See examples and documentation on GitHub page
-Better redirect handling, added HObject support
-Commit 29d3a44</PackageReleaseNotes>
-<AssemblyVersion>0.5.266.0</AssemblyVersion>
+Improved performance
+Commit 2bd7336</PackageReleaseNotes>
+<AssemblyVersion>0.5.272.0</AssemblyVersion>
 </PropertyGroup>

<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|AnyCPU'">
3 changes: 2 additions & 1 deletion SimpleSpider.sln
@@ -10,10 +10,11 @@ EndProject
 Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Text", "Text", "{78F66FBC-1B52-4851-89E3-798D6E6AE310}"
 ProjectSection(SolutionItems) = preProject
 LICENSE = LICENSE
+NugetPage.md = NugetPage.md
 README.md = README.md
 EndProjectSection
 EndProject
-Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Simple.UnitTests", "Simple.UnitTests\Simple.UnitTests.csproj", "{A7EB8974-B50F-499A-BAAA-53C2CB33E08C}"
+Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Simple.UnitTests", "Simple.UnitTests\Simple.UnitTests.csproj", "{A7EB8974-B50F-499A-BAAA-53C2CB33E08C}"
 EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution