Earl is looking for URLs in your area.

Cryptoc1/earl


earl

Looking for URLs in your area.


Earl is a suite of APIs for developing URL crawlers and web scrapers, driven by a middleware pattern similar to, and strongly influenced by, that of ASP.NET Core.
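To illustrate the ASP.NET Core style middleware pattern mentioned above, here is a minimal, self-contained sketch of how a chain of `(context, next)` middleware can be composed into a single delegate. It does not use Earl at all. `UrlDelegate`, `Middleware` and `BuildPipeline` are hypothetical names, and a plain `string` URL stands in for Earl's `CrawlUrlContext`. It shows the general technique only and does not describe Earl's internals.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical terminal/next step: receives the URL being crawled.
// NOT Earl's CrawlUrlDelegate; a plain string replaces CrawlUrlContext.
using UrlDelegate = System.Func<string, System.Threading.Tasks.Task>;

// Hypothetical middleware shape: (url, next) => Task, mirroring ASP.NET Core's (context, next).
using Middleware = System.Func<string, System.Func<string, System.Threading.Tasks.Task>, System.Threading.Tasks.Task>;

var log = new List<string>();

var middleware = new List<Middleware>
{
    // Registered first, so it becomes the outermost wrapper.
    async (url, next) =>
    {
        log.Add($"A before {url}");
        await next(url);
        log.Add($"A after {url}");
    },
    // Registered second, so it wraps only the terminal step.
    async (url, next) =>
    {
        log.Add($"B before {url}");
        await next(url);
        log.Add($"B after {url}");
    },
};

// Terminal step: where a real crawler would fetch and parse the page.
UrlDelegate pipeline = BuildPipeline(middleware, url =>
{
    log.Add($"fetch {url}");
    return Task.CompletedTask;
});

await pipeline("https://example.com/");
Console.WriteLine(string.Join(Environment.NewLine, log));

// Wraps the terminal step from the last middleware to the first, so the first
// registered entry runs outermost (the same ordering ASP.NET Core uses for app.Use).
static UrlDelegate BuildPipeline(IReadOnlyList<Middleware> steps, UrlDelegate terminal)
{
    UrlDelegate next = terminal;
    for (int i = steps.Count - 1; i >= 0; i--)
    {
        Middleware current = steps[i];
        UrlDelegate inner = next; // capture this iteration's "next" for the closure
        next = url => current(url, inner);
    }
    return next;
}
```

The program prints five lines in this order: A before, B before, fetch, B after, A after. This "onion" nesting lets each middleware run code both before and after the rest of the pipeline. That is what makes the pattern convenient for cross-cutting concerns such as logging or timing.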

Basic Usage

var services = new ServiceCollection()
    .AddEarlCrawler()
    .AddEarlJsonPersistence()
    .BuildServiceProvider();

var crawler = services.GetService<IEarlCrawler>();
var options = CrawlerOptionsBuilder.CreateDefault()
    .BatchSize( 50 )
    .MaxRequestCount( 500 )
    .On<CrawlUrlResultEvent>( 
        ( CrawlUrlResultEvent e, CancellationToken cancellation ) =>
        {
            Console.WriteLine( $"Crawled {e.Result.Url}" );
            return default;
        }
    )
    .Timeout( TimeSpan.FromMinutes( 30 ) )
    .Use(
        ( CrawlUrlContext context, CrawlUrlDelegate next ) =>
        {
            Console.WriteLine( $"Executing delegate middleware while crawling {context.Url}" );
            return next( context );
        }
    )
    .PersistTo( persist => persist.ToJson( json => json.Destination(...) ) )
    .Build();

await crawler.CrawlAsync( new Uri(...), options );

Documentation

Documentation can be found in the READMEs of the sub-directories that represent Earl's conceptual components.

All public APIs should have thorough XML documentation (triple-slash) comments.

Something missing, or still have questions? Please open an issue or submit a PR!
