2008/08/28 6:30:04
www.manageability.org
I was recently quite pleased to learn that the Internet Archive's new crawler is written in Java. Coincidentally, in addition to putting together a list of open source full-text search engines, I had put together a list of crawlers written in Java to complement it. Here's the list:

Heritrix - Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix is designed to respect the robots.txt exclusion directives and META robots tags ...