Re: Wget ignores robot.txt entry

> Max,
> No, I don't think cURL does recursive retrieval. I don't think it does
> Web page dependency retrieval, either. Both of these are a big deal for
> me. How could a tool of wget's versatility be replaced by something
> inferior? Whatever happened to technological meritocracy? (Please, no
> laughing.)
> I was actually hoping to get some time to work on an extension to wget
> of my own. I wanted to add an option that would cause wget to look in
> one hierarchy to determine file existence and modification times
> relative to the set of files and mod times on the server and download
> new or newer files to a different location. That way I can easily
> maintain mirror copies on a CD-ROM. I'd tell wget to use the CD's
> contents as the file and mod-time reference and to download to a
> location on my hard drive (of course). Then I could incrementally
> update the ROM with whatever was downloaded.

That's a real good idea! :-)
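
For what it's worth, the comparison step could be sketched roughly like this,
outside of wget. This is only an illustration in Python, not anything wget does
today: the function name files_to_fetch and the remote_mtimes dictionary
(relative path -> server mod time) are made up for the example, and it assumes
you have already obtained the server-side listing some other way.

```python
import os

def files_to_fetch(reference_root, remote_mtimes):
    """Return relative paths that are new or newer on the server.

    reference_root: directory used only as the existence/mod-time
                    reference (e.g. the mounted CD-ROM); never written to.
    remote_mtimes:  hypothetical dict mapping a '/'-separated relative
                    path to the server's modification time (epoch seconds).
    """
    needed = []
    for rel_path, remote_mtime in sorted(remote_mtimes.items()):
        local_path = os.path.join(reference_root, *rel_path.split("/"))
        try:
            local_mtime = os.path.getmtime(local_path)
        except OSError:
            # Not present in the reference hierarchy: a new file.
            needed.append(rel_path)
            continue
        if remote_mtime > local_mtime:
            # Present, but the server copy is newer.
            needed.append(rel_path)
    return needed
```

Whatever comes back would then be downloaded to a separate directory on the
hard drive, leaving the reference hierarchy untouched.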

> Of course I can still do that and I may yet. Does that sound like a
> desirable feature to anyone? I don't know how many people share my
> mania for keeping local archives of content from the Internet.

I seem to end up doing this quite a lot when I'm on a hunt for new concepts and
ways of doing things: one huge web suck, most of which I never so much as
glance at, and I'm left with a whole archive of stuff where I can just grep
out the crap.

> What happens to an open source project when it devolves to this state?
> Who, for example, could hand out writable access to the wget CVS
> repository? Surely this isn't an unrecoverable state of affairs, is it?
> Randall Schulz

Wasn't a patch applied to CVS HEAD of the wget repository only a few weeks ago?
That's what it looks like, anyway.


Elfyn McBratney

