This is the mail archive of the mailing list for the Cygwin project.
Re: Wget ignores robot.txt entry
> No, I don't think cURL does recursive retrieval. I don't think it does
> Web page dependency retrieval, either. Both of these are a big deal for
> me. How could a tool of wget's versatility be replaced by something
> inferior? Whatever happened to technological meritocracy? (Please, no
> I was actually hoping to get some time to work on an extension to wget
> of my own. I wanted to add an option that would cause wget to look in
> one hierarchy to determine file existence and modification times
> relative to the set of files and mod times on the server and download
> new or newer files to a different location. That way I can easily
> maintain mirror copies on a CD-ROM. I'd tell wget to use the CD's
> contents as the file and mod-time reference and to download to a
> location on my hard drive (of course). Then I could incrementally
> update the ROM with whatever was downloaded.
That's a really good idea! :-)
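The scheme quoted above could be sketched roughly like this. This is only a minimal illustration of the comparison step, not wget code: each tree is modeled as a mapping of relative path to modification time (built, say, from the CD-ROM contents and from server responses), and the tree-scanning and actual HTTP downloads are omitted. All names here (files_to_fetch, reference, remote) are made up for the sketch.

```python
def files_to_fetch(reference, remote):
    """Return the remote paths that are new, or newer than the copy
    in the reference tree, and so should be downloaded into the
    separate staging directory on the hard drive."""
    wanted = []
    for path, remote_mtime in sorted(remote.items()):
        ref_mtime = reference.get(path)
        # Fetch if the reference tree (the CD) lacks the file, or if
        # the server's copy is more recent than the reference copy.
        if ref_mtime is None or remote_mtime > ref_mtime:
            wanted.append(path)
    return wanted

# Example: the CD holds index.html (mtime 100) and a/b.html (mtime 200);
# the server reports index.html updated (mtime 150) plus a new file c.html.
reference = {"index.html": 100, "a/b.html": 200}
remote = {"index.html": 150, "a/b.html": 200, "c.html": 90}
print(files_to_fetch(reference, remote))  # ['c.html', 'index.html']
```

Whatever the downloader then fetches into the staging directory is exactly the increment to burn onto the next revision of the ROM.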
> Of course I can still do that and I may yet. Does that sound like a
> desirable feature to anyone? I don't know how many people share my
> mania for keeping local archives of content from the Internet.
I seem to end up doing this quite a lot when I'm on a hunt for new concepts and
ways of doing things: a huge web suck, most of which I never glance at, but it
leaves me with a whole archive of material I can just grep through.
> What happens to an open source project when it devolves to this state?
> Who, for example, could hand out writable access to the wget CVS
> repository? Surely this isn't an unrecoverable state of affairs, is it?
> Randall Schulz
Wasn't a patch applied to the CVS HEAD of the wget repository only a few weeks ago?
That's what it looks like, anyway.
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html