https://github.com/python/cpython
Revision 122541beceeccce4ef8a9bf739c727ccdcbf2f28 authored by Raymond Hettinger on 13 May 2014, 04:56:33 UTC, committed by Raymond Hettinger on 13 May 2014, 04:56:33 UTC
* Repair the broken link to norobots-rfc.txt.

* HTTP response codes >= 500 are treated as a failed read rather than as a not
found.  Not found means that we can assume the entire site is allowed.  A 5xx
server error tells us nothing.

* A successful read() or parse() updates the mtime (which is defined to be "the
  time the robots.txt file was last fetched").

* The can_fetch() method returns False unless we've had a read() with a 2xx or
4xx response.  This avoids false positives in the case where a user calls
can_fetch() before calling read().

* I don't see any easy way to test this patch without hitting internet
resources that might change or without use of mock objects that wouldn't
provide much reassurance.
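
The can_fetch() and mtime behavior described above can be illustrated offline with a minimal sketch. It assumes the Python 3 module name urllib.robotparser (the commit message refers to robotparser), feeds rules to parse() directly instead of calling read(), and uses a made-up example.com site with hypothetical rules; it does not exercise the HTTP status handling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied directly so no network is needed.
RULES = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()

# Nothing has been read or parsed yet, so the answer is a conservative False.
before_read = rp.can_fetch("*", "http://example.com/index.html")

# A successful parse() records the fetch time, which mtime() then reports.
rp.parse(RULES)
fetched_at = rp.mtime()

# With rules loaded, can_fetch() reflects the Allow/Disallow lines.
allowed = rp.can_fetch("*", "http://example.com/index.html")
blocked = rp.can_fetch("*", "http://example.com/private/data.html")

print(before_read, fetched_at > 0, allowed, blocked)
```

Calling can_fetch() before any read() or parse() is the false-positive case this patch targets: the parser has no information yet, so it declines rather than assuming the URL is allowed.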
1 parent 73308d6
Issue 21469: Mitigate risk of false positives with robotparser.
File Mode Size
Doc
Grammar
Include
Lib
Mac
Misc
Modules
Objects
PC
PCbuild
Parser
Python
Tools
.bzrignore -rw-r--r-- 584 bytes
.gitignore -rw-r--r-- 960 bytes
.hgeol -rw-r--r-- 800 bytes
.hgignore -rw-r--r-- 1.2 KB
.hgtags -rw-r--r-- 6.5 KB
.hgtouch -rw-r--r-- 1.2 KB
LICENSE -rw-r--r-- 12.5 KB
Makefile.pre.in -rw-r--r-- 53.1 KB
README -rw-r--r-- 6.6 KB
config.guess -rwxr-xr-x 44.2 KB
config.sub -rwxr-xr-x 34.7 KB
configure -rwxr-xr-x 437.6 KB
configure.ac -rw-r--r-- 137.7 KB
install-sh -rwxr-xr-x 7.0 KB
pyconfig.h.in -rw-r--r-- 40.1 KB
setup.py -rw-r--r-- 94.8 KB
