https://github.com/python/cpython
Revision a5413c499702a74fdc50e4bc8e7e6a480856a1f9 authored by Raymond Hettinger on 13 May 2014, 05:18:50 UTC, committed by Raymond Hettinger on 13 May 2014, 05:18:50 UTC
* Repair the broken link to norobots-rfc.txt.

* HTTP response codes >= 500 are treated as a failed read rather than as a
not found.  Not found means that we can assume the entire site is allowed.
A 5xx server error tells us nothing.

* A successful read() or parse() updates the mtime (which is defined to be "the
  time the robots.txt file was last fetched").

* The can_fetch() method returns False unless we've had a read() with a 2xx or
4xx response.  This avoids false positives in the case where a user calls
can_fetch() before calling read().

* I don't see any easy way to test this patch without hitting internet
resources that might change or without use of mock objects that wouldn't
provide much reassurance.
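
The status-code policy described in the notes above can be sketched roughly as follows. This is an illustration only, not the code from the patch: the class `RobotsStateSketch` and its `record_response()` method are hypothetical names, the special handling of 401/403 is an assumption carried over from how robotparser commonly treats those codes, and rule matching for a 2xx body is left out entirely.

```python
import time


class RobotsStateSketch:
    """Hypothetical stand-in for the parser state described in the notes."""

    def __init__(self):
        self.allow_all = False
        self.disallow_all = False
        self.last_checked = 0  # 0 means robots.txt was never fetched

    def mtime(self):
        # Time the robots.txt file was last fetched successfully.
        return self.last_checked

    def record_response(self, status):
        """Update state from the HTTP status of a robots.txt fetch."""
        if status in (401, 403):
            # Assumption: access-denied responses mean nothing may be fetched.
            self.disallow_all = True
        elif 400 <= status < 500:
            # Not found: assume the entire site is allowed.
            self.allow_all = True
        elif 200 <= status < 300:
            # A real parser would parse the rules here (omitted).
            pass
        else:
            # 5xx (or anything else): a failed read tells us nothing,
            # so the state and the mtime stay unchanged.
            return
        self.last_checked = time.time()

    def can_fetch(self):
        if self.disallow_all:
            return False
        if self.allow_all:
            return True
        if not self.last_checked:
            # No usable read yet, so refuse rather than risk a false positive.
            return False
        return True  # placeholder for per-URL rule matching


state = RobotsStateSketch()
print(state.can_fetch())    # no read yet
state.record_response(503)
print(state.can_fetch())    # failed read, still unknown
state.record_response(404)
print(state.can_fetch())    # not found, site assumed open
```

The point of the sketch is the ordering in `can_fetch()`: the answer defaults to False until some 2xx or 4xx response has been recorded, so a caller that forgets to read first never gets a misleading True.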
1 parent c594596
Issue 21469: Mitigate risk of false positives with robotparser.
File Mode Size
Demo
Doc
Grammar
Include
Lib
Mac
Misc
Modules
Objects
PC
PCbuild
Parser
Python
RISCOS
Tools
.bzrignore -rw-r--r-- 554 bytes
.gitignore -rw-r--r-- 583 bytes
.hgeol -rw-r--r-- 692 bytes
.hgignore -rw-r--r-- 886 bytes
.hgtags -rw-r--r-- 7.8 KB
LICENSE -rw-r--r-- 12.5 KB
Makefile.pre.in -rw-r--r-- 43.2 KB
README -rw-r--r-- 52.7 KB
config.guess -rwxr-xr-x 44.2 KB
config.sub -rwxr-xr-x 34.7 KB
configure -rwxr-xr-x 417.2 KB
configure.ac -rw-r--r-- 130.8 KB
install-sh -rwxr-xr-x 7.0 KB
pyconfig.h.in -rw-r--r-- 34.1 KB
setup.py -rw-r--r-- 95.2 KB
