https://github.com/python/cpython
Revision 122541beceeccce4ef8a9bf739c727ccdcbf2f28 authored by Raymond Hettinger on 13 May 2014, 04:56:33 UTC, committed by Raymond Hettinger on 13 May 2014, 04:56:33 UTC

Issue 21469: Mitigate risk of false positives with robotparser.
* Repair the broken link to norobots-rfc.txt.

* HTTP response codes >= 500 are now treated as a failed read rather than as a
not-found.  Not-found means that we can assume the entire site is allowed; a 5xx
server error tells us nothing (sketched after these notes).

* A successful read() or parse() updates the mtime (which is defined to be "the
  time the robots.txt file was last fetched").

* The can_fetch() method returns False unless we've had a read() with a 2xx or
4xx response.  This avoids false positives when a user calls can_fetch() before
calling read() (see the usage sketch after these notes).

* I don't see any easy way to test this patch without hitting internet
resources that might change, or without mock objects that wouldn't provide much
reassurance.
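
A minimal sketch of the response-code handling described above, not the actual
patch; the helper name fetch_robots_lines is an assumption for illustration:

    import urllib.error
    import urllib.request


    def fetch_robots_lines(url):
        """Return robots.txt lines, [] when the file does not exist (whole
        site allowed), or None for a failed read that tells us nothing."""
        try:
            response = urllib.request.urlopen(url)
        except urllib.error.HTTPError as err:
            if 400 <= err.code < 500:
                # Treated as "not found": assume the entire site is allowed.
                return []
            # 5xx (or anything else): a failed read; keep previous state.
            return None
        return response.read().decode("utf-8").splitlines()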
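And a usage sketch of the conservative behavior described in the last two
notes, with this change applied; the host and paths are hypothetical, and
rules are fed through parse() to avoid a live fetch:

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")   # hypothetical URL

    # No read() or parse() has succeeded yet, so can_fetch() answers False
    # instead of guessing that everything is allowed.
    print(rp.can_fetch("MyBot", "https://example.com/private/page"))   # False

    # A successful parse() (or a read() that got a 2xx response) records the
    # fetch time, which mtime() then reports.
    rp.parse([
        "User-agent: *",
        "Disallow: /private/",
    ])
    print(rp.mtime() > 0)                                              # True
    print(rp.can_fetch("MyBot", "https://example.com/private/page"))   # False
    print(rp.can_fetch("MyBot", "https://example.com/docs/index.html"))  # True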
1 parent 73308d6
.bzrignore
.purify
autom4te.cache
config.log
config.cache
config.status
config.status.lineno
db_home
Makefile
buildno
python
build
Makefile.pre
platform
pybuilddir.txt
pyconfig.h
libpython*.a
libpython*.so*
python.exe
python-gdb.py
reflog.txt
tags
TAGS
.gdb_history
Doc/tools/sphinx
Doc/tools/jinja
Doc/tools/jinja2
Doc/tools/pygments
Doc/tools/docutils
Misc/python.pc
Modules/Setup
Modules/Setup.config
Modules/Setup.local
Modules/config.c
Modules/ld_so_aix
Parser/pgen
Lib/test/data/*
Lib/lib2to3/Grammar*.pickle
Lib/lib2to3/PatternGrammar*.pickle
__pycache__
.coverage
coverage/*
htmlcov/*