I fight a daily battle against distractions. I know I’m not the only one, either, because the web sites that distract me are full of articles about how I can be less distracted. (I don’t want to change too much about how my mind works—the same part of my brain that gets distracted also is amazingly creative in problem solving.)
I’ve used some tools that help some of the time: pomodoro timers, (10 + 2) * 5 timers, time trackers, a custom hosts file, closing the browser, and upbeat music are just the beginning. In this article, I want to talk about how I handle the tool of productivity and distraction called the web.
/etc/hosts
The cheapest and easiest way to selectively block websites is to prevent your computer from resolving their names. By making the sites resolve to localhost before hitting a DNS server, my browser can’t show them. I did not come up with the idea, but it is a great one. As you can see from this segment of my /etc/hosts file, I have blocked several news sites.
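A hosts-file entry is just an address followed by the names that should resolve to it. As a minimal Python 3 sketch of that format (the domains are placeholders rather than my real entries, and hosts_entries and render_block are names made up for this illustration):

```python
# Placeholder domains for illustration only.
BLOCKED_HOSTS = ["news.example.com", "www.news.example.com"]


def hosts_entries(hostnames, address="127.0.0.1"):
    """Return hosts-file lines that point each hostname at `address`."""
    return ["{0}\t{1}".format(address, name) for name in hostnames]


def render_block(hostnames):
    """Join the entries into text that could be appended to /etc/hosts."""
    return "\n".join(hosts_entries(hostnames)) + "\n"
```

The sketch only builds text; actually appending it to /etc/hosts still requires root privileges.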
When I want or need to view a blocked site, I have the distinct annoyance of needing to run sudo vim /etc/hosts to comment out the line. This deters me from cheating, but it is also a pain when a work-related search points to a blocked site.
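The comment/uncomment step is mechanical enough to script. Here is a rough Python 3 sketch, not something I actually use; set_blocked is a hypothetical helper, and it assumes blocked entries all point at 127.0.0.1:

```python
def set_blocked(hosts_text, hostname, blocked, address="127.0.0.1"):
    """Restore (blocked=True) or comment out (blocked=False) a hostname's entries."""
    result = []
    for line in hosts_text.splitlines():
        # Drop any leading '#' so commented and active entries parse alike.
        body = line.lstrip().lstrip("#").strip()
        fields = body.split()
        # Only touch lines shaped like "<address> <name> [<name> ...]".
        is_entry = len(fields) >= 2 and fields[0] == address
        if is_entry and hostname in fields[1:]:
            line = body if blocked else "#" + body
        result.append(line)
    return "\n".join(result) + "\n"
```

The function only transforms text, so reading and writing /etc/hosts (with sudo) is left to the caller. Making this too convenient would also remove the annoyance that keeps me from cheating.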
An Idea
Recently I was pondering how inefficiently I track my time. I know that I could use any one of the many time-tracking tools available and that lots of them are really good. It occurred to me that it would be handy to have a record of my web browsing. (I could just file a FOIA request with the NSA, but that would be slow.)
Then I realized that I could have a record of my browsing. I just needed a local proxy server that would record everything. (I normally use about four different browsers at a time, so just reading the browser cache is not a useful solution.)
Local HTTP Proxy
Implementing an HTTP proxy is not a difficult task. Implementing a robust HTTP proxy that performs well is not trivial. Thankfully, Twisted has already done the hard parts for me. Here is a simple HTTP proxy server that listens on port 8080:
import twisted.internet.reactor
import twisted.web.http
import twisted.web.proxy

def start_proxy():
    proxyFactory = twisted.web.http.HTTPFactory()
    proxyFactory.protocol = twisted.web.proxy.Proxy
    twisted.internet.reactor.listenTCP(8080, proxyFactory)

def main():
    start_proxy()
    twisted.internet.reactor.run()

if '__main__' == __name__:
    main()
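To try the proxy without reconfiguring a browser, a client can be pointed at it directly. The following is a minimal sketch using only the Python 3 standard library; it assumes the proxy is running locally on port 8080, and make_proxy_opener and fetch_status are illustrative names. Only plain http:// URLs are routed through the proxy here.

```python
import urllib.request

# Assumes the proxy is listening locally on port 8080.
PROXY_URL = "http://127.0.0.1:8080"


def make_proxy_opener(proxy_url=PROXY_URL):
    """Build an opener that sends plain-HTTP requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(url, opener=None):
    """Fetch `url` via the proxy and return the HTTP status code."""
    opener = opener or make_proxy_opener()
    with opener.open(url, timeout=10) as response:
        return response.status
```

With the proxy running, calling fetch_status("http://example.com/") should send the request through port 8080. A browser needs the equivalent setting in its own proxy configuration.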
To make it more useful, I want it to log the URLs I visit, along with timestamps. After digging in Twisted’s twisted.web.proxy module, I’ve determined that the twisted.web.proxy.ProxyRequest class is the best location for logging the information I want to capture. Below I’ve replaced it with CustomProxyRequest, which adds logging in the process method:
import logging
import sys

import twisted.internet.reactor
import twisted.web.http
import twisted.web.proxy

class CustomProxyRequest(twisted.web.proxy.ProxyRequest):
    def __init__(self, channel, queued, reactor=twisted.internet.reactor):
        # twisted.web.http.Request--the ultimate base class--is an old-style
        # class, so we can't use super() here.
        twisted.web.proxy.ProxyRequest.__init__(self, channel, queued, reactor)

    def process(self):
        # Log the method and URI, tab-separated, before proxying the request.
        log = logging.getLogger('CustomProxyRequest')
        m = '{method}\t{uri}'.format(method=self.method, uri=self.uri)
        log.info(m)
        twisted.web.proxy.ProxyRequest.process(self)

def configure_logging():
    # Prefix each message with its creation time (seconds since the epoch).
    formatter = logging.Formatter('%(created)f\t%(message)s')
    stdhandler = logging.StreamHandler(sys.stdout)
    stdhandler.setFormatter(formatter)
    log = logging.getLogger('CustomProxyRequest')
    log.addHandler(stdhandler)
    log.setLevel(logging.INFO)

def start_proxy():
    proxyFactory = twisted.web.http.HTTPFactory()
    proxyFactory.protocol = twisted.web.proxy.Proxy
    # Have the proxy build our logging request class for each request.
    proxyFactory.protocol.requestFactory = CustomProxyRequest
    twisted.internet.reactor.listenTCP(8080, proxyFactory)

def main():
    configure_logging()
    start_proxy()
    twisted.internet.reactor.run()

if '__main__' == __name__:
    main()
At this point, CustomProxyRequest is logging the request method and URI to standard output. (To log the response, more work is required.) This is a good first step in helping me understand my browsing habits.
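Once the output is redirected to a file, the log can be summarized. This is a rough Python 3 sketch; it assumes each line is tab-separated as timestamp, method, URI, with the method and URI logged as plain text. parse_line and requests_per_host are names invented for this example.

```python
import collections
import urllib.parse


def parse_line(line):
    """Split one 'timestamp<TAB>method<TAB>uri' log line into typed fields."""
    created, method, uri = line.rstrip("\n").split("\t", 2)
    return float(created), method, uri


def requests_per_host(lines):
    """Count logged requests per hostname."""
    counts = collections.Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, _, uri = parse_line(line)
        # Proxy requests carry absolute URIs, so the hostname is recoverable.
        host = urllib.parse.urlsplit(uri).hostname or uri
        counts[host] += 1
    return counts
```

Calling requests_per_host(open(path)).most_common(10) on a saved log would list the ten most-requested hosts. Turning request counts into time spent would need more thought.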
In the future, I hope to modify the proxy so it actively helps me focus, possibly by injecting pages that remind me that I should be working instead of browsing. I’m not sure how best to accomplish this: I may need to study machine learning, or a simple set of rules may do the trick. Suggestions are welcome.
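To make the rule-based option concrete, here is one possible shape for such a rule in Python 3. It is only a sketch and not part of the proxy: the host list, the working hours, and the should_remind name are all assumptions made up for illustration.

```python
import datetime
import urllib.parse

# Hypothetical rule set: hosts to discourage during working hours.
DISTRACTING_HOSTS = {"news.example.com"}
WORK_START = datetime.time(9, 0)
WORK_END = datetime.time(17, 0)


def should_remind(uri, now, hosts=DISTRACTING_HOSTS):
    """Return True if `uri` targets a listed host (or subdomain) during work hours."""
    host = urllib.parse.urlsplit(uri).hostname or ""
    listed = any(host == h or host.endswith("." + h) for h in hosts)
    # Monday is weekday 0, so values below 5 are working days.
    working = now.weekday() < 5 and WORK_START <= now.time() < WORK_END
    return listed and working
```

A check like this could run inside the process method before the request is forwarded. What to serve in place of the real page is still an open question.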