
Master of your domain, King of your castle

Our computers do hundreds of tasks without asking us. They install updates, index the hard drive, scan for viruses, verify the integrity of programs, and run countless other maintenance, monitoring, security, and seemingly random processes. Most of the time we’re okay with this. Sure, it can be a pain in the butt when we want to shut down our laptop at a coffee shop and are forced to wait for Windows updates to finish.* But most of this happens without our noticing, and we just accept it as part of technology.

But what about when our computers prevent us from doing things we want them to do, when they disobey our commands?

I recently finished reading Cory Doctorow’s latest book of essays, Information Doesn’t Want to Be Free. In it he covers some of the history of companies using our computers against us, and where that might be heading in the name of security and copyright protection.

Doctorow is an outspoken advocate of DRM-free media and a frequent critic of the DMCA (the Digital Millennium Copyright Act, probably very familiar to those of us who discovered BitTorrent in the early ’00s). Many of us have heard of the Sony rootkit (Sony’s attempt to keep its CDs from being ripped, which opened up a security vulnerability that virus writers ultimately exploited) and Amazon’s remote yanking of 1984 from people’s Kindles, but to most of us these seem like isolated incidents, or something that doesn’t affect law-abiding citizens.

Both cases are examples of a company making a device work against what you want to do with it. When we still trafficked in CDs, you would rip a copy so you could listen to it on an MP3 player, or play a copy in your car so you wouldn’t damage the original. And when we read a book on the Kindle, we expect it to stay there, especially if we paid money for it. I don’t want my Kindle trying to figure out whether all the books I’ve loaded on it are legitimate, because I don’t trust programmers to always get that right. In my case, for instance, I have eBook versions of my own unpublished draft books, plus books purchased from the Humble Bundle and Story Bundle. What if one day my Kindle refused to load these books and only let me load things from the Amazon store? I agreed to a big long EULA with Amazon to use the Kindle, and I didn’t read it, and neither did you. But I still expect the device to do what I bought it for.

Admittedly, I’m more of a tinkerer with computers than most people today, but “hacking” a computer to get it to do new and more creative things has been part of owning computers since their inception. Let’s take a more morally gray area, web scraping, and pick apart its legitimate and illegitimate uses. What is web scraping? Simply put, it’s a program designed to read all the pages of a website or series of websites and download specific content. Applications include downloading all the strips of a web-comic, pulling Bible studies from Bible Gateway and turning them into an eBook, or pulling down a directory of pictures and building an application that decides which of a random pair is hotter (à la The Social Network).
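To give a feel for how little is involved, here is a minimal sketch of the heart of a scraper, using only Python’s standard library. It assumes the site serves plain HTML; the names `ImageLinkParser`, `parse_page`, and `fetch`, and the User-Agent string, are made up for illustration. It pulls the image URLs and link URLs out of a single page, which is the step you would repeat, following the “next” link each time, to walk through every strip of a web-comic.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class ImageLinkParser(HTMLParser):
    """Hypothetical helper: collects every <img src> and <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to turn relative URLs into absolute ones
        self.images = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))


def parse_page(html, base_url):
    """Return (image_urls, link_urls) found in an HTML string."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.images, parser.links


def fetch(url):
    """Download one page as text (the User-Agent value is an arbitrary example)."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Typical use would be `images, links = parse_page(fetch(url), url)`. Keeping the parsing separate from the downloading means you can test the parser on a saved page without hitting the site at all.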

Scripts like these are fairly easy to write; in fact, here’s a whole online chapter of a book about how. Some websites and web-comics hate web-scrapers (GPF comics being just one example). Requests from web-scrapers can be bounced back with fake webpages or even threats of banning (since this kind of traffic circumvents ad revenue, though then again so does AdBlock Plus). Websites that don’t like web-scraping want you to load their site one page at a time, see their ads, and come back to the site every time you want to reread those comics. And that makes some sense; they own their content, right? The book I linked to is released under a Creative Commons license for online reading, but with a little digging in the page source, you could probably scrape the book into an eBook using the knowledge gained from that online reading.

What distinguishes web-scraping traffic from legitimate browsing is the speed and type of the requests. Requests can be massaged to look like they’re coming from a real person, and timing can be adjusted, but ultimately there are still ways to tell. But what if your computer decided to limit the number of outgoing web requests to something more like normal usage? What if your hard drive stopped letting you save images pulled down this way? You wrote a program to do something, and your computer doesn’t want you to do it. Maybe that would help in genuine cases of copyright infringement, but what about study applications, or the experiments that are part of developing code?
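On the timing point: the usual way a scraper spaces out its requests is a throttle that enforces a minimum gap between them, and that is also roughly what a machine-imposed cap on “normal usage” would look like. The sketch below assumes a fixed interval; the `Throttle` class and its parameters are hypothetical, and the clock and sleep functions can be swapped out so the logic can be checked without actually waiting.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum gap between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds that must separate two calls to wait()
        self.clock = clock                # injectable so tests can use a fake clock
        self.sleep = sleep
        self._last = None                 # time the previous request was allowed through

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


# Usage sketch: call wait() before each request.
# throttle = Throttle(min_interval=2.0)
# for url in urls:
#     throttle.wait()
#     page = download(url)  # hypothetical: whatever code actually fetches the page
```

A fixed interval is the simplest choice; adding random jitter to the gap would make the traffic look less mechanical, which is exactly the kind of “massaging” that makes scrapers hard to tell apart from people.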

I don’t know if I’ve made web-scraping sound shady or really cool, and I can come up with applications that would sway you either way. The truth is, some applications are beneficial, some are close to stealing, and some are creepy. But the act itself is neutral and should be allowed. It hasn’t been specifically targeted yet, but it could be. Websites already have some ability to recognize that kind of traffic, and it would actually be even easier to monitor at the source, on the user’s own machine.

Do you value being able to tinker with your machine, to know what it is doing, and to make your own decisions about what it does rather than having your machines decide for you? Then maybe take a look at Doctorow’s book, look at your task manager and running services, and learn some Python. Or just share that value far and wide with anyone who’ll listen.

* When they say don’t turn off your computer, they mean it. My dad was using his Windows 7 Starter netbook at a Panera once when a Windows update started. During peak hours (11:30 am to 1:30 pm), Panera limits Wi-Fi use to 30 minutes. Apparently the update had only half downloaded when Dad’s connection was cut off, and he was forced to hard shut down the computer. It took a long time to get the system fixed, and there are still hiccups that were probably caused by this.


Filed under Trube On Tech