whaley

Drawing a Blank

On Schneier’s Surveillance State

I’ll say this up front: I absolutely love reading Bruce Schneier and thoroughly respect his opinion on damn near everything that has to do with electronic security and privacy. This post is basically a brain dump of the thoughts and rebuttals that immediately sped through my mind as I read his latest piece on CNN.

Let’s start with an anecdote about Google, since Google seems to be the poster child for an external entity mining everything it can find online about us.

Just now I typed the search query “set add”, a rather ambiguous query without context, into a browser that is currently logged in to my Google account. The first ten results have to do with adding an element to a set in various programming languages. The last thing to appear is a section named “News for set add”, where the first visible entry is an article about a free agent signing in the NFL. If you know me at all, then you know that pretty much every hit visible on the first page seems to be tailored to me - I’m a software developer who loves (and played collegiate) American football. Google has profiled me that well.

Just for comparison’s sake, let’s do the same search on Bing - a search service that I’ve used probably fewer than 5 times in the last 2 years. The first two hits are for Java documentation. After that we’ve got stuff about Drum Add-Ons Sets, Swing Sets, Setting up and Adding Email, and something called SETAD. Bing obviously does not have me profiled that well.

I can’t tell you how much time I save on a day-to-day basis thanks to Google’s ability to give me what I consider to be the best search results. The fact that I very rarely have to context switch from solving “$X problem” to “how do I get this search engine to find what the hell I’m looking for” is truly invaluable.

Such convenience and overall utility is what Bruce fails to consider in his CNN piece. Bruce concludes his article by saying “Welcome to an Internet without privacy, and we’ve ended up here with hardly a fight”. I’d like to opine that we’ve ended up here on purpose and, as evidenced above, to our benefit. We’ve ended up here without anything to fight about… yet.

The fact that Google has me profiled is basically what lets its service tailor results for me. The results are so damn tailored to me that I haven’t even considered switching to something else, even on the days when I feel like maybe I ought to be wearing a tin foil hat after all. In general, my anecdote mirrors how I perceive most people view their privacy on the internet. The general opinion is “yes, I might be accessing this from my device, but this is the public square and I want everything in the public square to deliver the best personal experience to me… who cares if they know that I like Java and football!” Google implicitly learns about me as I trade the illusion of my privacy for better search results. I find this inherently good, truth be told.

Conversely, Bruce seems to be coming from the position that the “powerful are spying on the powerless” and that this is inherently bad. I find that an interesting position to take when two of the first three examples Bruce uses to open his article are about crackers who definitely were not powerless, and the third involves a woman who was bedding the top intelligence official of the United States of America. These don’t exactly sound like people without a certain amount of power in the world we are describing. All three examples resulted in the discovery of the identities of people engaging in what could at least be considered ethically dubious behavior.

What about the real powerless in this context - those being consumers who are locked into having their activity monitored and mined just to participate in modern society? In my admittedly limited experience with the world after 1998 (when I first got internet access), I’ve observed that typically the only time someone actually wishes to hide their identity online or minimize their footprint is when they are actively engaging in what could also be considered ethically dubious behavior. All of the talk of wanting to maintain privacy because of fears of overly powerful governments and/or businesses monitoring and profiling is just lip service by folks who are trying not to pay for something that is of value to them. Note that I use the phrase “could be considered ethically dubious”. Without delving into the extremely nuanced topic of whether pirating and cracking are unethical, or even remotely wrong (I refer you to “Free” by Chris Anderson for some thoughts on pirating, at least), let’s just say that determining what is ethical here is very interesting and depends on your perspective. Generally speaking, if you aren’t doing anything that could possibly harm someone else (emotionally, monetarily, reputationally, etc.), then there isn’t anything to worry about.

Or is there?

We have nothing to fight about… yet. Yes, the potential for abuse in collecting, searching, and profiling all of this data about us is extremely high. But we certainly aren’t going back now, and I don’t think anyone who isn’t a complete Luddite wants to either.

Sadly, I think Bruce is a bit too bleak and sensationalist in suggesting that we “now have an Internet without privacy” after making references to an Orwellian society, and he’s not putting enough emphasis on what could be real abuses. The defining characteristic of an “Orwellian” society is that it is oppressive and destructive to the general welfare and openness of society. The “Big Brother” aspect is just a means to that end. Is identifying Chinese crackers who were conducting international espionage oppressive? Is discovering the identities of members of LulzSec and criminally convicting them destructive to our general welfare? Is finding out the identity of the woman who was under the sheets with the top appointed public official of the United States Central Intelligence Agency really a threat to the openness of society?

In all cases the answer is no. We as a society should keep our eyes open and be ready to put up a fight against real abuses in the use of this data and profiling. The aforementioned cases are not such abuses.

Decrying the lack of privacy in cases such as the ones Schneier mentions is akin to decrying all warfare between states as completely undesirable, despite its total inevitability given human nature. Lack of privacy isn’t the thing to rally against here, as that lack of privacy is completely inevitable given the nature of the internet and our interactions with each entity on it.

The United States once developed an atomic bomb in a seemingly natural progression toward better arms, used it twice before realizing what had just been unleashed on the world, and then spent the better half of the last century trying to minimize the proliferation of such weapons. We couldn’t turn back from the development of the atomic bomb, just as we now can’t turn back from the erosion of privacy on the internet. Unlike the atomic bomb, much good can come from this profiling. The only option is to hold those in possession of that data accountable for doing no harm with it.

Creation vs. Consumption Machine

A previous post I wrote, “Are We Part of the War Against General Purpose Computing?”, also made me consider how best to use workstations and other devices to separate the two modes I’m in - a “creation” mode and a “consumption” mode. I wonder if the ideal workstation setup involves something like the following:

  • Remove all possible applications and utilities that are geared toward consumption from your work machine. This would include all RSS readers and Twitter clients.
  • Keep your secondary consumption device in a completely separate room or area at all times. If at the office, keep that device in your bag unless you are explicitly taking an extended break.
  • Remove all non-work mail accounts from the mail client on your primary workstation. This is harder than it sounds, since most folks use non-work email addresses for services such as GitHub - so some amount of discipline may be needed in checking those accounts through typical web email interfaces.

There are possibly other ideas I’m missing. Feel free to suggest what those might be.

I’ve previously attempted this separation of mental modes with the Mission Control/Spaces features of OSX, but the demarcation isn’t strong enough. It’s entirely too easy to make a swift swipe of your fingers to get to the consumption space/screen. I think what is really needed is to limit consumption to another machine entirely. This would be especially true if that machine were geared solely toward consumption - an iToy in my case, or perhaps the dusty 3-year-old backup MacBook Pro I haven’t used in months.

I previously ran a different experiment in which I used OSX full screen only. This is actually an ideal setup when I’m working solely from a single lower-resolution screen. It naturally led me to more focus, since you can really only view one thing on your screen at a time. However, I’m back in my Florida home office now with a large external display at my command, which makes running apps in full screen mode a gross underutilization of resources. I’m back to using OSX and Divvy for my working layouts.

Now the experiment is going to be moving all (or as much as possible) consumption to my iPad. This basically means all tweeting, reading, RSS, and video watching. If someone gives me an interesting link that isn’t directly related to what I’m working on, it’s going to go into the Safari Reading List and get read on the iPad. The iPad is also going to be kept nowhere near my office. I want to keep the lines as clear as possible.

Hopefully over the next few days I’ll be giving some “tips” and “gotchas” on how this is all going.

Index Sub-Documents in Mongo

Let’s say I have a collection of documents that all kind of look like this:

> db.foos.find().pretty()
{
    "_id" : ObjectId("511fe286777e76dfdddbf440"),
    "foo" : "bar",
    "timestamp" : ISODate("2013-02-16T20:02:39.417Z")
}

Then, all of a sudden, I get a requirement handed to me: this collection needs to be searched for all documents with a timestamp within a given hour, in a given month, in a given year. Writing that query against timestamp alone is going to be a bit gnarly, so we create a sub-document in our documents that makes this query easier.

{
    "_id" : ObjectId("511fe286777e76dfdddbf440"),
    "foo" : "bar",
    "timestamp" : ISODate("2013-02-16T20:02:39.417Z"),
    "timefields" : {
        "y" : 2013,
        "mo" : 2,
        "d" : 16,
        "h" : 20,
        "mi" : 2
    }
}
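Populating timefields is up to the application. As a minimal sketch (buildTimefields is a hypothetical helper, not part of MongoDB), here is one way to derive the sub-document from a Date in plain JavaScript, which the mongo shell also speaks. It assumes you want the fields to agree with the UTC ISODate values stored in timestamp, so it uses the UTC getters:

```javascript
// Hypothetical helper: derive the "timefields" sub-document from a Date.
// UTC getters are used so the fields agree with the UTC ISODate in "timestamp".
function buildTimefields(timestamp) {
  return {
    y: timestamp.getUTCFullYear(),
    mo: timestamp.getUTCMonth() + 1, // getUTCMonth() is 0-based (January = 0)
    d: timestamp.getUTCDate(),
    h: timestamp.getUTCHours(),
    mi: timestamp.getUTCMinutes()
  };
}

// Example: the timestamp from the document above.
var exampleFields = buildTimefields(new Date("2013-02-16T20:02:39.417Z"));
// y: 2013, mo: 2, d: 16, h: 20, mi: 2
```

In the shell, existing documents could presumably be backfilled by looping over db.foos.find() with forEach and issuing an update with $set on timefields for each one; treat that as an untested suggestion. Whatever writes new documents also has to keep timefields in sync with timestamp, which is the price of this kind of denormalization.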

So now my query will look like the following and all will be happy in the world. And yes, I could have put each of these elements at the top level, but bear with me for demonstration purposes…

> db.foos.find({"timefields.y":2013, "timefields.mo": 2, "timefields.h":20}).pretty()
{
    "_id" : ObjectId("511fe286777e76dfdddbf440"),
    "foo" : "bar",
    "timestamp" : ISODate("2013-02-16T20:02:39.417Z"),
    "timefields" : {
        "y" : 2013,
        "mo" : 2,
        "d" : 16,
        "h" : 20,
        "mi" : 2
    }
}
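For comparison, here is roughly why the timestamp-only version is gnarly. "Hour 20 of any day in February 2013" is not one range on timestamp but one range per day. A minimal sketch (hourRangesForMonth is a hypothetical helper) that builds those half-open ranges; against timestamp alone you would need all of them, for example combined under $or:

```javascript
// Hypothetical helper: one {timestamp: {$gte, $lt}} clause per day of the
// month, each covering the given UTC hour. "month" is 1-based, like "mo".
function hourRangesForMonth(year, month, hour) {
  var clauses = [];
  // Day 0 of the following month is the last day of this month.
  var daysInMonth = new Date(Date.UTC(year, month, 0)).getUTCDate();
  for (var day = 1; day <= daysInMonth; day++) {
    var start = new Date(Date.UTC(year, month - 1, day, hour));
    // Date.UTC rolls hour 24 over to 00:00 of the next day.
    var end = new Date(Date.UTC(year, month - 1, day, hour + 1));
    clauses.push({ timestamp: { $gte: start, $lt: end } });
  }
  return clauses;
}

var februaryClauses = hourRangesForMonth(2013, 2, 20);
// In the mongo shell these could be combined as {$or: februaryClauses}.
```

For February 2013 at hour 20 that comes out to 28 clauses, versus three equality matches on timefields.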

OK, but what about the index that was (hypothetically) on timestamp? Because we are now querying other fields, that index doesn’t help, so the query falls back to scanning every document in the collection, i.e. linear time. The horror!

Well, mongo is flexible if nothing else. You can index fields in a sub-document, using dot notation, just as easily as any other field. For instance:

> var indices = { "timefields.h" : 1,
... "timefields.y" : 1,
... "timefields.mo" : 1,
... "timefields.d" : 1,
... "timefields.mi" : 1}
> db.foos.ensureIndex(indices)

> db.foos.getIndices()
[
    ...
    {
        "v" : 1,
        "key" : {
            "timefields.h" : 1,
            "timefields.y" : 1,
            "timefields.mo" : 1,
            "timefields.d" : 1,
            "timefields.mi" : 1
        },
        "ns" : "test.foos",
        "name" : "timefields.h_1_timefields.y_1_timefields.mo_1_timefields.d_1_timefields.mi_1"
    }
]

And now when I run explain on the cursor using the same query as before, you can see the index is being used:

> db.foos.find({"timefields.y":2013, "timefields.mo": 2, "timefields.h":20}).pretty().explain()
{
    ...
    "indexBounds" : {
        "timefields.h" : [
            [
                20,
                20
            ]
        ],
        "timefields.y" : [
            [
                2013,
                2013
            ]
        ],
        "timefields.mo" : [
            [
                2,
                2
            ]
        ],
        "timefields.d" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ],
        "timefields.mi" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
}
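Why do the last two fields in indexBounds span $minElement to $maxElement? The query never constrains them, so the scan covers their whole range, while the three equality fields at the front of the index each pin it to a single point. Here is a simplified model of that in plain JavaScript; sketchIndexBounds is hypothetical, it is not how MongoDB's planner works internally, and it ignores range predicates, multikey indexes, and how an index gets chosen in the first place:

```javascript
// Simplified model (an assumption, not MongoDB's query planner): for an
// equality-only query against a compound index, each index field the query
// pins to a value gets a point bound [v, v]; every other index field is
// scanned over its full range, written as $minElement..$maxElement.
function sketchIndexBounds(indexKeys, query) {
  var bounds = {};
  // Walk the index fields in index order (insertion order of the keys).
  Object.keys(indexKeys).forEach(function (field) {
    if (Object.prototype.hasOwnProperty.call(query, field)) {
      bounds[field] = [[query[field], query[field]]];
    } else {
      bounds[field] = [[{ $minElement: 1 }, { $maxElement: 1 }]];
    }
  });
  return bounds;
}

// The first four fields of the index above, and the same query.
var modelBounds = sketchIndexBounds(
  { "timefields.h": 1, "timefields.y": 1, "timefields.mo": 1, "timefields.d": 1 },
  { "timefields.y": 2013, "timefields.mo": 2, "timefields.h": 20 }
);
// modelBounds["timefields.h"] is [[20, 20]]; "timefields.d" spans the full range.
```

Because h, y, and mo lead the index, those equality matches narrow the scan to one contiguous slice of it. Had d come first, its unconstrained bounds would sit in front of them, which is generally why fields you always match by equality go at the front of a compound index.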