In a user-generated online society, who owns the data and how it can be used have been persistent questions. The debate continues, particularly as data stores grow with more complex (and more personal) information.
Fred Wilson’s Union Square Ventures has invested in a company called Wesabe, which, like several others, aims to sort through and make sense of your personal financial information. Financial data is probably the most sensitive of all with regard to online conveyance, and individual concern as to how that data is handled is an obvious barrier to acceptance of services like Wesabe. The company has answered the call in part by publishing a “Data Bill of Rights,” the purpose of which is to alleviate anxieties about housing personal financial information with them. Mr. Wilson caveats the “press” by stating it’s a good start, and calls for additional opinions. Mine are as follows (with the disclaimer that said opinions are by no means steadfast rules, nor are they necessarily cost-effective or operationally feasible)…
Who owns the metadata you and others create about the transactions that come into the system?
In the world according to credit card processors and credit reporting agencies, they do, and despite your requests to block its use there is probably a lot of metadata being gathered that doesn’t fall within the two-point type guidelines your creditors periodically send you. They’re likely using it – and you should get used to it. But with regard to opt-in services such as Wesabe, I think there’s a happy medium to be had. Clearly, these types of online services see value in said metadata, and allowing you to remove your viewable information shouldn’t necessarily be accompanied by complete removal of the offspring (particularly if the service was offered for free). I believe if personally identifiable and proprietary data elements (meaning data uploaded, imported, or otherwise entered by the user) are stripped away from the metadata, then the result (or what’s left, if anything) should be available to the service provider.
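To make that stripping idea concrete, here is a rough sketch in Python. Every field name in it is my own invention for illustration; none of it reflects how Wesabe or any other service actually stores its records, and deciding which fields count as “user-entered” is exactly the kind of judgment call a real provider would have to make.

```python
# Hypothetical field names, assumed for illustration only.
PII_FIELDS = {"user_id", "name", "email", "account_number"}
USER_ENTERED_FIELDS = {"memo", "raw_description", "amount"}


def strip_for_provider(record: dict) -> dict:
    """Return only the fields that are neither PII nor user-supplied data."""
    blocked = PII_FIELDS | USER_ENTERED_FIELDS
    return {key: value for key, value in record.items() if key not in blocked}


# A made-up transaction record as the service might hold it.
record = {
    "user_id": 1017,
    "account_number": "XXXX-4321",
    "raw_description": "SQ *CORNER CAFE 0423",
    "amount": -4.75,
    "merchant_category": "coffee shop",
    "tag": "dining",
}

# What is left over is what the provider would get to keep.
residual = strip_for_provider(record)
print(residual)
```

In this sketch, only the category and the tag survive. A blocklist like this is the simplest possible approach; a cautious provider would more likely list the fields it is allowed to keep rather than the ones it must drop.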
Is it better to let the service do the tagging or is it better to let the community do the tagging of the transactions?
Both. The services themselves are the machine, and the community is the blood and guts. Algorithms versus psyche, or the two working in harmony and learning from each other. I believe there is a lot of value to be gained from allowing the machine to suggest helpful tag elements to the users, and I believe the users should be ready, willing and able to reciprocate.
Should the tags be shared and if so, when and with whom?
This should depend on the data elements or transactions being tagged and who is doing the tagging. If the machine “suggests” a tag for a personally identifiable element, then the end user should have the option to reject that metadata. But that doesn’t mean the service shouldn’t be allowed to use that metadata in conjunction with non-personally identifiable information to improve itself for the benefit of others in the community. By the same token, user-generated tags should be sharable within the community, but tied directly to said user (or their data) only with their permission. Either way, the “transaction” which resulted in that choice should be something the machine is allowed to learn from.
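Those rules can be written down as a small decision function. This is only a sketch of my own reading of the paragraph above: the `Tag` class, its fields, and the `sharing_policy` function are all hypothetical names, and the choice not to share machine tags with the community is an assumption I have added, since I haven’t taken a position on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    label: str
    source: str                        # "machine" or "user"
    rejected_by_user: bool = False     # user declined a machine suggestion
    attribution_allowed: bool = False  # user agreed to be linked to the tag


def sharing_policy(tag: Tag) -> dict:
    """Map a tag to what the service may do with it (hypothetical rules)."""
    is_user_tag = tag.source == "user"
    return {
        # A rejected machine suggestion is not attached to the user's data.
        "show_on_user_record": is_user_tag or not tag.rejected_by_user,
        # Assumption: only user-generated tags are shared with the community.
        "share_with_community": is_user_tag,
        # Linking a tag back to its author requires the author's permission.
        "share_with_attribution": is_user_tag and tag.attribution_allowed,
        # The machine may always learn from the choice, stripped of identity.
        "machine_may_learn": True,
    }


rejected = sharing_policy(Tag("medical", source="machine", rejected_by_user=True))
anonymous = sharing_policy(Tag("dining", source="user"))
print(rejected)
print(anonymous)
```

Note that `machine_may_learn` is true in every case: accepting or rejecting a tag is itself a signal, and that signal is what the service gets to keep.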
Where should your login and passwords be stored?
Probably a personal choice issue – there are a lot of folks working on various solutions, which include third-party authentication, token exchange, etc., and there is not enough information to make a blanket judgment call on the matter either. I will likely never input my bank, securities, or credit-related login information into a third-party service, regardless of the level of security assurance the service provides. That is my choice, and the logic is this: a centralized repository of such data will attract threats in direct proportion to the service’s popularity, particularly given the potentially profitable nature of that data. My accounts are spread across numerous vendors, and while the possibility of having my data stolen through phishing attempts and the like increases with each transaction, I personally don’t engage in large numbers of them. I assume that risk is lower than the one that comes with a “large target” stored environment.
The bottom line is that the storage of login identifiers and passwords should be a choice based on convenience versus comfort. If the user wants to store their various account login information in a system for quick and easy retrieval, let them, but the service provider should be prepared to accept the burden of responsibility. If the user values the comfort more than the convenience, give them that option. Unfortunately, we live in a world where the easy out is to blame the other guy, and proceed to court. There is simply no easy answer here (yet).
Can these services be hacked?
Of course! The moment someone says something is unhackable is most often immediately followed by a moment of apology over a breach. What the service provider and its users need to be cognizant of is the value of the information housed within the service: the effort hackers are willing to put into breaking in is directly proportional to the profit they can garner from the data in the store. If the data is segmented by account type, unbranded, and non-personally identifiable, its usefulness goes down tremendously.
Is personally identifiable information (PII) being stored with the data?
The End Note
Again, these are just my opinions, and offering every nuance of this self-prescribed “perfect world” is impossible and likely unprofitable (or at a minimum, a major pain in the ass for some engineers). There is no way to please every user, and there probably never will be. Nonetheless, we’re talking user inputs, service outputs, and wants and needs which are either presently being breached or are as yet unfulfilled. And there is a growing number of solution providers jockeying for position, hoping to provide enough answers to get up front.
A Side Note
I’m presently working on some research related to the login/password storage issue, and am looking for some data. In particular, I’m trying to find statistics on internet usage stratified by user type (e.g. core, casual, convenience only, what-have-you), including the number of sites visited daily, login counts, and time spent on sites thereafter. A breakdown by site type (including blogs, bookmarking, social networking, and financial) would also be helpful. If anyone can point me to something useful in this regard, I’d greatly appreciate it.