19 May Metadata Investigation: the death of the tree
At Tiltfactor we’re designing a suite of games that inspires users to tag photographs with “expert” data. Using their input, are building a searchable database of terms that users can explore to find the photographs they need. But how should such a database be constructed in order to be searchable? To answer this question we need to decide exactly what we will be sorting.
If you want to sort dates, design a timeline; if you want to sort names, just make an alphabetical list.
It turns out we’re sorting the entirety of human knowledge.
So how on earth do you arrange “everything” in a way that is searchable and avoid keyword-only piles of data? My initial idea was to build a “tree” which could be measured and molded and meticulously added to by expert players.
It’s not that unique of an idea:
Aristotle is said to have been the last person to know everything there was in his own time.
Knowing everything in the context of ancient Greece, Aristotle established a framework for that body of information. By doing so, he established an informational “ontology,” an answer to the question: how should one think about the world? In information science, an “ontology” is the way in which data is structured and referred to, the vocabulary with which information is discussed inside its own context. Yep.
But Aristotle’s context was 2300 years ago. Today it might seem strange to us that he divided the category for all animals into two main groups: those that have blood and those that do not. As the world’s first taxonomist, (a person who arranges meanings) Aristotle determined the nature of zoology for the next two thousand years simply by making a miniscule decision regarding where things should be divided in the animal kingdom within his particular project.
So what is our context, and how might we best design an ontology to organize it? Well, today the estimated information content of all human knowledge is 10 exobytes. That’s 1 billion gigabytes, 6000 times the storage capacity of the human brain. We have to deal with way more meanings to design a context for than Aristotle admirably dealt with. Information is also created at a much faster rate today, so we’d have to structure a database prepared to take in information in real time and instantly incorporate it.
Yet even if we were to “succeed” in representing human knowledge as it stands today, our Solomonic decisions regarding what-goes-where will seem equally as random to future generations as Aristotle’s animals-with-blood division seems to us today.
Human knowledge is a dynamic, changing thing. As time passes, our libraries and universities slowly shift from one paradigm to the next. As such, to try to design a perfect hierarchy would be to miss the point: knowledge and culture are developing systems of astounding complexity and they’re growing more complex every day. Even the way they change is changing!
So what do we do?
Well it might be interesting to try to build a taxonomy with some help from crowd sourcing. We could hand users a very basic outline (animals, plants, volcanoes etc) and let them have a go at filling it out. But this might be a tad chaotic.
Hierarchical databases have been outdated since the 1970’s for a few reasons:
1) They only allow for entries to be listed under a single category. Thus “European History” would have to be listed under either “Europe” or “History”.
2) If you delete a single node, every node listed underneath it is deleted.
So if we we were to build relationships between all of the tags we receive, we’d have to come up with something “polyarchical,” with connections possible between every node in the structure. Thus “European History” would be connected both to “Europe” AND “History.” It might even be connected to a hundred other terms in varying ways. Why not?
Because moderating a structure like that would be unfeasible. As language and culture develop with new events happening every day, the pace of data entry would far outpace any possible moderation. The network would become unavoidably riddled with bugs, mistags and misconceptions. Wikipedia survives because a small handful of moderators volunteer to painstakingly preserve it; they believe in the cause. Because our database is hidden behind the façade of a game, the humanitarian goals of our project will not be as openly advertised and thus publicly upheld.
So what is the tree now? Maybe it was just a good idea. An ideal. We’d like to think that everything in the world makes sense in a way that is somehow navigable. Things do, but only in a given context.
There are 8 billion contexts walking around the planet every day, each changing completely with everything they do. It is time for us to scale this project back: Rauner (the Archive we are working with) needs their photos tagged so that people can use them. Any information, albeit temporary and contextually biased information, is useful.
So lets design a system we can make, moderate, and use rather than lament unattainable ideals. (This, from a humanist!)