Every photo taken with a cell phone becomes a file saved on the device or in a cloud service. Part of each file is dedicated to recording information such as the date and time the photo was taken, the location, the device name, the serial number and, often, the owner’s name. These additional records are called metadata, and they are not limited to photos.
Virtually every type of file has associated metadata, including Word and Excel documents and PDF files. Each carries its own metadata, with information such as the name of the computer or device, the times the file was accessed and the changes made to it.
Although metadata is useful for countless professional activities, and even for those who simply like to keep folders organized, it can also compromise the user’s privacy, giving intruders the information they need to map the user’s habits.
An exaggeration? Not really. There have been cases in which companies sued for plagiarism claimed that their files were new, original work. The metadata, however, showed that the file had come from another company and had been modified to look like an original document: the changes made to it were saved in the metadata, as a history of the file’s entire trajectory.
In the same way, even a Word document that an employee sends to a customer as an email attachment can carry essential business information with it.
Why, then, was metadata created if it can become a digital security problem? In reality, the concept is centuries old: since ancient times, this type of information has been used to classify, organize and search.
In this context, business definitions and rules, domain information, security details and XML tags, among others, are all metadata, and each has countless possible uses.
Consider how useful metadata is for managing files and information. In the physical warehouse of an online store, for example, it provides the location, box number, label and entire classification system for each item, as well as the sender’s records and the data that must accompany the order, such as a contact telephone number for the time of delivery.
Beyond document management, metadata becomes a strategic tool. This is the case with data warehouse technology, which extracts and consolidates information from multiple sources into a single base that managers can query in various ways.
Applied more widely, metadata represents a real revolution on the internet. This is the idea behind the Semantic Web, an interconnection of data intended to let computers and humans cooperate at a new level.
On the Semantic Web, all information will come with a well-defined meaning that search engines can easily identify and classify. In other words, results that are generic or only loosely related to the purpose of the search will become increasingly rare.
Thus, the Semantic Web should be seen as something comprehensive, with the aim of turning the internet into a refined global database. It will then be possible to obtain semantically interrelated data from our internet searches, rather than just a list of documents that often have no semantic link between them.
Caution in each file
Software has numerous ways of collecting data from users. In Word, for example, footnotes can include author information, which is then recorded in the metadata.
In addition, some functions make changes disappear from the screen while those changes remain in the file’s metadata, such as changed owner names or attached image files that have been replaced.
In both cases, the metadata records that one item was exchanged for another, even though only the most recent version is visible.
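To make the Word example concrete, here is a minimal sketch in Python of how author information can be read from a document. It assumes a .docx file, which is a ZIP archive that normally stores its core properties in a part named docProps/core.xml. The function name read_docx_properties and the selection of fields are illustrative choices, not part of any tool mentioned above.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a .docx package.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical selection of fields; a real file may hold more or fewer.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_docx_properties(path):
    """Return selected core properties of a .docx file as a dict.

    Assumes the archive contains docProps/core.xml; fields that are
    absent from the file are returned as None.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    result = {}
    for key, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        result[key] = element.text if element is not None else None
    return result
```

Calling read_docx_properties("report.docx") on a hypothetical file would return a dictionary with the author, the last person to modify the document and the recorded creation and modification dates, none of which appear on the page when the document is opened normally.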
However, users themselves can change metadata, either through the operating system or with specialized software. Depending on the case, metadata may therefore not be reliable on its own and must be checked against other records. A malicious employee, for example, could erase traces of problems they caused.
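As an illustration of how easily this kind of record can be altered, the following Python sketch rewrites a file’s access and modification times through the operating system. The function name backdate is hypothetical, and the example covers only file-system timestamps, not metadata stored inside the document itself.

```python
import os


def backdate(path, timestamp):
    """Set the access and modification times of path to timestamp.

    timestamp is given in seconds since the Unix epoch. Returns the
    modification time that the file system reports afterwards.
    """
    os.utime(path, (timestamp, timestamp))
    return os.stat(path).st_mtime
```

After a call such as backdate("report.docx", 946684800), the file would appear to have been last modified on 1 January 2000 (UTC), which is why a timestamp alone is weak evidence and should be compared with other records.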
Because metadata can reduce users’ privacy and expose information, it poses a digital security problem, one that is more pronounced in companies. After all, these records include the names of employees involved in certain projects and the days and times they work, among other details.
In the event of a breach, beyond exposing the files themselves, which alone can cause problems for companies, metadata can reveal details about how the work is done and which professionals are involved, something valuable in the hands of competitors. This is possible even if attackers cannot open the files, since some metadata can be viewed without opening them.
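The last point can be sketched in a few lines of Python. The example below asks the file system for a file’s size and timestamps without ever reading its contents. The function name describe_file and the choice of fields are assumptions made for illustration.

```python
import os
from datetime import datetime, timezone


def describe_file(path):
    """Return file-system metadata for path without reading its contents."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        # Timestamps are converted to timezone-aware UTC datetimes.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }
```

Even for a file whose contents are encrypted or otherwise unreadable, a call like describe_file("contract.pdf") would still show when the file was last modified and accessed, which can be enough to suggest when someone was working on it.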