How the ‘Data’ Sausage Gets Made

August 13, 2019

If you’re reading this right now, odds are that some of your personal data attributes are being captured, stored, and/or sold by one of the 4000+ data brokers currently operating in the U.S. As digital transactions absorb an increasing share of the economy, these attributes cast a growing shadow over the lives of consumers across the country. Yet how consumer data is collected, stored, and used remains largely opaque to the average American.

Join OWI as we take a dip into the modern data economy, looking at the history of data brokers and exploring how your “personal data sausage” actually gets made.

Data Brokers Are Older Than You Think

Recent events, such as the Cambridge Analytica scandal, the General Data Protection Regulation (“GDPR”), and a constant drip of corporate data breaches, have brought heightened media and regulatory attention to the world of consumer data. Yet the commercial use of consumer data was a multimillion-dollar industry long before the invention of the personal computer, and long before “The Facebook” was a twinkle in Zuckerberg’s eye.

The names of the “big three” consumer credit bureaus (Experian, Equifax, TransUnion) are largely synonymous with the term “data broker,” while other data giants such as Acxiom, CoreLogic, and Intelius remain less well-known to the average consumer. Few people realize that both Equifax and Experian have histories dating back to before the turn of the 20th century. Experian, the oldest of the three, dates back to 1826, when London merchants began exchanging information on customers who failed to pay their debts. Equifax shares a similar origin story. In 1898, a Tennessee grocery store owner saw an opportunity and began to monetize a list of his most creditworthy customers by selling it to other local merchants.

By the middle of the 20th century, a booming post-war economy had created skyrocketing consumer demand for expensive new products like cars, dishwashers, and TV sets. To meet the data demands of this consumer credit boom, thousands of local and regional consumer credit agencies sprang up across the country. Before the advent of the modern database, these consumer credit files were maintained in massive filing cabinets of 3×5 index cards. Merchant or lender requests for consumer data were answered by telephone operators, with runners manually pulling an individual consumer’s card for the operator to read out verbally.

By the latter half of the 20th century, the scale and scope of the data attributes collected were comprehensive even by modern standards. A 1970 Congressional investigation into the practices of the direct mail industry is rife with quotations that would fit seamlessly into a 2019 data privacy hearing.

“I have frequently spoken of the danger to privacy and to the Bill of Rights presented by the compilation of massive amounts of data in computerized information systems, and the ease with which all safeguards to the individual can be bypassed,” said New Jersey Representative Cornelius E. Gallagher in his testimony.

Gallagher further cited the unauthorized sale to mailing list brokers of data tapes covering 2 million Encyclopedia Britannica customers. Additional testimony described a man who learned his wife was pregnant from targeted diaper advertisements in the mail, sent after medical lab test results were sold to a data broker, and a private firm that sold computer tapes containing the detailed personal data of over 3 million Federal employees.

Where Your Data Lives on the Internet 

Nearly 50 years later, the sources of consumer data remain largely unchanged. However, the scale and scope of consumer data have increased dramatically, alongside the explosive growth in data attributes made available by our near 24/7 online presence. The vast majority of consumer data attributes available for sale are obtained from one of three primary sources: the government, public databases, and commercial enterprises.

Government: Data brokers source attributes from all three levels of government: federal, state, and local. Federal sources include many commonly available data sets, such as the U.S. Census, publicly available congressional district and geographic data, and change-of-address information sourced from the U.S. Postal Service. Additionally, attributes on deceased individuals, including name, date of death, and Social Security number (SSN), are available via the Social Security Administration’s Death Master File.

State and local government sources often offer even richer, more detailed attribute sets. Public tax and property records, professional and recreational licenses, mortgage records, voter registrations, motor vehicle licensing and registration, birth certificates, death certificates, and criminal records are all frequently leveraged.

How this information is gathered by data brokers remains as varied as the sources themselves. While some data sets are made openly available for direct online access, others require additional transaction layers before making their way into the largest databases. More esoteric data sets are often only available via in-person records requests. These are often fulfilled by independent subcontractors, who collect and collate data sets before selling organized data up the food chain to larger brokers.

Public Databases: In the digital era, public online sources are also fertile ground for consumer data collection. With nearly limitless computing power and storage available, almost any information posted to a public website can be scraped and stored by automated bots. Examples include personal employment history posted to a LinkedIn profile, or a Craigslist post containing a phone number or email address. Depending on the website, scraping might be contractually agreed to, tolerated for public profiles and posts, or prohibited by the site’s terms of service. However, the terms of service are largely an honor-based system, as both technical and regulatory countermeasures remain limited. As a rule of thumb, assume that if it can be discovered by surfing the web, a bot likely has the ability to scrape it.
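To make the mechanics concrete, here is a minimal Python sketch of the extraction step a scraping bot might run on the text of a public post. Everything in it is an illustrative assumption: the function name, the sample post, and the deliberately simplified patterns are hypothetical, and real contact formats vary far more widely than these patterns cover.

```python
import re

# Simplified, illustrative patterns (assumptions, not production-grade).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def extract_contacts(text):
    """Return every email address and US-style phone number found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


# Hypothetical public classified post.
sample_post = "Couch for sale, $50. Call 555-867-5309 or email seller@example.com."
contacts = extract_contacts(sample_post)
print(contacts)
```

The point is less the patterns themselves than how little effort the step requires once a page's text has been fetched.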

Commercial Enterprise: One of the largest sources of consumer data attributes remains commercial enterprises, which conduct and log billions of interactions and transactions per year. If data can be captured during a transaction, it’s very likely that it can be, and has been, sold. Retailers bundle together information including the category of a purchase, the date and dollar amount, and the type of payment method used. Online advertising networks sell information about websites visited and ads clicked. Mobile and landline telecom operators offer information about account openings.

As a consumer, identifying which sites and businesses sell your information can be challenging, as the details are often buried deep within terms and conditions or privacy policies. While many sites state that personal information is not sold to third parties, anonymized or aggregated data is often excluded from these restrictions. While valuable independently, these digital breadcrumbs can also be combined with other data pieces from government and public sources to make them more useful. These enhanced data sets are then sold directly to buyers or traded between the data brokers themselves.
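As a rough illustration of how such breadcrumbs might be combined, the Python sketch below joins a commercial record to a public record using a crude matching key built from name and ZIP code. The field names, sample records, and matching rule are all hypothetical assumptions chosen for brevity; this is not a description of any broker's actual process, which would likely involve far more elaborate matching.

```python
def normalize_key(name, zip_code):
    """Build a crude matching key: lowercased, whitespace-collapsed name plus ZIP."""
    return (" ".join(name.lower().split()), zip_code.strip())


def enrich(commercial_records, public_records):
    """Merge each commercial record with a public record sharing the same key."""
    public_by_key = {
        normalize_key(r["name"], r["zip"]): r for r in public_records
    }
    enriched = []
    for rec in commercial_records:
        match = public_by_key.get(normalize_key(rec["name"], rec["zip"]))
        if match is not None:
            # Fields from the commercial record win when both sides share a field.
            enriched.append({**match, **rec})
    return enriched


# Hypothetical sample data: a retail purchase and a county property record.
purchases = [
    {"name": "Jane  Doe", "zip": "20001", "purchase_category": "baby products"}
]
property_records = [{"name": "jane doe", "zip": "20001", "homeowner": True}]

profiles = enrich(purchases, property_records)
print(profiles)
```

Each record says little on its own, but the merged profile ties a purchase category to a homeownership flag for the same person, which is the kind of enhancement that makes the combined data set more valuable than its parts.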

Part II Coming Soon 

The speed and scope of this data collection can surprise even the most hardened consumer. But questions remain: 

  • What is being done with these data attributes?  
  • Are they being used to add value to our daily lives, or to remove privacy from them? 
  • What have regulators done to restrict how these attributes may be collected and monetized? 

Stay tuned for Part II, where we explore the next link in the chain of the modern data economy.