‘Data is the new oil’ – this phrase captures the idea that data is becoming the world’s most valuable commodity. The buzz around alternative data has increased considerably in the financial and investment community, as asset managers try to extract signals from alternative data and transform them into added wealth for their clients.
Information retrieved from non-traditional sources is increasingly used as input for investment decisions. It can give investors insights that go beyond traditional data such as earnings, credit reports, and company and industry statistics. For example, alternative data can come from ingenious new data sources such as sensors or GPS locations. Other examples of alternative data sources are social media platforms, news wires, satellite imagery, sustainability-related data, online spending patterns and more. The usefulness of alternative data lies in providing near real-time information and information that was, until recently, unavailable.
There are several formal characterisation schemes for alternative data sets, based, for example, on a data set’s origin or on how and by whom it is generated. These schemes do not consider how to determine the value of alternative data for investors, which is exactly the subject of this paper. Here we suggest a valuation framework for alternative data, which distinguishes various investment styles and takes into account the existing investment philosophy.
An interesting (and unorthodox) source of alternative data is satellite imagery. Satellites pass regularly and frequently over the same spots on the earth, providing regular updates on a location. The cost of satellite imagery has fallen significantly in the last few years, while the resolution has improved. Examples of satellite imagery applications in investment decisions include counting the cars in retailers’ parking lots to forecast revenue, or counting the ships passing through ports as a proxy for the volume and rate of commercial activity. Another interesting example is the use of images of shadow lengths at a real estate project’s construction site to determine the project’s pace.
A popular source of alternative data – and the example we use in this paper – is the sentiment of news and social media content. Research in this area dates back to the seminal papers by Tetlock (2007, 2008) and has been further developed in numerous papers by Borovkova (e.g., Borovkova and Xiaobo (2015)) and others. These papers investigate various applications of sentiment in finance, such as systemic risk monitoring, commodity trading and sector rotation. It appears that sentiment can be a powerful tool in reducing risk and enhancing investor returns. These and other related papers justify the use of sentiment as an alternative (and additional) signal for investment strategies.
Challenges with alternative data sets are related to their volume and complexity: how to process the data, how to turn it into actionable signals and how to join it to the more traditional data typically used in investing (such as asset prices or company fundamentals). In the example of satellite images of parking lots, how does one determine whether a lot belongs to Walmart, Target or Costco? Textual data sets such as media content are challenging too, since they are typically unstructured and, at first glance, it seems hard to determine which asset(s) a text refers to: for example, whether ‘apple’ refers to a fruit or a rather large tech company. Both examples concern mapping the data into an investable universe. A data set that has already been mapped (or tagged) to companies, commodities or other assets is clearly more valuable to users, as it saves them a lot of processing time and energy.
This brings us to another major challenge with alternative data: how to assign value to it. This question is interesting from the perspective of both asset managers (who can spend substantial amounts on alternative data) and data vendors. Data buyers and data providers have different perspectives when it comes to pricing. For a data vendor, it is essential to recover the costs associated with the creation and distribution of the data set. However, the value of alternative data for asset managers lies predominantly in its monetising potential. Our paper proposes a practical framework to quantify the value of alternative data in quantitative investing.
Specifically, we address the following questions:
- What is the impact of an alternative data set on portfolio performance?
- Is there a systematic way to capture and evaluate the benefits achieved by using alternative data?
- Is the performance of a data set sensitive to the investor’s profile and strategy?
We take the popular factor model investment strategy (Fama and French, 2016) – the main workhorse of modern investing – as the main competitor and use news sentiment data as the example of an alternative data set. We show that sentiment data has significant value for asset managers: portfolios constructed using sentiment signals significantly outperform the benchmark and, most importantly, portfolios that use sentiment as the sole construction factor perform as well as, and in some cases better than, traditional multifactor portfolios. We also show that the value of alternative data depends on the investor’s profile and the fund’s size.