Data mining – The hunt for hidden correlations that others do not see. Sounds great, what with all the data and computing power available. But wait… don’t all investors have access to those same resources? Then why should we think there is something left to discover? Data mining is not like Bitcoin mining, where you know for sure when you have found something. Instead, a “new” correlation only looks shiny – we do not really know whether it is valuable or fool’s gold.
I met my (perhaps the) first data miner in 1974 at Chase Manhattan Bank
In 1974, I was a consultant at the New York City office of Cyphernetics, a computer time-sharing firm. That year, the company worked with Chase Manhattan Bank to host the Chase Econometrics Database on Cyphernetics’ computer installation. Doing so allowed users at Chase and elsewhere to access all the data along with pre-programmed statistical routines. This was a huge step forward at the time. (See more information at the end of this write-up.)
After the database was up and running, I visited the head of the Chase department responsible for statistical analysis. He was located in a large room with rows of desks and a multitude of people busily working on terminals. As he extended his hand towards the analysts, he proudly said, “I’m having them look for correlations everywhere!”
Fresh from business school the year before, I saw two major errors in this mass correlation search:
- First, he was not testing a hypothesis, just trying to find links that looked good. That was putting the cart before the horse, and it was the wrong way to achieve reliable and meaningful results.
- Second, by tossing out items (that is, assigning variables a weight of zero) until he spotted something good, he was using up so many degrees of freedom that a happenstance “good” result would carry no statistical weight – if and when he got around to formulating a hypothesis.
However, since he was a client, I simply said, “Your operation is really something.”
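To make the second error concrete, here is a minimal sketch, assuming nothing but random noise: it generates a hypothetical 20-observation “target” series and many unrelated random candidate series, then keeps whichever candidate correlates best with the target. The function names, the parameters (`n_obs`, `n_candidates`) and the seed are illustrative inventions, not anything the Chase analysts actually ran.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_spurious_correlation(n_obs=20, n_candidates=1000, seed=1974):
    """Hypothetical helper: correlate a random 'target' with many random
    'candidate' series and return (best correlation, index of winner).

    Every series is independent Gaussian noise, so any relationship
    found here is pure chance produced by the search itself.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_r, best_idx = 0.0, -1
    for i in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(target, candidate)
        if abs(r) > abs(best_r):  # keep the most "impressive" link
            best_r, best_idx = r, i
    return best_r, best_idx


if __name__ == "__main__":
    # The more noise series we search, the better the best one looks.
    for n in (1, 10, 100, 1000):
        r, _ = best_spurious_correlation(n_candidates=n)
        print(f"{n:>5} candidates searched -> best |r| = {abs(r):.2f}")
```

With 20 observations, a single pre-specified test would treat a correlation above roughly 0.44 in absolute value as significant at the 5% level. Search 1,000 noise series, though, and on average about 50 of them clear that bar by chance alone, with the best of the lot looking even more impressive. The winner is a product of the search, not of any real relationship.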
Today’s data mining is even worse than before
Hypothesis? What’s that? And why do we need it?
We are now enduring the sequel to the original data mining process. Before, once an interesting correlation was found, real effort went into coming up with a plausible rationale. Nowadays, though, with the speed and volume of material coming out, who has the time to dream up a rationale or, as a reader, to evaluate the proffered logic? Instead, simplification rules, and many readers are willing to say “interesting” simply because a correlation looks good.
The spread of bad information has ramped up
An especially disturbing development is that the mainstream media is playing into the data mining game. That is why we got all those August 2017 warnings that years ending in “7” are bad for the stock market. (For more information, see “Sorry, Barron’s – The 2017 Stock Market Is Not Cursed Because It Ends In ‘7’”.)
The lack of time, understanding and expertise behind these articles is glaring. The terrible misinterpretation of the latest new home sales report is a perfect example. (See “Media Grossly Misinterprets New Home Sales Report. Two Firms Got It Right” for the explanation of what almost every news report got wrong.)
Then there is the lack of common sense and rudimentary knowledge. A perfect example is the widespread warning about an ominous link between an inverted yield curve and a coming recession. Where is the explanation of how inverted yield curves happen? And what about the fact that no two recessions are alike, that their causes and environments are always different? That last point should be obvious: most investors remember the last recession, but few of them see the next one coming. (For more information, see “Yes, The Inverted Yield Curve Foreshadows Something, But Not A Recession”.)
The bottom line
Data miners are here to stay, regardless of their lack of value. Their titillating correlations make compelling reading for many. What has changed is that the mainstream media has joined in, so worthless (and harmful) information now also appears beneath trusted mastheads.
What to do about the trend? My preferred approach has always been to question everything, while relying on common sense and maintaining a sense of humor. Doing so is the only way I know to avoid being caught up in the next fad or getting frightened off by scary visions.
For your sense of humor…
Here is a description of a cartoon I saw years ago. A man is sitting in front of the TV, staring at the screen in shocked disbelief. The newscaster is saying, “… and in business news, the Dow Jones Industrial Average sold off today, briefly hitting zero, before bargain hunters stepped in. At the close, the market was off ten points.”
More information about Cyphernetics and Chase Econometrics
Wikipedia – “Cyphernetics”
Cyphernetics was a commercial timesharing company based in Ann Arbor, Michigan. As was the case with a number of commercial timesharing operators in the 1970s, Cyphernetics utilized the DECsystem 10 computer systems from Digital Equipment Corporation. The company also had sales offices in most major US cities and many international locations, providing communications and technical support for clients.
Cyphernetics developed many products that were well ahead of their time, and whose concepts are contained in many of the most important PC applications, even today. Cyphernetics had an email system in the early 1970s, as well as word processing, spreadsheets, project management, and time-series data storage and analysis.
Cyphernetics was purchased by Automatic Data Processing in 1975 and renamed ADP Network Services. The business did very well until the introduction of the PC, and declined after that.
From a November 1977 report, “Commercial Bank Financial and Economic Remote Computing Services Subsidiaries”
Chase Econometrics Associates (CEA) was purchased directly by the bank [Chase Manhattan Bank] to offer econometric and financial forecasting service to banks and corporate groups on a fee basis. The CEA financial and forecast databases and the forecasting models were first offered through ADP Cyphernetics timesharing network.
Following the acquisition of IDC [Interactive Data Corporation] in 1975, CEA databases and forecasting models were also made available on the IDC network.