Growth in Data is Stressing Modern Analytics Systems

As the size and velocity of data continue to grow faster than servers can keep up, modern analytics systems cannot keep pace.

Data is doubling every two years, while server power doubles every four. When you couple this with modern analytics methods that tend to scale exponentially as datasets grow, you quickly end up with an analysis gap. Approaches to solving this problem range from sampling data, to adding servers, to moving analytics into memory. None of these approaches is sustainable. Treeminer approaches the problem from a new perspective: how can we structure data so that we can dramatically improve the performance of analytics algorithms? Introducing Vertical Data Mining.

By organizing data in thin, vertical strips, analytics algorithms can operate exceptionally efficiently.
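One way to picture "thin, vertical strips" is bit slicing: a column of small integers is split into one bitmap per bit position, and a query is answered by combining whole bitmaps rather than visiting rows. The sketch below is only an illustration under that assumption; the function names and the sample values are hypothetical, and it is not Treeminer's actual data structure.

```python
def bit_slices(values, width):
    """Split a column into `width` vertical bitmaps.

    Bit `row` of slice `k` is set when bit `k` of values[row] is set.
    Python ints serve as arbitrary-length bitmaps here.
    """
    slices = [0] * width
    for row, value in enumerate(values):
        for k in range(width):
            if (value >> k) & 1:
                slices[k] |= 1 << row
    return slices


def rows_equal(slices, n_rows, target):
    """Return a bitmap of the rows whose value equals `target`.

    Only whole-slice AND/NOT operations are used; no row is inspected
    individually.
    """
    all_rows = (1 << n_rows) - 1
    mask = all_rows
    for k, s in enumerate(slices):
        if (target >> k) & 1:
            mask &= s
        else:
            mask &= ~s & all_rows
    return mask


# Hypothetical 3-bit column, for illustration only.
values = [5, 3, 5, 0, 7, 5]
slices = bit_slices(values, width=3)
mask = rows_equal(slices, len(values), target=5)
matching_rows = [row for row in range(len(values)) if (mask >> row) & 1]
print(matching_rows)
```

The slices are built once, when the data is loaded. Each later query then costs a handful of bitmap operations per bit position, however the individual rows are arranged.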

Treeminer's analytics complete in a fraction of the time of competing solutions, while returning results that are just as accurate.

Class Driven Analytics

Modern analytics systems work by sequentially considering each new data point, one by one. In the case of a bank, perhaps each data point represents a new loan application; in the case of an image analyst, perhaps it is a pixel in a satellite image. The analytics must be executed separately on each new data point. Unfortunately, if an image grows from one million pixels to two million pixels, you are doubling the amount of work required (and potentially quadrupling the execution time!). We call this point-driven classification: each point in the dataset requires a loop through an analytics engine. As a result, growing datasets cause analytics execution times to grow as well.

Treeminer is introducing a concept called Class Driven Analytics. By organizing data vertically for the analytics process, each new point in the dataset does not require analytics to run separately. Instead, we run the analytics for a particular class, or attribute, across the whole dataset in a single operation. Imagine selecting, in a single operation, all mortgage applications that we predict will not foreclose, without needing to consider each application separately! This is the power of Class Driven Analytics.
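The contrast between the two styles can be sketched in a few lines. The example below assumes a toy rule ("good credit and low debt predicts no foreclosure") as a stand-in for a real classifier, and the attribute names and sample applications are hypothetical. It illustrates the idea only and is not Treeminer's algorithm.

```python
# Hypothetical mortgage applications, for illustration only.
applications = [
    {"good_credit": True,  "low_debt": True},
    {"good_credit": True,  "low_debt": False},
    {"good_credit": False, "low_debt": True},
    {"good_credit": True,  "low_debt": True},
]


def point_driven(apps):
    """Evaluate the toy rule once per application (one loop pass per point)."""
    return [i for i, app in enumerate(apps)
            if app["good_credit"] and app["low_debt"]]


def to_bitmap(apps, attribute):
    """Build a vertical bitmap for one attribute: bit i is set for row i."""
    bitmap = 0
    for i, app in enumerate(apps):
        if app[attribute]:
            bitmap |= 1 << i
    return bitmap


def class_driven(good_credit_bits, low_debt_bits):
    """Select the whole predicted class with a single bitmap AND."""
    return good_credit_bits & low_debt_bits


def rows_in(bitmap):
    """List the row indices whose bits are set in `bitmap`."""
    rows, i = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(i)
        bitmap >>= 1
        i += 1
    return rows


# The vertical bitmaps are built once, when the data is loaded.
good_credit_bits = to_bitmap(applications, "good_credit")
low_debt_bits = to_bitmap(applications, "low_debt")

selected = class_driven(good_credit_bits, low_debt_bits)
print(rows_in(selected))
print(point_driven(applications))
```

Both paths select the same applications. The difference is where the work happens: the point-driven version re-evaluates the rule for every row, while the class-driven version combines whole columns in one operation.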

Government Solutions

Distributed DataSheet datasheet/Treeminer Government Solutions Datasheet.pdf