Massively Parallel Processing of Big Data – CEO Interview with ParStream

In the world of data, there has long been a limit on what kinds of analytics can be generated in real time. Stream analytics that draw on in-memory data are available, but it has not been possible to pair real-time results with massive volumes of big data. One company is set to try to change that.

ParStream’s vision is to revolutionize the database market by enabling real-time big data analytics applications. To achieve this, the company is performing fundamental research and driving innovation in database technology. The results allow users to perform real-time analytics on big data at a significantly lower cost.

In areas like keyword monitoring, for example, ParStream is the analytics platform for Searchmetrics, a search and social analytics software company. The company monitors over 75 million domains and 100 million keywords on the world’s biggest search engines.

Searchmetrics customers use the service to monitor competing domains and to optimize their keywords to drive traffic. This translates into typical imports of over seven terabytes of data and queries against more than ten billion data records. By switching to ParStream, Searchmetrics greatly reduced its infrastructure requirements and achieved faster import and query execution times.

Interview

Technorati writer Andre Bourque (@SocialMktgFella) and Sam Kumarsamy, Partner at Tap the 90, a market analyst and research company, interviewed Michael Hummel, CEO of ParStream.

Sam/Andre: What is your secret sauce? And would you consider ParStream to be a player in the NewSQL database category, since it ships with an SQL interface?

Michael: Yes, we are in the NewSQL database market, even though the NewSQL market is not as clearly defined as other markets. Even NoSQL is not clearly defined, and there are many definitions out there. But it is fair to say we are in the NewSQL database market or the analytics platform category.

You asked what our secret sauce is: it’s the HPCI, or High Performance Compressed Index. At ParStream, we have found a way to represent the data in an index that allows us to find the relevant records much faster by looking at much less data. We do not only work with numbers or low-cardinality values; we can also index strings, dates, floating point numbers and, of course, integers. The efficiency of our software comes from not having to crunch so much data. The other major impact is a significant reduction in the number of servers required to run ParStream.

Sam/Andre: That is good to know, Michael, but have you done any benchmarking against your peers and competitors such as VoltDB, NuoDB, and Clustrix? There is so much noise out there, with new companies sprouting all over, that some comparative analysis would help.

Michael: Yes, we have. Most of the other technologies today are distributed databases optimized for transactional workloads, so it makes little sense to compare them to ParStream; we are 50 to 100 times faster. We have also compared ParStream to popular products and to other traditional databases, where we usually saw performance gains in the range of 1,000 times. Against MySQL clusters we achieved performance gains of between 1,000 and 12,000 times. In fact, some of our clients have switched from popular traditional databases and MySQL clusters. Officially we are not allowed to compare …

Sam/Andre: So you are officially not allowed to share this benchmarking information?

Michael: Nobody is allowed to share this information against commercial databases due to the new laws. There is a board that is trying to put up a new benchmarking standard, because TPC-H (the old standard) uses joins in its benchmarks, and in the big data space everybody is trying to avoid joins.

Sam/Andre: That’s too bad, because customers definitely need help determining the right database for their specific use case among the plethora of new products and their marketing buzz. A vendor-agnostic company or independent board comparing the performance metrics of various vendors could definitely help clear the air for customers.

Michael: I completely agree with you. It is totally confusing, as everyone is comparing apples to pears. All I can say is that in all tests conducted by our clients, ParStream was always faster.

Sam/Andre: Who are you running into most in sales situations? In other words, who are your closest competitors?

Michael: Our competitors are Vertica, Sybase IQ and SAP HANA.

Sam/Andre: Could you share more details on your indexing and how it is different from other databases from a business perspective?

Michael: Yes. If you look at the analytical space, columnar databases are the most frequently used at the moment. In-memory is used too. The problem with such approaches is that these databases have to run through the whole column to analyze the data, which is called a full column scan. That is a lot of data to look at when you’re talking about billions of records!
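
To make the contrast between a full column scan and an index lookup concrete, here is a minimal Python sketch. It uses a plain bitmap index, a standard textbook technique, and is not ParStream's patented HPCI, whose internals the interview does not describe. The column names, sample data and function names are all hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def rows_from_bitmap(bitmap):
    """Turn a bitmap back into the sorted list of row ids whose bits are set."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


def full_scan(country, device, want_country, want_device):
    """Baseline: touch every row of both columns to evaluate the filter."""
    return [
        i
        for i, (c, d) in enumerate(zip(country, device))
        if c == want_country and d == want_device
    ]


def indexed_lookup(country_idx, device_idx, want_country, want_device):
    """Answer the same filter by AND-ing two precomputed bitmaps."""
    hits = country_idx.get(want_country, 0) & device_idx.get(want_device, 0)
    return rows_from_bitmap(hits)


# Hypothetical sample data: two columns of a six-row table.
country = ["DE", "US", "DE", "FR", "US", "DE"]
device = ["mobile", "desktop", "desktop", "mobile", "mobile", "mobile"]

country_idx = build_bitmap_index(country)
device_idx = build_bitmap_index(device)

print(indexed_lookup(country_idx, device_idx, "DE", "mobile"))
print(full_scan(country, device, "DE", "mobile"))
```

Both calls return the same row ids. The indexed version gets there by combining two bitmaps instead of reading every value in both columns, which is the general idea behind looking at much less data.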

ParStream technology provides much faster query execution, i.e. it delivers the results of a given query much faster than other databases. The technology is highly specialized for analyzing large amounts of data with sub-second response times while continuously importing new data at very high speed. So what we can provide is real-time analytics, meaning fast results on up-to-the-second data, and the secret sauce is in our technology. We can provide full granularity, full flexibility in all dimensions and extremely fast results.

Sam/Andre: So is this index, your secret sauce, so defensible that large companies such as HP, SAP and others can’t emulate it even if they throw their deep research pockets at it?

Michael: That is a fair question, and yes, it is very defensible technology. First, we have a patent on it. Second, we are moving ahead very fast on the indexing technology to make it applicable in many different markets, which makes it very hard for others to catch up.

Sam/Andre: We have one more technology-related question. I have been reading that in some situations you recommend that your customers use graphics processing units (GPUs) as opposed to CPUs. What is the real advantage of GPUs over CPUs, and when would you recommend them?

Michael: GPUs have two major advantages over CPUs. Let’s refer to them as general-purpose GPUs (GPGPUs) from here on: they are not made to display nice graphics on the screen but to serve as a mathematical co-processor.

The first advantage is memory throughput and the second is the number of cores. The first advantage means that you can transfer data from memory to the processing core 12 times faster than on a standard CPU. That helps if memory throughput is the limiting factor in your software, and you have to write excellent code for the processing unit to make that possible.

The second advantage is the number of cores, but to use the cores you need a piece of software that can run highly parallel and break down a single query into many small pieces that can be analyzed in parallel. ParStream can do exactly that, and it does so on multi-core CPU chipsets and on multiple servers in a cluster. On GPUs we can do that as well. Those are the advantages, but GPUs also come with disadvantages. In their current state, GPGPUs can only store up to 6 GB of data on the GPGPU board itself. That is a very small amount of memory, so we only recommend using GPGPUs if you have expensive analytical operations on a small, slow-moving data set.

On static data, for example, gene sequencing is done very successfully on such GPGPU boards, because the data volume is limited but you have to crunch the same data for minutes or hours. As soon as you have more dynamic data or a larger data volume, you have to move the data over to the GPGPU board, do the analysis and then move it back, and this costs too much time. Therefore we are looking to the next generation of chips that integrate CPUs and GPGPUs on the same die. There is a new generation of chips coming, and we are especially waiting for multi-core GPU chipsets from Intel.
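
The trade-off Hummel describes, faster processing on the board versus the cost of moving data onto it, can be put into a rough back-of-the-envelope model. The Python sketch below is only an illustration: the 12x throughput factor and the 6 GB board size come from the interview, while the CPU memory bandwidth, the host-to-board transfer bandwidth and all function names are assumptions chosen for the example.

```python
def cpu_seconds(data_gb, passes, cpu_gb_per_s):
    """Time for the CPU to stream over the data `passes` times."""
    return passes * data_gb / cpu_gb_per_s


def gpu_seconds(data_gb, passes, cpu_gb_per_s, transfer_gb_per_s,
                gpu_speedup=12.0, result_gb=0.0):
    """Time to copy data to the board, process it there, and copy results back.

    gpu_speedup is the 12x memory-throughput factor quoted in the interview;
    transfer_gb_per_s is an assumed host-to-board bandwidth.
    """
    transfer = (data_gb + result_gb) / transfer_gb_per_s
    compute = passes * data_gb / (cpu_gb_per_s * gpu_speedup)
    return transfer + compute


def offload_pays_off(data_gb, passes, cpu_gb_per_s, transfer_gb_per_s):
    """True when the GPU path, including the transfer, beats the CPU path."""
    gpu = gpu_seconds(data_gb, passes, cpu_gb_per_s, transfer_gb_per_s)
    return gpu < cpu_seconds(data_gb, passes, cpu_gb_per_s)


# Assumed figures: 20 GB/s CPU memory bandwidth, 8 GB/s transfer to the board.
CPU_BW = 20.0
TRANSFER_BW = 8.0
BOARD_GB = 6.0  # board memory limit mentioned in the interview

# Dynamic data: each batch is analyzed once, then replaced.
print(offload_pays_off(BOARD_GB, passes=1, cpu_gb_per_s=CPU_BW,
                       transfer_gb_per_s=TRANSFER_BW))

# Static data: the same batch is crunched many times.
print(offload_pays_off(BOARD_GB, passes=100, cpu_gb_per_s=CPU_BW,
                       transfer_gb_per_s=TRANSFER_BW))
```

Under these assumed bandwidths, a single pass over the data favors the CPU because the transfer dominates, while a hundred passes over the same data favor the GPU. That matches the recommendation to use GPGPUs only for expensive operations on small, slow-moving data sets.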

Sam/Andre: In other words, when the next generation of GPGPUs is released, are you looking to take advantage of them and replace CPUs?

Michael: I don’t believe it will be a full replacement for CPUs. GPGPUs have certain limitations. They are good at certain operations, like filtering bitmap indices or filtering data, but they are not good at others, like random access to certain data areas. They are good for streamed processing, and the name ParStream actually comes from parallel streamed operation, so the streaming memory access that GPU boards offer is ideal for us.
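
The idea of breaking a single query into small pieces that are analyzed in parallel and then merged can be sketched in a few lines of Python. This is a generic partition-and-merge pattern, not ParStream's engine; the function names and sample data are hypothetical. Note also that threads in CPython do not give real CPU parallelism for pure Python code, so the sketch shows the structure of the approach rather than its speed.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, n_parts):
    """Cut the rows into at most n_parts contiguous chunks of similar size."""
    size = max(1, -(-len(rows) // n_parts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_count_and_sum(partition, min_value):
    """Per-partition work: filter, then return a mergeable (count, sum) pair."""
    matching = [v for v in partition if v >= min_value]
    return len(matching), sum(matching)


def parallel_average(rows, min_value, n_parts=4):
    """Average of all values >= min_value, computed partition by partition."""
    partitions = split_into_partitions(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(
            pool.map(lambda p: partial_count_and_sum(p, min_value), partitions)
        )
    total_count = sum(count for count, _ in partials)
    total_sum = sum(part_sum for _, part_sum in partials)
    return total_sum / total_count if total_count else None


# Hypothetical data: the values 1 to 100, filtered to those of at least 51.
values = list(range(1, 101))
print(parallel_average(values, min_value=51))
```

Each partition returns a count and a sum rather than its own average, because averages of unequal partitions cannot simply be averaged again. Choosing partial results that merge cleanly is what lets the pieces run independently on separate cores or servers.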

Sam/Andre: Moving on to other questions; what is your sales model and channel strategy?  I understand in Europe everyone leverages channels but do you have a direct sales model too or are you 100% channel sales focused?

Michael: We open up markets with direct sales, but our main focus is on partnering with other software companies, our ISV partners, to sell our solution as an accelerator for their products. We make other software companies and our end users more successful at analyzing big data. We are an enabling technology, and therefore we partner with companies that have a certain vertical expertise and help them be even more successful in that vertical.

Sam/Andre: Would finance be considered one of your vertical market focuses?

Michael: Absolutely, we mainly focus on digital marketing, e-commerce, retail, telecommunication, and finance.

Sam/Andre: So what does your pricing model look like?  Do you have a cloud strategy or is it sold strictly as a turnkey appliance?

Michael: We do not sell appliances. ParStream is a software-only solution, and together with hardware partners we can provide servers with ParStream already installed on them. Our customers can run it on their own bare metal, in a virtualized environment, on Amazon’s EC2, or in any other cloud offering.

Our license model is based purely on the data volume you can store in the database at any given point in time. So you can use as many cores as you want, use as many servers as you want and query the system as much as you like, and it won’t impact pricing.

The smallest license we sell is 5 terabytes because typically our customers have between 5-100 TB of data.

Sam/Andre: Are there any well-known paying customers that you would like to name?

Michael: There are several in Europe and in Australia.  The ones that I can openly name are:

  • Etracker, a leading European web analytics company,
  • Searchmetrics, an SEO specialist in Germany,
  • Coface Services, an S&P-like company in the French market

There are two I cannot name that are in the tourism and in the oil and gas industry.

Sam/Andre: So you are still looking to bag your first US customer? 

Michael: Yes, we are testing with 5 companies at the moment and we are in the middle of a sales cycle and hoping to close some sales soon.

Sam/Andre: Last question, so can you shed some light on your product roadmap? What is next for Parstream and would we perhaps see a SaaS offering in the future?

Michael: Our product roadmap has three major areas:

  1. Keep the lead – continue to work on our core technology to make it faster than it is today and allow advanced analytical capabilities on the data
  2. We would like to improve the ease of use by adding tools to Parstream
  3. Integration with third-party products, especially those of ETL vendors and IaaS providers

Sam/Andre: Thank you for your time, Michael. We will see you soon at a Bay Area meetup.

ParStream is certainly going up against some big guns in the form of SAP and HP, but it does seem to have a defensible technology and the right sales strategy. Only time will tell how successful it will be in acquiring customers in the US.
