Predict your next investment

INTERNET | Internet Software & Services / Gaming
kylingame.com

Founded Year

2007

Stage

Other Investors | Alive

About Kylin

Kylin is a China-based game developer founded by Shang Jin, a former online game producer for SOHU. It is releasing the history-themed "Cheng Ji Si Han Online" and developing new titles as well.

Kylin Headquarters Location

China

Latest Kylin News

Overhauling Apache Kylin for the cloud

Nov 18, 2021

Apache Kylin was built to query massive relational tables with sub-second response times. A new, fully distributed query engine in Kylin 4 steps up performance of both cubing and queries.

Recently, the Apache Kylin community released a major update with the general availability of Kylin 4. Kylin 4 continues the mission to provide a unified, high-performance, cloud-friendly, open source OLAP (online analytical processing) platform. Kylin 4 upgrades the Kylin architecture to make it easy to deploy and scale in the cloud.

The new release features three major platform updates and myriad other improvements. First, Kylin 4 replaces its previous HBase storage engine with Apache Parquet, making it possible to decouple compute and storage so that each can scale independently. Second, Kylin 4 unifies the compute engine and removes any previous dependencies on the Hadoop ecosystem. This makes resource allocation much more flexible, resulting in a significant reduction in total cloud resource usage and associated costs. Third, by introducing a brand new, fully distributed query engine, Kylin 4 significantly improves cubing duration and query latency compared with previous releases.

In this article, we will dive into the details of these innovations and the new capabilities they enable.

What is Apache Kylin?

Apache Kylin is an open source distributed analysis engine that provides SQL query interfaces above Hadoop and Spark, along with OLAP capabilities to support extremely large data sets. It was initially developed at eBay and contributed to the Apache Software Foundation. Kylin can query massive relational tables with sub-second response times.

Kylin's core idea is the precomputation of result sets, meaning it calculates all possible query results in advance according to the specified dimensions and measures. Kylin basically exchanges space for time to speed up OLAP queries with fixed query patterns.
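
The space-for-time trade-off described above can be illustrated with a minimal Python sketch. This is not Kylin's implementation: the fact rows, the dimension names (year, region, product), and the helper functions build_cube and query are all hypothetical and chosen only for illustration. The sketch precomputes SUM(sales) for every combination of dimensions, so that a later GROUP BY query becomes a lookup rather than a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, region, product, sales).
ROWS = [
    (2020, "EU", "book", 10),
    (2020, "EU", "game", 20),
    (2020, "US", "book", 5),
    (2021, "EU", "book", 7),
    (2021, "US", "game", 30),
]
DIMENSIONS = ("year", "region", "product")  # assumed dimension names


def build_cube(rows, dimensions):
    """Precompute SUM(sales) for every combination of dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for idx in combinations(range(len(dimensions)), size):
            dims = tuple(dimensions[i] for i in idx)
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in idx)
                totals[key] += row[-1]  # last column is the measure
            cube[dims] = dict(totals)
    return cube


def query(cube, dimensions, group_by):
    """Answer a GROUP BY query by looking up the precomputed result."""
    dims = tuple(d for d in dimensions if d in group_by)
    return cube[dims]


cube = build_cube(ROWS, DIMENSIONS)
by_region = query(cube, DIMENSIONS, {"region"})
print(len(cube))   # number of precomputed dimension combinations
print(by_region)   # sales total per region
```

With three dimensions the sketch stores 2^3 = 8 precomputed result sets. That count doubles with each added dimension, which is the storage cost paid for fast lookups.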

Apache Kylin lets you query billions of rows at sub-second latency in three steps:

  1. Identify a star or snowflake schema on Hadoop/Spark.
  2. Build a cube from the identified tables.
  3. Query using ANSI-SQL and get results via ODBC, JDBC, or RESTful API.

How Kylin works

Concepts

[Figure omitted; image credit: Kyligence]

Each combination of dimensions is called a cuboid, and the set of all cuboids is a cube. The cuboid composed of all dimensions is called the base cuboid. All cuboids can be calculated from the base cuboid. A cuboid can be understood as a wide table after precomputation. During a query, Kylin automatically selects the most suitable cuboid that meets the query requirements.

Basic query process

[Figure: query scenario without precomputation; image credit: Kyligence]

The above figure is a scenario without precomputation, which requires on-site calculation. Agg and Join involve a shuffle, so with large amounts of data performance is poor and more resources are occupied, which affects the concurrency of queries.

[Figure omitted; image credit: Kyligence]

After precomputation, the two previously most time-consuming operations (Agg and Join) disappear from the rewritten execution plan, which shows a precise cuboid match. Additionally, when defining the cube we can choose to order by a column, so the Sort operation does not need to be calculated. The whole calculation is a single stage without the expense of a shuffle. It can be completed with only a few tasks, which improves the concurrency of the query.

Cloud-friendly architecture

New storage engine

When Apache Kylin was born, it relied on Hadoop. In Kylin 3.x and before, Kylin used HBase as a storage engine to save the precomputed results generated by cube builds; supported MapReduce, Spark, and Flink as build engines; and used a query engine based on Apache Calcite.

Time in production use and continued development have gradually exposed a variety of problems with this architecture, such as the high maintenance cost of HBase and the performance limitations of the Calcite query engine, which is difficult to scale horizontally. And while HBase, as the database of HDFS, has been excellent in terms of query performance, it still has the following disadvantages:

  • HBase is not real columnar storage.
  • HBase has no secondary index; Rowkey is the only index.
  • HBase has no encoding; Kylin has to do the encoding by itself.
  • HBase is not well suited to cloud deployment and auto-scaling.
  • HBase has different API versions and compatibility issues between them (e.g., 0.98, 1.0, 1.1, 2.0).
  • HBase has different vendor releases and compatibility issues between them (e.g., Cloudera's is not compatible with others).

Facing the above problems, the Apache Kylin community proposed to replace HBase with Apache Parquet and Apache Spark, for the following reasons:

  • Parquet is a mature and stable open source column storage format.
  • Parquet is more cloud-friendly, able to work with most cloud file systems (HDFS, Amazon S3, Azure Blob Storage, Alibaba Cloud Object Storage Service, etc.).
  • Parquet can be tightly integrated with Hadoop, Hive, Spark, Impala, etc.
  • Parquet supports custom indexes.

New Spark build engine

In Kylin 4, the Spark engine is the only build engine. Compared with the build engines in previous versions, the Spark engine has the following characteristics:

  • Kylin 4 simplifies many build steps. For example, Kylin 4 only needs two steps to build a cube: resource detection and cubing.
  • Because Parquet encodes the stored data, an encoding process for dimension dictionaries and dimension columns is no longer needed in Kylin 4.
  • Kylin 4 implements a new global dictionary. For more details, please refer to the Kylin Wiki.
  • Kylin 4 automatically adjusts the parameters of Spark according to available cluster resources and the build job.
  • Kylin 4 improves build performance.

New distributed query engine

[Figure omitted; image credit: Kyligence]

Sparder, the new query engine of Kylin 4, is a distributed query engine implemented on a Spark back end. Compared with the original query engine, Sparder has the following advantages:

  • A distributed query engine eliminates a single point of failure.
  • A unified computation framework (Spark) is used for building and querying.
  • Performance of complex queries increases substantially.
  • It can benefit from new features in Spark and the Spark ecosystem.

[Figure omitted; image credit: Kyligence]

Kylin 4 in the cloud

Cloud computing has many compelling features (unlimited storage capacity, easy maintenance, paying for what you use) that are drawing more enterprises into the public cloud. We see many companies benefiting from moving their on-premises infrastructure to the cloud, achieving goals of lower TCO (total cost of ownership), greater scalability and reliability, and stronger data protection.

On the engineering side, cloud computing also brings changes to the way enterprises design and deploy their software. Modular software architecture makes applications user-friendly and flexible to develop and use.

Kylin 3 relies on Hadoop. Before deploying a Kylin 3 instance, users must prepare a Hadoop cluster including heavy services such as HDFS and HBase. Kylin 3 users must acquire a lot of knowledge about how to maintain and optimize these Hadoop components. Because Kylin 3 has a complex architecture and suffers from reliability and scalability problems, it is not generally suitable for cloud deployment.

All of this changes with Kylin 4. Kylin 4 removes Kylin's dependency on Hadoop components such as Yarn and HBase. The "Kylin plus Spark plus object storage" architecture has less complexity, making deployment in the cloud easier and more manageable. In this new architecture, Parquet replaces HBase and Spark replaces Yarn and MapReduce.

[Figure: how Kylin 4 could be deployed on a public cloud; image credit: Kyligence]

First, the new architecture is lightweight, and fewer components are required than before. Deployment is easier and faster, and most components are stateless; by contrast, HDFS and HBase are stateful services. Statelessness means we can delete these resources when we do not need them. Second, scaling is much easier than before, done simply by adding these components to or removing them from your Spark cluster.

Kylin 4 performance on AWS

Preparation

To help readers understand the performance differences between Kylin 3 and Kylin 4, we have provided a performance benchmark report in a standard software and hardware environment. Amazon EMR was chosen as our benchmark platform. Additionally, we chose TPC-H and SSB as our benchmark standards. The scale factor used in this test is 10 (meaning the fact table has 60 million rows).

The following table shows the aspects compared between different versions in this benchmark report.

[Table not recovered; header: Metrics/Aspect]

Query performance

In big query scenarios (queries that scan and do on-site complex calculations on large numbers of partitions/files), Kylin 3 query optimization is difficult, requiring repeated optimization of HBase RS servers and Kylin query servers. In stress test scenarios, query nodes become unstable because they need to do post-calculation on large data sets, and performance (query latency) degrades over time.

Kylin 4 removes the single bottleneck of the Kylin query server, significantly improving both response time and QPS. Further, performance is stable during the stress test. In the TPC-H query set, the response time of Kylin 4 is improved by 5x to 7x, and its concurrency is improved by 4x.

[Figure: P95 response time of TPC-H query under different concurrency levels; image credit: Kyligence]

In point query scenarios (queries that scan small numbers of partitions/files and do not do many on-site calculations), Kylin 4 can meet the sub-second query latency requirement after some simple parameter adjustments, and its performance is relatively close to that of Kylin 3 (to be specific, only slightly worse).

The CB Insights tech market intelligence platform analyzes millions of data points on venture capital, startups, patents, partnerships and news mentions to help you see tomorrow's opportunities, today.

Expert Collections containing Kylin

Expert Collections are analyst-curated lists that highlight the companies you need to know in the most important technology spaces.

Kylin is included in 1 Expert Collection, including Gaming.

Gaming

4,656 items

Gaming companies are defined as those developing technologies for the PC, console, mobile, and/or AR/VR video gaming market.

Kylin Patents

Kylin has filed 23 patents.

The 3 most popular patent topics include:

  • Plumbing
  • Bathrooms
  • Fluid dynamics

Application Date: 4/19/2020

Grant Date: 7/6/2021

Related Topics: Plumbing, Bathing, Bathrooms, Fluid dynamics, Hygiene

Status: Grant
