What is Elasticsearch?


Elasticsearch is a search engine and part of the Elastic Stack. This article combines core concepts with learning resources to give you a deeper understanding of how Elasticsearch works. It is written from a beginner's perspective. We will walk through various aspects of Elasticsearch, and by the end of the article you will have an overview of it.

About Elasticsearch

In today's business environment, analyzing log data is very important for an enterprise, and the Elastic Stack can be used for this. An important part of log analysis is aggregating and analyzing huge volumes of log data, and Elasticsearch plays a major role here. It is an open source analytics and aggregation platform designed to work together with Logstash and Kibana.

Introduction to Elasticsearch

Elasticsearch is an open-source, distributed, readily scalable enterprise search engine. It is accessible through an extensive REST API. Elasticsearch can power extremely fast searches that support data discovery in your applications, and it makes it easy to analyze and aggregate huge volumes of data. It is among the most popular full-text search engine implementations available. Large companies such as Facebook, GitHub, Netflix and many others have been using it for several years.
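
To make the REST API point concrete, here is a minimal sketch of what a full-text search request can look like. The node address (localhost:9200), the index name "products" and the field name "name" are assumptions made for illustration only. The request is built but deliberately not sent, so the snippet runs without a cluster.

```python
import json
from urllib.request import Request

# Assumed values for illustration: a local node and an index named "products".
ES_URL = "http://localhost:9200"
INDEX = "products"


def build_search_request(text: str) -> Request:
    """Build (but do not send) a full-text search request on the "name" field."""
    body = {"query": {"match": {"name": text}}}
    return Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("wireless keyboard")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a running cluster, the same request object could be passed to urllib.request.urlopen to execute the search.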

Elasticsearch was developed and first released by Shay Banon in 2010. It is powerful and flexible, and it allows real-time analytics in a distributed environment. Elasticsearch uses denormalized data to improve search performance and can also be used as a document store.
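
The idea of denormalized data can be sketched as follows. The customer and order records are made up for illustration. The point is only that each document carries a copy of the related data it needs, so no join is required at search time.

```python
import json

# Hypothetical normalized rows, as they might sit in two relational tables.
customers = {1: {"name": "Ada", "city": "London"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 1, "total": 40.5},
]


def denormalize(order: dict) -> dict:
    """Copy the customer fields into the order so no join is needed later."""
    customer = customers[order["customer_id"]]
    return {
        "order_id": order["order_id"],
        "total": order["total"],
        "customer": {"name": customer["name"], "city": customer["city"]},
    }


# Each resulting dict is a self-contained document ready to be indexed.
documents = [denormalize(order) for order in orders]
print(json.dumps(documents[0], indent=2))
```

The trade-off is duplication: if the customer's city changes, every order document that embeds it has to be updated.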

Who Uses Elasticsearch?

According to a recent report, approximately 2943 companies currently use Elasticsearch in their tech stack, including Uber, Udemy and Facebook.

Facebook has been using Elasticsearch for more than 3 years, having gone from a single enterprise search to over 40 tools across multiple clusters, with 60+ million queries a day and growing.
Netflix uses Elasticsearch to store, index and search documents. Its usage has grown from a couple of isolated deployments to more than 50 clusters comprising nearly 800 nodes, centrally managed by a cloud engineering team.

Terminologies in Elasticsearch

These are a few basic concepts and terms used in Elasticsearch.
1. Node : A node is a single server that stores searchable data and is part of a cluster. If a cluster contains only one node, that node stores all of the data; otherwise each node holds a subset of the cluster's data. Every node participates in the cluster's indexing and search capabilities.

2. Cluster : A cluster is a collection of one or more nodes (servers). A cluster can consist of as many nodes as you need, which makes it extremely scalable. Together the nodes contain all of the data in the cluster, and the cluster provides indexing and search capabilities across all of the nodes. Clusters are identified by a unique name.

3. Index : An index is a collection of documents, such as products or orders. Indexes are identified by names that you choose, and these names must be lowercase. The names are used when indexing, searching, updating and deleting documents within indexes. You can define as many indexes as you want within a cluster.

4. Type : A type represents a class or category of similar documents, for example product or user. A type consists of a name and a mapping, where the mapping does not need to be explicitly defined. A type can be compared to a table within a database. An index can have one or more types, and each can have its own mapping. The type is stored in a metadata field named _type, because Lucene has no concept of document types. Note that types were deprecated in Elasticsearch 6.x and removed in later versions, so this concept applies only to older releases.

5. Mapping : A mapping is similar to the schema of a table in a relational database. It describes the fields that a document of a given type may have, along with their data types, such as string, integer or date. A mapping also includes information on how fields should be indexed and how they should be stored by Lucene. Thanks to dynamic mapping, defining a mapping explicitly is optional.

6. Documents : A document is the basic unit of information that can be indexed. It consists of fields, which are key-value pairs where a value can be of various types such as strings, dates or objects. Documents are expressed in JSON. You can store many documents within an index.

7. Shards : Sharding provides the ability to subdivide your index into multiple pieces called shards. A shard is a fully functional and independent index. The number of shards can be specified when creating an index. Sharding allows you to scale horizontally by content volume and to distribute and parallelize operations across shards, which increases performance.

8. Replicas : Replicas ensure high availability. A replica is a copy of a shard that can take over in case a shard or node fails. A replica never resides on the same node as the original shard, meaning that if a given node fails, the replica will be available on another node. By default, Elasticsearch versions before 7.0 create 5 primary shards and 1 replica for each index; from 7.0 onward the default is 1 primary shard and 1 replica.
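
Several of the terms above can be tied together in one small sketch: the request body for creating an index (settings for shards and replicas plus a mapping), a sample document, and a toy function showing how a document ID could be routed to a shard. The field names and values are made up, and the mapping uses the layout of recent versions, where it is not nested under a type name. The routing function uses CRC32 purely as a stand-in; Elasticsearch uses its own hash function internally.

```python
import json
import zlib

# Example body for creating an index (PUT /<index-name>); values are illustrative.
create_index_body = {
    "settings": {
        "number_of_shards": 3,    # primary shards, chosen when the index is created
        "number_of_replicas": 1,  # copies kept of each primary shard
    },
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "price": {"type": "integer"},
            "created": {"type": "date"},
        }
    },
}

# A document is a JSON object whose fields line up with the mapping.
document = {"name": "Wireless keyboard", "price": 49, "created": "2020-01-15"}


def pick_shard(doc_id: str, primary_shards: int) -> int:
    """Toy routing: hash the document ID, then take it modulo the shard count.

    CRC32 is only a stand-in for the hash function Elasticsearch really uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % primary_shards


shard_count = create_index_body["settings"]["number_of_shards"]
assignments = {doc_id: pick_shard(doc_id, shard_count) for doc_id in ("1", "2", "3", "4")}
print(json.dumps(create_index_body, indent=2))
print(assignments)
```

Because the shard is derived from the ID and the shard count, the same document always lands on the same shard. This is also why the number of primary shards is normally fixed once an index exists.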

Features of Elasticsearch

1. Elasticsearch is scalable up to petabytes of structured and unstructured data.
2. Elasticsearch can be used as a replacement for document data stores.
3. Elasticsearch was originally released as open source under the Apache License version 2.0; newer releases are distributed under different license terms.
4. Elasticsearch is used primarily to implement search.

Advantages of Elasticsearch

1. Elasticsearch is distributed, which makes it easy to scale and integrate into any large enterprise.
2. Elasticsearch supports almost every document type except those that do not support text rendering.
3. Elasticsearch is developed in Java, which makes it compatible with almost every platform.
4. Elasticsearch supports the concept of a gateway, which is used to create full backups.

Difference Between Elasticsearch and SQL

Basis of Comparison | Elasticsearch | SQL (MySQL)
Description | RESTful modern search and analytics engine | Open source RDBMS
Database Model | Search engine | Relational DBMS
Developer | Elastic | Oracle
Implementation Language | Java | C and C++
XML Support | Not supported | Supported
Partitioning Method | Sharding | Horizontal
Replication Method | Yes | Master-master and master-slave
Consistency Concept | Eventual consistency | Immediate consistency
MapReduce | ES-Hadoop Connector | No
Foreign Keys | No | Yes
Transaction Concept | No | ACID
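
As a rough illustration of how the two query styles differ, the sketch below runs a SQL query against an in-memory SQLite table (standing in for any relational database) and shows a comparable Elasticsearch request body next to it. The table, field names and values are made up, and the Elasticsearch body is only constructed, not sent.

```python
import sqlite3

# SQL side: an in-memory SQLite table stands in for a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("keyboard", 49), ("mouse", 19), ("monitor", 199)],
)
rows = conn.execute(
    "SELECT name FROM products WHERE price <= ? ORDER BY price", (50,)
).fetchall()
print(rows)  # [('mouse',), ('keyboard',)]

# Elasticsearch side: a comparable body for POST /products/_search.
es_query = {
    "query": {"range": {"price": {"lte": 50}}},  # WHERE price <= 50
    "sort": [{"price": "asc"}],                  # ORDER BY price
    "_source": ["name"],                         # SELECT name
}
print(es_query)
```

The SQL statement is a string interpreted by the database, while the Elasticsearch query is a JSON structure sent over the REST API.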

Elastic Stack

The Elastic Stack consists of technologies developed and maintained by the company behind Elasticsearch. Elasticsearch, covered above, is the heart of the Elastic Stack: the technologies described next generally interact with Elasticsearch, although that is optional for some of them. There is strong synergy between these technologies, so they are frequently used together for various purposes.

Kibana : Kibana is an analytics and visualization platform, which lets you easily visualize data from Elasticsearch and analyze it to make sense of it. You can think of Kibana as an Elasticsearch dashboard where you can create visualizations such as pie charts, line charts, and many others. Kibana is also where you configure change detection and forecasting. Kibana also provides an interface to manage certain parts of Elasticsearch, such as authentication and authorization.
Kibana uses the data from Elasticsearch and basically just sends queries using the same REST API. It just provides an interface for building those queries and lets you configure how to display the results. This can save you a lot of time because you don’t have to implement all of this yourself.
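
To give a feel for the kind of query a visualization might be built on, here is a sketch of a terms aggregation that could back a pie chart of orders per country. The field name "country" (assumed to be a keyword field) and the response numbers are made up for illustration, and this is not a capture of what Kibana actually sends.

```python
# Hypothetical request body for POST /<index>/_search.
pie_chart_request = {
    "size": 0,  # return only aggregation results, no individual documents
    "aggs": {
        "orders_per_country": {
            "terms": {"field": "country", "size": 5}
        }
    },
}

# Made-up, trimmed example of the response shape for such a request.
sample_response = {
    "aggregations": {
        "orders_per_country": {
            "buckets": [
                {"key": "US", "doc_count": 120},
                {"key": "DE", "doc_count": 45},
            ]
        }
    }
}


def pie_slices(response: dict, agg_name: str) -> dict:
    """Turn aggregation buckets into the fraction of the pie each key gets."""
    buckets = response["aggregations"][agg_name]["buckets"]
    total = sum(bucket["doc_count"] for bucket in buckets)
    return {bucket["key"]: bucket["doc_count"] / total for bucket in buckets}


slices = pie_slices(sample_response, "orders_per_country")
print(slices)
```

Building the query, sending it and turning buckets into chart slices is the work a dashboard tool takes off your hands.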

Logstash : Traditionally, Logstash has been used to process logs from applications and send them to Elasticsearch. That is still a popular use case, but Logstash has evolved into a more general-purpose tool: a data processing pipeline. The data that Logstash receives is handled as events, which can be log file entries, ecommerce orders, customers, chat messages, etc. These events are then processed by Logstash and shipped off to one or more destinations, for example Elasticsearch, a Kafka queue, an e-mail message, or an HTTP endpoint.
A Logstash pipeline consists of three stages: inputs, filters, and outputs. Each stage can make use of a so-called plugin. An input plugin could be a file, for instance, meaning that Logstash will read events from a given file. Events could also be sent to Logstash over HTTP, or Logstash could look up rows from a relational database or listen to a Kafka queue. A filter plugin defines how Logstash should process the events. Here we can parse CSV, XML, or JSON, for instance. We can also do data enrichment, such as looking up an IP address and resolving its geographical location, or looking up data in a relational database. An output plugin is where we send the processed events; formally, those destinations are called stashes.
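
The three-stage idea can be sketched in a few lines of Python. This is not Logstash configuration and not how Logstash is implemented; it only mirrors the input, filter and output stages with plain functions. The CSV content and the field names are made up.

```python
import csv
import io
import json

# Made-up CSV input standing in for a file that an input plugin would read.
RAW_INPUT = "user,action\nalice,login\nbob,purchase\n"


def read_events(text):
    """Input stage: turn each CSV row into an event (a dict)."""
    yield from csv.DictReader(io.StringIO(text))


def enrich(events):
    """Filter stage: process each event, here by adding a derived field."""
    for event in events:
        event = dict(event)
        event["is_purchase"] = event["action"] == "purchase"
        yield event


def write_events(events, stash):
    """Output stage: ship each processed event to a destination (a list here)."""
    for event in events:
        stash.append(json.dumps(event))


stash = []
write_events(enrich(read_events(RAW_INPUT)), stash)
print("\n".join(stash))
```

Because each stage only consumes and produces events, stages can be swapped independently, which is the property the plugin model relies on.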

X-Pack : X-Pack is a pack of features that adds functionality to Elasticsearch and Kibana in several areas.

Security : X-Pack adds both authentication and authorization to Kibana and Elasticsearch. With regard to authentication, Kibana can integrate with LDAP, Active Directory and other technologies.

Monitoring : X-Pack enables you to monitor the performance of the Elastic Stack, that is, Elasticsearch, Logstash, and Kibana. Specifically, you can see CPU and memory usage, disk space, and many other useful metrics, which enable you to stay on top of performance and easily detect any problems.

Alerting : X-Pack also provides alerting. It is not specific to the monitoring of the Elastic Stack, though, as you can set up alerting for anything you want.

Beats : Beats is a collection of so-called data shippers. They are lightweight agents with a single purpose that you install on servers, which then send data to Logstash or Elasticsearch. There are a number of these data shippers, called beats, that collect different kinds of data and serve different purposes.
For example, there is a beat named Filebeat, which is used for collecting log files and sending the log entries off to either Logstash or Elasticsearch. Filebeat ships with modules for common log files, such as nginx, the Apache web server, or MySQL. This is very useful for collecting log files such as access logs or error logs.
Another beat is Metricbeat, which collects system-level and/or service metrics. You can use it for collecting CPU and memory usage for the operating system, and for any services running on the system as well. Metricbeat also ships with modules for popular services such as nginx or MySQL, so you can monitor how they perform.
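
As a loose illustration of what "shipping" log entries to Elasticsearch can involve, the sketch below turns a few log lines into a newline-delimited body in the shape the _bulk API accepts (an action line followed by a document line). The log lines and the index name "weblogs" are made up, and this is not a description of how Filebeat itself is implemented.

```python
import json

# Made-up log lines standing in for the tail of an access log.
LOG_LINES = [
    "GET /index.html 200",
    "GET /missing 404",
]


def to_bulk_body(lines, index):
    """Build a newline-delimited body: one action line plus one document per entry."""
    parts = []
    for line in lines:
        parts.append(json.dumps({"index": {"_index": index}}))  # action line
        parts.append(json.dumps({"message": line}))             # document line
    return "\n".join(parts) + "\n"  # the bulk body must end with a newline


bulk_body = to_bulk_body(LOG_LINES, "weblogs")
print(bulk_body, end="")
```

Batching many entries into one request keeps the agent lightweight, since it avoids one HTTP round trip per log line.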

Best Elasticsearch Books

1. Elasticsearch in Action

Author :- Radu Gheorghe, Matthew Lee Hinman, Roy Russo
Edition :- 2015 Edition
Published by :- Manning Publications

Elasticsearch in Action teaches you how to write applications that deliver professional quality search. As you read, you'll learn to add basic search features to any application, enhance search results with predictive analysis and relevancy ranking, and use saved data from prior searches to give users a custom experience. This practical book focuses on Elasticsearch's REST API via HTTP. Code snippets are written mostly in bash using cURL, so they're easily translatable to other languages.

2. Elasticsearch: The Definitive Guide

Author :- Clinton Gormley, Zachary Tong
Edition :- 2015 Edition
Published by :- O'Reilly Media

The reader with a search background will also benefit from this book. Elasticsearch is a new technology that has some familiar concepts. The more experienced user will gain an understanding of how those concepts have been implemented and how they interact in the context of Elasticsearch. Even the early chapters contain nuggets of information that will be useful to the more advanced user.

Best Elasticsearch Courses And Tutorials

1. Complete Guide to Elasticsearch (Udemy)

2. Elasticsearch: Getting Started (Pluralsight)
