ELK Exploration Companion

ELK

ELK (or the Elastic Stack) is the name for the Elasticsearch/Logstash/Kibana stack. Logstash collects log data and ships it to Elasticsearch for searching, and Kibana lets you analyze it. While the tools work independently, and with other software, they play together especially well. To understand what’s going on, let’s look at each one individually. This guide is meant to be a guided tour of each of these services.

Elasticsearch

Elasticsearch is a real-time search and analytics engine. It’s distributed and built on Apache Lucene. Send it requests with curl on the default port, 9200, like:

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

It responds with JSON. Alternatively, you can use Sense (a Kibana app) to query directly from the browser. (More on Kibana later; just keep this in the back of your head.)
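The request template above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper name build_request, the localhost default, and the "logs"/"event" index and type are assumptions made for the example.

```python
import json
import urllib.request


def build_request(verb, path, body=None, query="", host="localhost", port=9200):
    """Build (but do not send) an HTTP request shaped like the curl template."""
    url = f"http://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += f"?{query}"
    # Only attach a JSON body (and the matching header) when one is given.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


# Hypothetical index "logs" and type "event".
req = build_request("GET", "/logs/event/_search", query="q=level:error")
print(req.get_method(), req.full_url)

# Sending it requires a running Elasticsearch node:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything touches the cluster.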

Insert data with PUT /{index name}/{type}/{id}; get with GET /{index name}/{type}/{id}.

Use GET /{index name}/{type}/_search to get all documents; append ?q={field}:{value} for a simple search. For more complex searches, use the query domain-specific language (DSL) by sending a request body to GET /{index name}/{type}/_search.

The DSL allows for queries that are more specific and makes them easier to read. Look at some examples! Searches will often provide relevance scores. You can wrap a filter in constant_score to skip scoring, which suits exact matches.
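As a rough sketch of what a DSL request body looks like, the snippet below builds a constant_score query around a term filter. The function name and the "status"/"published" field and value are hypothetical; only the constant_score, filter, and term keys come from the query DSL.

```python
import json


def exact_match_query(field, value):
    """Return a search body that matches field == value without scoring."""
    return {
        "query": {
            "constant_score": {
                # The term filter looks for the exact, un-analyzed value.
                "filter": {"term": {field: value}}
            }
        }
    }


# Hypothetical field and value, for illustration only.
body = exact_match_query("status", "published")
print(json.dumps(body, indent=2))
```

The printed JSON is what you would send as the body of a _search request.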

Filters let you narrow down your searches; they can (and should) be nested:

  • Bool has four fields, must, should, must_not, and filter; filter runs the query in non-scoring mode.
    • Many filters use bool to deal with multi-part/multi-word queries.
  • Terms can take an array instead of a value, which searches for any that match. To require a match on the whole array, one approach is to index a count field next to it (for example, tag_count next to tags) and filter on both.
  • Use range with gt, lt, gte, and/or lte to search on ranges.
    • For dates, you can use date math like now-1m or {timestamp}||+1M.
      • Be aware of the date-math unit letters though! They are case-sensitive: m is minutes, M is months.
    • String ranges compare in lexicographic order.
  • Dealing with null.
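    • Exists matches documents where the field has a non-null value (if a value was not stored when the document was indexed, it’s null).
    • Missing does the opposite of exists. (It was removed in Elasticsearch 5.0; there, use bool -> must_not with exists instead.)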
    • For both of the above, note that you may need to choose a null_value to store an actual/intentional null value.
    • Exists/missing will reduce to bool -> should on subfields if applicable.
  • Full-text has an analysis phase, unlike term-based.
    • Match is the general “find this thing” query.
    • Multi-word match queries are bool -> should; can use operator: and to change to must.
    • Can add boost: {number} to give a clause more or less weight (default 1).
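Several of the pieces in this list can be combined in one request body. The sketch below is one possible combination, not a recommended query: the function name and the field names (message, tags, @timestamp, host, level) are assumptions for illustration.

```python
import json


def build_bool_query(text, tags, since="now-1M"):
    """Combine full-text and filter clauses in a single bool query."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause; operator "and" requires every word.
                "must": [
                    {"match": {"message": {"query": text, "operator": "and", "boost": 2}}}
                ],
                # Non-scoring clauses that only narrow the result set.
                "filter": [
                    {"terms": {"tags": tags}},
                    {"range": {"@timestamp": {"gte": since, "lt": "now"}}},
                    {"exists": {"field": "host"}},
                ],
                "must_not": [{"term": {"level": "debug"}}],
            }
        }
    }


# Hypothetical search text and tags.
body = build_bool_query("connection refused", ["prod", "web"])
print(json.dumps(body, indent=2))
```

Putting the exact-value clauses under filter keeps them out of the relevance score, so only the match clause affects ranking.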

Aggregations are an important part of Elasticsearch. Buckets are collections of documents that meet a criterion; you set those using aggs in a query. Metrics are statistics on a bucket’s documents, like SQL COUNT, AVG, SUM, etc. You can nest both of these.
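To make the bucket/metric nesting concrete, here is a small sketch of an aggregation body: a terms bucket per value of one field, with an avg metric nested inside each bucket. The function name, the aggregation labels (by_bucket, average), and the example fields are assumptions.

```python
import json


def avg_per_bucket(bucket_field, metric_field):
    """Bucket documents by one field and average another inside each bucket."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "by_bucket": {
                "terms": {"field": bucket_field},
                # Nested metric: computed once per bucket.
                "aggs": {"average": {"avg": {"field": metric_field}}},
            }
        },
    }


# Hypothetical fields: group by host, average the response time.
body = avg_per_bucket("host", "response_ms")
print(json.dumps(body, indent=2))
```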

You can monitor Elasticsearch’s status in Kibana using the X-Pack plugin. Alternatively, you can query it directly with curl:

  • GET _cluster/health returns status, nodes, shards, etc.
  • GET _nodes/stats to check individual nodes and their indices
  • GET _cluster/stats to get similar info to node stats, but aggregated for the whole cluster
  • GET {index}/_stats to get stats about that index, or GET _all/_stats for all
  • GET _cluster/pending_tasks to see if any tasks are pending
  • GET /_cat to see a list of endpoints that return more Linux-like tabular info instead of JSON

Logstash

Logstash is a log formatter/aggregator that operates on a pipeline. Write a pipeline configuration file to describe what you want to happen to the logs. A pipeline includes inputs, filters, and outputs (and codecs). To use a pipeline, run Logstash like “{path}bin/logstash -f {pipeline file} --config.reload.automatic”; the last flag lets Logstash pick up config changes without being stopped.

Inputs are ways that data enters the pipeline. Some common ones:

  • Beats: for example, Filebeat collects logs from server files
  • File, like UNIX tail -0F
  • Syslog listens on port 514
  • Redis for Redis server

Filters are actions taken on the logs during processing. Some common ones:

  • Grok to parse and structure arbitrary text (built in)
  • Mutate for specific changes (rename, remove, replace, modify, etc)
  • Drop / clone
  • GeoIP to add geographic info about IP addresses

Outputs are destinations from the pipeline. Some common ones:

  • Elasticsearch (you want this if you’re doing ELK)
  • File
  • Graphite - another open source tool
  • Statsd, which listens for statistics over UDP

Codecs let you change how data is decoded or encoded as it passes through an input or output (like json, or multiline).

For all parts of the pipeline, you can add more plugins of any of these types to match your needs.
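To tie the inputs, filters, and outputs together, here is a sketch that writes a small pipeline file from Python. The pipeline text is one plausible example rather than a recommended setup: the Beats port 5044, the COMBINEDAPACHELOG grok pattern, the clientip field, the localhost:9200 host, and the file name are all assumptions.

```python
import tempfile
from pathlib import Path

# Example pipeline: Beats input, grok + geoip filters, Elasticsearch output.
PIPELINE = """\
input {
  beats { port => 5044 }
}
filter {
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  geoip { source => "clientip" }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
"""


def write_pipeline(path):
    """Write the example pipeline to path and return it as a Path."""
    target = Path(path)
    target.write_text(PIPELINE, encoding="utf-8")
    return target


# Hypothetical file name, written to the system temp directory.
conf = write_pipeline(Path(tempfile.gettempdir()) / "example-pipeline.conf")
print(conf)
```

The resulting file is what you would pass to logstash with -f.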

Monitoring

  • Can use ?human={true|false} or ?pretty={true|false} to tune for debugging
  • GET commands
    • / for general
    • /_node/pipeline for pipeline info
    • /_node/os for node OS info
    • /_node/jvm for JVM info
    • /_node/plugins for any plugins
    • /_node/stats/{jvm|process|mem|pipeline} to get some useful statistics
    • /_node/hot_threads to see which threads have been using a lot of CPU for a long time
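The monitoring endpoints above can be assembled into URLs with a small helper. This is only a sketch: the function name is hypothetical, and it assumes Logstash’s monitoring API is listening on its default port, 9600, on localhost.

```python
from urllib.parse import urlencode


def monitoring_url(path="/", host="localhost", port=9600, pretty=True, human=False):
    """Build a Logstash monitoring API URL with the pretty/human flags set."""
    params = urlencode({"pretty": str(pretty).lower(), "human": str(human).lower()})
    return f"http://{host}:{port}/{path.lstrip('/')}?{params}"


for path in ("/", "/_node/jvm", "/_node/stats/pipeline", "/_node/hot_threads"):
    print(monitoring_url(path))
```

Each printed URL can be fetched with curl or urllib.request.urlopen once Logstash is running.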

Kibana

Honestly, the best way to get to know Kibana is to try it. Elastic (who makes all ELK components) hosted a demo here: demo.elastic.co/.

Alternatively, set up your own: Install from a package (deb or rpm) and run sudo -i service kibana start to start it. It listens on localhost:5601 by default.

Double alternatively, you can get a Docker image with Kibana and X-Pack already installed, so it can easily connect to Elasticsearch.

  • Quick guide to Kibana
    • Discover lets you see what fields are available, and do quick searches.
    • Visualize lets you make charts out of saved searches. Be sure to click the “play” button to update the visualization with changes.
    • A dashboard can be created out of visualizations to get a sense of what’s going on. I highly recommend making at least something similar to a “server/app status” one.
    • Management just lets you manage saved searches/dashboards, Kibana/search configuration, and reporting to another source.
    • Dev tools help you develop for or with the ELK stack.

X-Pack

X-Pack is not part of the ELK stack, but it’s useful alongside it. It adds sign-on and additional security, alerting conditions, monitoring for the components of ELK, graph visualization in Kibana, and PDF reports.



Last updated: February 24, 2024