Elasticsearch

  • Author: Roberto Rodriguez (@Cyb3rWard0g)

  • Notes: Download this notebook and use it to connect to your own Elasticsearch database. The BinderHub project might not allow direct connections to external entities on port 9092

  • References:

    • https://medium.com/threat-hunters-forge/jupyter-notebooks-from-sigma-rules-%EF%B8%8F-to-query-elasticsearch-31a74cc59b99

    • https://github.com/target/huntlib

Using Elasticsearch DSL

Pre-requisites:

  • pip install elasticsearch

  • pip install pandas

  • pip install elasticsearch-dsl

Import Libraries

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd

Initialize an Elasticsearch client

Initialize an Elasticsearch client using a specific Elasticsearch URL. Next, you can pass the client to the Search object that we will use to represent the search request in a little bit.

es = Elasticsearch(['http://<elasticsearch-ip>:9200'])
searchContext = Search(using=es, index='logs-*', doc_type='doc')

Set the Query Search Context

In addition, we will need to use the query class to pass an Elasticsearch query_string . For example, what if I want to query event_id 1 events?.

s = searchContext.query('query_string', query='event_id:1')

Run Query & Explore Response

Finally, you can run the query and get the results back as a DataFrame

response = s.execute()

if response.success():
    df = pd.DataFrame((d.to_dict() for d in s.scan()))

df

Using HuntLib (@DavidJBianco)

Pre-requisites:

  • pip install huntlib

Import Library

from huntlib.elastic import ElasticDF

Create Connection

Create a plaintext connection to the Elastic server, no authentication

e = ElasticDF(
    url="http://localhost:9200"
)

Search ES

A more complex example, showing how to set the Elastic document type, use Python-style datetime objects to constrain the search to a certain time period, and a user-defined field against which to do the time comparisons. The result size will be limited to no more than 1500 entries.

df = e.search_df(
  lucene="item:5285 AND color:red",
  index="myindex-*",
  doctype="doc", date_field="mydate",
  start_time=datetime.now() - timedelta(days=8),
  end_time=datetime.now() - timedelta(days=6),
  limit=1500
)