Splunk Regex Cheat Sheet



Intro: What is Splunk

Regex in Splunk SPL: “A regular expression is an object that describes a pattern of characters.” Regular expressions are used to perform pattern matching and ‘search-and-replace’ operations. Regular expression flags:

  • i – Ignore case
  • m – ^ and $ match the start and end of each line
  • s – . matches newline as well
  • x – Allow spaces and comments
  • J – Duplicate group names allowed
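Splunk’s rex and regex commands use PCRE, where these flags can be switched on inline, e.g. (?i) at the start of a pattern. Python’s re module is a close but not identical relative of PCRE, so the following is only a sketch of what i, m, s and x do, run against three made-up log lines. Python has no equivalent of J: it rejects a pattern that reuses a group name.

```python
import re

# Three made-up log lines for illustration.
log = "ERROR disk full\nwarn cpu high\nINFO all good"

# i: ignore case. re.M is added so ^ anchors at every line.
errors = re.findall(r"(?i)^error\b", log, re.M)

# m: ^ and $ match at the start/end of each line, not just the whole string.
levels = re.findall(r"^(\w+)", log, re.M)

# s: "." also matches a newline, so one match can span several lines.
spans_lines = re.search(r"(?s)ERROR.*INFO", log) is not None

# x: whitespace and # comments inside the pattern are ignored.
pattern = re.compile(
    r"""
    ^(?P<level>[A-Za-z]+)   # log level at the start of a line
    \s+(?P<word>\w+)        # first word after the level
    """,
    re.X | re.M,
)
fields = [m.groupdict() for m in pattern.finditer(log)]

print(errors)       # ['ERROR']
print(levels)       # ['ERROR', 'warn', 'INFO']
print(spans_lines)  # True
print(fields[0])    # {'level': 'ERROR', 'word': 'disk'}
```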

Splunk turns Machine Data Into Answers

  • Real-Time – Splunk gives you the real-time answers you need to meet customer expectations and business goals.
  • Machine Data – Use Splunk to connect your machine data and gain insights into opportunities and risks for your business.
  • Scale – Splunk scales to meet modern data needs — embrace the complexity, get the answers.
  • AI and Machine Learning – Leverage artificial intelligence (AI) powered by machine learning for actionable and predictive insights.
  • Reporting health conditions in real time
  • Delve deeper into the patient’s health record and analyze patterns
  • Alarms / Alerts to both the doctor and patient when the patient’s health degrades

Splunk is the engine for machine data

  • Machine data is more than just logs – it’s configuration data, data from APIs and message queues, change events, the output of diagnostic commands and more
  • Log types: Application, Web Access and Proxy, Call Detail Records (CDR), Clickstream, Message Queues, Packet, Database audit and tables, File audit, Syslog, WMI, PerfMon

Quick and easy way to…

  • Easily visualize the data as events rather than lines of text
  • Quickly get the data properly broken into events
  • Accurately get the timestamp extracted
  • All in a wicked cool GUI… Once everything is good, you take your props.conf settings and deploy them

Splunk structure

Test Environment

  • Every Splunk deployment should have a test environment
  • It can be a laptop, virtual machine or spare server
  • Should run the same version of Splunk as production
  • Accessible to other Splunk developers and administrators

CONSIDERATIONS TO KEEP IN MIND when installing Splunk

The following considerations need to be taken into account before installing and configuring:

  • 1. Disk capacity
  • 2. CPU performance
  • 3. SSH as best practice for app configurations
  • 4. SE/CIM setup
  • 5. Universal forwarder config/install

Planning for Splunk setup

Setting up a Splunk AWS instance. Instance URL: ec2-1-2-3-4.eu-west-1.compute.amazonaws.com

Diagram of systems with a single EC2 instance acting as the All-In-One (AIO) server. Only the UF agent (installed manually on clients) and the TAs (pushed to clients via the deployment apps on the server, no manual install) are installed on remote clients/hosts.

The AIO server comprises all of these modules:

  • Search Indexer
  • Deployment Apps
  • SE
  • CIM
  • Here Splunk is set to keep 14 days of logs (retention is configured per index, it is not a fixed Splunk default). Keeping 6 or 12 months would be overkill: it is measured in TB, which does not justify the storage volumes
  • Data ageing: buckets roll from HOT/WARM to COLD and finally to FROZEN (archived, or deleted if no archive is configured)
  • Capacity planning is key to a healthy Splunk
  • The Monitoring Console is the health-check area
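To put numbers on the retention trade-off, here is a back-of-the-envelope Python sketch. The 20 GB/day ingest figure is hypothetical, and the default disk_ratio of 0.5 is an assumed rule of thumb (compressed raw data plus .tsidx files often land around half the raw size); real ratios depend on the data, so measure your own before sizing disks.

```python
def estimate_disk_gb(daily_ingest_gb: float, retention_days: int,
                     disk_ratio: float = 0.5) -> float:
    """Rough on-disk size for one index.

    disk_ratio is an assumed rule of thumb for compressed raw data plus
    index files; it varies with the data being indexed.
    """
    return daily_ingest_gb * retention_days * disk_ratio


daily = 20.0  # hypothetical 20 GB/day of incoming logs
for days in (14, 182, 365):
    print(f"{days:>3} days -> {estimate_disk_gb(daily, days):,.0f} GB")
```

At that rate, 14 days needs roughly 140 GB, while a full year needs about 3.65 TB, which is the "measured in TB" point above.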

Apps to Install:

  • Common Information Model (SE/CIM)
  • Index volumes on the indexer: always use the local folder, do not edit the default folder. The config file is indexes.conf
  • Splunk gives settings in LOCAL precedence over those in DEFAULT
  • Install apps via SSH as a best practice, always keeping configs in the LOCAL folder (create it if missing) rather than the DEFAULT one
  • It’s best to test configs/installs on a DEV-SPLUNK box using the 60-day trial; after that it converts to the free license with 500 MB/day of indexed data!
  • Data is not stored in a SQL database: each bucket holds the compressed raw data (journal.gz) plus .tsidx index files that point into it
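The LOCAL-over-DEFAULT rule can be pictured as layering: the default file is read first and the local file overrides it key by key, while keys that local does not mention keep their default values. This Python sketch uses configparser on hypothetical file contents; real Splunk merges more layers (system, app and user directories), and splunk btool indexes list --debug shows the effective result on a live instance.

```python
import configparser

# Hypothetical contents of default/indexes.conf and local/indexes.conf.
DEFAULT_INDEXES = """
[main]
frozenTimePeriodInSecs = 188697600
maxTotalDataSizeMB = 500000
"""

LOCAL_INDEXES = """
[main]
frozenTimePeriodInSecs = 1209600
"""


def merge_layers(*layers: str) -> dict[str, dict[str, str]]:
    """Merge config text layers; later (higher-precedence) layers win per key."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep Splunk's camelCase setting names
    for text in layers:  # lowest precedence first
        parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


merged = merge_layers(DEFAULT_INDEXES, LOCAL_INDEXES)
print(merged["main"])
```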

PREPARATIONS

1. Prepare Drives

Live-Splunk-App1 has the following:

  • system drive – 20GB (system)
  • primary drive – 300GB (data drive, holding HOT data)
  • secondary drive – 100GB (holding FROZEN data, past 10/14 days as configured)

List of apps command:

/MNT/DATA is the 300GB DATA drive. A splunkdata folder needs to be created on it, and the splunk user must then be given access to manage the folder

Rebooting to refresh config:

2. Prepare indexer base configs

  • Editing indexes and configs usually requires a restart of the Splunk service
  • Time-based settings in Splunk configs are mostly measured in seconds (e.g. frozenTimePeriodInSecs)
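Because retention is set in seconds, it helps to convert days explicitly rather than doing the arithmetic by hand. A minimal Python sketch that renders an indexes.conf stanza for a hypothetical win_security index; homePath, coldPath, thawedPath and frozenTimePeriodInSecs are standard indexes.conf settings, but the index name and paths here are placeholders, and the output belongs in a local folder.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_seconds(days: int) -> int:
    """Retention settings in indexes.conf are expressed in seconds."""
    return days * SECONDS_PER_DAY


def index_stanza(name: str, retention_days: int) -> str:
    """Render an indexes.conf stanza for a hypothetical index."""
    return "\n".join([
        f"[{name}]",
        f"homePath = $SPLUNK_DB/{name}/db",
        f"coldPath = $SPLUNK_DB/{name}/colddb",
        f"thawedPath = $SPLUNK_DB/{name}/thaweddb",
        # Buckets older than this roll to frozen (deleted unless archiving is set up).
        f"frozenTimePeriodInSecs = {days_to_seconds(retention_days)}",
    ])


print(index_stanza("win_security", 14))
```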

3. Prepare SE / CIM

  • Lookup Editor
  • SA CIM
  • Splunk_TA_nix
  • Security Essentials (.zip)
  • TAs (Technology Add-ons) need their permissions set up, since they contain scripts

Then restart Splunk. The apps are now visible on the left, and so are the DATA MODELS

4. Prepare Universal Forwarders

DOMAIN_all_deployments

DOMAIN_all forwarders

Ports need to be whitelisted: 8089, 8081, 8082, 9997, etc. (see the ports section further down for common ports)

The agent is installed with a quiet (silent) command


5. Prepare SPLUNK APPS

Splunk Server is v7

  • Agents should ideally match the server version or be older. The latest v7.1 is a bit risky to use; it might work, but keep that in mind
  • Agents are downloaded and copied to the web servers, and installation is run with a quiet (silent) command (see the msiexec command in step 2.1 below)
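The "same version or older" rule is easy to get wrong if versions are compared as strings, because "7.10.0" sorts before "7.9.0". A small Python sketch that compares them numerically; the server version 7.0.3 in the usage lines is hypothetical, and the build hash in installer names (as in the msiexec file name in step 2.1) is ignored.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """'7.1.1-8f0ead9ec3db' -> (7, 1, 1); the build hash after '-' is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def forwarder_ok(forwarder: str, server: str) -> bool:
    """True when the forwarder is the same version as the server, or older."""
    return parse_version(forwarder) <= parse_version(server)


print(forwarder_ok("7.0.3", "7.1.1"))               # True: older forwarder
print(forwarder_ok("7.1.1-8f0ead9ec3db", "7.0.3"))  # False: forwarder is newer
print("7.10.0" < "7.9.0")                           # True, which is why string comparison is wrong
```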

Server Classes:

Create a server class, e.g. all_windows_server_test. Then edit the class to include the relevant IP/DNS/hostname (whitelist by IP, hostname or DNS name), add the APPS, edit each app, click to include it, and then SAVE.

Deploying an app RESTARTS the agent (if the app is set to restart splunkd)

Forwarder agent installation: once it is installed, click EDIT to check whether the app is installed

Once installed, the internal logs will start being pushed (useful for troubleshooting and as proof)

6. Prepare TA_AGENTS

TA agents are important: they define what the Universal Forwarder agent collects and pushes to Splunk

Unzip the file into /deployment-apps/

Then set the ownership:

chown -R splunk:splunk /opt/splunk/etc/deployment-apps/

su splunk

pwd

cd splunk_TA_windows

DEPLOYMENT

Forwarder Management > Edit > click to move the apps to the right. Now we have 3 apps deployed

Then check that the TA works under Splunk > Volume > Instances, thus confirming that Windows logs are being collected

Changes need to be applied:

OVERALL > SETTINGS > MONITORING CONSOLE > APPLY settings

If Windows Security logs are not showing, Windows audit logging needs to be enabled in MMC

7. Review by running some searches/reports

Generic app installation steps

1. Splunk Admin

Settings > Forwarder Management (top right)

Server classes > Create new class: LIVE (a new group for LIVE servers). A new class is needed for each new GROUP of servers

Then we have two areas:

ADD APPS – All three apps – selected to be installed

ADD CLIENTS – defines which servers to add

Include (whitelist): preferably allow the whole VPC range or the server IP, e.g. adding 10.1.100.* (NOTE: DNS names did not work in this setup; Splunk could not resolve the hostname, even when it was visible in the GUI)

Note: the AWS gateway must be whitelisted for servers with a private IP, using the VPC gateway’s public IP
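Server-class whitelists such as 10.1.100.* are wildcard patterns. As a rough Python approximation (not Splunk’s own matching code), fnmatchcase can model the include/exclude logic, with a blacklist entry removing a client that the whitelist would otherwise admit. The client IPs are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def in_server_class(client: str, whitelist: Sequence[str],
                    blacklist: Sequence[str] = ()) -> bool:
    """Approximate server-class filtering: the client must match a whitelist
    pattern and must not match any blacklist pattern ('*' is a wildcard)."""
    allowed = any(fnmatchcase(client, pattern) for pattern in whitelist)
    denied = any(fnmatchcase(client, pattern) for pattern in blacklist)
    return allowed and not denied


live_whitelist = ["10.1.100.*"]
for ip in ("10.1.100.25", "10.1.101.5"):
    print(ip, in_server_class(ip, live_whitelist))
```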

2. INSTALL THE AGENT

2.1. The agent is downloaded and silently installed via the command line. Go to the download folder and execute the following:

msiexec.exe /i splunkforwarder-7.1.1-8f0ead9ec3db-x64-release.msi DEPLOYMENT_SERVER="1.2.3.4:8089" AGREETOLICENSE=Yes SPLUNKPASSWORD=RELEVANT_CONPL /quiet

2.2. In the firewall, whitelist the ProgramFiles > bin/splunkd.exe file

2.3. Enable Windows Security logs in Local Security Policy!!! (choose the preferred success/failure audits)

2.4. Note: the AWS GATEWAY must be whitelisted in SPLUNK ADMIN

2.5. Splunk management > Forwarder Management: the new server is now listed

2.6. Then, to push apps to the agent servers, the deploy-server must be reloaded:

su splunk

/opt/splunk/bin/splunk reload deploy-server

(or, in one step: sudo -u splunk /opt/splunk/bin/splunk reload deploy-server)

2.7. Troubleshoot if the agent is not connecting

Open the logs in C:/ProgramFiles/UniversalForwarder/var/logs.. and read them

The logs listed the Splunk server by its internal IP, which the agent could not reach. So an additional outputs.conf edit was required to add the Splunk server by its PUBLIC IP as well!!!
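The outputs.conf fix described above amounts to listing the indexer under an address the agent can actually reach. A hedged Python sketch that renders a minimal file: [tcpout], defaultGroup and server are real outputs.conf settings, while the group name primary_indexers and the internal IP 10.1.100.10 are hypothetical (1.2.3.4 is the placeholder used earlier). When several servers are listed, the forwarder load-balances across them.

```python
import configparser


def outputs_conf(group: str, servers: list[str]) -> str:
    """Render a minimal outputs.conf that forwards to every listed receiver."""
    lines = [
        "[tcpout]",
        f"defaultGroup = {group}",
        "",
        f"[tcpout:{group}]",
        # Comma-separated host:port list; the forwarder load-balances across it.
        "server = " + ", ".join(servers),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical internal IP plus the public placeholder address used earlier.
text = outputs_conf("primary_indexers", ["10.1.100.10:9997", "1.2.3.4:9997"])
print(text)

# Sanity check: the rendered text parses back into the expected settings.
parsed = configparser.ConfigParser(interpolation=None)
parsed.optionxform = str
parsed.read_string(text)
print(parsed["tcpout:primary_indexers"]["server"])
```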

3. Once installed, a verification can be done via SEARCH:

index=_internal | stats count by host

Handy Info

Diagrams – Overview of Splunk systems

Optimisation

  • Whitelist or Blacklist Windows Events
  • This will selectively include or exclude events from collection on a Windows forwarder
  • Available feature on 6.x or greater Windows forwarders
  • All controlled through inputs.conf on the Windows forwarders

Example:

[WinEventLog://Security]
whitelist = 4,5,7,100-200

[WinEventLog://Security]
blacklist = EventCode=%^200$% User=%duca%
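The simple whitelist format above is a comma-separated list of event codes and ranges. This Python sketch parses it to show which codes would be collected, treating ranges as inclusive; it only illustrates the format and is not how the forwarder itself evaluates it.

```python
def parse_event_codes(spec: str) -> list[range]:
    """Parse a simple-format list such as '4,5,7,100-200' into inclusive ranges."""
    ranges: list[range] = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            low, high = (int(x) for x in item.split("-", 1))
            ranges.append(range(low, high + 1))  # include the upper bound
        else:
            code = int(item)
            ranges.append(range(code, code + 1))
    return ranges


def is_collected(event_code: int, whitelist: str) -> bool:
    """True if the event code falls in any listed code or range."""
    return any(event_code in r for r in parse_event_codes(whitelist))


spec = "4,5,7,100-200"
print([code for code in (4, 6, 150, 200, 201) if is_collected(code, spec)])
```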

  • Provides reliable and consistent indexing of data with headers
  • Set on the forwarder in props.conf:

INDEXED_EXTRACTIONS = {CSV | W3C | TSV | PSV | JSON}

  • Supports custom header parsing and easy mode for common formats
  • Extract IIS fields using props.conf on the Windows forwarder:

[IIS]
INDEXED_EXTRACTIONS = w3c

  • Modular Inputs – a Splunk Enterprise app or add-on that extends the Splunk Enterprise framework to define a custom input capability. Examples: Check Point OPSEC, Twitter, Stream, Amazon S3 online storage
  • Scripted Inputs – a scripted input is used to get data from application program interfaces (APIs) and other remote data interfaces and message queues. Examples: VMStat, Top, iostat
  • Scripted Inputs Example – a shell script saved in /opt/splunk/bin/scripts/ or in a specific app. It allows you to execute any program on a Splunk forwarder and index its STDOUT data
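A scripted input can be any program whose STDOUT Splunk indexes. As a minimal, hypothetical example, this Python script emits one key=value event with disk usage, which Splunk’s automatic key/value extraction can turn into fields. It would be registered with a stanza such as [script://./bin/disk_usage.py] (hypothetical name) in the app’s inputs.conf, with an interval setting; the script and field names are illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical scripted input: print one disk-usage event to STDOUT."""
import shutil
import time


def disk_event(path: str = "/") -> str:
    """Build a single key=value event describing disk usage at `path`."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    timestamp = time.strftime("%Y-%m-%dT%H:%M:%S%z")
    # key=value pairs suit Splunk's automatic field extraction.
    return (f"{timestamp} path={path} total_bytes={usage.total} "
            f"used_bytes={usage.used} used_pct={used_pct:.1f}")


if __name__ == "__main__":
    # Whatever the script writes to STDOUT is what Splunk indexes.
    print(disk_event("/"), flush=True)
```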

  • Splunk DB Connect is also an option – Allows for indexing data directly from database queries.
  • DB Connect Best Practice:

— Normalize timestamps natively inside the SQL query

— Filter results down in the SQL query to reduce garbage in the Splunk index.

— Repeated DB lookups should be converted to static lookups


— Search Head Pooling requires encrypted password replication

— Search Head Clustering Supported
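The DB Connect advice to normalize timestamps and filter inside SQL can be tried out without Splunk. This sketch uses Python’s built-in sqlite3 as a stand-in database with a made-up audit table; the strftime call is SQLite-specific, so the equivalent function will differ on other databases.

```python
import sqlite3

# In-memory stand-in for an application database (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (ts INTEGER, username TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?)",
    [(1530000000, "alice", "login"),
     (1530000060, "bob", "heartbeat"),
     (1530000120, "carol", "delete")],
)

# Normalize epoch seconds to ISO 8601 UTC and drop noise rows in SQL,
# so only useful, consistently timestamped rows would reach the index.
query = """
    SELECT strftime('%Y-%m-%dT%H:%M:%SZ', ts, 'unixepoch') AS event_time,
           username, action
    FROM audit
    WHERE action != 'heartbeat'
    ORDER BY ts
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```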

  • Splunk App For Stream – Provides the ability to capture real-time streaming wire data from anywhere in your datacenter or from any public Cloud infrastructure (Win, Mac, Unix)
  • Splunk Stream DNS Capture – Full DNS Queries without logging enabled

Ports used by Splunk

Common ports listed below (All ports are TCP)

  • 9997 for forwarders to the Splunk indexer. 9997 is not a default, just a convention: you need to set it explicitly on the receiving instance (indexer). Splunk’s port diagrams also show optional flows on port 9997 from the search heads, deployment server, license server, and cluster master to the indexers; these are used for forwarding Splunk’s internal indexes (a recommended best practice).
  • 8000 for clients to the Splunk Search page
  • 8089 for splunkd (also used by deployment server).

Optional ports for distributed systems:

  • 8080 – Indexer Replication port
  • 514 – Network input port (commonly syslog)
  • 8191 – KV store port (since v6.2)
  • Search Head Clustering uses a replication port that you pick, e.g. 8181, which must be available to all other members. SHC also needs the KV store port (by default 8191) to be available to all other members. You can use the CLI command splunk show kvstore-port to identify the port number.

Note: there is some confusion about the port required from UFs to a heavy forwarder (HF). It is 9997 too, and many deployments use the same server as both HF and deployment server (DS).

UFs --9997--> HF --9997--> Indexers
UFs, Indexers, SHs --8089--> DS


Port directions are generally as below. Use tcpdump to verify.

  • 8089 for the deployment server is only needed from the client to the deployment server, the client being an indexer, UF, etc.
  • 9997 from the forwarder to the indexer. No connection is needed back from the indexers.
  • 8089 is also used from a search head to your indexers. Again, only in a single direction.
  • Port 8089 for the license master (from license slave to license master)
  • Port XXXX for the replication cluster master and slaves.

Source: https://answers.splunk.com/answers/58888/what-are-the-ports-that-i-need-to-open.html
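Before reaching for tcpdump, a plain TCP connect from the client side is a quick way to confirm a port is open in the expected direction. A minimal Python sketch; it self-checks against a throwaway local listener, and for real use you would call port_open with your own indexer or deployment server address (for example 9997 from a forwarder, or 8089 from a deployment client). A successful connect only proves that something accepts connections on that port.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


# Self-check against a throwaway local listener (no network needed).
with socket.socket() as listener:
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    local_port = listener.getsockname()[1]
    reachable = port_open("127.0.0.1", local_port)

closed = port_open("127.0.0.1", local_port)  # listener is gone, so refused
print(reachable, closed)
```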

Writing Effective Queries for Splunk with SPL

Splunk is arguably one of the most popular and powerful tools across the security space at the moment, and for good reason. It is an incredibly powerful way to sift through and analyze big sets of data in an intuitive manner. SPL is the Splunk Processing Language which is used to generate queries for searching through data within Splunk.
The organization I have in mind when writing this is a SOC or CSIRT, in which large-scale hunting via Splunk is likely to be conducted, though it can apply just about anywhere. It is key to have relevant data sets against which to properly vet queries. Fortunately, there are many example data sets available for testing on GitHub, from Splunk, and some mentioned below. There are also “data generators” which can generate noise for testing. Best of all would be to create your own though :).
I was fortunate to have had the enjoyable experience of participating in a Boss of the SOC CTF a few years back, which had some pretty good exemplar security-related data. Earlier this year, they released the data set publicly.
This guide is not meant to be a deep dive into the structuring of a query using SPL. The best place for that is the Splunk documentation itself. This is geared more towards operations in which multiple queries are written, maintained, and used in an operational capacity. Many of these concepts can be generalized and applied to other signatures, rules, code or programmatic functions, such as Snort, YARA, or ELK, in which a large quantity of multi-version discrete units must be maintained.

1. Balance efficiency with enough specificity to minimize false positives

The ultimate goal of any Splunk query is to search and present data in order to answer some question(s). There are many right ways to search in Splunk, but there are often far fewer best ways (yes, multiple bests, see next sentence). Before formulating a search query, a couple of considerations should be weighed and prioritized, such as accuracy, efficiency, clarity, integrity, and duration. It is easy to get spoiled by simply doing wildcard searches, but also just as easy to unnecessarily bog down a search with superfluous key-value mappings. An over-reliance on either can lead to problems.
Accuracy – are there multiple sources which can answer the question? If so, which is more reliable and authoritative? More importantly, how important is it to reduce or eliminate false positives from your results? There is a heavy inverse correlation between accuracy and efficiency.
Clarity – filtering down to the most relevant information needed to answer the question is only half of the battle: you still need to interpret it. It may be fine to view the results as raw data if there are only one or two results of non-complex data, but when there are rows of deeply structured data, taking the time to present it in the most appropriate manner will go a long way.
Duration – the length required for the query to complete. Is this a search that will be run often, and so delays are additive and add to total inefficiency; is there an urgent need to answer something ASAP; is a longer duration eating up resources on other running functions on the search head? Sometimes it is necessary to break a search into smaller sub-searches or to target smaller sets of data and then pivot from there.
Efficiency – closely tied to duration, an inefficient query will lead to unnecessary delays, excessive resource consumption, and could even affect the integrity of the data (pay close attention to implicit limitations of results on certain commands!). Paying attention to efficiency is especially important if there are per-user limitations on number of searches, memory usage, or other constraints. Too many explicitly defined wildcard placeholders could become very expensive, and the atomicity of a formulated query should always be considered.
Integrity – will you be manipulating any data as part of your search? If so, understand the risks to compromising the integrity of your results in doing so. The more pivots made on returned data, the more susceptible to loss of integrity the search becomes.

2. Make it readable

Write queries in a consistent and clear manner. Sometimes it is better to let a query take up many additional lines for the sake of readability. Breaking onto a new line at each pipe is the de facto standard for readability.
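A small Python helper can apply the one-command-per-line convention automatically. This is a hypothetical sketch, not a Splunk tool: it splits on pipes outside double quotes, and it does not understand subsearches in square brackets.

```python
def format_spl(query: str) -> str:
    """Put each pipe-delimited SPL command on its own line.

    Pipes inside double-quoted strings are left alone. This is a readability
    helper, not an SPL parser: pipes inside subsearches ([...]) are split too.
    """
    parts: list[str] = []
    current: list[str] = []
    in_quotes = False
    for i, ch in enumerate(query):
        if ch == '"' and (i == 0 or query[i - 1] != "\\"):
            in_quotes = not in_quotes  # toggle on unescaped double quotes
        if ch == "|" and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # A query starting with "|" (e.g. "| makeresults") yields an empty first part.
    return "\n| ".join(parts).lstrip("\n")


print(format_spl("index=main sourcetype=syslog | stats count by host | sort - count"))
```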

3. Make it extensible

Queries should be written in such a way that other people can modify them for their own adaptations, or update or expand the current ones. Some ways to accomplish this are using obvious variable names, keeping them readable, or even leaving in inexpensive functionality or variables which can be used for other purposes.

4. Make it modular

Modularity will lead to extensibility, maintainability, and resiliency. This will also increase efficiency as code reuse will be much simpler.

5. Make it feasible

If the query is written for the purpose of manual sifting and analysis, then 50k results is not very reasonable. However, if it is for stateful preservation, alerts, or lookups, then that is more acceptable. Incorporating pivots on the information with subsearches and filtering or even, if necessary, breaking it up into multiple different queries will make managing the results a surmountable task.

6. Make it resilient

The data can change and so can the SPL itself (or even custom commands if used), so writing queries that are less affected by potential changes is important, especially if the effects of the changes are not obvious, which could lead to a loss of integrity in the results. (This is where testing is also important.)

7. Make it consistent


Having a style guide may seem like overkill, but if your operation is highly dependent on maintaining a repository of queries, it can go a long way. Naming conventions, spacing, line breaks, use of quotations, ordering, and style are some of the things to standardize to help with consistency.

8. Make it identifiable

Something as simple as giving each query a short ID (e.g. wxp-110) goes a long way.

This ID can then be printed out with the results if needed or purely used as a means to categorize and quickly identify. Naming conventions should be obvious or recognizable (wxp = Windows XP, query 110), or even mappable to the repository itself.

9. Make it noob friendly

This is obviously highly dependent on your usage and organizational structure; however, it never hurts to keep queries as simple as can be, since there is always the chance that someone else will need to maintain or interpret them. Bonus: less time needed to train people on their purpose!

10. RTFM!

I am a huge proponent of RTFM (F!=field, btw) for both myself and others. Splunk has clearly put a lot of effort into its documentation, which is detailed and thorough. With regards to writing SPL queries, the Search Reference is your absolute best friend!

11. Know your data

The first two things I tell anyone new to Splunk are to familiarize themselves with the syntax of SPL (#10) and, just as importantly, to get to know how the data is structured. The simplest way to do this is to run a wildcard search (*) and start reviewing the raw results under the Events tab. The data will often be structured as XML or JSON. Initially, it will be less important to know which fields came from indexing, field extractions, or other transforms, but this may become important with more advanced searches.
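A quick way to get to know JSON data is to count how often each field appears across events, much like the field sidebar does. A hedged Python sketch over two made-up events; nested keys are joined with dots and arrays are marked with {}, borrowing spath’s notation as an assumption.

```python
import json
from collections import Counter
from typing import Any, Iterator


def flatten(obj: Any, prefix: str = "") -> Iterator[str]:
    """Yield a dotted path (e.g. src.ip, tags{}) for every leaf value."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for value in obj:
            yield from flatten(value, prefix.rstrip(".") + "{}.")
    else:
        yield prefix.rstrip(".")


def field_coverage(raw_events: list[str]) -> dict[str, float]:
    """Fraction of events in which each field path appears at least once."""
    counts: Counter[str] = Counter()
    for raw in raw_events:
        counts.update(set(flatten(json.loads(raw))))
    return {field: n / len(raw_events) for field, n in counts.most_common()}


events = [
    '{"user": "alice", "src": {"ip": "10.0.0.5"}, "tags": ["vpn"]}',
    '{"user": "bob", "src": {"ip": "10.0.0.9", "port": 443}}',
]
for field, share in field_coverage(events).items():
    print(f"{field:<10} {share:.0%}")
```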

12. Test it

Do not ever merge a query into production ops, sign off on it, trust it, or whatever it is you do to give it legitimacy without first testing it and confirming positive results. Regardless of how simple the query is, you can never guarantee that some other confounding issue isn’t occurring. If it is a matter of missing the applicable data, well then, Try Harder! There are many great tools out there to help with this at scale, such as Red Canary’s Atomic Red Team or MITRE’s CALDERA.

13. Build it out piecemeal

It can get stressful spending a lot of time on a query, only for it to return incorrect results, or none at all, regardless of tweaking. The best way to build complex queries is to build them in pieces, testing as you go along. This is especially convenient because you can point each piece at data you know is available, to ensure positive results, and then change it as the query is built out.
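Building piecemeal can be made explicit by keeping the query as a list of stages and running each prefix in turn. The QueryBuilder class below is a hypothetical helper; the index name wineventlog and the fields user and src are assumptions for illustration, while EventCode 4625 is the Windows failed-logon event.

```python
from typing import Iterator, Optional


class QueryBuilder:
    """Hypothetical helper: keep SPL stages separate so each prefix can be tested."""

    def __init__(self, base_search: str) -> None:
        self.stages = [base_search]

    def pipe(self, stage: str) -> "QueryBuilder":
        self.stages.append(stage)
        return self  # returning self allows chaining

    def build(self, upto: Optional[int] = None) -> str:
        """Render the first `upto` stages (all of them when upto is None)."""
        return "\n| ".join(self.stages[:upto])

    def checkpoints(self) -> Iterator[str]:
        """Yield each partial query, shortest first, to run and verify in turn."""
        for n in range(1, len(self.stages) + 1):
            yield self.build(n)


query = (QueryBuilder("index=wineventlog EventCode=4625")
         .pipe("stats count by user, src")
         .pipe("where count > 5"))

for step in query.checkpoints():
    print(step, end="\n\n")
```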

14. Implement version control

The necessity of this really depends on the number of queries and modifications, though it makes sense even for small quantities. This can be accomplished as simply as baking a version into the query itself, such as the ID from #8 with revisions tacked on after a period (wxp-110.3), or even keeping the version in its own field.

Even better would be to maintain them in a database or repository such as GitHub, which gives the added benefit of stateful change representations. It is also possible to save searches directly in Splunk, though version control is less intuitive that way.
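With revision numbers tacked on after a period, picking the newest version needs numeric comparison: as strings, wxp-110.3 sorts after wxp-110.10. A Python sketch assuming a hypothetical <platform>-<query>.<revision> scheme based on tips #8 and #14.

```python
import re

# Hypothetical scheme: <platform>-<query number>[.<revision>], e.g. wxp-110.3
ID_PATTERN = re.compile(r"^(?P<platform>[a-z]+)-(?P<query>\d+)(?:\.(?P<rev>\d+))?$")


def parse_id(query_id: str) -> tuple[str, int, int]:
    """Split 'wxp-110.3' into ('wxp', 110, 3); a missing revision counts as 0."""
    match = ID_PATTERN.match(query_id)
    if match is None:
        raise ValueError(f"unrecognised query id: {query_id!r}")
    return match["platform"], int(match["query"]), int(match["rev"] or 0)


def latest_revisions(ids: list[str]) -> dict[tuple[str, int], str]:
    """Keep only the highest revision of each platform/query pair."""
    latest: dict[tuple[str, int], str] = {}
    for query_id in ids:
        platform, number, rev = parse_id(query_id)
        key = (platform, number)
        if key not in latest or rev > parse_id(latest[key])[2]:
            latest[key] = query_id
    return latest


saved = ["wxp-110", "wxp-110.3", "wxp-110.10", "wxp-111.1"]
print(latest_revisions(saved))
print(max(["wxp-110.3", "wxp-110.10"]))  # string comparison picks the wrong one
```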

15. Maintain multiple versions of the same thing

This doesn’t just apply to older versions of the same query, but queries which may search the same thing but present it in a different manner, search a different data set, or search a different time window.

16. Don’t reinvent the wheel

It is all too easy to blow a full 12-hour shift perfecting a query, which may not even end up working at all. While it is important to have these search queries catered to your specific need, it is not always necessary to MacGyver it alone. There are lots of great resources available to borrow ideas or techniques from, such as the Splunk blogs and forums, or you can even work with a co-worker.

17. Don’t depend on the wheel

Counter to #16, you do not want to become over-reliant on searching for help, as this could lead to running queries which may not be working the way you think they are. This could also potentially compromise the integrity of the results. Worse yet, it could be an inefficient way of doing something which has caught on and persisted through the forums.

18. Share it

If you have written a gem or come up with a novel approach to something, share it back with the community. Even if the data set is different, there may still be much which can be gleaned from it. It also helps to drive conversations which benefit the community as a whole.

19. Save it

This is such an obvious one, but in spite of that, I still constantly find myself rewriting queries that I had previously written over and over again…

20. REGEX!

I don’t know why I have this all the way down at #20, because this is easily one of the most powerful and important tools for pivoting on results. There are several commands where regex can be leveraged, but the two most significant are regex and rex.
The regex command does exactly what it says: it allows you to filter on respective fields (or _raw) using a regular expression, which in Splunk is PCRE (Perl Compatible Regular Expressions). The rex command is much more powerful, in that it allows you to create fields from the parsed data, which can then be used to pivot your searches on. You can even build a multivalued field if more than one match occurs (using rex’s max_match option).
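To make the rex behaviour concrete, this Python function mimics it: named capture groups become fields, and max_match=0 keeps every match as a multivalue field (in SPL, max_match defaults to 1). It is an approximation, not Splunk’s implementation; the SPL equivalent would be along the lines of rex field=_raw max_match=0 "src=(?<src_ip>\d{1,3}(?:\.\d{1,3}){3})". The sample event is made up.

```python
import re


def rex(raw: str, pattern: str, max_match: int = 1) -> dict[str, list[str]]:
    """Approximate rex: each named group becomes a field.

    max_match=1 keeps the first match only; max_match=0 keeps all matches,
    producing a multivalue field. Not Splunk's implementation.
    """
    fields: dict[str, list[str]] = {}
    for count, match in enumerate(re.finditer(pattern, raw), start=1):
        for name, value in match.groupdict().items():
            if value is not None:
                fields.setdefault(name, []).append(value)
        if max_match and count >= max_match:
            break
    return fields


event = "blocked src=10.0.0.5 dst=8.8.8.8; blocked src=10.0.0.7 dst=1.1.1.1"
ip_pattern = r"src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"

print(rex(event, ip_pattern))               # first match only
print(rex(event, ip_pattern, max_match=0))  # every match (multivalue)
```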

21. Know when it’s better to go beyond just using a search with SPL

Finally, we made it all the way to #21! Sometimes, depending on circumstance, function, and operational usage, manual searching with SPL queries is just not the best answer. Splunk has a lot of other functionality which can accomplish many of the same things, with less manual requirements. Alerts, scheduled reports, dashboards, and any of a number of apps built within or against the API allow for almost limitless capability. If you are struggling to maintain or achieve some of the topics annotated here, it may mean it is time to explore some of these alternative options.

Overall

This is certainly not an all-inclusive list, as there are many more practices which can apply here. Ultimately, it is the specific deployment, implementation, and usage of Splunk which should dictate exactly how you create and maintain search queries. This was also not meant to go too deep in the weeds on generating advanced queries (though that may come in the future), but rather a high-level approach to maintaining quality and standards. There are many other people who are far more experienced and with much greater Splunk-fu out there, so if you have any input or insight, please feel free to reach out.

Apart from being a source of all too frequent and embarrassing typos, Splunk is a big data platform which allows you to interrogate data and present results in a variety of contexts and visualisations. I've been using it for a little over 12 months, self-teaching or Googling as I go, predominantly to sift through the terabytes of logs from various applications and appliances that get generated in my 9-5 every day.
You can use Splunk to build dashboards which are typically better than the ones that come with the product
I've started to pull together all the searches, notes and bits of code into a sort of security cheat sheet, which I thought would be a good thing to share, as well as providing some real-world examples of how you might use Splunk in a security context.


Cheat Sheet
I'm actively working back through my notes and adding to this all the time so it might be a good thing to reference via the URL or re-visit from time to time. I'll try to keep this as accessible as possible and base it around real world examples and use cases.
Splunk is a great way to convert reams of log data into views which mean something



Of course, there is a wealth of documentation over at http://docs.splunk.com and I'd highly recommend that if you start using Splunk you start there or at least turn to that as your primary reference. I'd also strongly recommend that you check to see if there is an existing Splunk App if you have a very specific requirement. Why re-invent the wheel if the vendor (or the community) has already built an app for that appliance / application you've just installed?

Like maps? No problem

I intend to write a separate piece about some of the very clever things you can do with Splunk, especially some of the instances where we currently use it as the center for an automation piece. It's not just reports and dashboards that Splunk can power - with a bit of thinking and tinkering you can get it to interact and respond to your environment, making it a very powerful tool to add to your security arsenal. I'll still add any searches and code for these solutions to the cheat sheet but I want to expand on them sufficiently so people can follow the recipe to bake their own.