Wednesday, 26 June 2013

Flow Analysis - introduction, analysis considerations

During any type of investigation it is beneficial to collect evidence from as many different levels as possible. As I have mentioned previously, incident response or digital investigation is a multi-level, multi-stage process with backtracking. In other words, the investigator first states a hypothesis and then tries to confirm or refute it. If the hypothesis turns out to be wrong, he or she looks for other clues and repeats the process, which often makes the path longer, depending on how good and detailed the available information is.

Basically, there are two types of evidence at an early stage: network-based and host-based. I have written about basic NBE (network-based evidence) analysis before, but I have not covered session data analysis (not sessions rebuilt from raw traffic, but flows captured on network devices). Typically, we can split the information coming from the infrastructure into raw traffic, 'logs' and flows. (The problem with logs is that in many situations they provide only a clue, without showing numbers or data.)

Flows do not lie. Because the network is largely invisible, knowing how much traffic crosses it is a necessity, and a flow is proof that successful network-level communication occurred between hosts.
 
Standard flow definition:

A flow is a series of packets that share the same source and destination IP address, source and destination port, and IP protocol, such as UDP or TCP (a data flow between two hosts can be identified uniquely by its five-tuple - Michael W. Lucas). A flow record is a summary of information about a specific flow, tracking which two hosts communicated with each other. Several conclusions:
  • one flow = one direction (a flow aggregates the transmission in one direction only)
  • session != flow (a session contains two flows)
  • flows do not contain the data exchanged between hosts (no passwords, usernames, ...), so flow records are small
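
The conclusions above can be illustrated with a minimal Python sketch. The names FlowKey, aggregate and reverse, as well as the addresses and packet sizes, are made up for illustration and do not come from any real flow tool. It shows that packets sharing a 5-tuple collapse into one record, and that a single session produces two flows, one per direction.

```python
from collections import defaultdict
from typing import NamedTuple

class FlowKey(NamedTuple):
    """The 5-tuple that identifies one unidirectional flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str

def reverse(key):
    """Return the key of the flow going in the opposite direction."""
    return FlowKey(key.dst_ip, key.src_ip, key.dst_port, key.src_port, key.protocol)

def aggregate(packets):
    """Sum packet and byte counters per 5-tuple; payload is never stored."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, size in packets:
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)

# Hypothetical web session: (flow key, packet size in bytes)
request = FlowKey("10.0.0.5", "192.0.2.10", 40000, 80, "tcp")
packets = [
    (request, 60),
    (reverse(request), 60),
    (request, 52),
    (request, 400),
    (reverse(request), 1500),
]

flows = aggregate(packets)
for key, counters in flows.items():
    print(key, counters)
```

Five packets end up as two small records, which is why flow data stays compact even for busy links.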
Versions:

NetFlow version 7 adds switching information that is not available in earlier versions. Like version 5, its records carry routing details such as the IP address of the next hop for the flow, which helps to track a flow across different paths in the network. The latest version of network flow export is the IP Flow Information Export (IPFIX) standard.

5-tuple factors (green)
A flow is a series of packets described by the 5-tuple. Let's have a look at flows and their analysis for different types of network traffic:
  1. ICMP flows
  2. UDP flows
  3. TCP flows
ICMP flows

ICMP carries basic control and error information about Internet routing. It has no TCP-style flags; instead it has an ICMP type (the general purpose of the packet) and an ICMP code.

type 8 - echo-request
type 0 - echo-reply

Two flows are involved: in the first, the client creates an ICMP request with source and destination IP addresses; in the second, the server responds. The sensor holds ICMP flows in memory until a timeout expires, at which point it marks the flows as completed and transmits them to the collector. A quick example: when sending a UDP packet to a closed port, we may get an ICMP packet in response (no port!) showing that the port is unreachable (type 3, code 3, which flow tools may display as 303).
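
Since ICMP has no ports, NetFlow v5 style records reuse the destination port field to carry the type and code as type * 256 + code. The sketch below assumes that convention; the helper names decode_icmp and describe, and the small lookup table, are hypothetical and only cover the messages mentioned above.

```python
def decode_icmp(dst_port_field):
    """Split a 16-bit port field into (icmp_type, icmp_code)."""
    return dst_port_field >> 8, dst_port_field & 0xFF

# Only the messages discussed in this post; a real table would be longer.
ICMP_NAMES = {
    (8, 0): "echo request",
    (0, 0): "echo reply",
    (3, 3): "destination unreachable (port unreachable)",
}

def describe(dst_port_field):
    """Return a readable name for the encoded ICMP type and code."""
    return ICMP_NAMES.get(decode_icmp(dst_port_field), "other")

for field in (0x0800, 0x0000, 0x0303):
    print(hex(field), decode_icmp(field), describe(field))
```

Under this assumption, a value printed in hex as 303 decodes to type 3, code 3.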

UDP flows

UDP has no types or codes and no TCP-like flags, but it uses TCP-style ports. It has no built-in concept of a session, so it is described as connectionless. UDP does carry useful application-level data, however, and most UDP traffic is part of some sort of session or transaction (a DNS request is an example). As with TCP, the UDP request originates from an unused port on the client side and uses the standard DNS port 53 as its destination. Again there are two flows. Because UDP is connectionless and has no TCP-style FIN flag, the sensor waits for a timeout and then reports the flows to the collector.
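
The timeout behaviour can be sketched as follows. This is a simplified model, not how any particular sensor is implemented: the function expire_idle, the 15-second idle timeout and the sample DNS flows are all assumptions made for illustration.

```python
IDLE_TIMEOUT = 15.0  # seconds; an assumed value, real sensors make this configurable

def expire_idle(active, now, idle_timeout=IDLE_TIMEOUT):
    """Remove flows that have been silent for idle_timeout seconds and return them for export."""
    exported = []
    for key in list(active):  # copy the keys so entries can be removed while looping
        if now - active[key]["last_seen"] >= idle_timeout:
            exported.append((key, active.pop(key)))
    return exported

# A DNS lookup: one flow per direction, both keyed by the 5-tuple.
query = ("10.0.0.5", "192.0.2.53", 51234, 53, "udp")
answer = ("192.0.2.53", "10.0.0.5", 53, 51234, "udp")
active = {
    query: {"packets": 1, "bytes": 72, "last_seen": 100.0},
    answer: {"packets": 1, "bytes": 136, "last_seen": 100.2},
}

early = expire_idle(active, now=105.0)  # nothing has been idle long enough yet
late = expire_idle(active, now=120.0)   # both flows have timed out by now
print(len(early), len(late), len(active))
```

The same idle-timeout logic would apply to ICMP, which also lacks an explicit end-of-conversation signal.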

TCP flows

TCP flags indicate the state of a connection: requested, ongoing, or being torn down. First the client chooses an unused port exclusively for this connection (...) and sends the first synchronization packet with the SYN flag set. The server responds with SYN and ACK (acknowledgement); this is the first packet of the second flow, since a single flow shares the same source and destination IP address and port and the same IP protocol. The third packet, the client's ACK, is the second packet in the first flow. Now the client can request data using GET/POST requests (as part of the first flow), and the server responds with HTML content, files, and other formats of data. Packets now stream back and forth, including ACKs as required to acknowledge receipt of earlier packets. When the communication is about to end, one of the participants sends a packet with the FIN flag, gets FIN/ACK back, and confirms with a final ACK. The sensor sees the FIN and ACK flags and terminates both flows.
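
The exchange described above can be replayed in a short sketch. It assumes, as NetFlow v5 style records do, that a flow record stores the cumulative OR of the TCP flags seen in its packets. The function track, the endpoints and the packet list are hypothetical.

```python
# TCP flag bits as defined in the TCP header
FIN, SYN, PSH, ACK = 0x01, 0x02, 0x08, 0x10

def track(packets):
    """OR together the flags of every packet, per direction (one entry per flow)."""
    flows = {}
    for src, dst, flags in packets:
        flows[(src, dst)] = flows.get((src, dst), 0) | flags
    return flows

def saw_fin(flags):
    """True when at least one packet in the flow carried FIN."""
    return bool(flags & FIN)

client, server = ("10.0.0.5", 40000), ("192.0.2.10", 80)
packets = [
    (client, server, SYN),        # connection requested
    (server, client, SYN | ACK),  # first packet of the second flow
    (client, server, ACK),        # second packet of the first flow
    (client, server, PSH | ACK),  # e.g. a GET request
    (server, client, PSH | ACK),  # e.g. the HTML response
    (client, server, FIN | ACK),  # teardown starts
    (server, client, FIN | ACK),
    (client, server, ACK),        # final confirmation
]

flows = track(packets)
for (src, dst), flags in flows.items():
    print(src, "->", dst, hex(flags), "closed" if saw_fin(flags) else "open")
```

A flow whose accumulated flags contain SYN but never ACK or FIN would, by the same reasoning, point to a connection that was requested but never established.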
 
TCP flow explained

A flow management system can track protocols other than ICMP, UDP and TCP, but those three comprise the overwhelming majority of network traffic.
5 tuple is a notion employed by network and system administrators in identifying the key requirements to create an operational, secure and bidirectional network connection between two or more local and remote machines. The primary components of 5 tuple are the source and destination address. The former refers to the IP address of the network that created and sent the data packet, while the latter is its recipient.

(Definition found on Techopedia.)

During a big download, a flow can be split into several consecutive flow records, because the configured timeout limits the maximum time for which the device tracks a single flow.
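
To make the splitting concrete, here is a rough sketch of how one long transfer turns into consecutive records. The function name, the 1800-second timeout and the assumption of a constant transfer rate are mine, chosen only to keep the arithmetic simple.

```python
def split_by_active_timeout(start, end, total_bytes, active_timeout=1800):
    """Cut one long transfer into consecutive records of at most active_timeout seconds.

    Bytes are spread evenly over time, which is a simplification.
    """
    records = []
    duration = end - start
    t = start
    while t < end:
        stop = min(t + active_timeout, end)
        share = round(total_bytes * (stop - t) / duration)
        records.append({"start": t, "end": stop, "bytes": share})
        t = stop
    return records

# A hypothetical 75-minute download of 9,000,000 bytes
records = split_by_active_timeout(start=0, end=4500, total_bytes=9_000_000)
for record in records:
    print(record)

# The analyst has to add the pieces back together to see the real size.
print(sum(record["bytes"] for record in records))
```

When analysing large transfers it is therefore worth summing all records that share the same 5-tuple and follow each other in time, rather than reading one record in isolation.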
 
Flows visualization

Analysis considerations

Having flows in our security system gives us a great view of our infrastructure and communication. Flows never lie and are a reliable piece of information. This kind of data can show us how big the traffic was, whether any communication took place at all, and whether the recipient responded, and it supports information taken from logs or other levels. Standard flows can also suggest what kind of communication it was, based on the destination port. Remember, though, that a port assignment is not proof that a particular protocol was running over that port. What is more, flows (and session data too) are the best source for tracking what happened to an infected or compromised host after it was owned. On the other hand, collectors capture a great many flows for almost every incident, which makes analysis harder and more time-consuming.
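
One of the questions above, whether the recipient responded, can be answered by looking for the reverse flow. The sketch below assumes flow records are available as plain dictionaries with the field names shown; both the field names and the sample records are invented for illustration.

```python
def unanswered(flows):
    """Return flows for which no flow in the opposite direction was recorded."""
    seen = {(f["src"], f["sport"], f["dst"], f["dport"], f["proto"]) for f in flows}
    return [
        f for f in flows
        if (f["dst"], f["dport"], f["src"], f["sport"], f["proto"]) not in seen
    ]

flows = [
    # A web session: request and response
    {"src": "10.0.0.5", "sport": 40000, "dst": "192.0.2.10", "dport": 80, "proto": "tcp", "bytes": 512},
    {"src": "192.0.2.10", "sport": 80, "dst": "10.0.0.5", "dport": 40000, "proto": "tcp", "bytes": 1560},
    # A connection attempt that nobody answered
    {"src": "10.0.0.5", "sport": 40001, "dst": "198.51.100.7", "dport": 445, "proto": "tcp", "bytes": 180},
]

silent = unanswered(flows)
for f in silent:
    print(f["src"], "->", f["dst"], f["dport"], f["bytes"], "bytes, no reply")
```

Many unanswered flows from a single source to many destinations are a pattern worth a closer look, for example as a sign of scanning.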

First, the available filters should be applied to help find meaningful patterns, clues and answers. Based on the 5-tuple, most information can be found easily and quickly. In my opinion, very often this is the only information really needed, perhaps together with source/destination bytes. There are of course other filters, preconfigured reports and alerts that strip out useless information. What I find really awesome is the option to visualize the flows within a specific time frame or incident. There are plenty of free tools offering such capabilities (gephi, afterglow, etherape, graphviz, safemap). For an example, please check http://www.itworld.com. Furthermore, several SIEMs can make flows and events smarter by applying many tests (on flows, on aggregated events, or on both) and assigning them attributes such as credibility, severity and priority. Another great feature is the ability to group flows together intelligently to present a suspicious or known pattern, or even to show what is unusual.
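
As a small illustration of filtering and visualization, the sketch below filters hypothetical flow records on 5-tuple fields and writes a description in the DOT language that graphviz can render. The helper names match and to_dot and the record layout are assumptions; real tools have their own input formats.

```python
def match(flow, **criteria):
    """True when the flow equals every given field, e.g. match(f, proto="tcp", dport=80)."""
    return all(flow[field] == value for field, value in criteria.items())

def to_dot(flows):
    """Build a DOT digraph with one edge per host pair, labelled with total bytes."""
    totals = {}
    for f in flows:
        edge = (f["src"], f["dst"])
        totals[edge] = totals.get(edge, 0) + f["bytes"]
    lines = ["digraph flows {"]
    for (src, dst), total in sorted(totals.items()):
        lines.append(f'  "{src}" -> "{dst}" [label="{total} B"];')
    lines.append("}")
    return "\n".join(lines)

flows = [
    {"src": "10.0.0.5", "dst": "192.0.2.10", "sport": 40000, "dport": 80, "proto": "tcp", "bytes": 512},
    {"src": "10.0.0.5", "dst": "192.0.2.10", "sport": 40002, "dport": 80, "proto": "tcp", "bytes": 300},
    {"src": "10.0.0.5", "dst": "192.0.2.53", "sport": 51234, "dport": 53, "proto": "udp", "bytes": 72},
]

web = [f for f in flows if match(f, proto="tcp", dport=80)]
dot = to_dot(web)
print(dot)
```

The printed text can be saved to a file and passed to graphviz, for instance with `dot -Tpng flows.dot -o flows.png`.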
At the end of the day, the best way to understand flows and correctly extract information from them is to connect different types of data together, link them, correlate them and, of course, practice.