Network measurement is instrumental to understanding network behavior, fault diagnosing, attack prevention and traffic engineering. While conventional measurement equipment/tools deliver basic network flow metrics (e.g. NetFlow, sFlow), hardware and software built upon those are often proprietary and closed to extension. This imposes tremendous obstacles to satisfying emerging, on demand and ad hoc measurement demands. In addition, with the recent development of ultra high speed networks, the network line speed has reached 100Gbps and beyond. To achieve scalable network measurement keeping up with the line rate has become increasingly important.
In AMIS project, we design the following two major parts to achieve composable and programmable measurement for 40/100 GbE network links using multicore processors.
We propose EQuery, a language and associated toolset designed for event-driven declarative queries in programmable network measurement. This query language supports both the specification of functionalities and temporal dependencies of measurement tasks and enables network monitoring and diagnosing effectively.
- DPDK-based packet processing architecture
We design a programmable network measurement platform for line rate of 100+ Gbps using general-purpose x86 multicore processors and off-the-shelf 100 GbE NICs with the support of DPDK, a high speed packet I/O library.
Composable and Programmable Measurement Framework
In our proposed framework, there are three major parts: EQuery, run-time system and packet processing in physical network.
In this part, users (e.g. network operators, researchers, engineers) can submit a customized measurement query by using EQuery via a web GUI. This query then will be parsed and complied into network measurement primitives, which will be instantiated on AMIS boxes to collect network states. Once the results become ready in the database, they will be illustrated on the web GUI and will be refreshed in real-time when the results are updated.
- Web GUI
In this web GUI, users can submit query, check query result and monitor AMIS boxes operation status as well as flow statistics. In the backend, all the requests (query, operation status, flow statistics) will be arranged to different AMIS boxes by communicating with University of Kentucky's plugin. The returned results from distributed AMIS boxes will be aggregated and then written into the database for showing on the web GUI. The picture in right shows the example of flow statistics from StarLight AMIS box.
- Packet Processing in Physical Networks
In physical networks, AMIS Boxes are deployed at the exchage points and network packets will be duplicated to AMIS Boxes via a switch mirror port or a Tap device. To spped up the packet processing, general-purpose x86 multicore processors and off-the-shelf 100 GbE NICs with the support of DPDK will be used on AMIS Boxes.
The typical workflow for processing a measurement query is illustrated in the left diagram.
AMIS Query Language: EQuery
We argue that an “event” is fundamental in defining network measurement functions as well as meaningfully chaining them for extensibility. In AMIS project, we present EQuery, a language and associated toolset designed for event-driven declarative queries in programmable network measurement. This query language supports both the specification of functionalities and temporal dependencies of measurement tasks and enables network monitoring and diagnosing effectively. Specifically, the features of EQuery can be summarized as follows.
- support declarative query language to obtain network-wide information, including per-packet, per-flow, and per-link status;
- launch compositional queries and build “events” into the query language to trigger subsequent queries according to the occurrence of specified situations.;
- be independent of specified hardware;
- be compiled automatically into low-level rules and state variables.
- EQuery Language
We exploit and extend an SQL-like query language to collect network metrics. Our new language, called EQuery, can declaratively express what metrics should be measured, how to aggregate the data, and when to trigger subsequent measurement queries.
The syntax of EQuery is illustrated in the left diagram where the new clauses are highlighted. There are 7 primary clauses in the first 7 lines, among which only the “Select” clause is mandatory. Here we briefly explain each of the clauses.
Select: A Select(h, s, f (h, s)) clause is used to express what data a network operator wants to query. Here the arguments are defined as "measurement fields", including the fields of a packets header, h, the parameters of network devices, s, as well as the aggregated fields, f (h, s) (e.g. COUNT, SUM and customized aggregation fields).
Where: A Where(p) clause applies the predicate p to filter the query results. Only those packets satisfying the predicate p will be analyzed for statistical purposes. The basic predicate uses the syntax ((h | s | f ) value) to test the selected measurement field. In this paper, a test on an aggregated field (f value) actually imposes a test on a data plane state. "" is a series of operators where "~=" indicates "unequal". More complicated filtering predicates can be constructed using natural set-theory operations such as intersection (p&p), union (p|p) and complementation (~p).
Groupby: A Groupby(h, s) clause can be used in a "Select" statement to collect measurement data across multiple records and group the results by one or more columns (h, s). The "Groupby" clause is often combined with aggregate functions.
When: A When(e) clause is employed to impose a trigger on a query. A query driven solely by a "When" clause will be executed once event e occurs.
Others: A Start(t) clause indicates that the start time of the submitted query is t (in date time format). The default value is the time when the query is submitted. A Within(i) clause specifies the maximum execution time for a query. An Every(i) clause groups packets that arrive within the same time window, where i can be specified in seconds or microseconds. In turn, the results will be returned every i seconds (or microseconds). The "Every(i)" clause is also used to express a periodic event (PE) in a query.
- Example: Necessary events to diagnose TCP packet loss along a flow path
As shown in the left figure, in order to detect packet loss and its root causes along the path from Host A to Host B, the following steps should be implemented: Firstly, we should launch a query (Query 1) to measure the packet loss pkt_loss (here we define the TCP packet loss as the number of out of sequence packets) on switch sw1. To detect the specified network condition, i.e. whether the packet loss exceeds a threshold, we can define the first event "Event 1" as pkt_loss > K1, and the subsequent Query 2 will be triggered to check the throughput "SUM(pkt_len)" of links "Host A sw1" when "Event 1" occurs. If the throughput is normal, i.e., Event 2 occurs, and the link "sw1 sw2" will be checked by executing Query 3. Similarly, if this link is normal too, we will check the TCP window size (Query 4) to clarify the setting on the end-host. As a result, network operators only need to review the received events in order to localize the potential culprit, rather than comprehensively reviewing all the network-wide measurement results. This example can be described in the proposed event-driven language as:
AMIS Internal Architecture based on DPDK
1) Issues of using exsiting tools
- Issues using OVS:
- The packets traversing OVS are largely less than the real throughput (around 500Mb/s when the real throughput is 25Gb/s).
- The max speed: 10Gb/s.
- Issues using Argus. When the throughput exceeds around 10Gb/s, Argus has the following issues:
- Floating point exception
- Incorrect statistics
2) Speedup data processing by using multi-core (RSS) and DPDK
The Intel Data Plane Development Kit (DPDK) consists of a set of libraries, which can be used to provide high throughput packet I/O in user space, addressing the performance issues of OS kernel device drivers and protocol stacks. Nowadays, modern high-performance servers usually employ non-uniform memory access (NUMA) architecture. The DPDK supports the NUMA architecture and provides the multicore framework. The DPDK allows multiple RX queues and TX queues at the same time, which can significantly increase packet I/O performance.
- The principle of Receive Side Scaling (RSS):
As shown in the left figure, the following steps are used to implement the RSS feature of the NIC. (1) Specify which fields of the received packet are chosen to be hashed. (2) Specify which hash function is going to be used to hash the chosen fields. (3) Choose the last 7 bits of the hash value to form the final hash value, since the redirection table has only 128 entries. (4) Find the core ID stored in the redirection table according to the final hash value. (5) By following the core ID, the received packet is directed to the corresponding DPDK RX queue.
- RSS Design
After all these setup, as shown in the left figure, multiple DPDK RX queues can be kept at the same time, and each core can receive and handle the packets from each DPDK RX queue in parallel. Since the NIC transfers each packet to its corresponding DPDK RX queue by applying the same hash function, packets within the same flow would be steered to the same DPDK RX queue.
AMIS Measurement Functions
- The basic measurement functions of our DPDK based AMIS box:
- Flow filter and aggregate;
- Per-flow status report (pkt header fields);
- TCP packet loss detection according to retransmission;
- Calculating Per-flow volume;
- Packet count;
- Generating Netflow records.
- More advanced measurement functions can be implemented:
- Top-K talker;
- Heavy hitter detection;
- Super-spreader detection;
- UDP flood detection;
- SYN flood detection;
- Elephant flow detection.
AMIS Measurement Fields
1) Fields for Statistics
|1||mid||measurement device/node identifier|
|3||stime||record start time|
|4||ltime||record last time|
2) Fields from Packet Headers
|5||smac, dmac||source or destination MAC addr|
|6||saddr, daddr||source or destination IP addr|
|8||sport, dport||source or destination port number|
|9||flgs||flow state flags seen in transaction|
|10||stcps, dtcps||source or destination TCP base sequence number|
|11||stos, dtos||source or destination TOS byte value|
|12||sttl, dttl||src -> dst (sttl) or dst -> src (dttl) TTL value|
|13||sipid, dipid||source or destination IP identifier|
|14||smpls, dmpls||source or destination MPLS identifier|
|15||dir||direction of transaction|
|16||swin, dwin||source or destination TCP window advertisement|
|17||svlan, dvaln||source or destination Tag protocol identifier|
|18||svid, dvid||source or destination VLAN identifier|
|19||svpri, dvpri||source or destination VLAN priority|
3) Calculated Fields
|20||spkts, dpkts, pkts||src -> dst (spkts), dst -> src (dpkts) packet count and total packet count|
|21||sload, dload, load||bits per second|
|22||sloss, dloss, loss||pkts retransmitted|
Compare with Argus
Argus  is a network utilization audit system which is focused on developing all aspects of large scale network situational awareness derived from network activity audit and composed of
- Real-time Network flow monitor
- Network flow data collection system
- Network flow data processing programs
- Audit data repository tools
We compre the functionalities of our AMIS box with Argus as the following table:
|Features||AMIS box (DPDK)||Argus|
|Throughput||small size: 34 Gbps; large size: 90 Gbps||~10Gbps|
|Measurement fields||pkt header fields||pkt header fields|
|Measurement functions||Basic and advanced functions||Basic aggregate functions|
|Direction||uni-directional flow monitor||bi-directional flow monitor|
- B. Claise. Cisco systems netflow services export version 9. 2004.
- P. Phaal, S. Panchen, and N. McKee. Inmon corporation’s sflow: A method for monitoring traffic in switched and routed networks. Technical report, 2001.
- M. Balman, E. Pouyoul, Y. Yao, E. Bethel, B. Loring, M. Prabhat, J. Shalf, A. Sim, B. L. Tierney, Experiences with 100gbps network applications, in Proceedings of the fifth international workshop on Data-Intensive Distributed Computing Date, ACM, 2012, pp. 33–42.
- D. Intel, Data plane development kit, URL http://dpdk.org.
- Argus. URL https://qosient.com/argus/