AFP: Um Escalonador de Requisições de Microsserviços Guiado por Feedback
BERGHETTI, M. S.; CARVALHO, F. B.; FERREIRA, R. A.
Proceedings of SBRC 2024. Niterói, RJ, Brazil, May 2024.
Abstract: Applications that serve millions of users over the Internet, such as social networks and online games, are typically decomposed into microservices to scale and provide low response times. Many of these applications process requests with high service-time dispersion, which leads to the head-of-line blocking problem, in which requests with high service times block other requests and increase both the average and tail latencies of the applications' requests. This work presents AFP (Application Feedback Policy), a scheduling policy that mitigates this problem. Experimental results show that AFP processes up to 4.5x more workload than the best-known approach for SLOs of a few microseconds.
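To make the head-of-line blocking problem concrete, here is a minimal sketch of a single worker serving five requests that all arrive at time zero. It is not AFP, whose mechanism the abstract does not describe; the function name `completion_times`, the service times, and the shortest-first reordering used for comparison are all illustrative assumptions.

```python
def completion_times(service_times):
    """Return the completion time of each request on one worker,
    serving the requests strictly in list order."""
    clock = 0
    done = []
    for service in service_times:
        clock += service
        done.append(clock)
    return done


# Hypothetical workload: one 100-unit request queued just ahead of
# four 1-unit requests (high service-time dispersion).
arrival_order = [100, 1, 1, 1, 1]

# First-come-first-served: every short request waits behind the long one.
fcfs = completion_times(arrival_order)

# Comparison only (an assumed alternative, not AFP): serve short requests first.
short_first = completion_times(sorted(arrival_order))

mean_fcfs = sum(fcfs) / len(fcfs)
mean_short_first = sum(short_first) / len(short_first)
print(mean_fcfs, mean_short_first)
```

Under these assumptions the long request delays every request queued behind it in the first-come-first-served order, so the mean completion time is several times higher than when the short requests run first.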
State Disaggregation for Dynamic Scaling of Network Functions
CARVALHO, F. B.; FERREIRA, R. A.; CUNHA, Í.; VIEIRA, M. A. M.; RAMANATHAN, M. K.
IEEE/ACM Transactions on Networking. Jun 2023.
Abstract: Network Function Virtualization promises better utilization of computational resources by dynamically scaling resources on demand. However, most network functions (NFs) are stateful and require state updates on a per-packet basis. During a scaling operation, cores need to synchronize access to a shared state to avoid race conditions and to guarantee that NFs process packets in arrival order. Unfortunately, the classic approach to control concurrent access to a shared state with locks does not scale to today's throughput and latency requirements. Moreover, network traffic is highly skewed, leading to load imbalances in systems that use only sharding to partition the NF states. To address these challenges, we present Dyssect, a system that enables dynamic scaling of stateful NFs by disaggregating the states of network functions. By carefully coordinating actions between cores and a central controller, Dyssect migrates shards and flows between cores for load balancing or traffic prioritization without resorting to locks or reordering packets. Also, Dyssect's state disaggregation allows the offloading of stateful network functions to programmable NICs and makes it easier to explore hardware-software tradeoffs that better suit specific service chains and traffic loads. Our experimental evaluation shows that Dyssect reduces tail latency by up to 32.04% and increases throughput by up to 19.36% when compared to state-of-the-art competing solutions.
DWT in P4: Periodicity Detection in the Data Plane
HUAYTALLA, B. R.; JACOBS, A. S.; SILVA, M. V. B.; CARVALHO, F. B.; FERREIRA, R. A.; WILLINGER, W.; GRANVILLE, L.
Proceedings of IEEE GLOBECOM 2022. Rio de Janeiro, RJ, Brazil, Dec 2022.
Abstract: This paper presents a P4 implementation of the (1-D) Discrete Wavelet Transform (DWT) method. As a mathematical tool for analyzing signals such as packet-level traces, the DWT divides a given signal into different frequency components and analyzes each component with a resolution matched to its scale. We develop an efficient online algorithm that circumvents various limitations of existing P4-programmable data plane devices and performs the DWT decomposition entirely in the data plane. Our evaluation of a hardware implementation (i.e., Netronome NFP-4000 SmartNIC) of the algorithm shows that it results in only minimal throughput overhead (less than 1% for average-sized packets) and operates within constraints imposed by the limited available data plane resources. As an application, we use our lightweight P4 implementation of the DWT and describe a novel threshold-based approach for detecting periodic behavior in a signal in real-time, at line rate in the data plane (40 Gbps). We illustrate our approach with different examples of synthetic and real-world packet-level traffic traces that exhibit periodic patterns of either benign or malicious origins.
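As a rough illustration of how a DWT splits a signal into frequency components at different scales, the sketch below applies a two-level Haar decomposition to a short series of packet counts. It is written in Python rather than P4 and is not the paper's algorithm: the choice of the Haar wavelet, the averaging normalization, the names `haar_step` and `haar_dwt`, and the sample data are all assumptions.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (approximation) and
    pairwise half-differences (detail). Expects an even-length input."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_dwt(signal, levels):
    """Repeatedly decompose the approximation; return the final
    approximation and the detail coefficients of each level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical packet counts per time slot, alternating with period 2.
counts = [4, 0, 4, 0, 4, 0, 4, 0]
approx, details = haar_dwt(counts, levels=2)

# Energy of the detail coefficients at each level (finest scale first).
energy = [sum(d * d for d in level) for level in details]
print(approx, energy)
```

In this toy signal all of the detail energy sits at the finest level, which is the scale matching the period-2 alternation. Comparing per-level energies against a threshold is one conceivable way to flag such periodic behavior, though the abstract does not specify the paper's exact test.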
Dyssect: Dynamic Scaling of Stateful Network Functions
CARVALHO, F. B.; FERREIRA, R. A.; CUNHA, Í.; VIEIRA, M. A. M.; RAMANATHAN, M. K.
Proceedings of IEEE INFOCOM 2022. London, UK, May 2022.
Abstract: Network Function Virtualization promises better utilization of computational resources by dynamically scaling resources on demand. However, most network functions (NFs) are stateful and require state updates on a per-packet basis. During a scaling operation, cores need to synchronize access to a shared state to avoid race conditions and to guarantee that NFs process packets in arrival order. Unfortunately, the classic approach to control concurrent access to a shared state with locks does not scale to today's throughput and latency requirements. Moreover, network traffic is highly skewed, leading to load imbalances in systems that use only sharding to partition the NF states. To address these challenges, we present Dyssect, a system that enables dynamic scaling of stateful NFs by disaggregating the states of network functions. By carefully coordinating actions between cores and a central controller, Dyssect migrates shards and flows between cores for load balancing or traffic prioritization without resorting to locks or reordering packets. Our experimental evaluation shows that Dyssect reduces tail latency by up to 32% and increases throughput by up to 19.36% when compared to state-of-the-art competing solutions.
A Verified Session Protocol for Dynamic Service Chaining
ZAVE, P.; CARVALHO, F. B.; FERREIRA, R. A.; REXFORD, J.; MORIMOTO, M.; ZOU, X. K.
IEEE/ACM Transactions on Networking. Nov 2020.
Abstract: Middleboxes are crucial for improving network security and performance, but only if the right traffic goes through the right middleboxes at the right time. Existing traffic-steering techniques rely on a central controller to install fine-grained forwarding rules in network elements—at the expense of a large number of rules, a central point of failure, challenges in ensuring all packets of a session traverse the same middleboxes, and difficulties with middleboxes that modify the “five tuple.” We argue that a session-level protocol is a fundamentally better approach to traffic steering, while naturally supporting host mobility and multihoming in an integrated fashion. In addition, a session-level protocol can enable new capabilities like dynamic service chaining, where the sequence of middleboxes can change during the life of a session, e.g., to remove a load-balancer that is no longer needed, replace a middlebox undergoing maintenance, or add a packet scrubber when traffic looks suspicious. Our Dysco protocol steers the packets of a TCP session through a service chain and can dynamically reconfigure the chain for an ongoing session. Dysco requires no changes to end-host and middlebox applications, host TCP stacks, or IP routing. Dysco's distributed reconfiguration protocol handles the removal of proxies that terminate TCP connections, middleboxes that change the size of a byte stream, and concurrent requests to reconfigure different parts of a chain. Through formal verification using Spin and experiments with our prototype, we show that Dysco is provably correct, highly scalable, and able to reconfigure service chains across a range of middleboxes.
S-Trace: Construindo Caminhos Causais em Redes Definidas por Software
CARVALHO, F. B.; FERREIRA, R. A.
Proceedings of SBRC 2015. Vitória, ES, Brazil, May 2015.
Abstract: Determining the hardware and software elements used to process an application request and grouping them into a set, called a causal path, that exposes relevant parameters, such as processing time and delays, and that may explain the behavior of the application is a challenging task. Several tools for building causal paths have been proposed for the current Internet architecture, but none so far exploits features offered by Software Defined Networks (SDN). This paper proposes S-Trace, a tool for building causal paths that does not modify applications and that uses specific features of SDN to build precise causal paths. To build a causal path, S-Trace intercepts library function calls to correlate communication events between processes and uses record and replay techniques in SDN to correlate network events. S-Trace was evaluated using a benchmark (TPC-W) and an application that emulates the behavior of multi-tier applications and was instrumented to validate the causal paths constructed by S-Trace. The experimental results show that S-Trace builds correct causal paths at the cost of a small overhead associated with some library function calls.
Nemo: Procurando e Encontrando Anomalias em Ambientes Distribuídos
SILVA JR, B. A.; CARVALHO, F. B.; FERREIRA, R. A.
Proceedings of SBRC 2013. Brasília, DF, Brazil, May 2013.
Abstract: Diagnosing anomalies in large enterprise networks consumes a significant amount of technical support teams' time, mainly because of the numerous complex interactions among the applications and network elements (servers, routers, links, etc.). The most complete and promising approach for solving this problem, called Sherlock, uses network traces to automatically build an Inference Graph (IG) that models the multiple interactions and dependencies present in a distributed environment. Despite the progress provided by Sherlock in modeling the problem, its execution time for inferring the probable causes and the precision of its anomaly detection results leave much room for improvement. This work proposes Nemo, a tool that exploits domain-specific knowledge and a theoretical property of Bayesian Networks to significantly reduce the IG and, consequently, the execution time. Simulation results using real and synthetic data show that Nemo reduces Sherlock's execution time by over 90% and improves its precision in all simulated scenarios.