Notes on "Improving Linux kernel networking performance"
Original article:
https://lwn.net/Articles/629155/
By Jonathan Corbet
January 13, 2015
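- Time budgets
- Only 67.2ns
- Feasibility analysis
- Possible approaches
- Batching
- Lock-free operation
- Fewer system calls
- Cache optimization
- Improving batching: the latency/throughput trade-off
- Memory management (needs to be bypassed to improve performance)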
Time budgets
Only 67.2ns
The smallest Ethernet frame that can be sent is 84 bytes; on a 10G adapter, Jesper said, there are 67.2ns between minimally-sized packets.
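The 67.2ns figure follows directly from the frame size and the link speed. A minimal sketch of the arithmetic, assuming the 84-byte on-wire size and the 10 Gbit/s rate quoted above (the function name is made up for illustration):

```python
def inter_packet_gap_ns(frame_bytes: int, link_bits_per_second: float) -> float:
    """Time one frame occupies the wire, in nanoseconds."""
    bits = frame_bytes * 8
    return bits * 1e9 / link_bits_per_second


# 84 bytes on a 10 Gbit/s link, the numbers quoted in the notes.
budget_ns = inter_packet_gap_ns(84, 10e9)
print(f"{budget_ns:.1f} ns per packet")             # 67.2 ns per packet
print(f"{1e9 / budget_ns / 1e6:.2f} Mpps to keep up")  # 14.88 Mpps to keep up
```

In other words, keeping up with minimally-sized packets at line rate means handling roughly 14.88 million packets per second.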
Feasibility analysis
- A cache miss on Jesper's 3GHz processor takes about 32ns to resolve.
- It therefore takes only two misses to wipe out the entire time budget for processing a packet.
- A socket buffer ("SKB") occupies four cache lines on a 64-bit system, and much of the SKB is written during packet processing, so cache misses are a real threat to the budget.
- The x86 LOCK prefix for atomic operations takes about 8.25ns, so the shortest spinlock lock/unlock cycle takes a little over 16ns. There is not room for a lot of locking within the time budget.
- The cost of performing a system call is about 75ns.
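To see how quickly these costs eat the budget, the figures above can be put side by side. This is only a rough sketch: the 16.5ns spinlock figure is an assumption (two LOCK-prefixed operations at 8.25ns each; the notes only say "a little over 16ns"), and the names are illustrative.

```python
BUDGET_NS = 67.2  # per-packet budget at 10G, from the notes

# Per-operation costs quoted in the notes.
COSTS_NS = {
    "cache miss": 32.0,
    "spinlock lock/unlock": 16.5,  # assumption: 2 x 8.25ns LOCK-prefixed ops
    "system call": 75.0,
}


def budget_share(cost_ns: float, budget_ns: float = BUDGET_NS) -> float:
    """Fraction of the per-packet budget consumed by one operation."""
    return cost_ns / budget_ns


for name, cost in COSTS_NS.items():
    print(f"{name:22s} {cost:5.1f} ns  {budget_share(cost):6.1%} of budget")
```

Two cache misses already consume about 95% of the budget, and a single system call exceeds it on its own.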
Possible approaches
Batching
Lock-free operation
Fewer system calls
Cache optimization
The key appears to be batching of operations, along with preallocation and prefetching of resources. These solutions keep work CPU-local and avoid locking. It is also important to shrink packet metadata and reduce the number of system calls. Faster, cache-optimal data structures also help. Of all of these techniques, batching of operations is the most important. A cost that is intolerable on a per-packet basis is easier to absorb if it is incurred once per dozens of packets. 16ns of locking per packet hurts; if sixteen packets are processed at once, that overhead drops to 1ns per packet.
In short: stay within two cache misses per packet, and leave no spinlock on the per-packet path.
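The amortization argument is simple division: a fixed cost paid once per batch is shared by every packet in it. A small sketch using the 16ns locking cost from the text (the function name is hypothetical):

```python
def per_packet_overhead_ns(fixed_cost_ns: float, batch_size: int) -> float:
    """Share of a once-per-batch cost that each packet in the batch carries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return fixed_cost_ns / batch_size


# 16ns of locking, paid once per batch of n packets.
for n in (1, 4, 16, 64):
    print(f"batch of {n:2d}: {per_packet_overhead_ns(16.0, n):5.2f} ns per packet")
```

With a batch of sixteen the locking cost falls to 1ns per packet, matching the figure in the text.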
Improving batching: the latency/throughput trade-off
The tricky part, he said, is adding batching APIs to the networking stack without increasing the latency of the system. Latency and throughput must often be traded off against each other; here the objective is to optimize both.
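One way to batch without adding latency is to batch opportunistically: take whatever is already queued, up to a cap, and never wait for a batch to fill. The notes do not spell out the mechanism, so the sketch below is an assumption for illustration only; `dequeue_batch` and `max_batch` are made-up names, not kernel APIs.

```python
from collections import deque


def dequeue_batch(queue: deque, max_batch: int) -> list:
    """Take up to max_batch packets that are already queued; never wait for more."""
    batch = []
    while queue and len(batch) < max_batch:
        batch.append(queue.popleft())
    return batch


# Five packets are pending; the cap is four per batch.
pending = deque(range(5))
batches = []
while pending:
    batches.append(dequeue_batch(pending, max_batch=4))
print(batches)  # [[0, 1, 2, 3], [4]]
```

A lone packet goes out immediately as a batch of one, so latency under light load is unchanged; batches only grow when a backlog already exists.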
TCP bulk transmission work:
Bulk network packet transmission [LWN.net]
https://lwn.net/Articles/615238/
Memory management (needs to be bypassed to improve performance)
Jesper implemented a subsystem called qmempool; it does bulk allocation and free operations in a lockless manner.
[RFC PATCH 0/3] Faster than SLAB caching of SKBs with qmempool (backed by alf_queue) [LWN.net]
https://lwn.net/Articles/625427/
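The bulk idea can be illustrated with a toy object pool. This is not qmempool's code or API: the class and its fields are hypothetical, and a Python list is not a lockless queue. The sketch only shows how a local cache that is refilled and flushed in bulk touches the shared pool far less often than per-object allocation would.

```python
class BulkPool:
    """Toy pool: a local cache refilled from / flushed to a shared pool in bulk."""

    def __init__(self, shared: list, bulk: int = 16):
        self.shared = shared   # stand-in for the shared backing store
        self.local = []        # stand-in for a CPU-local cache
        self.bulk = bulk
        self.shared_ops = 0    # how many times the shared pool was touched

    def alloc(self):
        if not self.local:
            # Refill: one shared-pool operation fetches up to `bulk` objects.
            take = min(self.bulk, len(self.shared))
            if take == 0:
                raise MemoryError("pool exhausted")
            self.local = self.shared[-take:]
            del self.shared[-take:]
            self.shared_ops += 1
        return self.local.pop()

    def free(self, obj) -> None:
        self.local.append(obj)
        if len(self.local) >= 2 * self.bulk:
            # Flush: return `bulk` objects to the shared pool in one operation.
            self.shared.extend(self.local[-self.bulk:])
            del self.local[-self.bulk:]
            self.shared_ops += 1


pool = BulkPool(list(range(64)), bulk=16)
objs = [pool.alloc() for _ in range(32)]
print(pool.shared_ops)  # 2: two bulk refills served 32 allocations
```

In this toy version, 32 allocations cost two shared-pool operations instead of 32, which is the same amortization that batching gives on the transmit path.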