Demystifying the Linux Kernel Socket File Systems (Sockfs)

2024-06-03 16:08


All Linux networking starts with a system call that creates a network socket (the socket system call). The socket system call returns an integer (the socket descriptor).

“Writing” to or “reading” from that socket descriptor (as though it were a file) with the generic write / read system calls creates network traffic (TCP traffic, for a TCP socket) rather than file-system writes/reads.

Note: Had the descriptor been a “regular” file-system descriptor, intended for “regular” file-system writes and reads to files (via the write/read system calls), it would have been created by the open system call instead.

Further Note: This implies that the network socket descriptor created by the socket system call will be used by the systems programmer to write/read with the same write/read system calls used for “regular” file-system writes/reads (system calls that would, under normal circumstances, write/read data to/from files).

Further further Note: A write system call (on the descriptor that was created by the socket system call) must translate “magically” into a TCP transaction that “writes” the data across the network (ostensibly to the peer on the other end), with the “written” data encapsulated in the payload section of a TCP packet.

This process of adapting and hijacking the kernel file-system infrastructure to incorporate network/socket operations is called SOCKFS (Socket File System).

So how does the Linux kernel accomplish this process, where a file-system write is “faked” into a network-system “write”, if indeed it can be called that?

Well… as is usually the case, the Linux kernel's work begins at system/kernel initialization, when a special socket file system for networking (the statically defined sock_fs_type) is “registered” by register_filesystem. This happens in sock_init. File systems are normally registered so that disk partitions can be mounted with that file-system type.

The kernel registers the file-system type sock_fs_type so that it can create a fake mount point using kern_mount (for the file system sock_fs_type). This mount point is necessary if the kernel is later to create a “fake file” (a struct file *) using the existing, generic mechanisms and infrastructure made available for the Virtual File System (VFS). Those mechanisms and infrastructure assume that a mount point is available.

Note: No “actual” mount point exists, not in the sense of a directory inode (and so on) that a disk partition is attached to.

We will blog on file systems later.

Then, when the socket system call is invoked (to create the socket descriptor), the kernel executes sock_create to create the new socket itself (the kernel's struct socket). The kernel then executes sock_map_fd, which reserves an unused descriptor number (this becomes the socket descriptor), creates a “fake file” for the socket, and ties the two together. The “fake” file's ops (file->f_op) are initialized to socket_file_ops (statically defined at compile time in net/socket.c).

The kernel maps the socket descriptor obtained earlier to the new “fake” file using fd_install.

This socket descriptor is returned by the socket system call to the user program (as required by the man page of the socket system call).

I only call it a “fake” file because a write system call executed against that socket descriptor will use the VFS infrastructure just created, but the data will not be written into a disk file anywhere. It will, instead, be translated into a network operation because of the f_op's assigned to the “fake” file (socket_file_ops).

The kernel is now set up to create network traffic when the write/read system calls are executed on the “fake” file descriptor (the socket descriptor) that was returned to the user when the socket system call was executed.

In point of fact, a write system call on the “fake” file's socket descriptor translates into a call to __sock_sendmsg within the kernel, instead of a write into the “regular” file system, because that is how socket_file_ops is statically defined before it is assigned to the “fake” file. (The helper functions in between vary from one kernel version to another.)

And then we are into networking space. And the promised Land of milk, honey, TCP traffic, SOCKFS and File Systems.

No one said understanding the kernel was easy. But extreme gratification awaits those who work on it, and it also creates enormous opportunities for innovation. I explain Linux kernel concepts and more in my classes (Advanced Linux Kernel Programming @UCSC-Extension, and also in other classes that I teach independently).

As always, feedback, questions and comments are appreciated and will be responded to. I would like to listen to gripes, especially if you also PayPal me some. Thanks.




http://www.chinasem.cn/article/1027461
