uWSGI Production Optimization

2024-01-20

- [] Describe how to use the daphne and supervisor services
- [] Describe how to use uwsgitop

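20240119 Update

Today a friend showed me the memory usage graph for one of their servers. The machine has about 32 GB of RAM and usage had reached 93%, which is clearly not acceptable. On top of that, uwsgi on this server was installed by another colleague and I knew nothing about it, so in the spirit of continuous learning I am taking a careful look at how uwsgi is used.

Normally, right after startup, the services use only a little over 4 GB of memory.

Learning broadly really is a good thing. I spent a long time on this today, and in the end the trouble turned out to be with supervisor, a service I am quite unfamiliar with. It was a genuine scare; the details follow further down.

Troubleshooting: restarting the service

After restarting the service, memory usage was back to a little over 4 GB, with about 25 GB to spare.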

              total        used        free      shared  buff/cache   available
Mem:            31G        4.1G         25G        4.2M        1.4G         26G
Swap:          2.0G        742M        1.3G
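To follow this figure over time instead of eyeballing free, the used percentage can be computed from the text of /proc/meminfo. This is only a minimal sketch, assuming a Linux host; the function name used_percent and the sample numbers are made up for illustration. It counts "used" as MemTotal minus MemAvailable, so it will not match the used column of free exactly.

```python
def used_percent(meminfo_text: str) -> float:
    """Return the share of RAM in use (percent), from MemTotal and MemAvailable."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # /proc/meminfo reports kB
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return round(100.0 * (total - available) / total, 1)


# Illustrative sample in the /proc/meminfo format (not from the server above).
SAMPLE_MEMINFO = """MemTotal:       1000 kB
MemFree:         100 kB
MemAvailable:    250 kB
"""

print(used_percent(SAMPLE_MEMINFO))
```

On a real host the input would be the content of /proc/meminfo, for example open("/proc/meminfo").read().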

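I used uwsgitop to see what each worker was doing, as a baseline to compare against once the service has been running for a while. 16 processes are configured here; as the output below shows, many of the workers have not handled a single request.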

uwsgi-2.0.20 - Mon Jan 22 00:49:07 2024 - req: 117 - RPS: 0 - lq: 0 - tx: 1.3M
node: shifang1 - cwd: /home/project/dengc4r/xxxx/ - uid: 1002 - gid: 1002 - masterpid: 106134
 WID    %       PID     REQ     RPS     EXC     SIG     STATUS  AVG     RSS     VSZ     TX      ReSpwn  HC      RunT    LastSpwn
 14     49.6    106148  58      0       0       0       idle    242ms   0       0       943.9K  1       0       9025.437        00:43:27
 3      19.7    106137  23      0       0       0       idle    228ms   0       0       70.4K   1       0       3923.712        00:43:27
 13     11.1    106147  13      0       0       0       idle    435ms   0       0       207.2K  1       0       5196.51 00:43:27
 4      8.5     106138  10      0       0       0       idle    195ms   0       0       23.1K   1       0       2907.447        00:43:27
 15     5.1     106149  6       0       0       0       idle    298ms   0       0       14.2K   1       0       2837.843        00:43:27
 5      4.3     106139  5       0       0       0       idle    433ms   0       0       22.6K   1       0       3999.476        00:43:27
 7      0.9     106141  1       0       0       0       idle    472ms   0       0       263     1       0       945.449 00:43:27
 8      0.9     106142  1       0       0       0       idle    470ms   0       0       263     1       0       941.359 00:43:27
 1      0.0     106135  0       0       0       0       idle    0ms     0       0       0       1       0       0.0     00:43:27
 2      0.0     106136  0       0       0       0       idle    0ms     0       0       0       1       0       0.0     00:43:27
 6      0.0     106140  0       0       0       0       idle    0ms     0       0       0       1       0       0.0     00:43:27
 9      0.0     106143  0       0       0       0       idle    0ms     0       0       0       1       0       0.0     00:43:27
 10     0.0     106144  0       0       0       0       idle    0ms     0       0       0       1       0       0.0     00:43:27
 11     0.0     106145  0       0       0       0       idle    0ms     0       0       0       1       0       0.0     00:43:27
 12     0.0     106146  0       0       0       0       idle    0ms     0       0       0       1       0       0.0     00:43:27
 16     0.0     106150  0       0       0       0       idle    0ms     0       0       0       1       0       0.0     00:43:27

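20240121 Update

1. uwsgi lazy loading, option: lazy-apps

As we all know, uwsgi --ini uwsgi.ini starts the uwsgi service, uwsgi --stop uwsgi.pid stops it, and uwsgi --reload uwsgi.pid reloads it.

The catch is that while the service is stopping or reloading it cannot be reached, no matter how many processes are configured. In production that means a brief outage. It is short, but we should still avoid it where we can.

uwsgi offers a configuration that addresses this: if we start 4 processes, it reloads one of them first, then the second, and so on. The service stays reachable during the whole procedure instead of being completely unavailable.

We only need to add the following to uwsgi.ini: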

lazy-apps=true
touch-chain-reload = /home/dengc4r/xxx/bin/settings.py

When we want to reload, we only need to touch that file -> touch ./settings.py
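The same trigger can be fired from a deploy script. A minimal sketch, assuming (as I understand it) that uwsgi only looks at the modification time of the file named in touch-chain-reload; trigger_chain_reload is a hypothetical helper name, and the path to pass in is whatever your own uwsgi.ini points at.

```python
from pathlib import Path


def trigger_chain_reload(trigger_file: str) -> float:
    """Bump the file's modification time, like `touch`, and return the new mtime."""
    path = Path(trigger_file)
    path.touch(exist_ok=True)  # creates the file if missing, otherwise updates mtime
    return path.stat().st_mtime
```

Calling trigger_chain_reload with the settings.py path from the configuration above should have the same effect as running touch by hand.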

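The log output is shown below. We can see that the processes really are reloaded one at a time. Note that worker 1 did not exit gracefully here: it was force-killed about a minute after the reload started.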

Sun Jan 21 22:02:03 2024 - *** /home/dengc4r/xxx/bin/settings.py has been touched... chain reload !!! ***
Sun Jan 21 22:02:03 2024 - chain next victim is worker 1
Gracefully killing worker 1 (pid: 2433)...
Sun Jan 21 22:03:04 2024 - worker 1 (pid: 2433) is taking too much time to die...NO MERCY !!!
worker 1 killed successfully (pid: 2433)
Respawned uWSGI worker 1 (new pid: 9930)
Sun Jan 21 22:03:05 2024 - chain is still waiting for worker 1...
Sun Jan 21 22:03:06 2024 - chain is still waiting for worker 1...
WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x246be60 pid: 9930 (default app)
Sun Jan 21 22:03:07 2024 - chain next victim is worker 2
Gracefully killing worker 2 (pid: 2489)...

Reference:

uwsgi graceful reload

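2. Using uwsgitop to inspect each worker

There is no magic rule for setting the number of processes or threads to use. It is very much application and system dependent. Simple math like processes = 2 * cpucores will not be enough. You need to experiment with various setups and be prepared to constantly monitor your apps. uwsgitop could be a great tool to find the best values.

The passage above comes from the uwsgi documentation (I read the Chinese translation). It says uwsgitop is a good tool for finding the best values, so let's install it and see how it works.

1. Install it. The page on pypi.org is clear: just run pip install uwsgitop. Use pip or pip3 depending on your environment; you can also download the package and install it locally.

2. Add stats = /opt/stats.socket to uwsgi.ini, then restart the uwsgi service so the change takes effect.

3. Run uwsgitop /opt/stats.socket and you will see output like the following. The fields are explained briefly below; for the details see pypi.org/uwsgitop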

uwsgi-2.0.20 - Sun Jan 21 23:12:24 2024 - req: 63 - RPS: 0 - lq: 0 - tx: 334.3K
node: BTHost - cwd: /home/dengc4r/xxx - uid: 0 - gid: 0 - masterpid: 11136
 WID    %       PID     REQ     RPS     EXC     SIG     STATUS  AVG     RSS     VSZ     TX      ReSpwn  HC      RunT    LastSpwn
 2      34.9    12681   22      0       0       0       idle    63ms    0       0       86.2K   1       0       4451.688        22:51:17
 3      28.6    12682   18      0       0       0       idle    694ms   0       0       150.9K  1       0       4316.87 22:51:17
 1      23.8    12680   15      0       0       0       idle    308ms   0       0       48.6K   1       0       3971.446        22:51:17
 4      12.7    12683   8       0       0       0       idle    397ms   0       0       48.6K   1       0       3012.243        22:51:17

Field	Description
WID	Worker id
%	Worker usage
PID	Worker pid
REQ	Number of requests the worker has executed since its last (re)spawn
RPS	Requests per second
EXC	Exceptions
STATUS	Whether the worker is busy or free (idle)
AVG	Average request time
RSS	Worker RSS (Resident Set Size, see Linux memory management)
VSZ	Worker VSZ (virtual memory size, see Linux memory management)
TX	How much data the worker has transmitted
RunT	How long the worker has been running
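The numbers uwsgitop displays come from the stats socket, which as far as I can tell sends one JSON document per connection and then closes it. Below is a rough sketch of reading it directly. The helper names read_stats and request_share are mine, and the JSON keys workers, id and requests are what I believe uwsgi emits, so check them against your own output. The % column appears to be each worker's share of all requests, which matches the sample above (22 of 63 requests is 34.9%).

```python
import json
import socket


def read_stats(socket_path: str) -> dict:
    """Connect to the uwsgi stats socket and decode the JSON document it sends."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after sending the stats
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def request_share(stats: dict) -> dict:
    """Map worker id -> share of all handled requests, in percent."""
    workers = stats.get("workers", [])
    total = sum(w["requests"] for w in workers)
    if total == 0:
        return {w["id"]: 0.0 for w in workers}
    return {w["id"]: round(100.0 * w["requests"] / total, 1) for w in workers}
```

With the configuration from step 2, request_share(read_stats("/opt/stats.socket")) would give the per-worker shares without opening uwsgitop.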

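Problems encountered along the way

What is the difference between socket and http?

uwsgi is usually deployed behind nginx, so the socket option (and with it the socket-timeout parameter) is what we normally use. Put simply: use http when uwsgi serves requests on its own, and socket when it sits behind nginx. With socket, uwsgi speaks its own binary uwsgi protocol, which nginx talks to through uwsgi_pass; with http it speaks HTTP directly.

How does Django integrate with uwsgi?

(To be answered.)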

20240122 Update

Recommended reading:

uwsgi options: harakiri

In short, harakiri is a timeout. If a request takes longer than this to process, the backend gives up on it and the worker that was handling it is restarted.
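One way to watch harakiri in action on a test machine is to point uwsgi at a deliberately slow WSGI app and send it a request. This is only a sketch for experimenting: make_slow_app and the 30 second delay are made up for illustration, and the delay should simply be longer than the harakiri value you want to test.

```python
import time


def make_slow_app(delay_seconds: float):
    """Build a WSGI app that sleeps before answering, to simulate a slow request."""
    def application(environ, start_response):
        time.sleep(delay_seconds)
        body = f"slept {delay_seconds} seconds\n".encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return application


# Hypothetical value: pick something longer than your harakiri setting.
application = make_slow_app(delay_seconds=30.0)
```

If harakiri is set lower than the delay, the worker log should show the request being cut off and the worker respawning.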

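uwsgi options: max-requests

This option means a worker process is reloaded after it has handled a given number of requests. The limit is counted per worker: with 4 processes and max-requests set to 100, each worker is reloaded once it has handled 100 requests of its own. In my experiment the requests happened to be spread evenly, so all four workers reached 100 requests (400 in total) at about the same time and were reloaded together.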

uwsgi-2.0.20 - Mon Jan 22 10:59:41 2024 - req: 400 - RPS: 0 - lq: 0 - tx: 13.2M
 WID    %       PID     REQ     RPS     EXC     SIG     STATUS  AVG     RSS     VSZ     TX      ReSpwn  HC      RunT    LastSpwn
 2      26.2    20201   100     0       0       0       idle    16ms    0       0       4.1M    3       0       49351.395       10:43:22
 1      24.9    20204   100     0       0       0       idle    82ms    0       0       7.0M    3       0       45436.789       10:43:23
 4      24.9    20205   100     0       0       0       idle    38ms    0       0       1.2M    3       0       46265.23        10:43:23
 3      24.0    20285   100     0       0       0       idle    23ms    0       0       986.6K  3       0       35035.232       10:43:53

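The same thing happens again every time the workers reach another 100 requests each (another 400 in total in this test). The restart log below shows that each worker is killed and then respawned individually, so the system stays reachable in the meantime, which is nice.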

...The work of process 19859 is done. Seeya!
Mon Jan 22 10:43:22 2024 - worker 2 (pid: 19868) is taking too much time to die...NO MERCY !!!
worker 2 killed successfully (pid: 19868)
Respawned uWSGI worker 2 (new pid: 20201)
Mon Jan 22 10:43:23 2024 - worker 1 (pid: 19841) is taking too much time to die...NO MERCY !!!
Mon Jan 22 10:43:23 2024 - worker 4 (pid: 19862) is taking too much time to die...NO MERCY !!!
worker 1 killed successfully (pid: 19841)
Respawned uWSGI worker 1 (new pid: 20204)
worker 4 killed successfully (pid: 19862)
Respawned uWSGI worker 4 (new pid: 20205)
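To make the per-worker counting concrete, here is a toy simulation. It is not how uwsgi is implemented, just the bookkeeping as I understand it: every worker has its own counter, and hitting max-requests recycles only that worker. The function name and the request sequences are invented for the example.

```python
from collections import Counter


def simulate_max_requests(assignments, max_requests):
    """Replay a sequence of worker ids; return how often each worker was recycled."""
    handled = Counter()
    respawns = Counter()
    for worker_id in assignments:
        handled[worker_id] += 1
        if handled[worker_id] >= max_requests:
            respawns[worker_id] += 1  # only this worker is recycled
            handled[worker_id] = 0    # and only its counter starts over
    return dict(respawns)


# Even load, as in my test: all four workers hit the limit in the same round.
print(simulate_max_requests([1, 2, 3, 4] * 100, max_requests=100))
# Uneven load: worker 1 is recycled while worker 2 keeps running.
print(simulate_max_requests([1] * 100 + [2] * 50, max_requests=100))
```

With uneven traffic, like the 16-worker output earlier in this post, the workers would therefore be recycled at different times rather than all at once.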

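20240123 Update

Learning supervisor and daphne

Last night was a real scare. While optimizing uwsgi I found I could no longer restart the service, because supervisor reported that the file /tmp/supervisor.sock did not exist. After I touched one by hand, the error changed to unix:///tmp/supervisor.sock refused connection, and the service still would not come up. I do know that files under /tmp can get cleaned up.

A brief description of the architecture: this is a Python project built on the Django framework. uwsgi serves the application, supervisor manages the daphne service, and daphne runs alongside uwsgi to cover the WebSocket side. That is the overall deployment.

Why set it up this way? Let's first look briefly at what each service does.

uWSGI: uWSGI is a web server that implements the WSGI, uwsgi and HTTP protocols, among others. Keep the three terms apart: WSGI is the Python interface specification, uwsgi is uWSGI's own wire protocol, and uWSGI is the server.
Daphne: Daphne is an asynchronous web server used to provide WebSocket support in Django projects. Its aim is to serve HTTP and WebSocket efficiently and quickly.
Supervisor: Supervisor is a client/server system written in Python, a process management tool for Linux/Unix systems; it does not support Windows. It makes it easy to monitor, start, stop and restart one or more processes. When supervisor detects that a managed process has died, it automatically brings it back up, which gives you automatic process recovery without having to write your own shell scripts.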
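The "bring it back up when it dies" behaviour can be illustrated with a few lines of Python. This is a toy, not supervisor's implementation: the real supervisord also handles backoff, retry limits, logging and much more. run_with_restart and demo_command are made-up names, and the child here is a throwaway command that exits immediately.

```python
import subprocess
import sys


def run_with_restart(command, max_starts):
    """Start `command`, start it again whenever it exits, up to `max_starts` launches.

    Returns the exit codes that were observed, in order.
    """
    exit_codes = []
    for _ in range(max_starts):
        process = subprocess.Popen(command)
        exit_codes.append(process.wait())  # block until the child dies
    return exit_codes


# Hypothetical child process that always exits with status 3.
demo_command = [sys.executable, "-c", "raise SystemExit(3)"]
```

Calling run_with_restart(demo_command, max_starts=2) launches the child twice and collects both exit codes; autorestart=true in a supervisor program section asks supervisord for the same kind of loop.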

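The project uses WebSocket and asynchronous features, hence Daphne; uwsgi runs the project; and supervisor is there so that the program is restarted automatically if it dies for external reasons. The problem described above was with this supervisor.

A quick look at the supervisor configuration file

Below is the configuration file currently in use; it defines the settings for daphne.

We can see that the supervisor.sock file lives under /tmp. That is a risk, as explained below.

We can also see the command that starts daphne: it listens on port 9001, and the log goes to websocket.log in the project directory.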

unix_http_server

[unix_http_server]
file=/tmp/supervisor.sock   ; the path to the socket file

Supervisorctl

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL  for a unix socket

Daphne

[program:daphne]
directory=/home/dengc4r/work/xxxx/   # project directory
command=daphne -b 127.0.0.1 -p 9001 --proxy-headers api.asgi:application # start command
autostart=true
autorestart=true
stdout_logfile=/home/dengc4r/work/xxxx/websocket.log  # log file
#logfile_maxbytes=50MB       # log file size; rotated when exceeded, default 50MB, 0 means no limit
#logfile_backups=10           # number of log backups to keep, default 10, 0 means no backups
redirect_stderr=true

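The supervisor configuration file itself contains a warning about this, quoted below.

It says that /tmp is used because it exists on most systems, and that we may want to change it to a path better suited to our own system. Some systems periodically delete old files under /tmp, and once /tmp/supervisor.sock is deleted, supervisorctl can no longer connect to supervisord.

That is exactly what happened here: the file was deleted, so the service could not be started.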

; Warning:
;  Paths throughout this example file use /tmp because it is available on most
;  systems.  You will likely need to change these to locations more appropriate
;  for your system.  Some systems periodically delete older files in /tmp.
;  Notably, if the socket file defined in the [unix_http_server] section below
;  is deleted, supervisorctl will be unable to connect to supervisord.
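A quick check for this risk can be scripted. The sketch below reads the socket path from the text of a supervisord.conf and reports whether it sits under /tmp. The helper names are mine, and I am assuming the file can be read with Python's configparser when ; and # are treated as inline comment markers, which works for the snippet shown above.

```python
import configparser


def socket_path_from_conf(conf_text: str) -> str:
    """Return the socket path configured in the [unix_http_server] section."""
    parser = configparser.RawConfigParser(inline_comment_prefixes=(";", "#"))
    parser.read_string(conf_text)
    return parser.get("unix_http_server", "file")


def is_under_tmp(path: str) -> bool:
    """True if the path lives in /tmp, where periodic cleanup may delete it."""
    return path == "/tmp" or path.startswith("/tmp/")


# The two sections from the configuration shown earlier in this post.
SAMPLE_CONF = """
[unix_http_server]
file=/tmp/supervisor.sock   ; the path to the socket file

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL  for a unix socket
"""

path = socket_path_from_conf(SAMPLE_CONF)
print(path, "is under /tmp:", is_under_tmp(path))
```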

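Fixing the problem

We have found where the problem is; now we need to fix it. I came across two approaches.

One is to delete supervisor.pid and supervisor.log and then start supervisord again with the right configuration file, for example supervisord -c ./supervisord.conf. In the terminal session at the end of this post I also had to kill the supervisord process that was still running.

The other, suggested in posts like the ones referenced below, is to touch /tmp/supervisor.sock by hand and then run chmod 0700 /tmp/supervisor.sock. In my case this did not help: it only changed the error from "no such file" to "refused connection", because a touched file is a regular file with no supervisord listening behind it.

To keep the file from being deleted in the first place, the socket can also be configured in a directory other than /tmp.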
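The difference between "no such file" and "refused connection" can be reproduced with a small check. This is a sketch under the assumption of a Linux/Unix host; check_unix_socket and its four return values are names I made up for illustration.

```python
import os
import socket
import stat


def check_unix_socket(path: str) -> str:
    """Classify a path as 'missing', 'not a socket', 'refused' or 'listening'."""
    if not os.path.exists(path):
        return "missing"
    if not stat.S_ISSOCK(os.stat(path).st_mode):
        return "not a socket"  # e.g. a regular file created with `touch`
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        try:
            client.connect(path)
        except OSError:
            return "refused"  # socket file left behind, nobody listening
        return "listening"
```

Running check_unix_socket("/tmp/supervisor.sock") before calling supervisorctl tells you which of the situations from this post you are in; only "listening" means supervisord is actually reachable.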

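I am still watching the service after the change to see whether any problems show up, and will update this post later.

Reference posts:

Why does the Linux chmod command sometimes take a four-digit argument (such as 0755)?
Fixing supervisor unix:///var/run/supervisor.sock no such file (tested, works)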

dengc4r@shifang1:~$ ps -ef | grep uwsgi  // all of these should be killed
dengc4r     101591      1 24 1月21 ?       00:01:30 /home/dengc4r/.local/bin/uwsgi ./uwsgi.ini
dengc4r     101593      1 33 1月21 ?       00:02:06 /home/dengc4r/.local/bin/uwsgi ./uwsgi.ini
dengc4r     101599      1  3 1月21 ?       00:00:11 /home/dengc4r/.local/bin/uwsgi ./uwsgi.ini
dengc4r     102423 102402  0 00:05 pts/1    00:00:00 grep --color=auto uwsgi
dengc4r@shifang1:~/dengc4r/xxxx/bin$ ./app.sh stop
Stopping uwsgi:
unix:///tmp/supervisor.sock refused connection  // cannot stop; the supervisor.log and supervisor.pid files should be deleted here and supervisord restarted
daphne stopuwsgi STOPED.

dengc4r@shifang1:~/dengc4r/xxxx/bin$ ./app.sh restart
Restarting uwsgi:
signal_pidfile()/kill(): No such process [core/uwsgi.c line 1695]
uwsgi stopunix:///tmp/supervisor.sock refused connection
daphne stopuwsgi STOPED.
[uWSGI] getting INI configuration from ./uwsgi.ini
unix:///tmp/supervisor.sock refused connection
daphne runingdengc4r@shifang1:~/dengc4r/xxxx/bin$ rm -rf /tmp/supervisor.sock

dengc4r@shifang1:~/dengc4r/xxxx/bin$ ./app.sh restart
Restarting uwsgi:
signal_pidfile()/kill(): No such process [core/uwsgi.c line 1695]
uwsgi stopunix:///tmp/supervisor.sock no such file
daphne stopuwsgi STOPED.
[uWSGI] getting INI configuration from ./uwsgi.ini
unix:///tmp/supervisor.sock no such file
daphne runing

dengc4r@shifang1:~/dengc4r/xxxx/bin$ touch /tmp/supervisor.sock  // it complains the file is missing, but simply touching it does not work; delete the supervisor.pid and log files and restart instead

root@shifang1:/home/project/dengc4r/xxxx//bin# supervisorctl shutdown  // shut down the supervisor service
ERROR: unix:///tmp/supervisor.sock refused connection (already shut down?)  // connection refused error
root@shifang1:/home/project/dengc4r/xxxx//bin# rm /tmp/supervisor.sock   // delete it
root@shifang1:/home/project/dengc4r/xxxx//bin# ps -ef | grep super
root        730      1  0  2023 ?        00:27:41 /usr/bin/python /usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf  // a supervisord service is running, but it is not the one I wanted
root@shifang1:/home/project/dengc4r/xxxx//bin# supervisorctl shutdown  // try shutting down again
ERROR: unix:///tmp/supervisor.sock no such file (already shut down?)  // error: no socket file, because I had just deleted it
root@shifang1:/home/project/dengc4r/xxxx//bin# kill -9 730  // kill that supervisord process
root@shifang1:/home/project/dengc4r/xxxx//bin# supervisord -c ./supervisord.conf  // then start supervisord with my own supervisord.conf
root@shifang1:/home/project/dengc4r/xxxx//bin# supervisorctl
daphne                           FATAL     can't find command '/home/dengc4r/.local/bin/daphne/daphne'  // the daphne command cannot be found
supervisor> exit
root@shifang1:/home/project/dengc4r/xxxx//bin# ps -ef | grep daphne  // yet daphne turns out to be running
dengc4r      87606      1 90  2023 ?        102-16:58:02 /usr/bin/python3 /home/dengc4r/.local/bin/daphne -b 127.0.0.1 -p 8001 --proxy-headers api.asgi:application

dengc4r@shifang1:~/dengc4r/xxxx/bin$ ./app.sh restart   // restart the service: it starts, but reports a permission error
Restarting uwsgi:
The ./uwsgi.pid doesn't found
[uWSGI] getting INI configuration from ./uwsgi.ini
error: <class 'PermissionError'>, [Errno 13] Permission denied: file: /home/dengc4r/.local/lib/python3.7/site-packages/supervisor/xmlrpc.py line: 560
