Troubleshooting
To keep this page up to date, we rely largely on community contributions. Please send a PR if you notice something is no longer current.
Cannot find module 'puppeteer-core/internal/...'
This can occur if your Node.js version is lower than 14 or if you are using a custom resolver (such as jest-resolve). For the former, we do not support deprecated versions of Node.js. For the latter, upgrading the resolver (or its parent module, such as jest) usually works (e.g. https://github.com/puppeteer/puppeteer/issues/9121).
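To rule out the first cause quickly, a script can check the running Node.js version before loading Puppeteer. The following is a minimal sketch; `isSupportedNode` is a hypothetical helper written for this example, not part of Puppeteer, and the minimum of 14 is taken from the paragraph above.

```javascript
// Hypothetical helper (not part of Puppeteer): parse a Node.js version string
// and compare its major component against a minimum major version.
function isSupportedNode(version, minMajor = 14) {
  const major = Number.parseInt(
    String(version).replace(/^v/, '').split('.')[0],
    10,
  );
  return Number.isInteger(major) && major >= minMajor;
}

// process.versions.node holds the running version without a leading "v".
if (!isSupportedNode(process.versions.node)) {
  console.warn(
    `Node.js ${process.versions.node} is older than 14; module resolution may fail.`,
  );
}
```

If the check passes and the error persists, the custom resolver is the more likely cause.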
Could not find expected browser locally
Starting from v19.0.0, Puppeteer downloads browsers into ~/.cache/puppeteer (resolved using os.homedir) for better caching between Puppeteer upgrades. Generally the home directory is well-defined (even on Windows), but occasionally it may not be available. In that case, the PUPPETEER_CACHE_DIR environment variable allows you to change the installation directory.
For example:
npm:
PUPPETEER_CACHE_DIR=$(pwd) npm install puppeteer
PUPPETEER_CACHE_DIR=$(pwd) node <script-path>
Yarn:
PUPPETEER_CACHE_DIR=$(pwd) yarn add puppeteer
PUPPETEER_CACHE_DIR=$(pwd) node <script-path>
pnpm:
PUPPETEER_CACHE_DIR=$(pwd) pnpm add puppeteer
PUPPETEER_CACHE_DIR=$(pwd) node <script-path>
You can also create a configuration file named .puppeteerrc.cjs (or puppeteer.config.cjs) at the root of your application with the following contents:
const {join} = require('path');
/**
* @type {import("puppeteer").Configuration}
*/
module.exports = {
cacheDirectory: join(__dirname, '.cache', 'puppeteer'),
};
You will need to reinstall puppeteer for the configuration to take effect. See Configuring Puppeteer for more information.
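When debugging this error, it can help to print the directory you expect the browser to be in. The sketch below mirrors only the behavior described in this section (PUPPETEER_CACHE_DIR first, then ~/.cache/puppeteer under os.homedir()). `resolveCacheDir` is a hypothetical helper, not Puppeteer's actual implementation, and it deliberately ignores configuration files.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper that mirrors the documented behavior; it is not
// Puppeteer's own resolution logic and does not read .puppeteerrc.cjs.
function resolveCacheDir(env = process.env) {
  // An explicit PUPPETEER_CACHE_DIR takes priority when set and non-empty.
  if (env.PUPPETEER_CACHE_DIR) {
    return path.resolve(env.PUPPETEER_CACHE_DIR);
  }
  // Otherwise fall back to ~/.cache/puppeteer under the home directory.
  return path.join(os.homedir(), '.cache', 'puppeteer');
}

console.log(resolveCacheDir());
```

Comparing the printed path at install time and at run time shows whether the two steps are looking in different places.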
Chrome doesn't launch on Windows
Some Chrome policies might enforce running Chrome/Chromium with certain extensions.
Puppeteer passes the --disable-extensions flag by default and will fail to launch when such policies are active.
To work around this, try running without the flag:
const browser = await puppeteer.launch({
ignoreDefaultArgs: ['--disable-extensions'],
});
Context: issue 3681.
Chrome reports sandbox errors on Windows
Chrome uses sandboxes on Windows, which require additional permissions on the downloaded Chrome files. Currently, Puppeteer is not able to set those permissions for you.
If you encounter this issue, you will see errors like this in the browser stdout:
[24452:59820:0508/113713.058:ERROR:sandbox_win.cc(913)] Sandbox cannot access executable. Check filesystem permissions are valid. See https://bit.ly/31yqMJR.: Access is denied. (0x5)
To work around the issue, use the icacls utility to set permissions manually:
icacls %USERPROFILE%/.cache/puppeteer/chrome /grant *S-1-15-2-1:(OI)(CI)(RX)
See https://bit.ly/31yqMJR for more details.
Chrome doesn't launch on Linux
Make sure all the necessary dependencies are installed. You can run ldd chrome | grep not on a Linux machine to check which dependencies are missing. The common ones are listed below. Also see https://source.chromium.org/chromium/chromium/src/+/main:chrome/installer/linux/debian/dist_package_versions.json for the up-to-date list of dependencies declared by the Chrome installer.
Chrome currently does not provide arm64 binaries for Linux; arm64 binaries exist only for Mac ARM. This means that the Linux binaries downloaded by default will not work on Linux arm64.
Debian (e.g. Ubuntu) Dependencies
ca-certificates
fonts-liberation
libasound2
libatk-bridge2.0-0
libatk1.0-0
libc6
libcairo2
libcups2
libdbus-1-3
libexpat1
libfontconfig1
libgbm1
libgcc1
libglib2.0-0
libgtk-3-0
libnspr4
libnss3
libpango-1.0-0
libpangocairo-1.0-0
libstdc++6
libx11-6
libx11-xcb1
libxcb1
libxcomposite1
libxcursor1
libxdamage1
libxext6
libxfixes3
libxi6
libxrandr2
libxrender1
libxss1
libxtst6
lsb-release
wget
xdg-utils
CentOS Dependencies
alsa-lib.x86_64
atk.x86_64
cups-libs.x86_64
gtk3.x86_64
ipa-gothic-fonts
libXcomposite.x86_64
libXcursor.x86_64
libXdamage.x86_64
libXext.x86_64
libXi.x86_64
libXrandr.x86_64
libXScrnSaver.x86_64
libXtst.x86_64
pango.x86_64
xorg-x11-fonts-100dpi
xorg-x11-fonts-75dpi
xorg-x11-fonts-cyrillic
xorg-x11-fonts-misc
xorg-x11-fonts-Type1
xorg-x11-utils
After installing the dependencies, you need to update the nss library using this command:
yum update nss -y
chrome-headless-shell disables GPU compositing
chrome-headless-shell requires the --enable-gpu flag to enable GPU acceleration in headless mode.
const browser = await puppeteer.launch({
headless: 'shell',
args: ['--enable-gpu'],
});
Setting up GPU with Chrome
Generally, Chrome should be able to detect and enable the GPU if the system has appropriate drivers. For additional tips, see the following blog post: https://developer.chrome.com/blog/supercharge-web-ai-testing.
Setting Up Chrome Linux Sandbox
In order to protect the host environment from untrusted web content, Chrome uses multiple layers of sandboxing. For this to work properly, the host should be configured first. If there's no good sandbox for Chrome to use, it will crash with the error No usable sandbox!.
If you absolutely trust the content you open in Chrome, you can launch Chrome with the --no-sandbox argument:
const browser = await puppeteer.launch({
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
Running without a sandbox is strongly discouraged. Consider configuring a sandbox instead.
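One way to keep the sandbox on by default is to add the flags only on explicit opt-in. The following is a minimal sketch under stated assumptions: `buildLaunchArgs` and the `ALLOW_NO_SANDBOX` environment variable are names invented for this example and are not recognized by Puppeteer or Chrome.

```javascript
// Hypothetical helper: build the `args` array for puppeteer.launch().
// ALLOW_NO_SANDBOX is an invented variable name used only in this sketch.
function buildLaunchArgs(env = process.env) {
  const args = [];
  if (env.ALLOW_NO_SANDBOX === '1') {
    // Only reached on explicit opt-in, for fully trusted content.
    args.push('--no-sandbox', '--disable-setuid-sandbox');
  }
  return args;
}

// Usage (requires puppeteer to be installed):
//   const browser = await puppeteer.launch({args: buildLaunchArgs()});
console.log(buildLaunchArgs());
```

With this shape, a production environment that never sets the variable always launches Chrome with its sandbox enabled.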
The recommended way to run Chrome is using sandboxes
Issues with AppArmor on Ubuntu
Ubuntu 23.10+ (and possibly other Linux distros in the future) ships an AppArmor profile that applies to Chrome stable binaries installed at /opt/google/chrome/chrome (the default installation path). This policy is stored at /etc/apparmor.d/chrome. It prevents Chrome for Testing binaries downloaded by Puppeteer from using user namespaces, resulting in the No usable sandbox! error when trying to launch the browser.
For workarounds, see https://chromium.googlesource.com/chromium/src/+/main/docs/security/apparmor-userns-restrictions.md.
Using setuid sandbox
IMPORTANT NOTE: The Linux SUID sandbox is almost but not completely removed. See https://bugs.chromium.org/p/chromium/issues/detail?id=598454. This section is mostly out-of-date.
The setuid sandbox comes as a standalone executable located next to the Chrome that Puppeteer downloads. It is fine to reuse the same sandbox executable for different Chrome versions, so the following only needs to be done once per host environment:
# cd to Puppeteer cache directory (adjust the path if using a different cache directory).
cd ~/.cache/puppeteer/chrome/linux-<version>/chrome-linux64/
sudo chown root:root chrome_sandbox
sudo chmod 4755 chrome_sandbox
# copy sandbox executable to a shared location
sudo cp -p chrome_sandbox /usr/local/sbin/chrome-devel-sandbox
# export CHROME_DEVEL_SANDBOX env variable
export CHROME_DEVEL_SANDBOX=/usr/local/sbin/chrome-devel-sandbox
You might want to export the CHROME_DEVEL_SANDBOX env variable by default. In this case, add the following to your ~/.bashrc or .zshenv:
export CHROME_DEVEL_SANDBOX=/usr/local/sbin/chrome-devel-sandbox
or to your Dockerfile:
ENV CHROME_DEVEL_SANDBOX /usr/local/sbin/chrome-devel-sandbox
Running Puppeteer on Travis CI
👋 We ran our tests for Puppeteer on Travis CI until v6.0.0 (when we migrated to GitHub Actions); see our historical .travis.yml (v5.5.0) for reference.
Tips-n-tricks:
- The xvfb service should be launched in order to run Chrome for Testing in non-headless mode
- Runs on Xenial Linux on Travis by default
- Runs npm install by default
- node_modules is cached by default
.travis.yml might look like this:
language: node_js
node_js: node
services: xvfb
script:
- npm test
Running Puppeteer on WSL (Windows subsystem for Linux)
See this thread with some tips specific to WSL. In a nutshell, you need to install missing dependencies by either:
- Installing required dependencies manually: sudo apt install libgtk-3-dev libnotify-dev libgconf-2-4 libnss3 libxss1 libasound2.
The list of required dependencies might become outdated and depends on what you already have installed.
Running Puppeteer on CircleCI
Running Puppeteer smoothly on CircleCI requires the following steps:
- Start with a NodeJS image in your config like so:
docker:
  - image: circleci/node:14 # Use your desired version
    environment:
      NODE_ENV: development # Only needed if puppeteer is in `devDependencies`
- Dependencies like libXtst6 probably need to be installed via apt-get, so use the threetreeslight/puppeteer orb (instructions), or paste parts of its source into your own config.
- Lastly, if you're using Puppeteer through Jest, then you may encounter an error spawning child processes:
[00:00.0] jest args: --e2e --spec --max-workers=36
Error: spawn ENOMEM
at ChildProcess.spawn (internal/child_process.js:394:11)
This is likely caused by Jest autodetecting the number of processes on the entire machine (36) rather than the number allowed to your container (2). To fix this, set jest --maxWorkers=2 in your test command.
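The mismatch described above can be made explicit in a small script that prints the flag to use. This is a sketch only: `pickMaxWorkers` is a hypothetical helper, and the container allowance (2 in the example above) has to be supplied by you, because Node.js typically reports the host's CPU count and not the container's quota.

```javascript
const os = require('os');

// Hypothetical helper: cap the worker count at the container's CPU allowance.
// `containerCpus` is an assumption you provide; `detectedCpus` defaults to
// what Node.js sees, which is usually the whole machine.
function pickMaxWorkers(containerCpus, detectedCpus = os.cpus().length) {
  const limit = Math.max(1, Math.floor(containerCpus));
  return Math.max(1, Math.min(limit, detectedCpus));
}

console.log(`jest --maxWorkers=${pickMaxWorkers(2)}`);
```

The helper never returns less than 1, so a fractional CPU allowance still yields a usable worker count.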
Running Puppeteer in Docker
👋 We used Cirrus CI to run our tests for Puppeteer in a Docker container until v3.0.x; see our historical Dockerfile.linux (v3.0.1) for reference. Starting from v16.0.0 we ship a Docker image via the GitHub registry. The Dockerfile is located here and the usage instructions are in the README.md. The instructions below might still be helpful if you are building your own image.
Getting headless Chrome up and running in Docker can be tricky. The bundled Chrome for Testing that Puppeteer installs is missing the necessary shared library dependencies.
To fix this, you'll need to install the missing dependencies and the latest Chrome for Testing package in your Dockerfile:
FROM node:14-slim
# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chrome for Testing that Puppeteer
# installs, work.
RUN apt-get update \
&& apt-get install -y wget gnupg \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# If running Docker >= 1.13.0 use docker run's --init arg to reap zombie processes, otherwise
# uncomment the following lines to have `dumb-init` as PID 1
# ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.2/dumb-init_1.2.2_x86_64 /usr/local/bin/dumb-init
# RUN chmod +x /usr/local/bin/dumb-init
# ENTRYPOINT ["dumb-init", "--"]
# Uncomment to skip the Chrome for Testing download when installing puppeteer. If you do,
# you'll need to launch puppeteer with:
# browser.launch({executablePath: 'google-chrome-stable'})
# ENV PUPPETEER_SKIP_DOWNLOAD true
# Install puppeteer so it's available in the container.
RUN npm init -y && \
npm i puppeteer \
# Add user so we don't need --no-sandbox.
# same layer as npm install to keep re-chowned files from using up several hundred MBs more space
&& groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser /node_modules \
&& chown -R pptruser:pptruser /package.json \
&& chown -R pptruser:pptruser /package-lock.json
# Run everything after as non-privileged user.
USER pptruser
CMD ["google-chrome-stable"]
Build the container:
docker build -t puppeteer-chrome-linux .
Run the container by passing node -e "<yourscript.js content as a string>" as the command:
docker run -i --init --rm --cap-add=SYS_ADMIN \
--name puppeteer-chrome puppeteer-chrome-linux \
node -e "`cat yourscript.js`"
There's a full example at https://github.com/ebidel/try-puppeteer that shows how to run this Dockerfile from a webserver running on App Engine Flex (Node).
Running on Alpine
Note that Chrome does not support Alpine out of the box, so make sure you have compatible system dependencies installed on Alpine and test the image before using it. See https://source.chromium.org/chromium/chromium/src/+/main:chrome/installer/linux/rpm/dist_package_provides.json and https://source.chromium.org/chromium/chromium/src/+/main:chrome/installer/linux/debian/dist_package_versions.json for the list of system packages required on supported distros.
CAUTION
The current Chromium version in Alpine 3.20 is causing timeout issues with Puppeteer. Downgrading to Alpine 3.19 fixes the issue. See #11640, #12637, #12189.
You need to find the newest Chromium package, then look up the supported browser version for Puppeteer and use the corresponding version.
Example:
Alpine Chromium version: 100
Puppeteer: Puppeteer v13.5.0
Dockerfile:
FROM alpine
# Installs Chromium (100) package.
RUN apk add --no-cache \
chromium \
nss \
freetype \
harfbuzz \
ca-certificates \
ttf-freefont \
nodejs \
yarn
...
# Tell Puppeteer to skip installing Chrome. We'll be using the installed package.
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser
# Puppeteer v13.5.0 works with Chromium 100.
RUN yarn add puppeteer@13.5.0
# Add user so we don't need --no-sandbox.
RUN addgroup -S pptruser && adduser -S -G pptruser pptruser \
&& mkdir -p /home/pptruser/Downloads /app \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser /app
# Run everything after as non-privileged user.
USER pptruser
...
Running Puppeteer on GitlabCI
This is very similar to some of the instructions above, but requires a slightly different configuration to work.
Usually the issue looks like this:
Error: Failed to launch chrome! spawn /usr/bin/chromium-browser ENOENT
You need to patch two places:
- Your gitlab-ci.yml config
- The argument list when launching puppeteer
In gitlab-ci.yml we need to install some packages to make it possible to launch headless Chrome in your docker env:
before_script:
- apt-get update
- apt-get install -yq gconf-service libasound2 libatk1.0-0 libc6 libcairo2
libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 libgcc1 libgconf-2-4
libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0
libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1
libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1
libxss1 libxtst6 ca-certificates fonts-liberation libnss3 lsb-release
xdg-utils wget
Next, you have to use '--no-sandbox' mode and also '--disable-setuid-sandbox' when launching Puppeteer. This can be done by passing them as arguments to your .launch() call: puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });.
Running Puppeteer on Google Cloud Run
Google Cloud Run disables the CPU by default after an HTTP response is written to the client. This means that puppeteer will appear extremely slow (taking 1-5 minutes to launch) if you "run puppeteer in the background" after your response has been written.
So this simple express app will be perceivably slow:
import express from 'express';
import puppeteer from 'puppeteer';
const app = express();
app.post('/test-puppeteer', (req, res) => {
res.json({
jobId: 123,
acknowledged: true,
});
puppeteer.launch().then(browser => {
// 2 minutes later...
});
});
app.listen(3000);
It is slow because the CPU is disabled on GCR once the response is sent, and puppeteer is launched only after that. What you want to do is this:
app.post('/test-puppeteer', (req, res) => {
puppeteer.launch().then(browser => {
// A second later...
res.json({
jobId: 123,
acknowledged: true,
});
});
});
If you want to run work in the background, you need to "enable CPU always" even after responses are sent. That should fix it.
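The ordering that matters here can be isolated from Express and Puppeteer entirely. In the sketch below, `handleJob`, `doWork`, and `sendResponse` are hypothetical names: `doWork` stands in for the Puppeteer job and `sendResponse` for res.json(). It only illustrates awaiting the work before the response is written.

```javascript
// Hypothetical handler shape; none of these names come from Express or
// Puppeteer. The point is the order: finish the work, then respond.
async function handleJob(doWork, sendResponse) {
  // Await the work first, while the request is still open...
  const result = await doWork();
  // ...and only then write the response.
  sendResponse({acknowledged: true, result});
  return result;
}

// Usage with stand-in functions:
handleJob(
  async () => 'page title',
  body => console.log('responded with', body),
);
```

The trade-off is that the client waits for the job to finish, which is why the always-on CPU setting is the alternative for true background work.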
Tips
By default, Docker runs a container with a /dev/shm shared memory space of 64MB. This is typically too small for Chrome and will cause Chrome to crash when rendering large pages. To fix this, run the container with docker run --shm-size=1gb to increase the size of /dev/shm. Since Chrome 65, this is no longer necessary. Instead, launch the browser with the --disable-dev-shm-usage flag:
const browser = await puppeteer.launch({
args: ['--disable-dev-shm-usage'],
});
This will write shared memory files into /tmp instead of /dev/shm. See crbug.com/736452 for more details.
Seeing other weird errors when launching Chrome? Try running your container with docker run --cap-add=SYS_ADMIN when developing locally. Since the Dockerfile adds pptruser as a non-privileged user, it may not have all the necessary privileges.
dumb-init is worth checking out if you're experiencing a lot of zombie Chrome processes sticking around. There's special treatment for processes with PID=1, which makes it hard to terminate Chrome properly in some cases (e.g. in Docker).
Running Puppeteer in the cloud
Running Puppeteer on Google App Engine
The Node.js runtime of the App Engine standard environment comes with all system packages needed to run Headless Chrome.
To use puppeteer, specify the module as a dependency in your package.json and then override the puppeteer cache directory by including a file named .puppeteerrc.cjs at the root of your application with the contents:
const {join} = require('path');
/**
* @type {import("puppeteer").Configuration}
*/
module.exports = {
cacheDirectory: join(__dirname, 'node_modules', '.puppeteer_cache'),
};
NOTE
Google App Engine caches your node_modules between builds. Specifying the Puppeteer cache as a subdirectory of node_modules mitigates an issue in which Puppeteer can't find the browser executable due to postinstall not being run.
Running Puppeteer on Google Cloud Functions
The Node.js runtime of Google Cloud Functions comes with all system packages needed to run Headless Chrome.
To use puppeteer, specify the module as a dependency in your package.json and then override the puppeteer cache directory by including a file named .puppeteerrc.cjs at the root of your application with the contents:
const {join} = require('path');
/**
* @type {import("puppeteer").Configuration}
*/
module.exports = {
cacheDirectory: join(__dirname, 'node_modules', '.puppeteer_cache'),
};
NOTE
Google Cloud Functions caches your node_modules between builds. Specifying the puppeteer cache as a subdirectory of node_modules mitigates an issue in which the puppeteer install process does not run when the cache is hit.
Running Puppeteer on Google Cloud Run
The default Node.js runtime of Google Cloud Run does not come with the system packages needed to run Headless Chrome. You will need to set up your own Dockerfile and include the missing dependencies.
Running Puppeteer on Heroku
Running Puppeteer on Heroku requires some additional dependencies that aren't included on the Linux box that Heroku spins up for you. To add the dependencies on deploy, add the Puppeteer Heroku buildpack to the list of buildpacks for your app under Settings > Buildpacks.
The URL for the buildpack is https://github.com/jontewks/puppeteer-heroku-buildpack
Ensure that you're using '--no-sandbox' mode when launching Puppeteer. This can be done by passing it as an argument to your .launch() call: puppeteer.launch({ args: ['--no-sandbox'] });.
When you click add buildpack, simply paste that URL into the input and click save. On the next deploy, your app will also install the dependencies that Puppeteer needs to run.
If you need to render Chinese, Japanese, or Korean characters, you may need to use a buildpack with additional font files, such as https://github.com/CoffeeAndCode/puppeteer-heroku-buildpack
There's also another simple guide from @timleland that includes a sample project: https://timleland.com/headless-chrome-on-heroku/.
Running Puppeteer on AWS Lambda
AWS Lambda limits deployment package sizes to ~50MB. This presents challenges for running headless Chrome (and therefore Puppeteer) on Lambda. The community has put together a few resources that work around the issues:
- https://github.com/sparticuz/chromium (a vendor and framework agnostic library that supports modern versions of chromium)
Running Puppeteer on AWS EC2 instance running Amazon-Linux
If you are using an EC2 instance running amazon-linux in your CI/CD pipeline and want to run Puppeteer tests in amazon-linux, follow these steps.
- To install Chromium, you have to first enable amazon-linux-extras, which comes as part of EPEL (Extra Packages for Enterprise Linux):
sudo amazon-linux-extras install epel -y
- Next, install Chromium:
sudo yum install -y chromium
Now Puppeteer can launch Chromium to run your tests. If you do not enable EPEL and instead continue installing chromium as part of npm install, Puppeteer cannot launch Chromium due to the unavailability of libatk-1.0.so.0 and many more packages.
Code Transpilation Issues
If you are using a JavaScript transpiler like babel or TypeScript, calling evaluate() with an async function might not work. This is because puppeteer uses Function.prototype.toString() to serialize functions, while transpilers could be changing the output code in such a way that it is incompatible with puppeteer.
Some workarounds to this problem would be to instruct the transpiler not to mess with the code, for example by configuring TypeScript to use a recent ECMAScript version ("target": "es2018"). Another workaround could be using string templates instead of functions:
await page.evaluate(`(async() => {
console.log('1');
})()`);
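To see what a transpiler has done to a function before it reaches the page, you can print its serialized form yourself. This is a minimal sketch that runs without Puppeteer; the variable names are illustrative, and it assumes (per the paragraph above) that the text returned by toString() is what gets serialized.

```javascript
// An async arrow function, as you might pass to page.evaluate().
const fn = async () => 1 + 1;

// Function.prototype.toString() returns the source text Node.js is actually
// running. If a transpiler rewrote the function, the rewrite shows up here.
const serialized = fn.toString();
console.log(serialized);

// A plain string is data, not code, so a transpiler leaves its contents alone.
const expression = `(async () => 1 + 1)()`;
console.log(expression);
```

If the first log shows generated helper calls where you expected `async`, the transpiler target is the likely culprit.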