Troubleshooting
To keep this page up to date, we rely largely on community contributions. Please send a PR if you notice that something is out of date.
Cannot find module 'puppeteer-core/internal/...'
This can occur if your Node.js version is lower than 14 or if you are using a custom resolver (such as jest-resolve). For the former, we do not support deprecated versions of Node.js. For the latter, upgrading the resolver (or its parent module, such as jest) usually resolves the problem (e.g. https://github.com/puppeteer/puppeteer/issues/9121).
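A quick way to rule out the first cause is to check the major version of the Node.js process that runs your script. The sketch below is only illustrative: the helper name isSupportedNode is hypothetical, and the default threshold of 14 is taken from the paragraph above rather than from Puppeteer's code.

```javascript
// Hypothetical helper: returns true when the given Node.js version string
// has a major version of at least `minMajor`.
function isSupportedNode(version = process.versions.node, minMajor = 14) {
  const major = Number.parseInt(version.split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

if (!isSupportedNode()) {
  console.warn(
    `Node.js ${process.versions.node} is older than 14; upgrade it before debugging further.`
  );
}
```

If the check passes and the error persists, the custom resolver is the more likely cause.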
Could not find expected browser locally
Starting from v19.0.0, Puppeteer downloads browsers into ~/.cache/puppeteer, resolved with os.homedir, for better caching between Puppeteer upgrades. The home directory is generally well-defined (even on Windows), but occasionally it may not be available. In that case, set the PUPPETEER_CACHE_DIR variable to change the installation directory.
For example, with each package manager:
npm:
PUPPETEER_CACHE_DIR=$(pwd) npm install puppeteer
PUPPETEER_CACHE_DIR=$(pwd) node <script-path>
Yarn:
PUPPETEER_CACHE_DIR=$(pwd) yarn add puppeteer
PUPPETEER_CACHE_DIR=$(pwd) node <script-path>
pnpm:
PUPPETEER_CACHE_DIR=$(pwd) pnpm add puppeteer
PUPPETEER_CACHE_DIR=$(pwd) node <script-path>
Bun:
PUPPETEER_CACHE_DIR=$(pwd) bun add puppeteer
PUPPETEER_CACHE_DIR=$(pwd) node <script-path>
You can also create a configuration file named .puppeteerrc.cjs (or puppeteer.config.cjs) at the root of your application with the following contents:
const {join} = require('path');
/**
* @type {import("puppeteer").Configuration}
*/
module.exports = {
cacheDirectory: join(__dirname, '.cache', 'puppeteer'),
};
You will need to reinstall puppeteer for the configuration to take effect. See Configuring Puppeteer for more information.
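To see which directory an install or a script is likely to use, you can mirror the lookup described in this section. This is a simplified sketch, not Puppeteer's actual resolution logic: it ignores configuration files, and the function name resolveCacheDir is hypothetical.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper: PUPPETEER_CACHE_DIR wins when it is set; otherwise
// fall back to ~/.cache/puppeteer based on the given home directory.
function resolveCacheDir(env = process.env, homedir = os.homedir()) {
  return env.PUPPETEER_CACHE_DIR || path.join(homedir, '.cache', 'puppeteer');
}

console.log(resolveCacheDir());
```

Printing the value during both installation and at run time is an easy way to spot a mismatch between the two environments.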
net::ERR_BLOCKED_BY_CLIENT when navigating to an HTTP URL in Chrome
Chrome is rolling out a feature called HttpsFirstBalancedModeAutoEnable that displays a warning when the user navigates to an HTTP site. The feature is enabled by default in the Chrome for Testing builds that Puppeteer uses by default.
With the feature enabled, a navigation request to an HTTP URL fails with the error net::ERR_BLOCKED_BY_CLIENT, which can be caught and recovered from. When the error occurs, Chrome shows a warning page with a button to continue the navigation, and that button can be clicked via Puppeteer. Local HTTP hosts do not trigger the warning, but remote hosts might. For more details, see https://crbug.com/378022921.
You can disable this Chrome feature by passing the --disable-features=HttpsFirstBalancedModeAutoEnable argument when launching Chrome:
const browser = await puppeteer.launch({
args: ['--disable-features=HttpsFirstBalancedModeAutoEnable'],
});
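If you would rather keep the feature enabled, the error can be caught around the navigation call. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, `page` is any object with a goto(url) method that returns a promise (such as a Puppeteer page), and the check assumes the error message contains the error code. Clicking the button on the warning page is left to the caller, because its selector is not documented here.

```javascript
// Hypothetical helper: true when an error looks like the HTTPS-first block.
function isBlockedByClient(error) {
  return (
    error instanceof Error &&
    error.message.includes('net::ERR_BLOCKED_BY_CLIENT')
  );
}

// Navigates with page.goto(url). Resolves to {blocked: false, response} on
// success and to {blocked: true, error} when the navigation was blocked.
// Any other error is rethrown unchanged.
async function gotoDetectingBlock(page, url) {
  try {
    const response = await page.goto(url);
    return {blocked: false, response};
  } catch (error) {
    if (isBlockedByClient(error)) {
      return {blocked: true, error};
    }
    throw error;
  }
}
```

A caller can then decide whether to retry over HTTPS, interact with the warning page, or report the URL.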
Chrome doesn't launch on Windows
Some Chrome policies might enforce running Chrome/Chromium with certain extensions.
Puppeteer passes the --disable-extensions flag by default and will fail to launch when such policies are active.
To work around this, set the enableExtensions option:
const browser = await puppeteer.launch({
enableExtensions: true,
});
Context: issue 3681.
Chrome reports sandbox errors on Windows
Chrome uses sandboxes on Windows, which require additional permissions on the downloaded Chrome files. Starting from Puppeteer v22.14.0, Puppeteer attempts to configure those permissions by running the setup.exe tool provided by Chrome during the installation of the browser.
If you are using an older Puppeteer version, or you still see the following error in the browser output:
[24452:59820:0508/113713.058:ERROR:sandbox_win.cc(913)] Sandbox cannot access executable. Check filesystem permissions are valid. See https://bit.ly/31yqMJR.: Access is denied. (0x5)
You can use icacls to set the permissions manually:
icacls "%USERPROFILE%/.cache/puppeteer/chrome" /grant *S-1-15-2-1:(OI)(CI)(RX)
See https://bit.ly/31yqMJR for more details.
Chrome doesn't launch on Linux
Make sure all the necessary dependencies are installed. You can run ldd chrome | grep not on a Linux machine to check which dependencies are missing. The common ones are listed below. Also see https://source.chromium.org/chromium/chromium/src/+/main:chrome/installer/linux/debian/dist_package_versions.json for the up-to-date list of dependencies declared by the Chrome installer.
Chrome currently does not provide arm64 binaries for Linux; arm64 binaries exist only for Mac ARM. This means that the Linux binaries downloaded by default will not work on Linux arm64.
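When the ldd check runs inside a script or CI job, it can help to turn its output into a plain list of library names. The sketch below assumes the usual ldd line format, "name => not found", for unresolved libraries; the helper name and the sample output are made up for illustration.

```javascript
// Hypothetical helper: extracts the names of unresolved shared libraries
// from the text printed by ldd.
function findMissingLibraries(lddOutput) {
  return lddOutput
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.endsWith('=> not found'))
    .map(line => line.split('=>')[0].trim());
}

// Sample ldd output, invented for this example.
const sample = [
  '\tlinux-vdso.so.1 (0x00007ffd5a9f2000)',
  '\tlibnss3.so => not found',
  '\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)',
  '\tlibgbm.so.1 => not found',
].join('\n');

console.log(findMissingLibraries(sample)); // [ 'libnss3.so', 'libgbm.so.1' ]
```

Each reported name can then be matched against the package lists that follow.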
Debian (e.g. Ubuntu) Dependencies
ca-certificates
fonts-liberation
libasound2
libatk-bridge2.0-0
libatk1.0-0
libc6
libcairo2
libcups2
libdbus-1-3
libexpat1
libfontconfig1
libgbm1
libgcc1
libglib2.0-0
libgtk-3-0
libnspr4
libnss3
libpango-1.0-0
libpangocairo-1.0-0
libstdc++6
libx11-6
libx11-xcb1
libxcb1
libxcomposite1
libxcursor1
libxdamage1
libxext6
libxfixes3
libxi6
libxrandr2
libxrender1
libxss1
libxtst6
lsb-release
wget
xdg-utils
CentOS Dependencies
alsa-lib.x86_64
atk.x86_64
cups-libs.x86_64
gtk3.x86_64
ipa-gothic-fonts
libXcomposite.x86_64
libXcursor.x86_64
libXdamage.x86_64
libXext.x86_64
libXi.x86_64
libXrandr.x86_64
libXScrnSaver.x86_64
libXtst.x86_64
pango.x86_64
xorg-x11-fonts-100dpi
xorg-x11-fonts-75dpi
xorg-x11-fonts-cyrillic
xorg-x11-fonts-misc
xorg-x11-fonts-Type1
xorg-x11-utils
After installing the dependencies, update the nss library with this command:
yum update nss -y
chrome-headless-shell disables GPU compositing
chrome-headless-shell requires the --enable-gpu flag to enable GPU acceleration in headless mode.
const browser = await puppeteer.launch({
headless: 'shell',
args: ['--enable-gpu'],
});
Setting up GPU with Chrome
Generally, Chrome should be able to detect and enable the GPU if the system has appropriate drivers. For additional tips, see the blog post at https://developer.chrome.com/blog/supercharge-web-ai-testing.
Setting Up Chrome Linux Sandbox
To protect the host environment from untrusted web content, Chrome uses multiple layers of sandboxing. For this to work properly, the host must be configured first. If there is no usable sandbox for Chrome, it will crash with the error "No usable sandbox!".
If you absolutely trust the content you open in Chrome, you can launch Chrome with the --no-sandbox argument:
const browser = await puppeteer.launch({
args: ['--no-sandbox'],
});
Running without a sandbox is strongly discouraged. Consider configuring a sandbox instead.
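One way to keep the flag from spreading is to add it only when the environment opts in explicitly. This is a sketch, not a Puppeteer feature: the helper buildLaunchArgs and the ALLOW_NO_SANDBOX variable are hypothetical names chosen for the example.

```javascript
// Hypothetical helper: builds the args array for puppeteer.launch() and
// adds --no-sandbox only when the caller explicitly opts in.
function buildLaunchArgs({allowNoSandbox = false, extraArgs = []} = {}) {
  const args = [...extraArgs];
  if (allowNoSandbox && !args.includes('--no-sandbox')) {
    args.push('--no-sandbox');
  }
  return args;
}

// ALLOW_NO_SANDBOX is an illustrative variable name, not a Puppeteer setting.
const launchArgs = buildLaunchArgs({
  allowNoSandbox: process.env.ALLOW_NO_SANDBOX === '1',
});
console.log(launchArgs);
// Usage (requires puppeteer): await puppeteer.launch({args: launchArgs});
```

With this shape, the sandbox stays on everywhere except in environments that set the variable on purpose.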
The recommended way to run Chrome is with a sandbox.
Issues with AppArmor on Ubuntu
Ubuntu 23.10+ (and possibly other Linux distros in the future) ships an AppArmor profile that applies to Chrome stable binaries installed at /opt/google/chrome/chrome (the default installation path). The policy is stored at /etc/apparmor.d/chrome. It prevents the Chrome for Testing binaries downloaded by Puppeteer from using user namespaces, which results in the "No usable sandbox!" error when trying to launch the browser.
For workarounds, see https://chromium.googlesource.com/chromium/src/+/main/docs/security/apparmor-userns-restrictions.md.
Using setuid sandbox
IMPORTANT NOTE: The Linux SUID sandbox is almost but not completely removed. See https://bugs.chromium.org/p/chromium/issues/detail?id=598454. This section is mostly out of date.
The setuid sandbox comes as a standalone executable located next to the Chrome that Puppeteer downloads. The same sandbox executable can be reused for different Chrome versions, so the following needs to be done only once per host environment:
# cd to Puppeteer cache directory (adjust the path if using a different cache directory).
cd ~/.cache/puppeteer/chrome/linux-<version>/chrome-linux64/
sudo chown root:root chrome_sandbox
sudo chmod 4755 chrome_sandbox
# copy sandbox executable to a shared location
sudo cp -p chrome_sandbox /usr/local/sbin/chrome-devel-sandbox
# export CHROME_DEVEL_SANDBOX env variable
export CHROME_DEVEL_SANDBOX=/usr/local/sbin/chrome-devel-sandbox
You might want to export the CHROME_DEVEL_SANDBOX env variable by default. In that case, add the following to ~/.bashrc or .zshenv:
export CHROME_DEVEL_SANDBOX=/usr/local/sbin/chrome-devel-sandbox
or to your Dockerfile:
ENV CHROME_DEVEL_SANDBOX /usr/local/sbin/chrome-devel-sandbox
Running Puppeteer on Travis CI
👋 We ran our Puppeteer tests on Travis CI until v6.0.0 (when we migrated to GitHub Actions); see our historical .travis.yml (v5.5.0) for reference.
Tips-n-tricks:
- The xvfb service should be launched in order to run Chrome for Testing in non-headless mode
- Runs on Xenial Linux on Travis by default
- Runs npm install by default
- node_modules is cached by default
.travis.yml might look like this:
language: node_js
node_js: node
services: xvfb
script:
- npm test
Running Puppeteer on WSL (Windows subsystem for Linux)
See this thread with some tips specific to WSL. In a nutshell, you need to install missing dependencies by either:
- Installing Chrome on WSL to install all dependencies
- Installing the required dependencies manually: sudo apt install libgtk-3-dev libnotify-dev libgconf-2-4 libnss3 libxss1 libasound2
The list of required dependencies might get outdated and depends on what you already have installed.
Running Puppeteer on CircleCI
Running Puppeteer smoothly on CircleCI requires the following steps:
- In your config, start with a NodeJS image like so:
  docker:
    - image: circleci/node:14 # Use your desired version
      environment:
        NODE_ENV: development # Only needed if puppeteer is in `devDependencies`
- Dependencies like libXtst6 probably need to be installed via apt-get, so use the threetreeslight/puppeteer orb (instructions), or paste parts of its source into your own config.
- Lastly, if you're using Puppeteer through Jest, you may encounter an error when spawning child processes:
  [00:00.0] jest args: --e2e --spec --max-workers=36
  Error: spawn ENOMEM
      at ChildProcess.spawn (internal/child_process.js:394:11)
  This is likely caused by Jest autodetecting the number of processes on the entire machine (36) rather than the number allowed to your container (2). To fix this, set jest --maxWorkers=2 in your test command.
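If you prefer to keep the worker limit in configuration instead of on the command line, Jest also accepts maxWorkers in its config file. The value 2 below mirrors the container limit from the example above and is an assumption about your CircleCI resource class.

```javascript
// jest.config.js
// Cap Jest at the number of CPUs available to the container (2 in the
// example above) instead of letting it autodetect the host's core count.
module.exports = {
  maxWorkers: 2,
};
```

A command-line --maxWorkers flag still takes precedence when both are present, so the config value works as a default.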
Running Puppeteer in Docker
👋 We used Cirrus CI to run our tests for Puppeteer in a Docker container until v3.0.x; see our historical Dockerfile.linux (v3.0.1) for reference. Starting from v16.0.0, we ship a Docker image via the GitHub registry. The Dockerfile is in the Puppeteer repository, and the usage instructions are under Integrations > Docker. The instructions below might still be helpful if you are building your own image.
Getting headless Chrome up and running in Docker can be tricky. The bundled Chrome for Testing that Puppeteer installs is missing the necessary shared library dependencies.
To fix this, install the missing dependencies and the latest Chrome for Testing package in your Dockerfile:
FROM node:14-slim
# Install the latest Chrome stable package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chrome for Testing that Puppeteer
# installs, work.
RUN apt-get update \
&& apt-get install -y wget gnupg \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# If running Docker >= 1.13.0 use docker run's --init arg to reap zombie processes, otherwise
# uncomment the following lines to have `dumb-init` as PID 1
# ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.2/dumb-init_1.2.2_x86_64 /usr/local/bin/dumb-init
# RUN chmod +x /usr/local/bin/dumb-init
# ENTRYPOINT ["dumb-init", "--"]
# Uncomment to skip the Chrome for Testing download when installing puppeteer. If you do,
# you'll need to launch puppeteer with:
# browser.launch({executablePath: 'google-chrome-stable'})
# ENV PUPPETEER_SKIP_DOWNLOAD true
# Install puppeteer so it's available in the container.
RUN npm init -y && \
npm i puppeteer \
# Add user so we don't need --no-sandbox.
# same layer as npm install to keep re-chowned files from using up several hundred MBs more space
&& groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser /node_modules \
&& chown -R pptruser:pptruser /package.json \
&& chown -R pptruser:pptruser /package-lock.json
# Run everything after as non-privileged user.
USER pptruser
CMD ["google-chrome-stable"]
Build the container:
docker build -t puppeteer-chrome-linux .
Run the container by passing node -e "<yourscript.js content as a string>" as the command:
docker run -i --init --rm --cap-add=SYS_ADMIN \
--name puppeteer-chrome puppeteer-chrome-linux \
node -e "`cat yourscript.js`"
There's a full example at https://github.com/ebidel/try-puppeteer that shows how to run this Dockerfile from a webserver running on App Engine Flex (Node).
Running on Alpine
Note that Chrome does not support Alpine out of the box, so make sure you have compatible system dependencies installed on Alpine and test the image before using it. See https://source.chromium.org/chromium/chromium/src/+/main:chrome/installer/linux/rpm/dist_package_provides.json and https://source.chromium.org/chromium/chromium/src/+/main:chrome/installer/linux/debian/dist_package_versions.json for the list of system packages required on supported distros.
Note: The current Chromium version in Alpine 3.20 causes timeout issues with Puppeteer. Downgrading to Alpine 3.19 fixes the issue. See #11640, #12637, #12189.
You need to find the newest Chromium package, then look up which Puppeteer version supports that browser version and use it.
Example:
Alpine Chromium version: 100
Puppeteer: Puppeteer v13.5.0
Dockerfile:
FROM alpine
# Installs Chromium (100) package.
RUN apk add --no-cache \
chromium \
nss \
freetype \
harfbuzz \
ca-certificates \
ttf-freefont \
nodejs \
yarn
...
# Tell Puppeteer to skip installing Chrome. We'll be using the installed package.
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser
# Puppeteer v13.5.0 works with Chromium 100.
RUN yarn add puppeteer@13.5.0
# Add user so we don't need --no-sandbox.
RUN addgroup -S pptruser && adduser -S -G pptruser pptruser \
&& mkdir -p /home/pptruser/Downloads /app \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser /app
# Run everything after as non-privileged user.
USER pptruser
...
Running Puppeteer on GitlabCI
This is very similar to some of the instructions above, but it requires a slightly different configuration to work.
Usually the issue looks like this:
Error: Failed to launch chrome! spawn /usr/bin/chromium-browser ENOENT
You need to patch two places:
- Your gitlab-ci.yml config
- The argument list when launching puppeteer
In gitlab-ci.yml, install the packages needed to launch headless Chrome in your Docker environment:
before_script:
  - apt-get update
  - apt-get install -yq gconf-service libasound2 libatk1.0-0 libc6 libcairo2
    libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 libgcc1 libgconf-2-4
    libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0
    libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1
    libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1
    libxss1 libxtst6 ca-certificates fonts-liberation libnss3 lsb-release
    xdg-utils wget
Next, you have to use '--no-sandbox' mode when launching Puppeteer. This can be done by passing it as an argument to your .launch() call: puppeteer.launch({ args: ['--no-sandbox'] });.
Running Puppeteer on Google Cloud Run
By default, Google Cloud Run disables the CPU after an HTTP response is written to the client. This means that puppeteer will appear extremely slow (taking 1-5 minutes to launch) if you "run puppeteer in the background" after your response has been written.
So this simple Express app will be perceivably slow:
import express from 'express';
import puppeteer from 'puppeteer';

const app = express();
app.post('/test-puppeteer', (req, res) => {
  res.json({
    jobId: 123,
    acknowledged: true,
  });
  // The response is already sent, so the CPU is throttled from here on.
  puppeteer.launch().then(browser => {
    // 2 minutes later...
  });
});
app.listen(3000);
It is slow because puppeteer is launched after the response is sent, at which point the CPU is disabled on GCR. What you want to do is this:
app.post('/test-puppeteer', (req, res) => {
puppeteer.launch().then(browser => {
// A second later...
res.json({
jobId: 123,
acknowledged: true,
});
});
});
If you want to run work in the background, you need to "enable CPU always" (go to Google Cloud Run Service > Edit & Deploy Revision > CPU allocation and pricing) so that the CPU stays allocated even after responses are sent. That should fix it.
Tips
Seeing weird errors when launching Chrome? Try running your container with docker run --cap-add=SYS_ADMIN when developing locally. Since the Dockerfile adds pptruser as a non-privileged user, it may not have all the necessary privileges.
dumb-init is worth checking out if you're experiencing a lot of zombie Chrome processes sticking around. There's special treatment for processes with PID=1, which makes it hard to terminate Chrome properly in some cases (e.g. with Docker).
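Separately from the init process, it helps to make sure your own code always closes the browser it launched. The wrapper below is a sketch with a hypothetical name, withBrowser; it accepts any launch function that resolves to an object with a close() method, so it does not depend on Puppeteer itself. It does not replace an init process for reaping zombies; it only reduces the number of browsers your script leaves running.

```javascript
// Hypothetical helper: launches a browser, runs `task`, and always closes
// the browser afterwards, even when `task` throws.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Usage (requires puppeteer):
// const puppeteer = require('puppeteer');
// await withBrowser(() => puppeteer.launch(), async browser => {
//   const page = await browser.newPage();
//   await page.goto('https://example.com');
// });
```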
Running Puppeteer in the cloud
Running Puppeteer on Google App Engine
The Node.js runtime of the App Engine standard environment comes with all system packages needed to run Headless Chrome.
To use puppeteer, specify the module as a dependency in your package.json and then override the puppeteer cache directory by including a file named .puppeteerrc.cjs at the root of your application with the contents:
const {join} = require('path');
/**
* @type {import("puppeteer").Configuration}
*/
module.exports = {
cacheDirectory: join(__dirname, 'node_modules', '.puppeteer_cache'),
};
[!NOTE] Google App Engine caches your node_modules between builds. Specifying the Puppeteer cache as a subdirectory of node_modules mitigates an issue in which Puppeteer can't find the browser executable because postinstall was not run.
Running Puppeteer on Google Cloud Functions
The Node.js runtime of Google Cloud Functions comes with all system packages needed to run Headless Chrome.
To use puppeteer, specify the module as a dependency in your package.json and then override the puppeteer cache directory by including a file named .puppeteerrc.cjs at the root of your application with the contents:
const {join} = require('path');
/**
* @type {import("puppeteer").Configuration}
*/
module.exports = {
cacheDirectory: join(__dirname, 'node_modules', '.puppeteer_cache'),
};
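[!NOTE] Google Cloud Functions caches your node_modules between builds. Specifying the puppeteer cache as a subdirectory of node_modules mitigates an issue in which the puppeteer install process does not run when the cache is hit.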
Running Puppeteer on Google Cloud Run
The default Node.js runtime of Google Cloud Run does not come with the system packages needed to run Headless Chrome. You will need to set up your own Dockerfile and include the missing dependencies.
Running Puppeteer on Heroku
Running Puppeteer on Heroku requires some additional dependencies that aren't included on the Linux box that Heroku spins up for you. To add the dependencies on deploy, add the Puppeteer Heroku buildpack to the list of buildpacks for your app under Settings > Buildpacks.
The URL for the buildpack is https://github.com/jontewks/puppeteer-heroku-buildpack
Ensure that you're using '--no-sandbox' mode when launching Puppeteer. This can be done by passing it as an argument to your .launch() call: puppeteer.launch({ args: ['--no-sandbox'] });.
When you click Add buildpack, paste that URL into the input and click Save. On the next deploy, your app will also install the dependencies that Puppeteer needs to run.
If you need to render Chinese, Japanese, or Korean characters, you may need to use a buildpack with additional font files, such as https://github.com/CoffeeAndCode/puppeteer-heroku-buildpack
There's also a simple guide from @timleland that includes a sample project: https://timleland.com/headless-chrome-on-heroku/.
Running Puppeteer on AWS Lambda
AWS Lambda limits deployment package sizes to ~50MB. This presents challenges for running headless Chrome (and therefore Puppeteer) on Lambda. The community has put together a few resources that work around the issues:
Running Puppeteer on AWS EC2 instance running Amazon-Linux
If you are using an EC2 instance running amazon-linux in your CI/CD pipeline and want to run Puppeteer tests on it, follow these steps.
- To install Chromium, first enable EPEL (Extra Packages for Enterprise Linux) through amazon-linux-extras:
  sudo amazon-linux-extras install epel -y
- Next, install Chromium:
  sudo yum install -y chromium
Now Puppeteer can launch Chromium to run your tests. If you do not enable EPEL and instead continue installing chromium as part of npm install, Puppeteer cannot launch Chromium because libatk-1.0.so.0 and many other packages are unavailable.
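Because a Chromium installed through the system package manager lives outside Puppeteer's cache, a script usually has to point Puppeteer at it with the executablePath launch option. The sketch below picks the first existing path from a list of candidates. The helper name is hypothetical and both candidate paths are assumptions, so check where your package manager actually placed the binary.

```javascript
const fs = require('fs');

// Hypothetical helper: returns the first candidate path that exists on
// disk, or undefined when none does.
function findBrowserExecutable(candidates, exists = fs.existsSync) {
  return candidates.find(candidate => exists(candidate));
}

// Candidate locations are assumptions, not guaranteed install paths.
const executablePath = findBrowserExecutable([
  '/usr/bin/chromium-browser',
  '/usr/bin/chromium',
]);
console.log(executablePath ?? 'No system Chromium found');
// Usage (requires puppeteer): await puppeteer.launch({executablePath});
```

Failing early when no candidate exists gives a clearer message than the ENOENT error that a launch with a wrong path would produce.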
Code Transpilation Issues
If you are using a JavaScript transpiler like Babel or TypeScript, calling evaluate() with an async function might not work. This is because puppeteer uses Function.prototype.toString() to serialize functions, while transpilers may change the output code in a way that is incompatible with puppeteer.
One workaround is to instruct the transpiler not to alter the code, for example by configuring TypeScript to target a recent ECMAScript version ("target": "es2018"). Another workaround is to pass the code as a template string instead of a function:
await page.evaluate(`(async() => {
console.log('1');
})()`);