Puppeteer
Guides | API | FAQ | Contributing | Troubleshooting
Puppeteer is a Node.js library that provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full ("headful") Chrome/Chromium.
What can I do?
Most things that you can do manually in the browser can be done using Puppeteer! Here are a few examples to get you started:
- Generate screenshots and PDFs of pages.
- Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
- Automate form submission, UI testing, keyboard input, etc.
- Create an automated testing environment using the latest JavaScript and browser features.
- Capture a timeline trace of your site to help diagnose performance issues.
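As an illustration of the first item in the list above, here is a minimal sketch that saves a screenshot and a PDF of a single page. The function name `capturePage` and the output file naming are assumptions made for this example, not part of Puppeteer. The `puppeteer` object is passed in as an argument so the same function can be used with either `puppeteer` or `puppeteer-core`.

```javascript
// Sketch: save a PNG screenshot and a PDF of one page.
// `capturePage` and the `<prefix>.png` / `<prefix>.pdf` naming are
// hypothetical choices for this example; `puppeteer` is supplied by the caller.
async function capturePage(puppeteer, url, outputPrefix) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly idle so late-loading content is rendered.
    await page.goto(url, {waitUntil: 'networkidle2'});

    const screenshotPath = `${outputPrefix}.png`;
    const pdfPath = `${outputPrefix}.pdf`;

    // Capture the whole scrollable page, not only the visible viewport.
    await page.screenshot({path: screenshotPath, fullPage: true});
    await page.pdf({path: pdfPath, format: 'A4'});

    return {screenshotPath, pdfPath};
  } finally {
    // Close the browser even if navigation or capture fails.
    await browser.close();
  }
}
```

After `import puppeteer from 'puppeteer';`, a call such as `await capturePage(puppeteer, 'https://developer.chrome.com/', 'chrome-docs')` would be expected to write `chrome-docs.png` and `chrome-docs.pdf` to the current directory.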
Getting Started
Installation
To use Puppeteer in your project, run:
npm i puppeteer
# or using yarn
yarn add puppeteer
# or using pnpm
pnpm i puppeteer
When you install Puppeteer, it automatically downloads a recent version of Chrome for Testing (~170MB macOS, ~282MB Linux, ~280MB Windows) and a chrome-headless-shell binary (starting with Puppeteer v21.6.0) that is guaranteed to work with Puppeteer. The browser is downloaded to the $HOME/.cache/puppeteer folder by default (starting with Puppeteer v19.0.0). See configuration for configuration options and environment variables that control the download behavior.
If you deploy a project using Puppeteer to a hosting provider, such as Render or Heroku, you might need to reconfigure the location of the cache to be within your project folder (see an example below) because not all hosting providers include $HOME/.cache in the project's deployment.
For a version of Puppeteer without the browser installation, see puppeteer-core.
If used with TypeScript, the minimum supported TypeScript version is 4.7.4.
Configuration
Puppeteer uses several defaults that can be customized through configuration files.
For example, to change the default cache directory Puppeteer uses to install browsers, you can add a .puppeteerrc.cjs (or puppeteer.config.cjs) at the root of your application with the following contents:
const {join} = require('path');

/**
 * @type {import("puppeteer").Configuration}
 */
module.exports = {
  // Changes the cache location for Puppeteer.
  cacheDirectory: join(__dirname, '.cache', 'puppeteer'),
};
After adding the configuration file, you will need to remove and reinstall puppeteer for it to take effect.
See the configuration guide for more information.
puppeteer-core
For every release since v1.7.0 we publish two packages:
- puppeteer is a product for browser automation. When installed, it downloads a version of Chrome, which it then drives using puppeteer-core. Being an end-user product, puppeteer automates several workflows using reasonable defaults that can be customized.
- puppeteer-core is a library to help drive anything that supports the DevTools protocol. Being a library, puppeteer-core is fully driven through its programmatic interface, which means no defaults are assumed and puppeteer-core will not download Chrome when installed.
You should use puppeteer-core if you are connecting to a remote browser or managing browsers yourself. If you are managing browsers yourself, you will need to call puppeteer.launch with an explicit executablePath (or channel if it's installed in a standard location).
When using puppeteer-core, remember to change the import:
import puppeteer from 'puppeteer-core';
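The three ways of obtaining a browser described above (connecting to a remote browser, launching with an explicit executablePath, or launching by channel) could be combined roughly as follows. This is only a sketch: the helper name `openBrowser` and the shape of its `target` argument are assumptions made for this example, and the fallback channel `'chrome'` presumes a stable Chrome installed in a standard location.

```javascript
// Sketch: obtain a browser through puppeteer-core, which downloads nothing itself.
// `openBrowser` and the `target` object are hypothetical; `puppeteer` is the
// default export of 'puppeteer-core', supplied by the caller.
async function openBrowser(puppeteer, target) {
  if (target.browserWSEndpoint) {
    // Attach to a browser that is already running elsewhere.
    return puppeteer.connect({browserWSEndpoint: target.browserWSEndpoint});
  }
  if (target.executablePath) {
    // Launch a browser binary that you installed and manage yourself.
    return puppeteer.launch({executablePath: target.executablePath});
  }
  // Otherwise rely on a Chrome installed in a standard location.
  return puppeteer.launch({channel: target.channel ?? 'chrome'});
}
```

For instance, `await openBrowser(puppeteer, {executablePath: '/path/to/Chrome'})` would launch a self-managed binary, while passing a `browserWSEndpoint` would attach to a remote browser instead.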
Usage
Puppeteer follows the latest maintenance LTS version of Node.
Puppeteer will be familiar to people using other browser testing frameworks. You launch/connect a browser, create some pages, and then manipulate them with Puppeteer's API.
For more in-depth usage, check our guides and examples.
Example
The following example searches developer.chrome.com for blog posts with the text "automate beyond recorder", clicks on the first result, and prints the full title of the blog post.
import puppeteer from 'puppeteer';

(async () => {
  // Launch the browser and open a new blank page
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate the page to a URL
  await page.goto('https://developer.chrome.com/');

  // Set screen size
  await page.setViewport({width: 1080, height: 1024});

  // Type into search box
  await page.type('.devsite-search-field', 'automate beyond recorder');

  // Wait and click on first result
  const searchResultSelector = '.devsite-result-item-link';
  await page.waitForSelector(searchResultSelector);
  await page.click(searchResultSelector);

  // Locate the full title with a unique string
  const textSelector = await page.waitForSelector(
    'text/Customize and automate'
  );
  const fullTitle = await textSelector?.evaluate(el => el.textContent);

  // Print the full title
  console.log('The title of this blog post is "%s".', fullTitle);

  await browser.close();
})();
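The example waits for an element and then reads its text. If the element never appears, `page.waitForSelector` rejects once its timeout elapses. One possible way to treat a missing element as "no text" rather than a crash is sketched below. The helper name `readText` and the default timeout are assumptions made for this example, and the check on `error.name` assumes Puppeteer's timeout errors are named `TimeoutError`.

```javascript
// Sketch: read the text of the first element matching `selector`,
// returning null if it does not appear in time.
// `readText` is a hypothetical helper; `page` is a Puppeteer Page.
async function readText(page, selector, timeoutMs = 5000) {
  try {
    const handle = await page.waitForSelector(selector, {timeout: timeoutMs});
    // waitForSelector may resolve to null, so guard before evaluating.
    return handle ? await handle.evaluate(el => el.textContent) : null;
  } catch (error) {
    // Assumption: a timeout surfaces as an error named 'TimeoutError'.
    if (error && error.name === 'TimeoutError') {
      return null;
    }
    // Anything else (closed page, invalid selector, ...) is a real failure.
    throw error;
  }
}
```

In the example, the last lookup could then be written as `const fullTitle = await readText(page, 'text/Customize and automate');`.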
Default runtime settings
1. By default, Puppeteer launches Chrome in headless mode.
const browser = await puppeteer.launch();
// Equivalent to
const browser = await puppeteer.launch({headless: true});
Before v22, Puppeteer launched the old headless mode by default. The old headless mode is now known as chrome-headless-shell and ships as a separate binary. chrome-headless-shell does not completely match the behavior of regular Chrome, but it is currently more performant for automation tasks where the complete Chrome feature set is not needed. If performance is more important for your use case, switch to chrome-headless-shell as follows:
const browser = await puppeteer.launch({headless: 'shell'});
To launch a "headful" version of Chrome, set the headless option to false when launching a browser:
const browser = await puppeteer.launch({headless: false});
2. By default, Puppeteer downloads and uses a specific version of Chrome so its API is guaranteed to work out of the box. To use Puppeteer with a different version of Chrome or Chromium, pass in the executable's path when creating a Browser instance:
const browser = await puppeteer.launch({executablePath: '/path/to/Chrome'});
You can also use Puppeteer with Firefox. See status of cross-browser support for more information.
See this article for a description of the differences between Chromium and Chrome. This article describes some differences for Linux users.
3. Puppeteer creates its own browser user profile which it cleans up on every run.
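The headless and executablePath settings described above are often chosen per environment (for example, a visible browser locally and chrome-headless-shell in CI). A minimal sketch of deriving them from environment variables follows. The variable names `HEADLESS_MODE` and `CHROME_PATH` and the helper `buildLaunchOptions` are hypothetical and are not read by Puppeteer itself.

```javascript
// Sketch: derive launch options from environment variables.
// HEADLESS_MODE and CHROME_PATH are hypothetical names invented for this
// example; Puppeteer does not read them on its own.
function buildLaunchOptions(env) {
  const options = {};

  switch (env.HEADLESS_MODE) {
    case 'shell':
      options.headless = 'shell'; // chrome-headless-shell
      break;
    case 'false':
      options.headless = false; // visible ("headful") browser window
      break;
    default:
      options.headless = true; // the default headless mode
  }

  // Only override the bundled browser when a path is explicitly given.
  if (env.CHROME_PATH) {
    options.executablePath = env.CHROME_PATH;
  }

  return options;
}
```

The result can be passed straight to the launcher, for example `await puppeteer.launch(buildLaunchOptions(process.env))`.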
Using Docker
See our Docker guide.
Using Chrome Extensions
See our Chrome extensions guide.
Resources
Contributing
Check out our contributing guide to get an overview of Puppeteer development.
FAQ