Pdf.Tables

Accessing Data

Extracts tables from a PDF file and returns them as a navigation table.

Syntax

Pdf.Tables(pdf as binary, optional options as nullable record) as table

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| pdf | binary | Yes | The binary content of the PDF file, typically obtained from File.Contents or Web.Contents. |
| options | record | No | An optional record of settings, including StartPage (number), EndPage (number), MultiPageTables (logical), and Implementation (text). |

Return Value

table: A navigation table where each row represents a table found in the PDF, with columns for Id, Name, Page, Kind, and Data.

Remarks

Pdf.Tables extracts tabular data from a PDF document and returns a navigation table listing every table detected across all pages. It uses heuristic algorithms that analyze the visual layout of text, lines, and spacing to identify table structures. The function accepts the raw binary content of a PDF — obtained from File.Contents, Web.Contents, or the Content column in Folder.Files — and does not require any external PDF libraries.

Navigation table columns: The result contains one row per detected table, with these columns:

- Id (text): a unique identifier assigned by the extraction engine (e.g., "Table001", "Table002").
- Name (text): same as Id by default.
- Page (number): the 1-based page number where the table was found.
- Kind (text): always "Table".
- Data (table): the extracted table as a Power Query table.

The first row of Data often contains the column headers; promote it with Table.PromoteHeaders if needed.
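
As a minimal sketch of working with these columns, the query below selects one row by its Id and promotes the first extracted row to headers. It assumes the PDF yields a table identified as "Table001" and that its first row holds header text; the file path is the illustrative one used in the examples on this page.

```powerquery
let
    Source = Pdf.Tables(File.Contents("C:\Data\SalesReport.pdf")),
    // Key lookup on Id; this raises an error unless exactly one row matches.
    Table001 = Source{[Id = "Table001"]}[Data],
    // Use the first extracted row as column names (assumes it holds headers).
    Promoted = Table.PromoteHeaders(Table001, [PromoteAllScalars = true])
in
    Promoted
```

Looking a row up by Id rather than by position (Source{0}) keeps the query pointed at the same table if the order of detected tables changes.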

Key options:

- StartPage (number): the first page (1-based) to extract from. Use it to skip front matter in large PDFs.
- EndPage (number): the last page to extract from. Combine it with StartPage to target a specific range.
- MultiPageTables (logical): when true, attempts to detect and merge tables that span multiple pages into a single table.
- Implementation (text): set to "1.3" to use a newer extraction algorithm that may handle some PDFs more accurately.
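
The options can be combined in a single record, as in the sketch below. The page range and file path are illustrative. The multi-page option is written as MultiPageTables, the field name given in Microsoft's published reference for Pdf.Tables; treat it as an assumption and confirm it in your environment.

```powerquery
let
    // Illustrative values; adjust the range and version for your document.
    Options = [
        StartPage = 5,
        EndPage = 10,
        MultiPageTables = true,
        Implementation = "1.3"
    ],
    Source = Pdf.Tables(File.Contents("C:\Data\LargeReport.pdf"), Options)
in
    Source
```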

Limitations: Pdf.Tables requires the PDF to contain machine-readable text. PDFs that consist only of scanned images are not supported, because the function does not perform OCR. Password-protected PDFs raise an error. Tables with complex merged cells or highly irregular column alignment may not extract cleanly; in those cases, try different Implementation values or post-process the extracted data.
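
One way to compare Implementation values is to count the tables each version detects, as sketched below. The version list {"1.1", "1.2", "1.3"} is an assumption and may not match what your environment accepts. CountTables is a hypothetical helper defined only for this sketch. A null count marks a version that raised an error.

```powerquery
let
    PdfBinary = File.Contents("C:\Data\SalesReport.pdf"),
    // Assumed version list; adjust to the values your environment accepts.
    Versions = {"1.1", "1.2", "1.3"},
    // Hypothetical helper: number of tables detected by one version.
    CountTables = (version as text) as record =>
        [
            Implementation = version,
            // Table.RowCount forces extraction; errors become null.
            TableCount = try Table.RowCount(Pdf.Tables(PdfBinary, [Implementation = version])) otherwise null
        ],
    Summary = Table.FromRecords(List.Transform(Versions, CountTables))
in
    Summary
```

A table count is only a rough signal. Inspect the Data column of each result before settling on a version.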

Authentication: Pdf.Tables does not make HTTP requests — it only parses the binary you provide. Authentication for downloading PDF files is handled by Web.Contents or other functions used to obtain the binary.
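
The split between downloading and parsing can be sketched as follows. The URL is hypothetical. Any credentials are supplied to Web.Contents through the data source settings, not to Pdf.Tables.

```powerquery
let
    // Hypothetical URL; replace with the actual location of the PDF.
    PdfBinary = Web.Contents("https://example.com/reports/annual-report.pdf"),
    // Pdf.Tables only parses the downloaded bytes.
    Source = Pdf.Tables(PdfBinary)
in
    Source
```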

Query folding: Not supported. All filtering occurs in Power Query after extraction.

Examples

Example 1: Extract all tables from a local PDF file

```powerquery
Pdf.Tables(File.Contents("C:\Reports\AnnualReport.pdf"))
```

Example 2: Access the first detected table in a PDF

```powerquery
let
    Source = Pdf.Tables(File.Contents("C:\Data\SalesReport.pdf")),
    // Row 0 of the navigation table is the first detected table.
    FirstTable = Source{0}[Data]
in
    FirstTable
```

Example 3: Extract tables from a specific page range in a large PDF

```powerquery
let
    // Only pages 5 through 10 are examined.
    Source = Pdf.Tables(
        File.Contents("C:\Data\LargeReport.pdf"),
        [StartPage = 5, EndPage = 10]
    ),
    // First table detected within that page range.
    FirstTable = Source{0}[Data]
in
    FirstTable
```

Example 4: Combine all PDFs in a folder, extracting the first table from each

```powerquery
let
    Files = Folder.Files("C:\Reports\PDFs"),
    // Extension casing varies between files, so compare in lower case.
    Pdfs = Table.SelectRows(Files, each Text.Lower([Extension]) = ".pdf"),
    AddTables = Table.AddColumn(Pdfs, "Tables", each Pdf.Tables([Content])),
    // Expand Id rather than Name: Folder.Files already has a Name column.
    Expanded = Table.ExpandTableColumn(AddTables, "Tables", {"Id", "Data"}),
    FirstTables = Table.SelectRows(Expanded, each [Id] = "Table001"),
    Combined = Table.Combine(FirstTables[Data])
in
    Combined
```

Compatibility

- Power BI Desktop
- Power BI Service
- Excel Desktop
- Excel Online
- Dataflows
- Fabric Notebooks