Files
bun.sh/docs/runtime/markdown.mdx
Jarred Sumner 1bfe5c6b37 feat(md): Zig markdown parser with Bun.markdown API (#26440)
## Summary

- Port md4c (CommonMark-compliant markdown parser) from C to Zig under
`src/md/`
- Three output modes:
  - `Bun.markdown.html(input, options?)` — render to HTML string
- `Bun.markdown.render(input, callbacks?)` — render with custom
callbacks for each element
- `Bun.markdown.react(input, options?)` — render to a React Fragment
element, directly usable as a component return value
- React element creation uses a cached JSC Structure with
`putDirectOffset` for fast allocation
- Component overrides in `react()`: pass tag names as options keys to
replace default HTML elements with custom components
- GFM extensions: tables, strikethrough, task lists, permissive
autolinks, disallowed raw HTML tag filter
- Wire up `.md` as a bundler loader (via explicit `{ type: "md" }`)

## JavaScript API

### `Bun.markdown.html(input, options?)`

Renders markdown to an HTML string:

```js
const html = Bun.markdown.html("# Hello **world**");
// "<h1>Hello <strong>world</strong></h1>\n"

Bun.markdown.html("## Hello", { headingIds: true });
// '<h2 id="hello">Hello</h2>\n'
```

### `Bun.markdown.render(input, callbacks?)`

Renders markdown with custom JavaScript callbacks for each element. Each
callback receives children as a string and optional metadata, and
returns a string:

```js
// Custom HTML with classes
const html = Bun.markdown.render("# Title\n\nHello **world**", {
  heading: (children, { level }) => `<h${level} class="title">${children}</h${level}>`,
  paragraph: (children) => `<p>${children}</p>`,
  strong: (children) => `<b>${children}</b>`,
});

// ANSI terminal output
const ansi = Bun.markdown.render("# Hello\n\n**bold**", {
  heading: (children) => `\x1b[1;4m${children}\x1b[0m\n`,
  paragraph: (children) => children + "\n",
  strong: (children) => `\x1b[1m${children}\x1b[22m`,
});

// Strip all formatting
const text = Bun.markdown.render("# Hello **world**", {
  heading: (children) => children,
  paragraph: (children) => children,
  strong: (children) => children,
});
// "Hello world"

// Return null to omit elements
const result = Bun.markdown.render("# Title\n\n![logo](img.png)\n\nHello", {
  image: () => null,
  heading: (children) => children,
  paragraph: (children) => children + "\n",
});
// "Title\nHello\n"
```

Parser options can be included alongside callbacks:

```js
Bun.markdown.render("Visit www.example.com", {
  link: (children, { href }) => `[${children}](${href})`,
  paragraph: (children) => children,
  permissiveAutolinks: true,
});
```

### `Bun.markdown.react(input, options?)`

Returns a React Fragment element — use it directly as a component return
value:

```tsx
// Use as a component
function Markdown({ text }: { text: string }) {
  return Bun.markdown.react(text);
}

// With custom components
function Heading({ children }: { children: React.ReactNode }) {
  return <h1 className="title">{children}</h1>;
}
const element = Bun.markdown.react("# Hello", { h1: Heading });

// Server-side rendering
import { renderToString } from "react-dom/server";
const html = renderToString(Bun.markdown.react("# Hello **world**"));
// "<h1>Hello <strong>world</strong></h1>"
```

#### React 18 and older

By default, `react()` uses `Symbol.for('react.transitional.element')` as
the `$$typeof` symbol, which is what React 19 expects. For React 18 and
older, pass `reactVersion: 18`:

```tsx
const el = Bun.markdown.react("# Hello", { reactVersion: 18 });
```

### Component Overrides

Tag names can be overridden in `react()`:

```tsx
Bun.markdown.react(input, {
  h1: MyHeading,      // block elements
  p: CustomParagraph,
  a: CustomLink,      // inline elements
  img: CustomImage,
  pre: CodeBlock,
  // ... h1-h6, p, blockquote, ul, ol, li, pre, hr, html,
  //     table, thead, tbody, tr, th, td,
  //     em, strong, a, img, code, del, math, u, br
});
```

Boolean values are ignored (not treated as overrides), so parser options
like `{ strikethrough: true }` don't conflict with component overrides.

### Options

```js
Bun.markdown.html(input, {
  tables: true,              // GFM tables (default: true)
  strikethrough: true,       // ~~deleted~~ (default: true)
  tasklists: true,           // - [x] items (default: true)
  headingIds: true,          // Generate id attributes on headings
  autolinkHeadings: true,    // Wrap heading content in <a> tags
  tagFilter: false,          // GFM disallowed HTML tags
  wikiLinks: false,          // [[wiki]] links
  latexMath: false,          // $inline$ and $$display$$
  underline: false,          // __underline__ (instead of <strong>)
  // ... and more
});
```

## Architecture

### Parser (`src/md/`)

The parser is split into focused modules using Zig's delegation pattern:

| Module | Purpose |
|--------|---------|
| `parser.zig` | Core `Parser` struct, state, and re-exported method
delegation |
| `blocks.zig` | Block-level parsing: document processing, line
analysis, block start/end |
| `containers.zig` | Container management: blockquotes, lists, list
items |
| `inlines.zig` | Inline parsing: emphasis, code spans, HTML tags,
entities |
| `links.zig` | Link/image resolution, reference links, autolink
rendering |
| `autolinks.zig` | Permissive autolink detection (www, url, email) |
| `line_analysis.zig` | Line classification: headings, fences, HTML
blocks, tables |
| `ref_defs.zig` | Reference definition parsing and lookup |
| `render_blocks.zig` | Block rendering dispatch (code, HTML, table
blocks) |
| `html_renderer.zig` | HTML renderer implementing `Renderer` VTable |
| `types.zig` | Shared types: `Renderer` VTable, `BlockType`,
`SpanType`, `TextType`, etc. |

### Renderer Abstraction

Parsing is decoupled from output via a `Renderer` VTable interface:

```zig
pub const Renderer = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    pub const VTable = struct {
        enterBlock: *const fn (...) void,
        leaveBlock: *const fn (...) void,
        enterSpan:  *const fn (...) void,
        leaveSpan:  *const fn (...) void,
        text:       *const fn (...) void,
    };
};
```

Four renderers are implemented:
- **`HtmlRenderer`** (`src/md/html_renderer.zig`) — produces HTML string
output
- **`JsCallbackRenderer`** (`src/bun.js/api/MarkdownObject.zig`) — calls
JS callbacks for each element, accumulates string output
- **`ParseRenderer`** (`src/bun.js/api/MarkdownObject.zig`) — builds
React element AST with `MarkedArgumentBuffer` for GC safety
- **`JSReactElement`** (`src/bun.js/bindings/JSReactElement.cpp`) — C++
fast path for React element creation using cached JSC Structure +
`putDirectOffset`

## Test plan

- [x] 792 spec tests pass (CommonMark, GFM tables, strikethrough,
tasklists, permissive autolinks, GFM tag filter, wiki links, coverage,
regressions)
- [x] 114 API tests pass (`html()`, `render()`, `react()`,
`renderToString` integration, component overrides)
- [x] 58 GFM compatibility tests pass

```
bun bd test test/js/bun/md/md-spec.test.ts       # 792 pass
bun bd test test/js/bun/md/md-render-api.test.ts  # 114 pass
bun bd test test/js/bun/md/gfm-compat.test.ts     # 58 pass
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Dylan Conway <dylan.conway567@gmail.com>
Co-authored-by: SUZUKI Sosuke <sosuke@bun.com>
Co-authored-by: robobun <robobun@oven.sh>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Kirill Markelov <kerusha.chubko@gmail.com>
Co-authored-by: Ciro Spaciari <ciro.spaciari@gmail.com>
Co-authored-by: Alistair Smith <hi@alistair.sh>
2026-01-28 20:24:02 -08:00

345 lines
14 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: Markdown
description: Parse and render Markdown with Bun's built-in Markdown API, supporting GFM extensions and custom rendering callbacks
---
{% callout type="note" %}
**Unstable API** — This API is under active development and may change in future versions of Bun.
{% /callout %}
Bun includes a fast, built-in Markdown parser written in Zig. It supports GitHub Flavored Markdown (GFM) extensions and provides three APIs:
- `Bun.markdown.html()` — render Markdown to an HTML string
- `Bun.markdown.render()` — render Markdown with custom callbacks for each element
- `Bun.markdown.react()` — render Markdown to React JSX elements
---
## `Bun.markdown.html()`
Convert a Markdown string to HTML.
```ts
const html = Bun.markdown.html("# Hello **world**");
// "<h1>Hello <strong>world</strong></h1>\n"
```
GFM extensions like tables, strikethrough, and task lists are enabled by default:
```ts
const html = Bun.markdown.html(`
| Feature | Status |
|-------------|--------|
| Tables | ~~done~~ |
| Strikethrough| ~~done~~ |
| Task lists | done |
`);
```
### Options
Pass an options object as the second argument to configure the parser:
```ts
const html = Bun.markdown.html("some markdown", {
tables: true, // GFM tables (default: true)
strikethrough: true, // GFM strikethrough (default: true)
tasklists: true, // GFM task lists (default: true)
tagFilter: true, // GFM tag filter for disallowed HTML tags
autolinks: true, // Autolink URLs, emails, and www. links
});
```
All available options:
| Option | Default | Description |
| ---------------------- | ------- | ----------------------------------------------------------- |
| `tables` | `false` | GFM tables |
| `strikethrough` | `false` | GFM strikethrough (`~~text~~`) |
| `tasklists` | `false` | GFM task lists (`- [x] item`) |
| `autolinks` | `false` | Enable autolinks — see [Autolinks](#autolinks) |
| `headings` | `false` | Heading IDs and autolinks — see [Heading IDs](#heading-ids) |
| `hardSoftBreaks` | `false` | Treat soft line breaks as hard breaks |
| `wikiLinks` | `false` | Enable `[[wiki links]]` |
| `underline` | `false` | `__text__` renders as `<u>` instead of `<strong>` |
| `latexMath` | `false` | Enable `$inline$` and `$$display$$` math |
| `collapseWhitespace` | `false` | Collapse whitespace in text |
| `permissiveAtxHeaders` | `false` | ATX headers without space after `#` |
| `noIndentedCodeBlocks` | `false` | Disable indented code blocks |
| `noHtmlBlocks` | `false` | Disable HTML blocks |
| `noHtmlSpans` | `false` | Disable inline HTML |
| `tagFilter` | `false` | GFM tag filter for disallowed HTML tags |
#### Autolinks
Pass `true` to enable all autolink types, or an object for granular control:
```ts
// Enable all autolinks (URL, WWW, email)
Bun.markdown.html("Visit www.example.com", { autolinks: true });
// Enable only specific types
Bun.markdown.html("Visit www.example.com", {
autolinks: { url: true, www: true },
});
```
#### Heading IDs
Pass `true` to enable both heading IDs and autolink headings, or an object for granular control:
```ts
// Enable heading IDs and autolink headings
Bun.markdown.html("## Hello World", { headings: true });
// '<h2 id="hello-world"><a href="#hello-world">Hello World</a></h2>\n'
// Enable only heading IDs (no autolink)
Bun.markdown.html("## Hello World", { headings: { ids: true } });
// '<h2 id="hello-world">Hello World</h2>\n'
```
---
## `Bun.markdown.render()`
Parse Markdown and render it using custom JavaScript callbacks. This gives you full control over the output format — you can generate HTML with custom classes, React elements, ANSI terminal output, or any other string format.
```ts
const result = Bun.markdown.render("# Hello **world**", {
heading: (children, { level }) => `<h${level} class="title">${children}</h${level}>`,
strong: children => `<b>${children}</b>`,
paragraph: children => `<p>${children}</p>`,
});
// '<h1 class="title">Hello <b>world</b></h1>'
```
### Callback signature
Each callback receives:
1. **`children`** — the accumulated content of the element as a string
2. **`meta`** (optional) — an object with element-specific metadata
Return a string to replace the element's rendering. Return `null` or `undefined` to omit the element from the output entirely. If no callback is registered for an element, its children pass through unchanged.
### Block callbacks
| Callback | Meta | Description |
| ------------ | ------------------------------------------- | ---------------------------------------------------------------------------------------- |
| `heading` | `{ level: number, id?: string }` | Heading level 16. `id` is set when `headings: { ids: true }` is enabled |
| `paragraph` | — | Paragraph block |
| `blockquote` | — | Blockquote block |
| `code` | `{ language?: string }` | Fenced or indented code block. `language` is the info-string when specified on the fence |
| `list` | `{ ordered: boolean, start?: number }` | Ordered or unordered list. `start` is the start number for ordered lists |
| `listItem` | `{ checked?: boolean }` | List item. `checked` is set for task list items (`- [x]` / `- [ ]`) |
| `hr` | — | Horizontal rule |
| `table` | — | Table block |
| `thead` | — | Table head |
| `tbody` | — | Table body |
| `tr` | — | Table row |
| `th` | `{ align?: "left" \| "center" \| "right" }` | Table header cell. `align` is set when alignment is specified |
| `td` | `{ align?: "left" \| "center" \| "right" }` | Table data cell. `align` is set when alignment is specified |
| `html` | — | Raw HTML content |
### Inline callbacks
| Callback | Meta | Description |
| --------------- | ---------------------------------- | ---------------------------- |
| `strong` | — | Strong emphasis (`**text**`) |
| `emphasis` | — | Emphasis (`*text*`) |
| `link` | `{ href: string, title?: string }` | Link |
| `image` | `{ src: string, title?: string }` | Image |
| `codespan` | — | Inline code (`` `code` ``) |
| `strikethrough` | — | Strikethrough (`~~text~~`) |
| `text` | — | Plain text content |
### Examples
#### Custom HTML with classes
```ts
const html = Bun.markdown.render("# Title\n\nHello **world**", {
heading: (children, { level }) => `<h${level} class="heading heading-${level}">${children}</h${level}>`,
paragraph: children => `<p class="body">${children}</p>`,
strong: children => `<strong class="bold">${children}</strong>`,
});
```
#### Stripping all formatting
```ts
const plaintext = Bun.markdown.render("# Hello **world**", {
heading: children => children,
paragraph: children => children,
strong: children => children,
emphasis: children => children,
link: children => children,
image: () => "",
code: children => children,
codespan: children => children,
});
// "Hello world"
```
#### Omitting elements
Return `null` or `undefined` to remove an element from the output:
```ts
const result = Bun.markdown.render("# Title\n\n![logo](img.png)\n\nHello", {
image: () => null, // Remove all images
heading: children => children,
paragraph: children => children + "\n",
});
// "Title\nHello\n"
```
#### ANSI terminal output
```ts
const ansi = Bun.markdown.render("# Hello\n\nThis is **bold** and *italic*", {
heading: (children, { level }) => `\x1b[1;4m${children}\x1b[0m\n`,
paragraph: children => children + "\n",
strong: children => `\x1b[1m${children}\x1b[22m`,
emphasis: children => `\x1b[3m${children}\x1b[23m`,
});
```
#### Code block syntax highlighting
````ts
const result = Bun.markdown.render("```js\nconsole.log('hi')\n```", {
code: (children, meta) => {
const lang = meta?.language ?? "";
return `<pre><code class="language-${lang}">${children}</code></pre>`;
},
});
````
### Parser options
Parser options are passed as a separate third argument:
```ts
const result = Bun.markdown.render(
"Visit www.example.com",
{
link: (children, { href }) => `[${children}](${href})`,
paragraph: children => children,
},
{ autolinks: true },
);
```
---
## `Bun.markdown.react()`
Render Markdown directly to React elements. Returns a `<Fragment>` that you can use as a component return value.
```tsx
function Markdown({ text }: { text: string }) {
return Bun.markdown.react(text);
}
```
### Server-side rendering
Works with `renderToString()` and React Server Components:
```tsx
import { renderToString } from "react-dom/server";
const html = renderToString(Bun.markdown.react("# Hello **world**"));
// "<h1>Hello <strong>world</strong></h1>"
```
### Component overrides
Replace any HTML element with a custom React component by passing it in the second argument, keyed by tag name:
```tsx
function Code({ language, children }) {
return (
<pre data-language={language}>
<code>{children}</code>
</pre>
);
}
function Link({ href, title, children }) {
return (
<a href={href} title={title} target="_blank" rel="noopener noreferrer">
{children}
</a>
);
}
function Heading({ id, children }) {
return (
<h2 id={id}>
<a href={`#${id}`}>{children}</a>
</h2>
);
}
const el = Bun.markdown.react(
content,
{
pre: Code,
a: Link,
h2: Heading,
},
{ headings: { ids: true } },
);
```
#### Available overrides
Every HTML tag produced by the parser can be overridden:
| Option | Props | Description |
| ------------ | ---------------------------- | --------------------------------------------------------------- |
| `h1``h6` | `{ id?, children }` | Headings. `id` is set when `headings: { ids: true }` is enabled |
| `p` | `{ children }` | Paragraph |
| `blockquote` | `{ children }` | Blockquote |
| `pre` | `{ language?, children }` | Code block. `language` is the info string (e.g. `"js"`) |
| `hr` | `{}` | Horizontal rule (no children) |
| `ul` | `{ children }` | Unordered list |
| `ol` | `{ start, children }` | Ordered list. `start` is the first item number |
| `li` | `{ checked?, children }` | List item. `checked` is set for task list items |
| `table` | `{ children }` | Table |
| `thead` | `{ children }` | Table head |
| `tbody` | `{ children }` | Table body |
| `tr` | `{ children }` | Table row |
| `th` | `{ align?, children }` | Table header cell |
| `td` | `{ align?, children }` | Table data cell |
| `em` | `{ children }` | Emphasis (`*text*`) |
| `strong` | `{ children }` | Strong (`**text**`) |
| `a` | `{ href, title?, children }` | Link |
| `img` | `{ src, alt?, title? }` | Image (no children) |
| `code` | `{ children }` | Inline code |
| `del` | `{ children }` | Strikethrough (`~~text~~`) |
| `br` | `{}` | Hard line break (no children) |
### React 18 and older
By default, elements use `Symbol.for('react.transitional.element')` as the `$$typeof` symbol. For React 18 and older, pass `reactVersion: 18` in the options (third argument):
```tsx
function Markdown({ text }: { text: string }) {
return Bun.markdown.react(text, undefined, { reactVersion: 18 });
}
```
### Parser options
All [parser options](#options) are passed as the third argument:
```tsx
const el = Bun.markdown.react("## Hello World", undefined, {
headings: { ids: true },
autolinks: true,
});
```