mirror of
https://github.com/oven-sh/bun
synced 2026-02-02 15:08:46 +00:00
feat(md): Zig markdown parser with Bun.markdown API (#26440)
## Summary
- Port md4c (CommonMark-compliant markdown parser) from C to Zig under
`src/md/`
- Three output modes:
- `Bun.markdown.html(input, options?)` — render to HTML string
- `Bun.markdown.render(input, callbacks?)` — render with custom
callbacks for each element
- `Bun.markdown.react(input, options?)` — render to a React Fragment
element, directly usable as a component return value
- React element creation uses a cached JSC Structure with
`putDirectOffset` for fast allocation
- Component overrides in `react()`: pass tag names as options keys to
replace default HTML elements with custom components
- GFM extensions: tables, strikethrough, task lists, permissive
autolinks, disallowed raw HTML tag filter
- Wire up `.md` as a bundler loader (via explicit `{ type: "md" }`)
## JavaScript API
### `Bun.markdown.html(input, options?)`
Renders markdown to an HTML string:
```js
const html = Bun.markdown.html("# Hello **world**");
// "<h1>Hello <strong>world</strong></h1>\n"
Bun.markdown.html("## Hello", { headingIds: true });
// '<h2 id="hello">Hello</h2>\n'
```
### `Bun.markdown.render(input, callbacks?)`
Renders markdown with custom JavaScript callbacks for each element. Each
callback receives children as a string and optional metadata, and
returns a string:
```js
// Custom HTML with classes
const html = Bun.markdown.render("# Title\n\nHello **world**", {
heading: (children, { level }) => `<h${level} class="title">${children}</h${level}>`,
paragraph: (children) => `<p>${children}</p>`,
strong: (children) => `<b>${children}</b>`,
});
// ANSI terminal output
const ansi = Bun.markdown.render("# Hello\n\n**bold**", {
heading: (children) => `\x1b[1;4m${children}\x1b[0m\n`,
paragraph: (children) => children + "\n",
strong: (children) => `\x1b[1m${children}\x1b[22m`,
});
// Strip all formatting
const text = Bun.markdown.render("# Hello **world**", {
heading: (children) => children,
paragraph: (children) => children,
strong: (children) => children,
});
// "Hello world"
// Return null to omit elements
const result = Bun.markdown.render("# Title\n\n\n\nHello", {
image: () => null,
heading: (children) => children,
paragraph: (children) => children + "\n",
});
// "Title\nHello\n"
```
Parser options can be included alongside callbacks:
```js
Bun.markdown.render("Visit www.example.com", {
link: (children, { href }) => `[${children}](${href})`,
paragraph: (children) => children,
permissiveAutolinks: true,
});
```
### `Bun.markdown.react(input, options?)`
Returns a React Fragment element — use it directly as a component return
value:
```tsx
// Use as a component
function Markdown({ text }: { text: string }) {
return Bun.markdown.react(text);
}
// With custom components
function Heading({ children }: { children: React.ReactNode }) {
return <h1 className="title">{children}</h1>;
}
const element = Bun.markdown.react("# Hello", { h1: Heading });
// Server-side rendering
import { renderToString } from "react-dom/server";
const html = renderToString(Bun.markdown.react("# Hello **world**"));
// "<h1>Hello <strong>world</strong></h1>"
```
#### React 18 and older
By default, `react()` uses `Symbol.for('react.transitional.element')` as
the `$$typeof` symbol, which is what React 19 expects. For React 18 and
older, pass `reactVersion: 18`:
```tsx
const el = Bun.markdown.react("# Hello", { reactVersion: 18 });
```
### Component Overrides
Tag names can be overridden in `react()`:
```tsx
Bun.markdown.react(input, {
h1: MyHeading, // block elements
p: CustomParagraph,
a: CustomLink, // inline elements
img: CustomImage,
pre: CodeBlock,
// ... h1-h6, p, blockquote, ul, ol, li, pre, hr, html,
// table, thead, tbody, tr, th, td,
// em, strong, a, img, code, del, math, u, br
});
```
Boolean values are ignored (not treated as overrides), so parser options
like `{ strikethrough: true }` don't conflict with component overrides.
### Options
```js
Bun.markdown.html(input, {
tables: true, // GFM tables (default: true)
strikethrough: true, // ~~deleted~~ (default: true)
tasklists: true, // - [x] items (default: true)
headingIds: true, // Generate id attributes on headings
autolinkHeadings: true, // Wrap heading content in <a> tags
tagFilter: false, // GFM disallowed HTML tags
wikiLinks: false, // [[wiki]] links
latexMath: false, // $inline$ and $$display$$
underline: false, // __underline__ (instead of <strong>)
// ... and more
});
```
## Architecture
### Parser (`src/md/`)
The parser is split into focused modules using Zig's delegation pattern:
| Module | Purpose |
|--------|---------|
| `parser.zig` | Core `Parser` struct, state, and re-exported method
delegation |
| `blocks.zig` | Block-level parsing: document processing, line
analysis, block start/end |
| `containers.zig` | Container management: blockquotes, lists, list
items |
| `inlines.zig` | Inline parsing: emphasis, code spans, HTML tags,
entities |
| `links.zig` | Link/image resolution, reference links, autolink
rendering |
| `autolinks.zig` | Permissive autolink detection (www, url, email) |
| `line_analysis.zig` | Line classification: headings, fences, HTML
blocks, tables |
| `ref_defs.zig` | Reference definition parsing and lookup |
| `render_blocks.zig` | Block rendering dispatch (code, HTML, table
blocks) |
| `html_renderer.zig` | HTML renderer implementing `Renderer` VTable |
| `types.zig` | Shared types: `Renderer` VTable, `BlockType`,
`SpanType`, `TextType`, etc. |
### Renderer Abstraction
Parsing is decoupled from output via a `Renderer` VTable interface:
```zig
pub const Renderer = struct {
ptr: *anyopaque,
vtable: *const VTable,
pub const VTable = struct {
enterBlock: *const fn (...) void,
leaveBlock: *const fn (...) void,
enterSpan: *const fn (...) void,
leaveSpan: *const fn (...) void,
text: *const fn (...) void,
};
};
```
Four renderers are implemented:
- **`HtmlRenderer`** (`src/md/html_renderer.zig`) — produces HTML string
output
- **`JsCallbackRenderer`** (`src/bun.js/api/MarkdownObject.zig`) — calls
JS callbacks for each element, accumulates string output
- **`ParseRenderer`** (`src/bun.js/api/MarkdownObject.zig`) — builds
React element AST with `MarkedArgumentBuffer` for GC safety
- **`JSReactElement`** (`src/bun.js/bindings/JSReactElement.cpp`) — C++
fast path for React element creation using cached JSC Structure +
`putDirectOffset`
## Test plan
- [x] 792 spec tests pass (CommonMark, GFM tables, strikethrough,
tasklists, permissive autolinks, GFM tag filter, wiki links, coverage,
regressions)
- [x] 114 API tests pass (`html()`, `render()`, `react()`,
`renderToString` integration, component overrides)
- [x] 58 GFM compatibility tests pass
```
bun bd test test/js/bun/md/md-spec.test.ts # 792 pass
bun bd test test/js/bun/md/md-render-api.test.ts # 114 pass
bun bd test test/js/bun/md/gfm-compat.test.ts # 58 pass
```
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Dylan Conway <dylan.conway567@gmail.com>
Co-authored-by: SUZUKI Sosuke <sosuke@bun.com>
Co-authored-by: robobun <robobun@oven.sh>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Kirill Markelov <kerusha.chubko@gmail.com>
Co-authored-by: Ciro Spaciari <ciro.spaciari@gmail.com>
Co-authored-by: Alistair Smith <hi@alistair.sh>
This commit is contained in:
@@ -150,6 +150,7 @@
|
||||
"/runtime/secrets",
|
||||
"/runtime/console",
|
||||
"/runtime/yaml",
|
||||
"/runtime/markdown",
|
||||
"/runtime/json5",
|
||||
"/runtime/jsonl",
|
||||
"/runtime/html-rewriter",
|
||||
|
||||
@@ -55,5 +55,5 @@ Click the link in the right column to jump to the associated documentation.
|
||||
| Stream Processing | [`Bun.readableStreamTo*()`](/runtime/utils#bun-readablestreamto), `Bun.readableStreamToBytes()`, `Bun.readableStreamToBlob()`, `Bun.readableStreamToFormData()`, `Bun.readableStreamToJSON()`, `Bun.readableStreamToArray()` |
|
||||
| Memory & Buffer Management | `Bun.ArrayBufferSink`, `Bun.allocUnsafe`, `Bun.concatArrayBuffers` |
|
||||
| Module Resolution | [`Bun.resolveSync()`](/runtime/utils#bun-resolvesync) |
|
||||
| Parsing & Formatting | [`Bun.semver`](/runtime/semver), `Bun.TOML.parse`, [`Bun.color`](/runtime/color) |
|
||||
| Parsing & Formatting | [`Bun.semver`](/runtime/semver), `Bun.TOML.parse`, [`Bun.markdown`](/runtime/markdown), [`Bun.color`](/runtime/color) |
|
||||
| Low-level / Internals | `Bun.mmap`, `Bun.gc`, `Bun.generateHeapSnapshot`, [`bun:jsc`](https://bun.com/reference/bun/jsc) |
|
||||
|
||||
344
docs/runtime/markdown.mdx
Normal file
344
docs/runtime/markdown.mdx
Normal file
@@ -0,0 +1,344 @@
|
||||
---
|
||||
title: Markdown
|
||||
description: Parse and render Markdown with Bun's built-in Markdown API, supporting GFM extensions and custom rendering callbacks
|
||||
---
|
||||
|
||||
{% callout type="note" %}
|
||||
**Unstable API** — This API is under active development and may change in future versions of Bun.
|
||||
{% /callout %}
|
||||
|
||||
Bun includes a fast, built-in Markdown parser written in Zig. It supports GitHub Flavored Markdown (GFM) extensions and provides three APIs:
|
||||
|
||||
- `Bun.markdown.html()` — render Markdown to an HTML string
|
||||
- `Bun.markdown.render()` — render Markdown with custom callbacks for each element
|
||||
- `Bun.markdown.react()` — render Markdown to React JSX elements
|
||||
|
||||
---
|
||||
|
||||
## `Bun.markdown.html()`
|
||||
|
||||
Convert a Markdown string to HTML.
|
||||
|
||||
```ts
|
||||
const html = Bun.markdown.html("# Hello **world**");
|
||||
// "<h1>Hello <strong>world</strong></h1>\n"
|
||||
```
|
||||
|
||||
GFM extensions like tables, strikethrough, and task lists are enabled by default:
|
||||
|
||||
```ts
|
||||
const html = Bun.markdown.html(`
|
||||
| Feature | Status |
|
||||
|-------------|--------|
|
||||
| Tables | ~~done~~ |
|
||||
| Strikethrough| ~~done~~ |
|
||||
| Task lists | done |
|
||||
`);
|
||||
```
|
||||
|
||||
### Options
|
||||
|
||||
Pass an options object as the second argument to configure the parser:
|
||||
|
||||
```ts
|
||||
const html = Bun.markdown.html("some markdown", {
|
||||
tables: true, // GFM tables (default: true)
|
||||
strikethrough: true, // GFM strikethrough (default: true)
|
||||
tasklists: true, // GFM task lists (default: true)
|
||||
tagFilter: true, // GFM tag filter for disallowed HTML tags
|
||||
autolinks: true, // Autolink URLs, emails, and www. links
|
||||
});
|
||||
```
|
||||
|
||||
All available options:
|
||||
|
||||
| Option | Default | Description |
|
||||
| ---------------------- | ------- | ----------------------------------------------------------- |
|
||||
| `tables` | `false` | GFM tables |
|
||||
| `strikethrough` | `false` | GFM strikethrough (`~~text~~`) |
|
||||
| `tasklists` | `false` | GFM task lists (`- [x] item`) |
|
||||
| `autolinks` | `false` | Enable autolinks — see [Autolinks](#autolinks) |
|
||||
| `headings` | `false` | Heading IDs and autolinks — see [Heading IDs](#heading-ids) |
|
||||
| `hardSoftBreaks` | `false` | Treat soft line breaks as hard breaks |
|
||||
| `wikiLinks` | `false` | Enable `[[wiki links]]` |
|
||||
| `underline` | `false` | `__text__` renders as `<u>` instead of `<strong>` |
|
||||
| `latexMath` | `false` | Enable `$inline$` and `$$display$$` math |
|
||||
| `collapseWhitespace` | `false` | Collapse whitespace in text |
|
||||
| `permissiveAtxHeaders` | `false` | ATX headers without space after `#` |
|
||||
| `noIndentedCodeBlocks` | `false` | Disable indented code blocks |
|
||||
| `noHtmlBlocks` | `false` | Disable HTML blocks |
|
||||
| `noHtmlSpans` | `false` | Disable inline HTML |
|
||||
| `tagFilter` | `false` | GFM tag filter for disallowed HTML tags |
|
||||
|
||||
#### Autolinks
|
||||
|
||||
Pass `true` to enable all autolink types, or an object for granular control:
|
||||
|
||||
```ts
|
||||
// Enable all autolinks (URL, WWW, email)
|
||||
Bun.markdown.html("Visit www.example.com", { autolinks: true });
|
||||
|
||||
// Enable only specific types
|
||||
Bun.markdown.html("Visit www.example.com", {
|
||||
autolinks: { url: true, www: true },
|
||||
});
|
||||
```
|
||||
|
||||
#### Heading IDs
|
||||
|
||||
Pass `true` to enable both heading IDs and autolink headings, or an object for granular control:
|
||||
|
||||
```ts
|
||||
// Enable heading IDs and autolink headings
|
||||
Bun.markdown.html("## Hello World", { headings: true });
|
||||
// '<h2 id="hello-world"><a href="#hello-world">Hello World</a></h2>\n'
|
||||
|
||||
// Enable only heading IDs (no autolink)
|
||||
Bun.markdown.html("## Hello World", { headings: { ids: true } });
|
||||
// '<h2 id="hello-world">Hello World</h2>\n'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## `Bun.markdown.render()`
|
||||
|
||||
Parse Markdown and render it using custom JavaScript callbacks. This gives you full control over the output format — you can generate HTML with custom classes, React elements, ANSI terminal output, or any other string format.
|
||||
|
||||
```ts
|
||||
const result = Bun.markdown.render("# Hello **world**", {
|
||||
heading: (children, { level }) => `<h${level} class="title">${children}</h${level}>`,
|
||||
strong: children => `<b>${children}</b>`,
|
||||
paragraph: children => `<p>${children}</p>`,
|
||||
});
|
||||
// '<h1 class="title">Hello <b>world</b></h1>'
|
||||
```
|
||||
|
||||
### Callback signature
|
||||
|
||||
Each callback receives:
|
||||
|
||||
1. **`children`** — the accumulated content of the element as a string
|
||||
2. **`meta`** (optional) — an object with element-specific metadata
|
||||
|
||||
Return a string to replace the element's rendering. Return `null` or `undefined` to omit the element from the output entirely. If no callback is registered for an element, its children pass through unchanged.
|
||||
|
||||
### Block callbacks
|
||||
|
||||
| Callback | Meta | Description |
|
||||
| ------------ | ------------------------------------------- | ---------------------------------------------------------------------------------------- |
|
||||
| `heading` | `{ level: number, id?: string }` | Heading level 1–6. `id` is set when `headings: { ids: true }` is enabled |
|
||||
| `paragraph` | — | Paragraph block |
|
||||
| `blockquote` | — | Blockquote block |
|
||||
| `code` | `{ language?: string }` | Fenced or indented code block. `language` is the info-string when specified on the fence |
|
||||
| `list` | `{ ordered: boolean, start?: number }` | Ordered or unordered list. `start` is the start number for ordered lists |
|
||||
| `listItem` | `{ checked?: boolean }` | List item. `checked` is set for task list items (`- [x]` / `- [ ]`) |
|
||||
| `hr` | — | Horizontal rule |
|
||||
| `table` | — | Table block |
|
||||
| `thead` | — | Table head |
|
||||
| `tbody` | — | Table body |
|
||||
| `tr` | — | Table row |
|
||||
| `th` | `{ align?: "left" \| "center" \| "right" }` | Table header cell. `align` is set when alignment is specified |
|
||||
| `td` | `{ align?: "left" \| "center" \| "right" }` | Table data cell. `align` is set when alignment is specified |
|
||||
| `html` | — | Raw HTML content |
|
||||
|
||||
### Inline callbacks
|
||||
|
||||
| Callback | Meta | Description |
|
||||
| --------------- | ---------------------------------- | ---------------------------- |
|
||||
| `strong` | — | Strong emphasis (`**text**`) |
|
||||
| `emphasis` | — | Emphasis (`*text*`) |
|
||||
| `link` | `{ href: string, title?: string }` | Link |
|
||||
| `image` | `{ src: string, title?: string }` | Image |
|
||||
| `codespan` | — | Inline code (`` `code` ``) |
|
||||
| `strikethrough` | — | Strikethrough (`~~text~~`) |
|
||||
| `text` | — | Plain text content |
|
||||
|
||||
### Examples
|
||||
|
||||
#### Custom HTML with classes
|
||||
|
||||
```ts
|
||||
const html = Bun.markdown.render("# Title\n\nHello **world**", {
|
||||
heading: (children, { level }) => `<h${level} class="heading heading-${level}">${children}</h${level}>`,
|
||||
paragraph: children => `<p class="body">${children}</p>`,
|
||||
strong: children => `<strong class="bold">${children}</strong>`,
|
||||
});
|
||||
```
|
||||
|
||||
#### Stripping all formatting
|
||||
|
||||
```ts
|
||||
const plaintext = Bun.markdown.render("# Hello **world**", {
|
||||
heading: children => children,
|
||||
paragraph: children => children,
|
||||
strong: children => children,
|
||||
emphasis: children => children,
|
||||
link: children => children,
|
||||
image: () => "",
|
||||
code: children => children,
|
||||
codespan: children => children,
|
||||
});
|
||||
// "Hello world"
|
||||
```
|
||||
|
||||
#### Omitting elements
|
||||
|
||||
Return `null` or `undefined` to remove an element from the output:
|
||||
|
||||
```ts
|
||||
const result = Bun.markdown.render("# Title\n\n\n\nHello", {
|
||||
image: () => null, // Remove all images
|
||||
heading: children => children,
|
||||
paragraph: children => children + "\n",
|
||||
});
|
||||
// "Title\nHello\n"
|
||||
```
|
||||
|
||||
#### ANSI terminal output
|
||||
|
||||
```ts
|
||||
const ansi = Bun.markdown.render("# Hello\n\nThis is **bold** and *italic*", {
|
||||
heading: (children, { level }) => `\x1b[1;4m${children}\x1b[0m\n`,
|
||||
paragraph: children => children + "\n",
|
||||
strong: children => `\x1b[1m${children}\x1b[22m`,
|
||||
emphasis: children => `\x1b[3m${children}\x1b[23m`,
|
||||
});
|
||||
```
|
||||
|
||||
#### Code block syntax highlighting
|
||||
|
||||
````ts
|
||||
const result = Bun.markdown.render("```js\nconsole.log('hi')\n```", {
|
||||
code: (children, meta) => {
|
||||
const lang = meta?.language ?? "";
|
||||
return `<pre><code class="language-${lang}">${children}</code></pre>`;
|
||||
},
|
||||
});
|
||||
````
|
||||
|
||||
### Parser options
|
||||
|
||||
Parser options are passed as a separate third argument:
|
||||
|
||||
```ts
|
||||
const result = Bun.markdown.render(
|
||||
"Visit www.example.com",
|
||||
{
|
||||
link: (children, { href }) => `[${children}](${href})`,
|
||||
paragraph: children => children,
|
||||
},
|
||||
{ autolinks: true },
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## `Bun.markdown.react()`
|
||||
|
||||
Render Markdown directly to React elements. Returns a `<Fragment>` that you can use as a component return value.
|
||||
|
||||
```tsx
|
||||
function Markdown({ text }: { text: string }) {
|
||||
return Bun.markdown.react(text);
|
||||
}
|
||||
```
|
||||
|
||||
### Server-side rendering
|
||||
|
||||
Works with `renderToString()` and React Server Components:
|
||||
|
||||
```tsx
|
||||
import { renderToString } from "react-dom/server";
|
||||
|
||||
const html = renderToString(Bun.markdown.react("# Hello **world**"));
|
||||
// "<h1>Hello <strong>world</strong></h1>"
|
||||
```
|
||||
|
||||
### Component overrides
|
||||
|
||||
Replace any HTML element with a custom React component by passing it in the second argument, keyed by tag name:
|
||||
|
||||
```tsx
|
||||
function Code({ language, children }) {
|
||||
return (
|
||||
<pre data-language={language}>
|
||||
<code>{children}</code>
|
||||
</pre>
|
||||
);
|
||||
}
|
||||
|
||||
function Link({ href, title, children }) {
|
||||
return (
|
||||
<a href={href} title={title} target="_blank" rel="noopener noreferrer">
|
||||
{children}
|
||||
</a>
|
||||
);
|
||||
}
|
||||
|
||||
function Heading({ id, children }) {
|
||||
return (
|
||||
<h2 id={id}>
|
||||
<a href={`#${id}`}>{children}</a>
|
||||
</h2>
|
||||
);
|
||||
}
|
||||
|
||||
const el = Bun.markdown.react(
|
||||
content,
|
||||
{
|
||||
pre: Code,
|
||||
a: Link,
|
||||
h2: Heading,
|
||||
},
|
||||
{ headings: { ids: true } },
|
||||
);
|
||||
```
|
||||
|
||||
#### Available overrides
|
||||
|
||||
Every HTML tag produced by the parser can be overridden:
|
||||
|
||||
| Option | Props | Description |
|
||||
| ------------ | ---------------------------- | --------------------------------------------------------------- |
|
||||
| `h1`–`h6` | `{ id?, children }` | Headings. `id` is set when `headings: { ids: true }` is enabled |
|
||||
| `p` | `{ children }` | Paragraph |
|
||||
| `blockquote` | `{ children }` | Blockquote |
|
||||
| `pre` | `{ language?, children }` | Code block. `language` is the info string (e.g. `"js"`) |
|
||||
| `hr` | `{}` | Horizontal rule (no children) |
|
||||
| `ul` | `{ children }` | Unordered list |
|
||||
| `ol` | `{ start, children }` | Ordered list. `start` is the first item number |
|
||||
| `li` | `{ checked?, children }` | List item. `checked` is set for task list items |
|
||||
| `table` | `{ children }` | Table |
|
||||
| `thead` | `{ children }` | Table head |
|
||||
| `tbody` | `{ children }` | Table body |
|
||||
| `tr` | `{ children }` | Table row |
|
||||
| `th` | `{ align?, children }` | Table header cell |
|
||||
| `td` | `{ align?, children }` | Table data cell |
|
||||
| `em` | `{ children }` | Emphasis (`*text*`) |
|
||||
| `strong` | `{ children }` | Strong (`**text**`) |
|
||||
| `a` | `{ href, title?, children }` | Link |
|
||||
| `img` | `{ src, alt?, title? }` | Image (no children) |
|
||||
| `code` | `{ children }` | Inline code |
|
||||
| `del` | `{ children }` | Strikethrough (`~~text~~`) |
|
||||
| `br` | `{}` | Hard line break (no children) |
|
||||
|
||||
### React 18 and older
|
||||
|
||||
By default, elements use `Symbol.for('react.transitional.element')` as the `$$typeof` symbol. For React 18 and older, pass `reactVersion: 18` in the options (third argument):
|
||||
|
||||
```tsx
|
||||
function Markdown({ text }: { text: string }) {
|
||||
return Bun.markdown.react(text, undefined, { reactVersion: 18 });
|
||||
}
|
||||
```
|
||||
|
||||
### Parser options
|
||||
|
||||
All [parser options](#options) are passed as the third argument:
|
||||
|
||||
```tsx
|
||||
const el = Bun.markdown.react("## Hello World", undefined, {
|
||||
headings: { ids: true },
|
||||
autolinks: true,
|
||||
});
|
||||
```
|
||||
Reference in New Issue
Block a user