Turso: wrap CSV import in a single transaction to avoid per-batch fsync by alexey-milovidov · Pull Request #871 · ClickHouse/ClickBench

alexey-milovidov · 2026-05-08T15:27:19Z

Summary

- The current `tursodb mydb '.import --csv hits.csv hits'` commits every 1000-row batch, so each batch incurs an fsync. On EBS gp2 this dominates load time.
- Wrap the import in BEGIN/COMMIT so the whole load is one transaction → one fsync.
- Add `PRAGMA synchronous = OFF` for the import session (further reduces fsyncs during the transaction).
- Add a trailing `PRAGMA wal_checkpoint(TRUNCATE)` so the `wc -c mydb` size measurement reflects all imported data and isn't skewed by WAL state.
- Bump pinned Turso from v0.1.2-pre.4 (Jul 2025) to v0.6.0-pre.27 (May 2026, latest); this also adds an aarch64-unknown-linux-gnu artifact so the script runs on arm64 hosts as well.
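
For readers who want to see the load pattern in isolation, here is a minimal sketch of the same three steps (one transaction, `synchronous = OFF`, trailing `wal_checkpoint(TRUNCATE)`). It uses Python's standard `sqlite3` module as a stand-in, not `tursodb` and not the actual `benchmark.sh` change; the function names, the two-column table `t`, and the row count are hypothetical and chosen only for illustration.

```python
import csv
import os
import sqlite3
import tempfile


def load_csv_single_txn(db_path, csv_path, table, n_cols):
    """Load a CSV into an existing table inside one explicit transaction.

    Returns (row_count, wal_bytes_after_checkpoint).
    """
    # isolation_level=None selects autocommit mode, so BEGIN/COMMIT below
    # are the only transaction boundaries.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("PRAGMA journal_mode = WAL")
        conn.execute("PRAGMA synchronous = OFF")  # set before BEGIN
        placeholders = ", ".join("?" * n_cols)
        with open(csv_path, newline="") as f:
            conn.execute("BEGIN")
            conn.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", csv.reader(f)
            )
            conn.execute("COMMIT")  # single commit for the whole file
        # Move WAL contents into the main file and truncate the WAL, so the
        # size of the main database file reflects all imported rows.
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        wal_path = db_path + "-wal"
        wal_bytes = os.path.getsize(wal_path) if os.path.exists(wal_path) else 0
        rows = conn.execute(f"SELECT count(*) FROM {table}").fetchall()[0][0]
        return rows, wal_bytes
    finally:
        conn.close()


def demo(n_rows=3000):
    """Build a small CSV and table in a temp directory, then load it."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "mydb")
        csv_path = os.path.join(tmp, "rows.csv")
        with open(csv_path, "w", newline="") as f:
            csv.writer(f).writerows((i, f"url-{i}") for i in range(n_rows))
        setup = sqlite3.connect(db_path, isolation_level=None)
        setup.execute("CREATE TABLE t (id INTEGER, url TEXT)")
        setup.close()
        return load_csv_single_txn(db_path, csv_path, "t", 2)
```

Calling `demo()` returns the imported row count and the WAL size measured right after the checkpoint. Note that `synchronous = OFF` trades crash durability for speed, which is acceptable for a benchmark load that can simply be rerun.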

Test plan

Tested end-to-end on aarch64 with the actual benchmark.sh installer + load logic, against a 200K-row slice of hits.csv on local SSD:

| | Run 1 | Run 2 | Run 3 |
|---|---|---|---|
| Before fix | 11.91s | 12.03s | 12.01s |
| After fix | 9.18s | 9.22s | 9.08s |

- Installer pulls v0.6.0-pre.27 cleanly on aarch64
- Schema applies, all 200K rows imported correctly
- WAL truncates to 0 bytes after the checkpoint so `wc -c mydb` is accurate
- First 5 queries from queries.sql execute and return results

Local SSD showed ~24% faster; on the benchmark's EBS gp2 host, per-fsync latency is ~100× higher, so the wallclock win on the full 100M-row dataset will be substantially larger.
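
The arithmetic behind the ~24% figure and the batch counts can be reproduced with a short sketch. The timings are the three runs from the table above and the 1000-row batch size comes from the summary; the helper names are hypothetical.

```python
from statistics import mean

# Wall-clock seconds from the test plan (200K-row slice, local SSD).
before = [11.91, 12.03, 12.01]
after = [9.18, 9.22, 9.08]
BATCH_ROWS = 1000  # rows per `.import` batch, per the summary


def summarize(before_s, after_s):
    """Mean load times plus relative time reduction and speedup ratio."""
    b, a = mean(before_s), mean(after_s)
    return {
        "before_mean_s": b,
        "after_mean_s": a,
        "time_reduction_pct": 100 * (b - a) / b,
        "speedup": b / a,
    }


def commits_per_load(rows, batch_rows=BATCH_ROWS):
    """Number of per-batch commits when every batch commits separately."""
    return -(-rows // batch_rows)  # ceiling division


stats = summarize(before, after)
slice_commits = commits_per_load(200_000)
full_commits = commits_per_load(100_000_000)
```

With per-batch commits, the 200K-row slice needs 200 commits and the full 100M-row dataset needs 100,000, so any increase in per-fsync latency is multiplied by that count before the fix and by one after it.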

🤖 Generated with Claude Code

Loading hits.csv via `.import` was fsync-bound: each 1000-row batch committed separately, so on EBS gp2 the per-batch fsync dominated. Wrapping the import in BEGIN/COMMIT collapses it to one fsync, and PRAGMA synchronous = OFF disables fsyncs within the transaction. A trailing wal_checkpoint(TRUNCATE) keeps `wc -c mydb` consistent. Measured on a 200K-row hits.csv slice (local SSD): 12.0s -> 9.1s. On EBS gp2 the speedup will be substantially larger. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pinned version was v0.1.2-pre.4 from July 2025. v0.6.0-pre.27 is the latest release and adds an aarch64-unknown-linux-gnu artifact, so the benchmark runs on both x86_64 and arm64 hosts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`.import` prints "Inserting batch of 1000 rows" once per batch — that's ~100k lines on the full hits.csv, which bloats log.txt. Filter the progress lines out; errors still pass through grep -v. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
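
As an illustration of the filtering idea (not the actual `grep -v` invocation in the script), the sketch below drops the progress lines and keeps everything else. The regular expression assumes each progress line consists only of the quoted message with a numeric batch size; that exact shape is an assumption, as are the function name and the sample error text.

```python
import re

# Assumed shape of a progress line, based on the message quoted above.
PROGRESS_LINE = re.compile(r"^Inserting batch of \d+ rows\s*$")


def drop_progress_lines(lines):
    """Return the lines that are not batch-progress messages."""
    return [line for line in lines if not PROGRESS_LINE.match(line)]


sample_log = [
    "Inserting batch of 1000 rows\n",
    "Inserting batch of 1000 rows\n",
    "error: hypothetical failure message\n",  # made-up line for the demo
    "Inserting batch of 500 rows\n",
]
filtered = drop_progress_lines(sample_log)
```

Matching on the whole line, rather than on a substring, reduces the chance of discarding an error message that happens to mention a batch.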

alexey-milovidov and others added 3 commits May 8, 2026 15:26

Turso: bump to v0.6.0-pre.27

3667498


alexey-milovidov wants to merge 3 commits into main from turso-fsync-fix
