Turso: wrap CSV import in a single transaction to avoid per-batch fsync #871

Open
alexey-milovidov wants to merge 3 commits into main from turso-fsync-fix

alexey-milovidov (Member) commented May 8, 2026

Summary

  • The current `tursodb mydb '.import --csv hits.csv hits'` commits every 1000-row batch, so each batch incurs an fsync. On EBS gp2 this dominates load time.
  • Wrap the import in `BEGIN`/`COMMIT` so the whole load is one transaction → one fsync.
  • Add `PRAGMA synchronous = OFF` for the import session (further reduces fsyncs during the transaction).
  • Add a trailing `PRAGMA wal_checkpoint(TRUNCATE)` so the `wc -c mydb` size measurement reflects all imported data and isn't skewed by WAL state.
  • Bump pinned Turso from v0.1.2-pre.4 (Jul 2025) to v0.6.0-pre.27 (May 2026, latest); this also adds an aarch64-unknown-linux-gnu artifact so the script runs on arm64 hosts as well.
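
The sequence described in the bullets above might look roughly like the following. This is a sketch, not the PR's actual diff: `import_script` is a hypothetical helper, and feeding the commands to `tursodb` on stdin is an assumption about the CLI. Only the `.import --csv hits.csv hits` command, `BEGIN`/`COMMIT`, and the two pragmas come from the summary.

```shell
# Hypothetical helper: print the command sequence for one import session.
# $1 = CSV file, $2 = target table.
import_script() {
    csv="$1"
    table="$2"
    printf '%s\n' \
        'PRAGMA synchronous = OFF;' \
        'BEGIN;' \
        ".import --csv ${csv} ${table}" \
        'COMMIT;' \
        'PRAGMA wal_checkpoint(TRUNCATE);'
}

# Assumed invocation (stdin piping is an assumption, so it is left commented out):
# import_script hits.csv hits | tursodb mydb
```

The pragma is placed before `BEGIN` so that the setting is in effect for the whole transaction, and the checkpoint runs only after `COMMIT`, once there is committed data to move out of the WAL.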

Test plan

Tested end-to-end on aarch64 with the actual benchmark.sh installer + load logic, against a 200K-row slice of hits.csv on local SSD:

| | Run 1 | Run 2 | Run 3 |
|---|---|---|---|
| Before fix | 11.91s | 12.03s | 12.01s |
| After fix | 9.18s | 9.22s | 9.08s |
  • Installer pulls v0.6.0-pre.27 cleanly on aarch64
  • Schema applies, all 200K rows imported correctly
  • WAL truncates to 0 bytes after the checkpoint so wc -c mydb is accurate
  • First 5 queries from queries.sql execute and return results
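
The WAL check from the list above could be scripted along these lines. This is a minimal sketch: `wal_is_empty` is a hypothetical helper, and the `<db>-wal` sidecar file name is an assumption borrowed from SQLite's naming convention rather than something stated in this PR.

```shell
# Hypothetical helper: succeed when the database's WAL holds no data.
# $1 = database file path.
wal_is_empty() {
    # Assumption: the WAL sits next to the database as "<db>-wal".
    # `-s` is true only for an existing file larger than 0 bytes,
    # so a missing or 0-byte WAL both count as empty.
    [ ! -s "$1-wal" ]
}

# Example use (not run here): only trust the size once the WAL is empty.
# wal_is_empty mydb && wc -c mydb
```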

On local SSD the load was ~24% faster (mean 11.98s before, 9.16s after). On the benchmark's EBS gp2 host, per-fsync latency is ~100× higher, so the wall-clock win on the full 100M-row dataset should be substantially larger.
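
For reference, the percentage can be recomputed from the three runs in the table. The snippet below is only an arithmetic check; `mean` and `reduction_pct` are hypothetical helper names, and the result lands a little under 24%.

```shell
# Hypothetical helper: arithmetic mean of the arguments, two decimals.
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.2f", s / NR }'
}

# Hypothetical helper: percentage reduction in run time from $1 to $2.
reduction_pct() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f", (b - a) / b * 100 }'
}

before="$(mean 11.91 12.03 12.01)"   # "Before fix" row
after="$(mean 9.18 9.22 9.08)"       # "After fix" row
echo "before=${before}s after=${after}s reduction=$(reduction_pct "$before" "$after")%"
```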

🤖 Generated with Claude Code

alexey-milovidov and others added 3 commits May 8, 2026 15:26
Loading hits.csv via `.import` was fsync-bound: each 1000-row batch
committed separately, so on EBS gp2 the per-batch fsync dominated.
Wrapping the import in BEGIN/COMMIT collapses it to one fsync, and
PRAGMA synchronous = OFF disables fsyncs within the transaction.
A trailing wal_checkpoint(TRUNCATE) keeps `wc -c mydb` consistent.

Measured on a 200K-row hits.csv slice (local SSD): 12.0s -> 9.1s.
On EBS gp2 the speedup will be substantially larger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pinned version was v0.1.2-pre.4 from July 2025. v0.6.0-pre.27 is the
latest release and adds an aarch64-unknown-linux-gnu artifact, so the
benchmark runs on both x86_64 and arm64 hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`.import` prints "Inserting batch of 1000 rows" once per batch; that's
~100k lines on the full hits.csv, which bloats log.txt. Filter the
progress lines out; errors still pass through grep -v.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
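
The log filtering described in the last commit could be sketched as follows. `filter_progress` is a hypothetical helper, and the pattern assumes the progress line reads exactly as quoted in the commit message; the actual change in the PR may differ.

```shell
# Hypothetical helper: drop per-batch progress lines, pass everything else through.
filter_progress() {
    # Assumption: progress lines contain "Inserting batch of <N> rows".
    grep -v 'Inserting batch of [0-9][0-9]* rows'
}

# Assumed use in the load step (left commented out; the redirection is illustrative):
# tursodb mydb '.import --csv hits.csv hits' 2>&1 | filter_progress
```

Note that `grep -v` exits with status 1 when every input line is filtered out, which matters if the surrounding script runs under `set -e` or `set -o pipefail`.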