Turso: wrap CSV import in a single transaction to avoid per-batch fsync#871
Open
alexey-milovidov wants to merge 3 commits into main from
Conversation
Loading hits.csv via `.import` was fsync-bound: each 1000-row batch committed separately, so on EBS gp2 the per-batch fsync dominated. Wrapping the import in BEGIN/COMMIT collapses it to one fsync, and PRAGMA synchronous = OFF disables fsyncs within the transaction. A trailing wal_checkpoint(TRUNCATE) keeps `wc -c mydb` consistent. Measured on a 200K-row hits.csv slice (local SSD): 12.0s -> 9.1s. On EBS gp2 the speedup will be substantially larger. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
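The per-batch-commit vs single-transaction difference can be sketched with Python's stdlib `sqlite3` as a stand-in for tursodb (which speaks the same SQL); the helper below is illustrative, not the benchmark's actual code:

```python
import sqlite3

def load_batches(conn, rows, batch_size=1000, single_txn=False):
    """Insert rows into table t. single_txn=False mimics the old .import
    behavior (one COMMIT, hence one fsync, per batch); single_txn=True
    mimics the fix (one enclosing BEGIN/COMMIT, one fsync).
    Returns the number of commits issued."""
    commits = 0
    if single_txn:
        conn.execute("BEGIN")
    for i in range(0, len(rows), batch_size):
        if not single_txn:
            conn.execute("BEGIN")
        conn.executemany("INSERT INTO t VALUES (?)", rows[i:i + batch_size])
        if not single_txn:
            conn.execute("COMMIT")
            commits += 1
    if single_txn:
        conn.execute("COMMIT")
        commits = 1
    return commits

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT above control the transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x)")
rows = [(i,) for i in range(5000)]
print(load_batches(conn, rows))  # 5 commits for 5 batches of 1000
```

On the full hits.csv (~100M rows) the same pattern turns ~100k commits into one, which is why the win grows with per-fsync latency.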
Pinned version was v0.1.2-pre.4 from July 2025. v0.6.0-pre.27 is the latest release and adds an aarch64-unknown-linux-gnu artifact, so the benchmark runs on both x86_64 and arm64 hosts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
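A minimal sketch of the arch-to-artifact selection this enables; the `aarch64-unknown-linux-gnu` name is from the commit, while the x86_64 triple and the helper itself are assumptions:

```python
import platform

# Release artifact triples; the aarch64 one is new in v0.6.0-pre.27.
# The x86_64 name is the conventional Rust triple, assumed here.
ARTIFACTS = {
    "x86_64": "x86_64-unknown-linux-gnu",
    "aarch64": "aarch64-unknown-linux-gnu",
}

def artifact_for(machine=None):
    """Pick the artifact triple for the host (machine as from `uname -m`)."""
    machine = machine or platform.machine()
    if machine not in ARTIFACTS:
        raise RuntimeError(f"no prebuilt artifact for {machine}")
    return ARTIFACTS[machine]
```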
`.import` prints "Inserting batch of 1000 rows" once per batch — that's ~100k lines on the full hits.csv, which bloats log.txt. Filter the progress lines out with `grep -v`; errors still pass through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
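The filtering step can be sketched as follows; the progress-message text is from the commit, the function name is illustrative:

```python
def filter_import_log(lines):
    """Drop the per-batch progress lines, as the script's `grep -v`
    does, while letting errors and any other output through."""
    return [line for line in lines if "Inserting batch of" not in line]

log = [
    "Inserting batch of 1000 rows",
    "Inserting batch of 1000 rows",
    "error: malformed CSV row",
]
print(filter_import_log(log))  # only the error line survives
```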
Summary
`tursodb mydb '.import --csv hits.csv hits'` commits every 1000-row batch, so each batch incurs an fsync. On EBS gp2 this dominates load time.

- Wrap the load in `BEGIN`/`COMMIT` so the whole load is one transaction → one fsync.
- Set `PRAGMA synchronous = OFF` for the import session (further reduces fsyncs during the transaction).
- Run `PRAGMA wal_checkpoint(TRUNCATE)` so the `wc -c mydb` size measurement reflects all imported data and isn't skewed by WAL state.
- Bump the pinned version from `v0.1.2-pre.4` (Jul 2025) to `v0.6.0-pre.27` (May 2026, latest); this also adds an `aarch64-unknown-linux-gnu` artifact so the script runs on arm64 hosts as well.

Test plan
Tested end-to-end on aarch64 with the actual `benchmark.sh` installer + load logic, against a 200K-row slice of `hits.csv` on local SSD:

- `wc -c mydb` is accurate
- `queries.sql` execute and return results

Local SSD showed ~24% faster; on the benchmark's EBS gp2 host, per-fsync latency is ~100× higher, so the wallclock win on the full 100M-row dataset will be substantially larger.
🤖 Generated with Claude Code