Distributed Systems      Data Mining      Misc

Distributed Systems

Latest Systems

Ceph: A Scalable, High-Performance Distributed File System (link) (pdf)

Data Placement & Replication

Worldwide Fast File Replication on Grid Datafarm (pdf)
Data Replication in Hadoop (link)
CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data (pdf)

Data Striping


FUSE: Filesystem in Userspace (link)
Rapid File System Development Using ptrace (pdf)

Performance Measurement

File System Benchmarks, Then, Now, and Tomorrow (pdf)
Parallel I/O Examples and Benchmark Codes (link)
IOzone Filesystem Benchmark (link)
Benefits of High Speed Interconnects to Cluster File Systems: A Case Study with Lustre (pdf)

Google Section

Data Mining

Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching (pdf)

Misc

Reinventing the Bazaar (link)
Beautiful Code (link)

Last Updated: December 1st, 2007