I have been looking for an open source data deduplication system that is stable enough to use for SMB file servers and for VMware iSCSI targets. Just recently I found a couple that show major promise. I wouldn’t say that I am an expert on the subject, and to be honest I am a bit scared of the technology in general because the information stored is not “all there”, if you know what I mean.
I think it is wise to really justify the need for it, and to always have some kind of backup so you can get to the un-deduplicated data just in case.
Here are some good open source projects that you might want to take a look at and test with:
FUSE – http://fuse.sourceforge.net
lessfs – http://www.lessfs.com
Keep in mind that you need to research how to implement this technology. I will eventually write an entire post on how to set up a test bed for each, including a step-by-step scenario using it for SMB sharing.
I have been working on an iSCSI device. I wanted to mount its volumes so I could back up the machines to another machine using rsync. To make this work I needed to load the VMFS3 tools (vmfs-tools) to mount the drive. Here is what I installed.
NOTE: This command will run on Ubuntu 9.10 and higher
apt-get install vmfs-tools
Once you install vmfs-tools, create a folder at “/mnt/nas”. This assumes that there is a drive located at “/dev/sdb1”.
Now you can mount the drive with this command:
vmfs-fuse /dev/sdb1 /mnt/nas
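Here is a rough sketch of how the steps above could be strung together with the rsync backup I mentioned. The backup destination (backupuser@backuphost:/backups/vmfs/) is a made-up placeholder, and the DRY_RUN switch and function names are just my own illustration, not part of vmfs-tools. By default it only prints the commands so you can check them first.

```shell
#!/bin/sh
# Sketch: mount a VMFS volume with vmfs-fuse and copy it elsewhere with rsync.
# DEVICE and MOUNT_POINT match the example above; BACKUP_DEST is hypothetical.
DEVICE=${DEVICE:-/dev/sdb1}
MOUNT_POINT=${MOUNT_POINT:-/mnt/nas}
BACKUP_DEST=${BACKUP_DEST:-backupuser@backuphost:/backups/vmfs/}

# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_vmfs() {
    run mkdir -p "$MOUNT_POINT"               # create the mount folder
    run vmfs-fuse "$DEVICE" "$MOUNT_POINT"    # mount the VMFS volume
    run rsync -a "$MOUNT_POINT/" "$BACKUP_DEST"   # copy files to the other machine
    run umount "$MOUNT_POINT"                 # unmount when finished
}

# Usage: print the commands first, then run for real as root:
#   backup_vmfs
#   DRY_RUN=0 backup_vmfs
```

Run it in dry-run mode first and only set DRY_RUN=0 once the device and destination look right for your setup.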
You will need to change the fstab so it will mount on boot as well. Check my other blog posting for that and the iSCSI information. [I will update this soon with the fstab commands]
UPDATED
Found some more information online about VMware tools packages. I am not sure how to use these commands once installed. I recommend upgrading to Ubuntu 9.10 or higher.
apt-get install open-vm-source
OR
apt-get install open-vm-tools
So after I set up the big RAID I ran some tests on the storage. Here are the commands. Make sure you have at least 17 GB of space available for testing, because each command moves 4096 blocks of 4 MiB (about 17.2 GB). Once you run a command, look at the time and throughput that dd reports.
WRITE FILE
dd if=/dev/zero of=/mnt/<mount point>/testfile bs=4194304 count=4096
READ FILE
dd of=/dev/null if=/mnt/<mount point>/testfile bs=4194304 count=4096
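If you want to try different sizes, the two dd commands above can be wrapped in a small script. This is only a sketch: the function names are mine, and I am assuming GNU dd, where conv=fsync flushes the data to disk before dd reports, so the write number is less likely to be just the speed of the cache. The read number can still be inflated by the cache if the file fits in RAM.

```shell
#!/bin/sh
# Sketch: parameterized version of the dd write/read test.
# Function names are illustrative; assumes GNU dd for conv=fsync.

# Total bytes a dd run will move: block size times block count.
dd_total_bytes() {
    echo $(( $1 * $2 ))
}

# run_dd_test <directory> <block size in bytes> <block count>
run_dd_test() {
    target_dir=$1
    bs=$2
    count=$3
    testfile="$target_dir/testfile"

    echo "WRITE FILE: $(dd_total_bytes "$bs" "$count") bytes to $testfile"
    # conv=fsync forces the data to disk before dd prints its throughput.
    dd if=/dev/zero of="$testfile" bs="$bs" count="$count" conv=fsync

    echo "READ FILE: $testfile"
    dd if="$testfile" of=/dev/null bs="$bs"

    # Remove the test file so it does not eat the free space.
    rm -f "$testfile"
}

# Same size as the commands above (needs about 17 GB free):
#   run_dd_test "/mnt/<mount point>" 4194304 4096
```

With the numbers from the commands above, dd_total_bytes 4194304 4096 gives 17179869184 bytes, which is where the 17 GB requirement comes from.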