Wuala
Feb. 3rd, 2008 | 09:02 pm
Wuala is another attempt at end-user distributed storage. You start with 1GB of storage hosted on the company's servers. For additional storage, you can invite up to 15 friends and get 1GB for each one who joins. Beyond that, you can trade storage on your computer(s); what you earn is (# of GBs offered)*(percent of uptime). Also, your download speed is adjusted based on your computers' reputation. Unlike Freenet, they've traded anonymity for accountability. Files and directories are encrypted (both data and metadata), and can be private, shared in a limited way (with individuals, e.g. alice and bob, or with groups), or public.
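To make the quota arithmetic concrete, here is a minimal sketch of the numbers described above. The function names are my own and hypothetical, and I'm assuming the three sources of storage (the starting 1GB, the invite bonus, and the traded storage) simply add up, which the description suggests but doesn't state outright.

```python
def earned_storage_gb(offered_gb: float, uptime_fraction: float) -> float:
    """Storage earned by trading local disk: (# of GBs) * (percent of uptime)."""
    if offered_gb < 0:
        raise ValueError("offered_gb must be non-negative")
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return offered_gb * uptime_fraction


def total_quota_gb(offered_gb: float, uptime_fraction: float,
                   friends_joined: int = 0) -> float:
    """Total quota, ASSUMING the three sources are simply additive."""
    base_gb = 1.0                                # starting storage
    invite_gb = 1.0 * min(friends_joined, 15)    # 1GB per friend, capped at 15
    return base_gb + invite_gb + earned_storage_gb(offered_gb, uptime_fraction)
```

For example, offering 40GB from a machine that's online a quarter of the time earns 10GB, the same as offering 10GB from a machine that's always on.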
Notably, it's alpha and read-only at the moment ... [Edit: It's read-only in the sense that you can't edit files. You can create and delete files and directories]
Files are broken up into chunks. Instead of storing many full copies of each chunk (something like 25), they use erasure codes, which brings the storage overhead down to a factor of 5 or 6, assuming 25% uptime per node.
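A rough way to see why erasure codes win is to compare availability under the simplifying assumption that nodes are online independently with the same probability. This is my own back-of-the-envelope model, not Wuala's analysis, and the (n, k) parameters in the usage note are hypothetical, picked only so that n/k lands in the 5 to 6 range.

```python
from math import comb


def replication_availability(copies: int, uptime: float) -> float:
    """P(at least one of `copies` full replicas is online)."""
    return 1.0 - (1.0 - uptime) ** copies


def erasure_availability(n: int, k: int, uptime: float) -> float:
    """P(at least k of n fragments are online), i.e. a binomial tail.

    Models an erasure code where any k of the n stored fragments
    are enough to rebuild the original data.
    """
    return sum(
        comb(n, i) * uptime ** i * (1.0 - uptime) ** (n - i)
        for i in range(k, n + 1)
    )


def storage_overhead(n: int, k: int) -> float:
    """Stored data relative to the original size (replication is n with k=1)."""
    return n / k
```

With 25% uptime, `replication_availability(25, 0.25)` comes out to about 99.9% at 25x storage. Something like `erasure_availability(550, 100, 0.25)` can be compared against that at `storage_overhead(550, 100)`, which is 5.5x.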
If a lot of requests come in for the same chunk within a short period of time, the requests are redirected to other nodes that downloaded the chunk recently (providing bittorrent-like swarm distribution).
Access control is provided using a structure called cryptree. It uses lazy revocation, so once someone gains access to a file they can always read the last version they had access to; on the other hand, this means that multiple files can be encrypted with the same key and updating one file doesn't require that all the files be re-encrypted. Also note that anyone who had access could simply have saved a copy of the file anyway.
Cryptrees use asymmetric crypto (RSA with 2048-bit keys) to make the connections directed. Lazy revocation is implemented via key regression; when updating a file or folder encrypted with a key, K, a new key, K', is used and the pair (K',K) has the property that K' can be used to compute K, but K can't be used to compute K'.
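The one-way property of the (K',K) pair can be illustrated with a plain hash chain, which is one well-known way to build key regression. This is only a sketch of the idea; I don't know that Wuala uses this exact construction, and the function names are hypothetical. Real schemes also derive the actual encryption key from each chain state rather than using the state directly.

```python
import hashlib
import os
from typing import List, Optional


def older_key(key: bytes) -> bytes:
    """Compute K from K'. Going the other way would mean inverting SHA-256."""
    return hashlib.sha256(key).digest()


def make_key_chain(length: int, seed: Optional[bytes] = None) -> List[bytes]:
    """Owner side: build the whole chain up front, ordered oldest to newest.

    The random seed is the NEWEST key; every older key is a hash of the
    key that follows it, so the owner hands keys out from the front.
    """
    state = seed if seed is not None else os.urandom(32)
    chain = [state]
    for _ in range(length - 1):
        state = older_key(state)
        chain.append(state)
    chain.reverse()
    return chain


def derive_older(key: bytes, steps: int) -> bytes:
    """Reader side: walk back `steps` versions from a key you hold."""
    for _ in range(steps):
        key = older_key(key)
    return key
```

Someone holding version 3's key can call `derive_older(key, 2)` to read version 1, but someone whose access was revoked at version 1 can't compute version 2's key. The catch with a plain hash chain is that its length is fixed when it's created.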
Public files are indexed on the central server, private and shared files are indexed locally.
Routing is handled using the usual distributed hash tables with random links ... the only thing that might be interesting is that in responses to requests, nodes add some information about nodes they're aware of. This is intended to help discover new(ly online) nodes quickly.
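The peer-discovery trick amounts to piggybacking a few known node IDs on every response. Here's a toy sketch of that; the message layout, the `gossip_size` parameter, and the class itself are my own inventions for illustration, not Wuala's wire format.

```python
import random
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Node:
    node_id: int
    known_peers: Set[int] = field(default_factory=set)

    def handle_request(self, requester_id: int, gossip_size: int = 3) -> dict:
        """Answer a request and attach a few peers this node knows about."""
        # A node that just sent us a request is evidently online.
        self.known_peers.add(requester_id)
        candidates = sorted(self.known_peers - {requester_id})
        peers = random.sample(candidates, min(gossip_size, len(candidates)))
        return {"from": self.node_id, "payload": None, "peers": peers}

    def handle_response(self, response: dict) -> None:
        """Learn about the responder and any peers it mentioned."""
        self.known_peers.add(response["from"])
        self.known_peers.update(
            p for p in response["peers"] if p != self.node_id
        )
```

A node that has just come online and sends a single request to `Node(1, {2, 3, 4})` ends up knowing about node 1 plus up to three of its peers, without any extra round trips.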
Fairness is ensured by random auditing and reporting, to prevent nodes from lying about how much they're storing or how much time they've spent online. A large number of nodes should have to lie to break the system (paper); presumably reports are signed to prevent spoofing. However, I don't see any protection against sybil attacks. The paper simply states that they're handled by a different part of the application.
(no subject)
from:
ravi
date: Feb. 4th, 2008 02:34 am (UTC)
(no subject)
from:
nikolasco
date: Feb. 4th, 2008 02:39 am (UTC)
(no subject)
from:
valiskeogh
date: Feb. 4th, 2008 05:38 am (UTC)
i'm not quite sure of the point tho... it's not for "online storage"... unless you just want a gig... anywhere access to files i suppose maybe...
I think it's at least interesting ...
from:
nikolasco
date: Feb. 4th, 2008 11:25 am (UTC)
(e.g. urn:x-wuala:sha256(username:path)) (or have they? I don't know)(Edit: They do. http://wua.la/username/path/filename redirects to http://127.0.0.1:57183/openfile?file=/us
Note that in addition to storing private data, you can share data with friends and groups, as well as share and search public data; considering the pile of temporary file storage sites, there certainly seems to be some interest in this. There is no file size limit, no transfer limit, and no limited life span. It's obviously not a backup service, since you need to put in at least as much storage as you use.
Also, despite the lack of anonymity, encryption and chunking make it much harder to detect what a file actually is* and there's no way to enumerate private/controlled-shared files and metadata.
* In, say, bittorrent, the filename is sent in plaintext. With Wuala, someone would need to generate hashes for at least some chunks for each key that it's encrypted with! That takes 2^(number of bits per chunk) of storage.
Additionally, all downloads are multi-sourced, potentially making them faster (consider that you can download chunks from multiple sources at once: even if each chunk download only moves at 10KB/s, downloading 20 chunks at a time gives you 200KB/s of throughput); erasure encoding increases the amount of data stored overall, but not the amount that needs to be downloaded.
The uptime constraint seems to be primarily an issue for laptop-only users; people with desktops can just stay on almost all the time. Your quota is per-user, not per-machine, so you can use the space traded on your desktop to store files from your laptop; then you're looking at a free 'n easy, globally-accessible network drive.
They've also stated that once write support goes live, the network mount/"drive" it generates will be browsable and writable, which makes it really easy to write scripts. It would be interesting to see something that used something like S3 for your traded-storage; you'd go from a direct storage trade-off to a monetary trade off; 100GB of secure, sharable online storage for $15/mo.
Edited at 2008-02-06 11:03 pm (UTC)