2.2.1. UploadFile

Attention

CS1620/CS2660 students have an additional performance requirement on upload_file. This requirement is documented at the end of the page.

Implement this function:

User.upload_file(filename: str, data: bytes)

Stores data persistently such that future calls to download_file with the same filename can retrieve data. However, raises a util.DropboxError in the following circumstances:

  • File upload cannot complete due to malicious action.

If the calling user has already uploaded a file named filename:

  • Overwrites the existing file data with data.

  • Sharing permissions (if filename was previously shared) stay intact.

Filenames are unique within a user account but can overlap across users. That is, different users should be allowed to upload different files with the same filename without interfering with each other’s files.

Parameters:
  • filename (str) – The name of the file

  • data (bytes) – The file data

Returns:

nothing

Raises:

DropboxError – if an error case occurred
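
The overwrite and per-user naming rules above can be modeled with a toy example. This is only a behavioral sketch, not a valid solution: it uses no cryptography, and `ToyUser`, the shared `server` dictionary, and the local `DropboxError` class are hypothetical stand-ins for the real client, the untrusted server, and `util.DropboxError`. Raising on a missing file in `download_file` is likewise an assumption made so the example is complete.

```python
class DropboxError(Exception):
    """Local stand-in for util.DropboxError so the sketch runs on its own."""


class ToyUser:
    """Hypothetical model of the upload/download contract (no security at all)."""

    def __init__(self, username: str, server: dict):
        self.username = username
        self._server = server  # shared dict standing in for the server

    def upload_file(self, filename: str, data: bytes) -> None:
        # Keying on (username, filename) keeps each user's namespace separate;
        # writing to an existing key overwrites the previous data.
        self._server[(self.username, filename)] = data

    def download_file(self, filename: str) -> bytes:
        try:
            return self._server[(self.username, filename)]
        except KeyError:
            # Assumption: a missing file is reported as a DropboxError.
            raise DropboxError(f"no file named {filename!r}")


server = {}
alice = ToyUser("alice", server)
bob = ToyUser("bob", server)

alice.upload_file("notes.txt", b"first draft")
bob.upload_file("notes.txt", b"bob's own notes")  # same name, different user
alice.upload_file("notes.txt", b"second draft")   # overwrite

assert alice.download_file("notes.txt") == b"second draft"
assert bob.download_file("notes.txt") == b"bob's own notes"
```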

2.2.2. CS1620/CS2660 Extension: Efficient file updates

Attention

Teams in CS1620/CS2660 may implement either this feature (efficient updates) or delegated sharing. Only one extra feature is required, even if both students are taking CS1620/CS2660. Delegated sharing is more open-ended than efficient updates and likely more challenging.

Grading for this part will primarily involve manual review of your code and your design document, rather than autograding.

To satisfy the additional CS1620/CS2660 requirement, teams must implement an upload_file that allows their clients to perform efficient updates to existing files.

In the original specification for upload_file, users who wish to overwrite a file simply call upload_file again, and the data from the latest call takes the place of any existing data. When performing overwrites, some implementations may simply re-upload the whole file, transferring an amount of data linear in the total number of bytes in the file. However, this can be wasteful: for instance, if you make a 1 byte change in a 100 GB file, then an upload_file implementation that uses bandwidth linear in the size of the file will end up transmitting the full 100 GB for a very small change!

Thus, the upload_file implementation of teams with at least one CS1620/CS2660 student must satisfy the following addition to the original specification:

User.upload_file(filename: str, data: bytes)

[In addition to the requirements in the specification above…]

If a user calls upload_file on a filename for which data already exists, the number of bytes transferred during the upload must scale at most linearly in the total number of bytes that differ between the old and new files.

Notes:

  • The performance requirement for append_file still applies even with this efficiency requirement on upload_file.

  • Additionally, the performance requirement must hold in the presence of sharing: users with whom a file has been shared must still be able to update it efficiently, even if other users also make updates to the file.

  • The scaling requirement only applies to the file contents: it is okay if the amount of file metadata transferred during the upload scales linearly in the size of the file (i.e., a complete new copy of the metadata is uploaded each time), on the grounds that the metadata is much smaller than the actual file.
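
One possible way to meet the scaling requirement is to split the file into fixed-size chunks and re-upload only the chunks that changed. The sketch below illustrates just the bandwidth accounting, under several assumptions: `ToyServer`, `CHUNK_SIZE`, `upload_chunked`, and `download_chunked` are hypothetical names, the chunk size is tiny for readability, and encryption and authentication are omitted entirely (storing plaintext digests on the server, as done here, would not preserve confidentiality). The previous digests are fetched from the server on every call, so no client-side state is kept, and a full copy of the metadata is rewritten each time, as the last note permits.

```python
import hashlib

CHUNK_SIZE = 4    # hypothetical; tiny so the example is easy to follow
DIGEST_SIZE = 32  # length of a SHA-256 digest in bytes


class ToyServer:
    """In-memory stand-in for the server (hypothetical, not the course API)."""

    def __init__(self):
        self.store = {}

    def put(self, key: str, value: bytes) -> None:
        self.store[key] = value

    def get(self, key: str):
        return self.store.get(key)


def split_chunks(data: bytes) -> list:
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def upload_chunked(server: ToyServer, name: str, data: bytes) -> int:
    """Upload data, sending only changed chunks.

    Returns the number of content bytes sent (metadata is not counted).
    """
    chunks = split_chunks(data)
    digests = [hashlib.sha256(c).digest() for c in chunks]

    # Fetch the previous per-chunk digests from the server (no client state).
    old_meta = server.get(name + "/meta") or b""
    old_digests = [old_meta[i:i + DIGEST_SIZE]
                   for i in range(0, len(old_meta), DIGEST_SIZE)]

    sent = 0
    for i, (chunk, digest) in enumerate(zip(chunks, digests)):
        if i >= len(old_digests) or old_digests[i] != digest:
            server.put(f"{name}/chunk/{i}", chunk)
            sent += len(chunk)

    # The metadata is rewritten in full; it also records how many chunks
    # make up the current version of the file.
    server.put(name + "/meta", b"".join(digests))
    return sent


def download_chunked(server: ToyServer, name: str) -> bytes:
    meta = server.get(name + "/meta") or b""
    count = len(meta) // DIGEST_SIZE
    return b"".join(server.get(f"{name}/chunk/{i}") for i in range(count))


server = ToyServer()
first = upload_chunked(server, "report", b"aaaabbbbccccdddd")   # all 4 chunks
second = upload_chunked(server, "report", b"aaaabbXbccccdddd")  # 1 chunk changed
assert (first, second) == (16, 4)
assert download_chunked(server, "report") == b"aaaabbXbccccdddd"
```

With fixed-size chunks, each differing byte causes at most one chunk to be re-sent, so the content bytes transferred are bounded by `CHUNK_SIZE` times the number of differing bytes. This treats "differing" position by position; an insertion that shifts every later byte would change every later chunk.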

Note

As a reminder, your client implementation must be stateless, so you cannot use client-side state to satisfy the efficiency requirement.

Hint

To implement this feature, you should consider how your efficient update strategy will still allow you to enforce the confidentiality and integrity properties of the client. Specifically, recall that our integrity property means that a user should never accept file contents in download_file unless they were made by a user who had legitimate permissions to access the file. Thus, it is acceptable to ignore integrity violations that you discover (or, conversely, don’t discover) during upload_file as long as it means that you will detect the integrity violation in a later call to download_file.
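
As a rough illustration of deferring integrity checks to download time, the sketch below tags each chunk with an HMAC bound to the filename and chunk position, and verifies every tag before returning any data. It is a minimal sketch under stated assumptions: it uses Python's standard `hmac` and `hashlib` modules rather than the course's crypto library, `tag_chunk` and `verify_and_join` are hypothetical helpers, `DropboxError` is a local stand-in for `util.DropboxError`, and key management and encryption are left out. It also does not address a chunk being rolled back to an older, validly tagged version, or the chunk list being truncated, which a real design would need to consider.

```python
import hashlib
import hmac


class DropboxError(Exception):
    """Local stand-in for util.DropboxError so the sketch runs on its own."""


def tag_chunk(key: bytes, filename: str, index: int, chunk: bytes) -> bytes:
    # Bind the tag to the filename and position so that a chunk cannot be
    # moved to another file or another index without detection.
    name = filename.encode()
    header = (str(len(name)).encode() + b":" + name + b":"
              + str(index).encode() + b":")
    return hmac.new(key, header + chunk, hashlib.sha256).digest()


def verify_and_join(key: bytes, filename: str, stored: list) -> bytes:
    """Check every (chunk, tag) pair, then return the reassembled contents."""
    verified = []
    for index, (chunk, tag) in enumerate(stored):
        expected = tag_chunk(key, filename, index, chunk)
        if not hmac.compare_digest(expected, tag):
            raise DropboxError(f"integrity check failed for chunk {index}")
        verified.append(chunk)
    return b"".join(verified)


key = b"k" * 32  # placeholder key for the example only
chunks = [b"aaaa", b"bbbb"]
stored = [(c, tag_chunk(key, "report", i, c)) for i, c in enumerate(chunks)]
assert verify_and_join(key, "report", stored) == b"aaaabbbb"
```

Because verification happens over everything that is read back, an upload that only touches the changed chunks does not need to re-check the untouched ones; tampering with them would be caught by the next download.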