
Locale path in AWS S3 #246

Open
ygg-sajith opened this issue Sep 6, 2020 · 15 comments

Comments

@ygg-sajith

  • Which version of Django are you using?: Django 1.5
  • Which version of django-rosetta are you using?: 0.7.2
  • Have you looked through recent issues and checked this isn't a duplicate? Not a duplicate

Currently, Rosetta scans the locale paths specified in settings.py and loads them into the admin UI. Once a translation is added and the save button is clicked, it is stored in the same path as the corresponding locale. Is there any way to change this so the files are stored on a remote server or in an AWS S3 location?

In a Docker container based environment, the translator instance can be redeployed during auto-scaling, so the saved changes in the .po files can be lost. To overcome this, we would prefer to store the files in an AWS S3 location.

Is this possible to achieve?

@iflare3g

same requirement for my team 🙏

@mbi
Owner

mbi commented Apr 10, 2021

I recognize this could be an interesting feature but don't have the bandwidth to implement it at the moment. If anyone feels like taking on this development, please get in touch here to discuss first, thank you!

@captainrobbo

captainrobbo commented Apr 10, 2021

I have exactly the same issue, migrating a project to AWS. Maybe it would be good to have 'save hooks' in Rosetta that could be overridden at the application level, so a system could say "post a copy to this URL" or "run a management command after saving". With such a hook, one could either SCP changed files off the instance, or have an external server fetch them. I'll try to read the code and see where this might be done.

@iflare3g

> I have exactly the same issue, migrating a project to AWS. Maybe it would be good to have 'save hooks' in Rosetta that could be overridden at the application level, so a system could say "post a copy to this URL" or "run a management command after saving". With such a hook, one could either SCP changed files off the instance, or have an external server fetch them. I'll try to read the code and see where this might be done.

Yup, we have the same issue, as our project is on AWS.
Our idea was to have a way to override the save event and use boto3 to store the files, for example on S3.

@mbi
Owner

mbi commented Apr 11, 2021

Rosetta sends a post_save signal right after a block of data is saved to disk.

You could potentially write a signal handler that uploads to S3 right after saving, but that really only covers half of what this issue needs. More on this in the next comment.

@mbi
Owner

mbi commented Apr 11, 2021

Right, here are the challenges I see, to implement this feature:

  1. Under the hood, Rosetta uses the polib library to read and write PO and MO files. The "default" way to do that is to pass a file path to pofile and mofile. Emphasis on path, i.e. polib doesn't deal with file-like objects directly, but rather with (local) paths. That said, polib's API also specifies that we could pass the actual string content of the file instead of the path, we could potentially use that. Thoughts, @izimobil?
  2. All over the code, Rosetta currently also relies on filesystem paths to find, read and write PO files. All of these need to be abstracted away behind a generic way to access the data.

So, assuming the first point can be easily handled by passing content to polib instead of file paths, the plan for the second point would probably be to:

  1. Design a storage interface (an API) that enumerates, reads and writes PO files: the default implementation would use the local filesystem, exactly as it is done today.
  2. Update the codebase locations that currently handle filesystem paths directly to use the new API instead. I.e. the views should use generic ways to find, read and write PO objects, regardless of the underlying storage implementation.
  3. Once everything works as it does now with the new API, we can write new storage implementations that use S3, or FTP, blockchain, whatever. It should be as simple as specifying an alternative storage class in Rosetta's settings. Note that this could be trickier than expected, when we have to deal with e.g. perceived IO performance, concurrent writes, underlying storage limitations, ...
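The interface in step 1 might be sketched as follows. The names are illustrative, not an actual Rosetta API; the filesystem backend mimics what Rosetta effectively does today, and an S3 backend could later implement the same three methods with boto3:

```python
# Illustrative storage interface: enumerate, read and write PO files
# behind an abstract class, with the local filesystem as the default.
import abc
import pathlib

class TranslationStorage(abc.ABC):
    @abc.abstractmethod
    def list_po_files(self):
        """Yield storage-relative paths of all .po files."""

    @abc.abstractmethod
    def read(self, path):
        """Return the file's content as text."""

    @abc.abstractmethod
    def write(self, path, content):
        """Persist content at the given path."""

class FilesystemStorage(TranslationStorage):
    """Default backend: plain local file access, as Rosetta does today."""

    def __init__(self, root):
        self.root = pathlib.Path(root)

    def list_po_files(self):
        return sorted(str(p.relative_to(self.root)) for p in self.root.rglob("*.po"))

    def read(self, path):
        return (self.root / path).read_text(encoding="utf-8")

    def write(self, path, content):
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
```

Keeping the interface this small is what makes step 3 cheap: an S3 or FTP backend only has to map three operations onto its own primitives.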

Thoughts?

@mbi
Owner

mbi commented Apr 11, 2021

PS: also worth mentioning: maybe it'd be much easier to mount an S3 bucket as a local filesystem with e.g. s3-fuse, then have Rosetta think it's dealing with local paths as it does right now, even though the PO files are on S3? 🤷‍♂️

@ipmb

ipmb commented Apr 12, 2021

> Design a storage interface (an API) that enumerates, reads and writes PO files: the default implementation would use the local filesystem, exactly as it is done today.

Django has this system built-in already with backends available for S3 and many others. https://docs.djangoproject.com/en/3.2/ref/files/storage/

Would it be feasible to have a setting like ROSETTA_STORAGE_BACKEND which defaults to the local filesystem and then switch from using file paths as the argument to pofile to strings of the file contents opened by the Django storage backend?
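Resolving such a (hypothetical) `ROSETTA_STORAGE_BACKEND` setting to a class at runtime is straightforward; Django ships `django.utils.module_loading.import_string` for exactly this, and a dependency-free equivalent looks like:

```python
# Resolve a dotted path like "myapp.storage.S3Storage" to the object it
# names. ROSETTA_STORAGE_BACKEND is the hypothetical setting proposed
# above; "pathlib.Path" is only a stand-in default for this sketch.
import importlib

DEFAULT_BACKEND = "pathlib.Path"  # placeholder dotted path

def load_backend(dotted_path=DEFAULT_BACKEND):
    """Import the module and return the named attribute (usually a class)."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)
```

Rosetta would instantiate whatever class the setting names and only ever call the storage interface on it.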

One potential issue is how "noisy" the filesystem access is. If there is a lot of read/writing going on in each request, the performance may not be acceptable. If it's just a couple files, it shouldn't be a major concern.

@mbi
Owner

mbi commented Apr 12, 2021

> Django has this system built-in already with backends available for S3 and many others. https://docs.djangoproject.com/en/3.2/ref/files/storage/

Not quite: by default Django only handles local files, IIRC, but django-storages would probably be the perfect solution here.

But this only covers part of the problem (step 3 above), i.e. I don't think we can just rely on Django storage (the feature, not the app) to directly deal with PO and MO files, because a) it's primarily meant to handle static and media files and b) we'd end up with lots of if-then-else blocks all over the view functions, depending on the capabilities of each storage back-end. This is precisely why I think we need a "RosettaFilesStorage" abstraction layer (step 1 above) that deals with enumerating, reading and writing of PO and MO files.

@ipmb

ipmb commented Apr 12, 2021

> Not quite: by default Django only handles local files, IIRC, but django-storages would probably be the perfect solution here.

Correct, when I said backends available, I was referring to third-party backends like django-storages.

What sort of functionality does Rosetta require beyond the standard read/write/list files where an additional abstraction would be necessary? In my experience, you can count on django-storages providing all the necessary primitives for basic file manipulation.

@mbi
Owner

mbi commented Apr 12, 2021

> What sort of functionality does Rosetta require beyond the standard read/write/list files where an additional abstraction would be necessary? In my experience, you can count on django-storages providing all the necessary primitives for basic file manipulation.

Not much, really. The *real* problem, though, is that Rosetta currently uses "low-level" direct file access (i.e. open(path), file.read(), ...) all over the place (all of which will have to be converted to whatever the storage layer does), and that the PO file discovery is heavily based on the assumption that Rosetta operates within the project it is installed in, i.e. it looks for PO files inside its project, and not in some remote storage totally decoupled from the project it "lives" in.

That, and obviously that we need to be able to pass content and not paths to polib.

@ipmb

ipmb commented Apr 12, 2021

That makes sense. I've done some conversions from the Python file API to the Django file API in the past and the open, read, etc. are pretty easy/trivial to handle.

Discovery, however, looks problematic. Even if you could come up with a reasonable remote directory structure, scanning remote storage like that tends to get really expensive (in terms of elapsed time). Does this happen once at startup or also during runtime? If it's a one-time thing, I wonder if you could handle it the same way as collectstatic, where the local file system is scanned and then uploaded to the remote storage? The fact that these files can change during runtime certainly complicates things as well.

@izimobil
Contributor

izimobil commented Apr 12, 2021

> That, and obviously that we need to be able to pass content and not paths to polib.

Hi @mbi, I can confirm that you can pass the content of the pofile as a string to polib!

@jyoost

jyoost commented Dec 27, 2021

The problem no one has discussed is getting the PO/MO files back into the repo for version control.

Ideally, wherever the files reside should be within a repo that could push the changes back to a master repo.

@bingimar

bingimar commented Nov 5, 2023

I'm having this problem as well, has there been any development?
