
Locale path in AWS S3 #246

Open
ygg-sajith opened this issue Sep 6, 2020 · 15 comments

Comments

@ygg-sajith

  • Which version of Django are you using?: Django 1.5
  • Which version of django-rosetta are you using?: 0.7.2
  • Have you looked through recent issues and checked this isn't a duplicate? Not a duplicate

Currently, Rosetta scans the locale paths specified in settings.py and loads them into the admin UI. Once a translation is added and the save button is clicked, it is stored in the same path as the corresponding locale. Is there any way to change this so the files are stored on a remote server or in an AWS S3 location?

In a Docker container based environment, the translator instance can be redeployed during auto-scaling, so the saved changes in the .po files can be lost. To overcome this, we would prefer to store the files in an AWS S3 location.

Is this possible to achieve?

@iflare3g

same requirement for my team 🙏

@mbi
Owner

mbi commented Apr 10, 2021

I recognize this could be an interesting feature but don't have the bandwidth to implement it at the moment. If anyone feels like taking on this development, please get in touch here to discuss first, thank you!

@captainrobbo

captainrobbo commented Apr 10, 2021

I have exactly the same issue, migrating a project to AWS. Maybe it would be good to have 'save hooks' in Rosetta that could be overridden at the application level, so a system could say "post a copy to this URL" or "run a management command after saving". With such a hook, one could either SCP changed files off the instance, or have an external server fetch them. I'll try to read the code and see where this might be done.

@iflare3g

> I have exactly the same issue, migrating a project to AWS. Maybe it would be good to have 'save hooks' in Rosetta that could be overridden at the application level, so a system could say "post a copy to this URL" or "run a management command after saving". With such a hook, one could either SCP changed files off the instance, or have an external server fetch them. I'll try to read the code and see where this might be done.

Yup, we have the same issue, as our project is on AWS.
Our idea was to have a way to override the save event and use boto3 to store the files, for example on S3.

@mbi
Owner

mbi commented Apr 11, 2021

Rosetta sends a post_save signal right after a block of data is saved to disk.

You could potentially write a signal handler that uploads to S3 right after saving, but that really only covers half of what this issue needs. More on this in the next comment.

@mbi
Owner

mbi commented Apr 11, 2021

Right, here are the challenges I see, to implement this feature:

  1. Under the hood, Rosetta uses the polib library to read and write PO and MO files. The "default" way to do that is to pass a file path to pofile and mofile. Emphasis on path, i.e. polib doesn't deal with file-like objects directly, but rather with (local) paths. That said, polib's API also specifies that we could pass the actual string content of the file instead of the path, we could potentially use that. Thoughts, @izimobil?
  2. All over the code, Rosetta currently also relies on filesystem paths to find, read and write PO files. All of these need to be abstracted away behind a generic way to access the data.

So, assuming the first point can be easily handled by passing content to polib instead of file paths, the plan for the second point would probably be to:

  1. Design a storage interface (an API) that enumerates, reads and writes PO files: the default implementation would use the local filesystem, exactly as it is done today.
  2. Update the codebase locations that currently handle filesystem paths directly to use the new API instead. I.e. the views should use generic ways to find, read and write PO objects, regardless of the underlying storage implementation.
  3. Once everything works as it does now with the new API, we can write new storage implementations that use S3, or FTP, blockchain, whatever. It should be as simple as specifying an alternative storage class in Rosetta's settings. Note that this could be trickier than expected, when we have to deal with e.g. perceived IO performance, concurrent writes, underlying storage limitations, ...
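The interface in step 1 might be sketched as follows. The names are illustrative, not an actual Rosetta API; the filesystem backend mimics what Rosetta effectively does today, and an S3 backend could later implement the same three methods with boto3:

```python
# Illustrative storage interface: enumerate, read and write PO files
# behind an abstract class, with the local filesystem as the default.
import abc
import pathlib

class TranslationStorage(abc.ABC):
    @abc.abstractmethod
    def list_po_files(self):
        """Yield storage-relative paths of all .po files."""

    @abc.abstractmethod
    def read(self, path):
        """Return the file's content as text."""

    @abc.abstractmethod
    def write(self, path, content):
        """Persist content at the given path."""

class FilesystemStorage(TranslationStorage):
    """Default backend: plain local file access, as Rosetta does today."""

    def __init__(self, root):
        self.root = pathlib.Path(root)

    def list_po_files(self):
        return sorted(str(p.relative_to(self.root)) for p in self.root.rglob("*.po"))

    def read(self, path):
        return (self.root / path).read_text(encoding="utf-8")

    def write(self, path, content):
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
```

Keeping the interface this small is what makes step 3 cheap: an S3 or FTP backend only has to map three operations onto its own primitives.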

Thoughts?

@mbi
Owner

mbi commented Apr 11, 2021

PS: also worth mentioning: maybe it'd be much easier to mount an S3 bucket as a local filesystem with e.g. s3-fuse, then have Rosetta think it's dealing with local paths as it does right now, even though the PO files are on S3? 🤷‍♂️

@ipmb

ipmb commented Apr 12, 2021

> Design a storage interface (an API) that enumerates, reads and writes PO files: the default implementation would use the local filesystem, exactly as it is done today.

Django has this system built-in already with backends available for S3 and many others. https://docs.djangoproject.com/en/3.2/ref/files/storage/

Would it be feasible to have a setting like ROSETTA_STORAGE_BACKEND which defaults to the local filesystem and then switch from using file paths as the argument to pofile to strings of the file contents opened by the Django storage backend?
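Resolving such a (hypothetical) `ROSETTA_STORAGE_BACKEND` setting to a class at runtime is straightforward; Django ships `django.utils.module_loading.import_string` for exactly this, and a dependency-free equivalent looks like:

```python
# Resolve a dotted path like "myapp.storage.S3Storage" to the object it
# names. ROSETTA_STORAGE_BACKEND is the hypothetical setting proposed
# above; "pathlib.Path" is only a stand-in default for this sketch.
import importlib

DEFAULT_BACKEND = "pathlib.Path"  # placeholder dotted path

def load_backend(dotted_path=DEFAULT_BACKEND):
    """Import the module and return the named attribute (usually a class)."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)
```

Rosetta would instantiate whatever class the setting names and only ever call the storage interface on it.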

One potential issue is how "noisy" the filesystem access is. If there is a lot of read/writing going on in each request, the performance may not be acceptable. If it's just a couple files, it shouldn't be a major concern.

@mbi
Owner

mbi commented Apr 12, 2021

> Django has this system built-in already with backends available for S3 and many others. https://docs.djangoproject.com/en/3.2/ref/files/storage/

Not quite: by default Django only handles local files, IIRC, but django-storages would probably be the perfect solution here.

But this only covers part of the problem (step 3 above), i.e. I don't think we can just rely on Django storage (the feature, not the app) to directly deal with PO and MO files, because a) it's primarily meant to handle static and media files and b) we'd end up with lots of if-then-else blocks all over the view functions, depending on the capabilities of each storage back-end. This is precisely why I think we need a "RosettaFilesStorage" abstraction layer (step 1 above) that deals with enumerating, reading and writing of PO and MO files.

@ipmb

ipmb commented Apr 12, 2021

> Not quite: by default Django only handles local files, IIRC, but django-storages would probably be the perfect solution here.

Correct, when I said backends available, I was referring to third-party backends like django-storages.

What sort of functionality does Rosetta require beyond the standard read/write/list files where an additional abstraction would be necessary? In my experience, you can count on django-storages providing all the necessary primitives for basic file manipulation.

@mbi
Owner

mbi commented Apr 12, 2021

> What sort of functionality does Rosetta require beyond the standard read/write/list files where an additional abstraction would be necessary? In my experience, you can count on django-storages providing all the necessary primitives for basic file manipulation.

Not much, really. The *real* problem, though, is that Rosetta currently uses "low-level" direct file access (i.e. open(path), file.read(), ...) all over the place (all of which will have to be converted to whatever the storage layer does), and that the PO file discovery is heavily based on the assumption that Rosetta operates within the project it is installed in, i.e. it looks for PO files inside its project, and not in some remote storage totally decoupled from the project it "lives" in.

That, and obviously that we need to be able to pass content and not paths to polib.

@ipmb

ipmb commented Apr 12, 2021

That makes sense. I've done some conversions from the Python file API to the Django file API in the past and the open, read, etc. are pretty easy/trivial to handle.

Discovery, however, looks problematic. Even if you could come up with a reasonable remote directory structure, scanning remote storage like that tends to get really expensive (in terms of elapsed time). Does this happen once at startup or also during runtime? If it's a one-time thing, I wonder if you could handle it the same way as collectstatic, where the local file system is scanned and then uploaded to the remote storage? The fact that these files can change during runtime certainly complicates things as well.

@izimobil
Contributor

izimobil commented Apr 12, 2021

> That, and obviously that we need to be able to pass content and not paths to polib.

Hi @mbi, I can confirm that you can pass the content of the pofile as a string to polib!

@jyoost

jyoost commented Dec 27, 2021

The problem no one has discussed is getting the PO/MO files back into the repo for version control.

Ideally, wherever the files reside should be within a repo that could push the changes back to a master repo.

@bingimar

bingimar commented Nov 5, 2023

I'm having this problem as well, has there been any development?
