Mark up text for automated string extraction #1477
Comments
Following our conversation this week, I wanted to share an outline of the process that you'll need to follow. It's important that this is an automated process that is part of your standard workflow, so that any changes to text are detected and translated quickly in the future. In my experience, this is usually achieved by marking up the text in some way. See, for example, this Django template from one of our projects, which wraps each sentence in
I can see that there are already some files with lists of strings that look like a translation mechanism, which may be how this is already starting to be implemented? My instinct is that this is potentially quite a fragile and high-effort way to work, but it's up to you! Either way, once the system is in place, then each time we make an update, the process is:
Apart from a few manual steps to authorise the translation and review the software output to make sure that nothing went wrong, this is an entirely manual process. I don't think that IATI Publisher necessarily has to have a .pot/.po file-based process, but if you were to build one then it would be very close to what we're going to need once we work out the file details with the translation company. Does that make sense? I'm very happy to provide any more detail if it's useful. |
Thanks @robredpath - I assume you meant "Apart from a few manual steps to authorise the translation..., this is an entirely automated process" ? Otherwise, no questions from me |
Hello @robredpath cc. @praweshsth @PG-Momik |
Thanks for the information here @Sanilblank . @robredpath is away until Aug 29th unfortunately, but I will see if anyone else in our team can help with the file format question in the meantime.
Yes, that's correct to my understanding. We remain in control of how often and when we run the re-translation, but your system should be capable of detecting what English text has and hasn't changed since the last translation. By the extraction and re-integration of text into IATI Publisher being done in an automated way, we mean via a script as opposed to any manual copy and pasting. |
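As an illustration of the change detection described above, here is a minimal sketch in Python. It assumes the extracted English text can be loaded as a mapping from a string key to its text, once for the last translated version and once for the current one. The function name `diff_strings` and the sample keys are hypothetical, not taken from IATI Publisher.

```python
def diff_strings(previous, current):
    """Compare two {key: English text} mappings and classify each key."""
    added = {k: v for k, v in current.items() if k not in previous}
    removed = {k: v for k, v in previous.items() if k not in current}
    changed = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    unchanged = {
        k: v for k, v in current.items()
        if k in previous and previous[k] == v
    }
    return {
        "added": added,
        "changed": changed,
        "removed": removed,
        "unchanged": unchanged,
    }


# Hypothetical sample data: the text as last translated, and the text now.
previous = {
    "login.title": "Sign in",
    "login.button": "Log in",
    "home.intro": "Welcome",
}
current = {
    "login.title": "Sign in",
    "login.button": "Sign in to your account",
    "register.title": "Register",
}

report = diff_strings(previous, current)

# Only new or edited English text needs to go to the translators.
needs_translation = {**report["added"], **report["changed"]}
```

Running a comparison like this from a script, rather than comparing lists by hand, is one way to keep the re-translation step repeatable.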
@emmajclegg Thanks for the clarification. cc. @praweshsth @PG-Momik |
Hi @Sanilblank - I don't want to give wrong information on this so will check with @robredpath once he's back (Aug 29th) and update here. |
Hi @Sanilblank! Thanks for this - it's really useful to understand what you're thinking.
The exact format of the files doesn't really matter too much for us - I suggested .pot/.po files as they're fairly standard in our other applications and are straightforward to work with, but an .xlsx file would also be fine. The main thing is that the process is automated and repeatable.
By "automated" what I mean is that we expect the list of strings to be generated directly from the source code by software, without any manual steps - and for the translated strings to similarly be re-integrated automatically. This means that the process is easily repeatable, so that a small update can be made easily and large updates aren't too much of a problem. We don't expect the automation to require zero human contact, but we want to make sure that everything gets translated as part of the regular updating process for the software: every time a form or button changes, or we add some new explanatory text, it should be translated promptly.
By way of example, for our documentation platform we run one command to generate the .pot files that we send to the translators, and then we check in the translated files to git and re-run the build process to generate the multi-lingual website. This gives us a very high level of repeatability and consistency, and it's easy for us to do which encourages us to do it often - even for very small changes.
In our documentation work we send the whole documentation site each time, and the translation platform figures out what's changed, and gets that translated. We then re-import the whole translated file back in. Our experience is that it's easier that way, rather than trying to manage lists of things that have changed. Ultimately, it is up to you, but that's our experience and recommendation.
Hope that helps - do let me know if you have any further questions |
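To make "generated directly from the source code by software" concrete, here is a rough Python sketch. It assumes translatable text in templates is wrapped in a marker call written as `__("...")`. That marker, the sample template and the function names are assumptions for illustration only, not a description of how IATI Publisher is actually marked up. The output imitates the msgid/msgstr entries of a .pot file. A real .pot file would also carry a header entry, and established extraction tools handle escaping and multi-line strings that this sketch ignores.

```python
import re

# Assumption: translatable text is wrapped in a marker call such as __("...").
MARKER = re.compile(r"""__\(\s*(['"])(.*?)\1\s*\)""")


def extract_strings(source):
    """Return marked strings in order of first appearance, without repeats."""
    seen = []
    for match in MARKER.finditer(source):
        text = match.group(2)
        if text not in seen:
            seen.append(text)
    return seen


def to_pot(strings):
    """Render strings as gettext-style entries with empty translations."""
    entries = []
    for text in strings:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        entries.append(f'msgid "{escaped}"\nmsgstr ""\n')
    return "\n".join(entries)


# Hypothetical template fragment used only to exercise the functions.
sample = '''
<h1>{{ __("Sign in") }}</h1>
<button>{{ __('Register') }}</button>
<p>{{ __("Sign in") }}</p>
'''

strings = extract_strings(sample)
pot = to_pot(strings)
print(pot)
```

Because the whole catalogue is regenerated on every run, the same command works for a one-word change and for a large update, which is the repeatability described above.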
Hi @robredpath cc. @praweshsth @PG-Momik |
Hi @robredpath
This message is just to update you regarding the findings we have had and to give you an update about how we are proceeding for this feature. cc. @praweshsth @PG-Momik |
@Sanilblank - can you update on how this automated translation work is going please? We'd committed to implementing French & Spanish translation by the end of this year so I'm worried about delivery timelines slipping. I'm keen to review the extracted English text (the step before getting it translated) as soon as possible, so I discussed briefly with @BibhaT & @PG-Momik yesterday the possibility of you sharing the text you've extracted so far, rather than waiting until all user-facing text has been incorporated. This would mean I could start reviewing earlier. Any blockers or questions, let me know. I'm also trying to close old IATI Publisher issues that are no longer relevant - let me know if it's ok to close #885 and #1279? These were the previous issues related to translation that I assume have been replaced by this current issue. |
@emmajclegg Hello Emma
Yes those issues can be closed. |
To summarise from this morning's call, we don't mind what order text is extracted from different modules of the system - @PG-Momik suggested choosing a "simple" module to start. To reconfirm, only user-facing interface text and messages will need translating, nothing that only super-admins can see. The public-facing pages and registration workflow text was extracted by @Sanilblank last month. I'm summarising our feedback below from the email conversation:
We expect to go through several test-runs of the text extraction, review, translation & reintegration process to resolve small problems that come up. This will be necessary before we release the French & Spanish interface to end users. @BibhaT @PG-Momik @Sanilblank - I'm aware this is a big task and I'm worried that we don't have a good handle on timelines (considering it was something we were aiming to complete by the end of the year). Aside from bugs and user support issues, this translation work takes priority over any new work in the "proposed user story / task list". Any questions, please let me know. cc' @robredpath |
Hello @emmajclegg , |
Thanks @Sanilblank - I appreciate you've been off recently, that's no problem. I don't need an exact estimation for this at this point - the main thing is I'd like us to be making visible progress on it, rather than letting it potentially drag on for months. Just let me know when you've picked the module to work on first and when roughly I should be expecting to receive a text file to review (as it helps me plan). I expect we'll want to test run the entire extraction, translation, reintegration process on a single module first which, as you say, will help us all understand time and effort required for the remaining ones. |
To summarise where I think we got to on the questions from today's call:
Any other questions, or anything to add, just let us know @PG-Momik @BibhaT cc' @robredpath |
@emmajclegg Adding to this. On the call I mentioned that I wasn't sure whether other codelists besides OrganizationRegistrationAgency were being synced. I've confirmed that the other codelists are being synced as well. 👍 |
@emmajclegg A sheet has been shared with the current extracted contents for the completed modules. Please have a look. cc: @BibhaT |
Ok thanks a lot @PG-Momik - I'll have a look over the next few days, prioritising the sheets labelled green in the extracted text spreadsheet. One question - I see that a few standalone IATI-specific strings like "IATI organisation identifier", "publisher ID" and "default language" are appearing in the sheets multiple times, though the key field is similar in each case. Can you clarify if there's already been an attempt at deduplication here? I'm wondering how we avoid translating these important strings multiple times. cc'ing @robredpath for info. Rob - let me know what you think we need to run a first test of the translation and reintegration loop. "adminHeader" is the simplest sheet in that extracted text workbook, if useful as an easy example. Otherwise I'll let you know when there are a few sheets ready with de-duplicated and reviewed English text. |
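One possible way to spot the repeated strings mentioned above is sketched here in Python. It assumes each row of the extracted workbook can be read as a (sheet, key, English text) triple. Apart from the "adminHeader" sheet name, the sheet names, keys and the function name `group_duplicates` are hypothetical. Text is compared after collapsing whitespace and ignoring case, which is a choice made for this sketch, so matches are best treated as candidates for review rather than merged automatically.

```python
from collections import defaultdict


def group_duplicates(rows):
    """Group (sheet, key, text) rows by normalised English text.

    Returns only the texts that occur in more than one place.
    """
    groups = defaultdict(list)
    for sheet, key, text in rows:
        # Collapse runs of whitespace and ignore case when comparing.
        normalised = " ".join(text.split()).casefold()
        groups[normalised].append((sheet, key))
    return {
        text: places for text, places in groups.items() if len(places) > 1
    }


# Hypothetical rows standing in for the extracted spreadsheet.
rows = [
    ("adminHeader", "publisher_id", "Publisher ID"),
    ("settings", "publisher_id_label", "publisher ID"),
    ("settings", "default_language", "Default language"),
]

duplicates = group_duplicates(rows)
for text, places in duplicates.items():
    print(text, "->", places)
```

A report like this lets a reviewer decide whether one shared key should replace the repeats before the text is sent for translation.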
@PG-Momik - to update, I've looked over the remaining green (i.e. nearly finished) sheets in the extracted text spreadsheet and have left a few more comments. I haven't edited any of the English text yet as it sounds like it makes sense for YI to re-extract an updated version of the spreadsheet before I do that (to save me re-doing the review before @robredpath sends the text for translation). Happy to discuss any questions tomorrow. |
@PG-Momik - thanks for sharing the latest spreadsheet of extracted text (Extracted Sheet - Jan 17). I assume this was for me to have a look. Again, I've left a few comments to check where certain text appears in the interface and flag a few areas where it could be simplified.
Happy to discuss more on Wednesday (I'm not around tomorrow, Tues 21st) |
This is a first step to translating IATI Publisher's interface into French and Spanish, following the approach discussed here: #1420
YI will prepare text for automated extraction from IATI Publisher, ODS will review and get it translated, then YI will reintegrate text back into IATI Publisher.
Tasks
Test extraction by modules