X-Git-Url: https://sipb.mit.edu/gitweb.cgi/ikiwiki.git/blobdiff_plain/44bb872a9705781e20c1b7af89297240523d6bb9..b19d0d3d245c7c2b605ac6c972c73c909a941db4:/doc/plugins/po.mdwn

diff --git a/doc/plugins/po.mdwn b/doc/plugins/po.mdwn
index e88cc3106..eeeabe730 100644
--- a/doc/plugins/po.mdwn
+++ b/doc/plugins/po.mdwn
@@ -129,6 +129,10 @@ Usage
 Templates
 ---------
 
+When `po_link_to` is not set to `negotiated`, one should replace some
+occurrences of `BASEURL` with `HOMEPAGEURL` to get correct links to
+the wiki homepage.
+
 The `ISTRANSLATION` and `ISTRANSLATABLE` variables can be used to
 display things only on translatable or translation pages.
 
@@ -281,14 +285,21 @@ an initial goal, and analysing in detail the possible issues.
 
 ##### Locale::Po4a modules
 
-- the modules we want to use have to be checked, as not all are safe
-  (e.g. the LaTeX module's behaviour is changed by commands included
-  in the content); they may use regexps generated from the content; we
-  currently only use the `Text` module
-- the `Text` module does not run any external program
-- check that no module is loaded by `Chooser.pm`, when we tell it to
-  load the `Text` one
-- `nsgmls` is used by `Sgml.pm`
+The modules we want to use have to be checked, as not all are safe
+(e.g. the LaTeX module's behaviour is changed by commands included in
+the content); they may use regexps generated from the content.
+
+`Chooser.pm` only loads the plugin we tell it to: currently, this
+means the `Text` module only.
+
+`Text` module (I checked the CVS version):
+
+- it does not run any external program
+- only `do_paragraph()` builds regexps that expand untrusted
+  variables; they seem safe to me, but someone more expert than me
+  will need to check. Joey?
+
+  > Freaky code, but seems ok due to use of `quotemeta`.
 
 ##### Text::WrapI18N
 
@@ -302,6 +313,41 @@ table manipulation tricks could work; overriding
 `Locale::Po4a::Common::wrapi18n` may be easier. I'm no expert at all in
 this field. Joey? [[--intrigeri]]
 
+> Update: Nicolas François suggests we add an option to po4a to
+> disable it. It would do the trick, but only for people running
+> a brand new po4a (probably too late for Lenny). Anyway, this option
+> would have to take effect in a `BEGIN` / `eval` that I'm not
+> familiar with. I can learn and do it, in case no Perl wizard
+> volunteers to provide the po4a patch. [[--intrigeri]]
+
+>> That doesn't really need to be in a BEGIN. This patch moves it to
+>> `import`, and makes this disable wrapi18n:
+>> `use Locale::Po4a::Common q{nowrapi18n}` --[[Joey]]
+
+
+--- /usr/share/perl5/Locale/Po4a/Common.pm	2008-07-21 14:54:52.000000000 -0400
++++ Common.pm	2008-11-11 18:27:34.000000000 -0500
+@@ -30,8 +30,16 @@
+ use strict;
+ use warnings;
+ 
+-BEGIN {
+-    if (eval { require Text::WrapI18N }) {
++sub import {
++    my $class=shift;
++    my $wrapi18n=1;
++    if ($_[0] eq 'nowrapi18n') {
++    	shift;
++	$wrapi18n=0;
++    }
++    $class->export_to_level(1, $class, @_);
++
++    if ($wrapi18n && eval { require Text::WrapI18N }) {
+     
+         # Don't bother determining the wrap column if we cannot wrap.
+         my $col=$ENV{COLUMNS};
+
+
 ##### Term::ReadKey
 
 `Term::ReadKey` is not a hard dependency in our case, *i.e.* po4a
@@ -324,6 +370,10 @@ use in our case, I suggest we define `ENV{COLUMNS}` before loading
 `Locale::Po4a::Common`, just to be on the safe side. Joey?
 [[--intrigeri]]
 
+> Update: adding an option to disable `Text::WrapI18N`, as Nicolas
+> François suggested, would as a bonus disable `Term::ReadKey`
+> as well. [[--intrigeri]]
+
 ### msgmerge
 
 `refreshpofiles()` runs this external program. A po4a developer
@@ -342,6 +392,92 @@ a program in order to easily detect some of the most obvious DoS.
 > po4a was not fuzzy-tested, but according to one of its developers,
 > "it would be really appreciated". [[--intrigeri]]
 
+Test conditions:
+
+- a 21M file containing 100 concatenated copies of all the files in my
+  `/usr/share/common-licenses/`; I had no existing PO file or
+  translated versions at hand, which renders these tests
+  quite incomplete.
+- po4a was the Debian 0.34-2 package; the same tests were also run
+  after replacing the `Text` module with the CVS one (the core was not
+  changed in CVS since 0.34-2 was released), without any significant
+  difference in the results.
+- Perl 5.10.0-16
+
+#### po4a-gettextize
+
+`po4a-gettextize` uses more or less the same po4a features as our
+`refreshpot` function.
+
+Without specifying an input charset, zzuf'ed `po4a-gettextize` quickly
+errors out, complaining it was not able to detect the input charset;
+it leaves no incomplete file on disk.
+
+So I had to pretend the input was in UTF-8, as does the po plugin.
+
+Two ways of crashing were revealed by this command-line:
+
+    zzuf -vc -s 0:100 -r 0.1:0.5 \
+      po4a-gettextize -f text -o markdown -M utf-8 -L utf-8 \
+      -m LICENSES >/dev/null
+
+They are:
+
+    Malformed UTF-8 character (UTF-16 surrogate 0xdcc9) in substitution iterator at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
+    Malformed UTF-8 character (fatal) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
+
+and
+
+    Malformed UTF-8 character (UTF-16 surrogate 0xdcec) in substitution (s///) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
+    Malformed UTF-8 character (fatal) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
+
+Perl seems to exit cleanly, and an incomplete PO file is written on
+disk. I am not sure whether this is a bug in Perl or in `Po.pm`.
+
+> It's fairly standard perl behavior when fed malformed utf-8. As long as it doesn't
+> crash ikiwiki, it's probably acceptable. Ikiwiki can do some similar things itself when fed malformed utf-8 (doesn't crash tho) --[[Joey]]
+
+#### po4a-translate
+
+`po4a-translate` uses more or less the same po4a features as our
+`filter` function.
+
+Without specifying an input charset, same behaviour as
+`po4a-gettextize`, so let's specify UTF-8 as input charset as of now.
+
+    zzuf -cv \
+      po4a-translate -d -f text -o markdown -M utf-8 -L utf-8 \
+      -k 0 -m LICENSES -p LICENSES.fr.po -l test.fr
+
+... prints tons of occurrences of the following error, but a complete
+translated document is written (obviously with some weird chars
+inside):
+
+    Use of uninitialized value in string ne at /usr/share/perl5/Locale/Po4a/TransTractor.pm line 854.
+    Use of uninitialized value in string ne at /usr/share/perl5/Locale/Po4a/TransTractor.pm line 840.
+    Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Locale/Po4a/Po.pm line 1002.
+
+While:
+
+    zzuf -cv -s 0:10 -r 0.001:0.3 \
+      po4a-translate -d -f text -o markdown -M utf-8 -L utf-8 \
+      -k 0 -m LICENSES -p LICENSES.fr.po -l test.fr
+
+... seems to lose the fight, at the `readpo(LICENSES.fr.po)` step,
+against some kind of infinite loop, deadlock, or any similar beast.
+It does not seem to eat memory, though.
+
+Whatever format module is used does not change anything. This is thus
+probably a bug in po4a's core or in a lib it depends on.
+
+The sub `read`, in `TransTractor.pm`, seems to be a good debugging
+starting point.
+
+#### msgmerge
+
+`msgmerge` is run in our `refreshpofiles` function. I did not manage
+to crash it with `zzuf`.
+
 gettext/po4a rough corners
 --------------------------
 
@@ -360,22 +496,49 @@ gettext/po4a rough corners
   into the Pot file, and let it propagate; should be fixed in
   `773de05a7a1ee68d2bed173367cf5e716884945a`, time will tell.
 
-Misc. improvements
-------------------
+Better links
+------------
+
+### Subpages
+
+On a translation page, links to subpages should actually be links to
+the master page's subpages. They currently appear as broken links.
+
+### Page title in links
 
-### page titles
+To use the page titles set with the [[meta|plugins/meta]] plugin when
+rendering links would be very much nicer than the current
+"filename.LL" format. This is actually a duplicate for
+[[bugs/pagetitle_function_does_not_respect_meta_titles]].
 
-Use nice page titles from meta plugin in links, as inline already
-does. This is actually a duplicate for
-[[bugs/pagetitle_function_does_not_respect_meta_titles]], which might
-be fixed by something like [[todo/using_meta_titles_for_parentlinks]].
+Going to work on this in my `meta` branch.
 
-### source files format
+### Translation status in links
 
-Markdown is supported, great, but what about others? The set of file
-formats supported both in ikiwiki and po4a probably is greater than
-`{markdown}`. Warning: the po4a modules are the place where one can
-expect security issues.
+See [[contrib/po]].
+
+### Backlinks
+
+They are not updated when the source page changes (e.g. meta title).
+
+Page formats
+------------
+
+Markdown is well supported, great, but what about others?
+
+The [[po|plugins/po]] plugin uses `Locale::Po4a::Text` for every page format;
+this can be expected to work out of the box with most other wiki-like
+formats supported by ikiwiki. Some of their ad-hoc syntax might be
+parsed in a strange way, but the worst problems I can imagine would be
+wrapping issues; e.g. there is code in po4a dedicated to prevent
+re-wrapping the underlined Markdown headers.
+
+While it would be easy to better support formats such as [[html]] or
+LaTeX, by using for each one the dedicated po4a module, this can be
+problematic from a security point of view.
+
+**TODO**: test the more popular formats and write proper documentation
+about it.
 
 Translation quality assurance
 -----------------------------
@@ -388,3 +551,15 @@ A new `cansave` type of hook would be needed to implement this.
 
 Note: committing to the underlying repository is a way to bypass
 this check.
+
+Creating new pages on the web
+-----------------------------
+
+See [[contrib/po]].
+
+Documentation
+-------------
+
+Maybe write separate documentation depending on the people it targets:
+translators, wiki administrators, hackers. This plugin may be complex
+enough to deserve this.