doc-fr/reference/parle/pattern.matching.xml

<?xml version="1.0" encoding="utf-8"?>
<!-- EN-Revision: 0be325d396b139feeaee38168f779be962b74f09 Maintainer: Fan2Shrek Status: ready -->
<!-- Reviewed: yes -->
<chapter xml:id="parle.pattern.matching" xmlns="http://docbook.org/ns/docbook">
 <title>Correspondance de paterne Parle</title>
 <titleabbrev>Correspondance de paterne</titleabbrev>
 <para>
  Parle supporte la correspondance de paterne avec des expressions régulières similaires à flex.
  Les ensembles de caractères POSIX suivants sont également supportés :
  <simplelist type="inline">
   <member><literal>[:alnum:]</literal></member>
   <member><literal>[:alpha:]</literal></member>
   <member><literal>[:blank:]</literal></member>
   <member><literal>[:cntrl:]</literal></member>
   <member><literal>[:digit:]</literal></member>
   <member><literal>[:graph:]</literal></member>
   <member><literal>[:lower:]</literal></member>
   <member><literal>[:print:]</literal></member>
   <member><literal>[:punct:]</literal></member>
   <member><literal>[:space:]</literal></member>
   <member><literal>[:upper:]</literal></member>
   <member><literal>[:xdigit:]</literal></member>
  </simplelist>
  .
 </para>
 <para>
  Les classes de caractères Unicode ne sont actuellement pas activées par défaut, passez --enable-parle-utf32 pour les rendre disponibles.
  Un encodage particulier peut être mappé avec une regex correctement construite.
  Par exemple, pour correspondre au symbole EURO encodé en UTF-8, l'expression régulière <literal>[\xe2][\x82][\xac]</literal> peut être utilisée.
  Le paterne pour une chaîne encodée en UTF-8 pourrait être <literal>[ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+</literal>.
 </para>

 <section xml:id="parle.regex.chars" annotations="chunk:false">
  <title>Représentation des caractères</title>
  <para>
   <table>
    <title>Représentation des caractères</title>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Séquence</entry><entry>Description</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>\a</entry><entry>Alerte (cloche).</entry>
      </row>
      <row>
       <entry>\b</entry><entry>Retour arrière (Backspace).</entry>
      </row>
      <row>
       <entry>\e</entry><entry>Caractère ESC, \x1b.</entry>
      </row>
      <row>
       <entry>\n</entry><entry>Nouvelle ligne.</entry>
      </row>
      <row>
       <entry>\r</entry><entry>Retour chariot.</entry>
      </row>
      <row>
       <entry>\f</entry><entry>Saut de page, \x0c.</entry>
      </row>
      <row>
       <entry>\t</entry><entry>Tabulation horizontale, \x09.</entry>
      </row>
      <row>
       <entry>\v</entry><entry>Tabulation verticale, \x0b.</entry>
      </row>
      <row>
       <entry>\oct</entry><entry>Caractère spécifié par un code octal à trois chiffres.</entry>
      </row>
      <row>
       <entry>\xhex</entry><entry>Caractère spécifié par un code hexadécimal.</entry>
      </row>
      <row>
       <entry>\cchar</entry><entry>Caractère de contrôle nommé.</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>
 </section>
 <section xml:id="parle.regex.charclass" annotations="chunk:false">
  <title>Classes de caractères</title>
  <para>
   <table>
    <title>Classes de caractères</title>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Sequence</entry><entry>Description</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>[...]</entry><entry>Un seul caractère listé ou contenu dans une plage listée. Les plages peuvent être combinées avec les opérateurs <literal>{+}</literal> et <literal>{-}</literal>. Par exemple <literal>[a-z]{+}[0-9]</literal> est la même chose que <literal>[0-9a-z]</literal> et <literal>[a-z]{-}[aeiou]</literal> est la même chose que <literal>[b-df-hj-np-tv-z]</literal>.</entry>
      </row>
      <row>
       <entry>[^...]</entry><entry>Un seul caractère non listé et non contenu dans une plage listée.</entry>
      </row>
      <row>
       <entry>.</entry><entry>N'importe quel caractère, par défaut <literal>[^\n].</literal></entry>
      </row>
      <row>
       <entry>\d</entry><entry>Caractère numérique, <literal>[0-9]</literal>.</entry>
      </row>
      <row>
       <entry>\D</entry><entry>Caractère non numérique, <literal>[^0-9]</literal>.</entry>
      </row>
      <row>
       <entry>\s</entry><entry>Caractère d'espace blanc, <literal>[ \t\n\r\f\v]</literal>.</entry>
      </row>
      <row>
       <entry>\S</entry><entry>Caractère non d'espace blanc, <literal>[^ \t\n\r\f\v]</literal>.</entry>
      </row>
      <row>
       <entry>\w</entry><entry>Caractère de mot, <literal>[a-zA-Z0-9_]</literal>.</entry>
      </row>
      <row>
       <entry>\W</entry><entry>Caractère de non mot, <literal>[^a-zA-Z0-9_]</literal>.</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>
 </section>
 <section xml:id="parle.regex.unicodecharclass" annotations="chunk:false">
  <title>Classes de caractères Unicode</title>
  <para>
   <table>
    <title>Classes de caractères Unicode</title>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Sequence</entry><entry>Description</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>\p{C}</entry><entry>Autre.</entry>
      </row>
      <row>
       <entry>\p{Cc}</entry><entry>Autre, contrôle.</entry>
      </row>
      <row>
       <entry>\p{Cf}</entry><entry>Autre, format.</entry>
      </row>
      <row>
       <entry>\p{Co}</entry><entry>Autre, utilisation privée.</entry>
      </row>
      <row>
       <entry>\p{Cs}</entry><entry>Autre, substitut.</entry>
      </row>
      <row>
       <entry>\p{L}</entry><entry>Lettre.</entry>
      </row>
      <row>
       <entry>\p{LC}</entry><entry>Lettre, casée.</entry>
      </row>
      <row>
       <entry>\p{Ll}</entry><entry>Lettre, minuscule.</entry>
      </row>
      <row>
       <entry>\p{Lm}</entry><entry>Lettre, modifiée.</entry>
      </row>
      <row>
       <entry>\p{Lo}</entry><entry>Lettre, autre.</entry>
      </row>
      <row>
       <entry>\p{Lt}</entry><entry>Lettre, de titre.</entry>
      </row>
      <row>
       <entry>\p{Lu}</entry><entry>Lettre, majuscule.</entry>
      </row>
      <row>
       <entry>\p{M}</entry><entry>Marque.</entry>
      </row>
      <row>
       <entry>\p{Mc}</entry><entry>Marque, espace combiné.</entry>
      </row>
      <row>
       <entry>\p{Me}</entry><entry>Marque, encadrant.</entry>
      </row>
      <row>
       <entry>\p{Mn}</entry><entry>Marque, non espacé.</entry>
      </row>
      <row>
       <entry>\p{N}</entry><entry>Nombre.</entry>
      </row>
      <row>
       <entry>\p{Nd}</entry><entry>Nombre, chiffre décimal.</entry>
      </row>
      <row>
       <entry>\p{Nl}</entry><entry>Nombre, lettre.</entry>
      </row>
      <row>
       <entry>\p{No}</entry><entry>Nombre, autre.</entry>
      </row>
      <row>
       <entry>\p{P}</entry><entry>Ponctuation.</entry>
      </row>
      <row>
       <entry>\p{Pc}</entry><entry>Ponctuation, connecteur.</entry>
      </row>
      <row>
       <entry>\p{Pd}</entry><entry>Ponctuation, tiret.</entry>
      </row>
      <row>
       <entry>\p{Pe}</entry><entry>Ponctuation, fermeture.</entry>
      </row>
      <row>
       <entry>\p{Pf}</entry><entry>Ponctuation, guillemet final.</entry>
      </row>
      <row>
       <entry>\p{Pi}</entry><entry>Ponctuation, guillemet initial.</entry>
      </row>
      <row>
       <entry>\p{Po}</entry><entry>Ponctuation, autre.</entry>
      </row>
      <row>
       <entry>\p{Ps}</entry><entry>Ponctuation, ouverture.</entry>
      </row>
      <row>
       <entry>\p{S}</entry><entry>Symbole.</entry>
      </row>
      <row>
       <entry>\p{Sc}</entry><entry>Symbole, devise.</entry>
      </row>
      <row>
       <entry>\p{Sk}</entry><entry>Symbole, modifié.</entry>
      </row>
      <row>
       <entry>\p{Sm}</entry><entry>Symbole, math.</entry>
      </row>
      <row>
       <entry>\p{So}</entry><entry>Symbole, autre.</entry>
      </row>
      <row>
       <entry>\p{Z}</entry><entry>Séparateur.</entry>
      </row>
      <row>
       <entry>\p{Zl}</entry><entry>Séparateur, ligne.</entry>
      </row>
      <row>
       <entry>\p{Zp}</entry><entry>Séparateur, paragraphe.</entry>
      </row>
      <row>
       <entry>\p{Zs}</entry><entry>Séparateur, espace.</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>
  <para>
   Ces classe de caractères ne sont disponibles que si l'option --enable-parle-utf32 a été passée lors de la compilation.
  </para>
 </section>
 <section xml:id="parle.regex.alternation" annotations="chunk:false">
  <title>Alternance et répétition</title>
  <para>
   <table>
    <title>Alternance et répétition</title>
    <tgroup cols="3">
     <thead>
      <row>
       <entry>Sequence</entry><entry>Greedy</entry><entry>Description</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>...|...</entry><entry>-</entry><entry>Essayer les sous-paterne en alternance.</entry>
      </row>
      <row>
       <entry>*</entry><entry>yes</entry><entry>Correspond 0 ou plusieurs fois.</entry>
      </row>
      <row>
       <entry>+</entry><entry>yes</entry><entry>Correspond 1 ou plusieurs fois.</entry>
      </row>
      <row>
       <entry>?</entry><entry>yes</entry><entry>Correspond 0 ou 1 fois.</entry>
      </row>
      <row>
       <entry>{n}</entry><entry>no</entry><entry>Correspond exactement n fois.</entry>
      </row>
      <row>
       <entry>{n,}</entry><entry>no</entry><entry>Correspond au moins n fois.</entry>
      </row>
      <row>
       <entry>{n,m}</entry><entry>yes</entry><entry>Correspond au moins n fois mais pas plus de m fois.</entry>
      </row>
      <row>
       <entry>*?</entry><entry>no</entry><entry>Correspond 0 ou plusieurs fois.</entry>
      </row>
      <row>
       <entry>+?</entry><entry>no</entry><entry>Correspond 1 ou plusieurs fois.</entry>
      </row>
      <row>
       <entry>??</entry><entry>no</entry><entry>Correspond 0 ou 1 fois.</entry>
      </row>
      <row>
       <entry>{n,}?</entry><entry>no</entry><entry>Correspond au moins n fois.</entry>
      </row>
      <row>
       <entry>{n,m}?</entry><entry>no</entry><entry>Correspond au moins n fois mais pas plus de m fois.</entry>
      </row>
      <row>
       <entry>{MACRO}</entry><entry>-</entry><entry>Inclut la regex MACRO dans la regex courante.</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>
 </section>
 <section xml:id="parle.regex.anchors" annotations="chunk:false">
  <title>Ancre</title>
  <para>
   <table>
    <title>Ancre</title>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Sequence</entry><entry>Description</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>^</entry><entry>Commence par une chaîne ou après un retour à la ligne.</entry>
      </row>
      <row>
       <entry>$</entry><entry>Finit par une chaîne ou avant un retour à la ligne.</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>
 </section>
 <section xml:id="parle.regex.grouping" annotations="chunk:false">
  <title>Regroupement</title>
  <para>
   <table>
    <title>Regroupement</title>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Sequence</entry>
       <entry>Description</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>(...)</entry>
       <entry>Regroupe une expression régulière pour modifier l'ordre d'évaluation.</entry>
      </row>
      <row>
       <entry valign="top">(?r-s:pattern)</entry>
       <entry>
        <simpara>
         Applique l'option r et omet l'option s lors de l'interprétation du paterne.
         Les options peuvent être zéro ou plusieurs des caractères i, s ou x.
        </simpara>
        <simpara>
         <literal>i</literal> signifie insensible à la casse.
        </simpara>
        <simpara>
         <literal>-i</literal> signifie sensible à la casse.
        </simpara>
        <simpara>
         <literal>s</literal> modifie le sens de <literal>.</literal> pour correspondre à n'importe quel caractère.
        </simpara>
        <simpara>
         <literal>-s</literal> modifie le sens de <literal>.</literal> pour correspondre à n'importe quel caractère sauf <literal>\n</literal>.
        </simpara>
        <simpara>
         <literal>x</literal> ignore les commentaires et les espaces dans les paterne.
         Les espaces sont ignorés sauf s'ils sont échappés par un backslash, contenus dans des <literal>""s</literal>,
         ou apparaissent à l'intérieur d'une plage de caractères.
        </simpara>
        <simpara>
         Ces options peuvent être appliquées globalement au niveau des règles en passant une combinaison des indicateurs de bits à l'analyseur lexical.
        </simpara>
       </entry>
      </row>
      <row>
       <entry>(?# comment )</entry>
       <entry>Omet tout ce qui est dans (). Le premier caractère ) rencontré termine le paterne. Il n'est pas possible pour le commentaire de contenir un caractère ). Le commentaire peut s'étendre sur plusieurs lignes.</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>
 </section>
</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->