• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » email validation in a tmap components with perl regular expression

#1 2007-04-17 12:59:31

cyrilPierreDeGeyer
Member
Company: ANASKA
Registered: 2006-09-21
Posts: 16
Website

email validation in a tmap components with perl regular expression

Hi,

Using TOS to pull leads information from a database I created a routine with perl regular expression to validate email adress.
For this I used the following code :

------------------
use Exporter;

use vars qw(@EXPORT @ISA);

@ISA = qw(Exporter);
@EXPORT = qw(
    isemail
);

sub isemail {
    if (@_[0] =~ m/^[a-zA-Z_\-.0-9]+@[a-zA-Z_\-.0-9]+$/) {
        return 1;
    }else{
        return 0;
    }
}

1;
----------------

Once done it is easy to call this routine from your tMap component.

Of course you can improve the perl regular expression wink

Regards

Cyril

Offline

#2 2007-04-17 14:55:34

plegall
Member
Registered: 2006-09-19
Posts: 1586
Website

Re: email validation in a tmap components with perl regular expression

cyrilPierreDeGeyer wrote:

Of course you can improve the perl regular expression wink

That's my job :-)

Code:

sub isEmail {
    use Regexp::Common qw[Email::Address];
    
    if ($_[0] =~ m/$RE{Email}{Address}/) {
        return 1;
    }
    
    return 0;
}

You have to install Regexp::Common::Email::Address

Offline

#3 2007-04-18 12:15:19

cyrilPierreDeGeyer
Member
Company: ANASKA
Registered: 2006-09-21
Posts: 16
Website

Re: email validation in a tmap components with perl regular expression

Great it's working !

Where can we find what exactly this regexp  check ?

plegall wrote:

You have to install Regexp::Common::Email::Address

ppm is my friend wink

Offline

#4 2007-04-18 12:25:15

plegall
Member
Registered: 2006-09-19
Posts: 1586
Website

Re: email validation in a tmap components with perl regular expression

cyrilPierreDeGeyer wrote:

Where can we find what exactly this regexp  check ?

Code:

print $RE{Email}{Address}, "\n";

Code:

(?:(?-xism:(?:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|\.|\s*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0A\x0D])))+"\s*)+)|(?:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*))+))?(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*<(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*))\@(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*\[(?:\s*(?-xism:(?-xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D]))))*\s*\](?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)))>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*))|(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*))\@(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*\[(?:\s*(?-xism:(?-xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D]))))*\s*\](?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*))))(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)*))

Offline

#5 2007-12-13 17:06:01

cyrilPierreDeGeyer
Member
Company: ANASKA
Registered: 2006-09-21
Posts: 16
Website

Re: email validation in a tmap components with perl regular expression

Hum,
I am migrating my jobs to Java. How can i code the same email validation routine in Java ?
Any idea ?

Cyril

Offline

#6 2007-12-14 15:46:15

cyrilPierreDeGeyer
Member
Company: ANASKA
Registered: 2006-09-21
Posts: 16
Website

Re: email validation in a tmap components with perl regular expression

This could match :

Code:

//template routine Java
package routines;

import org.apache.oro.text.regex.MalformedPatternException;
import org.apache.oro.text.regex.Pattern;
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.Perl5Matcher;

public class Validation {
     
public static boolean isValideEmail(String email) {
        Perl5Matcher matcher = new Perl5Matcher();
        Perl5Compiler compiler = new Perl5Compiler();
        Pattern pattern;
 
        try {
            pattern = compiler.compile("^[\\w_.-]+@[\\w_.-]+\\.[\\w]+$");
            if (!matcher.matches(email, pattern)) {
                return false;
            }
        } catch (MalformedPatternException e) {
            throw new RuntimeException(e);
        }
        return true;
    }
}

But you need to add a lib ("jakarta-oro-2.0.8.jar") with a right clic on your routine (Edit routine librairies)

Cyril

Offline

#7 2007-12-14 16:10:30

mhirt
Talend team
Registered: 2006-09-19
Posts: 1639

Re: email validation in a tmap components with perl regular expression

Hello,

Your routine is working well, but for such a regexp, you can also simply use the java.util.regex.Pattern which is directly available in Standard Java.

Regards,

Offline

#8 2008-02-29 18:28:51

sylpri
Member
Company: NAVILLE SA
Registered: 2008-01-25
Posts: 37
Website

Re: email validation in a tmap components with perl regular expression

Hi,

here is my java code for the same utility : It works great with common error cases, except with an input string as "test@test.c", it returns true !

I'm not fluent with Regexp but this pattern seems to be a standard for email checking. It works properly on the RegExp editor in Talend, but not anymore once compiled in a job.


Code:

import java.util.regex.Pattern;

public class EmailChecking {
    /**
     * {talendTypes} String
     * 
     * {Category} EmailChecking
     * 
     * {param} string("destinataire@company.com") input: The email adress to be checked
     * 
     * {example} isEmailValid("bidule@ machin.com ") # false || isEmailValid("truc@machin.com ") # true
     */
    public static boolean isEmailValid(String email) {
                           
        boolean b = Pattern.matches("\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", email); 
     
            return b;

       }
}

If someone has an idea ..

Offline

  • Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » email validation in a tmap components with perl regular expression

Board footer

Powered by FluxBB