• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » how to get a Domain Name from Multiple SubDomains?

#1 2012-07-19 09:29:12

ilyasiqbal
Member
Registered: 2012-06-04
Posts: 88

how to get a Domain Name from Multiple SubDomains?

Hi,

How can I get the domain name from a string that contains multiple sub domains?
I'm using the tExtractRegexFields component for string filtering.

Below is my Regular Expression to filter the URL string:

"^(https?|ftp|file)://([-a-zA-Z]*).([-a-zA-Z0-9+&@#%?=~_|!:,.;]*)/(^$|[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*)"

It works fine but cannot handle URLs that have several sub domains, e.g. stk.se.search.yahoo.com/page?q=abc
I also need to know for certain which column of tExtractRegexFields will contain "Yahoo.com" for the example above.


Any solution?
Thanks!

Last edited by ilyasiqbal (2012-07-19 09:30:40)

Offline

#2 2012-07-19 10:10:21

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: how to get a Domain Name from Multiple SubDomains?

Hi

I created a job.

Code:

"(\\w*://)?((\\w*\\-)*\\w*\\.(com)).*$"
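To see which capture group ends up holding the domain, the pattern can be tried in plain Java outside the job. This is only a sketch: the class and method names are made up for illustration, it uses Matcher.find() (an assumption about how the component scans the input), and it assumes tExtractRegexFields fills its output columns in capture group order. Under those assumptions group 2, i.e. the second column, would hold yahoo.com for the example URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Talend.
class ComDomainSketch {
    // The pattern from this post, written as a Java string literal.
    static final Pattern COM_DOMAIN =
            Pattern.compile("(\\w*://)?((\\w*\\-)*\\w*\\.(com)).*$");

    // Returns capture group 2 (name plus ".com"), or null if nothing matches.
    static String rootDomain(String url) {
        Matcher m = COM_DOMAIN.matcher(url);
        // find() scans forward to the first position where the pattern fits,
        // so leading sub domains are skipped rather than captured.
        return m.find() ? m.group(2) : null;
    }
}
```

Calling ComDomainSketch.rootDomain("stk.se.search.yahoo.com/page?q=abc") gives yahoo.com, while a URL without ".com" in it, such as search.google.se, gives null.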

Regards,
Pedro



Only Paranoid Survive.

Offline

#3 2012-07-19 11:03:29

ilyasiqbal
Member
Registered: 2012-06-04
Posts: 88

Re: how to get a Domain Name from Multiple SubDomains?

Hi,

Thanks for that, but I don't think it solves the problem. The goal is not to handle one specific domain name ending in .com; it has to be generic and handle all kinds of TLDs, e.g.
.com, .se, .us, etc.

and sub domains, e.g.
se.search.yahoo.com
search.google.se
se.ask.com
etc....


I also need to know for certain which column of tExtractRegexFields will contain "Yahoo.com", "Google.se" or "Ask.com" for the examples above.
When a string is split, the pieces go into different columns of tExtractRegexFields. The final result must always land in one specific column, because I will later insert that column into a specific table.

Offline

#4 2012-07-19 11:13:23

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: how to get a Domain Name from Multiple SubDomains?

Hi

You might use it like this.
"(\\w*://)?((\\w*\\-)*\\w*\\.(com|se|com.us|org)).*$"

Add as many domains as you can.
As far as I know, this is the only way to check the root domain.
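One thing to watch with a suffix list: nothing in the pattern above says where the suffix has to end, so for se.search.yahoo.com the "se" alternative can match the first two letters of "search" and the result comes out as se.se. Likewise "com" is tried before "com.us". A possible variation (not the pattern posted here, just a sketch under my own assumptions) sorts the suffixes longest first, escapes their dots, and requires the suffix to be followed by "/", ":", "?", "#" or the end of the string. The class name, method names and the suffix list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Talend.
class SuffixListSketch {
    // Example suffix list; extend it with the endings that occur in the data.
    static final List<String> SUFFIXES = Arrays.asList("com", "se", "com.us", "org");

    static final Pattern ROOT_DOMAIN = buildPattern(SUFFIXES);

    static Pattern buildPattern(List<String> suffixes) {
        List<String> sorted = new ArrayList<>(suffixes);
        // Longest first, so "com.us" is tried before "com".
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternation = new StringBuilder();
        for (String s : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(s)); // treat "." literally
        }
        // Group 1: root domain, group 2: suffix. The lookahead demands that
        // the suffix is followed by a path, port, query, fragment or the end.
        return Pattern.compile(
                "((?:\\w+-)*\\w+\\.(" + alternation + "))(?=[/:?#]|$)");
    }

    // Returns the root domain (group 1), or null if no listed suffix is found.
    static String rootDomain(String url) {
        Matcher m = ROOT_DOMAIN.matcher(url);
        return m.find() ? m.group(1) : null;
    }
}
```

With this layout the root domain is always in group 1, so it would map to the first output column if columns follow group order. The list still has to name every suffix that should be recognised.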

Regards,
Pedro


Only Paranoid Survive.

Offline

#5 2012-07-23 14:00:37

ilyasiqbal
Member
Registered: 2012-06-04
Posts: 88

Re: how to get a Domain Name from Multiple SubDomains?

Hi

Thanks for your guidance.

Is it also possible to fetch the query string by extending this regex?

"(\\w*://)?((\\w*\\-)*\\w*\\.(com|se|com.us|org)).*$"

e.g. to handle: se.search.yahoo.com/lp.php?a=22&b=44

so that I can get it like:
yahoo.com|a=22&b=44

Offline

#6 2012-07-24 04:32:06

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: how to get a Domain Name from Multiple SubDomains?

Hi

I don't think extending the regex above is a good approach.
Why don't you create a new regex for query string?
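As a rough illustration of the two-regex idea, the sketch below uses one pattern for the root domain and a second, independent one for the query string, then joins the two results with "|". Everything here is an assumption for demonstration: the class and method names are invented, the domain pattern is a hardened variant with a short hard-coded suffix list, and the query pattern simply takes whatever follows the first "?" up to an optional "#".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Talend.
class QueryStringSketch {
    // Domain pattern with an example suffix list (longest suffix first).
    static final Pattern ROOT_DOMAIN = Pattern.compile(
            "((?:\\w+-)*\\w+\\.(?:com\\.us|com|org|se))(?=[/:?#]|$)");

    // Everything after the first "?" up to an optional "#fragment".
    static final Pattern QUERY = Pattern.compile("\\?([^#]*)");

    // Returns group 1 of the first match, or an empty string if none.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : "";
    }

    // Joins both parts with "|", the layout asked for in the previous post.
    static String domainAndQuery(String url) {
        return firstGroup(ROOT_DOMAIN, url) + "|" + firstGroup(QUERY, url);
    }
}
```

For se.search.yahoo.com/lp.php?a=22&b=44 this returns yahoo.com|a=22&b=44. Keeping the patterns separate means the suffix list can grow without touching the query string part.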

Regards,
Pedro


Only Paranoid Survive.

Offline

#7 2012-07-24 10:55:04

ilyasiqbal
Member
Registered: 2012-06-04
Posts: 88

Re: how to get a Domain Name from Multiple SubDomains?

Hi

Actually, the regex above gives me a domain name. Based on that, I run an SQL query to fetch a parameter, and that parameter is then used to get a certain value from the query string.
If I do it with another regex, I'll have to run the same query again, which may affect the speed.

Is there a way to set a value in a global variable and access it anywhere in the job?

Offline

#8 2012-07-24 11:00:54

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: how to get a Domain Name from Multiple SubDomains?

Hi

A global variable? How about a context variable?
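For what it's worth, a Talend job also carries a job-wide map named globalMap (a java.util.Map<String, Object>) that Java expressions in components can write to and read from, while context variables are defined on the job's Contexts tab and read as context.variableName. The sketch below only imitates the globalMap idea with a plain HashMap so it can run outside the Studio. The key, table and column names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone imitation; inside a job the map already exists.
class GlobalMapSketch {
    // Stand-in for the job-wide map that Talend exposes as globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Hypothetical key name; any unique string works.
    static final String KEY = "rootDomain";

    // What an expression could do right after the regex step.
    static void storeDomain(String domain) {
        globalMap.put(KEY, domain);
    }

    // What a later component could do, e.g. when building the SQL text.
    static String buildQuery() {
        String domain = (String) globalMap.get(KEY);
        // Hypothetical table and column names. Concatenation is used only
        // to keep the sketch short; bound parameters are safer.
        return "SELECT param FROM domain_params WHERE domain = '" + domain + "'";
    }
}
```

After storeDomain("yahoo.com"), buildQuery() returns the SELECT text with yahoo.com filled in, so the domain is computed once and reused by later steps.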

Regards,
Pedro


Only Paranoid Survive.

Offline

#9 2012-07-24 11:54:14

ilyasiqbal
Member
Registered: 2012-06-04
Posts: 88

Re: how to get a Domain Name from Multiple SubDomains?

How can I make a context variable? Please guide me.

Offline

#10 2012-07-24 12:34:55

ilyasiqbal
Member
Registered: 2012-06-04
Posts: 88

Re: how to get a Domain Name from Multiple SubDomains?

Most of the available material on the web regarding context variables is about tFixedFileInput etc., whereas I'm using tMap and tMySQLInput.

Offline
