The second milestone version of Talend Open Profiler is out. Try it now!
Among the new features, support for Microsoft SQL Server has been added.

Open Flash Chart is a Flash library for displaying charts, which I plan to use in GWT.
Here is an example of the line chart in v1:
[Embedded Open Flash Chart example: line chart (v1)]
Go to their site to see other examples (bar, pie, etc.).
Version 2.1 of GChart, the client-side chart library I've written about before, is out.
As a reminder:
The main idea behind GChart is simple: You can make very nice charts efficiently out of a reasonably small number of 1-cell Grids (for the aligned labels) and (empty) Images (for everything else), styled and positioned appropriately on an AbsolutePanel. Not surprisingly, bar charts don’t suffer at all under the limitations imposed by this strategy–but (as long as you don’t mind using dotted connecting lines or banded-filled pie slices) line and pie charts also do remarkably well.
An example is worth more than a long explanation:

The first milestone release of the next version of Talend Open Profiler is out!
About the new features:
You are welcome to suggest new features or report bugs in Talend’s bugtracker.

I found two videos about Talend Open Profiler 1.0.0 on a French website dedicated to Business Intelligence.
The first video shows the installation of TOP on a Windows system and presents the layout of the application.
The second video is more interesting because it shows what TOP can do. The demo shows how to create your own analyses and what you can learn about the quality of your data in a few clicks. It uses pattern indicators to check the validity of email addresses, phone numbers, and so on.
This video also lets you judge TOP's speed: in this example, profiling around 7000 rows with all indicators selected and a few patterns defined takes less than 2 seconds.
If you want to try it yourself, go to the Talend download page. You can even try the latest milestone release, 1.1.0M1.

If you need to specify the JVM used by Talend Open Profiler, simply edit the TalendOpenProfiler-XXX.ini file corresponding to your system and add the following two lines at the beginning of the file:
-vm
C:/usr/bin/java.exe
Be sure to put them on two separate lines, not on a single line; otherwise it will not work.
The same configuration setting applies to Talend Open Studio.
Source: Eclipse FAQ.

This article shows an easy way to validate form fields with regexps on the client side using GwtExt.

The chosen approach is to implement com.gwtext.client.widgets.form.Validator.
The result is the following class, PatternValidator:
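// Validates a field value against a regular expression and reports
// the given message when the value does not match.
private static class PatternValidator implements Validator {
    protected final String message;
    protected final String pattern;
    public PatternValidator(String message, String pattern) {
        this.pattern = pattern;
        this.message = message;
    }
    public boolean validate(String value) throws ValidationException {
        // String.matches() requires the whole value to match the pattern;
        // a null value is treated as invalid
        boolean matches = value != null && value.matches(pattern);
        if (!matches) {
            throw new ValidationException(message);
        }
        return matches;
    }
}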
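This class can be used with specific patterns, such as an email pattern and a name pattern (MAIL_PATTERN and NAME_PATTERN in the complete code below).
Note: the second pattern matches French names, which may contain accented letters, for example “Clémence”.
Here is the complete code of the utility class MyValidators: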
import com.gwtext.client.widgets.form.ValidationException;
import com.gwtext.client.widgets.form.Validator;
// Utility class holding reusable regexp-based field validators.
public class MyValidators {
    // Lower-case email address with a 2- or 3-letter top-level domain
    public static final String MAIL_PATTERN = "^[a-z0-9._-]+@[a-z0-9.-]{2,}[.][a-z]{2,3}$";
    public static final Validator EMAIL_VALIDATOR = new PatternValidator("The field must be a valid email", MAIL_PATTERN);
    // Letters (including accented letters à to ÿ), spaces, underscores and hyphens;
    // an empty value also matches because of the * quantifier
    public static final String NAME_PATTERN = "^[a-zA-Zà-ÿ _\\-]*$";
    public static final Validator NAME_VALIDATOR = new PatternValidator("The field must be a valid name", NAME_PATTERN);
    // Throws a ValidationException carrying the message when the value does not match
    private static class PatternValidator implements Validator {
        protected final String message;
        protected final String pattern;
        public PatternValidator(String message, String pattern) {
            this.pattern = pattern;
            this.message = message;
        }
        public boolean validate(String value) throws ValidationException {
            // whole-value match; null is treated as invalid
            boolean matches = value != null && value.matches(pattern);
            if (!matches) {
                throw new ValidationException(message);
            }
            return matches;
        }
    }
}
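Because GWT compiles this code to JavaScript, the patterns are ultimately run by the browser's RegExp engine. As a quick sanity check on the JVM, here is a minimal sketch that runs the same two patterns through java.util.regex. PatternCheck and isValid are hypothetical names, not part of GwtExt, and the JVM engine only approximates what the browser will do.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks the MyValidators patterns on the JVM.
// java.util.regex is used here, not the browser's RegExp engine that
// GWT-compiled code relies on, so results only approximate client behaviour.
public class PatternCheck {
    // Copied from MyValidators; the accented-letter range is written with escapes
    static final String MAIL_PATTERN = "^[a-z0-9._-]+@[a-z0-9.-]{2,}[.][a-z]{2,3}$";
    static final String NAME_PATTERN = "^[a-zA-Z\u00e0-\u00ff _\\-]*$";

    // Whole-string match, like String.matches() in PatternValidator;
    // null is treated as invalid
    static boolean isValid(String pattern, String value) {
        return value != null && Pattern.matches(pattern, value);
    }

    public static void main(String[] args) {
        String[] emails = {"john.doe@example.com", "John.Doe@example.com", "user@example.info"};
        for (String email : emails) {
            System.out.println(email + " -> " + isValid(MAIL_PATTERN, email));
        }
        String[] names = {"Cl\u00e9mence", "Jean-Pierre", "R2D2"};
        for (String name : names) {
            System.out.println(name + " -> " + isValid(NAME_PATTERN, name));
        }
    }
}
```

On the JVM, john.doe@example.com is accepted, while John.Doe@example.com (upper-case letters) and user@example.info (four-letter top-level domain) are rejected by MAIL_PATTERN; Clémence and Jean-Pierre pass NAME_PATTERN, R2D2 does not.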
I need to load a huge amount of data into memory using a Perl hash. The value for each key is an array of scalars. Most of the time the key is a single field of the array, but it can be made of several fields. The number of fields may vary a lot, but it is usually around 5 scalar values.
My goal is to load as many keys as possible within a limited amount of memory. At startup, the Perl interpreter takes only 5 MB, according to the virtual size reported by Sys::Statistics::Linux::Processes (for Perl scripts, the virtual size is the same as the real size).
The data I load look like this (500k lines, 5 columns):
1,1948-10-26,1951-12-08,8TBhXzkOvc,l0YO0ghDND
2,1920-04-16,1959-06-10,eyCFd4IjRo,41YTEnB7Qh
3,1978-12-28,2005-06-23,9LeBBiR2sw,qk30zZdftW
4,2004-01-05,1997-03-25,66K6gvdd5D,bmL3LpuLKT
5,2019-05-11,1995-08-27,rGRtJHioa7,qF7bhwfGeE
Here is my basic script. The hash key is made of only one column, the first field of the line. In the following examples, only the code between mark 1 and mark 2 changes.
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use Sys::Statistics::Linux::Processes;
# process statistics, used to read this script's virtual size
my $lxs = Sys::Statistics::Linux::Processes->new;
$lxs->init;
my $start = [gettimeofday];
my %cache = ();
# three-argument open, reporting the system error on failure
open(my $ifh, '<', $ARGV[0])
    or die "cannot open input file $ARGV[0]: $!";
while (<$ifh>) {
    chomp;
    my @fields = split ',', $_;
    # mark 1
    # key: first field; value: reference to the array of all fields
    $cache{$fields[0]} = \@fields;
    # mark 2
}
close($ifh);
my $stop = [gettimeofday];
my $stat = $lxs->get;
# print elapsed time and the process virtual size (vsize) in MB
printf(
    "time: %.1f seconds, memory : %uM\n",
    tv_interval($start, $stop),
    $stat->{$$}{vsize} / (1024 * 1024)
);
We can use less memory by storing a single string, the fields joined with $; (Perl's subscript separator, "\034" by default), instead of an array reference:
# mark 1
$cache{$fields[0]} = join $;, @fields;
# mark 2
plegall@miro:~/bench/hash$ perl load-01.pl in_5c_500kl.csv; perl load-02.pl in_5c_500kl.csv
time: 2.1 seconds, memory : 173M
time: 2.3 seconds, memory : 70M
This is of course a major improvement for me: about 2.5 times less memory (70M instead of 173M) for very little extra time (2.3 instead of 2.1 seconds). Let's check that memory usage is linear, using a data file with only half of the lines:
plegall@miro:~/bench/hash$ head -250000 in_5c_500kl.csv > in_5c_250kl.csv
plegall@miro:~/bench/hash$ perl load-01.pl in_5c_250kl.csv; perl load-02.pl in_5c_250kl.csv
time: 1.0 seconds, memory : 89M
time: 1.1 seconds, memory : 37M
This confirms that memory usage is linear: half the data, roughly half the memory. On a 32-bit Linux system, a single process can use up to about 3 GB of memory. Given this limit, a hash could hold nearly 22 million records at once (3072 MB / 70 MB × 500,000 ≈ 21.9 million).
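The extrapolation above is plain proportionality. Here is a minimal Java sketch of the same back-of-the-envelope computation, assuming, as the text does, that memory grows linearly with the number of records, and taking 3 GB as 3072 MB. HashCapacity and estimateRecords are hypothetical names.

```java
// Hypothetical helper mirroring the linear extrapolation in the text:
// records that fit = memoryLimit / measuredMemory * measuredRecords
public class HashCapacity {
    static long estimateRecords(double limitMb, double measuredMb, long measuredRecords) {
        // Assumes memory grows linearly with the number of records
        return (long) (limitMb / measuredMb * measuredRecords);
    }

    public static void main(String[] args) {
        double limitMb = 3 * 1024; // 3 GB expressed in MB
        // Measurements for 500,000 lines: 173 MB with array refs, 70 MB with joined strings
        System.out.println("array refs:     " + estimateRecords(limitMb, 173, 500_000));
        System.out.println("joined strings: " + estimateRecords(limitMb, 70, 500_000));
    }
}
```

With the measured figures this gives roughly 8.9 million records with array references and roughly 21.9 million with joined strings.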
In Talend Open Studio, the tMap component has one main input link and several lookup input links. Each lookup link corresponds to a data join, and the joined data are stored in memory in hashes. So I have opened a feature request to apply this improvement.
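To illustrate the idea outside Perl, here is a minimal Java sketch of a lookup cache that stores each row as one joined String rather than a String[]. It is hypothetical (JoinedRowCache is not Talend's generated tMap code), it reuses Perl's default $; separator "\034", and it assumes no field ever contains that character. The memory saved on the JVM is not measured here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical lookup cache: one joined String per row instead of a String[],
// transposing the Perl "join $;, @fields" trick to Java.
public class JoinedRowCache {
    // Same character as Perl's default $; separator ("\034", U+001C)
    private static final String SEP = "\034";
    private static final Pattern SEP_PATTERN = Pattern.compile(Pattern.quote(SEP));

    private final Map<String, String> cache = new HashMap<>();

    // Store the row under the value of its key column, joined into a single String
    public void put(String[] fields, int keyIndex) {
        cache.put(fields[keyIndex], String.join(SEP, fields));
    }

    // Rebuild the fields on lookup; returns null for an unknown key
    public String[] get(String key) {
        String joined = cache.get(key);
        // limit -1 keeps trailing empty fields
        return joined == null ? null : SEP_PATTERN.split(joined, -1);
    }

    public static void main(String[] args) {
        JoinedRowCache c = new JoinedRowCache();
        c.put("1,1948-10-26,1951-12-08,8TBhXzkOvc,l0YO0ghDND".split(",", -1), 0);
        c.put("2,1920-04-16,1959-06-10,eyCFd4IjRo,41YTEnB7Qh".split(",", -1), 0);
        String[] row = c.get("2");
        System.out.println(row.length + " fields, last = " + row[4]);
    }
}
```

Running main prints "5 fields, last = 41YTEnB7Qh". As in the Perl version, the trade-off is a split on every lookup in exchange for one object per row instead of an array of objects.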
Some new features have been added. Here is a list:
Some bugs have been fixed. Among them, the most important are:
Go to the download page, and also check the “Getting started guide”: it has a new section with a short introduction to using patterns.

Since GWT client-side Java code is compiled into JavaScript, the regexps used in field validators must be JavaScript-compatible.
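The following minimal Java sketch flags a few java.util.regex constructs that JavaScript's RegExp does not support: atomic groups, possessive quantifiers, the \A, \Z and \z anchors, and the (?i) inline flag. Some of these are syntax errors in JavaScript, while others, like \A, are silently read as a literal letter. JsRegexLint is a hypothetical name, not a GWT API, and the check is a plain substring heuristic: it can raise false positives and does not catch every difference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heuristic: flags some Java-only regex constructs by substring search.
// Not a parser, so it may report false positives (for example inside character
// classes or after escapes) and misses other incompatibilities.
public class JsRegexLint {
    // Java-only construct and a short explanation for each
    private static final String[][] JAVA_ONLY = {
        {"(?>", "atomic group"},
        {"*+", "possessive quantifier"},
        {"++", "possessive quantifier"},
        {"?+", "possessive quantifier"},
        {"}+", "possessive quantifier"},
        {"\\A", "\\A anchor (use ^)"},
        {"\\Z", "\\Z anchor (use $)"},
        {"\\z", "\\z anchor (use $)"},
        {"(?i)", "inline flag group"},
    };

    // Returns one warning per Java-only construct found in the pattern source
    public static List<String> check(String regex) {
        List<String> warnings = new ArrayList<>();
        for (String[] rule : JAVA_ONLY) {
            if (regex.contains(rule[0])) {
                warnings.add(rule[1] + ": \"" + rule[0] + "\"");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        String[] samples = {
            "^[a-z0-9._-]+@[a-z0-9.-]{2,}[.][a-z]{2,3}$", // MAIL_PATTERN
            "\\A[a-z]++\\z"                                // Java-only constructs
        };
        for (String s : samples) {
            System.out.println(s + " -> " + check(s));
        }
    }
}
```

It prints an empty list for MAIL_PATTERN and three warnings for the second sample; the patterns still need to be tested in the browser.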
Here are some links that help when designing JavaScript regexps: