August 20, 2008

Sebastiao CORREIA

TOP 1.1.0 milestone 2


The second milestone version of Talend Open Profiler is out. Try it now!

New features include support for Microsoft SQL Server.

by scorreia at August 20, 2008 10:21 PM

Stephane MALLET

Open Flash Chart

Open Flash Chart is a Flash library for displaying charts that I plan to use in GWT.

Here is an example of the line chart in v1:




[Embedded Flash line chart, 450x250 pixels]

Go to their site to see other examples (bar, pie, etc.)

Links

by stef at August 20, 2008 08:42 PM

August 19, 2008

Stephane MALLET

GChart 2.1

Version 2.1 of GChart, the client-side chart library I have written about before, is out.

As a reminder:

The main idea behind GChart is simple: You can make very nice charts efficiently out of a reasonably small number of 1-cell Grids (for the aligned labels) and (empty) Images (for everything else), styled and positioned appropriately on an AbsolutePanel. Not surprisingly, bar charts don’t suffer at all under the limitations imposed by this strategy–but (as long as you don’t mind using dotted connecting lines or banded-filled pie slices) line and pie charts also do remarkably well.

An example is better than a long explanation:

Links

by stef at August 19, 2008 07:37 PM

August 06, 2008

Sebastiao CORREIA

TOP 1.1.0 milestone 1


The first milestone release of the next version of Talend Open Profiler is out!!

About the new features:

  • A “Result” tab has been added to the analysis editor in which result values are in tables.
  • In the indicator selector pop-up, a full row can be checked with one click.
  • You can start doing data quality monitoring by setting thresholds on indicators: when a threshold is not respected, the result is highlighted in red in the Result tab of the analysis editor.
  • A new kind of analysis is provided: the connection analysis. Beware that the filter does not work yet, which means all tables are scanned. Don’t use it on big databases yet.
  • Regular Patterns can be imported from an Excel file.
  • A new type of indicator has been created for SQL patterns. It lets you create your own patterns to use in a “LIKE” clause.
  • A menu “Column analysis” has been added on Table elements to profile all columns of one or several tables with a few clicks.
  • A new view outputs some details on the selected objects.
  • You can now see what objects are analyzed without having to open the analysis editor.

You are welcome to suggest new features or report bugs in Talend’s bugtracker.

by scorreia at August 06, 2008 05:08 PM

Talend Open Profiler video


I found this video on Talend Open Profiler 1.0.0 on a French website dedicated to Business Intelligence.

The first video shows the installation of TOP on a Windows system and presents the layout of the application.

The second video is more interesting because it shows what TOP can do. The demo shows how to create your own analyses and what you can learn about the quality of your data with a few clicks. It uses pattern indicators to check the validity of email addresses, phone numbers…

This video also lets you judge how fast TOP is. In this example, profiling around 7000 rows with all indicators selected and a few patterns defined takes less than 2 seconds.

If you want to try it yourself, go to the Talend download page. You can even try the latest milestone release, 1.1.0M1.

by scorreia at August 06, 2008 05:07 PM

August 01, 2008

Sebastiao CORREIA

How to launch TOP with a specified JVM?


If you need to specify the path of the JVM used by Talend Open Profiler, simply edit the TalendOpenProfiler-XXX.ini file corresponding to your system and add the following 2 lines at the beginning of the file:
-vm
C:/usr/bin/java.exe

Be sure to write them on 2 separate lines, not on a single line; otherwise it will not work.

The same configuration setting applies to Talend Open Studio.

Source: Eclipse FAQ.

by scorreia at August 01, 2008 06:36 PM

July 22, 2008

Stephane MALLET

Form fields validators and GwtExt

This article shows an easy way to do client-side regexp field validation using GwtExt.

The approach is to implement com.gwtext.client.widgets.form.Validator.

The result is the following class ‘PatternValidator’:

private static class PatternValidator implements Validator {

    protected String message;

    protected String pattern;

    public PatternValidator(String message, String pattern) {
        super();
        this.pattern = pattern;
        this.message = message;
    }

    public boolean validate(String value) throws ValidationException {
        boolean matches = value.matches(pattern);
        if (!matches) {
            throw new ValidationException(message);
        }
        return matches;
    }
}

This class can be used with specific patterns such as:

  • Mail pattern: ^[a-z0-9._-]+@[a-z0-9.-]{2,}[.][a-z]{2,3}$
  • First and last names pattern: ^[a-zA-Zà-ÿ _\-]*$ (the backslash is doubled, \\-, in a Java string literal)

Remark: the second pattern matches French names, which can contain accented characters, for example “Clémence”.

Here is the complete code of the utility class ‘MyValidators’:

import com.gwtext.client.widgets.form.ValidationException;
import com.gwtext.client.widgets.form.Validator;

public class MyValidators {

    public static final String MAIL_PATTERN = "^[a-z0-9._-]+@[a-z0-9.-]{2,}[.][a-z]{2,3}$";

    public static final Validator EMAIL_VALIDATOR = new PatternValidator("The field must be a valid email", MAIL_PATTERN);

    public static final String NAME_PATTERN = "^[a-zA-Zà-ÿ _\\-]*$";

    public static final Validator NAME_VALIDATOR = new PatternValidator("The field must be a valid name", NAME_PATTERN);

    private static class PatternValidator implements Validator {

        protected String message;

        protected String pattern;

        public PatternValidator(String message, String pattern) {
            super();
            this.pattern = pattern;
            this.message = message;
        }

        public boolean validate(String value) throws ValidationException {
            boolean matches = value.matches(pattern);
            if (!matches) {
                throw new ValidationException(message);
            }
            return matches;
        }
    }
}
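
The two patterns can also be tried outside GWT with a few lines of plain Java. This is only a sketch under some assumptions: the class name PatternCheck and its matches helper are made up for illustration, the accented range is written with Unicode escapes so that the source file encoding does not matter, and the code runs java.util.regex on a JVM, whereas under GWT the same call is executed by the browser's JavaScript regexp engine, so results may differ for some inputs.

```java
// Plain-Java check of the two patterns; neither GWT nor GwtExt is needed.
class PatternCheck {

    // Same regexps as MAIL_PATTERN and NAME_PATTERN in MyValidators.
    // The accented range (U+00E0 to U+00FF) is written with Unicode escapes.
    static final String MAIL_PATTERN = "^[a-z0-9._-]+@[a-z0-9.-]{2,}[.][a-z]{2,3}$";
    static final String NAME_PATTERN = "^[a-zA-Z\u00e0-\u00ff _\\-]*$";

    // Hypothetical helper: a null-safe wrapper around String.matches.
    static boolean matches(String value, String pattern) {
        return value != null && value.matches(pattern);
    }

    public static void main(String[] args) {
        String[] mails = { "clemence@example.com", "Clemence@example.com", "user@example.info" };
        for (String mail : mails) {
            System.out.println(mail + " -> " + matches(mail, MAIL_PATTERN));
        }

        // "Cl\u00e9mence" has a lowercase accented letter, "\u00c9lodie" an uppercase one.
        String[] names = { "Cl\u00e9mence", "Jean-Pierre Dupont", "\u00c9lodie", "R2D2" };
        for (String name : names) {
            System.out.println(name + " -> " + matches(name, NAME_PATTERN));
        }
    }
}
```

Running it shows the limits of these patterns: the mail pattern rejects uppercase letters and top-level domains longer than 3 letters, and the name pattern rejects uppercase accented letters such as “É”, which lies outside the à-ÿ range.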

by stef at July 22, 2008 09:08 PM

July 15, 2008

Pierrick LE GALL

When memory matters

I need to load a huge amount of data in memory in a Perl hash. The value corresponding to each key is an array of scalar data. Most of the time the key is built from a single field of the array, but it can be made of several fields. The number of fields in the array may vary a lot, but most of the time it will be around 5 scalar values.

My goal is to load as many keys as possible within a limited amount of memory. The Perl interpreter takes only 5 MB at the beginning of the process, as Sys::Statistics::Linux::Processes tells me through the virtual size (which is the same as the real size for Perl scripts).

My data to load looks like this (500k lines, 5 columns):

1,1948-10-26,1951-12-08,8TBhXzkOvc,l0YO0ghDND
2,1920-04-16,1959-06-10,eyCFd4IjRo,41YTEnB7Qh
3,1978-12-28,2005-06-23,9LeBBiR2sw,qk30zZdftW
4,2004-01-05,1997-03-25,66K6gvdd5D,bmL3LpuLKT
5,2019-05-11,1995-08-27,rGRtJHioa7,qF7bhwfGeE

Here comes my basic script. The hash key is made of only one column, the first field in the line. In the next examples, only code between mark 1 and mark 2 will change.

#!/usr/bin/perl

use strict;
use warnings;

use Time::HiRes qw(gettimeofday tv_interval);
use Sys::Statistics::Linux::Processes;

my $lxs = Sys::Statistics::Linux::Processes->new;
$lxs->init;

my $start = [gettimeofday];
my %cache = ();

open(my $ifh, '<', $ARGV[0])
    or die "cannot open input file: $!";

while (<$ifh>) {
    chomp;
    my @fields = split ',', $_;

    # mark 1
    $cache{$fields[0]} = \@fields;
    # mark 2
}

close($ifh);

my $stop = [gettimeofday];
my $stat = $lxs->get;

printf(
    "time: %.1f seconds, memory : %uM\n",
    tv_interval($start, $stop),
    $stat->{$$}{vsize} / (1024 * 1024)
);

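We can use less memory if we store a string instead of an array reference. The fields are joined with $; (Perl's subscript separator, "\034" by default), which is unlikely to appear in the data, so the array can be rebuilt later with a split on the same separator: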

# mark 1
$cache{$fields[0]} = join $;, @fields;
# mark 2

plegall@miro:~/bench/hash$ perl load-01.pl in_5c_500kl.csv; perl load-02.pl in_5c_500kl.csv
time: 2.1 seconds, memory : 173M
time: 2.3 seconds, memory : 70M

This is of course a major improvement for me: 2.5 times less memory used for very little extra time. Let's verify that the memory usage is linear by using a data file with only half of the lines:

plegall@miro:~/bench/hash$ head -250000 in_5c_500kl.csv  > in_5c_250kl.csv

plegall@miro:~/bench/hash$ perl load-01.pl in_5c_250kl.csv; perl load-02.pl in_5c_250kl.csv
time: 1.0 seconds, memory : 89M
time: 1.1 seconds, memory : 37M

This confirms that the memory usage is linear: half the data size, half the memory usage. On a 32-bit Unix-like operating system such as Linux, a single process can consume up to about 3 GB of memory. Given this limit, a hash could hold nearly 22 million records at once.
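
The 22 million figure is a simple linear extrapolation from the measurements above. Here is a small sketch of the arithmetic, assuming 1 MB = 1024 × 1024 bytes and a 3 GB limit, and ignoring the 5 MB the interpreter uses on its own. The class and method names are only for illustration.

```java
// Linear extrapolation of how many records fit under a memory limit.
class HashCapacityEstimate {

    static final long MB = 1024L * 1024L;

    // If `records` records used `usedBytes`, estimate how many fit in `limitBytes`.
    static long maxRecords(long limitBytes, long usedBytes, long records) {
        return limitBytes * records / usedBytes;
    }

    public static void main(String[] args) {
        long limit = 3L * 1024L * MB; // assumed 3 GB per-process limit
        long records = 500000L;       // lines in in_5c_500kl.csv

        // 70M measured with joined strings, 173M with array references.
        System.out.println("joined strings:   " + maxRecords(limit, 70L * MB, records));
        System.out.println("array references: " + maxRecords(limit, 173L * MB, records));
    }
}
```

With the same assumptions, the array-reference version would stop at fewer than 9 million records.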

In Talend Open Studio, the tMap has one main link and several lookup links as input. Each lookup link corresponds to a data join. The joined data are stored in memory in hashes, so I have opened a feature request to implement this improvement.

by Pierrick Le Gall at July 15, 2008 10:31 PM

July 05, 2008

Sebastiao CORREIA

TOP: New version


Some new features have been added. Here is a list:

  • A toolbar has been added with the buttons for running analyses, previewing graphics, saving files.
  • You can now drag & drop columns into the analysis editor.
  • Some predefined analyses are now available through a right-click on the columns.
  • The pattern editor now opens when you create a new pattern so that you can easily modify your patterns.
  • A button has been added for adding a pattern indicator to a column in the analysis editor.

Some bugs have been fixed. Among them, the most important are:

  • The frequency table now works
  • The cheat sheet now opens at startup
  • The number of elements is displayed correctly in the DQ repository view

Go to the download page. Also check the “Getting started guide”, which has a new section with a short introduction to the usage of patterns.

by scorreia at July 05, 2008 10:29 AM

June 30, 2008

Stephane MALLET

RegExp with javascript: online tester

Because GWT Java client code is compiled into JavaScript, regexps (in field validators, for example) must be JavaScript-compatible.
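
As an illustration, here is a small plain-Java sketch comparing a pattern that relies on a java.util.regex feature, the POSIX class \p{Alpha}, with an equivalent written using only a basic character class. The class and constant names are made up for this example. On a JVM both behave the same for ASCII input; as far as I know, JavaScript does not interpret \p{Alpha} this way, so only the second form should be relied on in GWT client code.

```java
// Two ways to say "ASCII letters only"; run on a JVM, not under GWT.
class RegexPortabilityDemo {

    // Java-specific syntax: POSIX character class from java.util.regex.
    static final String JAVA_ONLY = "^\\p{Alpha}+$";

    // Basic character class, understood by both Java and JavaScript.
    static final String PORTABLE = "^[a-zA-Z]+$";

    public static void main(String[] args) {
        String[] inputs = { "Talend", "Talend1" };
        for (String input : inputs) {
            System.out.println(input
                    + " java-only=" + input.matches(JAVA_ONLY)
                    + " portable=" + input.matches(PORTABLE));
        }
    }
}
```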

Here are some links that help when designing JavaScript regexps:

by stef at June 30, 2008 08:15 PM

Copyright © 2007 - 2008 Talend. All rights reserved. Talend Contributor Agreement