
I'm using tExtractDelimitedFields to split a file name on the field separator "_".
For example, I have a file named a_b_c_d_e.jpg.
If I split it on "_", I get five strings: a, b, c, d, e.jpg.
My question is:
Is there a way to split it on "_" into four strings (a, b, c, d_e.jpg) instead of five, still using tExtractDelimitedFields?
Furthermore, the a_b_c part of the file name is fixed and needs to be separated, but the part after it can vary in length. The name could be a_b_c_d_e_f.jpg or even longer. I just want a, b, c, and everything after that treated as one single string. Is that possible too?
I appreciate the help!

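Hi
Here is a workaround.
Add a tJavaRow between the input component and tExtractDelimitedFields.
Set the field separator of tExtractDelimitedFields to ";".
Enter the following in the tJavaRow:
String temp = input_row.basename;
// Replace only the first three "_" with ";" so the rest of the name stays intact
for (int i = 0; i < 3; i++) {
    temp = temp.replaceFirst("_", ";");
}
output_row.basename = temp;

Regards,
Pedro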
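
As a possible alternative to the replaceFirst loop, Java's String.split accepts a limit argument, so the whole split could be done inside the tJavaRow. The sketch below is standalone Java for illustration only; the class and method names are made up here, and in a real job you would assign the array elements to your own output_row columns.

```java
import java.util.Arrays;

class SplitWithLimit {
    // A limit of 4 means the separator is applied at most 3 times,
    // so the 4th element keeps any remaining underscores.
    static String[] splitFirstThree(String fileName) {
        return fileName.split("_", 4);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFirstThree("a_b_c_d_e.jpg")));
        System.out.println(Arrays.toString(splitFirstThree("a_b_c_d_e_f.jpg")));
    }
}
```

Note that a name with fewer than three underscores yields fewer than four elements, so the array length should be checked before indexing into it.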
Can you use 2 delimiters? Use "-" for the parts of the name you will parse out and "_" for the remainder (a-b-c_d_e_f.jpg)?
If you like regular expressions, you can use this block of code. Put the matcher and matches lines in a tJavaRow, and map each part to a separate schema field.
String s1 = "a_b_c_d_e_f.jpg";
java.util.regex.Matcher m =
    java.util.regex.Pattern.compile("([\\p{Alnum}]+)_([\\p{Alnum}]+)_([\\p{Alnum}]+)_(.*)")
        .matcher(s1);
m.matches(); // must be called before group(); returns false if the name does not fit the pattern
System.out.println("Part 1=" + m.group(1)); // ex. output_row.majorNumber
System.out.println("Part 2=" + m.group(2)); // ex. output_row.minorNumber
System.out.println("Part 3=" + m.group(3)); // ex. output_row.brachNumber
System.out.println("The Rest=" + m.group(4)); // ex. output_row.fileName
Last edited by walkerca (2012-03-21 17:19:40)
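
One thing worth guarding against: if matches() returns false, calling group() throws an IllegalStateException. A minimal sketch of a guarded version follows, written as standalone Java; the class name, the parse helper, and the choice to return null for non-matching names are assumptions made for illustration, not part of the snippet above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNameParts {
    // Same expression as in the post, compiled once and reused.
    private static final Pattern NAME =
        Pattern.compile("([\\p{Alnum}]+)_([\\p{Alnum}]+)_([\\p{Alnum}]+)_(.*)");

    // Returns the four parts, or null when the name does not have the expected shape.
    static String[] parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("a_b_c_d_e_f.jpg");
        System.out.println(parts == null ? "no match" : String.join(" | ", parts));
    }
}
```

In a tJavaRow, the null case could be handled by writing empty values or routing the row to a reject flow, depending on how the job is set up.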

The file names are chosen by the users, so it's not easy for me to force them to use two delimiters. But I tried your code in a tJavaRow and it works great! I added a little so that I can extract the first three parts from the original name and rename the last part. The code also works for variable file names, where the number of "_" can grow. That's exactly what I want: separate the first 3 parts from the rest.
I really appreciate your help!!!
Last edited by elvalin1559 (2012-03-21 23:43:06)

I have a problem with group(2). It uses Alnum right now, but in the string a_b_c_d_e_f.jpg the b part may contain a hyphen "-", which belongs to the \p{Punct} class. I wonder if there's a way to handle that.
I still want the string in the b part kept exactly as it is.
For example,
after a_b1-b2_c_d_e_f.jpg is split on the underscore "_", I want a, b1-b2, c, d, e, f.jpg. The hyphen needs to stay in the string. I tried a few things but they didn't work. Could anyone take a look and give some suggestions?
Thanks!!!
Add a dash to the list of acceptable characters in the second group, right after the second "Alnum". A dash placed last inside the brackets is taken literally.
m = java.util.regex.Pattern.compile("([\\p{Alnum}]+)_([\\p{Alnum}-]+)_([\\p{Alnum}]+)_(.*)")
    .matcher(s1);
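
To see the difference the dash makes, here is a small standalone comparison of the original and the adjusted expression on the a_b1-b2_c_d_e_f.jpg example. The class and method names are invented for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenInSecondPart {
    // Original expression: the second group accepts letters and digits only.
    static final Pattern ALNUM_ONLY =
        Pattern.compile("([\\p{Alnum}]+)_([\\p{Alnum}]+)_([\\p{Alnum}]+)_(.*)");
    // Adjusted expression: the second group also accepts a literal dash.
    static final Pattern WITH_DASH =
        Pattern.compile("([\\p{Alnum}]+)_([\\p{Alnum}-]+)_([\\p{Alnum}]+)_(.*)");

    // Returns the second part of the name, or null if the pattern does not match.
    static String secondPart(Pattern p, String fileName) {
        Matcher m = p.matcher(fileName);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String name = "a_b1-b2_c_d_e_f.jpg";
        System.out.println("Alnum only: " + secondPart(ALNUM_ONLY, name));
        System.out.println("With dash:  " + secondPart(WITH_DASH, name));
    }
}
```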

Works great! Hope there won't be any other special character in the string.
Thanks a lot!!!
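
If other special characters do turn up later, one option is to stop listing the allowed characters and instead accept anything except the separator in the first three groups. This is only a sketch, assuming the first three parts never contain an underscore themselves; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnyCharExceptUnderscore {
    // [^_]+ matches one or more characters that are not an underscore,
    // so hyphens, dots and other punctuation are accepted in parts 1 to 3.
    private static final Pattern NAME =
        Pattern.compile("([^_]+)_([^_]+)_([^_]+)_(.*)");

    // Returns the four parts, or null when there are fewer than three underscores
    // or one of the first three parts is empty.
    static String[] parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("a_b1-b2_c_d_e_f.jpg");
        System.out.println(parts == null ? "no match" : String.join(" | ", parts));
    }
}
```

The trade-off is that this version no longer rejects names with unexpected characters, which may or may not be what the job needs.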