Friday, June 3, 2011

Tracking down the missing audio file

In this post, we'll attempt to debug a problem - most of the audio files to be played are found, but not all.

But first, a quick aside: if you're trying to center text in a text box using gravity="center" and it's not working, check the width. If it's "wrap_content" instead of "fill_parent", the box is only as wide as its text, so there is no room left to center the text in!

This solved the problem for me:

android:layout_width="fill_parent" android:layout_height="wrap_content"
android:paddingLeft="10dp"
android:gravity="top|center"
android:textSize="25sp" android:textColor="@color/text_color" />

Back to the topic of this post. The file names are mostly spaces: they start with the Japanese word, then a *bunch* of spaces, then end with the English word. Comparing the Japanese part with a plain "equals" doesn't work because of normalization issues: the same character can be represented by different sequences of code points, and equals doesn't treat those as the same. I've accounted for that, though. So it must be something else.
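To make the normalization issue concrete, here is a minimal sketch using java.text.Normalizer, the same class used later in this post. The class name NormalizationDemo and the helper sameText are made up for illustration. The example uses が, which can be stored either as one precomposed character or as か followed by a combining voiced sound mark.

```java
import java.text.Normalizer;

class NormalizationDemo {
    // Hypothetical helper: compares two strings after normalizing both to NFD.
    static boolean sameText(String a, String b) {
        String aNFD = Normalizer.normalize(a, Normalizer.Form.NFD);
        String bNFD = Normalizer.normalize(b, Normalizer.Form.NFD);
        return aNFD.equals(bNFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u304C";       // "ga" as a single code point
        String decomposed = "\u304B\u3099";  // "ka" plus the combining voiced sound mark

        System.out.println(precomposed.equals(decomposed));    // false
        System.out.println(sameText(precomposed, decomposed)); // true
    }
}
```

Both strings look identical on screen, which is what makes this kind of mismatch hard to spot.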

Ok, let's find a kanji that doesn't match. It should be easy enough.

Well, one of them is "あびる", which means "to bathe". Here the problem might be due to the fact that the word is probably stored with its kanji, as in "浴びる", meaning it won't match. But we're after the kanji problem first. Or a kanji problem.

We also have a mismatch on それ and それから. This is again due to the comparison, which only checks the beginning and the end of the file name. So they must both have the English meaning of "that". Btw, here is the comparison:


if (nameNFD.endsWith(endStringNFD)) {
    if (nameNFD.startsWith(kanjiNFD)) {
        //if (name.matches("" + question.kanji + ".*" + question.english + ".mp3")) {
        Log.d(TAG, "============================== filefound - name is: " + soundFile.getName());
        return soundFile;
    }
}
else {
    //Log.d(TAG, "notfound, file name: " + soundFile.getName());
}

where name is the file name.
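Here is a rough, self-contained sketch of why a prefix-plus-suffix check can pick the wrong file. It assumes the end string is the English word plus ".mp3" (as the commented-out regex suggests). The class name, the matches helper, and both file names are invented for the example and are not taken from the real audio set.

```java
import java.text.Normalizer;

class PrefixSuffixMatchDemo {
    // Sketch of the check above: the file name must start with the Japanese
    // word and end with the English word plus ".mp3" (assumed suffix format).
    static boolean matches(String fileName, String kanji, String english) {
        String nameNFD = Normalizer.normalize(fileName, Normalizer.Form.NFD);
        String kanjiNFD = Normalizer.normalize(kanji, Normalizer.Form.NFD);
        String endStringNFD = Normalizer.normalize(english + ".mp3", Normalizer.Form.NFD);
        return nameNFD.endsWith(endStringNFD) && nameNFD.startsWith(kanjiNFD);
    }

    public static void main(String[] args) {
        String sore = "\u305D\u308C";                 // "sore"
        String sorekara = "\u305D\u308C\u304B\u3089"; // "sorekara"

        // Made-up file names in the "Japanese, spaces, English" layout.
        String shortName = sore + "     that.mp3";
        String longName = sorekara + "     after that.mp3";

        System.out.println(matches(shortName, sore, "that")); // true
        System.out.println(matches(longName, sore, "that"));  // true
    }
}
```

Whichever of the two files is checked first would be returned for それ, since both pass the test.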

Hitotsu also comes out in tiny print. It's either 56 or 519.

I'll get that later. First, I have this one, which I also know is either 54 or 55:

言葉

ことば

This means "word".

I don't get it - the database says it's number

Well, first let's see what logcat is showing:

I can't find the file name in the output because I'm logging all of these values for every single file, like this:

Log.d(TAG, ">>>>>> file name: " + soundFile.getName());
Log.d(TAG, ">>>>>> nameNFD: " + nameNFD);
Log.d(TAG, ">>>>>> kanjiNFD: " + kanjiNFD);
Log.d(TAG, ">>>>>> endStringNFD: " + endStringNFD);

I just need to distinguish the display:

Log.d(TAG, ">>>>>> nameNFD: " + nameNFD + "x");
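One thing an end marker like the "x" above can do is show exactly where a string stops, so stray spaces stand out. A small variation, sketched here with a hypothetical visible helper and System.out in place of Log.d so it runs off-device, is to wrap the value in brackets:

```java
class VisibleWhitespaceDemo {
    // Hypothetical helper: wraps a value in brackets so leading or trailing
    // spaces are easy to spot in log output.
    static String visible(String value) {
        return "[" + value + "]";
    }

    public static void main(String[] args) {
        // The sample value is invented; note the trailing space before "]".
        System.out.println(">>>>>> nameNFD: " + visible("example.mp3 "));
    }
}
```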


There it is:

word,language.mp3

Ah...there is no space.

From the database:

word, language

There is a space. So, hopefully this is the case for a lot of the multiple word file names.

Ah, the number was off because it's sorted by frequency of use.


The fix is to strip the spaces before comparing. Something like this: s.replaceAll(" ", "")
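As a rough sketch of that idea, simplified to just the English tail of the file name: the class name and the stripSpaces helper are made up, and the end string is assumed to be the database value plus ".mp3".

```java
class SpaceMismatchDemo {
    // Hypothetical helper: removes every plain space character from the input.
    static String stripSpaces(String s) {
        return s.replaceAll(" ", "");
    }

    public static void main(String[] args) {
        String fileName = "word,language.mp3";        // as found on disk
        String endString = "word, language" + ".mp3"; // built from the database value

        System.out.println(fileName.endsWith(endString));                           // false
        System.out.println(stripSpaces(fileName).endsWith(stripSpaces(endString))); // true
    }
}
```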

Well, I replaced them for the whole file name by accident - and it still found a different one. How? Something to do with NFD?

String endStringNFD = Normalizer.normalize(endString, Normalizer.Form.NFD);
endStringNFD.replaceAll(" ", "");


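The above didn't work. Strings are immutable, right: replaceAll returns a new String instead of changing endStringNFD, and I was throwing the result away. So: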

String endStringNFD = Normalizer.normalize(endString, Normalizer.Form.NFD);
String searchEndStringNFD = endStringNFD.replaceAll(" ", "");
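The difference between the two versions can be seen in isolation with a small sketch. The class and method names here are invented for the example.

```java
class ImmutableStringDemo {
    // Calls replaceAll but ignores the return value, as in the first attempt.
    static String callWithoutAssigning(String s) {
        s.replaceAll(" ", ""); // result discarded, s itself is unchanged
        return s;
    }

    // Keeps the new String that replaceAll returns, as in the second attempt.
    static String callAndAssign(String s) {
        String result = s.replaceAll(" ", "");
        return result;
    }

    public static void main(String[] args) {
        String endString = "word, language.mp3";
        System.out.println(callWithoutAssigning(endString)); // word, language.mp3
        System.out.println(callAndAssign(endString));        // word,language.mp3
    }
}
```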

Btw, normalization forms (e.g. Normalization Form D, NFD) are schemes for how characters are represented as sequences of code points: a character can be built by combining two code points (a base character plus a combining mark), or it can use a single precomposed character. More info here:

http://weblogs.java.net/blog/joconner/archive/2007/02/normalization_c.html

And that's a wrap.
