Friday, June 3, 2011

Testing the regex

Further testing reveals a bit of a problem. Watashi and Watakushi both come up without sound. Why?

Here's the search string:

>>>>>> searchEndString: .* (humble).*i,myself.*\.mp3$

Here's the file name:

私 わたし I,myself.mp3x

Ah, ok. The file doesn't contain the "humble" part.
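To see the miss in isolation, here is a minimal sketch that runs the search string from the log against a few sample names. The class name, the `matches` helper, and the sample names with a plain ".mp3" ending are assumptions for illustration, not the app's actual code. One thing it also shows: in a regex, `(humble)` is a capture group, so it matches the bare word "humble" and not the parentheses themselves.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class HumbleRegexCheck {
    // Search string copied from the log. (humble) is a capture group here,
    // so it matches the text "humble", not literal parentheses.
    static final Pattern SEARCH =
            Pattern.compile(".* (humble).*i,myself.*\\.mp3$");

    // Hypothetical helper: lower-case the name, then look for the pattern.
    static boolean matches(String fileName) {
        return SEARCH.matcher(fileName.toLowerCase(Locale.ROOT)).find();
    }

    public static void main(String[] args) {
        // No "humble" in the name, so the pattern cannot match.
        System.out.println(matches("私 わたし I,myself.mp3"));           // false
        // The bare word "humble" after a space matches.
        System.out.println(matches("私 わたくし humble I,myself.mp3"));   // true
        // Literal parentheses in the name do not match the unescaped group.
        System.out.println(matches("私 わたくし (humble) I,myself.mp3")); // false
    }
}
```

If the file names really do carry the parentheses, the group would need to be escaped as `\\(humble\\)` in the Java string.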

I'm thinking we might be better off reverting to "endsWith". It's kind of a question of percentages. We need to keep that "humble" in there, but the "word,language" part can always be modified. Well, that's a pity.

And what about case? Is that a problem?

It sucks that the file names don't match, but it's not that surprising.

Ouch. endsWith is apparently case sensitive. Wait. Can I convert the file name to lower case before doing the compare?

Let's try.

String nameNFD = Normalizer.normalize(soundFile.getName(), Normalizer.Form.NFD).toLowerCase();
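Here is a rough sketch of how that line could feed the endsWith compare, folding both sides the same way. The class and method names are made up for illustration, and passing `Locale.ROOT` is my own choice (it keeps "I" from lower-casing oddly under some device locales), not something the original line does.

```java
import java.text.Normalizer;
import java.util.Locale;

class EndsWithIgnoreCase {
    // Normalize to NFD and lower-case, so both sides of the compare agree.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .toLowerCase(Locale.ROOT);
    }

    // Hypothetical helper: case-insensitive "ends with" on folded strings.
    static boolean endsWithFolded(String fileName, String searchEndString) {
        return fold(fileName).endsWith(fold(searchEndString));
    }

    public static void main(String[] args) {
        System.out.println(endsWithFolded("私 わたし I,myself.mp3", "i,myself.mp3")); // true
        System.out.println(endsWithFolded("みる to see.mp3", "to see, to watch.mp3")); // false
    }
}
```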

Good - I got into the 30s before this problem:

searchEndString: to see, to watch.mp3

And I'm not seeing a file in the log that ends with that.

And there's another problem - same kanji, same meaning - different pronunciation. This is another tricky one to resolve, because the current logic checks for a match on the kanji OR a match on the hiragana - not both.

I'm going to hope it's pretty rare and leave it for now.

Yet another problem in 31-40 - juu / tou

searchEndString: ten.mp3

There are two files, neither with the kanji:

とお ten.mp3x
じゅう ten.mp3x
And the kanji is there:

What to do about this? I guess I could play one if, for example, there was no match on the first but a kind of match on the second. Pfft.
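One way to sketch that "kind of match" idea: score each candidate file by whether the kanji and the hiragana show up in its name, prefer a file that has both, and fall back to a file that has only one. This is only a sketch of a possible approach, not the current logic; the class, the `score` and `pick` helpers, and their parameters are all hypothetical.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SoundFilePicker {
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .toLowerCase(Locale.ROOT);
    }

    // -1: ending does not match; otherwise 0, 1 or 2 depending on how many
    // of kanji and reading appear in the file name.
    static int score(String fileName, String kanji, String reading, String end) {
        String name = fold(fileName);
        if (!name.endsWith(fold(end))) return -1;
        int score = 0;
        if (!kanji.isEmpty() && name.contains(fold(kanji))) score++;
        if (!reading.isEmpty() && name.contains(fold(reading))) score++;
        return score;
    }

    // Returns the best-scoring file, or null if nothing matched on kanji
    // or reading. On a tie the earlier file in the list wins.
    static String pick(List<String> fileNames, String kanji, String reading, String end) {
        String best = null;
        int bestScore = 0;
        for (String f : fileNames) {
            int s = score(f, kanji, reading, end);
            if (s > bestScore) {
                best = f;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sample names assume a plain ".mp3" ending.
        List<String> files = Arrays.asList("とお ten.mp3", "じゅう ten.mp3");
        // No file has the kanji, so the reading alone decides.
        System.out.println(pick(files, "十", "じゅう", "ten.mp3"));
    }
}
```

The same scoring would also cover the same-kanji, different-pronunciation case, since a file with both the kanji and the right hiragana outranks one with the kanji alone.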

kuchi, mouth doesn't match.

kanjiNFD: 口
search: mouth, opening.mp3

And the file ends with mouth, opening.mp3.
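When two strings look the same in the log but still don't match, dumping their code points side by side can show an invisible difference such as a non-breaking space or a combining mark. I haven't confirmed that's the cause here, so treat this as a debugging aid only; the class and helper names are made up.

```java
class CodePointDump {
    // Renders each code point as U+XXXX so look-alike strings can be compared.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A normal space versus a non-breaking space.
        System.out.println(codePoints("a b"));      // U+0061 U+0020 U+0062
        System.out.println(codePoints("a\u00A0b")); // U+0061 U+00A0 U+0062
    }
}
```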

Well, here's what I'm thinking about doing. Create a list of all the file names. There are 1200 or so of them. Get them into a database table. Bring it across to the Android. Create a routine which runs through the search and updates the table with the level and id of the word. When that's done, I can just fill in the blanks manually.
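A plain-Java sketch of that routine, with an in-memory list standing in for the database table. The `Row` and `Word` shapes and the `fillMatches` name are assumptions for illustration; the real version would read and update the table on the device.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SoundFileIndex {
    // One row per sound file; level and wordId stay null until matched.
    static class Row {
        final String fileName;
        Integer level;
        Integer wordId;
        Row(String fileName) { this.fileName = fileName; }
    }

    // Hypothetical word record carrying the search string used for matching.
    static class Word {
        final int level;
        final int id;
        final String searchEndString;
        Word(int level, int id, String searchEndString) {
            this.level = level;
            this.id = id;
            this.searchEndString = searchEndString;
        }
    }

    // Fills in level and id where a file name ends with a word's search
    // string, and returns the rows left blank for manual fixing.
    static List<Row> fillMatches(List<Row> rows, List<Word> words) {
        List<Row> unmatched = new ArrayList<>();
        for (Row row : rows) {
            String name = row.fileName.toLowerCase(Locale.ROOT);
            for (Word w : words) {
                if (name.endsWith(w.searchEndString.toLowerCase(Locale.ROOT))) {
                    row.level = w.level;
                    row.wordId = w.id;
                    break;
                }
            }
            if (row.wordId == null) unmatched.add(row);
        }
        return unmatched;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row("じゅう ten.mp3"), new Row("くち mouth.mp3"));
        List<Word> words = Arrays.asList(new Word(1, 7, "ten.mp3"));
        for (Row r : fillMatches(rows, words)) {
            System.out.println("needs manual match: " + r.fileName);
        }
    }
}
```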

I dunno. That seems like a lot of work. I'll have to think about it. It's a good thing I like Japanese.
