Open in gitpod
This project seeks to reduce the sometimes innumerable number of trips back and forth between your favorite Japanese dictionary and the text you're reading. The included script parses a text from the Aozora Bunko literature corpus and looks up helpful information for terms in the text via the Jotoba API
Once you've cloned this repo and installed the necessary ruby gems (found in the Gemfile), you'll need to make sure you have a copy of the Aozora Bunko database file located in the ./data directory. Once you do, you can start annotating!
The easiest way to use this tool is via the command line. From the repo directory:
#to show all cli options and arguments
ruby azb.rb -h
#to search the database and return all texts with metainfo containing "源氏物語"
ruby azb.rb -s 源氏物語
#to pull information for text 165444, perform lookups, generate annotations, and render html and plaintext documents to the outputs directory
ruby azb.rb -i 165444
# to run the full pipeline as described above, this time with options!
ruby azb.rb -i 165444 -c -k -f 225%
Sometimes the api lookup behavior isn't perfect, so if you're planning on using this as a teaching aid or instructional materials, you can always fine tune the lookups by editing the json file after the initial lookup fetching
The dataset used for the current project was pulled from the following:
- Aozora Bunko Corpus for Japanese full texts
- Jotoba and Jotoba API for looking up terms. Jotoba brings together information from a range of free sources including JMDict, Tofugu, and Tatoeba and all sources are listed here
-
Annotation format breakdown
- Alternating
- One term with its annotations immediately between it and the next term
- term (annotation) term (annotation)
- Layered
- One sentence with all its annotations on the following line
- sentence
- (sentence annotations)
- sentence
- (sentence annotations)
- Parallel
- Full text with readings rendered above and meanings below, similar to the furigana annotation style commonly used
- (sentence readings)
- sentence
- (sentence meanings)
- Side by side
- One sentence with all its annotations displayed on the right of the page
- sentence || (sentence annotations)
- sentence || (sentence annotations)
- Alternating
-
Example outputs, generated from 三十三の死 by しづ素木:
- Alternating annotations in HTML
- Layered annotations in HTML
- Parallel annotations in HTML
- Side by side annotations in HTML
- Unannotated in HTML
- Alternating annotations in plain text
- Layered annotations in plain text
- Unannotated in plain text