Reporting data provenance #43
Replies: 2 comments
-
وعليكم السلام ورحمةاللّٰه وبركاته
First of all, thank you for your interest in this library project. This is an interesting question, and if you don't mind, we will convert this issue to a discussion and pin it so that many people can easily find this information.
Initially, we scraped postal code data from existing websites such as carikodepos.com, nomorkodepos.com, and others. However, various issues started to arise, including returned data being an empty array (#19), returned data being null (#27), inconsistent responses (#29), and even internal server errors. Because of these issues, we eventually decided to provide the data statically and update it periodically (whenever the government publishes data updates).
We began looking for authoritative data sources and found kodepos.posindonesia.co.id, which provides all postal code data in PDF format. We parsed it manually whenever we had free time. The turning point came with the issue (now a discussion) in #33: the websites we used for data retrieval went down, and we could no longer find similar websites. We then rushed to complete all the data and committed it in #de11b82.
For coordinates (latitude and longitude) and elevation, we retrieved and synchronized the data programmatically with data from www.opentopodata.org. The script used for this is currently missing.
For time zones, we matched them with provincial data that can be found on the internet, including:
That's all I can share. Thank you.
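For anyone who wants to reproduce the elevation lookup described above, here is a minimal sketch of how a batch query against the public Open Topo Data API could look. This is not the missing script: the function names are hypothetical, and the `srtm90m` dataset is an assumption, since the reply does not say which dataset was used. The public API is rate limited, so a full run over every postal code would need batching and pauses between requests.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Public Open Topo Data endpoint; the dataset name is appended per request.
API_BASE = "https://api.opentopodata.org/v1"


def build_url(points, dataset="srtm90m"):
    """Build a batch lookup URL for a list of (lat, lng) pairs.

    The dataset default is an assumption made for illustration.
    """
    # The API expects "lat,lng" pairs separated by "|".
    locations = "|".join(f"{lat},{lng}" for lat, lng in points)
    return f"{API_BASE}/{dataset}?locations={quote(locations, safe=',')}"


def parse_elevations(payload):
    """Extract elevations (in metres, or None if unknown) from a decoded response."""
    if payload.get("status") != "OK":
        raise ValueError(f"lookup failed: {payload.get('error', 'unknown error')}")
    return [result["elevation"] for result in payload["results"]]


def fetch_elevations(points, dataset="srtm90m"):
    """Query the API over the network and return one elevation per point."""
    with urlopen(build_url(points, dataset), timeout=30) as response:
        return parse_elevations(json.load(response))
```

Calling `fetch_elevations([(-6.2, 106.8)])` would return a one-element list with the elevation for that point, provided the service is reachable.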
-
Thank you for your reply. I have continued this thread as a discussion. Would it make sense to close this issue and proceed in #42?
-
Assalamualaikum warahmatullahi wabarakatuh
Thank you for the library. I'm considering trying it out for a prototype.
The criteria we validate prototypes against include data reliability, accuracy, and maintenance.
Do you document anywhere where your data comes from?
I see it used to be scraped from Direktorikodepos. Am I right to assume that it is now statically scraped from this website?
I'm happy to help if there is any help necessary for developing a data provenance strategy.