    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/124835


    Title: 基於自動圖像標註之圖像檢索工具發展與應用研究
    Development and Application of an Image Retrieval Tool Based on Automatic Image Annotation
    Authors: 張志泓
    Chang, Chih-Hung
    Contributors: 陳志銘
    Chen, Chih-Ming
    張志泓
    Chang, Chih-Hung
    Keywords: 使用者研究
    數位人文
    數位圖像
    圖像辨識
    深度學習
    自動圖像標註
    實例分割
    Mask R-CNN
    人機互動
    詞頻統計
    行為分析
    user research
    digital humanities
    digital image
    image recognition
    deep learning
    automatic image annotation
    instance segmentation
    Mask R-CNN
    human-computer interaction
    word frequency statistics
    behavior analysis
    Date: 2019
    Issue Date: 2019-08-07 16:28:29 (UTC+8)
    Abstract: 「數位圖像」在資訊蓬勃發展的現代,已經成為支持數位人文研究的重要資料型態之一,而其發展亦為數位時代的人文研究開闢新的挑戰與發展機會。而過去許多研究指出數位圖像的顯示不該是一長串清單或縮略圖,應該存在能透過視覺立刻吸收資訊的物件訊息,會讓閱讀者具有更好的組織圖像能力。因此,能大量辨識數位圖像中存在物件並加以分析的需求愈發重要。而「圖像標註」在此扮演了不可或缺的地位,透過決定適當詞彙來描述數位圖像,以降低人類使用者對於圖像的解釋以及圖像低級特徵之間的語意落差。其中隨著科技發展而衍伸出的「自動圖像標註」則在圖像標註的基礎上,做到降低人工標註的成本與具備高效率及低主觀性等優點,進而促成本研究探索「自動圖像標註」技術輔助數位人文學者進行個體之圖像詮釋的使用差異與感受,嘗試以人文學者角度出發去瞭解使用者如何,以及為何使用圖像,進一步發展出得以有效輔助人文學者進行圖像情境解讀之數位人文工具。

    因此,本研究發展出「基於自動圖像標註之圖像檢索工具(Image Retrieval Tool Based on Automatic Image Annotation, IRT-AIA)」。該系統的核心技術採用圖像辨識領域中實現實例分割任務的演算法-Mask R-CNN,主要目的為圖像中的實體物件識別,除了能具體辨識圖像中各自獨立的實體物件所屬類別與所在位置以外,更進一步描繪出各實體物件之輪廓,藉此快速萃取數位圖像中的實體物件訊息,並作為圖像集合的替代資訊呈現,讓使用者得以快速吸收並有效組織圖像內容。最後輔以友善且有助於增進人文學者與系統互動之介面,讓人文學者得以在個體詮釋的角度下進行圖像標註以快速取得數位圖像之後設資料內容,進而促進人文學者更有效率地解讀圖像情境。

    為驗證本研究發展之IRT-AIA是否有助於人文學者進行圖像解讀,本研究採用準實驗研究法之對抗平衡設計,將使用者分為兩組,根據不同的系統使用順序來依次操作IRT-AIA與一般圖像檢索工具(General Image Retrieval Tool, GIRT)來完成不同階段之任務單。並透過行為歷程記錄技術來完整記錄使用者的系統操作行為、科技接受度問卷來反映使用者的實際感受,以及半結構式深度訪談來瞭解使用者的想法與建議,透過多種方法進行交互驗證,以瞭解本研究發展之IRT-AIA與GIRT在自動圖像標註之準確度、解讀圖像情境之檢索圖像正確率、解讀圖像情境之成效、科技接受度上的差異。

    研究結果發現:第一,IRT-AIA的自動圖像標註準確度已足以有效輔助使用者解讀圖像情境;第二,使用IRT-AIA能獲得更佳的圖像檢索精確率,以及良好的召回率;第三,使用GIRT與IRT-AIA在解讀圖像情境之成效上並未達顯著差異,從分析中顯示社群標籤與人工智慧標籤各有其擅長用途,因此兩者並重的系統才能滿足使用者的不同檢索需求;第四,使用GIRT與IRT-AIA在科技接受度上未達顯著差異,但是均有高於中間值的良好科技接受度;第五,使用者在使用社群標籤與人工智慧標籤輔以瀏覽與檢索的過程中,更為偏好採用人工智慧標籤,並更容易獲得使用者想要進一步瀏覽的目標圖像。
    Digital images have become an important type of data supporting digital humanities research, and their development brings new challenges and opportunities for humanities studies in the digital era. Previous studies have argued that digital images should not be displayed merely as a long list or as thumbnails; the display should instead include object information that can be absorbed visually at a glance, which helps readers organize images more effectively. The need to recognize and analyze, at scale, the objects present in digital images is therefore increasingly important. “Image annotation” plays an indispensable role here: by choosing appropriate vocabulary to describe digital images, it narrows the semantic gap between human users’ interpretation of an image and the image’s low-level features. “Automatic image annotation”, which emerged with advances in technology, builds on image annotation while reducing the cost of manual annotation and offering high efficiency and low subjectivity. This motivated the present study to explore how humanities scholars use and perceive automatic image annotation when it assists their individual image interpretation. The study attempts to understand, from the humanities scholar’s perspective, how and why users use images, and to develop a digital humanities tool that effectively assists humanities scholars in interpreting image context.

    This study therefore developed the “Image Retrieval Tool Based on Automatic Image Annotation (IRT-AIA)”. The core technology of the system is Mask R-CNN, an algorithm for the instance segmentation task in image recognition, used here mainly to identify physical objects in images. Besides identifying the category and location of each individual physical object in an image, it also delineates each object’s outline. Object information is thereby extracted rapidly from digital images and presented as surrogate information for the image collection, allowing users to absorb it quickly and organize image content effectively. Finally, a friendly interface that enhances interaction between humanities scholars and the system lets scholars annotate images from their own interpretive perspective and quickly obtain the metadata of digital images, helping them interpret image context more efficiently.

    To verify whether the IRT-AIA developed in this study helps humanities scholars interpret images, the study adopted a counterbalanced design within a quasi-experimental research method. Users were divided into two groups, which operated IRT-AIA and a general image retrieval tool (GIRT) in different orders to complete the task sheets of different stages. Behavior process recording was used to record users’ system operations in full, a technology acceptance model questionnaire to reflect users’ actual perceptions, and semi-structured in-depth interviews to understand users’ thoughts and suggestions. These methods were cross-validated to identify the differences between the IRT-AIA developed in this study and GIRT in automatic image annotation accuracy, the correctness of images retrieved when interpreting image context, the effectiveness of interpreting image context, and technology acceptance.

    The results are summarized as follows. First, the automatic image annotation accuracy of IRT-AIA is sufficient to effectively assist users in interpreting image context. Second, using IRT-AIA yields better image retrieval precision, along with good recall. Third, GIRT and IRT-AIA do not differ significantly in the effectiveness of interpreting image context; the analysis shows that social tags and artificial intelligence tags each have their own strengths, so only a system that gives equal weight to both can satisfy users’ different retrieval needs. Fourth, GIRT and IRT-AIA do not differ significantly in technology acceptance, but both show good technology acceptance above the median. Fifth, when browsing and searching with social tags and artificial intelligence tags, users prefer artificial intelligence tags and more easily reach the target images they want to browse further.
    Reference: 中文文獻
    政治大學(2007)。茶言觀政:政大校園影像記憶網。檢自 http://memory.lib.nccu.edu.tw/?m=c1104&doc_serial=1
    陳勇汀(2017)。行為順序檢定:滯後序列分析 / Behavior Analysis: Lag Sequential Analysis。檢自https://pulipulichen.github.io/HTML-Lag-Sequential-Analysis/
    項潔、涂豐恩(2011)。導論—什麼是數位人文。從保存到創造: 開啟數位人文研究 (頁9-28)

    英文文獻
    Abdulla, W. (2017). Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow [Python, GitHub repository]. Retrieved from https://github.com/matterport/Mask_RCNN
    Agosti, M., Ferro, N., Orio, N., & Ponchia, C. (2014). CULTURA Outcomes for Improving the User’s Engagement with Cultural Heritage Collections. Procedia Computer Science, 38, 34-39. doi:10.1016/j.procs.2014.10.007
    Bates, M. J. (2007). What is browsing—really? A model drawing from behavioural science research. Retrieved from http://www.informationr.net/ir/12-4/paper330.html
    Beaudoin, J. E. (2014). A framework of image use among archaeologists, architects, art historians and artists. Journal of Documentation, 70(1), 119-147. doi:10.1108/JD-12-2012-0157
    Beaudoin, J. E., & Brady, J. E. (2011). Finding Visual Information: A Study of Image Resources Used by Archaeologists, Architects, Art Historians, and Artists. Art Documentation: Journal of the Art Libraries Society of North America, 30(2), 24-36. doi:10.1086/adx.30.2.41244062
    Belkin, N. J. (2008). Some(What) Grand Challenges for Information Retrieval. In C. Macdonald, I. Ounis, V. Plachouras, I. Ruthven, & R. W. White (Eds.), Advances in Information Retrieval (p. 1). Springer Berlin Heidelberg.
    Bhagat, P. K., & Choudhary, P. (2018). Image annotation: Then and now. Image and Vision Computing, 80, 1-23. doi:10.1016/j.imavis.2018.09.017
    Busa, R. (1980). The Annals of Humanities Computing: The Index Thomisticus. Computers and the Humanities, 14(2), 83-90.
    Chen, C.-M., & Tsay, M.-Y. (2017). Applications of collaborative annotation system in digital curation, crowdsourcing, and digital humanities. The Electronic Library, 35(6), 1122-1140. doi:10.1108/EL-08-2016-0172
    Chen, J., Wang, D., Xie, I., & Lu, Q. (2018). Image annotation tactics: transitions, strategies and efficiency. Information Processing & Management, 54(6), 985-1001. doi:10.1016/j.ipm.2018.06.009
    Cheng, Q., Zhang, Q., Fu, P., Tu, C., & Li, S. (2018). A survey and analysis on automatic image annotation. Pattern Recognition, 79, 242-259. doi:10.1016/j.patcog.2018.02.017
    Chew, B., Rode, J. A., & Sellen, A. (2010). Understanding the Everyday Use of Images on the Web. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries (pp. 102–111). New York, NY, USA: ACM. doi:10.1145/1868914.1868930
    Dutta, A., Gupta, A., & Zisserman, A. (2016). VGG Image Annotator (VIA). HTML, CSS and Javascript, Visual Geometry Group. Retrieved from http://www.robots.ox.ac.uk/~vgg/software/via/
    Eklund, P., Lindh, M., Maceviciute, E., & Wilson, T. D. (2006). EURIDICE Project: The Evaluation of Image Database Use in Online Learning. Education for Information, 24(4), 177-192.
    Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88(2), 303-338. doi:10.1007/s11263-009-0275-4
    Friedrichs, K., Münster, S., Kröber, C., & Bruschke, J. (2018). Creating Suitable Tools for Art and Architectural Research with Historic Media Repositories. In S. Münster, K. Friedrichs, F. Niebling, & A. Seidel-Grzesińska (Eds.), Digital Research and Education in Architectural Heritage (pp. 117-138). Springer International Publishing.
    Girshick, R. (2015). Fast R-CNN. ArXiv:1504.08083 [Cs]. Retrieved from http://arxiv.org/abs/1504.08083
    Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2013). Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv:1311.2524 [Cs]. Retrieved from http://arxiv.org/abs/1311.2524
    He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. ArXiv:1703.06870 [Cs]. Retrieved from http://arxiv.org/abs/1703.06870
    Hockey, S. M. (2004). The History of Humanities Computing. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), A Companion to Digital Humanities (pp. 3-19). Oxford: Blackwell Publishing. Retrieved from http://discovery.ucl.ac.uk/12274/
    Hwang, G.-J., Yang, L.-H., & Wang, S.-Y. (2013). A concept map-embedded educational computer game for improving students’ learning performance in natural science courses. Computers & Education, 69, 121-130. doi:10.1016/j.compedu.2013.07.008
    Im, D.-H., & Park, G.-D. (2015). Linked tag: image annotation using semantic relationships between image tags. Multimedia Tools and Applications, 74(7), 2273-2287. doi:10.1007/s11042-014-1855-z
    Ivasic-Kos, M., Ipsic, I., & Ribaric, S. (2015). A knowledge-based multi-layered image annotation system. Expert Systems with Applications, 42(24), 9539-9553. doi:10.1016/j.eswa.2015.07.068
    Jin, C., & Jin, S.-W. (2016). Image distance metric learning based on neighborhood sets for automatic image annotation. Journal of Visual Communication and Image Representation, 34, 167-175. doi:10.1016/j.jvcir.2015.10.017
    Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., … Dollár, P. (2014). Microsoft COCO: Common Objects in Context. ArXiv:1405.0312 [Cs]. Retrieved from http://arxiv.org/abs/1405.0312
    Llamas, J., Lerones, P. M., Zalama, E., & Gómez-García-Bermejo, J. (2016). Applying Deep Learning Techniques to Cultural Heritage Images Within the INCEPTION Project. In M. Ioannides, E. Fink, A. Moropoulou, M. Hagedorn-Saupe, A. Fresa, G. Liestøl, … P. Grussenmeyer (Eds.), Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection (pp. 25-32). Springer International Publishing.
    Lorang, E., Soh, L.-K., Datla, M. V., & Kulwicki, S. (2015). Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections. D-Lib Magazine, 21(7/8). doi:10.1045/july2015-lorang
    Maihami, V., & Yaghmaee, F. (2017). A review on the application of structured sparse representation at image annotation. Artificial Intelligence Review, 48(3), 331-348. doi:10.1007/s10462-016-9502-x
    McCay-Peet, L., & Toms, E. (2009). Image use within the work task model: Images as information and illustration. Journal of the American Society for Information Science and Technology, 60(12), 2416-2429. doi:10.1002/asi.21202
    Münster, S., Kamposiori, C., Friedrichs, K., & Kröber, C. (2018). Image libraries and their scholarly use in the field of art and architectural history. International Journal on Digital Libraries, 19(4), 367-383. doi:10.1007/s00799-018-0250-1
    Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. ArXiv:1506.01497 [Cs]. Retrieved from http://arxiv.org/abs/1506.01497
    Schonfeld, R., & Long, M. (2015). Supporting the Changing Research Practices of Art Historians. New York: Ithaka S+R. doi:10.18665/sr.22833
    Schreibman, S. (2012). Digital Humanities: Centres and Peripheries. Historical Social Research-Historische Sozialforschung, 37(3), 46-58.
    Terras, M. (2012). Image Processing and Digital Humanities. In M. Terras, J. Nyhan, & C. Warwick (Eds.), Digital Humanities in Practice (pp. 71-90). Facet. Retrieved from http://discovery.ucl.ac.uk/1327983/
    Tikka, P. (2006). Image Retrieval: Theory and Research. Leonardo, 39(3), 268-269. doi:10.1162/leon.2006.39.3.268a
    Wang, J. Z., Grieb, K., Zhang, Y., Chen, C., Chen, Y., & Li, J. (2006). Machine annotation and retrieval for digital imagery of historical materials. International Journal on Digital Libraries, 6(1), 18-29. doi:10.1007/s00799-005-0121-4
    Warwick, C. (2012). Studying users in digital humanities. In C. Warwick, M. Terras, & J. Nyhan (Eds.), Digital Humanities in Practice (1st ed., pp. 1-22). Facet. doi:10.29085/9781856049054.002
    Whitelaw, M. (2015). Generous Interfaces for Digital Cultural Collections. Digital Humanities Quarterly, 9(1).
    Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346-362. doi:10.1016/j.patcog.2011.05.013
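    Description: Master's
    National Chengchi University (國立政治大學)
    Graduate Institute of Library, Information and Archival Studies (圖書資訊與檔案學研究所)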
    1061550181
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G1061550181
    Data Type: thesis
    DOI: 10.6814/NCCU201900502
    Appears in Collections: [Graduate Institute of Library, Information and Archival Studies (圖書資訊與檔案學研究所)] Theses

    Files in This Item:

    File        Size     Format
    018101.pdf  6941 KB  Adobe PDF

