Professor

Personal Information

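  • Business Address: Office, Teaching Building 9, Xipu Campus, Southwest Jiaotong University
  • Alma Mater: Darmstadt University of Technology, Germany
  • School/Department: School of Computing and Artificial Intelligence
  • Discipline: Software Engineering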
    Computer Application Technology
    Computer Science and Technology

    Research topics

      Research interests

      Web data mining; privacy-preserving data mining; social network analysis; big data management and intelligent analysis (NewSQL, NoSQL)

      Projects


      Project Description
      Web spam detection

      Web spam attempts to influence search engine ranking algorithms in order to boost the rankings of specific web pages in search results. Between 10% and 15% of Web pages have been deliberately contaminated with hidden links and objectionable content. Spamming tricks include advertisement or objectionable-content injection, hidden-link attacks, cloaking, and redirection.


      Web spammers aim either to profit or to attack. This project tackles the challenging problems of Web spam by modeling junk Web pages, extracting spam features, analyzing spammed content and URLs, and designing and improving malicious Web page detection approaches.


      Privacy-preserving data mining

      In 1998, Ann Cavoukian raised a serious concern in her paper "Data Mining: Staking a Claim on Your Privacy": data mining may be the biggest challenge that protectors of individual privacy will face in the next decade. Today, research on and application of big data are in full swing. How to discover the value of big data without disclosing users' private information is a key issue in big data mining research.

      Research topics in this field (PPDM: Privacy-Preserving Data Mining) include:

      • To analyze individual cases of privacy issues involved in data mining from social and legal perspectives;

      • To study new approaches, or improve existing mining algorithms, by integrating data security strategies (encryption, hiding, etc.) that protect sensitive information as much as possible.


      This research can be conducted by combining distributed technology with data mining, and it covers aspects such as data publishing, mining algorithms, and the release of mining rules. Most PPDM methods protect data at the cost of reduced information usability and mining accuracy. The purpose of these approaches is to find a trade-off between accuracy and privacy.


      Web Fraud Mining

      As Web information and applications become increasingly rich and widespread, fraud attacks have become rampant. New types of fraud and spam keep appearing, such as social networking fraud, multimedia spam, and click fraud/spam, whose main purpose is still to profit illegally from deception.

      The research currently targets several problems: Twitter spammer discovery, multimedia spam detection, comment spam mining, and click fraud detection. Our focus includes investigating spamming tricks and mechanisms, extracting discriminative features, and developing high-performance detection algorithms.


      Web source quality assessment mechanism

      The extremely rich resources of the Web make information acquisition and decision making much easier. However, the quality of Web sources is highly problematic owing to peculiar characteristics of the Web, such as the dynamics and autonomy of Web sources, the enormous amount and variety of Web data, and the multifarious quality requirements of Web applications. These result in uneven and uncertain information quality and in inferior Web-based planning and strategy making. With the popularization of Wiki sites, Web source quality has become an increasingly serious challenge.

      In this multistage project, we have proposed a Web quality model, WebQM, for capturing Web quality features along three dimensions. The feasibility and effectiveness of WebQM have been verified by SEM with actually observed data. We have developed evaluation approaches under fuzzy environments based on WebQM and implemented a prototype Web quality fuzzy assessment system, with which a sensitivity analysis of the evaluation approach was carried out. Our current work is to model the quality problem of Wiki sources and to adapt WebQM and the evaluation mechanism for assessing the content quality of Wiki sites.


      Product digitized design and manufacture services based on Internet+

      With the implementation of China's 2025 manufacturing plan, Internet Plus technology has led to the deep integration of industry and information technology. The conventional patterns and methods of product design and manufacture will change. It is therefore necessary and important to study and develop new models, tools, and platforms to adapt to this transformation.

      This project proceeds as follows:

      • Based on the mobile Internet, a crowd-creation mode of product design is investigated, and a product crowd-design platform will be built for mobile phones, notebook computers, personal computers, etc.;

      • A crowd-innovation service platform is established by combining virtual design technology with cloud computing;

      • A typical product design and development process will be executed on the platform as a demonstration of the digitized design pattern and service.


      High-speed rail big data management system

      This project is a key part of the digital simulation platform for high-speed rail. Building the platform requires dealing with many problems, including heterogeneous and multi-structured high-speed rail data, a huge amount of data exchange among subsystems, and the efficiency and system independence of data access. This project will develop a data management system to solve these issues and to support branch and coupling simulation as well as multi-dimensional and multi-level visualization. Specific approaches are designed for multi-source data ETL, loosely coupled data management, and multi-branch data access and fusion.


       


    Publications

                 

      Book and book chapter

      • Yan Zhu. Technology and Practice of Big Data Intelligent Management and Analysis—From Data Warehouse/OLAP to NoSQL and NewSQL. Southwest Jiaotong University Press, Chengdu, 2019

      • Yan Zhu. Integrating External Data from Web Sources into a Data Warehouse for OLAP and Decision Making. SHAKER Verlag, Aachen, Germany, 2004

      • Yan Zhu. Web Services Technology and Applications in a Web X.0 Environment. Southwest Jiaotong University Press, Chengdu, 2011

      • Cryptanalysis of T-function Based Ultralightweight RFID Authentication Protocols. In Collected Works: “Progress on Cryptography - 20 Years of Cryptography in Taiwan”. McGraw-Hill International Enterprises, Taiwan, 2014

      Selected papers (Corresponding author)

      • Jia-Qing Wang, Yan Zhu, Huan He, Chun-Ping Li. Less is More: Feature Choosing under Privacy-Preservation for Efficient Web Spam Detection. DEXA2021, Austria, Sept. 2021. (EI)

      • Xu Zhuang, Yan Zhu, Qiang Peng, Faisal Khurshid. Using deep belief network to demote web spam. Future Generation Computer Systems. 118(2021): 94-106. (SCI)

      • Rong Wang, Benjamin C. M. Fung, Yan Zhu, Qiang Peng. Differentially private data publishing for arbitrarily partitioned data. Information Sciences, 553(2021): 247-265. (SCI)

      • Cheng Cheng, Chunping Li, Yongfang Han, Yan Zhu. A semi-supervised deep learning image caption model based on Pseudo Label and N-gram. International Journal of Approximate Reasoning. 131(2021): 93-107. (SCI)

      • Rong Wang, Benjamin C. M. Fung, Yan Zhu. Heterogeneous data release for cluster analysis with differential privacy. Knowledge-Based Systems, 2020. (SCI)

      • Rong Wang, Yan Zhu, Chin-Chen Chang, Qiang Peng. Privacy-Preserving High-dimensional data publishing for classification. Computers & Security, Vol 93, June 2020. (SCI)

      • Liangqiang Huang, Yan Zhu, Xin Wang, Faisal Khurshid. An Attribute-Based Fine-Grained Access Control Mechanism for HBase. DEXA2019, Austria, Aug 2019. (EI)

      • Jiefan Tan, Yan Zhu, Qiang Du. Triplet-CSSVM: Integrating Triplet-Sampling CNN and Cost-Sensitive Classification for Imbalanced Image Detection. DEXA2019, Austria, Aug 2019. (EI)

      • Faisal Khurshid, Yan Zhu, Xu Zhuang, Mushtaq Ahmad, Muqeet Ahmad. Enactment of Ensemble Learning for Review Spam Detection on Selected Features. International Journal of Computational Intelligence Systems. 12(1):387-394, 2019. (SCI)

      • Rong Wang, Yan Zhu, Tung-Shou Chen, Chin-Chen Chang. Privacy-Preserving Algorithms for Multiple Sensitive Attributes Satisfying t-Closeness. Journal of Computer Science and Technology. 33(6):1-12, 2018. (SCI)

      • Rong Wang, Yan Zhu, Tung-Shou Chen, Chin-Chen Chang. An Authentication Method Based on the Turtle Shell Algorithm for Privacy-Preserving Data Mining. The Computer Journal. 61(8): 1123–1132, 2018. (SCI)

      • Xu Zhuang, Yan Zhu, Chin-Chen Chang, Qiang Peng. Security Issues in Ultralightweight RFID Authentication Protocols. Wireless Personal Communications. 98(1):779-814, 2018. (SCI)

      • Xu Zhuang, Yan Zhu, Chin-Chen Chang, Qiang Peng. A Unified Score Propagation Model for Web Spam Demotion Algorithm. Information Retrieval Journal. 20(6):547-574, 2017. (SCI)

      • Xu Zhuang, Yan Zhu, Chin-Chen Chang, Qiang Peng. Feature Bundling in Decision Tree Algorithm. Intelligent Data Analysis. 21(2):371-383, 2017. (SCI)

      • Sha Wei, Yan Zhu. Cleaning out Web Spam by Entropy-based Cascade Outlier Detection. DEXA2017, France, Aug 2017. (EI)

      • Rong Wang, Yan Zhu, Jiefan Tan, Binbin Zhou. Detection of Malicious Web Pages Based on Hybrid Analysis. Journal of Information Security and Applications. 35:68-74, 2017. (EI)

      • Faisal Khurshid, Yan Zhu. Recital of Supervised Learning on Review Spam Detection: An Empirical Analysis. The 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2017), China, Nov. 2017. (EI)

      • Shou-Hong Tang, Yan Zhu, Fan Yang, Qing Xu. Ascertaining Spam Web Pages Based on Ant Colony Optimization Algorithm. DEXA2014, Springer LNCS 8645. Sept 2014. (EI)

      • Bo Zhao, Yan Zhu. Formalizing and Validating the Web Quality Model for Web Source Quality Evaluation. Expert Systems with Applications. 41(7): 3306-3312, 2014. (SCI)

      • X. Zhuang, Y. Zhu and C. C. Chang. A New Ultralightweight RFID Protocol for Low-Cost Tags: R2AP. Wireless Personal Communications. 77(4): 1787-1802, 2014. (SCI)


      Keynote speech

      • Multi-degree Detection of Web Spam with Web Quality Features, IEEE International Conference on Communication Software and Networks, Chengdu, PR China, June 6-7, 2015

      • Web Quality/Safety Issues and Our Solutions. The 3rd International Conference on Computer Technology and Development (ICCTD 2011). Nov. 26-29, 2011.

      • Web Source/Information Quality Modeling and Assessment. The 3rd International Conference on Computer and Electrical Engineering (ICCEE 2010). Nov. 17-18, 2010.


      Conference presentation

      • An Attribute-Based Fine-Grained Access Control Mechanism for HBase. DEXA2019, Linz, Austria, Aug 2019.  

      • Integrating Triplet-Sampling CNN and Cost-Sensitive Classification for Imbalanced Image Detection. DEXA2019, Linz, Austria, Aug 2019.

      • Cleaning out Web Spam by Entropy-based Cascade Outlier Detection. The 28th International Conference on Database and Expert Systems Applications (DEXA 2017). Lyon, France, Aug. 27-Sep. 1, 2017.

      • Blocking Web Spam by Entropy-based Cascade Outlier Detection. Department of Computer Science, Georgia State University, USA, Sep. 3, 2016.

      • A Brief Introduction of Internet and Web Spam Detection. Tsinghua University. April 16-17, 2015.

      • Ascertaining Spam Web Pages Based on Ant Colony Optimization Algorithm. The 25th International Conference on Database and Expert Systems Applications (DEXA 2014). Germany, Sep. 1-4, 2014.

      • WSD-AIS: an Artificial Immune System Approach against Web Spam. International Computer Symposium (ICS). Taiwan, Dec. 12-14, 2014.

      • Visual cluttering reduction for visualizing large spatio-temporal data sets. IEEE CSE 2014, Chengdu, Dec. 19-21, 2014.

      • The Comfort Evaluation of Wearable Devices Using Fuzzy Assessment Technologies. Sino-German Bilateral Workshop on Wearable Computing (WearCom2010). Chengdu, Aug. 24-29, 2010.


      Academic visits

      • August 25~August 31, 2019: Johannes Kepler University Linz, Austria

      • September 18~October 1, 2018: Technical University of Braunschweig; Ostfalia University of Applied Sciences; Albert-Ludwig University of Freiburg, Germany

      • February 12~16, 2018: Harvard University; Massachusetts Institute of Technology (MIT); IBM, USA

      • February 06~09, 2018: Georgia State University; Georgia Institute of Technology, USA

      • August 27~September 01, 2017: Université Jean Moulin-Lyon 3, France

      • March 03~19, 2017: Leeds University, United Kingdom

      • September 02~17, 2016: Georgia State University; Northwestern University, USA

      • February 02~12, 2016: Politecnico di Torino, Italy

      • December 12~14, 2014: Feng Chia University; Tunghai University, Taichung, China

      • September 01~04, 2014: Ludwig Maximilian University of Munich; Darmstadt University of Technology, Germany

      Professional service

      • TPC member: DEXA (2011-2021), CMS (2010-2016), IEEE ICEBE (2015, 2018-2020), ICS (2014), IEEE CSAE (2012-2013), MSVVEIS (2012), RSKT (2009-2010)

      • Chair of international conference organizing committee: ICCTD2011, MDMKD2009

      • Journal reviewer: 

            IEEE Transactions on Knowledge and Data Engineering (TKDE)

            Expert Systems with Applications (ESWA)

            Journal of Information Security and Applications

            Transactions on Large-Scale Data- and Knowledge-Centered Systems (TLDKS)

            The Journal of Universal Computer Science (J. UCS)

            Information Systems Journal (ISJ)

            Recent Patents on Computer Science