Web data mining; Privacy-preserving data mining; Social Networking Analysis; Big data management and intelligent analysis (NewSQL, NoSQL)
Project | Description |
---|---|
Web spam detection | Web spam attempts to manipulate search engine ranking algorithms in order to boost the rankings of specific web pages in search results. 10%-15% of Web pages have been deliberately contaminated by hidden links and objectionable content. Spamming tricks include advertisement or objectionable content injection, hidden link attacks, cloaking, and redirection. The objectives of Web spamming are to gain benefits or to mount attacks. This project tackles the challenging problems of Web spam by modeling junk Web pages, extracting spam features, analyzing spammed content and URLs, and designing and improving malicious Web page detection approaches. |
Privacy-preserving data mining | In 1998, Ann Cavoukian raised a serious concern in her paper "Data mining: Staking a claim on your privacy": data mining may be the biggest challenge that individual privacy protectors will face in the next decade. Today, big data research and applications are in full swing. How to discover the value of big data without disclosing users' privacy is a key issue in big data mining research. Research topics in this field (PPDM: Privacy-Preserving Data Mining) include:
Research can be conducted by combining distributed technology with data mining on aspects such as data publishing, mining algorithms, and the release of mined rules. Most PPDM methods protect data at the cost of reduced information usability and mining accuracy. The purpose of these approaches is to find a trade-off between accuracy and privacy. |
Web Fraud Mining | As Web information and applications grow richer and more widespread, fraud attacks have become rampant. New types of fraud and spam have appeared, such as social networking fraud, multimedia spam, and click fraud/spam, whose main purpose is still to profit illegally from cheating. The research currently covers several topics: Twitter spammer discovery; multimedia spam detection; comment spam mining; click fraud detection. Our focuses include investigating spamming tricks and mechanisms, extracting discriminative features, and developing high-performance detection algorithms. |
Web source quality assessment mechanism | The extremely rich resources of the Web make information acquisition and decision making much easier. However, Web source quality is problematic because of the peculiar characteristics of the Web, such as the dynamics and autonomy of Web sources, the enormous amount and variety of Web data, and the multifarious quality requirements of Web applications. These result in uneven and uncertain information quality and in inferior Web-based planning and strategy making. With the popularization of Wiki sites, Web source quality has become an increasing challenge. In this multistage project, we have proposed a Web quality model, WebQM, that captures Web quality features along 3 dimensions. The feasibility and effectiveness of WebQM have been verified by SEM with actually observed data. We have developed evaluation approaches under fuzzy environments based on WebQM and implemented a prototype Web quality fuzzy assessment system, on which a sensitivity analysis of the evaluation approach was carried out. Our current work is to model the quality problem of Wiki sources and to adapt WebQM and the evaluation mechanism for assessing the content quality of Wiki sites. |
Product digitized design and manufacture services based on Internet+ | With the implementation of China's 2025 manufacturing plan, Internet Plus technology has led to the deep integration of industry and information. The conventional patterns and methods of product design and manufacture will change. It is necessary and important to study and develop new models, tools, and platforms to adapt to this transformation. This project was carried out as follows:
|
High-speed rail big data management system | This project is a key part of the digital simulation platform for high-speed rail. Building the platform requires dealing with many problems, including heterogeneous and multi-structured high-speed rail data, a huge amount of data exchange among subsystems, and the efficiency and system independence of data access. This project will develop a data management system that addresses these issues and supports branch simulation and coupling simulation as well as multi-dimensional and multi-level visualization. Specific approaches are designed for multi-source data ETL, loosely coupled data management, and multi-branch data access and fusion. |
Yan Zhu. Technology and Practice of Big Data Intelligent Management and Analysis—From Data Warehouse/OLAP to NoSQL and NewSQL. Southwest Jiaotong University Press, Chengdu, 2019
Yan Zhu. Integrating External Data from Web Sources into a Data Warehouse for OLAP and Decision Making. SHAKER Verlag, Aachen, Germany, 2004
Yan Zhu. Web Services Technology and Applications in a Web X.0 Environment. Southwest Jiaotong University Press, Chengdu, 2011
Cryptanalysis of T-function Based Ultralightweight RFID Authentication Protocols. In Collected Works: “Progress on Cryptography - 20 Years of Cryptography in Taiwan”. McGraw-Hill International Enterprises, Taiwan, 2014
Jia-Qing Wang, Yan Zhu, Huan He, Chun-Ping Li. Less is More: Feature Choosing under Privacy-Preservation for Efficient Web Spam Detection. DEXA2021, Austria, Sept. 2021. (EI)
Xu Zhuang, Yan Zhu, Qiang Peng, Faisal Khurshid. Using deep belief network to demote web spam. Future Generation Computer Systems. 118(2021): 94-106. (SCI)
Rong Wang, Benjamin C. M. Fung, Yan Zhu, Qiang Peng. Differentially private data publishing for arbitrarily partitioned data. Information Sciences, 553(2021): 247-265. (SCI)
Cheng Cheng, Chunping Li, Yongfang Han, Yan Zhu. A semi-supervised deep learning image caption model based on Pseudo Label and N-gram. International Journal of Approximate Reasoning. 131(2021): 93-107. (SCI)
Rong Wang, Benjamin C. M. Fung, Yan Zhu. Heterogeneous data release for cluster analysis with differential privacy. Knowledge-Based Systems, 2020. (SCI)
Rong Wang, Yan Zhu, Chin-Chen Chang, Qiang Peng. Privacy-Preserving High-dimensional data publishing for classification. Computers & Security, Vol 93, June 2020. (SCI)
Liangqiang Huang, Yan Zhu, Xin Wang, Faisal Khurshid. An Attribute-Based Fine-Grained Access Control Mechanism for HBase. DEXA2019, Austria, Aug 2019. (EI)
Jiefan Tan, Yan Zhu, Qiang Du. Triplet-CSSVM: Integrating Triplet-Sampling CNN and Cost-Sensitive Classification for Imbalanced Image Detection. DEXA2019, Austria, Aug 2019. (EI)
Faisal Khurshid, Yan Zhu, Xu Zhuang, Mushtaq Ahmad, Muqeet Ahmad. Enactment of Ensemble Learning for Review Spam Detection on Selected Features. International Journal of Computational Intelligence Systems. 12(1):387-394, 2019. (SCI)
Rong Wang, Yan Zhu, Tung-Shou Chen, Chin-Chen Chang. Privacy-Preserving Algorithms for Multiple Sensitive Attributes Satisfying t-Closeness. Journal of Computer Science and Technology. 33(6):1-12, 2018. (SCI)
Rong Wang, Yan Zhu, Tung-Shou Chen, Chin-Chen Chang. An Authentication Method Based on the Turtle Shell Algorithm for Privacy-Preserving Data Mining. The Computer Journal. 61(8): 1123–1132, 2018. (SCI)
Xu Zhuang, Yan Zhu, Chin-Chen Chang, Qiang Peng. Security Issues in Ultralightweight RFID Authentication Protocols. Wireless Personal Communications. 98(1):779-814, 2018. (SCI)
Xu Zhuang, Yan Zhu, Chin-Chen Chang, Qiang Peng. A Unified Score Propagation Model for Web Spam Demotion Algorithm. Information Retrieval Journal. 20(6):547-574, 2017. (SCI)
Xu Zhuang, Yan Zhu, Chin-Chen Chang, Qiang Peng. Feature Bundling in Decision Tree Algorithm. Intelligent Data Analysis. 21(2):371-383, 2017. (SCI)
Sha Wei, Yan Zhu. Cleaning out Web Spam by Entropy-based Cascade Outlier Detection. DEXA2017, France, Aug 2017. (EI)
Rong Wang, Yan Zhu, Jiefan Tan, Binbin Zhou. Detection of Malicious Web Pages Based on Hybrid Analysis. Journal of Information Security and Applications. 35:68-74, 2017. (EI)
Faisal Khurshid, Yan Zhu. Recital of Supervised Learning on Review Spam Detection: An Empirical Analysis. The 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2017), China, Nov. 2017. (EI)
Shou-Hong Tang, Yan Zhu, Fan Yang, Qing Xu. Ascertaining Spam Web Pages Based on Ant Colony Optimization Algorithm. DEXA2014, Springer LNCS 8645. Sept 2014. (EI)
Bo Zhao, Yan Zhu. Formalizing and Validating the Web Quality Model for Web Source Quality Evaluation. Expert Systems with Applications. 41(7): 3306-3312, 2014. (SCI)
X. Zhuang, Y. Zhu and C. C. Chang. A New Ultralightweight RFID Protocol for Low-Cost Tags: R2AP. Wireless Personal Communications. 77(4): 1787-1802, 2014. (SCI)
Multi-degree Detection of Web Spam with Web Quality Features, IEEE International Conference on Communication Software and Networks, Chengdu, PR China, June 6-7, 2015
Web Quality/Safety Issues and Our Solutions. The 3rd International Conference on Computer Technology and Development (ICCTD 2011). Nov. 26-29, 2011.
Web Source/Information Quality Modeling and Assessment. The 3rd International Conference on Computer and Electrical Engineering (ICCEE 2010). Nov. 17-18, 2010.
Conference presentation
An Attribute-Based Fine-Grained Access Control Mechanism for HBase. DEXA2019, Linz, Austria, Aug 2019.
Integrating Triplet-Sampling CNN and Cost-Sensitive Classification for Imbalanced Image Detection. DEXA2019, Linz, Austria, Aug 2019.
Cleaning out Web Spam by Entropy-based Cascade Outlier Detection. The 28th International Conference on Database and Expert Systems Applications (DEXA 2017). Lyon, France, Aug. 27-Sep. 1, 2017.
Blocking Web Spam by Entropy-based Cascade Outlier Detection. Department of Computer Science, Georgia State University, USA, Sep. 3, 2016.
A Brief Introduction of Internet and Web Spam Detection. Tsinghua University. April 16-17, 2015.
Ascertaining Spam Web Pages Based on Ant Colony Optimization Algorithm. The 25th International Conference on Database and Expert Systems Applications (DEXA 2014). Germany, Sep. 1-4, 2014.
WSD-AIS: an Artificial Immune System Approach against Web Spam. International Computer Symposium (ICS). Taiwan, Dec. 12-14, 2014.
Visual clusttering reduction for visualizing large spatio-temporal data sets. IEEE CSE2014, Chengdu, Dec. 19-21,2014.
The Comfort Evaluation of Wearable Devices Using Fuzzy Assessment Technologies. Sino-German Bilateral Workshop on Wearable Computing (WearCom2010). Chengdu, Aug. 24-29, 2010.
Academic visiting
August 25~August 31, 2019: Johannes Kepler University Linz, Austria
September 18~October 1, 2018: Technical University of Braunschweig; Ostfalia University of Applied Sciences; Albert-Ludwig University of Freiburg, Germany
February 12~16, 2018: Harvard University; Massachusetts Institute of Technology (MIT); IBM, USA
February 06~09, 2018: Georgia State University; Georgia Institute of Technology, USA
August 27~September 01, 2017: Université Jean Moulin-Lyon 3, France
March 03~19, 2017: Leeds University, United Kingdom
September 02~17, 2016: Georgia State University; Northwestern University, USA
February 02~12, 2016: Politecnico di Torino, Italy
December 12~14, 2014: Feng Chia University; Tunghai University, Taichung, China
September 01~04, 2014: Ludwig Maximilian University of Munich; Darmstadt University of Technology, Germany
TPC member: DEXA (2011-2021), CMS (2010-2016), IEEE ICEBE (2015, 2018-2020), ICS (2014), IEEE CSAE (2012-2013), MSVVEIS (2012), RSKT (2009-2010)
Chair of international conference organization committee: ICCTD2011, MDMKD2009
Journal reviewer:
IEEE Transactions on Knowledge and Data Engineering (TKDE)
Expert Systems with Applications (ESWA)
Journal of Information Security and Applications
Transactions on Large-Scale Data- and Knowledge-Centered Systems (TLDKS)
The Journal of Universal Computer Science (J. UCS)
Information Systems Journal (ISJ)
Recent Patents on Computer Science