Web data mining; Privacy-preserving data mining; Social network analysis; Big data management and intelligent analysis (NewSQL, NoSQL)
Project | Description |
---|---|
Web spam detection | Web spam attempts to influence search engine ranking algorithms in order to boost the rankings of specific Web pages in search results. Some 10%-15% of Web pages have been deliberately contaminated with hidden links and objectionable content. Spamming tricks include injection of advertisements or objectionable content, hidden-link attacks, cloaking, and redirection. The objective of Web spamming is either to profit or to attack. This project will tackle the challenging problems of Web spam by modeling junk Web pages, extracting spam features, analyzing spammed content and URLs, and designing and improving approaches for detecting malicious Web pages. |
Privacy-preserving data mining | In her 1998 paper "Data mining: Staking a claim on your privacy", Ann Cavoukian raised a serious concern: data mining may be the biggest challenge that protectors of individual privacy will face in the next decade. Today, research on and applications of big data are in full swing. How to discover the value of big data without disclosing users' privacy is a key issue in big data mining research. Research topics in this field (PPDM: Privacy-Preserving Data Mining) include:
This research can combine distributed technology with data mining and can address aspects such as data publishing, mining algorithms, and the release of mining rules. Most PPDM methods protect data at the cost of reduced information usability and mining accuracy, so the aim of these approaches is to find a trade-off between accuracy and privacy. |
Web fraud mining | As Web information and applications grow richer and more widespread, fraud attacks have become rampant. New types of fraud and spam have appeared, such as social networking fraud, multimedia spam, and click fraud/spam, whose main purpose is still to profit illegally by cheating. The research currently targets several problems: Twitter spammer discovery; multimedia spam detection; comment spam mining; and click fraud detection. Our focus includes investigating spamming tricks and mechanisms, extracting discriminative features, and developing high-performance detection algorithms. |
Web source quality assessment mechanism | The extremely rich resources of the Web make information acquisition and decision making much easier. However, Web source quality is highly problematic because of the peculiar characteristics of the Web, such as the dynamics and autonomy of Web sources, the enormous amount and variety of Web data, and the multifarious quality requirements of Web applications. These lead to uneven and uncertain information quality and to inferior Web-based planning and strategy making. With the popularization of Wiki sites, Web source quality is becoming an increasing challenge. In this multistage project, we have proposed a Web quality model, WebQM, that captures Web quality features along 3 dimensions. The feasibility and effectiveness of WebQM have been verified by SEM with actually observed data. Based on WebQM, we have developed evaluation approaches for fuzzy environments and implemented a prototype Web quality fuzzy assessment system, on which a sensitivity analysis of the evaluation approach was carried out. Our current work is to model the quality problem of Wiki sources and to modify WebQM and the evaluation mechanism for assessing the content quality of Wiki sites. |
Digitized product design and manufacturing services based on Internet+ | With the implementation of China's 2025 manufacturing plan, Internet+ technology has led to the deep integration of industry and information. The conventional patterns and methods of product design and manufacturing will change. It is therefore important to study and develop new models, tools, and platforms that adapt to this transformation. This project was carried out as follows:
|
High-speed rail big data management system | This project is a key part of the digital simulation platform for high-speed rail. Building the platform raises many problems, including heterogeneous and multi-structured high-speed rail data, a huge volume of data exchange among subsystems, and the efficiency and system independence of data access. This project will develop a data management system that addresses these issues and supports branch and coupling simulation as well as multi-dimensional and multi-level visualization. Specific approaches are designed for multi-source data ETL, loosely coupled data management, and multi-branch data access and fusion. |
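
The trade-off between accuracy and privacy noted in the privacy-preserving data mining project can be illustrated with a small sketch. It assumes one common family of PPDM techniques, additive noise perturbation, and uses Laplace noise on synthetic ages; neither the technique, the data, nor the function names (`laplace_noise`, `perturb`, `mean_abs_error`) come from the project itself. A larger noise scale hides individual values better but makes the released data less accurate.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb(values, scale, rng):
    # Release each value with independent additive noise (hypothetical helper).
    return [v + laplace_noise(scale, rng) for v in values]

def mean_abs_error(original, released):
    # Average per-record distortion: a simple proxy for lost accuracy.
    diffs = [abs(a - b) for a, b in zip(original, released)]
    return sum(diffs) / len(diffs)

rng = random.Random(0)
# Synthetic data for illustration only.
ages = [rng.randint(18, 80) for _ in range(2000)]

errors = {}
for scale in (1.0, 5.0, 25.0):
    released = perturb(ages, scale, rng)
    errors[scale] = mean_abs_error(ages, released)
    print(f"noise scale {scale:5.1f} -> mean absolute error {errors[scale]:.2f}")
```

In this sketch the distortion grows roughly in proportion to the noise scale, which is the cost in usability that a PPDM method has to balance against the protection it offers.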