Adatrosta / Data Riddle

Full name: Data Riddle - Analyzing of Largescale Database of Web Datawarehouses by Datamining and Statistical Tools
Department: Data Mining and Search Group
Start date: 2002. 11. 01.
End date: 2005. 05. 31.
External identifier: NKFP-2/0017/2002

Project manager

András Benczúr
Address: 1111 Budapest, Lágymányosi u. 11.
Room number: L 412
Phone: +36 1 279 6172
Fax: +36 1 209 5269
E-mail: benczur@sztaki.mta.hu
Homepage: http://datamining.sztaki.hu/

András Lukács

Participants

ELTE, BME, MTA SZTAKI, T-Online (Axelero), econet.hu

Description

The great promise of the digital economy is that a rich store of information on customer behavior is available, enabling far more accurate and efficient planning than was previously possible. Companies and organizations present on the web can learn about their customers by analyzing usage logs. These analyses yield statistics on how many people visited their sites; by mapping the most frequent access routes and applying various monitoring techniques, they can also help identify profiles of users with similar interests and consumer behavior.
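As a rough illustration of what "mapping the most frequent access routes" can mean in practice, the sketch below counts contiguous page sequences across visitor sessions. The session data, the function name `frequent_routes` and its parameters are hypothetical; the project's actual method is not described here.

```python
from collections import Counter


def frequent_routes(sessions, length=2, top=3):
    """Return the most common contiguous page sequences of a given length."""
    counts = Counter()
    for pages in sessions:
        # Slide a window of `length` pages over each session.
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts.most_common(top)


# Hypothetical sessions: each list holds the pages one visitor requested, in order.
sessions = [
    ["/", "/news", "/sport"],
    ["/", "/news", "/weather"],
    ["/mail", "/", "/news"],
]

print(frequent_routes(sessions, length=2, top=1))
# → [(('/', '/news'), 3)]
```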

The aim of this application is to present a system suitable for analyzing log files. The quantity of data to be analyzed at the Internet service provider members of the consortium exceeds the volume that commercial analyzer software can handle, so custom modules need to be developed. These modules feature statistical methods; refined data mining algorithms capable of handling billions of records; database size reduction procedures using random sampling and algebraic solutions; episode mining and Fourier analysis to identify sequential patterns and recurring time periods; and spectral decomposition, discriminant analysis and graph theory for the cluster analysis of usage patterns.
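The description mentions random sampling as a way to reduce database size but does not name a specific procedure. One common choice for logs too large to hold in memory is reservoir sampling, which keeps a uniform sample of fixed size while reading the data once. The following is a minimal sketch under that assumption; `reservoir_sample` and the generated log lines are illustrative names, not part of the project.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Keep a uniform random sample of k records from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Record i replaces a stored record with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


# Hypothetical log stream, generated lazily so it never sits in memory at once.
log_lines = (f"request {n}" for n in range(100_000))
sample = reservoir_sample(log_lines, k=1000, seed=0)
print(len(sample))
# → 1000
```

Because the sample size is fixed in advance, memory use stays constant regardless of how many records the log contains.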

We pay special attention to two key development issues. First, we build and test the analyzer software on various platforms to ensure it is architecture-independent. Second, we base the analysis on anonymous user identifiers to protect personal data.
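How the anonymous user identifiers are produced is not stated. One plausible approach is to replace each real identifier with a keyed hash before analysis, so that the same user always maps to the same token while the original value cannot be read from the logs. The sketch below assumes that approach; `anonymous_id`, the key and the sample identifier are hypothetical.

```python
import hashlib
import hmac


def anonymous_id(user_id: str, secret_key: bytes) -> str:
    """Map a real identifier to a stable token using HMAC-SHA256."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    # Truncate for readability; a longer prefix lowers the chance of collisions.
    return digest.hexdigest()[:16]


# Hypothetical key; in practice it would be stored separately from the logs.
key = b"replace-with-a-long-random-secret"

token = anonymous_id("subscriber-42", key)
# The same input and key always yield the same token, so sessions can still be linked.
print(token == anonymous_id("subscriber-42", key))
# → True
```

Using a keyed hash rather than a plain hash means that someone without the key cannot rebuild the mapping simply by hashing a list of known identifiers.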