Changes for document Session Big Data
From version 11.1
edited by Catherine Nuel
on 2012/11/14 12:01
To version 12.1
edited by Selvalakshmi R
on 2012/11/16 11:35
Change comment: There is no comment for this version
| Metadata changes | ||
|---|---|---|
| Property | Previous value | New value |
| Document author | Catherine Nuel | Selvalakshmi R |
| Content changes |
|---|
| **Schedule:** 10:00 - 10:15am (previously 12:00 - 12:15am) |
| **Schedule:** Thursday Nov 29, 10:15 - 10:30am (previously 12:15 - 12:30pm) |
| ===== Talend: The Big Challenge of Big Data and Hadoop Integration ===== |
| **Schedule:** Thursday Nov 29, 10:30 - 10:45am (previously 12:30 - 12:45pm) |
| ===== BPMconseil: Using Vanilla to manage Hadoop database ===== |
| **Schedule:** Thursday Nov 29, 10:45 - 11:00am (previously 12:45 - 01:00pm) |
| ===== PKU: Tracking code evolution for open source universe ===== |
| 11:00 - 11:15 : Coffee Break (added) |
| **Speaker:** Minghui Zhou, Peking University |
| **Schedule:** Thursday Nov 29, 02:00 - 02:15pm |
| **Abstract:** The large body of existing OSS artifacts provides abundant material for understanding how code is reused in the open source universe: in particular, which pieces of code are reused most, in what circumstances people reuse code, and so forth. Understanding this process could help with legacy software maintenance and help to explore best practices of software development. Targeting the change history data of thousands of open source projects, we try to answer the following questions. First, how is code reused by other projects? Second, how are code files organized in a project, and how does this organization change over time? To answer these questions, we have to overcome several technical difficulties. For example, because there are different kinds of version control systems (VCSs), it is hard to devise a uniform model that can represent the evolution of the code files stored in them. Also, each VCS may have its own data format, so extracting data from them is a big challenge. Furthermore, using current algorithms and hardware platforms to analyze the version iteration and reuse information of about a billion code files is another challenge. |