Portál ZČU - Browse IS/STAG

Browse IS/STAG (S025)

Search for a Thesis

Surname (Maiden name)	Name	Title	Supervisors	Reviewers	Type of thesis	Date of def.
NYKL	Michal	Evaluation of significance based on PageRank variants	Ježek Karel	-	Doctoral thesis	19.04.2016

Thesis info Hodnocení významnosti variantami PageRanku

Basic data

The document you are accessing is protected by copyright law. Unauthorised use may lead to criminal sanctions.
Name	NYKL Michal
Acad. Yr.	2013/2014
Assigning department	KIV
Date of defence	Apr 19, 2016
Type of thesis	Doctoral thesis
Thesis status	Thesis finished and defended successfully (DUO).
Completeness of mandatory entries	- All mandatory fields for this Thesis are filled in.
Main topic	Hodnocení významnosti variantami PageRanku
Main topic in English	Evaluation of significance based on PageRank variants
Title according to student	Hodnocení významnosti variantami PageRanku
English title as given by the student	Evaluation of significance based on PageRank variants
Parallel name	-
Subtitle	-
Supervisor	Ježek Karel, Prof. Ing. CSc.
Annotation	Tato práce se zabývá výzkumem metod pro hodnocení významnosti vrcholů v rozsáhlých grafových strukturách. Navržené metody jsou aplikovány při vyhodnocení citačních sítí a sítí vytvořených z Linked Data. V úvodu práce jsou popsány cíle, které nás k návrhu nových metod vedly. Následně lze text práce pomyslně rozdělit na dvě části, z nichž první a obsáhlejší část je věnována návrhu metod pro hodnocení autorů vědeckých publikací a druhá část je věnována návrhu metody pro určení klíčových slov textového dokumentu. Společnou vlastností všech navržených metod je použitý algoritmus PageRank. V první části práce je nejprve shrnut aktuální stav poznání v oblasti citační analýzy a zmíněny nejznámější bibliografické databáze a algoritmy, které bývají při citační analýze používány. Zvláštní prostor je věnován popisu algoritmu PageRank, který jsme při výzkumu používali a dále upravovali. Následně první část obsahuje popis návrhu nových metod pro hodnocení významnosti autorů a popis experimentálního ověření jejich kvality. Pro experimenty byly použity datové kolekce CiteSeer, DBLP a WoS, přičemž výsledky získané z kolekce WoS byly, vzhledem k jejím vlastnostem, prohlášeny za nejdůvěryhodnější. Poté, co se prokázala vhodnost nově navržených metod pro hodnocení autorů, jsme provedli další experimenty, jejichž cílem bylo metody ještě více vylepšit. Zde se pro hodnocení autorů ukázalo nejvhodnější parametrizovat PageRank aplikovaný na citační síť publikací významností časopisů, ve kterých byly publikace zveřejněny. Vhodnost navržených metod a platnost vyvozených závěrů byly ověřeny také vyhodnocením specializovaných kategorií WoS. V druhé části práce jsou nejprve zmíněny významné práce z oblasti klasifikace textových dokumentů a z oblasti využití PageRanku pro extraktivní sumarizaci obsahu dokumentu. Následně je popsán návrh naší metody pro volbu klíčových slov textového dokumentu. 
Tato metoda využívá PageRank a Linked Data, čímž dokáže určit k textu dokumentu vysoce relevantní klíčová slova, která v textu nemusejí být explicitně uvedena. Kvalita navržené metody byla experimentálně ověřena jejím použitím v klasifikátoru dokumentů, který byl aplikován na dokumenty z kolekce diskusních článků 20 Newsgroups a na dokumenty z vlastní kolekce konferenčních Call-for-Papers. Určená klíčová slova byla použita jako vlastnosti dokumentů. Závěrem bylo, že navržená metoda je vhodná zejména v situacích, kdy máme malé množství dat pro natrénování klasifikátoru. Autorovy vědecké přínosy, které jsou popsány v této práci, byly publikovány formou pěti vědeckých článků, z nichž dva byly zveřejněny v časopisech a tři v konferenčních sbornících.
Annotation in English	This thesis investigates methods for evaluating the significance of nodes in large graph structures. The proposed methods are applied to the evaluation of citation networks and of networks created from Linked Data. The introduction describes the goals that led us to propose the new methods. The text can be divided into two parts: the first and more extensive part is devoted to the design of methods for evaluating authors of scientific publications, and the second part is devoted to the design of a method for determining the keywords of a text document. The common feature of all the proposed methods is the use of the PageRank algorithm. The first part begins with a summary of the current state of knowledge in citation analysis and mentions the best-known bibliographic databases and the algorithms commonly used in citation analysis. A special section is devoted to the description of the PageRank algorithm, which we used and further modified in our research. The first part then describes the design of new methods for evaluating the significance of authors and the experimental verification of their quality. For the experiments, we used the CiteSeer, DBLP and WoS data collections; the results obtained from the WoS collection were declared the most trustworthy because of that collection's characteristics. After the suitability of the newly proposed author evaluation methods had been demonstrated, we performed additional experiments aimed at improving them further. The most appropriate author evaluation method proved to be PageRank applied to the citation network of publications and parameterized by the significance of the journals in which the publications appeared. The suitability of the proposed methods and the validity of the conclusions drawn were also verified by evaluating specialized WoS categories.
In the second part, we first mention significant works in the field of text document classification and in the use of PageRank for extractive summarization of document content. We then describe the design of our method for selecting the keywords of a text document. This method uses PageRank and Linked Data, which allows it to identify keywords that are highly relevant to the document text even if they are not explicitly present in it. The quality of the proposed method was experimentally verified by using it in a document classifier, which was applied to documents from the 20 Newsgroups collection of discussion articles and to documents from our own collection of conference Call-for-Papers. The identified keywords were used as document features. The conclusion is that the proposed method is particularly suitable in situations where only a small amount of data is available for training the classifier. The author's scientific contributions described in this thesis have been published in five scientific articles, two in journals and three in conference proceedings.
Keywords	dolování dat, citační analýza, PageRank, hodnocení autorů, volba vlastností textových dokumentů
Keywords in English	data-mining, citation analysis, PageRank, author evaluation, feature selection for textual documents
Length of the covering note	c, VII, 119
Language	CZ
Research Plan	-
Recommended resources	-
Enclosed appendices	-
Appendices bound in thesis	illustrations, graphs, tables
Taken from the library	Yes
Full text of the thesis
Thesis defence evaluation	Passed
Appendices
Reviewer's report
Supervisor's report
Defence procedure record	-
