Developing a Text Information Retrieval System "project for college"
$30-250 USD
Completed
Posted about 9 years ago
Paid on delivery
Introduction
Information retrieval is the process of extracting useful information from data. In the
current era, text constitutes an important form of data. This includes web pages, emails,
SMS messages and several other types of text documents.
Text documents need to be represented in an appropriate format (usually in the form
of vectors of numbers) in order to be used for further processing. Once properly represented,
text documents can be used for various tasks such as classification, for instance,
deciding whether an email is spam, or search, for example, deciding whether two web
pages have similar content.
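As an illustration of what "vectors of numbers" and "similar content" could mean in practice, here is a minimal sketch in Python. It assumes a term-count (bag-of-words) representation and cosine similarity; neither choice is stated in the project description, and the function names `term_counts` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def term_counts(words):
    """Represent a document (a list of words) as a sparse term-count vector.

    Assumption: a plain bag-of-words count; the description does not say
    which vector representation is expected.
    """
    return Counter(words)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (assumed measure)."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Two documents with the same word counts score close to 1, and documents with no words in common score 0.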
Before representing documents as numbers, however, they must be preprocessed. Text
preprocessing is the task of removing unnecessary information from the text. This is
achieved through several steps, which are summarized below:
1. Initial preprocessing: The goal of this step is to "clean up" the document and
prepare it for the remaining tasks. The different tasks conducted in this step are:
(a) Replace tab, carriage return and newline characters with a space.
(b) Remove all non-letter characters: turn punctuation, numbers, etc. into spaces.
(c) Switch all letters to lowercase.
(d) Replace multiple consecutive spaces with a single space.
(e) Remove words that are shorter than 3 characters. For example, remove
"an" but keep "him".
2. Stop words removal: Some words such as "a", "the", "and" are very common in
English and should be removed from the text in order to only leave useful words.
This task is simply done by removing any word that appears in a predefined list of
stop words.
3. Stemming: The same word can take different forms depending on its role and
position in the sentence.
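Steps 1 and 2 above could be sketched roughly as follows in Python. This is only an illustration under a few assumptions: the text is English using ASCII letters, the `STOP_WORDS` set is a small hypothetical stand-in for the predefined list (which is not included in the description), and the function names are invented for the example. Stemming is left out because its description is cut off above.

```python
import re

# Hypothetical stop-word list; the real predefined list is not given here.
STOP_WORDS = {"the", "and", "for", "are", "was", "that", "this", "with"}


def initial_preprocess(text):
    """Apply steps 1(a)-(e) and return the remaining words as a list."""
    # (a) Replace tab, carriage return and newline characters with a space.
    text = re.sub(r"[\t\r\n]", " ", text)
    # (b) Turn every non-letter character into a space (ASCII letters assumed).
    text = re.sub(r"[^A-Za-z ]", " ", text)
    # (c) Switch all letters to lowercase.
    text = text.lower()
    # (d) Collapse runs of spaces into one; stripping the ends is an extra
    #     convenience not listed in the steps.
    text = re.sub(r" +", " ", text).strip()
    # (e) Drop words shorter than 3 characters ("an" goes, "him" stays).
    return [word for word in text.split(" ") if len(word) >= 3]


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Step 2: drop every word that appears in the stop-word list."""
    return [word for word in words if word not in stop_words]
```

For example, `initial_preprocess("An email\tfrom HIM,  sent 2 days ago!")` keeps "him" but drops "an" and the digit, and `remove_stop_words` then filters the result against whatever list is supplied.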
Hello
I am a Java expert and am interested in this project. I have reviewed your requirements and am confident I can handle this project well. Please get in touch to discuss further.
Regards
Anshu
$54 USD in 1 day
4.7 (319 reviews)
7.2
6 freelancers are bidding on average $71 USD for this job
Greetings! I passed the Information Retrieval course with a 4.0, and I have already completed a similar task as an assignment using the Java Lucene library. If you are allowed to use a Java library, I can complete this task quickly and efficiently. Do let me know, thank you.
Hello. This is not really a complicated task. I can do it just because I need to improve my freelancer reputation. So if you are interested in a quick solution - let me know.