FACULTY OF ENGINEERING
DEPARTMENTS OF “INFORMATION SYSTEMS” AND “COMPUTER ENGINEERING”
“SOFTWARE AND COMPUTER MODELING”
UDC: 004.420
COMPARISONS BETWEEN RUBY AND PYTHON
Asem Seysenbay
MSc Student, Suleyman Demirel University, Almaty
asemesboken@yahoo.com
Zhanat Kopbayeva, MSc Student, Suleyman Demirel University, Almaty
kpb_janat@mail.ru
Abstract
Web development is very popular today. Object-oriented programming (OOP) languages have made it
enjoyable to design overall architectures and functionality and to build software that is easy to use.
Scripting languages have not lost popularity in the process; in their own field they have also progressed,
making web development interesting for programmers. Their popularity remains stable because they are
compatible with, and easy to use alongside, other languages. With this in mind, this paper compares basic
constructs that exist in both Ruby and Python. Python was already popular, but the arrival of Ruby
started many discussions among developers. This work gives a basic overview to help you decide which
language to choose. It assumes that you already know an OOP language or at least have a basic
understanding of one.
Keywords: programming languages, script, Python, Ruby
This paper presents a comparison of two scripting languages, Ruby and Python. This overview
should help the reader make further progress in getting acquainted with these two languages.
Executing statements
A statement example in Python is shown in Table 1, and its output in Figure 1.1 [1]:
Table 1 Statement in Python
Code line
from time import time    # timer function from the standard library
t1=time()                # start time in seconds
print("Hello World!")
print(time()-t1)         # elapsed time in seconds
Figure 1.1 Output in Python
The same example in Ruby:
Table 2 Statement in Ruby
Code line
t1=Time.now
puts "Hello World!"
puts Time.now-t1
Figure 1.2. Output in Ruby
Comparison:
Code lines: Python: 4 lines, Ruby: 3 lines
Execution time: Python's time is less than Ruby's
Note: The import statement accounts for Python's extra line. Because Ruby needs no such statement
to obtain the current time, Rubyists claim that it is fully object-oriented.
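A single wall-clock measurement of one print call is noisy, so a comparison of this kind is only indicative. As a minimal sketch (not the measurement procedure used in this paper), the Python side could be timed over many repetitions with the standard timeit module. The helper names hello and best_time are hypothetical, and the output is captured so that terminal speed does not dominate the result.

```python
import contextlib
import io
import timeit


def hello():
    """Print the greeting into an in-memory buffer instead of the terminal."""
    with contextlib.redirect_stdout(io.StringIO()):
        print("Hello World!")


def best_time(func, repeat=5, number=1000):
    """Return the best average time per call of func, in seconds.

    timeit.repeat runs `number` calls `repeat` times; taking the minimum
    reduces the influence of other processes on the measurement.
    """
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number


if __name__ == "__main__":
    print("best time per call: %.9f s" % best_time(hello))
```

An equivalent repeated measurement would be needed on the Ruby side before drawing conclusions about relative speed.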
Method / Function defining
A method and function example in Python is shown in Table 3, and its output in Figure 2.1 [2]:
Table 3 Method and function in Python
Code line
def sum(n1,n2):          # note: this name shadows Python's built-in sum
    return n1+n2
from time import time
t1=time()
print(sum(3,4))
print(time()-t1)
t1=time()
print(sum("Hello ","World!"))
print(time()-t1)
Figure 2.1. Output in Python
Ruby defining:
Table 4 Method and function in Ruby
Code line
def sum(n1,n2)
  n1+n2
end
t1=Time.now
puts sum(3,4)
puts Time.now-t1
t1=Time.now
puts sum("Hello ","World!")
puts Time.now-t1
Figure 2.2. Output in Ruby
Comparison:
Code lines for defining the method: Python: 2 lines, Ruby: 3 lines
Time: Python's time is less than Ruby's
Note: Ruby needs an additional end statement to close the definition, and Python needs the return
keyword for the definition to behave like a function rather than a procedure. In Python, a function with
no return statement, or with a bare return, yields None. In Ruby, every method call returns a value
(although no rule says you have to use that value): the value of a method is the value of the last
statement executed during the method’s execution. Because Ruby treats everything as an object, calling
the method with nil as the first argument raises an undefined method error for NilClass; calling the
Python function with None likewise raises an error (a TypeError).
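The behaviour described in the note can be checked on the Python side with a short sketch. The function names add, add_without_return and describe_failure are hypothetical and chosen only for illustration; add is used instead of sum to avoid shadowing the built-in.

```python
def add(n1, n2):
    """Return the sum (or concatenation) of the two arguments."""
    return n1 + n2


def add_without_return(n1, n2):
    """Compute the sum but never return it, so the call yields None."""
    n1 + n2


def describe_failure(func, *args):
    """Call func(*args); return the exception class name, or None on success."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(add(3, 4))                          # integer addition
    print(add("Hello ", "World!"))            # string concatenation
    print(add_without_return(3, 4))           # no return statement
    print(describe_failure(add, None, 4))     # None does not support +
```

The same function works for numbers and strings because Python resolves the + operator at run time from the types of its operands.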
IF Statement and Standard input, gets
Python if statement:
Table 5 If statement in Python
Code line
from time import time
x=int(input("Please enter an integer:"))
t1=time()
if x<0:
    x=0
    print('Negative changed to zero')
elif x==0:
    print('Zero')
elif x==1:
    print('One')
else:
    print('More')
print(time()-t1)
Figure 3.1. Output in Python
Ruby IF statement:
Table 6 If statement in Ruby
Code line
print "Please enter an integer:"
x=Integer(gets)
t1=Time.now
if x<0
  x=0
  print("Negative changed to zero\n")
elsif x==0
  print "Zero\n"
elsif x==1
  puts 'One'
else
  puts 'More'
end
puts Time.now-t1
Figure 3.2. Output in Ruby
Comparison:
Code lines for input (gets): Python: 1 line, Ruby: 2 lines
Code lines for the if statement: Python: 9 lines, Ruby: 10 lines
Time: Almost the same
Note: In Python, a colon must follow the header of every compound statement (if, elif, else, def).
Ruby requires no colon, and the parentheses around method arguments are optional as long as no further
operation is applied directly to the method's return value. This also holds for print: its parentheses
may be written or omitted, whether it is called with one argument or with several.
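Both int(input(...)) in Python and Integer(gets) in Ruby raise an error when the user types something that is not an integer. The following is a minimal sketch, under the assumption that the branching logic is moved into a function so it can be tested without keyboard input; classify and parse_integer are hypothetical names, not part of the tables above.

```python
def classify(x):
    """Return the message that the if/elif chain prints for the integer x."""
    if x < 0:
        return "Negative changed to zero"
    elif x == 0:
        return "Zero"
    elif x == 1:
        return "One"
    else:
        return "More"


def parse_integer(text):
    """Convert text to int; return None instead of raising ValueError."""
    try:
        return int(text)
    except ValueError:
        return None


if __name__ == "__main__":
    value = parse_integer(input("Please enter an integer:"))
    if value is None:
        print("Not an integer")
    else:
        print(classify(value))
```

Separating parsing from classification keeps the if/elif chain identical to the table while making invalid input an ordinary case rather than a crash.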
Conclusion
This paper made a basic comparison between two scripting languages, Ruby [2] and Python [1].
This overview should help you make further progress in getting acquainted with these two
languages.
References
[1] Guido van Rossum, Fred L. Drake, Jr., Python Tutorial, Release 3.2.3, September 2012.
[2] Dave Thomas with Chad Fowler and Andy Hunt, Programming Ruby: The Pragmatic Programmers’
Guide, Second Edition.
UDC: 004.325
TEXT BASED DOCUMENT SIMILARITY MEASURE
Shnibekov Zhasulan
ABSTRACT
Do you have a shortage of data? Not very likely. A consequence of the pervasive use of computers is
that most data originate in digital form. If we trade a stock or write a book or buy a product online, these
events evolve electronically. Since so many paper transactions are now in paperless digital form, lots of
“big” data are available for further analysis.
The concept of data mining, finding valuable patterns in data, is an obvious response to the
collection and storage of large volumes of data. Data mining is no longer an emerging technology
awaiting further development. Although its application is far from universal, the techniques of data
mining are highly developed and for some forms of analysis are entering a mature phase.
We would like to say “Give us data and we will find the patterns.”
Unfortunately, data-mining methods expect a highly structured format for data, necessitating
extensive data preparation. Either we have to transform the original data, or the data are supplied in a
highly structured format.
Data-mining methods learn from samples of past experience. If we speak to specialists in predictive
data mining, their data will be in numerical form. These people are the “numbers guys.” The “text
miners” do not expect an orderly series of numbers. They are happy to look at collections of documents,
where the contents are readable and their meaning is obvious.
This is our first distinction between data and text mining: numbers versus text. That doesn’t mean
that these are two distinct concepts. Both are based on samples of past examples. The composition of the
examples is very different, yet many of the learning methods are similar. That’s because the text will be
processed and transformed into a numerical representation.
ANNOTATION
In the modern world there is no such thing as a shortage of data and information. Most data are
represented in digital form because of the pervasive use of computers and computer technologies.
Events such as buying goods online, trading stocks, or publishing a book take place electronically.
Since most paper transactions now take place in electronic form, a lot of “big” data needs
analysis.
The concept of data mining is the search for patterns, which is an obvious response to the
collection and storage of large volumes of data. Data analysis technologies are relevant today and do
not need further development. Although their application is far from universal, data mining methods
are highly developed. Unfortunately, applying these methods requires the data to have a well-structured
format. In most cases the data must be converted into the form that the analysis methods require.
Analysis methods are based on comparing data samples. Specialists work with data that are
represented in numerical form. Most documents, however, are in text form, and their content can
easily be read.
Analysis methods process numerical values. However, text can easily be represented in numerical
form. This is the only difference between text and numbers in data analysis.
SUMMARY
In today's world there is no notion of a shortage of data and information. Because computers and
computer technologies are so widely used, most data are presented in digital form. For example,
buying goods online, trading stocks, and publishing a book all take place electronically. Since most
paper transactions are now paperless, a lot of "big" data needs analysis.
The concept of data mining is the search for patterns. It is an obvious response to the collection
and storage of large volumes of data. Data analysis technologies are relevant today and do not need
refinement. Their application is far from universal, yet data mining methods are highly developed.
Unfortunately, to apply these methods the data must be in a well-structured format. In many cases
the data must be converted into the form that the analysis methods require.
Analysis methods are based on comparing data samples. Specialists work with data presented in
numerical form. Most documents, however, are in text form, so their content is easy to read.
Analysis methods process numerical values. However, converting text into numerical form is
easy. This is the difference between text and numbers in data analysis.
FORMULATION OF A PROBLEM
The goal is a simple desktop application that shows the difference between two text-based documents.
The application will use several similarity measure functions. Measuring the similarity of text-based
documents has received considerable attention in several domains, including information retrieval and
text mining. First, we transform the documents into numerical vectors; then we apply similarity measure
functions to those vectors to measure document similarity in general. Transforming the data into
numerical vectors is the hard part: it involves tokenization, stopword filtering, stemming, and so on,
followed by the calculation of term weights. Once the data are expressed as numerical vectors, any of
the similarity measure functions can be applied.
Tokenization
Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other
meaningful elements called tokens. The list of tokens becomes input for further processing such as
parsing or text mining. Tokenization is useful both in linguistics (where it is a form of text segmentation),
and in computer science, where it forms part of lexical analysis.
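As a minimal sketch of tokenization, assuming that tokens are lowercase runs of letters and digits (with an optional internal apostrophe) and that all other characters are separators: the function name tokenize and the chosen pattern are illustrative assumptions, not the application's actual rules.

```python
import re

# One token: letters/digits, optionally followed by an apostrophe part ("don't").
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)?")


def tokenize(text):
    """Lowercase the text and return its word tokens in order of appearance."""
    return TOKEN_PATTERN.findall(text.lower())


if __name__ == "__main__":
    print(tokenize("Shipment of gold damaged in a fire"))
```

Real tokenizers make further decisions (hyphens, numbers with separators, non-Latin scripts) that this pattern deliberately ignores.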
Stopwords
In computing, stopwords are words which are filtered out prior to, or after, processing of natural
language data (text). For some search engines, these are some of the most common, short function
words, such as the, is, at, which and on. The default English stopword list begins: a, about, above,
after, again, against, all, am, an, and, any, are, etc.
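Stopword filtering can be sketched as a set lookup. The set below contains only the words named in the text; a complete default list is longer, and the names STOPWORDS and remove_stopwords are assumptions made for this example.

```python
# Only the stopwords mentioned in the text; a full default list is longer.
STOPWORDS = frozenset({
    "the", "is", "at", "which", "on",
    "a", "about", "above", "after", "again", "against",
    "all", "am", "an", "and", "any", "are",
})


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return the tokens that are not stopwords, preserving their order."""
    return [token for token in tokens if token not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords(["the", "cat", "is", "on", "a", "mat"]))
```

Using a set keeps each membership test at constant average cost, which matters once documents become long.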
Stemming
In linguistic morphology, stemming is the process for reducing inflected (or sometimes derived)
words to their stem, base or root form – generally a written word form. A stemmer for English, for
example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat",
and "stemmer", "stemming", "stemmed" as based on "stem". A stemming algorithm reduces the words
"fishing", "fished", "fish", and "fisher" to the root word, "fish".
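The idea of suffix stripping can be illustrated with a deliberately naive sketch. This is not the Porter algorithm or any other published stemmer: the suffix list, the minimum stem length of three characters, and the name naive_stem are assumptions chosen so that the "fish" and "cats" examples above come out as described.

```python
# Suffixes are tried in this order; the first one that matches is removed.
SUFFIXES = ("ing", "ed", "er", "s")
MIN_STEM_LENGTH = 3


def naive_stem(word):
    """Strip one common English suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for word in ("fishing", "fished", "fish", "fisher", "cats"):
        print(word, "->", naive_stem(word))
```

The sketch does not undo doubled consonants, so "stemming" becomes "stemm" rather than "stem"; production stemmers add rules for such cases.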
Vector generation
Vector generation (or vector space model) is an algebraic model for representing text documents as
vectors of identifiers.
Documents and queries are represented as vectors.
$d_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$
$q = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$
Each dimension corresponds to a separate term. If a term occurs in the document, its value in the
vector is non-zero.
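IDF (inverse document frequency) is a measure of the general importance of the term:
$\mathrm{idf}_i = \log \frac{|D|}{|\{d : t_i \in d\}|}$
where the logarithm is taken to base 10 in the examples that follow, and
$|D|$ is the total number of documents;
$|\{d : t_i \in d\}|$ is the number of documents where the term $t_i$ appears (that is, $n_{i,j} \neq 0$).
If the term is not in the corpus, this will lead to a division by zero; a common remedy is to use
$1 + |\{d : t_i \in d\}|$ as the denominator.
TF (term frequency) is defined as:
$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$
where $n_{i,j}$ is the number of occurrences of the considered term ($t_i$) in document $d_j$, and the
denominator is the sum of the number of occurrences of all terms in document $d_j$, that is, the size
of the document $|d_j|$.
Finally,
$(\mathrm{tf\text{-}idf})_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$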
Experimental analysis
Consider a document containing 100 words wherein the word cow appears 3 times. The term
frequency (TF) for cow is then (3 / 100) = 0.03. So, TF = 0.03.
Now, assume we have 10 million documents and cow appears in one thousand of these. Then, the
inverse document frequency is calculated as log(10 000 000 / 1 000) = 4. IDF = 4.
The TF-IDF score is the product of these quantities: 0.03 × 4 = 0.12.
TF-IDF = 0.12
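The arithmetic of this example can be reproduced with a few lines of Python. This is a minimal sketch that assumes a base-10 logarithm, as the result IDF = 4 implies; the function names are hypothetical.

```python
import math


def term_frequency(term_count, total_terms):
    """TF: occurrences of the term divided by the document length."""
    return term_count / total_terms


def inverse_document_frequency(num_documents, documents_with_term):
    """IDF: base-10 log of (total documents / documents containing the term)."""
    if documents_with_term <= 0:
        raise ValueError("the term must appear in at least one document")
    return math.log10(num_documents / documents_with_term)


def tf_idf(term_count, total_terms, num_documents, documents_with_term):
    """TF-IDF score: the product of TF and IDF."""
    tf = term_frequency(term_count, total_terms)
    idf = inverse_document_frequency(num_documents, documents_with_term)
    return tf * idf


if __name__ == "__main__":
    # The "cow" example: 3 of 100 words, 1 000 of 10 000 000 documents.
    print(tf_idf(3, 100, 10_000_000, 1_000))
```

Raising ValueError for a term that appears in no document makes the division-by-zero case explicit instead of adjusting the denominator.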
Example 2:
Suppose the query is "gold silver truck". The database collection consists of three documents (D = 3)
with the following content:
D1: "Shipment of gold damaged in a fire"
D2: "Delivery of silver arrived in a silver truck"
D3: "Shipment of gold arrived in a truck"
Table 1 shows the result of the calculation.
Table 1: Term vector model based on w(i) = tf(i)*IDF(i)
Query, Q: “gold silver truck”
D = 3; IDF = log(D/df(i))
          Counts, tf(i)                               Weights, w(i) = tf(i)*IDF(i)
Terms     Q   D1  D2  D3   df(i)  D/df(i)    IDF(i)   Q       D1      D2      D3
a         0   1   1   1    3      3/3=1      0        0       0       0       0
arrived   0   0   1   1    2      3/2=1.5    0.1761   0       0       0.1761  0.1761
damaged   0   1   0   0    1      3/1=3      0.4771   0       0.4771  0       0
delivery  0   0   1   0    1      3/1=3      0.4771   0       0       0.4771  0
fire      0   1   0   0    1      3/1=3      0.4771   0       0.4771  0       0
gold      1   1   0   1    2      3/2=1.5    0.1761   0.1761  0.1761  0       0.1761
in        0   1   1   1    3      3/3=1      0        0       0       0       0
of        0   1   1   1    3      3/3=1      0        0       0       0       0
silver    1   0   2   0    1      3/1=3      0.4771   0.4771  0       0.9542  0
shipment  0   1   0   1    2      3/2=1.5    0.1761   0       0.1761  0       0.1761
truck     1   0   1   1    2      3/2=1.5    0.1761   0.1761  0       0.1761  0.1761
Similarity Analysis
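First, for each document and for the query, we compute the vector lengths (zero terms ignored):
$|D_i| = \sqrt{\sum_j w_{i,j}^2}$
$|Q| = \sqrt{\sum_j w_{Q,j}^2}$
Next, we compute all dot products (zero products ignored):
Q*D1 = 0.1761 * 0.1761 = 0.0310
Q*D2 = 0.4771 * 0.9542 + 0.1761 * 0.1761 = 0.4862
Q*D3 = 0.1761 * 0.1761 + 0.1761 * 0.1761 = 0.0620
$Q \cdot D_i = \sum_j w_{Q,j} w_{i,j}$
Now we calculate the similarity values:
$\cos\theta_{D_1} = \frac{Q \cdot D_1}{|Q| \, |D_1|} = \frac{0.0310}{0.5382 \times 0.7192} = 0.0801$
$\cos\theta_{D_2} = \frac{Q \cdot D_2}{|Q| \, |D_2|} = \frac{0.4862}{0.5382 \times 1.0955} = 0.8246$
$\cos\theta_{D_3} = \frac{Q \cdot D_3}{|Q| \, |D_3|} = \frac{0.0620}{0.5382 \times 0.3522} = 0.3271$
$\cos\theta_{D_i} = \mathrm{Sim}(Q, D_i)$
$\mathrm{Sim}(Q, D_i) = \frac{\sum_j w_{Q,j} w_{i,j}}{\sqrt{\sum_j w_{Q,j}^2} \, \sqrt{\sum_j w_{i,j}^2}}$
Finally, we rank the documents in descending order of similarity:
Rank 1: Doc 2 = 0.8246
Rank 2: Doc 3 = 0.3271
Rank 3: Doc 1 = 0.0801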
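The whole calculation of Example 2 can be sketched in Python. The sketch assumes, as in Table 1, that tf(i) is the raw term count and that IDF uses a base-10 logarithm over the three documents; the names idf_table, weight_vector, cosine_similarity and rank are hypothetical, and the documents are lowercased and split on spaces for simplicity.

```python
import math
from collections import Counter


def idf_table(documents):
    """Map each term to log10(D / df), where df counts documents containing it."""
    total = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log10(total / count) for term, count in df.items()}


def weight_vector(tokens, idf):
    """Sparse vector of weights w = tf * IDF, with tf as the raw count."""
    counts = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, documents):
    """Return (name, similarity) pairs sorted from most to least similar."""
    idf = idf_table(list(documents.values()))
    query_vector = weight_vector(query_tokens, idf)
    scores = {
        name: cosine_similarity(query_vector, weight_vector(tokens, idf))
        for name, tokens in documents.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "D1": "shipment of gold damaged in a fire".split(),
        "D2": "delivery of silver arrived in a silver truck".split(),
        "D3": "shipment of gold arrived in a truck".split(),
    }
    for name, score in rank("gold silver truck".split(), docs):
        print(name, round(score, 4))
```

Run on the three documents, the sketch ranks D2 first, then D3, then D1. Its scores agree with the hand calculation to about three decimal places; the small differences come from the hand calculation rounding intermediate values to four digits.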
CONCLUSION
The application of these algorithms is the optimal solution for the analysis and comparison of text
data.
REFERENCES
[1] Sholom M. Weiss, Nitin Indurkhya, Tong Zhang, Fred J. Damerau, “Text Mining: Predictive
Methods for Analyzing Unstructured Information”, Springer, USA, 2005, pp. 1-2, 15-25, 85-89.
[2] Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, “Introduction to
Information Retrieval”, Cambridge University Press, USA, 2008, pp. 117-119.
[3] Leo Egghe, “New relations between similarity measures for vectors based on vector norms”,
[4] Michel Marie Deza, Elena Deza, “Encyclopedia of Distances”, Springer, Berlin, 2009,
pp. 298-304.
UDC: 004.420
Online Ordering Taxi on Android
Makhymgaliyeva Gulden 4D_04
Nowadays people are very busy and stressed and lead extremely active lives. Sometimes they
need to relax, to just sit and think about how wonderful life is. They also need services from other
people, and reaching an agreement with someone when you are very tired is difficult and annoying.
I propose a solution to this problem: the mobile application Get a Taxi. The application locates the
passenger's position automatically, or it can be set to pick up from one of the user's favorite
locations, e.g. work or home. The application then finds and orders the nearest available taxi and
informs the user of the driver's name and rating and of how much the distance will cost. A map shows
the passenger's position and the position of the taxi and displays the remaining distance and the
estimated time of arrival. Booking and managing rides is quick and easy, saving you time and hassle.
That is a great convenience when you are in a rush: a must-have friend in your pocket, ready when you
need it. The passenger can track the taxi's arrival on the map, including the time of arrival, and view
the driver's profile with picture, name, rating and phone number.
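The text does not say how the nearest available taxi is determined. One plausible approach, shown here purely as an assumption, is to compare great-circle distances computed with the haversine formula. The names haversine_km and nearest_taxi, the taxi identifiers, and the coordinates are all hypothetical; a real service would more likely use road distance or travel time.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_taxi(passenger, taxis):
    """Return (taxi_id, distance_km) for the taxi closest to the passenger.

    passenger is a (lat, lon) pair; taxis maps a taxi id to a (lat, lon) pair.
    Raises ValueError if no taxi is available.
    """
    if not taxis:
        raise ValueError("no taxis available")
    taxi_id = min(taxis, key=lambda tid: haversine_km(*passenger, *taxis[tid]))
    return taxi_id, haversine_km(*passenger, *taxis[taxi_id])


if __name__ == "__main__":
    passenger = (43.2389, 76.8897)  # illustrative coordinates
    taxis = {"taxi-A": (43.2500, 76.9500), "taxi-B": (43.2400, 76.8900)}
    print(nearest_taxi(passenger, taxis))
```

A linear scan over all taxis is adequate for a small fleet; a larger one would call for a spatial index.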
The word “mobile” means capable of changing quickly from one state or condition to another, or
tending to travel and change settlements frequently, e.g. "a highly mobile face". What about a “mobile
application”? Anyone can use it anywhere and carry it along, which makes it one of the most convenient
systems in everyday life. It can be fast, easy to use, time-saving and so on. You can easily save time:
just sit down and choose the application best suited to your task.
Android is developed by Google, so the most important function in my application is the use of
maps, and Google Maps comes to help. It is easy to navigate and has many easy options to print out or
save your maps, as well as place-marks for everyone else to view, such as those for businesses and
landmarks. Google Maps is the best-known map service on the net, offering basic street maps, terrain
maps, satellite images and a hybrid view, which combines the street maps and the satellite images.
Setting up such a service, keeping it running and constantly improving it is a tough job, so let’s take
a look at how Google Maps works. Any company or business can also add place-marks and information
to Google Maps by using Mapplets. Mapplets is a service, similar to the Maps API, which allows you
either to add new features to Google Maps or to overlay information. For example, for my application I
need an agreement with a taxi company.
Beginning with the Android SDK release v1.0, you need to apply for a free Google Maps API key
before you can integrate Google Maps into your Android application. To apply for a key, you need to
follow the series of steps outlined below. You can also refer to Google's detailed documentation on the
process at http://code.google.com/android/toolbox/apis/mapkey.html. By default, Google Maps
displays the map of the United States when it is first loaded. However, you can also set Google Maps
to display a particular location, using the animateTo() method of the MapController class. The final
step is to add the API key to your application. It goes in your application's manifest, contained in
the file AndroidManifest.xml. From there, the Maps API reads the key value and passes it to the Google
Maps server, which then confirms that you have access to Google Maps data. Google Maps is one of the
many applications bundled with the Android platform. In addition to simply using the Maps application,
you can also embed it into your own applications and make it do some very useful things. In this
article, I will show you how to use Google Maps in your Android applications and how to
programmatically perform the following:
1. Change the views of Google Maps
2. Obtain the latitude and longitude of locations in Google Maps
3. Perform geocoding and reverse geocoding
4. Add markers to Google Maps