Figure 1 - Structure of a terminological database entry (fields: database entry, Russian translation, part of speech, homonymy, normativity, characteristic 1, characteristic 2, characteristic 3).
Table 1 - Databases of the Bashkir language terminological bank, by GRNTI rubricator

Name                                             GRNTI code
Astronomy                                        41
Biology                                          34
Domestic trade. Tourism and excursion services   71
Military science                                 78
Geography                                        39
Informatics                                      20
Art                                              18
Culture and cultural studies                     13
Literary studies                                 17
Mathematics                                      27
Medicine and health care                         76
Public education and pedagogy                    14
Environmental protection and ecology             87
Politics and political science                   11
Psychology                                       15
Agriculture and forestry                         68
Sociology                                        04
Construction and architecture                    67
Engineering                                      81
Physics                                          29
Physical culture and sport                       77
Philosophy                                       02
Chemistry                                        31
Economics. Economic sciences                     06
State and law. Legal sciences                    10
Linguistics                                      16
The basic unit of registration in the database is the terminological word. All types of term variants are subject to registration, including:
- orthographic variants, such as тоннель — туннель, ноль — нуль;
- word-formation variants, for example: бандажировщик — бандажник — бандажсы, дерматофития — дерматофитоз — фитодерматоз, комбайнер — комбайнсы;
- lexico-syntactic variants, for example: бурение взрывом — взрывное бурение — бурение методом взрыва, кожные болезни — тире ауырыуҙары, болезни кожи — тире ауырыуҙары.
The database records the following parameters for each entry (a schematic sketch of such an entry is given below):
- term translation (Bashkir, Russian),
- part of speech,
- homonymy,
- normativity,
- characteristic 1,
- characteristic 2,
- characteristic 3.
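For illustration only, the entry layout just listed might be rendered as a C structure like the following; the type and field names are our own sketch, not the project's actual schema.

```c
/* Hypothetical sketch of one terminological database entry.
   Tag sets follow Table 2 and the characteristic fields described
   below; all names are illustrative, not the real schema. */
typedef enum { N, A, V, ADV, NUM, PRON, POST, CONJ, PART, INTJ } Pos;
typedef enum { NORM_ST, NORM_R, NORM_P } Normativity; /* standardized/recommended/permitted */

typedef struct {
    const char *bashkir;   /* the term itself */
    const char *russian;   /* Russian translation */
    Pos         pos;       /* part of speech */
    int         homonym;   /* 0 = unambiguous, 1..n numbers homonyms */
    Normativity norm;
    char        char1;     /* 's' simple word, 'd' decomposite */
    char        char2;     /* 'r' root, 'd' derivative, 'c' composite (simple terms only) */
    const char *char3;     /* component pattern of compound terms, e.g. "A+N" */
} TermEntry;
```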
The part-of-speech field uses a set of tags that follows as closely as possible the tag system proposed in the Leipzig Glossing Rules (Table 2).
Table 2 - Tags adopted for the Bashkir terminological database

noun            N
adjective       A
verb            V
adverb          ADV
numeral         NUM
pronoun         PRON
postposition    POST
conjunction     CONJ
particle        PART
interjection    INTJ
The characteristic 1 field specifies whether a term is simple (s `simple word`) or compound (d `decomposite`). For example:
дәлил `argument` s,
ассоциативлыҡ `associativity` s,
сикһеҙлек `infinity` s,
дәүмәл `quantity` s,
абсолют хаталыҡ `absolute error` d,
сикһеҙ эҙмә-эҙлелек `infinite sequence` d,
периодик булмаған сикһеҙ унарлы кәсер `infinite non-periodic decimal fraction` d,
күп ҡырлы йөҙҙөң түбәһе `vertex of a polyhedral surface` d.
Characteristic 2 classifies simple terms as root (r `root`), derived (d `derivative`) or compound (c `composite`). For example:
тулы `full` r,
аксиома `axiom` r,
дәүмәл `quantity` r,
ихтимал `probable` r,
ассоциативлыҡ `associativity` d,
сикһеҙ `infinite` d,
тармаҡланыу `branching` d,
батынҡы `concave` d,
үҙ-ара `mutually` c,
эҙмә-эҙлелек `sequence` c,
һигеҙҡыр `octahedron` c,
күпмөйөш `polygon` c.
The characteristic 3 field specifies the component structure of compound terms, for example: noun + noun, noun + adjective, noun + verb, etc.
Figure 2 - Example of search output for the Russian translation of a term.
Example:
аҡ нур — A (adjective) + N (noun),
күп ҡырлы йөҙ — ADV (adverb) + A (adjective) + N (noun),
тура һыҙыҡтарҙың үҙ-ара торошо — A (adjective) + N (noun) + A (adjective) + N (noun),
артығы менән алынған һан — ADV (adverb) + POST (postposition) + V (verb) + N (noun).
The normativity field includes characteristics such as standardized (st `standardized`), recommended (r `recommended`) and permitted (p `permitted`). For example:
шартһыҙ `unconditional` st,
вектор `vector` st,
тармаҡланыу `branching` st,
тулы `absolute` p,
операционный `операция` p,
операционный `операциялар` p,
тигеҙлек `equality` p,
тиңлек `equality` p.
To date we have compiled a general terminological database of Russian and Bashkir terms with a total volume of more than 200 thousand units. A program has been developed that performs flexible term search and returns all the characteristics of a term; a minimal sketch of such a lookup is given below. Population of the information fields of the database is in progress.
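The paper does not describe the search program's internals; as a sketch under that caveat, flexible search over such entries can be reduced to a linear scan with a caller-supplied predicate, reusing the hypothetical TermEntry structure above.

```c
#include <stdio.h>
#include <string.h>

typedef int (*TermPredicate)(const TermEntry *entry, const void *key);

/* Print every entry matching an arbitrary predicate, so the same
   routine serves search by translation, POS, normativity, etc. */
static void find_terms(const TermEntry *db, size_t n,
                       TermPredicate match, const void *key) {
    for (size_t i = 0; i < n; i++)
        if (match(&db[i], key))
            printf("%s - %s\n", db[i].bashkir, db[i].russian);
}

/* Example predicate: exact match on the Russian translation,
   reproducing the kind of output shown in Figure 2. */
static int by_russian(const TermEntry *e, const void *key) {
    return strcmp(e->russian, (const char *)key) == 0;
}
```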
SECTION 1
ARTIFICIAL INTELLIGENCE
UDC 004
PALTASHEV T.¹, PERMINOV I.²
CHALLENGES FOR GRAPHICS AND HETEROGENEOUS ARCHITECTURES: APPLICATIONS AND TECHNOLOGY
(¹Graphics IP Engineering, Advanced Micro Devices, Sunnyvale, California, U.S.A.;
²National Research University ITMO, St. Petersburg, Russian Federation)
Abstract
The first part of this STAR (state-of-the-art report) presents a comprehensive overview of the demands of popular applications and their mapping onto the graphics and computing architectures of recent Graphics Processing Units (GPUs) and Accelerated Processing Units (APUs). Progress in Heterogeneous System Architecture (HSA) and the merging of multicore CPUs with GPU cores in the AMD product line are analyzed from a potential user's point of view. Semiconductor technology progress and power-reduction challenges, together with their influence on the evolution of graphics and compute architectures, are considered as well. In the second part we present a comprehensive overview of HSA architecture principles and demonstrate, as an example, the mapping of the ASTC texture compression algorithm onto a modern GPU/APU architecture and the OpenCL-HSA software stack. Performance measurements show significant improvement with the applied algorithm adjustments and modifications.
Keywords: GPU, CPU, APU, graphics architecture, heterogeneous architecture, compute architecture, texture compression, ASTC.
1. KEY INDUSTRY CHALLENGES FOR GRAPHICS ARCHITECTURE IN 2013-2017
Progress in graphics and computing architecture has several inflection points rooted in different industry branches, as media content creation and the mass entertainment industry merge with the communication and computing domains. We may briefly outline the programming platforms and application programming interfaces (APIs) that directly influence the graphics and compute capabilities of modern hardware. Implementation of certain requested functionality depends heavily on semiconductor technology progress as well as on general advances in computation technology. Reducing the power budget for the same application workload is considered one of the critical features of all new designs, across the whole range from handheld mobile computing to supercomputing in data centers. The extremely high cost of semiconductor manufacturing at small nodes of 14 nm and below requires new system-architecture approaches using multichip configurations.
A very important influence also comes from independent software vendors (ISVs) developing game engines and visual computing applications. Game and movie content creators constantly ask for new visual effects and processing capabilities implemented at both the software and hardware levels. A brief overview of platform and technology development is listed below.
Platforms and APIs:
- OpenGL ES 3.0, OpenGL 4.4, Mantle (AMD), OpenGL (common)
- Windows 8.1 with DirectX 11.2, Windows 2015 "Threshold" with DirectX 12 and SVM-lite support, Windows 2017 with DirectX 13 and full SVM support
- OpenCL 2.0 (2014-15), OpenCL 2.1 (2015-16) and OpenCL 3.0 (2016-17)
Major technology trends:
- Interposer technology, including advanced semiconductor and system packaging with interposing on silicon (as well as organic and glass)
- Use of HBM (High Bandwidth Memory) in GPUs, CPUs and APUs
- Sequential transitions to 20 nm (2014), 14 nm (2016) and 10 nm (2017+) semiconductor manufacturing processes at foundries
- Virtual page migration (2014-15), low-power HSA-based DSPs (2015)
- Chiplets (tiny chips) combined in a multichip module (MCM)
Major CPU/GPU and SoC vendors develop their product roadmaps in response to these platform and technology challenges.
AMD's response in its product line (public information limited to 2015):
- Discrete GPUs: Bonaire and Hainan (28 nm/2013), Hawaii (28 nm/2013-2014), Tonga and Iceland (28 nm/2014), Bermuda and Fiji (28 nm/2015)
- APUs: Kabini (28 nm/2013), Kaveri and Mullins (28 nm/2014), Carrizo (28 nm/2015) and Amur/Nolan (20 nm/2015).
Intel's response in its product line (public information limited to 2016):
- CPU-GPU SoCs: Haswell and Silvermont (22 nm/2013), Broadwell and Airmont (14 nm/2014), Braswell and Goldmont (14 nm/2015), Cannonlake (10 nm/2016).
Nvidia's response in its product line:
- Discrete GPUs: Kepler II (GK11x) (28 nm/2013), Maxwell (28 nm/2014) and project Denver with a custom 64-bit ARM core (28 nm/2014)
- Mobile SoCs and application processors: Logan (28 nm/2013), Tegra K1 (28 nm/2014) and Tegra M1 (20 nm/2015).
2. SOFTWARE VENDORS' VISIBLE CHALLENGES
Graphics ISVs traditionally have their own set of requests and challenges which may enable new applications. We may consider the following list, which can be extended at any time with new ideas:
1. Virtual reality (VR) holographic rendering for head-mounted displays (HMDs).
2. Global illumination rendering in real time.
3. Decoupled shading to process highly detailed scenes.
4. Combined object + texture space memory hierarchy.
VR for HMDs attracts a lot of attention, as with the Oculus Rift and Valve's HMD products. These displays impose significantly higher requirements on processing speed due to the zero-latency-tolerance problem: head movement demands a smooth, soft image update, visible to the eye without motion artifacts. This requires higher refresh rates and higher-resolution stereo image generation than existing game consoles provide. In addition, it requires image warping and post-rendering to account for the simulated lens optics. The next few years will be spent finding optimized solutions for HMD VR image generation. This may require significant growth in GPU computational power, considering a high-resolution frame rate doubled or tripled versus the latest game consoles.
Global illumination is one of the favorite applications that researchers have mapped to GPUs since they became more programmable in the mid-2000s. Traditional local illumination shading and texturing algorithms implemented in popular game titles have served as the anchor for architecture optimization, while leaving substantial general-computing capability to support global illumination models. We may group target applications by their influence on architecture specifications.
First group, with a dense stream compute pattern (more suitable for GPU cores):
- 3D graphics in games and engineering
- High-performance libraries for compute problems suitable for GPU acceleration
Second group, with a sparse compute pattern (more suitable for a CPU/latency-optimized system):
- Compiled OpenCL/C++ code for sparse problems
- Ray tracers for global illumination
- Other Khronos group platforms and the applications based on them
Third group, with a special signal and image processing pattern (more suitable for DSP cores + fixed-function blocks):
- Media processing
Such application groups pull architecture trends in different directions and challenge designers to create computing machines that can fulfill several requirements at once. Some of these requirements may be quite contradictory and raise several architecture optimization issues. Modern complex systems-on-chip (SoC) with different types of processing cores could be potential platforms. But simply putting multiple cores together on a piece of silicon does not solve the programmability problem; it even makes the problem worse. The new concept of Heterogeneous System Architecture (HSA) could solve the problem of creating multipurpose and power/cost-efficient computing machines.
3. INTRODUCTION TO HETEROGENEOUS SYSTEM ARCHITECTURE (HSA)
HSA is a new hardware architecture that integrates heterogeneous processing elements into a
coherent processing environment. Coherent processing as a technique ensures that multiple
processors see a consistent view of memory, even when values in memory may be updated
independently by any of those processors. Memory coherency has been taken for granted in
homogeneous multiprocessor and multi-core systems for decades, but allowing heterogeneous
processors (CPUs, GPUs and DSPs) to maintain coherency in a shared memory environment is a
revolutionary concept. Ensuring this coherency poses difficult architectural and implementation
challenges, but delivers huge payoffs in terms of software development, performance and power.
The ability for CPUs, DSPs and GPUs to work on data in coherent shared memory eliminates copy
operations and saves both time and energy. The programs running on a CPU can hand work off to a
GPU or DSP as easily as to other programs on the same CPU; they just provide pointers to the data
in the memory shared by all three processors and update a few queues. Without HSA, CPU-resident
programs must bundle up data to be processed and make input-output (I/O) requests to transfer that
data via device drivers that coordinate with the GPU or DSP hardware. HSA allows developers to
write software without paying much attention to the processor hardware available on the target
system configuration, with or without a GPU, DSP, video hardware or other types of specialized compute accelerators.
Fig. 1 depicts a generic HSA APU with multiple CPU cores and accelerated compute units (CUs), which may be of any type.
Figure 1 - Generic HSA Accelerated Processing Unit (APU)
4. HSA OVERVIEW
Essential HSA features include:
- Full programming language support
- User Mode Queueing
- Heterogeneous Unified Memory Access (hUMA)
- Pageable memory
- Bidirectional coherency
- Compute context switch and preemption
Shared page table support. To simplify OS and user software, HSA allows a single set of
page table entries to be shared between CPUs and CUs. This allows units of both types to access
memory through the same virtual address. The system is further simplified in that the operating
system only needs to manage one set of page tables. This enables Shared Virtual Memory (SVM)
semantics between CPU and CU.
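Outside HSA-specific tooling, the same SVM semantics are exposed through OpenCL 2.0. The sketch below uses the real OpenCL 2.0 entry points clSVMAlloc, clSetKernelArgSVMPointer and clSVMFree; the context, queue and kernel are assumed to have been created earlier, and error checking is omitted for brevity.

```c
#include <CL/cl.h>

/* CPU and kernel operate on one allocation at one virtual address:
   no clEnqueueWriteBuffer copy, no separate GPU address space. */
void run_on_shared_buffer(cl_context ctx, cl_command_queue queue,
                          cl_kernel kernel, size_t n)
{
    /* Fine-grained SVM lets the host touch the buffer directly,
       without map/unmap calls (device support permitting). */
    float *data = (float *)clSVMAlloc(
        ctx, CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
        n * sizeof(float), 0);

    for (size_t i = 0; i < n; i++)    /* producer side: plain CPU stores */
        data[i] = (float)i;

    clSetKernelArgSVMPointer(kernel, 0, data);  /* consumer gets the same pointer */
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(queue);                  /* data[] now holds the kernel's results */

    clSVMFree(ctx, data);
}
```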
Page faulting. Operating systems allow user processes to access more memory than is
physically addressable by paging memory to and from disk. Early CU hardware only allowed
access to pinned memory, meaning that the driver invoked an OS call to prevent the memory from
being paged out. In addition, the OS and driver had to create and manage a separate virtual address
space for the CU to use. HSA removes the burdens of pinned memory and separate virtual address management by allowing compute units to page fault and to use the same large address space as the CPU.
User-level command queuing. Time spent waiting for OS kernel services was often a major
performance bottleneck in prior throughput computing systems. HSA drastically reduces the time to
dispatch work to the CU by enabling a dispatch queue per application and by allowing user-mode processes to dispatch directly into those queues, requiring no OS kernel transitions or services. This
makes the full performance of the platform available to the programmer, minimizing software
driver overheads.
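In the HSA runtime C API this corresponds roughly to the sketch below; hsa_queue_create and HSA_AGENT_INFO_QUEUE_MAX_SIZE are from the published HSA runtime specification, and the GPU agent is assumed to have been discovered earlier via hsa_iterate_agents.

```c
#include <hsa/hsa.h>
#include <stdint.h>

/* Create a user-mode dispatch queue on a kernel agent. Once created,
   the application writes AQL packets into it directly; no OS kernel
   transition is involved in dispatching work. */
hsa_status_t make_dispatch_queue(hsa_agent_t gpu, hsa_queue_t **queue)
{
    uint32_t size;
    hsa_agent_get_info(gpu, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &size);

    return hsa_queue_create(gpu, size, HSA_QUEUE_TYPE_MULTI,
                            NULL, NULL,             /* no error callback */
                            UINT32_MAX, UINT32_MAX, /* no segment-size limits */
                            queue);
}
```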
Hardware scheduling. HSA provides a mechanism whereby the CU engine hardware can
switch between application dispatch queues automatically, without requiring OS intervention on
each switch. The OS scheduler is able to define every aspect of the switching sequence and still maintain control. Hardware scheduling is faster and consumes less power.
Coherent memory regions. In traditional GPU devices, even when the CPU and GPU are
using the same system memory region, the GPU uses a separate address space from the CPU, and
the graphics driver must flush and invalidate GPU caches at required intervals in order for the CPU
and GPU to share results. HSA embraces a fully coherent shared memory model, with unified
addressing. This provides programmers with the same coherent memory model that they enjoy on
SMP CPU systems. This enables developers to write applications that closely couple CPU and GPU
CU codes in popular design patterns like producer-consumer. The coherent memory heap is the
default heap on HSA and is always present. Implementations may also provide a non-coherent heap for advanced programmers to request when they know there is no sharing between processor types.
The HSA platform is designed to support high-level parallel programming languages and
models, including C++ AMP, C++, C#, OpenCL, OpenMP, Java and Python, as well as a few others.
HSA-aware tools generate program binaries that can execute on HSA-enabled systems supporting
multiple instruction sets (typically, one for the CPU-type CU and one for the GPU/DSP type CU)
and also can run on existing architectures without HSA support.
Program binaries that can run on both CPUs and CUs contain CPU ISA (Instruction Set Architecture) code for the CPU and HSA Intermediate Language (HSAIL) code for the CU. A finalizer
converts HSAIL to CU ISA. The finalizer is typically lightweight and may run at install time,
compile time, or program execution time, depending on choices made by the platform
implementation.
An example HSA architecture platform is depicted in Figure 2.
Figure 2 - HSA architecture example platform.
(The example platform combines CPU and GPU cores, an audio processor, video hardware, a DSP, an image signal processor and fixed-function encode/decode accelerators over shared memory coherency and user-mode queues.)
5. HSA IMPLEMENTATION AND CONCEPTS
Unified Programming Model. General computing on GPUs has progressed in recent years from graphics shader-based programming to more modern APIs like DirectCompute and OpenCL™. While this progression is definitely a step forward, the programmer still must explicitly copy data across address spaces, effectively treating the GPU as a remote processor.
Task programming APIs like Microsoft's ConcRT, Intel's Threading Building Blocks, and Apple's Grand Central Dispatch are recent innovations in parallel programming. They provide an easy-to-use task-based programming interface, but only on the CPU. Thrust from NVIDIA provides a similar solution on the GPU.
HSA moves the programming bar further, enabling solutions for task parallel and data parallel
workloads as well as for sequential workloads. Programs are implemented in a single programming
environment and executed on systems containing both CPUs and CUs.
HSA provides a programming interface containing queue and notification functions. This
interface allows devices to access load-balancing and device-scaling facilities provided by the
higher-level task queuing library. The overall goal is to allow developers to leverage both CPU and
CU devices by writing in task-parallel languages, like the ones they use today for multicore CPU
systems. HSA's goal is to support existing task- and data-parallel languages and APIs and to enable their natural evolution without requiring the programmer to learn a new HSA-specific programming
language. The programmer is not tied to a single language, but rather has available a world of
possibilities that can be leveraged from the ecosystem.
Queuing. HSA devices communicate with one another using queues. Queues are an integral
part of the HSA architecture. CPUs already send compute requests to each other through queues in popular task-queuing runtimes like ConcRT and Threading Building Blocks. With HSA, both
CPUs and CUs can queue tasks to each other and to themselves.
The HSA runtime performs all queue allocation and destruction. Once an HSA queue is
created, the programmer is free to dispatch tasks into the queue. If the programmer chooses to
manage the queue directly, then they must pay attention to space available and other issues.
Alternatively, the programmer can choose to use a library function to submit task dispatches.
A queue is a physical memory area where a producer places a request for a consumer.
Depending on the complexity of the HSA hardware, queues might be managed by any combination
of software or hardware. Queue implementation internals are not exposed to the programmer.
Hardware-managed queues have a significant performance advantage in the sense that an
application running on a CPU can queue work to a CU directly, without the need for a system call.
This allows for very low-latency communication between devices, opening up a new world of
possibilities. With this, the CU device can be viewed as a peer device, or a co-processor.
CPUs can also have queues. This allows any device to queue work for any other device.
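As a sketch of what such a dispatch looks like at the lowest level, the fragment below builds one AQL kernel dispatch packet in a user-mode HSA queue and rings the doorbell. The packet fields and doorbell calls follow the HSA runtime specification; the kernel_object handle and kernarg block are assumed to come from a previously finalized code object.

```c
#include <hsa/hsa.h>
#include <string.h>

/* Producer side of a queue: reserve a slot, fill a dispatch packet,
   publish it, and notify the consumer via the doorbell signal. */
void dispatch_kernel(hsa_queue_t *q, uint64_t kernel_object,
                     void *kernarg, uint32_t grid, hsa_signal_t done)
{
    uint64_t idx = hsa_queue_add_write_index_relaxed(q, 1);
    hsa_kernel_dispatch_packet_t *pkt =
        (hsa_kernel_dispatch_packet_t *)q->base_address + (idx & (q->size - 1));

    memset(pkt, 0, sizeof(*pkt));
    pkt->setup = 1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS; /* 1-D grid */
    pkt->workgroup_size_x = 64;
    pkt->workgroup_size_y = pkt->workgroup_size_z = 1;
    pkt->grid_size_x = grid;
    pkt->grid_size_y = pkt->grid_size_z = 1;
    pkt->kernel_object = kernel_object;
    pkt->kernarg_address = kernarg;
    pkt->completion_signal = done;

    /* Write the header last with release semantics so the packet
       processor never sees a half-written packet. */
    __atomic_store_n(&pkt->header,
                     (uint16_t)(HSA_PACKET_TYPE_KERNEL_DISPATCH
                                << HSA_PACKET_HEADER_TYPE),
                     __ATOMIC_RELEASE);
    hsa_signal_store_relaxed(q->doorbell_signal, (hsa_signal_value_t)idx);
}
```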